Building a Gradual Type Checker for Lua
If you’ve ever wondered how rust-analyzer can respond in milliseconds while you type, the answer is Salsa. Salsa is the incremental computation framework that powers rust-analyzer’s query system, and this tutorial teaches it from the ground up by building something real: a gradual type checker for Lua.
We’ll use lex_lua to turn Lua source text into tokens, analisar to turn those tokens into an AST, and salsa to make the whole thing incremental. When a file changes, we only re-check what’s affected. No prior Salsa or type-theory experience is needed, but you should be comfortable with Rust. We build from zero.
Why Salsa?
Imagine you’re building a language server. Every time the user types a character, you need to re-check their code. The naive approach — re-parse every file, re-infer every type, re-emit every diagnostic — works fine for a ten-line script. It falls apart for a real project with hundreds of files and thousands of functions. You can’t afford to redo everything on every keystroke.
Salsa is the answer to that problem. It’s an incremental computation framework: you write your logic as pure functions called queries, and Salsa takes care of caching the results and only re-running queries whose inputs actually changed. Change one file, and Salsa skips every query that doesn’t depend on it. The caching is automatic. You don’t write cache invalidation logic. Salsa infers the dependency graph from what your queries actually read, so the graph is always correct and can never get out of sync with the code.
That’s the core value, but Salsa also handles the harder edges. Recursive queries that would loop forever are caught and reported as cycles. Diagnostics — errors, warnings — flow through a separate channel called accumulators, so reporting a type error doesn’t poison the query graph and prevent other checks from running.
This is the same architecture that makes rust-analyzer fast, and by the end of this tutorial, you’ll understand how it works because you’ll have built one yourself.
What You’ll Build
Seventeen chapters, one type checker. By the end, you’ll have:
- A Salsa-powered incremental type checker that re-checks only what changed
- Support for LuaCATS annotations (
---@type,---@param,---@return,---@generic) - Union types (
number|string), generic functions (identity(x: T): T), and recursive types (Node|nil) - Type narrowing (
type(x) == "number"makesxanumberinside theif) - Cross-file type checking with
require()— edit one file, invalidate only its dependents - A language server that responds incrementally as you type
Each feature builds on the ones before it. Salsa’s incrementality makes them all fast; the type system catches what would otherwise be runtime errors.
A Note on Code Structure
Each chapter is a standalone Rust crate — it compiles and runs independently. This means the AST types (Statement, Expression, BinOp, UnaryOp), the parser conversion functions (convert_stmt, convert_expr), the type environment (TypeEnv), and the type inference logic (infer_type) are repeated across chapters with incremental additions.
In production code, you’d extract these into a shared library crate. For a tutorial, the duplication is intentional: each chapter is self-contained, so you can jump to any chapter and have the full context without chasing imports across crates. The tradeoff is that fixes to shared code need to be applied in each chapter — but that’s a small price for the readability win.
Version Note
This tutorial uses Salsa 0.26. Salsa’s API changed significantly between the “Salsa 2022” era (0.12 and below) and the current 0.20+ line — the #[salsa::input], #[salsa::tracked], and #[salsa::accumulator] macros, the Database trait, and the query system are all different. If you’re coming from an older Salsa tutorial, you’ll need to re-learn the API. The concepts (incremental computation, dependency tracking, cycle detection) are the same. The syntax is not.
If Salsa has released a newer version by the time you read this, the concepts should still apply, but macro names, trait signatures, and generated method names may differ. Check the Salsa docs for the current API.
License
MIT
Chapter 1: Hello Salsa — Inputs and Tracked Functions
Imagine you’re cooking from a recipe. You’ve got your ingredients laid out on the counter — those are your inputs. They come from outside the kitchen: the grocery store, the garden, the pantry. You don’t produce them; you start with them.
Now you follow the recipe steps to turn those ingredients into a dish. Those steps are your tracked functions — pure transformations that depend only on what you put in. If you swap the butter for margarine, only the steps that used butter need to change. The step where you toasted the sesame seeds? That’s still fine. No need to redo it.
Salsa works the same way. You declare your inputs (the facts that come from outside — source code, configuration), you write pure functions that derive results from those inputs, and Salsa keeps track of which functions read which inputs. When an input changes, Salsa only re-runs the functions that actually depended on it. Everything else returns its cached result instantly.
If you’ve used a spreadsheet, this will feel familiar — type a number in a cell, and every formula that references it recalculates. Salsa is the same idea, but for Rust functions instead of spreadsheet cells. The difference is that Salsa’s dependencies are dynamic: a function reads whatever it reads at runtime, and Salsa tracks those reads automatically. No need to wire up the references by hand.
This matters because in a real type checker, you might have hundreds of source files and thousands of derived results. You can’t recompute everything on every keystroke. Salsa gives you the infrastructure to skip work — automatically.
Let’s see how this works in code.
Step 1: Define Your Input
In Salsa, inputs are the facts that come from outside the system. For a type checker, the obvious input is: the source code.
#![allow(unused)]
fn main() {
use salsa::Setter;
use std::path::PathBuf;
#[salsa::input]
pub struct SourceFile {
#[returns(ref)]
pub path: PathBuf,
#[returns(ref)]
pub text: String,
}
}
The #[salsa::input] attribute does a lot of work behind the scenes. It generates a constructor so you can create a SourceFile by calling SourceFile::new(&db, path, text), getter methods so you can read the fields, and setter methods so you can change them later. We’ll see all of these in action once we define our database.
The #[returns(ref)] attribute tells Salsa that the getter should return a reference (&String) instead of an owned value. This avoids cloning strings every time you read an input.
Key idea: Inputs are the only way data enters the system. Everything else is derived from them. This is what makes Salsa’s incrementality work — if you can trace every piece of data back to an input, you know exactly what needs to re-run when an input changes.
Step 2: Define a Tracked Function
A #[salsa::tracked] function is a pure function of its inputs. Pure means: no side effects, no reading from global state, no randomness. Everything goes through the database.
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn line_count(db: &dyn salsa::Database, source: SourceFile) -> u32 {
let text = source.text(db);
text.lines().count() as u32
}
}
The first argument is always the database (we’ll refine this in later chapters). The remaining arguments are the “keys” — what distinguishes one call from another.
When you call line_count(&db, source), Salsa does this:
- Check: have I seen this query before, with these arguments, in this revision?
- If yes → return the cached result. No re-execution.
- If no → run the function, cache the result, return it.
The function body calls source.text(db). This read is tracked. Salsa records: “line_count(source) depends on source.text.” Later, if you change the source text, Salsa knows this cache entry is stale.
What if you forget #[salsa::tracked]? Then Salsa can’t see inside the function. It doesn’t know which inputs the function reads, so it can’t tell whether the cached result is still valid. Without the annotation, you’d have a plain function — no caching, no dependency tracking, no incremental recomputation. The annotation is what makes Salsa’s guarantee possible.
Let’s add one more query to see how dependencies work:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn contains_text(db: &dyn salsa::Database, source: SourceFile, needle: String) -> bool {
let text = source.text(db);
text.contains(&needle)
}
}
Same pattern: read an input, compute a result, cache it. The needle parameter becomes part of the cache key — contains_text(db, source, "print") and contains_text(db, source, "local") are independent cache entries.
A word on String as a key: Using String as a query key means every call with a different needle string creates a new cache entry and allocates the string on the heap. For a tutorial, this is fine — it demonstrates key parameters clearly. In production code, you’d want to use interned strings (we’ll build those in Chapter 3) so that comparisons are cheap and allocations happen only once.
Step 3: Define Your Database
A database is the container that holds all the cached query results.
#![allow(unused)]
fn main() {
#[salsa::db]
#[derive(Default)]
pub struct Database {
storage: salsa::Storage<Self>,
}
#[salsa::db]
impl salsa::Database for Database {}
}
You need three things:
- A
salsa::Storage<Self>field — this is where Salsa keeps its memo tables, revision counters, and dependency graphs. - An impl of
salsa::Database— marked with#[salsa::db]. - The same
#[salsa::db]attribute on the struct itself.
Right now our database is empty — it doesn’t have any custom behavior. In later chapters, we’ll add methods and custom traits. For now, it’s a container — nothing more.
Step 4: Use It
use std::path::PathBuf;
fn main() {
let mut db = Database::default();
let source = SourceFile::new(
&db,
PathBuf::from("main.lua"),
"local x = 1\nlocal y = 2\nprint(x + y)\n".to_string(),
);
let count = line_count(&db, source);
assert_eq!(count, 3);
}
Nothing magical yet — we create a database, create an input, and query it. The result is computed and cached.
The Magic: Revisions
Now let’s change the input and see what happens:
#![allow(unused)]
fn main() {
use salsa::Setter;
source.set_text(&mut db).to("local z = 99\n".to_string());
let new_count = line_count(&db, source);
assert_eq!(new_count, 1);
}
The setter syntax is a builder pattern: set_text(&mut db) returns a setter object, and .to(...) applies the new value and bumps the revision. You might expect set_text(&mut db, value) — a regular method call — but the builder pattern lets Salsa batch multiple field updates in a single revision. For now, we’re setting one field at a time.
When we call set_text, Salsa increments its revision counter. This is Salsa’s internal clock — every input mutation bumps the revision. Each cached query remembers which revision it was computed in.
When we query line_count again, Salsa checks: “Is the current revision newer than when I last computed this?” Yes. “Did the inputs this query depends on actually change?” Yes — we set new text. So it re-runs the function.
If we query line_count again without changing the text, Salsa returns the cached result instantly. No re-execution:
#![allow(unused)]
fn main() {
let same_count = line_count(&db, source);
assert_eq!(same_count, 1); // cached! The function body never ran.
}
The second call is effectively a HashMap lookup — Salsa sees the same arguments in the same revision and returns the memoized value. The line_count function body doesn’t execute.
Per-Input Isolation
Salsa doesn’t cache globally — it caches per input. When you change one input, only the queries that actually read that input are invalidated. Everything else stays cached.
#![allow(unused)]
fn main() {
use salsa::Setter;
use std::path::PathBuf;
let other = SourceFile::new(&db, PathBuf::from("other.lua"), "return 42\n".to_string());
let other_count = line_count(&db, other); // computed, cached
source.set_text(&mut db).to("local a = 1\nlocal b = 2\n".to_string());
let other_count_again = line_count(&db, other); // cached! No re-run!
assert_eq!(other_count_again, 1);
}
We changed source’s text, not other’s. When Salsa asks “is line_count(other) still valid?”, it checks whether other’s text changed. It didn’t — so the cached value is returned instantly. source and other are isolated from each other.
In a real type checker with hundreds of files, typing in one file only invalidates queries that read that file. Queries for other files are still cached. This is why rust-analyzer can respond in milliseconds even on large projects — it’s not re-type-checking the whole world on every keystroke.
Next: Chapter 2: Parsing Lua with Analisar — We’ll parse actual Lua source code using the analisar parser and wire it into Salsa as a tracked query. Same incremental model, but now doing real work.
Chapter 2: Parsing Lua — From Source Text to a Queryable AST
You have a Lua file. You want to type-check it. First problem: you need to parse it. But Lua’s grammar has 80+ production rules, operator precedence tables, and ambiguous constructs like a:b(c) vs a.b(c). Writing a Lua parser from scratch would be its own tutorial.
Good news: the analisar crate already parses Lua. We’ll use it. The real work isn’t parsing — it’s wiring the parser into Salsa so that changing the source text automatically re-parses and re-type-checks.
Why Separate Parsing from Type-Checking?
Here’s a question you might be asking: why not type-check directly from the source text? Why have a parsing step at all?
Parsing is expensive, type-checking is more expensive. If you mash them together, you lose the ability to cache them independently. A change that only affects one function’s body shouldn’t require re-parsing the entire file — and it shouldn’t require re-type-checking unrelated functions either.
Different invalidation granularity. The parser runs on the whole source text. The type checker runs on individual expressions. By separating them, Salsa can skip the type checker when only the parse result changes but the types stay the same (this happens less often, but it can happen — think comments or whitespace that don’t affect the AST structure).
The pipeline so far:
SourceFile (input)
│
▼
parse() ← tracked, cached
│
▼
LuaAst ← owned data
│
▼
derived queries (type checking, name resolution, etc.)
Step 1: The Same Input
We reuse the SourceFile input from Chapter 1:
#![allow(unused)]
fn main() {
use std::path::PathBuf;
#[salsa::input]
pub struct SourceFile {
#[returns(ref)]
pub path: PathBuf,
#[returns(ref)]
pub text: String,
}
}
Nothing new here. The source text is our input, and Salsa tracks when it changes.
Step 2: An Owned AST
Analisar’s AST uses borrowed strings (Cow<'a, str> for identifiers). This makes sense for analisar — it avoids allocating copies. But Salsa tracked functions need to return owned data. Why? Because the cached result has to survive across revisions. If the result borrows from the source text, and the source text changes, you’ve got a dangling reference.
So we define our own owned AST types:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct LuaAst {
pub statements: Vec<Statement>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub enum Statement {
/// `local x = expr` or `x = expr`
Assignment {
local: bool,
targets: Vec<Expression>,
values: Vec<Expression>,
},
/// A bare expression statement (usually a function call)
Expression(Expression),
/// `return expr, ...`
Return(Vec<Expression>),
/// `function name(args) block end`
Function {
local: bool,
name: String,
params: Vec<String>,
body: Vec<Statement>,
},
/// `if cond then block [elseif cond then block]* [else block] end`
If {
test: Expression,
then_block: Vec<Statement>,
else_block: Option<Vec<Statement>>,
},
/// `while cond do block end`
While {
condition: Expression,
body: Vec<Statement>,
},
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub enum Expression {
Nil,
True,
False,
Number(String),
StringLiteral(String),
Name(String),
BinaryOp {
left: Box<Expression>,
op: BinOp,
right: Box<Expression>,
},
UnaryOp {
op: UnaryOp,
expr: Box<Expression>,
},
FunctionCall {
func: Box<Expression>,
args: Vec<Expression>,
},
FieldAccess {
object: Box<Expression>,
field: String,
},
Index {
object: Box<Expression>,
index: Box<Expression>,
},
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, salsa::Update)]
pub enum BinOp {
Add, Sub, Mul, Div, Mod, Pow,
Concat, Lt, Gt, Le, Ge, Eq, Ne,
And, Or,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, salsa::Update)]
pub enum UnaryOp {
Negate, Not, Length,
}
}
That’s a lot of types. But notice the pattern — every type derives salsa::Update, every string is owned (String, not Cow<'a, str>), and recursive structures use Box. These aren’t arbitrary choices. salsa::Update is required for any type used as a tracked function’s return value — Salsa caches the results of tracked functions, and when inputs change, it needs to compare old and new values to decide whether downstream queries need to re-run. For simple types (numbers, strings), it’s a straightforward equality check; for nested types, it recurses into fields. If you forget this derive, the compiler will remind you. Owned Strings are necessary because Salsa caches the result across revisions — if the AST borrowed from the source text and the source text changed, you’d have dangling references. The conversion from analisar’s Cow<'a, str> to our String happens in the convert_ast function; that’s the boundary where borrowed data becomes owned data. And Box for recursive types is Rust’s rule: a type can’t contain itself by value, so Expression::BinaryOp { left: Box<Expression>, ... } compiles, but left: Expression wouldn’t.
But wait — isn’t cloning expensive? Yes, for large files. That’s why in Chapter 3 we’ll use #[salsa::interned] for strings, and in Chapter 4 we’ll use #[salsa::tracked] for AST nodes. Both avoid cloning. For now, we keep it simple with owned Strings.
Step 3: The Parse Query
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn parse(db: &dyn salsa::Database, source: SourceFile) -> LuaAst {
let text = source.text(db);
convert_ast(text)
}
}
This is the key tracked function. It takes a SourceFile (input) and returns a LuaAst (owned data). Salsa caches this — if the source text hasn’t changed since the last revision, the cached AST is returned without re-parsing.
The convert_ast function is where we bridge analisar’s borrowed world into our owned world. We walk analisar’s AST node by node and build our own representation. It’s mechanical — mapping one enum variant to another — but it’s important to understand why we do it. analisar gives us a correct but borrowed AST; our AST is owned, hashable, and Salsa-compatible; and the conversion is the boundary between “external library” and “Salsa system.”
This boundary pattern — converting external data into Salsa-owned data at the query boundary — is common in Salsa projects. rust-analyzer does the same thing with the Rust parser.
The Conversion in Practice
The pattern is simple: each analisar variant maps to one of our variants. Here are three representative examples that show the key ideas:
#![allow(unused)]
fn main() {
// 1. Direct mapping — change the type, keep the structure
ast::Expression::Name(n) => Expression::Name(n.name.into_owned()),
// 2. Recursive wrapping — Box the children and recurse
ast::Expression::BinOp { left, op, right } => Expression::BinaryOp {
left: Box::new(convert_expr(*left)),
op: convert_binop(op),
right: Box::new(convert_expr(*right)),
},
// 3. Splitting a combined variant — analisar's Suffixed becomes FieldAccess or Index
ast::Expression::Suffixed(s) => {
let subject = convert_expr(s.subject);
// `s.property` is analisar's name for the suffix part — it could be
// a field name (an Expression::Name) or an index expression.
if !s.computed && !s.method {
// Field access: a.b — the property is always a Name in this case
if let Expression::Name(field) = convert_expr(s.property) {
Expression::FieldAccess { object: Box::new(subject), field }
} else {
// Shouldn't happen for a.b, but handle gracefully
Expression::FieldAccess { object: Box::new(subject), field: "<unknown>".to_string() }
}
} else if s.computed {
// Index: a[b]
Expression::Index { object: Box::new(subject), index: Box::new(convert_expr(s.property)) }
} else {
// Method calls (a:b()) — not modeled, placeholder keeps things compiling
Expression::FieldAccess { object: Box::new(subject), field: "<method-call>".to_string() }
}
}
}
n.name.into_owned() is where Cow<'a, str> becomes String. The analisar parser uses Cow<'a, str> for name fields — it can borrow from the source text during parsing, avoiding allocations. We need owned String for our AST (no lifetime parameter), so we call .into_owned(), a standard method on Cow. If you haven’t met Cow before: it’s a clone-on-write smart pointer that holds either a borrowed &str or an owned String. Calling .into_owned() extracts the owned String (or clones the borrowed &str into one) — either way, you get a String with no lifetime attached.
The Suffixed split is where analisar’s combined representation diverges from our simpler model. Analisar represents a.b, a[b], and a:b() with a single Suffixed node, and we split them into FieldAccess and Index. Method calls (a:b()) get a <method-call> placeholder — our AST doesn’t model them, but using a descriptive string instead of "?" means a type checker would report “unknown variable <method-call>” rather than the confusing “unknown variable ?”. In a production compiler you’d add a proper MethodCall variant; for the tutorial, the placeholder keeps things compiling and makes the simplification visible.
The _ => Expression::Nil fallback handles analisar variants we haven’t modeled yet (like table constructors). In a real project you’d add those over time. For the tutorial, silently falling back to Nil keeps things compiling without requiring every Lua construct up front.
The rest of the conversion follows the same pattern: match on the analisar variant, call convert_expr / convert_statement on children, wrap Box::new around recursive references, and call .into_owned() on Cow fields. Here’s the complete code — every variant, every helper function:
Complete Conversion Code
#![allow(unused)]
fn main() {
fn convert_ast(lua_text: &str) -> LuaAst {
let mut parser = LuaParser::new(lua_text.as_bytes());
let mut statements = Vec::new();
while let Some(Ok(stmt)) = parser.next() {
statements.push(convert_statement(stmt));
}
LuaAst { statements }
}
}
convert_ast creates an analisar parser, pulls statements one at a time, converts each, and wraps them in our LuaAst struct.
#![allow(unused)]
fn main() {
fn convert_statement(stmt: ast::Statement) -> Statement {
match stmt {
ast::Statement::Assignment { local, targets, values } => Statement::Assignment {
local,
targets: targets.into_iter().map(convert_expr).collect(),
values: values.into_iter().map(convert_expr).collect(),
},
ast::Statement::Expression(expr) => Statement::Expression(convert_expr(expr)),
ast::Statement::Return(ret) => {
Statement::Return(ret.0.into_iter().map(convert_expr).collect())
}
ast::Statement::Function { local, name, body } => Statement::Function {
local,
name: convert_func_name(&name),
params: body.par_list.names.0.into_iter().map(|n| n.name.into_owned()).collect(),
body: convert_block(body.block),
},
ast::Statement::If(if_stmt) => Statement::If {
test: convert_expr(if_stmt.test),
then_block: convert_block(if_stmt.block),
else_block: if_stmt.catch_all.map(convert_block),
},
ast::Statement::While { exp, block } => Statement::While {
condition: convert_expr(exp),
body: convert_block(block),
},
// Skip less common statement types for now
_ => Statement::Expression(Expression::Nil),
}
}
}
The Function variant shows a wrinkle: analisar gives us a FuncName type (with dot-separated names and optional method syntax), and we convert it to a plain String via the convert_func_name helper. The body.par_list.names field contains the parameter names — each one has a .name field that’s a Cow<'a, str>, so we call .into_owned() to get our owned String.
#![allow(unused)]
fn main() {
fn convert_block(block: ast::Block) -> Vec<Statement> {
block.0.into_iter().map(convert_statement).collect()
}
fn convert_func_name(name: &ast::FuncName) -> String {
let base: String = name.dot_separated.iter()
.map(|n| n.name.as_ref())
.collect::<Vec<_>>()
.join(".");
match &name.method {
Some(m) => format!("{}:{}", base, m.name),
None => base,
}
}
}
convert_func_name handles Lua’s function name syntax: foo.bar:baz becomes the string "foo.bar:baz". The dot_separated field is a vec of name segments; the optional method field adds the colon-separated method name.
#![allow(unused)]
fn main() {
fn convert_expr(expr: ast::Expression) -> Expression {
match expr {
ast::Expression::Nil => Expression::Nil,
ast::Expression::True => Expression::True,
ast::Expression::False => Expression::False,
ast::Expression::Numeral(n) => Expression::Number(n.0.into_owned()),
ast::Expression::LiteralString(s) => Expression::StringLiteral(s.0.to_string()),
ast::Expression::Name(n) => Expression::Name(n.name.into_owned()),
ast::Expression::BinOp { left, op, right } => Expression::BinaryOp {
left: Box::new(convert_expr(*left)),
op: convert_binop(op),
right: Box::new(convert_expr(*right)),
},
ast::Expression::UnaryOp { op, exp } => Expression::UnaryOp {
op: convert_unaryop(op),
expr: Box::new(convert_expr(*exp)),
},
ast::Expression::FuncCall(call) => Expression::FunctionCall {
func: Box::new(convert_expr(*call.prefix)),
args: convert_args(call.args),
},
ast::Expression::Suffixed(s) => {
let subject = convert_expr(s.subject);
if !s.computed && !s.method {
// Field access: a.b
if let Expression::Name(field) = convert_expr(s.property) {
Expression::FieldAccess {
object: Box::new(subject),
field,
}
} else {
// Shouldn't happen for a.b, but handle gracefully
Expression::FieldAccess {
object: Box::new(subject),
field: "<unknown>".to_string(),
}
}
} else if s.computed {
// Index: a[b]
Expression::Index {
object: Box::new(subject),
index: Box::new(convert_expr(s.property)),
}
} else {
// Method calls (a:b()) — placeholder keeps things compiling
Expression::FieldAccess {
object: Box::new(subject),
field: "<method-call>".to_string(),
}
}
}
_ => Expression::Nil,
}
}
}
Most variants are direct mappings — change the type, preserve the structure. The Suffixed case is the interesting one (we covered it in the examples above). The _ => Expression::Nil fallback catches table constructors and other Lua constructs we’re not modeling.
#![allow(unused)]
fn main() {
fn convert_args(args: ast::Args) -> Vec<Expression> {
match args {
ast::Args::ExpList(exprs) => exprs.into_iter().map(convert_expr).collect(),
ast::Args::String(s) => vec![Expression::StringLiteral(s.0.to_string())],
ast::Args::Table(_) => {
// Table arguments not supported in this simplified tutorial
vec![]
}
}
}
fn convert_binop(op: ast::BinaryOperator) -> BinOp {
match op {
ast::BinaryOperator::Add => BinOp::Add,
ast::BinaryOperator::Subtract => BinOp::Sub,
ast::BinaryOperator::Multiply => BinOp::Mul,
ast::BinaryOperator::Divide => BinOp::Div,
ast::BinaryOperator::Modulo => BinOp::Mod,
ast::BinaryOperator::Power => BinOp::Pow,
ast::BinaryOperator::Concatenate => BinOp::Concat,
ast::BinaryOperator::LessThan => BinOp::Lt,
ast::BinaryOperator::GreaterThan => BinOp::Gt,
ast::BinaryOperator::LessThanEqual => BinOp::Le,
ast::BinaryOperator::GreaterThanEqual => BinOp::Ge,
ast::BinaryOperator::Equal => BinOp::Eq,
ast::BinaryOperator::NotEqual => BinOp::Ne,
ast::BinaryOperator::And => BinOp::And,
ast::BinaryOperator::Or => BinOp::Or,
_ => todo!("bitwise binary operator not yet supported"),
}
}
fn convert_unaryop(op: ast::UnaryOperator) -> UnaryOp {
match op {
ast::UnaryOperator::Negate => UnaryOp::Negate,
ast::UnaryOperator::Not => UnaryOp::Not,
ast::UnaryOperator::Length => UnaryOp::Length,
ast::UnaryOperator::BitwiseNot => todo!("bitwise not not yet supported"),
}
}
}
convert_binop and convert_unaryop are straightforward mappings — with one important detail. The _ => catch-all and the BitwiseNot arm used to silently map to BinOp::Add and UnaryOp::Negate respectively. That’s worse than it sounds: ~0xFF would silently become -256, and 5 & 3 would silently become 5 + 3. No compilation error, no panic — a wrong answer with no indication that the conversion was the culprit.
We use todo!() instead. This panics at runtime if you hit an unsupported operator, which is loud and obvious. That’s the right trade-off for a tutorial: you want to know immediately when you’ve reached a construct the code doesn’t handle, not discover it later through subtly wrong results. The _ => Expression::Nil fallback earlier follows the same principle — Nil is obviously wrong in context, so a reader would spot the problem immediately. Silent remaps to plausible values are the dangerous case.
Step 4: Query Chaining
Now that we have parse(), we can build derived queries on top of it:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn top_level_names(db: &dyn salsa::Database, source: SourceFile) -> Vec<String> {
let ast = parse(db, source); // depends on parse
let mut names = Vec::new();
for stmt in &ast.statements {
match stmt {
Statement::Assignment { local: true, targets, .. } => {
for target in targets {
if let Expression::Name(name) = target {
names.push(name.clone());
}
}
}
Statement::Function { local: true, name, .. } => {
names.push(name.clone());
}
_ => {}
}
}
names
}
}
top_level_names calls parse(db, source), so it depends on the parse result. If the source changes:
parse()re-runs (its input changed)top_level_names()re-runs (its dependency — parse — produced a new result)
But here’s a subtlety: if parse() returns a different LuaAst (the source changed in a way that affects the AST), top_level_names() re-runs. If parse() returns the same LuaAst (e.g., you changed a comment that analisar doesn’t include in the AST), top_level_names() returns its cached value.
Salsa checks the value of dependencies, not only whether they were re-executed. This is called verify re-execution: Salsa re-runs parse(), compares the result to the old cache, and only propagates the change if the value is actually different.
Running
cargo run --bin ch02-parsing-lua
A Note on Code Structure
Each chapter in this tutorial is a standalone binary with its own copy of the AST types, environment, and inference logic. This is intentional — it means you can open any chapter and see the complete, working code without chasing imports across multiple files.
The downside is obvious: any bug found in one chapter exists in all of them. In production code, you’d extract shared types into a library crate. For a tutorial, self-containedness wins — you never have to wonder “which version of this function am I looking at?”
Next: Chapter 3: Interned Symbols — We’ll use #[salsa::interned] for identifiers, which lets us compare names by ID instead of by string comparison. This is how compilers handle identifiers efficiently.
Chapter 3: Interned Symbols — Fast Comparison by Giving Names Identity
Our AST uses String for every identifier. That works — until you realize what happens at scale. Every name lookup compares strings byte-by-byte. "print" == "print" walks five characters on both sides. In a file with hundreds of references, that adds up. And when Salsa caches a query result keyed by a string, it hashes the entire string to check the cache.
The Fix: Interning
Compilers solve this with interning: allocate each unique string once, and refer to it by ID. The first time you see "print", you store it and get back ID 42. Every subsequent "print" maps to the same ID 42. Name comparison becomes a single integer comparison — O(1), no string walking, no hashing. Two symbols are equal if and only if they have the same database-assigned ID.
This isn’t a new idea — Lisp systems have done it since the 1960s. Salsa gives you a first-class way to do it.
Without interning: With interning:
"print" == "print" Symbol(42) == Symbol(42)
│ │ │ │
▼ ▼ ▼ ▼
p == p → yes 42 == 42 → yes
r == r → yes Done. One comparison.
i == i → yes
n == n → yes
t == t → yes
5 character comparisons 1 integer comparison
(and a hash for Salsa caching) (and no hash — the ID IS the hash)
Same answer. Different cost. When your type checker compares thousands of names per file, that difference compounds.
#[salsa::interned] vs #[salsa::input]
Both live in the database. Both get IDs. But they’re different:
#[salsa::input] | #[salsa::interned] | |
|---|---|---|
| Created by | Outside code (your main function) | Inside tracked functions |
| Deduplication | No — each new() call creates a new entry | Yes — same field values → same ID |
| Mutable? | Yes — setters for each field | No — immutable once created |
| Lifecycle | Survives across revisions | Survives across revisions |
| Lifetime | No 'db parameter | Has 'db lifetime |
Inputs are for data that comes from outside the system and can change (source files, configuration). Interned values are for data that’s derived inside queries and never changes (identifiers, keywords).
Step 1: Define an Interned Symbol
#![allow(unused)]
fn main() {
#[salsa::interned(debug)]
pub struct Symbol<'db> {
#[returns(ref)]
pub text: String,
}
}
The 'db lifetime ties the symbol to the database — you can’t hold a Symbol after the database is dropped. The #[returns(ref)] on text means symbol.text(db) returns &str instead of String, avoiding a clone.
When you call Symbol::new(db, "print".to_string()):
- Salsa checks: have I seen
"print"before? - If yes → return the existing ID. No allocation.
- If no → store the string, assign a new ID, return it.
Step 2: The Guarantee
#![allow(unused)]
fn main() {
let s1 = Symbol::new(&db, "print".to_string());
let s2 = Symbol::new(&db, "print".to_string());
let s3 = Symbol::new(&db, "write".to_string());
assert_eq!(s1, s2); // same string → same ID → equal
assert_ne!(s1, s3); // different string → different ID → not equal
}
s1 == s2 is a single integer comparison. Not a string comparison. In a type checker doing millions of name lookups, this matters.
Step 3: Definitions and the Symbol Table
Before we can look up names, we need to represent what a definition is — the thing a name refers to. A definition has a name, a kind (variable, function, parameter), and whether it’s local:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct Definition {
pub name: String,
pub is_local: bool,
pub kind: DefKind,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, salsa::Update)]
pub enum DefKind {
Variable,
Function,
Parameter,
}
}
Why salsa::Update on these? Because they’re stored inside tracked query results (like Vec<Definition>). Salsa needs to compare old and new values when inputs change, and salsa::Update provides the recursive equality check. If you forget this derive, the compiler will catch it — you’ll get an error when you use Definition in a tracked function’s return type.
Why String instead of Symbol<'db> in Definition? That’s the design tension we’ll get to in a moment.
Now we can build the symbol table itself:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn symbol_table(db: &dyn salsa::Database, source: SourceFile) -> Vec<Definition> {
let ast = parse(db, source);
let mut defs = Vec::new();
for stmt in &ast.statements {
match stmt {
Statement::Assignment { local, targets, .. } => {
for target in targets {
if let Expression::Name(name) = target {
defs.push(Definition {
name: name.clone(),
is_local: *local,
kind: DefKind::Variable,
});
}
}
}
Statement::Function { local, name, params, .. } => {
defs.push(Definition {
name: name.clone(),
is_local: *local,
kind: DefKind::Function,
});
for param in params {
defs.push(Definition {
name: param.clone(),
is_local: true,
kind: DefKind::Parameter,
});
}
}
_ => {}
}
}
defs
}
}
This walks the AST and collects every name binding — local variables, function names, and function parameters. It’s a derived query: it depends on parse(), so when the source changes, the symbol table is recomputed.
Step 4: Looking Up Names with Interned Comparison
Now we can look up a name using interned comparison:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn lookup_name(db: &dyn salsa::Database, source: SourceFile, name: String) -> Option<Definition> {
// Intern the lookup name — this gives us an integer ID for fast comparison
let symbol = Symbol::new(db, name.clone());
let defs = symbol_table(db, source);
for def in &defs {
// Intern each definition name, then compare Symbols by ID
let def_symbol = Symbol::new(db, def.name.clone());
if def_symbol == symbol {
return Some(def.clone());
}
}
None
}
}
Each comparison is a single integer equality check — Symbol’s Eq impl compares the underlying Salsa IDs, not the string contents. The == on two Symbol values compiles to the same thing as comparing two u32 values.
You might notice: for a linear scan, interning doesn’t save much. We’re converting each String to a Symbol on every call, and the scan itself is O(n). The real payoff comes when the symbol table is indexed — a HashMap<Symbol, Definition> where the key is the interned ID. Symbol implements Hash via its Salsa ID, so HashMap lookups are integer hashes, not string hashes. But there’s a catch: building a HashMap for a single lookup is more work than a linear scan. The pattern only pays off when the map is cached as its own tracked query, so multiple lookups reuse it. Chapter 4 adds that cached map.
Why not use a HashMap<String, Definition> now and skip interning? Because String hashing walks every byte. Symbol hashing is a single integer. In a file with hundreds of lookups, that difference compounds. And there’s a conceptual benefit: with String, “equality” means “same bytes.” With Symbol, equality means “same identity” — the database said these are the same name. In practice, for well-formed Lua, the distinction doesn’t matter (“print” is always “print”). But the mental model shift matters: names have identity, not mere spelling.
We’re setting up the infrastructure here — the actual HashMap-indexed lookup arrives in Chapter 4, where the symbol table becomes a cached tracked query.
The design tension. You might wonder: why not store Symbol<'db> directly in Definition instead of String? Then we wouldn’t need to intern the definition names at map-construction time. The problem: Definition is returned from a tracked function, so Salsa stores it inside the database. Stored values can’t hold references into the database — and Symbol<'db> is exactly that (the 'db lifetime means “borrowed from the database”). You can’t store a database reference inside the database. This is a real constraint in Salsa — interned values give you fast comparison, but they can’t be stored in other tracked query results. Chapter 4’s tracked structs solve this: they’re first-class database entities, not plain data, so they CAN hold interned values.
When to Intern vs When Not To
Intern: identifiers compared frequently (variable names, function names, type names). The whole point is deduplication + fast comparison.
Don’t intern: one-off strings, large strings (you’d waste memory storing them forever), strings rarely compared. Not everything needs to be interned because you can.
A good rule of thumb: if you’d put it in a HashSet or use it as a HashMap key, intern it. If you’d display it once and forget it, don’t.
Running
cargo run --bin ch03-interned-symbols
Next: Chapter 4: Tracked Structs — The backbone of a Salsa-aware AST. We’ll convert our String-based AST into one that uses Salsa IDs, giving each AST node stable identity that survives edits.
Chapter 4: Tracked Structs — Entity Identity in Salsa
You have two functions in a file. You change one of them. Salsa should re-run only the queries that depend on that function — not every query in the file. But with the AST as a single input, Salsa can’t tell which function changed. The whole LuaAst is one blob: touch any part, and every query that depends on it re-runs.
The fix: each function needs its own identity in Salsa’s database. Not “part of the AST” — a separate entity with a stable ID that survives edits to other functions. That’s what tracked structs give you.
This chapter switches from whole-file parsing (the analisar crate from Chapters 2–3) to extracting individual function definitions by hand. The parsing is intentionally simple — string splitting, no nested parentheses. The point isn’t the parser, it’s the Salsa pattern: once you understand how tracked structs give functions identity, you could swap in a real parser that produces FuncDefs. (You can use analisar with tracked structs — you’d parse the whole file, then decompose the AST into individual FuncDef instances in a tracked query. The hand-rolled parser here is a simplification that keeps the focus on Salsa’s identity model, not on AST traversal.)
The Problem: Plain Data Has No Identity
Consider two Lua functions:
function add(a, b) return a + b end
function mul(a, b) return a * b end
Both have the same shape — two parameters, a body that’s a binary operation. But they’re different functions. Change add’s body, and mul shouldn’t be affected. With plain data, it is — both functions live in the same LuaAst struct, so any change invalidates everything.
You need identity. Each function needs a stable ID that survives edits to other functions. That’s what tracked structs give you.
Step 1: Define an Interned Function Name
Chapter 3 defined Symbol — a generic interned string. Here we need something more specific: an interned name that represents a function name. Same pattern, different purpose:
#![allow(unused)]
fn main() {
#[salsa::interned(debug)]
pub struct FuncName<'db> {
#[returns(ref)]
pub text: String,
}
}
Why not reuse Symbol? You could — in a small project, one interned string type is fine. But as a codebase grows, separate interned types act like newtypes: they prevent you from accidentally passing a variable name where a function name is expected. The type system catches the mistake. FuncName and Symbol have the same structure, but they’re not interchangeable — and that’s the point.
Step 2: Define a Tracked Struct
#![allow(unused)]
fn main() {
#[salsa::tracked(debug)]
pub struct FuncDef<'db> {
pub name: FuncName<'db>, // interned name (defined above)
#[tracked]
pub param_count: u32, // tracked field — per-field dependencies
#[tracked]
#[returns(ref)]
pub body_text: String, // tracked field — per-field dependencies
}
}
#[salsa::tracked] on a struct makes it database-resident. Each instance gets a unique Salsa ID, it’s stored in the database (not on the stack), it has a lifetime 'db tying it to the database, and fields can be queried independently for fine-grained dependencies.
This is different from #[salsa::interned]. Interned structs are deduplicated by content — same fields, same ID. Tracked structs have unique IDs even if their fields are identical. Think of it as the difference between a value type and a reference type: Interned means two Symbol("print") instances are the same symbol; Tracked means two FuncDef instances with identical fields are different function definitions.
Step 3: Identity vs Content
Here’s the key distinction:
┌─────────────────────────────────────┐
│ Plain data: │
│ Two {name: "add", params: 2} │
│ → Equal by value │
│ → No way to tell them apart │
├─────────────────────────────────────┤
│ Tracked struct: │
│ Two FuncDef {name: "add", ...} │
│ → Different IDs │
│ → "Which function?" matters │
└─────────────────────────────────────┘
When you change add’s body, you create a new FuncDef with a new ID. Salsa sees: “function #1 changed.” Queries that depended on function #2 (mul) aren’t invalidated because function #2’s ID is unchanged.
Step 4: Per-Field Dependencies
Tracked structs can track dependencies at the field level — but only if you opt in. By default, Salsa tracks dependencies at the struct level: reading func.param_count(db) tells Salsa you depend on func, not on param_count specifically. To get per-field tracking, mark the field with #[tracked]:
#![allow(unused)]
fn main() {
#[salsa::tracked(debug)]
pub struct FuncDef<'db> {
pub name: FuncName<'db>, // untracked (default)
#[tracked]
pub param_count: u32, // tracked — dependency on this field only
#[tracked]
#[returns(ref)]
pub body_text: String, // tracked — dependency on this field only
}
}
With #[tracked] on the fields, Salsa records a dependency on that specific field when you read it. If you change body_text but not param_count, queries that only read param_count return their cached values.
To see this in action, we’ll introduce a second query. func_signature reads only name and param_count — it doesn’t care about the body:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn func_signature(db: &dyn salsa::Database, func: FuncDef<'_>) -> String {
let name = func.name(db).text(db);
let param_count = func.param_count(db);
let params: Vec<String> = (0..param_count).map(|i| format!("p{}", i)).collect();
format!("function {}({})", name, params.join(", "))
}
}
func_complexity reads both param_count and body_text — a change to either field invalidates it:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn func_complexity(db: &dyn salsa::Database, func: FuncDef<'_>) -> u32 {
let body = func.body_text(db); // depends on body_text
let ops: u32 = ['+', '-', '*', '/', '.', '=']
.iter()
.map(|c| body.chars().filter(|&ch| ch == *c).count() as u32)
.sum();
ops + func.param_count(db) // depends on param_count
}
}
If you change only the body text, func_complexity re-runs (it depends on body_text). But if you added a query that only reads param_count, it would return its cached value. The per-field benefit shows up across different queries — not within a single query that reads both fields.
Without #[tracked], all fields are “untracked” — Salsa still stores them and you can read them, but the dependency is on the entire struct. Change any untracked field, and the struct is considered re-created, invalidating all queries that depend on it. For our FuncDef, adding #[tracked] is the right call: the whole point is fine-grained incrementality, and we want queries that only read param_count to stay cached when the body changes.
Step 5: The Wrapper/Data Pattern
There’s a design tension in our FuncDef. Its param_count and body_text fields are tracked with #[tracked] — Salsa records per-field dependencies, and changing one field doesn’t invalidate queries that only read the other. The name field is untracked — it’s part of the struct’s identity, not a tracked data field. Changing the name effectively creates a new function.
But sometimes you want identity to stay stable even when content changes. Rust-analyzer does this. Each function in the source code gets a tracked struct with an ID that doesn’t change when the body is edited — because the body isn’t a field of the tracked struct at all. It’s a separate tracked query.
The pattern looks like this:
- Tracked struct — identity only. No mutable fields. “Which function is this?”
- Tracked function — content. Takes the tracked struct as input, returns the data. “What’s in this function?”
#![allow(unused)]
fn main() {
// Identity — stable even when content changes
#[salsa::tracked]
pub struct Func {
pub name: FuncName<'db>, // the one field that *is* identity
// no body, no param_count — those are content
}
// Content — a separate query
#[salsa::tracked]
pub fn func_data(db: &dyn salsa::Database, func: Func<'_>) -> FuncData {
// look up the body, parameters, etc.
}
}
Our tutorial uses the simpler approach: FuncDef puts content directly in the tracked struct. This is easier to understand and fine for a small project. In a production system with hundreds of entities, the wrapper/data pattern pays off: it keeps identities stable across edits, which means fewer cache invalidations and faster incremental updates.
Step 6: Parsing Functions from Source
Before we can see tracked structs in action, we need to create some. That means parsing source text into FuncDef instances.
We’re switching from the analisar parser (Chapters 2–3) to a hand-rolled string-splitting approach. Why? Because analisar gives us the entire AST at once — a single LuaAst. But FuncDef is per-function. We need to produce individual tracked structs, not one big bag of data.
Here’s the implementation:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn parse_functions(db: &dyn salsa::Database, source: SourceFile) -> Vec<FuncDef<'_>> {
let text = source.text(db);
let mut functions = Vec::new();
for line in text.lines() {
let line = line.trim();
if let Some(rest) = line.strip_prefix("function ") {
// Parse: function name(p1, p2, ...) body end
//
// Simplification: this breaks on nested parentheses in
// the body (e.g., `return g(x)` — the first `)` closes
// `g(`, not the parameter list). For demonstrating tracked
// structs, this is fine. A real implementation would use
// analisar from ch02.
if let Some(paren_pos) = rest.find('(') {
let name = &rest[..paren_pos];
let after_paren = &rest[paren_pos + 1..];
if let Some(close_paren) = after_paren.find(')') {
let params_str = &after_paren[..close_paren];
let param_count = if params_str.is_empty() {
0
} else {
params_str.split(',').count() as u32
};
// Everything after ) until 'end' is the body
let body_start = paren_pos + 1 + close_paren + 1;
let body = if body_start < rest.len() {
rest[body_start..].trim().trim_end_matches("end").trim().to_string()
} else {
String::new()
};
let func_name = FuncName::new(db, name.to_string());
let func = FuncDef::new(db, func_name, param_count, body);
functions.push(func);
}
}
}
}
functions
}
}
A few things to notice. FuncName::new(db, ...) interns each function name — if two functions are named add, they share the same FuncName ID. FuncDef::new(db, ...) creates a tracked struct, and each call produces a new ID even if the fields are identical. The return type is Vec<FuncDef<'_>> — a collection of tracked structs. Salsa sees this query as depending on the SourceFile input. When the source changes, the whole parse_functions result is invalidated.
The parser is deliberately fragile. It won’t handle nested parentheses, multiline functions, or comments. That’s intentional — the point is the Salsa pattern, not parsing. A production type checker would use analisar (or any proper Lua parser) and convert its AST into tracked structs.
Step 7: Seeing It in Action
Let’s make this concrete. Say we have two functions:
function add(a, b) return a + b end
function mul(a, b) return a * b end
When we first run the type checker:
parse_functionsruns → produces twoFuncDefinstances: function #1 (add) and function #2 (mul)func_signatureandfunc_complexityrun for each → results cached perFuncDefID
Now we edit add’s body to return a .. b (string concatenation instead of addition):
Edit: add's body changed from "a + b" to "a .. b"
What happens:
parse_functionsre-runs → new source, so it producesFuncDefinstances for the updated codeadd’sFuncDefkeeps the same ID — identity is keyed by the untracked fields (fields without#[tracked]), andadd’snamedidn’t change. But Salsa compares the old and new field values:body_textchanged, so that field’s revision updates.nameandparam_countdidn’t change, so their revisions stay the same.mul’sFuncDefalso keeps the same ID — and none of its fields changed, so all field revisions stay the samefunc_complexity(mul)→ cached result returned instantly — mul’s FuncDef has the same ID and no fields changed, so all cached queries are still validfunc_complexity(add)→ re-runs because it readsbody_text, and that field’s revision changed → new complexity valuefunc_signature(mul)→ cached — it only readsnameandparam_count, neither of which changedfunc_signature(add)→ also cached!add‘sbody_textrevision changed, butfunc_signaturedoesn’t readbody_text— it only depends onnameandparam_count, and neither of those fields’ revisions changed
Step 7 is the per-field tracking payoff. add was “touched” by the re-parse, but func_signature is insulated because it never read the field that changed. Without #[tracked] on the fields, changing any field of add’s FuncDef would invalidate func_signature — even though it doesn’t use the changed field.
The key moment is step 2. With plain data, changing any part of the AST would invalidate everything. With tracked structs, add keeps its identity (same name = same entity), and Salsa tracks which fields actually changed. Queries that depend on unchanged fields never need to re-run.
This is the difference between “per-file incrementality” (Chapters 2–3: the whole file re-checks) and “per-entity incrementality” (this chapter: only the changed function re-checks). Tracked structs are what make per-entity incrementality possible.
Running
cargo run --bin ch04-tracked-structs
Next: Chapter 5: Type Inference — The heart of the tutorial. We’ll implement type inference as a Salsa tracked query and see incrementality in action. This is where everything comes together.
Chapter 5: Type Inference — The Core Query
You’ve built the pieces: inputs (source files), interned symbols (fast name comparison), and tracked structs (entities with stable identity). Now the question is: given an expression, what type does it produce?
That’s type inference. 42 produces Number. "hello" produces String. x + 1 produces Number if x is compatible with Number. The type checker’s job is to walk the AST and assign a type to every expression — and when it finds a contradiction (adding a string to a number), emit a diagnostic.
This chapter builds the core infer_type query. It’s the query that everything else depends on: diagnostics, cross-file checking, and the language server all start here.
The Mental Model: Gradual Typing
Lua is dynamically typed. You don’t write local x: number = 42. But that doesn’t mean we can’t check types — we do it gradually. If the type is known, we check it (42 + 1 → both are numbers → OK). If the type is unknown, we allow it (unknown_var + 1 → unknown is “dynamic” → OK, we trust the programmer). If there’s a contradiction, that’s an error (42 + "hello" → number + string → Error).
The key distinction: Dynamic ≠ Error. Dynamic means “we don’t know and that’s fine.” Error means “we found a contradiction.” One is intentional flexibility; the other is a bug.
Our Type enum:
#![allow(unused)]
fn main() {
pub enum Type {
Number,
String,
Boolean,
Nil,
Function { params: Vec<Type>, ret: Box<Type> },
Dynamic,
Error,
}
}
Seven variants, two that need explanation. Function { params, ret } is a record variant — it carries the parameter types and return type, not a unit or tuple. ret is Box<Type> (not Type directly) because Rust requires box indirection for recursive types: Function contains Type, which can be Function, which contains Type, and so on. Without the Box, the enum would have infinite size. Dynamic means “we don’t know and that’s OK” — it’s the gradual typing escape hatch. Error means “we found a contradiction” — a type check failed. They’re not the same: Dynamic is intentional flexibility, Error is a bug.
Compatibility check: is Type::Dynamic compatible with Type::Number? Yes. Is Type::Error compatible with anything? Yes (to avoid cascading errors). Is Type::Number compatible with Type::String? No.
Here’s the method that implements those rules:
#![allow(unused)]
fn main() {
impl Type {
fn is_compatible_with(&self, target: &Type) -> bool {
match (self, target) {
(Type::Dynamic, _) | (_, Type::Dynamic) => true,
(Type::Error, _) | (_, Type::Error) => true,
(Type::Number, Type::Number) => true,
(Type::String, Type::String) => true,
(Type::Boolean, Type::Boolean) => true,
(Type::Nil, Type::Nil) => true,
(Type::Function { params: p1, ret: r1 },
Type::Function { params: p2, ret: r2 }) => {
p1.len() == p2.len()
&& p1.iter().zip(p2.iter()).all(|(a, b)| a.is_compatible_with(b))
&& r1.is_compatible_with(r2)
}
_ => false,
}
}
}
}
Two things to notice. Dynamic is compatible with everything, in both directions — (Type::Dynamic, _) covers Dynamic on the left, (_, Type::Dynamic) covers it on the right. This is the gradual typing guarantee: if we don’t know the type, we don’t complain. Error is also compatible with everything. A type error in one expression shouldn’t cascade into errors in every expression that references it. Once you have an Error, all downstream checks pass silently — the real error was reported where it happened.
We’ll extend this method in Chapter 11 when we add union types (Number | String). For now, every type either matches itself, is Dynamic, or is Error.
Step 1: The Inference Query
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn infer_type(
db: &dyn salsa::Database,
source: SourceFile,
expr: Expression,
env: TypeEnv,
) -> Type {
match expr {
Expression::Nil => Type::Nil,
Expression::True | Expression::False => Type::Boolean,
Expression::Number(_) => Type::Number,
Expression::StringLiteral(_) => Type::String,
Expression::Name(ref name) => env.lookup(name),
Expression::BinaryOp { ref left, op, ref right } => {
let left_type = infer_type(db, source, (**left).clone(), env.clone());
let right_type = infer_type(db, source, (**right).clone(), env.clone());
match op {
BinOp::Add | BinOp::Sub | BinOp::Mul | BinOp::Div
| BinOp::Mod | BinOp::Pow => {
// Arithmetic: both sides must be compatible with Number.
if left_type.is_compatible_with(&Type::Number)
&& right_type.is_compatible_with(&Type::Number)
{
Type::Number
} else {
Type::Error
}
}
BinOp::Concat => {
// Lua's `..` operator coerces numbers to strings.
// So `1 .. "x"` is valid — the 1 becomes "1" at runtime.
// If either operand is Dynamic, we can't know whether
// coercion will happen or fail — Dynamic is the honest answer.
//
// For known types: Number and String are both valid operands.
// Anything else (Boolean, Nil, Function) is an error — Lua
// only coerces numbers, not arbitrary values.
if left_type == Type::Dynamic || right_type == Type::Dynamic {
Type::Dynamic
} else if is_concat_compatible(&left_type) && is_concat_compatible(&right_type) {
Type::String
} else {
Type::Error
}
}
}
The is_concat_compatible helper accepts both String and Number — Lua coerces numbers to strings for .., but not booleans, nil, or functions:
#![allow(unused)]
fn main() {
/// Lua coerces numbers to strings for concatenation: `1 .. "x"` is valid.
/// But it does NOT coerce booleans, nil, or functions — those are errors.
fn is_concat_compatible(ty: &Type) -> bool {
matches!(ty, Type::String | Type::Number)
}
}
You might notice an inconsistency: arithmetic returns Number when either operand is Dynamic (because Dynamic.is_compatible_with(&Number) is true), but concat returns Dynamic. Why the difference? For arithmetic, the gradual typing convention is that Dynamic + 1 optimistically returns Number — if the Dynamic turns out to be a string at runtime, that’s a type error, but the type checker assumes the best case. For concat, the coercion possibility makes this optimism less safe: 1 .. "x" works (Number coerces to String), but true .. "x" fails — and we can’t tell at compile time which case we’re in. So concat is more conservative. Both choices are defensible for gradual typing; the key is being consistent within each operator.
Back in the BinOp match, the comparison and logical operators are straightforward:
#![allow(unused)]
fn main() {
BinOp::Lt | BinOp::Gt | BinOp::Le | BinOp::Ge
| BinOp::Eq | BinOp::Ne => Type::Boolean,
// Lua's `and`/`or` return one of their operands (not a boolean).
// `true and 42` returns 42 (Number), `nil or 1` returns 1 (Number).
// When either operand is Dynamic, we return Dynamic — we can't
// know which operand the runtime will choose, so Dynamic is the
// honest answer. The real type would be a union of both operand
// types — Chapter 11 introduces the union type machinery you'd
// need to express this properly.
BinOp::And | BinOp::Or => {
if left_type == Type::Dynamic || right_type == Type::Dynamic {
Type::Dynamic
} else {
left_type
}
}
}
}
Expression::UnaryOp { op, ref expr } => {
let inner = infer_type(db, source, (**expr).clone(), env);
match op {
UnaryOp::Negate => {
if inner.is_compatible_with(&Type::Number) { Type::Number } else { Type::Error }
}
UnaryOp::Not => Type::Boolean,
UnaryOp::Length => {
if inner.is_compatible_with(&Type::String) { Type::Number } else { Type::Error }
}
}
}
Expression::FunctionCall { ref func, ref args } => {
let func_type = infer_type(db, source, (**func).clone(), env.clone());
match func_type {
Type::Function { params, ret } => {
for (i, arg) in args.iter().enumerate() {
let arg_type = infer_type(db, source, arg.clone(), env.clone());
if let Some(param_type) = params.get(i) {
if !arg_type.is_compatible_with(param_type) {
return Type::Error;
}
}
}
*ret
}
Type::Dynamic => Type::Dynamic,
Type::Error => Type::Error,
_ => Type::Error,
}
}
Expression::FieldAccess { .. } | Expression::Index { .. } => Type::Dynamic,
}
}
}
The UnaryOp arms follow the same pattern as binary operators: check compatibility, return the result type or Error. Negate requires a Number, Not always returns Boolean, and Length requires a String and returns Number (Lua’s # operator returns the length of a string).
The FunctionCall arm is where inference and the type environment finally connect to function definitions. When the function has a known Function type, we check each argument against its parameter type using is_compatible_with — a Dynamic argument is always compatible, but a String argument against a Number parameter is an error. When the function is Dynamic, the return type is Dynamic too. When the function is Error, we propagate the error. Any other type being called (calling a Number?) is also an error.
FieldAccess and Index return Dynamic — we don’t model Lua tables yet, so we can’t say anything about what’s inside them.
This is a tracked function that takes an expression and an environment, and returns a type. The environment (a list of name→type bindings) is part of the query key — same expression, different environment = different result.
Recursive calls are fine. infer_type(a + b) calls infer_type(a) and infer_type(b). Salsa memoizes each call independently. If only b changes, infer_type(a) returns its cached value, and only infer_type(b) re-runs.
Step 2: The Type Environment
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct TypeEnv {
pub bindings: Vec<(String, Type)>,
}
}
The salsa::Update derive is not optional decoration. It lets Salsa compare old and new values of this type for fine-grained invalidation. Every type that Salsa needs to compare for change detection — whether as an input, an output, or an intermediate value — needs this derive. When a tracked function takes or returns a TypeEnv, Salsa uses the Update impl to check whether the value actually changed. If it didn’t, downstream queries don’t re-run.
The environment maps variable names to types. When we see local x = 42, we add ("x", Type::Number) to the environment. When we look up x later, we find it’s a number. TypeEnv::lookup does a linear scan of bindings and returns the matching type, or Type::Dynamic if the name isn’t found — the same pattern as the lookup_name function in Chapter 3, but for types instead of definitions.
Why is TypeEnv part of the query key? Because the same variable name in different scopes can have different types. infer_type(db, source, expr_a, env_1) and infer_type(db, source, expr_a, env_2) are different queries with potentially different results.
This is correct but has a cost: if the environment changes (because you changed a variable’s type), many queries re-run. In a production system, you’d use tracked structs for the environment to get finer-grained invalidation.
Step 3: Type-Checking Statements
Type inference handles expressions. Type-checking handles statements — it walks the AST, calls infer_type for each expression, and updates the environment:
Statement::Assignment { local, targets, values }
→ infer the value type
→ extend the environment with target = value_type
Statement::Function { name, params, body }
→ create a new env with params as Dynamic
→ type-check the body in that env
→ add name = Dynamic to the outer env
Statement::If { test, then_block, else_block }
→ infer the test type
→ check both blocks
Statement checking is not a tracked query in our implementation — it’s a regular function called from type_check_program. This means the granularity of our type checker is per-file: changing any statement invalidates the entire file’s type check. Other files’ cached results are preserved (that’s the per-file isolation), but all statements within the changed file are re-checked from scratch — even the ones that didn’t change.
Is this a problem? For a tutorial, no. For a production system, yes — you’d want per-function granularity. That’s where tracked structs (Chapter 4) become essential: each function gets its own tracked struct, and changing one function only invalidates that function’s type check.
Here are the two tracked queries that tie it all together:
#![allow(unused)]
fn main() {
/// Type-check an entire program.
#[salsa::tracked]
pub fn type_check_program(db: &dyn salsa::Database, source: SourceFile) {
let ast = parse(db, source);
let mut env = TypeEnv::new();
for stmt in &ast.statements {
env = check_stmt(db, source, stmt, &env);
}
}
/// Get all inferred types for top-level bindings.
#[salsa::tracked]
pub fn top_level_types(db: &dyn salsa::Database, source: SourceFile) -> Vec<(String, Type)> {
let ast = parse(db, source);
let mut env = TypeEnv::new();
for stmt in &ast.statements {
env = check_stmt(db, source, stmt, &env);
}
env.bindings
}
}
Two things to notice. type_check_program walks all statements and updates the environment, but doesn’t return a value — its job is to trigger infer_type for every expression and let Salsa record the dependencies. If you later query individual expression types, they’ll be cached. top_level_types does the same walk but returns the final environment’s bindings, the top-level name→type pairs. This is the query you’d call from a language server to display hover types.
Yes, they duplicate the same loop. In a production system, type_check_program would exist as the entry point and top_level_types would read its result instead of recomputing it. For the tutorial, keeping them separate makes the dependency chain clearer: one is side-effect-driven (check everything), the other is value-driven (return the bindings).
Step 4: The Full Pipeline
SourceFile (input)
│
▼
parse() → LuaAst (plain data)
│
▼
top_level_types() → check each statement
│ │
│ ▼
│ infer_type() — recursive, memoized
│ │
│ (recursive calls, each cached)
│
▼
Vec<(String, Type)> — the top-level bindings and their types
Step 5: Incrementality in Action
#![allow(unused)]
fn main() {
use salsa::Setter;
use std::path::PathBuf;
let source = SourceFile::new(&db, PathBuf::from("example.lua"), "local x = 42\nlocal y = \"hello\"\n");
let types = top_level_types(&db, source);
// x: Number, y: String
source.set_text(&mut db).to("local a = 1 + 2\n".to_string());
let new_types = top_level_types(&db, source);
// a: Number — re-computed because source changed
}
When we change the source text, parse() re-runs, top_level_types() re-runs, and infer_type() re-runs for each expression. Everything that depended on the old source text is invalidated.
But here’s what doesn’t re-run: queries for other source files. If we have two files and only change one, the other file’s type check returns from cache instantly. This is per-input isolation from Chapter 1, now applied to real work.
Running
cargo run --bin ch05-type-inference
This chapter also has a #[cfg(test)] module — the idiomatic Rust way to write tests. The demo assertions in main() show results immediately, but cargo test is what you’d use in a real project:
cargo test --bin ch05-type-inference
Next: Chapter 6: Diagnostics as Accumulators — How to report errors without poisoning the query graph. This is essential for a real type checker: you want partial results even when there are type errors.
Chapter 6: Diagnostics as Accumulators — Collecting Errors Without Returning Them
Our type checker returns Type from infer_type. When something goes wrong — a type mismatch, an undefined variable — it returns Type::Error. But the caller has no details. Was it “expected Number, got String”? Was it “undefined variable ‘x’”? No way to know, and no way to report it to the user.
You might be tempted to change infer_type to return Result<Type, Diagnostic>. Don’t. That approach poisons the query graph: one error in a dependency forces every caller to handle the Err case, and Salsa’s caching becomes nearly useless (a function that sometimes returns Ok and sometimes Err can’t be cached reliably across revisions).
Salsa’s solution is accumulators: a tracked function “accumulates” diagnostics as a side effect of running, and the caller reads them out later. The function still returns Type — clean, cacheable, deterministic. The diagnostics are stored separately, and they’re tracked by Salsa too.
The Problem: Errors That Poison the Graph
You wrote a type checker. It returns Type. What happens when there’s a type error?
Option 1: Return Result<Type, Error>
If infer_type returns Err, downstream queries can’t use the result. The error “poisons” everything that depends on it. You lose all the type information you did compute. In a gradual type system, partial type info is valuable — you want to keep it.
Option 2: Return Type::Error (sentinel value)
Better. Downstream queries see Error and can decide what to do (propagate it, ignore it, etc.). But you lose the error message. You know something went wrong, but not what. “Cannot add Number and String” is gone — all you have is Type::Error.
Option 3: Accumulators (the Salsa way)
The query returns a Type (possibly Type::Error), and emits a diagnostic as a side effect. The diagnostic doesn’t affect the return value or the query graph. It’s collected after the query runs. You get both the type information AND the error message.
Step 1: Define the Diagnostic Types
First, we need a way to classify diagnostics. Most type errors are hard errors, but some are warnings (like unused variables). A simple enum does it:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
Error,
Warning,
}
}
Now the accumulator itself:
#![allow(unused)]
fn main() {
use salsa::Accumulator;
#[salsa::accumulator]
#[derive(Debug, Clone)]
pub struct Diagnostic {
pub severity: Severity,
pub message: String,
}
}
An accumulator is a type that can be “pushed to” from inside tracked functions. It’s a side channel — separate from the return value.
The Diagnostic struct has a convenience constructor for the common case:
#![allow(unused)]
fn main() {
impl Diagnostic {
fn error(msg: impl Into<String>) -> Self {
Diagnostic { severity: Severity::Error, message: msg.into() }
}
fn warning(msg: impl Into<String>) -> Self {
Diagnostic { severity: Severity::Warning, message: msg.into() }
}
fn emit(self, db: &dyn salsa::Database) {
self.accumulate(db);
}
}
}
The emit method is a thin wrapper around accumulate, which comes from the salsa::Accumulator trait. You could call .accumulate(db) directly — emit is a readability shortcut.
Step 2: Emit Diagnostics from Inside Queries
#![allow(unused)]
fn main() {
use salsa::Accumulator;
#[salsa::tracked]
pub fn infer_type(db: &dyn salsa::Database, source: SourceFile, expr: Expression, env: TypeEnv) -> Type {
match expr {
Expression::BinaryOp { ref left, op, ref right } => {
let lt = infer_type(db, source, (**left).clone(), env.clone());
let rt = infer_type(db, source, (**right).clone(), env.clone());
match op {
BinOp::Add => {
if !lt.is_compatible_with(&Type::Number) {
Diagnostic::error(format!("cannot use {:?} in arithmetic", lt))
.emit(db); // side effect!
}
if !rt.is_compatible_with(&Type::Number) {
Diagnostic::error(format!("cannot use {:?} in arithmetic", rt))
.emit(db);
}
if lt == Type::Error || rt == Type::Error { Type::Error } else { Type::Number }
}
// ...
}
}
// ...
}
}
}
When we find a type error, we do two things:
- Return
Type::Error— tells downstream queries “something went wrong” - Emit a
Diagnostic— tells the user what went wrong
These are separate concerns. The return value is for the query graph. The accumulator is for the human.
Step 3: Collect Accumulated Diagnostics
The file-level query is now called type_check — a shorter name now that it does more than type-check alone. In Chapter 5, we called this type_check_program. Same function, shorter name.
#![allow(unused)]
fn main() {
type_check(&db, source);
let diags: Vec<_> = type_check::accumulated::<Diagnostic>(&db, source)
.into_iter()
.cloned()
.collect();
}
After running type_check, you call type_check::accumulated::<Diagnostic> to get all diagnostics that were emitted during that query (and any queries it called). This is the read side of the accumulator.
Step 4: Incremental Accumulators
When a query re-runs, its old accumulated diagnostics are discarded and replaced. No stale diagnostics. If you fix a type error and the query re-runs successfully, the diagnostic from the previous revision disappears automatically.
#![allow(unused)]
fn main() {
use salsa::Setter;
use salsa::Accumulator;
// Bad program → diagnostics
bad.set_text(&mut db).to("local x = 42\nlocal y = \"hello\"\nlocal z = x + y\n");
type_check(&db, bad);
let bad_diags = type_check::accumulated::<Diagnostic>(&db, bad);
assert!(!bad_diags.is_empty());
// Fix it → no diagnostics
bad.set_text(&mut db).to("local x = 42\nlocal y = 1\nlocal z = x + y\n");
type_check(&db, bad);
let fixed_diags = type_check::accumulated::<Diagnostic>(&db, bad);
assert!(fixed_diags.is_empty());
}
The Pattern
┌──────────────────────────────────────────────┐
│ TRACKED FUNCTION │
│ │
│ input ──→ computation ──→ return value │
│ │ │
│ └──→ accumulate(diagnostics)│
└──────────────────────────────────────────────┘
Return value → used by downstream queries (the graph)
Accumulator → used by the human (the report)
These are intentionally separate. The graph shouldn’t break because of a diagnostic. The diagnostic shouldn’t be suppressed because the graph needs a valid value.
Why Not Result<Type, Diagnostic>?
Because Result is all-or-nothing: you get the type OR the error, never both. In gradual typing, partial information is valuable. If x is Number and y is String and z = x + y is an error, you still want to know that x is Number and y is String. With accumulators, you do.
Running
cargo run --bin ch06-diagnostics
A Note on the <method-call> Diagnostic
If you write a:b() in Lua and run it through the type checker, you’ll see a diagnostic like “unknown variable <method-call>.” That’s not a typo in your code — it’s a placeholder. Our AST doesn’t model method calls (the a:b() syntax), so Chapter 2’s parser replaces them with FieldAccess { object: a, field: "<method-call>" }. When the type checker encounters this field name, it reports “unknown variable” because <method-call> isn’t a real field. The diagnostic is telling you that method syntax isn’t supported, not that you have a typo. A production compiler would add a proper MethodCall variant to the AST; our placeholder keeps things compiling and makes the simplification visible.
Next: Chapter 7: The Language Server — Wire everything into a language server simulation and watch Salsa skip work on edits. This is the payoff for everything you’ve learned.
Chapter 7: The Language Server — Watching Type Errors Appear as You Type
You’ve built the pieces individually. Now the question is: do they work together? This chapter wires everything — inputs, tracked functions, interned symbols, tracked structs, type inference, diagnostics — into a language server loop and watches what happens when you edit a file.
A language server does the same thing on every keystroke: parse, type check, report diagnostics. The hard part isn’t the logic — it’s the speed. You need sub-100ms responses, which means you can’t re-type-check the entire project each time. Salsa’s incrementality makes this possible: change a comment and the type checker doesn’t re-run at all. Change a function body and only queries that depended on that function re-run. Let’s see it in action.
The Language Server Loop
Here’s what that loop looks like in detail:
- User opens a file → parse + type check + report diagnostics
- User types a character → update the source → re-parse + re-type-check → report diagnostics
- User saves → same as step 2, but the source is now on disk
The key constraint: step 2 has to be fast. Sub-100ms. The user is typing, and they expect the red squiggles to appear (or disappear) instantly. You can’t re-type-check the entire project on every keystroke.
This is exactly the problem Salsa solves. Let’s see how — and this time, we’ll measure it.
The LanguageServer Struct
#![allow(unused)]
fn main() {
use salsa::Setter;
use salsa::Accumulator;
use std::path::PathBuf;
struct LanguageServer {
db: Database, // owned — set_text needs &mut, but SourceFile IDs
// remain valid after mutation (Salsa guarantees this)
files: Vec<SourceFile>,
}
impl LanguageServer {
fn open_file(&mut self, path: &str, text: &str) -> usize {
let file = SourceFile::new(&self.db, PathBuf::from(path), text.to_string());
let idx = self.files.len();
self.files.push(file);
type_check(&self.db, file); // initial check
idx
}
fn edit_file(&mut self, idx: usize, new_text: &str) {
self.files[idx].set_text(&mut self.db).to(new_text.to_string());
type_check(&self.db, self.files[idx]); // incremental check
}
fn diagnostics(&self, idx: usize) -> Vec<Diagnostic> {
type_check::accumulated::<Diagnostic>(&self.db, self.files[idx])
.into_iter().cloned().collect()
}
}
}
Three operations: open, edit, get diagnostics. On each edit, we set the new source text and re-type-check. Salsa handles the rest — it only re-runs queries whose inputs actually changed.
The Edit Cycle
User types "x + y" where "x" is Number and "y" is String:
1. set_text() → new revision
2. type_check() → re-runs (source changed)
3. parse() → re-runs (source changed)
4. infer_type() → re-runs for x + y
5. infer_type(x) → cached (x didn't change)
6. infer_type(y) → cached (y didn't change)
7. Diagnostic::error("cannot add Number and String") → accumulated
8. diagnostics() → [Error: "cannot add Number and String"]
User fixes to "x + 1":
1. set_text() → new revision
2. type_check() → re-runs
3. parse() → re-runs (source changed)
4. infer_type() → re-runs for x + 1
5. infer_type(x) → cached! (same revision, same env)
6. infer_type(1) → cached! (literal, no dependencies)
7. No diagnostics emitted
8. diagnostics() → [] (old diagnostics automatically discarded)
Steps 5-6 are the magic. Even though the source changed and parse() re-ran, the individual infer_type calls for unchanged sub-expressions return from cache. The type checker only does work for the part that actually changed.
You can see this in the timing output. The first open_file call does full work. The edit_file calls are faster because Salsa skips the cached sub-expressions. With a small file the difference is subtle — with a thousand-line file, it’s the difference between “instant” and “wait for it.”
Running
cargo run --bin ch07-language-server
Per-File Isolation
#![allow(unused)]
fn main() {
let main = server.open_file("main.lua", "local x = 42\n");
let other = server.open_file("other.lua", "local a = 1 + \"hello\"\n");
// other.lua has an error
server.diagnostics(other); // → [Error: cannot add Number and String]
// main.lua is clean — and it stays cached
server.diagnostics(main); // → [] (no re-checking!)
}
When we open a second file with an error, the first file isn’t affected. Its cached type check is still valid. Salsa knows that main.lua’s queries don’t depend on other.lua’s source text, so it returns the cached result instantly.
In a real project with hundreds of files, this is the difference between “responsive on every keystroke” and “freezes for seconds after each edit.”
The Full Architecture
┌──────────────────────────────────────────────────────────────┐
│ LANGUAGE SERVER │
│ (diagnostics, hover, completion) │
└──────────────┬───────────────────────────────────────────────┘
│ accumulated diagnostics / cached types
▼
┌──────────────────────────────────────────────────────────────┐
│ TYPE CHECKER │
│ type_check() → check_stmt() → infer_type() │
│ ↓ ↓ ↓ │
│ [accumulators] [env updates] [recursive memoized calls] │
└──────────────┬───────────────────────────────────────────────┘
│ depends on
▼
┌──────────────────────────────────────────────────────────────┐
│ PARSER │
│ parse() — tracked, cached, uses analisar │
└──────────────┬───────────────────────────────────────────────┘
│ depends on
▼
┌──────────────────────────────────────────────────────────────┐
│ INPUTS │
│ SourceFile { path, text } — set on edit │
└──────────────────────────────────────────────────────────────┘
Incrementality in action:
set_text()creates a new revisionparse()re-runs (source changed)type_check()re-runs (parse result changed)infer_type()re-runs only for affected expressions- Diagnostics are re-accumulated
- Other files are NOT re-checked
This is how rust-analyzer stays fast. And now you know how.
Seeing the Speedup
The demo prints timing information to stderr. Here’s what you’ll see:
1. User opens main.lua
[timing] open "main.lua": 1.23ms
2. User adds a type error: local w = x + "hello"
[timing] edit: 0.18ms
3. User fixes the error: local w = x + 1
[timing] edit: 0.15ms
4. User opens other.lua with an error
[timing] open "other.lua": 0.95ms
The first check of each file does full work: parse the source, walk every expression, compute every type. Subsequent edits are faster because Salsa re-runs only what changed. The parse re-runs (the source changed), but infer_type for unchanged sub-expressions returns from cache.
These numbers are illustrative — with small demo files, both operations complete in microseconds and the timing difference is negligible. The effect is dramatic with larger codebases: a thousand-line file might take 50ms on first check but under 1ms on edit, because Salsa skips the cached sub-expressions.
The timing data goes to stderr so the main output stays clean. To see it:
cargo run --bin ch07-language-server 2>&1 | grep timing
Or run it without filtering — the timing lines print alongside the normal output.
What You Built
Over seven chapters, you built a gradual type checker for Lua powered by Salsa:
- Ch1 — Salsa basics: inputs, tracked functions, revisions
- Ch2 — Parsing Lua: wiring analisar into Salsa
- Ch3 — Interned symbols: fast name comparison by ID
- Ch4 — Tracked structs: entity identity in Salsa
- Ch5 — Type inference: the core tracked query, recursive and cached
- Ch6 — Accumulators: diagnostics without poisoning the graph
- Ch7 — Language server: the full pipeline, incremental on edits
Next Steps
Chapter 8: Cycle Detection — Recursive type queries can loop forever. Salsa detects cycles and reports them — but the implementation has subtlety. We’ll see how cycle detection works, why it matters for type checking, and how to handle it in your own queries.
This tutorial covers the fundamentals. To go further:
Build a real LSP using tower-lsp — the simulation here shows the loop, but a real server needs JSON-RPC, text document sync, etc.
Add type annotations — Teal-style local x: number = 42. Parse annotations and use them as hints in inference. (Covered in Chapter 10.)
Pursue fine-grained incrementality — use tracked structs for every AST node, not only function definitions. This is what rust-analyzer does.
Build go-to-definition, completion, hover — these are queries too. lookup_name from Chapter 3 is the start of go-to-definition.
Handle cross-file references — our type checker is per-file. A real one needs to handle require() and resolve names across files. (Covered in Chapter 9.) Tracked structs make this tractable: each file’s definitions are tracked, and cross-file lookups are cached.
Next: Chapter 8: Cycle Detection — Our type checker assumes queries form a clean dependency graph: parse feeds into infer_type, data flows one direction. But what happens when f calls g and g calls f? Without protection, Salsa loops forever. We’ll see how to catch cycles and keep the language server alive.
Chapter 8: Cycle Detection — What Happens When Queries Form a Loop
Two Lua functions. One calls the other. The other calls the first.
Let’s find out.
The Problem
Imagine two Lua functions that reference each other:
local function f(x) return g(x + 1) end
local function g(x) return f(x - 1) end
If infer_type(f) calls infer_type(g) which calls infer_type(f) — that’s a cycle. Without protection, this loop runs forever. The stack overflows. Your language server hangs.
Salsa catches this. Let’s see how.
A Minimal Cycle
#![allow(unused)]
fn main() {
// A tracked function that calls itself — a direct cycle.
// (SourceFile is a Salsa input struct defined in Ch1.)
#[salsa::tracked]
fn direct_cycle(db: &dyn salsa::Database, source: SourceFile) -> i32 {
direct_cycle(db, source)
}
}
When you call direct_cycle(db, file), Salsa notices that direct_cycle is already on the call stack. Instead of re-entering the function, it panics with a cycle error.
This is deliberate. Salsa’s contract is: tracked functions are pure functions of their inputs. A cycle violates that contract — the result depends on itself. There’s no correct answer, so Salsa stops.
You might wonder why Salsa doesn’t return a Result type instead of panicking. The reason is the function signature: #[salsa::tracked] functions return Type, not Result<Type, CycleError>. Changing the return type would force every caller to handle the error, even in the common (acyclic) case. Panicking is Salsa’s way of saying “this shouldn’t happen in normal code” — and the recovery mechanisms below are how you opt in to handling it when it does.
What the Panic Looks Like
thread 'main' panicked at 'dependency graph cycle when querying \
infer_type(Id { value: 42 }), set cycle_fn/cycle_initial to fixpoint iterate.
Query stack:
[infer_type(Id { value: 42 }),
infer_type(Id { value: 17 })]',
.../salsa-0.26.2/src/function/fetch.rs:178:5
The panic message tells you which query cycled and shows the query stack — the chain of calls that led to the cycle. The message also suggests using cycle_fn or cycle_initial (the fixpoint iteration approach) if you want to handle cycles instead of panicking. For a type checker, cycle_result is usually simpler — it returns a fallback value immediately, no iteration needed.
Handling Cycles Gracefully
Panicking is fine for a tutorial. For a real language server, it’s not — you don’t want the entire process to crash because one file has mutually recursive functions.
Salsa provides two ways to handle cycles without panicking:
1. catch_unwind and fallible wrappers
The simplest way to handle cycles without crashing is to wrap the query call in std::panic::catch_unwind. Salsa panics on cycles, and catch_unwind lets you catch that panic and recover:
#![allow(unused)]
fn main() {
use std::panic::{catch_unwind, AssertUnwindSafe};
fn safe_type_check(db: &Database, source: SourceFile) -> Vec<Diagnostic> {
// AssertUnwindSafe tells the compiler "I accept that this closure
// may observe inconsistent state after a panic." It's needed because
// catch_unwind requires the closure to be UnwindSafe, and closures
// that capture &Database aren't automatically UnwindSafe — the
// compiler can't prove the database is still consistent after a
// panic. In practice, Salsa's cycle panics don't corrupt state
// (they abort the query before it writes anything), so this is safe.
let result = catch_unwind(AssertUnwindSafe(|| {
type_check(db, source)
}));
match result {
Ok(diagnostics) => diagnostics,
Err(panic_payload) => {
// Salsa cycles panic with a string message like
// "dependency graph cycle when querying infer_type(...)"
if let Some(msg) = panic_payload.downcast_ref::<&str>() {
vec![Diagnostic::error(format!("cycle: {msg}"))]
} else if let Some(msg) = panic_payload.downcast_ref::<String>() {
vec![Diagnostic::error(format!("cycle: {msg}"))]
} else {
vec![Diagnostic::error("unknown cycle".to_string())]
}
}
}
}
}
Now instead of crashing, a cycle produces a diagnostic: error: dependency graph cycle when querying infer_type(...). The language server stays alive. The user sees the error in their editor and can fix the mutual recursion.
This approach — wrap the top-level query and convert the panic into a user-visible error — works reliably across Salsa versions. It’s not elegant, but it handles cycles without restructuring your queries. You might wonder why you’d use catch_unwind instead of the built-in cycle_result approach below. The tradeoff: catch_unwind is universal (wrap any query) but blunt (you get a string error message, not a typed fallback). cycle_result is precise (each query defines its own typed fallback) but per-query (you opt in for each tracked function individually). For a small project, catch_unwind is quick to add. For a production type checker, cycle_result is the better long-term choice.
If your project uses
panic = "abort"(common in release builds for performance and binary size),catch_unwindwon’t work — there’s nothing to catch. Use the cycle recovery function approach below instead.
2. Cycle recovery with cycle_result
Salsa also provides a built-in way to handle cycles without panicking. When a cycle is detected, instead of panicking, Salsa calls your recovery function and uses its return value to break the cycle.
The idea: a tracked function can specify what to return if it’s part of a cycle. For a type checker, the sensible fallback is Type::Error — “I couldn’t infer this type because of a cycle.”
In Salsa 0.26, the attribute is cycle_result. It takes the name of a function that Salsa calls when the tracked function is part of a cycle:
#![allow(unused)]
fn main() {
#[salsa::tracked(cycle_result = infer_type_fallback)]
fn infer_type(db: &dyn Db, def: Definition) -> Type {
match def.kind {
DefKind::Function => {
// Walk the function body, inferring types for each expression.
// If the body calls another function, we look up its Definition
// and call infer_type recursively. That recursive call is
// where cycles happen.
//
// (Ch5's infer_type works on expressions. A definition-level
// type checker would walk the function's body here, calling
// infer_type for each referenced Definition. The details
// aren't important — what matters is that infer_type can
// call itself on a different Definition, and that's the
// cycle we need to recover from.)
todo!("walk function body, inferring types")
}
DefKind::Variable => {
// Infer the type of the variable's initializer expression.
todo!("infer variable type")
}
DefKind::Parameter => {
// Parameters get their types from the function signature
// or from explicit annotations (Ch10).
Type::Dynamic
}
}
}
/// Fallback function: called when infer_type is part of a cycle.
///
/// Instead of panicking, we return Type::Error. Salsa propagates
/// this back through the cycle — every query that was waiting on
/// infer_type gets Error, and computation continues.
///
/// The `id` parameter identifies which query instance hit the cycle.
/// The `def` parameter is the tracked function's other input —
/// in this case, the Definition being inferred. The fallback
/// signature is (db, id, ...other_inputs), matching the tracked
/// function's parameters after db.
///
/// Note: the fallback receives `&dyn salsa::Database`, not `&dyn Db`.
/// The tracked function uses your custom `Db` trait, but the fallback
/// runs outside the normal query context — Salsa doesn't give it
/// access to your custom database methods. This restriction is
/// deliberate: the fallback runs during cycle recovery, where the
/// interrupted query's intermediate state is inconsistent. If the
/// fallback could call other tracked functions, it might trigger more
/// cycles or read stale data. Restricting it to the base `Database`
/// trait prevents that. Since the fallback's job is to return a fixed
/// value (Type::Error), it doesn't need custom methods anyway.
fn infer_type_fallback(
_db: &dyn salsa::Database,
_id: salsa::Id,
_def: Definition,
) -> Type {
Type::Error
}
}
When infer_type hits a cycle — say f calls g and g calls f — Salsa notices that infer_type for f’s Definition is already on the stack. Instead of panicking, it calls infer_type_fallback, which returns Type::Error. That Error propagates: infer_type for g’s Definition gets Error for f’s type, so it also returns Error. The cycle is broken, and the rest of the type checker continues.
-- What the user wrote:
local function f(x) return g(x + 1) end
local function g(x) return f(x - 1) end
-- What happens internally:
-- infer_type(Definition for f) -- simplified: the tracked struct for f
-- → calls infer_type(Definition for g)
-- → calls infer_type(Definition for f) ← CYCLE!
-- → infer_type_fallback returns Type::Error
-- → f's return type is Error
-- → g's return type is Error
--
-- Both functions get Type::Error. The type checker continues.
-- A diagnostic is reported: "cannot infer type of `f`: cyclic dependency"
--
-- Note: "Definition for f" is pseudocode — Definition is a Salsa tracked
-- struct, not an enum variant. You can't write Definition::f in Rust.
-- The real code would use Definition::new(db, f_name, ...) to create the
-- instance, then pass that instance to infer_type.
The cycle_result approach uses the FallbackImmediate strategy — when a cycle is detected, Salsa immediately returns the fallback value. This is the right choice for a type checker: there’s no “correct” type for a cyclic function, so returning Type::Error immediately is the sensible behavior.
Salsa also supports fixpoint iteration via the cycle_fn and cycle_initial attributes. This is useful for dataflow-style analysis where you want to iterate until the values converge. A type checker doesn’t need this — Type::Error is the only sensible answer for a cycle — but if you’re building something like a constant propagation analysis, check the Salsa docs for the cycle_fn API.
The cycle recovery API has changed across Salsa versions. The concept is stable — define a fallback, return it on cycles, keep running — but the attribute name and function signature may differ. Check the Salsa docs for your version.
Why Cycles Happen in Type Checkers
Cycles show up in real type checkers more often than you’d expect:
| Pattern | Why It Cycles |
|---|---|
| Mutual recursion | f calls g, g calls f — infer_type loops |
| Self-referential types | type T = { field: T } — struct type depends on itself |
| Circular imports | File A requires B, B requires A — parse or type_check loops |
| Recursive aliases | type Alias = Alias — name resolution loops |
Our tutorial doesn’t hit these because:
infer_typedoesn’t look up function definitions from a global table (Ch5 checks expressions in isolation)parseis per-file (no cross-file dependencies)- We don’t resolve
require()(Ch9 adds cross-file resolution)
But a real type checker will hit cycles. When you add cross-file resolution, you’ll need cycle handling.
The Architectural Fix: Break the Loop
The best way to handle cycles is to structure your queries so they don’t form loops. This means making the dependency graph explicit:
SourceFile (input)
↓
parse() → LuaAst (tracked function)
↓
top_level_names() → Vec<String> (tracked function)
↓
FuncDef (tracked struct, one per function)
↓
infer_return_type() → Type (tracked function, depends on FuncDef, not SourceFile)
The key: infer_return_type depends on FuncDef (a tracked struct), not on SourceFile (an input). When you edit the source, parse re-runs and produces new FuncDefs. Salsa creates new tracked struct instances for changed functions, and only re-runs infer_return_type for those.
Why does this help with cycles? Because the dependency graph goes one direction: SourceFile → parse → FuncDef → infer_return_type. There’s no arrow going back up. Mutual recursion between functions becomes infer_return_type(FuncDef for f) → infer_return_type(FuncDef for g) → infer_return_type(FuncDef for f) — a cycle at the entity level, which Salsa can detect and recover from. Without the FuncDef layer, you’d have type_check(SourceFile A) → type_check(SourceFile B) → type_check(SourceFile A) — a cycle at the file level, which invalidates entire files instead of individual functions.
The granularity matters. The FuncDef tracked struct is the entity boundary — the point where “something changed” becomes specific enough to handle without pulling the whole file back into the cycle.
This is the tracked-struct granularity we discussed in Ch4. It’s not only about performance — it’s about making the dependency graph acyclic by design.
What We’re Simplifying
Recovery function signatures. The cycle_result attribute and the exact signature of the fallback function (db, id, ...other_inputs) are specific to Salsa 0.26. The fallback receives the database, the query’s Salsa ID, and any other input parameters the tracked function takes (after the db parameter). Other versions may use different attribute names (like cycle_fn or cycle_initial) and different signatures (some include &salsa::Cycle for diagnostic access). The concept — define a function that returns a fallback value — is stable. The syntax may not match your version exactly. Check the Salsa docs for the current API.
try_ query variants. Some Salsa versions have explored query methods that return Result instead of panicking on cycles (e.g., try_get_{field} on tracked structs). As of Salsa 0.26, these don’t exist in the public API — catch_unwind and cycle_result are the two available approaches. If a future version adds try_ variants, they’d offer fine-grained cycle handling without wrapping every call in catch_unwind.
Cycle interaction with incremental recomputation. When Salsa invalidates a query that was part of a cycle, the recovery function runs again on the next access. The details — does it re-run the whole cycle, or only the recovered query? — depend on the version. For production use, test your recovery behavior after edits, not only on first run.
For a real implementation, read the Salsa book and the rust-analyzer source — rust-analyzer is the largest real-world Salsa user and handles cycles extensively.
Next: Chapter 9: Cross-File Analysis — We’ve been type-checking one file at a time. Real Lua projects span multiple files with require. We’ll extend our type checker to handle cross-file dependencies — resolving types from imported modules, tracking which files depend on which, and invalidating only what changed when a dependency updates.
Chapter 9: Cross-File Type Checking — require() and Module Resolution
Our type checker works great for a single file. But real Lua programs aren’t one file. They’re dozens of files connected by require():
-- utils.lua
local function double(x) return x * 2 end
return double
-- main.lua
local double = require("utils")
local result = double(42)
local sum = result + 1
Right now, our type checker treats require("utils") as Dynamic. It has no idea what utils exports. That means double(42) is also Dynamic — no type checking, no error detection, no help at all.
This chapter changes that. We’ll teach the type checker to resolve require() calls, inspect the module’s exports, and use those types when checking the importing file. And here’s the thing: Salsa already gives us cross-file incrementality for free. When you edit utils.lua, only files that depend on it re-check. Everything else stays cached.
The Dependency Graph
Before writing any code, let’s think about what happens when main.lua requires utils.lua:
┌──────────────┐ ┌──────────────┐
│ utils.lua │ │ main.lua │
│ │ │ │
│ exports: │◄─────── │ requires: │
│ double: │ require │ utils │
│ (n) → n │ │ │
└──────────────┘ └──────────────┘
When main.lua calls require("utils"), we need to:
- Find
utils.luaon disk (module resolution) - Type-check it (if we haven’t already)
- Extract its exports (what does it return?)
- Use those exports as the type of
require("utils")
This creates a dependency: main.lua depends on utils.lua. If utils.lua changes (someone edits the double function’s return type), main.lua’s type check is stale and needs re-running.
Sound familiar? This is exactly what Salsa tracks. When type_check(main.lua) reads the exports of utils.lua, Salsa records a dependency. Change utils.lua? Salsa knows main.lua needs re-checking. Don’t change it? Cached result, instant.
Step 1: Module Resolution
First, we need to find modules. Lua’s require() searches a path — a semicolon-separated list of templates like ?.lua and ?/init.lua. For this chapter, we’ll use a simple resolver that looks in a single directory.
#![allow(unused)]
fn main() {
use std::path::PathBuf;
/// Resolve a module name to a file path.
/// In real Lua, this would search package.path.
/// For us: look for "{name}.lua" in the project directory.
fn resolve_module(module_name: &str) -> Option<PathBuf> {
let path = PathBuf::from(format!("{}.lua", module_name));
if path.exists() {
Some(path)
} else {
None
}
}
}
This is intentionally simple. A real resolver would handle package.path, ?/init.lua, and preloaded modules. But the type-checking logic is the same regardless of how you find the file.
From filesystem to database.
resolve_modulechecks the filesystem — it asks the OS whether a.luafile exists. Our type checker works differently: files live as Salsa inputs in the database, not on disk. When we implementresolve_requirein Step 3, we’ll look up the resolved path in aFileRegistryinstead of checking the filesystem. The resolution logic (turn"math"into"math.lua") is the same — the difference is where we look for the file.
Step 2: The Exports Query
The key insight: a module’s exports are a derived query. They depend on the module’s source text. When the source changes, the exports change. When the exports change, anything that imported those exports is stale.
#![allow(unused)]
fn main() {
/// What does a source file export?
/// We type-check the file and find what the top-level `return` statement returns.
/// Returns the Type of the return expression, or Dynamic if there's no return.
#[salsa::tracked]
pub fn module_exports(db: &dyn salsa::Database, source: SourceFile) -> Type {
let ast = parse(db, source);
let mut env = TypeEnv::new();
let mut return_type = None;
for stmt in &ast.statements {
let result = check_stmt(db, source, stmt, &env);
if result.return_type.is_some() && return_type.is_none() {
return_type = result.return_type;
}
env = result.env;
}
return_type.unwrap_or(Type::Dynamic)
}
}
For now, a module exports a single type — the type of its return value. In a real type checker, you’d export a map of names to types (for M.double, M.triple, etc.). But the single-type approach captures the essential pattern: exports are derived, tracked, and automatically invalidated.
How We Infer Function Types
In previous chapters, function definitions were typed as Dynamic. That won’t work for cross-file checking — if require("math") returns Dynamic, we can’t detect type errors when the module changes.
The fix: when checking a function declaration, we infer its return type from the body. check_stmt for Statement::Function now:
- Checks the function body (which emits diagnostics for the body’s statements)
- Collects the return type from the body’s
returnstatement - Creates a
Type::Function { params, ret }with the inferred return type
#![allow(unused)]
fn main() {
Statement::Function { name, params, body, .. } => {
let mut fe = env.clone();
let param_types: Vec<Type> = params.iter().map(|_| Type::Dynamic).collect();
for p in params {
fe = fe.extend(p.clone(), Type::Dynamic);
}
let body_result = check_block(db, source, body, &fe);
let ret_type = body_result.return_type.unwrap_or(Type::Dynamic);
let func_type = Type::Function {
params: param_types,
ret: Box::new(ret_type),
};
StmtResult { env: env.extend(name.clone(), func_type), return_type: None }
}
}
To make this work without calling infer_type a second time (which would produce duplicate diagnostics), check_stmt and check_block now return both the updated environment and the inferred return type from any return statement they encounter. This is a small structural change — check_stmt returns a StmtResult struct instead of a TypeEnv — but it lets us propagate return types naturally through the checking process.
For the double function in math.lua, the return expression x * 2 is inferred as Number (since x is Dynamic — compatible with Number — and 2 is Number). So double gets the type Function { params: [Dynamic], ret: Number }. When the module is edited to return a string instead, double’s return type becomes String, and any caller that uses the result as a number gets a type error.
What About Type::Table?
You’ll notice a Type::Table { fields: Vec<(String, Type)> } variant in the code. This supports field-level type lookups — for example, when a module returns a table with named exports like M.double, M.triple, etc. The field_type method looks up a field by name in a table type.
Currently, no code path produces a Type::Table value. Our modules return single values (functions, numbers, strings), not tables with named fields. A complete type checker would construct Type::Table values when analyzing table constructors (local M = { double = ... }), and then require("M") would return a Table type with named fields. The Table variant and field_type method are infrastructure for that extension — the pattern is the same, with richer data.
Step 3: Resolving require() in Type Inference
Now we modify infer_type to handle require() calls. When the type checker sees require("utils"), it:
- Resolves
"utils"to a file path - Gets or creates a
SourceFilefor that path - Queries
module_exportsfor that file - Returns the export type
#![allow(unused)]
fn main() {
Expression::FunctionCall { ref func, ref args } => {
// Check if this is a require() call
if let Expression::Name(name) = &**func {
if name == "require" {
if let Some(Expression::StringLiteral(module_name)) = args.first() {
return resolve_require(db, source, module_name);
}
}
}
// Normal function call (same as before)
let ft = infer_type(db, source, (**func).clone(), env.clone());
// ... rest unchanged
}
}
The resolve_require function does the work:
#![allow(unused)]
fn main() {
use salsa::Accumulator;
use std::path::{Path, PathBuf};
fn resolve_require(db: &dyn salsa::Database, _caller: SourceFile, module_name: &str) -> Type {
// analisar's LiteralString includes the source-level quote characters,
// e.g. `"math"` (with quotes), so we strip them before lookup.
let module_name = module_name.trim_matches('"');
// Turn "math" into "math.lua" — same resolution logic from Step 1.
// We don't call resolve_module() here because that checks the
// filesystem, and our files live in the Salsa database. Instead,
// we use resolve_module's path format and look up in the registry.
let path = PathBuf::from(format!("{}.lua", module_name));
let module_file = match find_source_file(db, &path) {
Some(f) => f,
None => {
Diagnostic::error(format!("cannot find module {:?}", module_name))
.emit(db);
return Type::Error;
}
};
// Query the module's exports. This is the magic line:
// - If the module hasn't been type-checked yet, Salsa runs the query now.
// - If it HAS been checked, Salsa returns the cached result.
// - Salsa records: "the caller depends on module_file's exports."
// - Later, if module_file changes, the caller's type check is invalidated.
module_exports(db, module_file)
}
}
Read through resolve_require carefully. The magic is in that one line: module_exports(db, module_file). That’s it. Salsa handles the rest.
First time? Salsa runs the query, caches the result. Already cached? Salsa returns the cached result instantly. Dependency tracked? Yes — Salsa knows the caller read module_file’s exports. Module changed? Salsa invalidates the caller’s type check automatically.
No manual dependency tracking. No invalidation logic. No “mark file dirty” calls. Salsa does it all.
Step 4: The File Registry
There’s a practical problem: resolve_require needs to find the SourceFile for a given path. But SourceFile is a Salsa input — it lives in the database. We need a way to look it up by path.
We use a FileRegistry input that stores the mapping:
#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};
use std::sync::OnceLock;
#[salsa::input]
pub struct FileRegistry {
#[returns(ref)]
pub files: Vec<(PathBuf, SourceFile)>,
}
static REGISTRY: OnceLock<FileRegistry> = OnceLock::new();
fn find_source_file(db: &dyn salsa::Database, path: &Path) -> Option<SourceFile> {
let registry = REGISTRY.get()?;
let data = registry.files(db);
data.iter()
.find(|(p, _)| p == path)
.map(|(_, f)| *f)
}
}
The registry is stored in a OnceLock — set once during setup, never modified. This means Salsa’s purity contract is maintained in practice: the registry doesn’t change between revisions, so queries that read from it produce deterministic results.
⚠ A production language server would store the registry differently. The right approach is to add the FileRegistry as a field on the Database struct and access it through a custom trait that extends salsa::Database:
#![allow(unused)]
fn main() {
// Production pattern (not shown in this demo):
trait DbExt: salsa::Database {
fn registry(&self) -> &FileRegistry;
}
}
This ensures Salsa knows about the dependency and can invalidate queries if the registry ever changes. For this tutorial, the OnceLock approach is simpler and correct because the registry is write-once. But if you’re building a real language server where files can be added or removed during a session, use the database-field approach.
Step 5: Putting It Together
Let’s see the full system in action:
use salsa::Setter;
use salsa::Accumulator;
use std::path::{Path, PathBuf};
fn main() {
let mut db = Database::default();
let math_module = SourceFile::new(&db,
PathBuf::from("math.lua"),
"local function double(x) return x * 2 end\nreturn double\n".to_string(),
);
let main_file = SourceFile::new(&db,
PathBuf::from("main.lua"),
"local double = require(\"math\")\nlocal result = double(42)\nlocal sum = result + 1\n".to_string(),
);
let registry = FileRegistry::new(&db, vec![
(PathBuf::from("math.lua"), math_module),
(PathBuf::from("main.lua"), main_file),
]);
if REGISTRY.set(registry).is_err() {
panic!("registry already set");
}
// Type-check main.lua — this will also type-check math.lua
// because of the require() dependency
type_check(&db, main_file);
let diags = type_check::accumulated::<Diagnostic>(&db, main_file);
println!("main.lua: {} diagnostics", diags.len());
// Edit math.lua — change double to return a string
math_module.set_text(&mut db).to(
"local function double(x) return \"not a number\" end\nreturn double\n".to_string(),
);
// Re-check main.lua — Salsa knows it depends on math.lua
type_check(&db, main_file);
let new_diags = type_check::accumulated::<Diagnostic>(&db, main_file);
println!("After editing math.lua:");
println!("main.lua: {} diagnostics", new_diags.len());
// Output: main.lua: 1 diagnostics — "cannot use String in arithmetic"
}
The output:
main.lua: 0 diagnostics
After editing math.lua:
main.lua: 1 diagnostics — cannot use String in arithmetic
Before the edit, double returns Number, so result + 1 is fine. After the edit, double returns String, so result + 1 is a type error. Salsa detected the cross-file dependency automatically — we didn’t have to tell it that main.lua depends on math.lua.
The Architecture: Before and After
BEFORE (per-file only):
┌──────────┐ ┌──────────┐
│ utils.lua│ │ main.lua │
│ type_check│ │ type_check│
│ (isolated)│ │ (isolated)│
└──────────┘ └──────────┘
No connection. require() = Dynamic.
AFTER (cross-file):
┌──────────┐ module_exports ┌──────────┐
│ utils.lua│ ────────────────► │ main.lua │
│ type_check│ │ type_check│
└──────────┘ └──────────┘
Salsa tracks: Re-checks when
"main.lua read utils.lua's
utils.lua's exports" exports change
The key change isn’t the code — it’s the dependency. Salsa now knows that main.lua’s type check depends on utils.lua’s exports. Edit utils.lua, and main.lua is automatically re-checked. Don’t edit it, and the cached result is instant.
What About Cycles?
Lua allows circular requires (a.lua requires b.lua, which requires a.lua). Our type checker would loop forever — module_exports(a) calls module_exports(b) which calls module_exports(a) which…
Salsa handles this. When it detects a cycle, it panics (or recovers, if you’ve set up cycle recovery — see Chapter 8 on cycle detection). For a production type checker, you’d want cycle recovery that returns Dynamic for the cyclic dependency.
For this tutorial, we’ll treat circular requires as an error and move on. Chapter 8 covers the recovery pattern.
What We’re Simplifying
A real Lua require() system has nuances we’re skipping:
package.path and package.searchers: Lua’s module search is configurable. We use a single directory.
Module caching: Lua’s require() caches loaded modules. We rely on Salsa’s query cache instead.
package.loaded: You can pre-populate the module cache. We don’t model this.
Re-exports: A module might return another module’s result. Our type checker handles this naturally (it’s another return statement).
Field-level exports: We return a single Type for the module. A real type checker would return a HashMap<String, Type> mapping exported names to their types. This is a natural extension — the Type::Table variant and field_type method are already in place for it.
Function parameter types: Function parameters are typed as Dynamic. A complete type checker would support type annotations on parameters (function double(x: number): number).
Multiple return paths: We take the return type from the first return statement we encounter. A real type checker would merge types from all return paths (e.g., both branches of an if/else).
These simplifications keep the chapter focused on the core lesson: cross-file dependencies in Salsa are automatic. The dependency graph is implicit. You read a query; Salsa tracks it. You change an input; Salsa invalidates the right things. The code doesn’t change much — the architecture does.
Running
cargo run --bin ch09-cross-file
The tutorial now covers the full journey: from a single-file type checker to one that understands module dependencies across files. The Salsa patterns you’ve learned — inputs, tracked functions, interned symbols, tracked structs, accumulators, and cross-file queries — are the same patterns that power rust-analyzer.
Next: Chapter 10: Type Annotations — Our type checker treats every unannotated value as Dynamic. That’s the gradual typing guarantee — no annotations, no errors. But when the programmer does annotate, we should listen. We’ll parse LuaCATS annotations (---@param, ---@type, ---@return) and use them to override inferred types.
Chapter 10: Type Annotations — Teaching the Type Checker What You Know
Our type checker is pretty good at figuring things out on its own. 42 is a Number. "hello" is a String. x * 2 returns Number because * only works on numbers. But it has a blind spot: it doesn’t know what the programmer intended.
Consider:
local function greet(name)
return "Hello, " .. name
end
greet(42) -- passes without complaint
Our type checker infers that greet returns String (from the concatenation). Good. But it types name as Dynamic — it has no idea the programmer meant name to be a String. That means greet(42) passes without complaint, even though passing a number to a function that concatenates it is almost certainly a bug.
Type annotations fix this. They let the programmer say “I know something the type checker doesn’t” — and the type checker uses that knowledge to catch more mistakes.
The LuaCATS Convention
Lua doesn’t have type syntax built into the language. If you write local x: string = "hello", that’s a syntax error. But the Lua ecosystem has converged on a convention: type annotations in comments, using the LuaCATS format.
LuaCATS is used by Lua Language Server, Teal, and other tooling. It looks like this:
---@type string
local name = "hello"
---@param x number
---@param y number
---@return number
local function add(x, y)
return x + y
end
Three dashes. An @ directive. A type name. That’s it.
These are comments — Lua ignores them entirely. But a type checker can parse them and use them as input to type checking. The Lua runtime never sees them. They’re purely for tooling.
Annotations as Salsa Inputs
Here’s the key architectural insight: annotations are derived from source text, but they play the same role as inputs for downstream queries. The programmer writes them. They don’t change unless the programmer edits the file. Conceptually, they’re inputs to type checking — the programmer wrote them, they’re authoritative, and they invalidate downstream results when they change.
We don’t model annotations as a separate #[salsa::input]. Instead, they’re extracted from the source text by a tracked function. This is the right choice because annotations are derived from the source — they’re a parsed view of what’s already in the input. Making them a separate input would mean keeping them in sync with the source text manually. By extracting them via a tracked function, Salsa handles invalidation automatically: change the source → annotations re-extract → type checking re-runs.
The subtle point: annotations are semantically inputs (the programmer wrote them, they’re authoritative) but architecturally derived (extracted from source via a tracked function). That’s not a contradiction — it’s a design win. Salsa’s derived queries give us the invalidation tracking for free.
┌─────────────────┐ ┌──────────────────┐ ┌────────────┐
│ SourceFile │ │ extract_ │ │ type_check │
│ (input) │─────►│ annotations │─────►│ (tracked) │
│ │ │ (tracked) │ │ │
│ text contains │ │ parses ---@ │ │ uses both │
│ both code and │ │ comments into │ │ inferred & │
│ ---@ comments │ │ Annotation │ │ annotated │
└─────────────────┘ │ values │ │ types │
└──────────────────┘ └────────────┘
The Annotation Enum
Before we parse anything, we need a type to represent the parsed result:
#![allow(unused)]
fn main() {
/// A parsed type annotation extracted from a LuaCATS comment.
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub enum Annotation {
/// ---@type <type>
/// Binds a variable to a specific type.
/// `name` is empty for position-based matching (annotation on the
/// line before the variable). A real implementation would use
/// line numbers instead of this placeholder.
Type { name: String, ty: Type },
/// ---@param <name> <type>
/// Annotates a function parameter.
/// `func_name` scopes this annotation to the function it belongs to.
Param { func_name: String, name: String, ty: Type },
/// ---@return <type>
/// Annotates a function's return type.
/// `func_name` scopes this annotation to the function it belongs to.
Return { func_name: String, ty: Type },
}
}
Three variants, one per LuaCATS directive. Param and Return both carry func_name so we know which function they belong to — that’s how we avoid mixing up ---@param x number above function add() with ---@param x string above function greet().
Parsing Annotations
The parser is straightforward — scan each line for ---@ prefixes:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn extract_annotations(db: &dyn salsa::Database, source: SourceFile) -> Vec<Annotation> {
let text = source.text(db);
let mut annotations = Vec::new();
// We track the most recent ---@param annotations so we can associate
// them with the function that follows. LuaCATS convention: params
// and return annotations appear directly above the function.
let mut pending_params: Vec<(String, Type)> = Vec::new();
let mut pending_return: Option<Type> = None;
for line in text.lines() {
let trimmed = line.trim();
// ---@type <typename>
if let Some(rest) = trimmed.strip_prefix("---@type ") {
let ty = parse_type_name(rest.trim());
annotations.push(Annotation::Type { name: String::new(), ty });
continue;
}
// ---@param <name> <typename>
if let Some(rest) = trimmed.strip_prefix("---@param ") {
let parts: Vec<&str> = rest.splitn(2, ' ').collect();
if parts.len() == 2 {
let name = parts[0].to_string();
let ty = parse_type_name(parts[1].trim());
pending_params.push((name, ty));
}
continue;
}
// ---@return <typename>
if let Some(rest) = trimmed.strip_prefix("---@return ") {
pending_return = Some(parse_type_name(rest.trim()));
continue;
}
// Non-annotation line — check if it's a function declaration.
// LuaCATS convention: annotations must appear directly above
// the thing they annotate.
if !trimmed.is_empty() {
if let Some(func_name) = extract_function_name(trimmed) {
// Flush pending params/returns, scoped to this function.
for (name, ty) in pending_params.drain(..) {
annotations.push(Annotation::Param {
func_name: func_name.clone(),
name,
ty,
});
}
if let Some(ty) = pending_return.take() {
annotations.push(Annotation::Return {
func_name: func_name.clone(),
ty,
});
}
} else {
// Not a function line. Stray non-annotation lines between
// annotations and their target break the association.
if !pending_params.is_empty() || pending_return.is_some() {
pending_params.clear();
pending_return = None;
}
}
}
}
annotations
}
}
There are three annotation types:
| Annotation | Meaning | Example |
|---|---|---|
---@type <typename> | Variable has this type | ---@type string |
---@param <name> <typename> | Function parameter has this type | ---@param x number |
---@return <typename> | Function returns this type | ---@return string |
The supported type names are the same ones we’ve been using: number, string, boolean, nil, and any (which maps to Dynamic). This is a simplification — real LuaCATS supports unions (number|string), arrays (string[]), and generic function types. But the parsing logic is the same pattern: match the syntax, produce a Type.
The parse_type_name helper converts a string like "number" into a Type:
#![allow(unused)]
fn main() {
fn parse_type_name(name: &str) -> Type {
match name {
"number" => Type::Number,
"string" => Type::String,
"boolean" => Type::Boolean,
"nil" => Type::Nil,
"any" => Type::Dynamic,
_ => Type::Dynamic, // unknown type name → Dynamic
}
}
}
Unknown type names silently become Dynamic. A real checker would emit “unknown type name” instead — we’ll note that in simplifications below.
The extract_function_name helper picks the function name out of a line like local function add(x, y) — a quick pattern match, not a full parse:
#![allow(unused)]
fn main() {
fn extract_function_name(line: &str) -> Option<String> {
// Match "local function name" or "function name"
let after_fn = line.strip_prefix("local function ")
.or_else(|| line.strip_prefix("function "))?;
// The function name is everything before the opening paren.
after_fn.split('(').next().map(|s| s.trim().to_string())
}
}
This helper handles the two most common function forms — local function name and function name. Method-style declarations (function Foo:bar()) and dot-separated names (function M.foo()) would return the full string before the paren (Foo:bar, M.foo) as the function name — which means annotations wouldn’t match the simple func_name the programmer likely wrote. A production checker would parse the dot-separated and method portions separately. For now, annotations above those forms won’t be associated with any function — the pending params and returns will be cleared by the stray-line rule.
The Position Rule
LuaCATS has a rule: annotations must appear on the line immediately above the thing they annotate. This is how the type checker knows which function a ---@param belongs to:
---@param x number ← belongs to the function below
---@return number ← belongs to the function below
local function double(x)
return x * 2
end
Our parser implements this by tracking “pending” params and returns, then flushing them when it sees a function keyword on a non-annotation line. Each flushed annotation records the function name it belongs to — this prevents a ---@param x number above function add() from being mistakenly applied to a ---@param x string above function greet(). When a non-function, non-annotation line appears with pending annotations, the pending state is cleared: LuaCATS annotations must be directly above their target.
How Annotations Change Type Checking
Before this chapter, check_stmt for a Function always typed parameters as Dynamic. Now it asks: did the programmer annotate this parameter? If yes, use their annotation. If no, fall back to Dynamic.
First, we call extract_annotations to get the annotations for this source file:
#![allow(unused)]
fn main() {
let annotations = extract_annotations(db, source); // Salsa caches this — the file is only scanned once, regardless of how many times check_stmt is called
}
Then, when we encounter a function declaration, we look for matching ---@param annotations:
#![allow(unused)]
fn main() {
Statement::Function { name, params, body, .. } => {
let mut fe = env.clone();
let mut param_types = Vec::new();
for p in params {
let annotated = annotations.iter().find_map(|a| match a {
Annotation::Param { func_name, name: n, ty } if func_name == name && n == p => Some(ty.clone()),
_ => None,
});
let pt = annotated.unwrap_or(Type::Dynamic);
param_types.push(pt.clone());
fe = fe.extend(p.clone(), pt);
}
// ...
}
}
The annotation flows into the type environment before inference runs on the function body. This means inference can USE the annotated types — it doesn’t fight them. If the programmer says x: number and the code does x .. " items" (string concatenation on a number), the type checker catches it because the environment says x is a Number, not Dynamic.
Consuming ---@type Annotations
---@param and ---@return are easy — they carry a func_name so they only match one function. ---@type is harder. It sits above a variable assignment, and our parser stores it with an empty name field. So when check_stmt processes an assignment, how does it know which ---@type to apply?
The answer: consume it after use. The first variable that asks for a ---@type annotation gets it, and the annotation is removed from the list so it’s never applied again:
#![allow(unused)]
fn main() {
fn apply_annotation(annotations: &mut Vec<Annotation>, name: &str, inferred: Type) -> Type {
// Look for a @type annotation. The `n == name` branch is for a future
// named @type feature (e.g. ---@type string myVar). The parser currently
// always sets `name` to empty, so this branch never matches — it's here so
// the code is ready when the parser is extended.
//
// Empty-name @type is position-based — apply to the next variable
// that asks, then consume it.
let mut found_idx = None;
let mut found_ty = None;
for (i, a) in annotations.iter().enumerate() {
if let Annotation::Type { name: n, ty } = a {
if n == name || n.is_empty() {
if !inferred.is_compatible_with(ty)
&& !matches!(inferred, Type::Dynamic | Type::Error)
{
// Mismatch between declared and inferred.
// The annotation wins, but a production checker
// would emit a warning here.
}
found_idx = Some(i);
found_ty = Some(ty.clone());
break;
}
}
}
if let (Some(idx), Some(ty)) = (found_idx, found_ty) {
// Consume the annotation so it's not applied again.
annotations.remove(idx);
ty
} else {
inferred
}
}
}
The key line is annotations.remove(idx) — once a ---@type annotation is used, it’s gone. Without this, a single ---@type string would apply to every variable in scope, which is wrong. The annotation goes with the variable directly below it.
The assignment arm of check_stmt calls this for each target:
#![allow(unused)]
fn main() {
Statement::Assignment { local: _, targets, values } => {
let mut e = env.clone();
for (t, v) in targets.iter().zip(values.iter()) {
let mut vt = infer_type(db, source, v.clone(), e.clone());
if let Expression::Name(n) = t {
vt = apply_annotation(annotations, n, vt);
e = e.extend(n.clone(), vt);
}
}
StmtResult { env: e, return_type: None }
}
}
This is an approximation. Real LuaCATS matches ---@type to the variable on the next line by line number. Our consume-on-read pattern works for the common case (one annotation, one variable, right below it) but breaks if there’s a blank line between them — our parser would have already cleared the pending state. Line number tracking would fix this, and it’s the right thing for a production checker.
There’s a subtlety in how we call apply_annotation. The annotations vector must be extracted once and shared across all statements in a function body — if each check_stmt call creates its own copy, the consume-on-read is a no-op (removing from a local copy that’s discarded). Here’s how the top-level check function should look:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn type_check(db: &dyn salsa::Database, source: SourceFile) -> TypeCheckResult {
let ast = parse(db, source);
let annotations = extract_annotations(db, source);
let env = TypeEnv::empty();
// Pass a *mutable* reference to annotations so that consume-on-read
// persists across all statements in the file.
let mut annotations = annotations;
let mut result_env = env.clone();
for stmt in ast.statements(db) {
let sr = check_stmt(db, source, stmt, result_env, &mut annotations);
result_env = sr.env;
}
TypeCheckResult::new(db, source, result_env)
}
}
Now check_stmt takes &mut Vec<Annotation> instead of creating its own copy, and the annotations.remove(idx) in apply_annotation actually removes the annotation for subsequent calls:
Return Type: Annotation vs Inference
Return types have a subtlety. The programmer might annotate a return type that disagrees with what the body actually returns:
---@return number
local function oops()
return "not a number" -- declared number, returned string
end
Our type checker handles this by:
- Inferring the return type from the body (as before)
- If there’s a
---@returnannotation, checking that the inferred type is compatible - Using the annotated type for the function’s external signature
- Emitting a diagnostic if they disagree
The annotation wins for the signature — if you declare ---@return number, callers see number regardless of what the body does. But the mismatch is flagged, because it’s almost certainly a bug.
This is the same principle as TypeScript: declaration overrides inference, but contradictions are reported.
The Gradual Guarantee
The most important property of a gradual type system: unannotated code still works. No annotation → Dynamic → no type errors. You can annotate one function in a thousand-file project and get value from that one annotation.
-- No annotations — all Dynamic, no errors
local function greet(name)
return "Hello, " .. name
end
greet(42) -- no error (Dynamic is compatible with everything)
-- With annotations — type checking activates
---@param name string
local function greet(name)
return "Hello, " .. name
end
greet(42) -- ERROR: argument 1 has type Number but parameter expects String
This is why we don’t default unannotated parameters to some bottom type or require annotations everywhere. Dynamic is the escape hatch. It’s not a failure of the type checker — it’s the whole point of gradual typing.
What We’re Simplifying
No complex type parsing. Real LuaCATS supports unions (number|string), arrays (string[]), optional types (string?), and generic function types — our parser only handles simple type names. The extension point is clear: make parse_type_name recursive.
No line number tracking for ---@type. Our annotation carries a name field that’s always empty, a placeholder for a future named-annotation feature the parser doesn’t yet support. In LuaCATS, ---@type is position-based (the annotation goes above the assignment), so the type checker matches it to the next variable declaration by line number. We approximate this with a consume-on-read pattern: the first variable that asks for an annotation gets it, and the annotation is removed so it’s not reused. This works when annotations are extracted once and shared across all statements in a function body — the type_check function holds the mutable vector and passes it down. A real implementation would track line numbers for precise matching, which also handles edge cases like blank lines between the annotation and its target.
No ---@field or ---@class. LuaCATS can declare table shapes like ---@class Point { x: number, y: number }, which would connect naturally to our Type::Table variant. The pattern is the same: parse the annotation, produce a Type::Table, store it in the environment.
Unknown type names return Dynamic silently. If you write ---@param x entier (a typo for “number”), our parser returns Dynamic without warning — a real checker would emit “unknown type name: entier.”
Next: Chapter 11: Union Types and Table Classes — Our type checker knows about Number, String, Boolean, and Dynamic. But real Lua code doesn’t always fit into single boxes. We’ll add union types (number|string) and named table classes (---@class Point).
Chapter 11: Union Types — When a Value Can Be More Than One Thing
Our type checker knows about Number, String, Boolean, and Dynamic. But real Lua code doesn’t always fit into single boxes. A function parameter might accept either a string or a number. A variable might hold nil or a table. A return value might be true or an error message.
That’s what union types are for. A union type says “this value is one of these types” — Number | String means “could be a number, could be a string.” LuaCATS uses the | syntax: ---@param x number|string. Our parser doesn’t handle it yet. Let’s fix that.
And then there’s ---@class. Real Lua programs pass around tables with known shapes — a Point has x and y, a User has name and email. LuaCATS lets you declare these shapes with ---@class, and our Type::Table variant has been sitting there unused since Chapter 9. Time to connect them.
Union Types
A union type is a set of possibilities. When the type checker encounters a union, it needs to answer two questions:
Is this value compatible with this type? If a function expects Number | String and you pass 42, that’s fine — Number is one of the possibilities. If you pass true, that’s an error — Boolean isn’t in the union.
What operations are safe? If x is Number | String, you can’t do arithmetic on it (it might be a string) and you can’t concatenate it (it might be a number). You need to narrow the type first — check which possibility it actually is.
Let’s start with the representation.
Adding Type::Union
#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq, Eq, Hash, salsa::Update)]
pub enum Type {
Number,
String,
Boolean,
Nil,
Dynamic,
Error,
Function {
params: Vec<Type>,
ret: Box<Type>,
},
Table {
fields: Vec<(String, Type)>,
},
Union(Vec<Type>), // <-- NEW: one of these types
}
}
Union(Vec<Type>) holds the variants. The order doesn’t matter — Union([Number, String]) is the same type as Union([String, Number]). We’ll normalize the order when we construct unions so that equality checks work correctly.
Design note: Why
Vec<Type>and notHashSet<Type>? Two reasons. First,TypederivesHashandEq, so aHashSetwould work — butsalsa::Updateneeds to compare old and new values for fine-grained invalidation, andVeccomparisons are simpler and more predictable thanHashSetcomparisons (order-dependent vs. order-independent). Second, union types are typically small (2-3 variants), so the linear-scan cost of aVecis negligible. If you had unions with hundreds of variants, you’d want a different representation, but that’s not a realistic use case.
Constructing Unions
We don’t want Union([Number]) (a union of one type — Number itself) or Union([Number, Number]) (duplicates). Let’s add a constructor that normalizes:
#![allow(unused)]
fn main() {
impl Type {
pub fn union(types: Vec<Type>) -> Type {
// Flatten nested unions: Union([Number, Union([String, Boolean])])
// → Union([Number, String, Boolean])
let mut flat = Vec::new();
for t in types {
match t {
Type::Union(inner) => flat.extend(inner),
other => flat.push(other),
}
}
// Remove duplicates and sort for consistent equality
// Sort by Debug representation for deterministic order.
// This is a simplification: Debug output isn't guaranteed stable
// across Rust versions, and format! inside a sort comparator
// allocates per comparison. For the small unions in this tutorial
// (2–3 variants), it's fine.
//
// Why not derive Ord on Type? Because Expression contains Type
// (in Function, Table variants), and deriving Ord on Type would
// require Ord on every field type — including BinOp, UnaryOp,
// and Expression itself. That cascades through the entire AST.
// A production checker would either: (a) extract a discriminant
// method on Type that returns a stable integer (0=Number, 1=String,
// etc.) and sort by that, or (b) use a canonical ordering key
// derived from the type's structure, avoiding the derive cascade.
flat.sort_by(|a, b| format!("{a:?}").cmp(&format!("{b:?}")));
flat.dedup();
// Simplify: Union of one type → that type
// Union containing Error → Error (tainted)
// Union containing Dynamic → Dynamic (absorbs alternatives)
if flat.len() == 1 {
return flat.pop().unwrap();
}
if flat.iter().any(|t| matches!(t, Type::Error)) {
return Type::Error;
}
if flat.iter().any(|t| matches!(t, Type::Dynamic)) {
return Type::Dynamic;
}
Type::Union(flat)
}
}
}
The normalization handles three cases:
- Nested unions get flattened.
Number | (String | Boolean)becomesNumber | String | Boolean. - Duplicates get removed.
Number | Number | StringbecomesNumber | String. - Singletons get unwrapped. A union of one type collapses to that type. A union containing
Errorsimplifies toError— if something went wrong, the whole type is wrong. A union containingDynamicsimplifies toDynamic— if anything goes, there’s no point listing the alternatives.
Why does
Dynamicabsorb the union? BecauseDynamicmeans “I don’t know and I don’t care.” If one branch of a union isDynamic, the type checker can’t make any guarantees about what the value is — so the union provides no more information thanDynamicalone. The same logic applies toError: if one branch isError, the whole type is tainted. This is the same reasoning as TypeScript:number | anysimplifies toany.
Compatibility: Unions on Both Sides
is_compatible_with needs to handle unions as both the “expected” type and the “actual” type:
#![allow(unused)]
fn main() {
impl Type {
pub fn is_compatible_with(&self, other: &Type) -> bool {
match (self, other) {
// Existing cases...
(Type::Dynamic, _) | (_, Type::Dynamic) => true,
(Type::Error, _) | (_, Type::Error) => true,
(a, b) if a == b => true,
// NEW: value is a union → ALL variants must be compatible with expected
(Type::Union(variants), expected) => {
// Every variant of the source union must fit the target type.
// If x: Number | Boolean and we need Number, it fails — Boolean
// isn't compatible. With `any()`, it would incorrectly pass
// because Number IS compatible. Union-to-union works the same
// way: (Number | Boolean) is NOT compatible with (Number | String)
// because Boolean doesn't fit the target.
variants.iter().all(|v| v.is_compatible_with(expected))
}
// NEW: expected is a union → value must fit at least one variant
(actual, Type::Union(variants)) => {
// A concrete value only needs to fit one variant of the union.
// Number IS compatible with Number | String because it fits
// the Number arm.
variants.iter().any(|v| actual.is_compatible_with(v))
}
// Everything else: not compatible
_ => false,
}
}
}
}
A Union([Number, String]) is compatible with Number (one of the variants matches) and with String (another matches). A Number is compatible with Union([Number, String]) (it fits one of the slots). A Boolean is NOT compatible with Union([Number, String]) (it doesn’t fit any slot).
Parsing Union Annotations
The annotation parser needs to handle ---@param x number|string:
#![allow(unused)]
fn main() {
fn parse_type_name(s: &str) -> Type {
let s = s.trim();
// Union type: "number|string"
if s.contains('|') {
let variants: Vec<Type> = s.split('|')
.map(|part| parse_type_name(part.trim()))
.collect();
return Type::union(variants);
}
match s {
"number" => Type::Number,
"string" => Type::String,
"boolean" => Type::Boolean,
"nil" => Type::Nil,
"any" => Type::Dynamic,
_ => Type::Dynamic, // Unknown type name → Dynamic. A production checker
// would emit a diagnostic here ("unknown type 'nmuber'").
// We don't because the annotation parser runs before
// diagnostics are collected — buffering parse warnings
// would require a separate collection pass.
}
}
}
The recursion handles nested unions: number|string|boolean splits into three parts, each parsed individually. Type::union() normalizes the result.
What Operations Work on Unions?
Here’s the subtle part. If x is Number | String, what can you do with it?
---@param x number|string
local function inspect(x)
return x + 1 -- ERROR: x might be a string
end
Arithmetic requires Number. But x might be String. So the type checker must reject this. In our implementation, infer_type for BinOp::Add checks that both operands are Number:
#![allow(unused)]
fn main() {
BinOp::Add | BinOp::Sub | BinOp::Mul | BinOp::Div => {
let lt = infer_type(db, source, left.clone(), env.clone());
let rt = infer_type(db, source, right.clone(), env.clone());
// For arithmetic, operands must be compatible with Number.
// With the all() rule on Union → expected, Number|String is NOT
// compatible with Number — String isn't, so the union fails the
// "all variants must fit" check. This is the correct behavior:
// arithmetic on a maybe-String value is a type error.
if lt.is_compatible_with(&Type::Number) && rt.is_compatible_with(&Type::Number) {
Type::Number
} else {
// Emit diagnostic: cannot use <type> in arithmetic
Type::Error
}
}
}
With the all() rule on the Union → expected arm, Number | String is NOT compatible with Number — because String isn’t compatible with Number, the union fails the “all variants must fit” check. This means arithmetic on a Number | String value correctly fails: x + 1 when x is Number | String produces a type error, because x isn’t definitely a Number.
But there’s a subtler question: what about assignments? If you have y: Number and x: Number | String, should y = x be allowed? With our all() rule, it correctly fails — you can’t assign a maybe-String value to a definitely-Number variable. A gradual type checker would require a type narrowing (if type(x) == "number" then y = x) before the assignment.
This is why is_compatible_with with all() on the first arm gives the right answer for both operations and assignments. The “overly lenient” behavior from earlier drafts (using any()) has been fixed.
A production type checker would still benefit from a separate is_exactly method for cases where even all() is too lenient — for example, when deciding whether to emit a specific “type mismatch” diagnostic vs. a generic “types not compatible” message. But for this tutorial, is_compatible_with with the corrected all() rule is sufficient.
Table Classes with ---@class
Lua uses tables as objects. A Point table has x and y fields. A User table has name and email fields. Without type annotations, every table is Type::Table { fields: Vec<(String, Type)> } — but the type checker doesn’t know what shape a table should have until it sees the fields.
---@class lets the programmer declare a table’s shape up front:
---@class Point
---@field x number
---@field y number
local p = { x = 1, y = 2 }
This tells the type checker: “any value of type Point has fields x: number and y: number.” It’s a named table type — like a struct, but built from Lua’s universal table mechanism.
Parsing ---@class and ---@field
Add two new annotation variants:
#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq, Eq, Hash, salsa::Update)]
pub enum Annotation {
Type { name: String, ty: Type, raw_type_name: String },
Param { func_name: String, name: String, ty: Type, raw_type_name: String },
Return { func_name: String, ty: Type, raw_type_name: String },
Class { name: String, fields: Vec<(String, Type)> }, // NEW
Field { class_name: String, name: String, ty: Type }, // NEW
}
}
The raw_type_name field on Type, Param, and Return preserves the original type name string from the annotation — "Point", "number|string", etc. We need it because parse_type_name runs during annotation extraction, before class types are registered in the environment. When it encounters an unknown name like "Point", it returns Type::Dynamic — and the original name is lost. raw_type_name keeps the string around so the resolution pass (shown below) can look it up in the environment after classes are registered.
If you’re wondering why Class and Field don’t have raw_type_name: Class stores its fields directly (no type name to resolve), and Field’s ty is always a primitive type like number (field types in our simplified LuaCATS don’t reference other classes).
The parser handles ---@class Point and ---@field x number:
#![allow(unused)]
fn main() {
"@class" => {
let name = rest.trim().to_string();
Annotation::Class { name, fields: vec![] }
}
"@field" => {
// Format: ---@field <name> <type>
let parts: Vec<&str> = rest.trim().splitn(2, ' ').collect();
if parts.len() == 2 {
Annotation::Field {
class_name: String::new(), // filled in when we see which class this belongs to
name: parts[0].to_string(),
ty: parse_type_name(parts[1]),
}
} else {
continue; // malformed, skip
}
}
}
The ---@field annotations are collected under the most recent ---@class, similar to how ---@param collects under the next function:
---@class Point ← starts class "Point"
---@field x number ← field of Point
---@field y number ← field of Point
When extract_annotations finishes processing a class block, it produces a single Annotation::Class { name: "Point", fields: [("x", Number), ("y", Number)] }.
Storing Named Classes in the Environment
A ---@class Point declaration adds a named type to the environment. When the type checker later sees ---@type Point, it looks up the class by name:
#![allow(unused)]
fn main() {
fn resolve_type_by_name(name: &str, env: &TypeEnv) -> Type {
match name {
"number" => Type::Number,
"string" => Type::String,
"boolean" => Type::Boolean,
"nil" => Type::Nil,
"any" => Type::Dynamic,
other => {
// Look up named class in the environment.
// lookup returns Dynamic for unknown names, which is
// the right default for an undeclared class reference.
env.lookup(other)
}
}
}
}
Wait — this is a chicken-and-egg problem. The class type needs to be in the environment before it’s referenced, but extract_annotations is a tracked function that produces annotations, and annotations are processed during check_stmt. The class declaration itself is an annotation — so where does the type come from?
There’s a related problem: parse_type_name — called by extract_annotations — doesn’t know about the environment. When it encounters "Point", it returns Type::Dynamic. The raw type name is lost, so we can’t resolve it later.
Our solution: store the raw type name alongside the parsed type in each annotation. After the first pass registers class types in the environment, we re-resolve any annotation types that fell back to Dynamic. This is where resolve_annotation_type comes in — it checks whether a type is still Dynamic, then calls resolve_type_by_name with the now-populated environment to look up class names that parse_type_name couldn’t resolve on the first pass.
The answer: ---@class annotations are processed first, before any statements. This is a two-pass approach:
- First pass: Scan all
---@classannotations and add their types to the environment. - Resolve: Re-resolve annotation types that
parse_type_namereturnedDynamicfor, usingresolve_annotation_type(which callsresolve_type_by_namewith the now-populated environment). - Second pass: Type-check statements, using the resolved annotation types.
In our implementation, this means check_stmt for the first statement in a file already has the class types available:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn type_check(db: &dyn salsa::Database, source: SourceFile) -> Vec<Diagnostic> {
let ast = parse(db, source);
let annotations = extract_annotations(db, source);
let mut env = TypeEnv::new();
// First pass: register class types
for ann in &annotations {
if let Annotation::Class { name, fields } = ann {
env.extend(name.clone(), Type::Table { fields: fields.clone() });
}
}
// Resolve: re-resolve annotation types that parse_type_name
// returned Dynamic for (e.g., ---@type Point → Table { x: Number, y: Number })
let resolved_annotations: Vec<Annotation> = annotations
.into_iter()
.map(|ann| match ann {
Annotation::Type { name, ty, raw_type_name } => Annotation::Type {
name,
ty: resolve_annotation_type(&ty, &raw_type_name, &env),
raw_type_name,
},
Annotation::Param { func_name, name, ty, raw_type_name } => Annotation::Param {
func_name,
name,
ty: resolve_annotation_type(&ty, &raw_type_name, &env),
raw_type_name,
},
Annotation::Return { func_name, ty, raw_type_name } => Annotation::Return {
func_name,
ty: resolve_annotation_type(&ty, &raw_type_name, &env),
raw_type_name,
},
other => other,
})
.collect();
// Second pass: type-check statements
let mut diagnostics = Vec::new();
for stmt in &ast.statements {
// ... check each statement ...
}
diagnostics
}
}
This is the same two-pass pattern that real type checkers use: declarations first, then checks. It’s not specific to Salsa — it’s a consequence of forward references. Lua’s local is hoisted (the variable exists from the start of the block), so a named type should also be available before it’s “declared” in the source order.
The resolve_annotation_type helper re-resolves types that parse_type_name returned Dynamic for, using the saved raw_type_name:
#![allow(unused)]
fn main() {
fn resolve_annotation_type(ty: &Type, raw_name: &str, env: &TypeEnv) -> Type {
if let Type::Union(variants) = ty {
// Can't resolve individual union variants without per-variant
// raw names. A union like ---@param x Point|nil degrades to
// Dynamic because parse_type_name("Point") returns Dynamic
// (class not registered yet), and Type::union([Dynamic, Nil])
// collapses to Dynamic. When we get here, ty is already
// Dynamic, so this branch doesn't fire. To fix this, we'd
// need to store the raw name for each variant separately.
return Type::union(variants.to_vec());
}
if matches!(ty, Type::Dynamic) {
let resolved = resolve_type_by_name(raw_name, env);
if !matches!(resolved, Type::Dynamic) {
return resolved;
}
}
ty.clone()
}
}
When parse_type_name("Point") returned Dynamic (because the Point class wasn’t registered yet), resolve_annotation_type now looks up "Point" in the environment and finds the Type::Table { fields: [("x", Number), ("y", Number)] } that the class declaration stored. If the name still isn’t found, it stays Dynamic — we don’t have enough information to type it, and that’s okay.
A limitation to be aware of: union annotations that mix class references with other types — like ---@param x Point|nil — degrade to Dynamic silently. Here’s why: parse_type_name("Point") returns Dynamic (the class isn’t registered yet), and Type::union([Dynamic, Nil]) collapses to Dynamic because Dynamic absorbs. The raw name stored is the whole union string "Point|nil", and resolve_type_by_name("Point|nil", env) won’t find it in the environment — "Point|nil" isn’t a variable name. The fix would be storing per-variant raw names so each can be resolved independently. For now, avoid union annotations with class references, or write them as separate ---@type Point annotations instead.
Named Table Types vs Anonymous Tables
There’s a subtle difference between a named class (Point) and an anonymous table ({ x: number, y: number }). Both have the same fields. But the named class carries identity — Point and Size might have the same fields, but they’re different types.
Our implementation treats named classes as Type::Table { fields } — the fields are the same, so Point and Size with identical fields would be compatible. This is structural typing, not nominal typing. For Lua, this is the right default: Lua doesn’t have classes in the language, and ---@class is a documentation convention, not a language feature. Structural typing matches how Lua actually works.
If you wanted nominal typing (where Point ≠ Size even with the same fields), you’d add a name field to Type::Table:
#![allow(unused)]
fn main() {
Table {
name: Option<String>, // None = anonymous, Some("Point") = named
fields: Vec<(String, Type)>,
}
}
Then compatibility would check the name first: same name → compatible, different name → not compatible (even with the same fields). This is a design choice, not a correctness issue. Our tutorial uses structural typing because it’s simpler and matches Lua’s philosophy.
Field Access on Named Classes
The Expression::FieldAccess arm of infer_type already handles Type::Table:
#![allow(unused)]
fn main() {
Expression::FieldAccess { object, field } => {
let obj_type = infer_type(db, source, object.clone(), env.clone());
match obj_type {
Type::Table { fields } => {
fields.iter()
.find(|(name, _)| name == field)
.map(|(_, ty)| ty.clone())
.unwrap_or(Type::Dynamic)
}
Type::Dynamic => Type::Dynamic,
_ => Type::Error,
}
}
}
This works for named classes automatically — a ---@class Point produces a Type::Table { fields: [("x", Number), ("y", Number)] }, and field access on it returns the field’s type. No changes needed.
The type checker can now catch typos in field names:
---@class Point
---@field x number
---@field y number
local p: Point = { x = 1, y = 2 }
print(p.x) -- Number ✓ (resolved via resolve_type_by_name)
print(p.z) -- Dynamic (field not found — a real checker would emit a diagnostic)
Before the resolve_type_by_name step, ---@type Point would have resolved to Dynamic, and even p.x would return Dynamic. The resolution step is what makes class-typed variables actually useful for field access.
What We’re Simplifying
No type narrowing yet. A real gradual type checker would narrow union types after type checks: if type(x) == "number" then ... x + 1 ... end — inside the if, x is known to be Number. Our type checker doesn’t track control flow yet. This is the biggest missing feature for a production checker, and it’s where most of the complexity lives. Chapter 15 adds narrowing — the if type(x) == "number" pattern is exactly what it handles.
No ---@type on table constructors. When you write local p: Point = { x = 1, y = 2 }, the type checker should verify that the table constructor matches the Point class — all required fields present, all field types compatible. Our checker applies the annotation to p but doesn’t validate the constructor against it.
No optional fields. LuaCATS supports ---@field x? number (field might be absent). Our Type::Table has no representation for optional fields — every field is required. Adding optional fields is straightforward: change fields: Vec<(String, Type)> to fields: Vec<(String, Type, bool)> where the bool indicates optionality.
Structural, not nominal typing. Two classes with the same fields are compatible. Nominal typing (where class names matter) is a design choice, not a bug — but it’s worth knowing the tradeoff.
Next: Chapter 12: Generic Functions — ---@generic lets you write type-parameterized functions like ---@generic T on function identity(x). We’ll parse generic annotations, represent type variables in our type system, and substitute them during inference.
Chapter 12: Generic Functions — One Definition, Many Types
Our type checker can annotate functions, infer types, and handle unions and classes. But there’s a gap that gets wider the more you use it. Consider the simplest possible function:
local function identity(x)
return x
end
What’s the type of identity? It takes something and returns the same thing. With what we have so far, the best we can write is:
---@param x any
---@return any
But that’s wrong. Not “slightly imprecise” wrong — semantically wrong. If you call identity(42), you know you’ll get a number back. The type checker doesn’t. It thinks you might get anything back, because any means “I have no idea.” You’ve lost information that you — the programmer — had.
Or with unions:
---@param x number|string
---@return number|string
Still wrong. Pass a number, get back number|string. The return type is wider than it should be. The type checker should know that identity(42) returns number, not number|string. The output should be as specific as the input.
This is what generic functions solve. A generic function has type parameters — variables that stand in for types. When you call the function, the type parameters are replaced (substituted) with the actual types of the arguments. The return type is computed from that substitution.
---@generic T
---@param x T
---@return T
local function identity(x)
return x
end
Now identity(42) substitutes T = number and returns number. identity("hello") substitutes T = string and returns string. The type of the return value tracks the type of the argument. No information lost.
The Problem: Information Loss
Here’s the pattern that generic functions fix. You have a function where the output depends on the input, but not in a way that a fixed type signature can express:
| Function | Input | Output you want | Output you get with any |
|---|---|---|---|
identity(x) | number | number | any |
first(arr) | {string} | string | any |
map(arr, f) | {number}, number → string | {string} | any |
Every row has the same problem: the relationship between input and output is lost. The type checker can’t express “the output type depends on the input type” without generics.
With type parameters, these relationships become explicit:
identityis(T) → T— same type in and outfirstis({T}) → T— element type of the arraymapis({T}, (T) → U) → {U}— transform element types
The letters T and U are type variables. They’re placeholders that get resolved at the call site.
Type::Var — Type Variables
We add one new variant to Type:
#![allow(unused)]
fn main() {
pub enum Type {
Number, String, Boolean, Nil,
Function { params: Vec<Type>, ret: Box<Type> },
Table { fields: Vec<(String, Type)> },
Union(Vec<Type>),
Var(String), // NEW: type variable (T, K, V, etc.)
Dynamic, Error,
}
}
Type::Var("T".to_string()) means “some type that we’ll figure out later.” It’s a promise: when we know what T is, we’ll replace it.
Type variables interact with the existing type system in three ways.
Compatibility: Var("T") is compatible with everything, like Dynamic. An unresolved type variable could be anything — we can’t reject it. This is a simplification. A full type checker would track constraints on type variables and reject inconsistent ones, but that requires constraint solving, which is beyond a gradual type checker’s scope.
Substitution: The whole point of Var is that it gets replaced. We’ll add a substitute method that walks a type and replaces Var("T") with whatever the substitution map says.
Display: Var("T") displays as T. After substitution, it displays as the concrete type.
Parsing ---@generic
LuaCATS uses ---@generic to declare type parameters:
---@generic T
---@param x T
---@return T
local function identity(x) return x end
Multiple type parameters are comma-separated:
---@generic K, V
---@param t table
---@return K, V
local function next(t) end
We add a new Annotation variant:
#![allow(unused)]
fn main() {
pub enum Annotation {
// ... existing variants ...
Generic { func_name: String, type_params: Vec<String> },
}
}
The parsing follows the same pattern as ---@param: when we see ---@generic, we collect the type parameter names and attach them to the next function declaration.
One subtlety: how do we distinguish a type variable from a type name in ---@param annotations? When the user writes ---@param x T, is T a class named T or a type variable?
We use a convention: uppercase names are type variables. If a type name starts with an uppercase letter and contains only alphanumeric characters, it’s a type variable. This matches LuaCATS practice — T, K, V are universally used as type parameters, while number, string, Point, User are concrete types.
#![allow(unused)]
fn main() {
fn parse_type_name(s: &str) -> Type {
// ...
_ if s.chars().next().map(|c| c.is_uppercase()).unwrap_or(false)
&& s.chars().all(|c| c.is_alphanumeric()) => Type::Var(s.to_string()),
_ => Type::Dynamic,
}
}
This is a heuristic. A real type checker would cross-reference the ---@generic declaration — names listed there are type variables, everything else is a type name. Our convention gets the same result in practice without that extra bookkeeping. The one case where it breaks down: a capitalized built-in name like String would be parsed as Type::Var("String") instead of Type::String. In practice, LuaCATS annotations use lowercase for built-in types, so this rarely comes up — but a production checker should check ---@generic declarations instead of relying on casing alone.
Substitution: The Core Mechanism
When a generic function is called, we need to figure out what the type variables resolve to. The mechanism is substitution — a mapping from type variable names to concrete types.
Here’s how it works for identity(42):
identityhas type(T) → T- The argument
42has typeNumber - Match formal parameter
Tagainst actual argumentNumber→T = Number - Substitute in the return type:
T→Number - Result:
Number
The substitution map is { T → Number }. We apply it to the return type, and the Var("T") becomes Number.
The Substitution struct
#![allow(unused)]
fn main() {
pub struct Substitution {
map: Vec<(String, Type)>,
}
}
A simple list of (variable name, concrete type) pairs. We could use a HashMap, but for the small maps we’re dealing with (1-3 variables), a linear scan is simpler and avoids the hashing overhead.
Building a substitution from function arguments
The key operation: given a function’s formal parameter types (which may contain Var) and the actual argument types (which are concrete), build a substitution.
#![allow(unused)]
fn main() {
impl Substitution {
fn from_args(formal_params: &[Type], actual_args: &[Type]) -> Substitution {
let mut subst = Substitution::new();
for (formal, actual) in formal_params.iter().zip(actual_args.iter()) {
subst.unify(formal, actual);
}
subst
}
}
}
The unify method matches a pattern (the formal parameter type, which may contain variables) against a concrete type, and extends the substitution with any variable bindings it discovers:
#![allow(unused)]
fn main() {
fn unify(&mut self, pattern: &Type, actual: &Type) {
match (pattern, actual) {
// Pattern is a type variable → bind it
(Type::Var(name), ty) => {
self.insert(name.clone(), ty.clone());
}
// Both are functions → unify params and returns
(Type::Function { params: p1, ret: r1 },
Type::Function { params: p2, ret: r2 }) if p1.len() == p2.len() => {
for (fp, ap) in p1.iter().zip(p2.iter()) {
self.unify(fp, ap);
}
self.unify(r1, r2);
}
// Both are tables → unify fields
(Type::Table { fields: f1 }, Type::Table { fields: f2 }) => {
for (name, ft) in f1 {
if let Some((_, at)) = f2.iter().find(|(n, _)| n == name) {
self.unify(ft, at);
}
}
}
// Unions → try each variant
//
// ⚠️ Simplification: unifying a union pattern against a concrete
// type tries every variant and collects all bindings. This is
// imprecise — a union means "one of these, not all of them."
// If Union([T, U]) unifies with Number, we'd bind both T → Number
// and U → Number, even though the union means "T or U, not both."
// For our gradual checker, this rarely causes problems because the
// bindings tend to be consistent, but a full type checker would try
// each variant separately and keep the first consistent binding.
(Type::Union(variants), actual) | (actual, Type::Union(variants)) => {
for v in variants { self.unify(v, actual); }
}
// Concrete types matching → nothing to bind
_ => {}
}
}
}
The name unify is intentional — this is a simplified form of unification, the algorithm that powers full type inference systems (where the compiler deduces types without any annotations at all, by solving constraints). Full unification handles bidirectional constraints and occurs-check (preventing T = Vec<T>). Our version is one-directional: we match the formal pattern against the actual type. This is enough for a gradual type checker.
What if a type variable maps to two different types? If identity had two parameters with the same variable — fun(x: T, y: T): T — and you called it with (42, "hello"), the substitution would try to map T → Number from the first argument and T → String from the second. The insert method detects this: if a variable is already mapped to a different concrete type, it keeps the first mapping and the inconsistency surfaces as a type error during checking. A full type checker would report the conflict immediately (“T cannot be both Number and String”), but for a gradual checker, letting the downstream error propagate is simpler and still catches the bug.
Applying the substitution
The substitute method walks a type and replaces variables:
#![allow(unused)]
fn main() {
impl Type {
fn substitute(&self, subst: &Substitution) -> Type {
match self {
Type::Var(name) => subst.get(name).cloned()
.unwrap_or_else(|| self.clone()),
Type::Function { params, ret } => Type::Function {
params: params.iter().map(|p| p.substitute(subst)).collect(),
ret: Box::new(ret.substitute(subst)),
},
Type::Table { fields } => Type::Table {
fields: fields.iter()
.map(|(n, t)| (n.clone(), t.substitute(subst)))
.collect(),
},
Type::Union(variants) => Type::union(
variants.iter().map(|v| v.substitute(subst)).collect()
),
other => other.clone(),
}
}
}
}
Atomic types (Number, String, etc.) substitute to themselves. Type variables are looked up in the map. Composite types (Function, Table, Union) recurse into their components.
If a variable isn’t in the substitution map, it stays as Var. This happens when a type variable appears in the return type but not in any parameter — there’s no argument to bind it from. The unbound variable remains as Var, which is compatible with everything (like Dynamic). A production type checker would report an error for unbound type variables.
The Call Site: Where Substitution Happens
The substitution is built and applied when a generic function is called, not when it’s defined. Here’s the change to infer_type for function calls:
#![allow(unused)]
fn main() {
Expression::FunctionCall { ref func, ref args } => {
let ft = infer_type(db, source, (**func).clone(), env.clone());
let arg_types: Vec<Type> = args.iter()
.map(|a| infer_type(db, source, a.clone(), env.clone()))
.collect();
match ft {
Type::Function { params, ret } => {
let subst = Substitution::from_args(¶ms, &arg_types);
ret.substitute(&subst)
}
Type::Dynamic => Type::Dynamic,
_ => Type::Error,
}
}
}
This uses ref func and ref args to borrow from the expression rather than taking ownership, consistent with the match patterns in earlier chapters. The (**func).clone() extracts and clones the boxed expression; args.iter() borrows the argument vector instead of consuming it.
The key addition: Substitution::from_args(¶ms, &arg_types) followed by ret.substitute(&subst).
This works for every function call, not only generic ones. If the function has no type variables, the substitution is empty and the return type passes through unchanged. No special-casing needed.
Let’s trace through the examples:
identity(42):
identityhas type(T) → T- Argument type:
Number - Substitution:
{ T → Number } - Return type after substitution:
Number
identity("hello"):
identityhas type(T) → T- Argument type:
String - Substitution:
{ T → String } - Return type after substitution:
String
identity(true):
identityhas type(T) → T- Argument type:
Boolean - Substitution:
{ T → Boolean } - Return type after substitution:
Boolean
Same function, different call sites, different return types. That’s the point.
Two Type Parameters: Map
The identity function uses one type parameter. Real generic functions often use more. Here’s map:
---@generic T, U
---@param arr table
---@param f fun(x: T): U
---@return table
local function map(arr, f)
return arr
end
T is the element type of the input array. U is the element type of the output array. The function type fun(x: T): U tells the type checker: “the callback takes a T and returns a U.”
A note before you try this: our parser can’t represent function types in annotations yet (that would require parsing fun(x: T): U as a Type::Function), so for now the ---@param f annotation falls through to Dynamic. The ---@generic T, U declaration is in place, but until Chapter 13 adds function type parsing, it won’t change the type checker’s behavior for this function. The substitution infrastructure works — when we add function type parsing, map’s type would be:
({T}, (T) → U) → {U}
And a call like map({1, 2, 3}, tostring) would produce the substitution { T → Number, U → String }, giving the return type {String}.
The infrastructure works. What’s missing is function type parsing in annotations — fun(x: T): U — which Chapter 13 adds. The ---@generic parsing is already in place from this chapter.
Inside the Function Body
When type-checking the body of a generic function, we don’t know what the type variables resolve to. identity’s parameter x is typed as T, but inside the function, T could be anything. We substitute type variables with Dynamic inside function bodies:
#![allow(unused)]
fn main() {
// Build the substitution directly: each type param → Dynamic
// Inside the function body, we don't know what T resolves to,
// so we treat it as Dynamic. The substitution happens at the
// call site, not inside the function.
let mut body_subst = Substitution::new();
for tp in &type_params {
body_subst.insert(tp.clone(), Type::Dynamic);
}
func_env = func_env.extend(p.clone(), pt.substitute(&body_subst));
}
We build the substitution directly instead of using from_args for a reason: unify(Dynamic, Dynamic) hits the wildcard arm and produces no binding. If we passed [Dynamic, Dynamic, ...] as both formal and actual, the substitution would be empty and Var("T") would stay as Var("T") instead of becoming Dynamic.
This means inside identity, x has type Dynamic, not T. That’s correct — the function body doesn’t know the concrete type, so it can’t assume anything about T. The substitution from the call site doesn’t flow into the body.
A more sophisticated type checker would carry the type parameter constraints into the body. But for a gradual type checker, Dynamic inside the body is the right tradeoff: simple, correct, and it doesn’t produce false positives.
What We’re Simplifying
No constraint solving. Full type inference systems build a set of constraints and solve them. We match arguments to parameters one-directionally. This means we can’t infer types that flow “backwards” — if a function’s return type constrains its input type, we miss it. For a gradual type checker with explicit annotations, this is fine — the annotations provide the constraints.
No occurs check. In full unification, T = Table<T> would be rejected (a type can’t contain itself). Our substitution allows this. It would produce an infinitely recursive type, which our Display implementation would loop on. For the generic function annotations we’re handling here, self-referential type variables don’t arise — a function’s type parameters are always resolved to concrete types at the call site. Recursive class types (like Node containing Node|nil) are a different problem, which Chapter 14 handles with named references instead of expansion.
No higher-kinded types. You can’t write ---@generic M where M is a type constructor (like List or Option). Type parameters are always concrete types. This matches LuaCATS’s design.
No function type parsing in annotations. ---@param f fun(x: T): U would need a parser for function type syntax. Our parse_type_name maps fun(...) to Dynamic. A complete implementation would parse the function type, potentially with type variables.
Type variable convention, not declaration check. We treat uppercase names as type variables. A production checker would verify that type variables are declared in ---@generic. Our approach works for well-formed annotations but doesn’t catch typos like ---@param x Typpo where Typpo isn’t a declared type parameter.
Unbound type variables are Dynamic-like. If a type variable appears in a return type but not in any parameter type, it can’t be resolved at the call site. It stays as Var, which is compatible with everything. A production checker would report an error.
Next: Chapter 13: Function Types in Annotations — We’ve inferred function types from their bodies, but we haven’t let the programmer declare them. ---@param and ---@return are half the story — fun(number): string is the other half.
Chapter 13: Function Types — Specifying What Goes In and What Comes Out
In Chapter 12, we wrote a generic map function with this annotation:
---@param f fun(x: T): U
But our parser couldn’t handle fun(x: T): U. It saw “fun” — not a known type name — and fell through to Dynamic. The type checker never learned what the callback does. f was Dynamic, and map couldn’t connect the callback’s parameter type to the array’s element type.
This chapter fixes that. We add a parser for function type syntax, which lets us express higher-order functions with full type precision.
The Problem: fun(...) Falls Through to Dynamic
Our parse_type_name function from Chapter 10 handles simple names:
#![allow(unused)]
fn main() {
fn parse_type_name(s: &str) -> Type {
match s.trim() {
"number" => Type::Number,
"string" => Type::String,
// ... other known types
_ if looks_like_type_var(s) => Type::Var(s.to_string()),
_ => Type::Dynamic, // ← "fun(x: T): U" hits this branch
}
}
}
The fun(...) syntax isn’t a simple name — it’s a structured type with parameters and a return type. We need a recursive parser that can handle it.
The Grammar
LuaCATS function type syntax:
fun_type = "fun" "(" param_list ")" ":" type
param_list = param ("," param)*
param = name ":" type | type
type = simple_name | fun_type | union_type
union_type = type ("|" type)*
Examples and what they produce:
| Annotation | Parsed Type |
|---|---|
fun(x: number): string | Function([Number], String) |
fun(x: T): U | Function([Var("T")], Var("U")) — Var("U") is the return type, not a param |
fun(number): boolean | Function([Number], Boolean) — unnamed params |
fun(f: fun(T): U): table | Function([Function([Var("T")], Var("U"))], Dynamic) — nested |
fun(x: number, y: string): boolean | Function([Number, String], Boolean) — multiple params |
The parser is recursive: fun(f: fun(T): U): table has a function type inside a function type. This is how you express higher-order functions — functions that take other functions as arguments.
Step 1: From parse_type_name to parse_type
The old function handled one thing: simple names. The new function handles three:
#![allow(unused)]
fn main() {
fn parse_type(s: &str) -> Type {
let s = s.trim();
// Function type: fun(...): return_type
if s.starts_with("fun(") || s.starts_with("fun (") {
return parse_function_type(s);
}
// Union type: "number|string"
//
// Caveat: this naive split on '|' breaks on nested unions inside
// function types, e.g. "fun(x: number|string): boolean". The '|' inside
// the function type's parameter gets split incorrectly. A production
// parser would track parenthesis depth and only split on '|' at depth 0
// — the same pattern we use for commas in parse_param_types below.
// For the annotations in this tutorial (no nested unions in function
// params), the naive split works. See "What We're Simplifying" at the end.
if s.contains('|') {
let variants: Vec<Type> = s.split('|')
.map(|part| parse_type(part.trim()))
.collect();
return Type::union(variants);
}
// Simple type name (the old parse_type_name logic)
parse_type_name(s)
}
}
The order matters: fun( must be checked first because function type syntax can contain | in its return type (like fun(T): U|nil). If we checked unions first, we’d split incorrectly on the | inside the return type. Then check for top-level unions, then fall back to simple names.
Wait — does fun(T): U|nil work with our parser? Let’s trace through it:
s = "fun(T): U|nil"— starts withfun(, so we callparse_function_typeparse_function_typefinds the matching), extractsTas the param- It finds
: U|nilafter the closing paren - It calls
parse_type("U|nil")for the return type - That hits the union branch:
parse_type("U")→Var("U"),parse_type("nil")→Nil - Result:
Function([Var("T")], Union([Var("U"), Nil]))
It works because the return type is parsed by the same parse_type function — recursion handles the union automatically.
Step 2: Parsing the Function Type
The function type parser has three jobs:
- Find the matching closing paren (respecting nesting)
- Parse parameter types from the content inside the parens
- Parse the return type after the colon
#![allow(unused)]
fn main() {
fn parse_function_type(s: &str) -> Type {
// Skip "fun" and optional space
let after_fun = if s.starts_with("fun(") { &s[3..] } else { &s[4..] };
// Find matching closing paren
let paren_content = match find_matching_paren(after_fun) {
Some(content) => content,
None => return Type::Dynamic, // malformed
};
// Find the return type after the closing paren
// Byte offset of the character after the closing ')'
let after_paren = if s.starts_with("fun(") {
4 + paren_content.len() + 1 // "fun(" + content + ")"
} else {
5 + paren_content.len() + 1 // "fun (" + content + ")"
};
let return_type = if let Some(rest) = s.get(after_paren..) {
let rest = rest.trim();
if let Some(rest) = rest.strip_prefix(':') {
parse_type(rest)
} else {
Type::Dynamic // no return type → Dynamic
}
} else {
Type::Dynamic
};
// Parse parameter types from paren_content
let params = parse_param_types(paren_content);
Type::Function { params, ret: Box::new(return_type) }
}
}
Finding the matching paren. We can’t find the first ) and call it done — function types can be nested. fun(f: fun(T): U): V has two closing parens, and we need the outer one. So we count nesting depth:
#![allow(unused)]
fn main() {
fn find_matching_paren(s: &str) -> Option<&str> {
if !s.starts_with('(') { return None; }
let mut depth = 1;
let mut i = 1;
while i < s.len() && depth > 0 {
match s.as_bytes()[i] {
b'(' => depth += 1,
b')' => depth -= 1,
_ => {}
}
i += 1;
}
if depth == 0 { Some(&s[1..i-1]) } else { None }
}
}
Step 3: Parameter Parsing — Named vs Unnamed
LuaCATS supports both named and unnamed parameters:
fun(x: number): string — named: x is number
fun(number): string — unnamed: only the type
For type checking, we only care about the type — the name is documentation. Our parser handles both:
#![allow(unused)]
fn main() {
fn parse_param_type(s: &str) -> Type {
let s = s.trim();
// Find the colon that isn't inside parens
let mut depth = 0;
for (i, c) in s.char_indices() {
match c {
'(' => depth += 1,
')' => depth -= 1,
':' if depth == 0 => {
// Named param: "x: type" — take everything after the colon
return parse_type(&s[i + 1..]);
}
_ => {}
}
}
// Unnamed param: the whole thing is a type
parse_type(s)
}
}
Why track depth when looking for the colon? Because a parameter could be a function type: f: fun(T): U. The first colon (after f) is the name separator. The second colon (after fun(T)) is inside the function type. We only want the first one at depth 0.
Step 4: Commas Need Depth-Tracking Too
Parameter lists are comma-separated: fun(x: number, y: string): boolean. But a parameter can be a function type: fun(f: fun(T): U, x: T): U. The comma between U and x is a parameter separator. But there’s no comma inside fun(T): U — so a naive split(',') would break.
We split on commas at depth 0:
#![allow(unused)]
fn main() {
fn parse_param_types(s: &str) -> Vec<Type> {
if s.trim().is_empty() {
return Vec::new(); // fun() — no parameters
}
let mut params = Vec::new();
let mut depth = 0;
let mut start = 0;
for (i, c) in s.char_indices() {
match c {
'(' => depth += 1,
')' => depth -= 1,
',' if depth == 0 => {
let param = s[start..i].trim();
if !param.is_empty() {
params.push(parse_param_type(param));
}
start = i + 1;
}
_ => {}
}
}
// Last param (after the final comma, or the only param if no commas)
let last = s[start..].trim();
if !last.is_empty() {
params.push(parse_param_type(last));
}
params
}
}
This pattern — track depth, split at depth 0 — shows up everywhere in recursive parsing. It’s the same technique you’d use for parsing nested parentheses in any language.
The Result: map Finally Has a Real Callback Type
Before (Chapter 12):
---@param f fun(x: T): U
-- parse_type_name("fun(x: T): U") → Dynamic
-- f's type: Dynamic (useless)
After (this chapter):
---@param f fun(x: T): U
-- parse_type("fun(x: T): U") → Function([Var("T")], Var("U"))
-- f's type: Function([Var("T")], Var("U"))
Now when map is called with a fun(number): string callback, the substitution correctly maps T → Number and U → String. The type checker can verify that the callback’s parameter type matches the array’s element type.
Higher-Order Function Composition
A higher-order function is a function that takes another function as an argument or returns one as a result. map is a higher-order function — it takes a callback. So is compose, which takes two functions and returns a new one. Function types in annotations let us express more than map alone. Here’s composition:
---@generic T, U, V
---@param f fun(x: U): V
---@param g fun(x: T): U
---@return fun(x: T): V
local function compose(f, g)
return f
end
The return type fun(x: T): V is a function type with type variables. When we call compose with concrete functions:
f: fun(string): booleang: fun(number): string
The substitution maps T → Number, U → String, V → Boolean, and the return type resolves to fun(number): boolean — exactly what you’d expect from composing those two functions.
This is the power of combining generics (Chapter 12) with function types (this chapter): the type checker can reason about higher-order functions end-to-end, not treat every callback as Dynamic by default.
What We’re Simplifying
No table types in function params. You can’t write fun(t: {x: number}): number yet — our parser doesn’t handle inline table types inside function annotations. You’d need to define a ---@class and use the class name instead.
No optional parameters. LuaCATS supports fun(x?: number): string for optional parameters. We don’t parse the ? syntax.
No variadic parameters. fun(...: number) is valid LuaCATS but not handled here.
No overloads. fun(number): string | fun(string): number (multiple signatures for the same function) requires more work — you’d need to parse it as a union of function types and check each branch during inference.
Union splitting doesn’t track depth. The parse_type function splits on | with s.split('|'), which doesn’t respect nesting. This works for number|string and even number|fun(x: number): boolean (no | inside the function type). But number|fun(x: number|string): boolean would split incorrectly on the | inside the function type’s parameter. A depth-aware split — the same pattern we use for commas and colons — would fix this. For the annotations in this tutorial, the naive split is sufficient; if you’re building a production parser, add depth-tracking to the union branch.
These are all solvable with the same recursive parsing approach — each adds more branches to parse_type.
Running
cargo run --bin ch13-function-types
This chapter has 10 test cases covering the parser, annotation extraction, and substitution with function types:
cargo test --bin ch13-function-types
Next: Chapter 14: Recursive Types and Self-Reference — A linked list node that references itself. A tree where each branch is another tree. These types show up constantly in real Lua code — and they break naive type checking because typeof(field) points back to the type we’re still defining. We’ll handle self-reference with named type references — instead of expanding Node forever, we stop at Ref("Node") and compare by name.
Chapter 14: Recursive Types — When a Type References Itself
You can write this in Lua:
---@class Node
---@field value number
---@field next Node|nil
local list = { value = 1, next = { value = 2, next = nil } }
Node has a field of type Node. Not a copy of Node — the same Node. A linked list that points to itself. A tree whose children are more trees. A JSON value that might contain other JSON values.
This is a recursive type. And if you try to represent it the way we’ve been representing types — by expanding the definition everywhere it appears — you’ll loop forever. Node contains Node contains Node contains Node…
Our type checker needs to handle this without spinning. The fix is simpler than you’d expect: don’t expand recursive types. Refer to them by name.
Expanding Node forever: Using a name reference:
Table { Table {
value: Number, value: Number,
next: Union([ next: Union([
Table { Ref("Node"), ← stops here
value: Number, Nil
next: Union([ ])
Table { }
value: Number,
next: Union([ Ref("Node") is a lazy pointer —
Table { "the type named Node" without
... ♾️ expanding what Node contains.
}, Compare names, not structures.
Nil One comparison, no loop.
])
}),
Nil
])
}),
Nil
])
}
The Problem: Substitution Loops
When our type checker encounters ---@field next Node|nil, it does two things:
- Look up
Nodein the environment → gets the class’s type - Use that type as the field type
If Node’s type is Table { fields: [("value", Number), ("next", Union([Node, Nil])] }, step 2 tries to resolve Node inside the definition. Which means looking up Node again. Which means expanding the same type. Which means looking up Node again. Infinite loop.
The same problem shows up with generic recursive types. Imagine:
---@class Tree<T>
---@field value T
---@field children Tree<T>[]
Tree<T> contains Tree<T>[] — itself, parameterized. Expanding it produces another Tree<T>, which produces another…
This isn’t a bug in our type checker. It’s a fundamental property of recursive definitions. The solution isn’t to prevent recursion — it’s to represent recursive types without expanding them.
Step 1: Names as References, Not Expansions
Right now, when check_stmt encounters ---@class Node, it registers the class in the environment with its full type:
#![allow(unused)]
fn main() {
// Current approach: store the expanded type
env.extend(name, Type::Table { fields: resolved_fields });
}
The problem: resolved_fields might contain Node, which triggers another lookup, which triggers another resolution…
Instead of expanding the class name into its full type, we can keep it as a name. When the type checker encounters Node as a field type, it stores… Node. Not the expanded definition. The name, and nothing more.
We already have the infrastructure for this: Type::Var stores a name. But Var means “a type variable to be resolved by substitution.” A recursive reference is different — it’s a name that shouldn’t be substituted away.
We need a new variant:
#![allow(unused)]
fn main() {
/// A named type reference — used for recursive types.
/// `Ref("Node")` means "the type named Node" without expanding it.
/// Unlike `Var("T")`, a `Ref` is never substituted — it's a final value.
Ref(String),
}
Ref and Var both store names, but they behave differently:
Var("T") | Ref("Node") | |
|---|---|---|
| Meaning | “Resolve me via substitution” | “I am a named type” |
| Substitution | Replaced with the mapped type | Left as-is |
| Compatibility | Compatible with everything (unknown) | Compatible with itself (same name) |
| When to use | Generic type parameters | Recursive type references |
Step 2: Class Registration with Ref
When check_stmt processes ---@class Node, it registers the class in the environment. Instead of expanding the class’s fields eagerly, we store a Ref:
#![allow(unused)]
fn main() {
Statement::Class { name, fields } => {
// Note: In our implementation, class registration happens via
// Annotation::Class in type_check's first pass, not here.
// This arm is shown for completeness — it demonstrates the
// registration pattern you'd use if classes were statements.
// Register the class name FIRST, before processing fields.
// This way, fields that reference the class itself will find it.
env.extend(name.clone(), Type::Ref(name.clone()));
// Now resolve field types. Any reference to this class
// will find the Ref we registered above — no infinite loop.
let resolved_fields: Vec<(String, Type)> = fields.iter()
.map(|(fname, ftype)| {
(fname.clone(), ftype.clone()) // parse_type already produces Ref for class names
})
.collect();
// Update the environment with the full table type.
// The Ref is still there for compatibility checks.
env.extend(name, Type::Table { fields: resolved_fields });
}
}
Let’s trace through what happens when we process ---@class Node:
---@class Nodearrives- Register
Node → Ref("Node")in the environment — this is the key step - Process
---@field value number→("value", Number) - Process
---@field next Node|nil→ look upNode→ findRef("Node")→ field type isUnion([Ref("Node"), Nil]) - Register
Node → Table { fields: [("value", Number), ("next", Union([Ref("Node"), Nil]))] }
The Table contains Ref("Node"), not another Table. No infinite expansion. The Ref acts as a lazy pointer — it names the type without expanding it.
The registration order matters: we register Ref("Node") before processing the fields (step 2 before steps 3-4). This way, when the next field looks up Node, it finds the Ref immediately. Step 5 then overwrites the Ref with the full Table, but by then all fields have already been resolved. This pattern — “declare the name first, fill in the details later” — is the same one C uses with forward declarations, and it shows up everywhere in compiler design.
Step 3: Compatibility for Ref
How does is_compatible_with handle Ref? Two Ref("Node") values are compatible because they refer to the same type:
#![allow(unused)]
fn main() {
(Type::Ref(a), Type::Ref(b)) => a == b,
}
What about Ref("Node") vs Table { fields: [...] }? If the Table is Node’s definition, they’re the same type — different representations of the same thing. This is where nominal vs structural typing comes back. We have two choices:
Option A: Nominal. Ref("Node") is only compatible with Ref("Node"). If the type checker has resolved Node to a Table, they’re different. Simple, consistent, but means you can’t compare a Ref with its expansion.
Option B: Structural with a lookup. When comparing Ref("Node") with a Table, look up Node in the environment and compare the Table structurally. More flexible, but requires access to the environment inside is_compatible_with.
We’ll go with Option A — nominal comparison. It’s simpler, it matches how LuaCATS ---@class works (classes are nominal, not structural), and it avoids threading the environment through every compatibility check.
#![allow(unused)]
fn main() {
(Type::Ref(a), Type::Ref(b)) => a == b,
// Ref is NOT automatically compatible with Table (in either direction).
// A Ref("Node") might have fields the Table doesn't; a Table
// might lack the class name. Use ---@type Node to bridge.
// (A full implementation would resolve Ref→Table via the
// environment and check structural compatibility.)
(Type::Ref(_), Type::Table { .. }) | (Type::Table { .. }, Type::Ref(_)) => false,
}
What about Ref vs Var, Ref vs Dynamic, or Ref vs Error? Those cases are handled by earlier wildcard arms in is_compatible_with — (Type::Var(_), _) | (_, Type::Var(_)) => true matches before the Ref arms, and (Type::Dynamic, _) | (_, Type::Dynamic) => true and (Type::Error, _) | (_, Type::Error) => true match even earlier. This is correct: Var is a placeholder that could be instantiated to any type (including a Ref), Dynamic is compatible with everything, and Error suppresses cascading errors.
The upshot: we only need the two Ref-specific match arms shown above — (Ref, Ref) for same-name checks and (Ref, Table) | (Table, Ref) to reject nominal-vs-structural mismatches. The catch-all _ => false handles any other Ref-vs-non-Ref pairing. Dynamic and Error are compatible with Ref as expected, but that flows through their own wildcard arms, not the Ref ones.
Step 4: Substitution Leaves Ref Alone
Our Substitution::substitute method replaces Var with mapped types. Ref should be left as-is:
#![allow(unused)]
fn main() {
Type::Var(name) => {
match self.get(name) {
Some(t) => t.substitute(self),
None => ty.clone(), // Unbound variable stays as-is
}
}
Type::Ref(_) => ty.clone(), // Never substitute a Ref
}
This is the key difference between Var and Ref. Var("T") is a placeholder — substitution fills it in. Ref("Node") is a final value — it names a type, and that name is the answer.
Step 5: parse_type_name Produces Ref for Known Classes
When the parser encounters a type name that corresponds to a class, it should produce a Ref instead of Dynamic. But parse_type_name doesn’t have access to the environment — it doesn’t know which names are classes.
In Chapter 12, we treated all uppercase names as type variables. Now that we have Ref for class references, we need a way to tell them apart. The convention: single uppercase letters (T, U) are type variables; multi-character uppercase names (Node, Tree) are class references. If a name starts with an uppercase letter and is more than one character, parse it as Ref. Unknown class names still become Ref — if the class doesn’t exist, the compatibility check fails cleanly.
#![allow(unused)]
fn main() {
fn parse_type_name(s: &str) -> Type {
match s.trim() {
"number" => Type::Number,
"string" => Type::String,
"boolean" => Type::Boolean,
"nil" => Type::Nil,
// Single uppercase letter → type variable
name if name.len() == 1 && name.chars().next().unwrap().is_uppercase() => {
Type::Var(name.to_string())
}
// Multi-char uppercase name → class reference
name if name.chars().next().map_or(false, |c| c.is_uppercase()) => {
Type::Ref(name.to_string())
}
_ => Type::Dynamic,
}
}
}
The rule: uppercase single letter = type variable, uppercase word = class reference.
A word of caution about this heuristic. In LuaCATS, only names declared in
---@genericare type parameters — everything else is a concrete type. Our convention (single uppercase letter = type variable) is a shortcut that works for the patterns in this tutorial (T,Ufor generics,Nodefor classes), but it has a blind spot: if someone writes---@param x String(capitalized), the parser treats it as a class referenceRef("String"), not the built-inType::String. The built-in types only match their lowercase spellings ("number","string","boolean","nil"). A production type checker would cross-reference the name against declared generics before deciding it’s aVar— we use the length heuristic to keep the code simple.
Step 6: Display for Ref
The Display implementation:
#![allow(unused)]
fn main() {
Type::Ref(name) => write!(f, "{name}"),
}
Simple — a Ref displays as its name. Ref("Node") shows as Node. This matches the annotation syntax.
Putting It Together: A Linked List
Let’s trace through a complete example:
---@class Node
---@field value number
---@field next Node|nil
local head = { value = 1, next = { value = 2, next = nil } }
-
---@class Node→ registerNode → Ref("Node")in the environment -
---@field value number→("value", Number) -
---@field next Node|nil→parse_type("Node|nil")→Union([Ref("Node"), Nil]) -
Update environment:
Node → Table { fields: [("value", Number), ("next", Union([Ref("Node"), Nil]))] } -
local head = { value = 1, next = { value = 2, next = nil } }→ type isTable { fields: [("value", Number), ("next", Union([Ref("Node"), Nil]))] }A note on table constructors: our simplified parser doesn’t handle table literals —
{ value = 1, next = ... }would be inferred asNilwithout an annotation. With---@type Node,headis typed asRef("Node")regardless of the literal’s inferred type, which is the point of the annotation. The trace above shows what a full implementation would produce; our code relies on the annotation to skip table-constructor inference entirely. -
Compatibility check:
head’s type is compatible withNode?TablevsRef("Node")→false(nominal)
Step 6 says the table literal isn’t compatible with Node! That’s because we’re comparing structurally-typed table literals against nominally-typed class references. The table literal has the right fields, but the type checker doesn’t recognize it as a Node — it’s an anonymous Table.
This is a real limitation of nominal typing. The fix is explicit annotation: add ---@type Node before the local head line, and the type checker treats the variable as Ref("Node") instead of an anonymous Table:
---@type Node
local head = { value = 1, next = { value = 2, next = nil } }
Now head’s type is Ref("Node"), and compatibility checks against Node work. Real LuaCATS implementations do the same thing — classes are nominal, and constructing instances that the type checker recognizes requires the annotation. This isn’t a hack; it’s the intended design. Structural subtyping (“any table with the right fields is a Node”) is possible but introduces its own complexity — cycle detection in the structural comparator, variance rules for field types, and the question of whether extra fields are allowed.
What We’re Simplifying
No structural subtyping for classes. A table with the right fields isn’t automatically a Node. Real LuaCATS doesn’t do structural subtyping either; classes are nominal.
No structural equivalence checking for Ref types. Ref("Node") and Ref("LinkedList") are different types, even if they have the same fields.
No generic recursive types. ---@class Tree<T> isn’t supported — generic classes require tracking type parameters on the class itself, which is a significant extension. The Ref mechanism works the same way — you’d need Ref("Tree", [Type::Var("T")]) instead of Ref("Tree").
No cycle detection in class definitions. ---@class A\n---@field b B and ---@class B\n---@field a A is a mutual recursion. Our approach handles it correctly (both become Refs), but a full checker would verify the recursion terminates at nil or Dynamic.
Next: Chapter 15: Type Narrowing — Union types tell you what a value could be. Type narrowing tells you what it is right now. After if type(x) == "number" then, x is definitely a number inside that branch.
Chapter 15: Type Narrowing — Making Unions Useful
You defined number|string. You checked compatibility. You even inferred function types with generic parameters. But there’s a problem you’ve been ignoring since Chapter 11.
---@type number|string
local x = 42
local y = x + 1
This should work. x is clearly 42, which is a number. But the type checker sees number|string and gives up — it can’t prove x is safe for arithmetic. The union type you carefully built is now in the way.
What you need is narrowing: the ability to say “if this condition is true, then x is definitely a number.” That’s what this chapter builds.
The Problem: Unions Without Narrowing Are Decorative
Our type checker handles unions through is_compatible_with. A Union(Number | String) is compatible with Number — because one variant matches. But that’s the wrong question for arithmetic. The question isn’t “could this be a number?” — it’s “is this definitely a number?”
The difference matters:
| Check | Question | Number | String in arithmetic |
|---|---|---|
is_compatible_with | Could this be a number? | ✅ Yes — compatible |
| narrowing | Is this definitely a number? | ❌ No — might be a string |
Without narrowing, unions are decorations on your type annotations. They don’t help the checker make decisions — they make it more cautious without making it smarter.
Three Guard Patterns
Lua gives you three ways to narrow a type at runtime. Our checker needs to recognize all three.
1. type(x) == "number" — Type Check
The most explicit narrowing. If type(x) returns "number", then x is definitely a number.
---@type number|string
local x = get_value()
if type(x) == "number" then
local y = x + 1 -- x is Number here
else
local z = x .. "!" -- x is String here
end
In the then-block, x narrows from Number | String to Number. In the else-block, Number is removed from the union, leaving String.
2. x ~= nil — Nil Exclusion
Common in Lua: check that a value isn’t nil before using it.
---@type number|nil
local x = maybe_get_number()
if x ~= nil then
local y = x + 1 -- x is Number here
end
This removes Nil from the union. After the check, x is Number.
3. if x then — Truthy Check
Lua’s truthy check excludes both nil and false.
---@type number|nil
local x = maybe_get_number()
if x then
local y = x + 1 -- x is Number here (Nil removed)
end
Our Boolean type doesn’t distinguish true from false, so truthy narrowing removes both Nil and Boolean from the union. A production checker would have literal types (True, False) for more precise narrowing.
Adding Never — The Bottom Type
Before we write the narrowing logic, we need a new variant in our Type enum. When narrowing eliminates all possibilities — say, a String narrowed to Number — the result isn’t Nil. Nil is a real Lua type meaning “no value.” The result should mean “this code path is impossible.”
That’s Never, the bottom type:
#![allow(unused)]
fn main() {
#[derive(Clone, Debug, PartialEq, Eq, Hash, salsa::Update)]
pub enum Type {
Number, String, Boolean, Nil,
Never, // NEW: the bottom type — impossible code path
Function { params: Vec<Type>, ret: Box<Type> },
Table { fields: Vec<(String, Type)> },
Union(Vec<Type>),
Var(String), // from Chapter 12
Ref(String), // from Chapter 14
Dynamic, Error,
}
}
Never has two special properties:
-
It’s compatible with everything. Dead code can’t produce type errors. If a branch narrows to
Never, any expression inside it type-checks — there’s no runtime value that could violate the type. -
It’s absorbed by unions.
Number | Neversimplifies toNumber. An impossible alternative doesn’t change the set of possible values.
Property #2 means Type::union() needs to filter out Never:
#![allow(unused)]
fn main() {
impl Type {
pub fn union(types: Vec<Type>) -> Type {
let mut flat = Vec::new();
for t in types {
match t {
Type::Union(inner) => flat.extend(inner),
other => flat.push(other),
}
}
// Remove Never — it's absorbed by unions
flat.retain(|t| !matches!(t, Type::Never));
flat.sort_by(|a, b| format!("{a:?}").cmp(&format!("{b:?}")));
flat.dedup();
if flat.is_empty() {
return Type::Never; // union of nothing is impossible
}
if flat.len() == 1 {
return flat.pop().unwrap();
}
if flat.iter().any(|t| matches!(t, Type::Dynamic)) {
return Type::Dynamic;
}
Type::Union(flat)
}
}
}
The flat.retain line removes Never variants before checking for duplicates and singletons. If all variants are Never (an empty union), the result is Never — the whole thing is impossible. The flat.is_empty() guard is now necessary because retain might have emptied the list — previously, the list was never empty after flattening and dedup.
And is_compatible_with gains one more case:
#![allow(unused)]
fn main() {
(Type::Never, _) | (_, Type::Never) => true,
}
Never is compatible with everything — dead code can’t type-error.
How Narrowing Works: The narrow_to and remove Methods
Narrowing a type means: given a current type and a target type, produce the narrowest type that’s consistent with both.
narrow_to — “Narrow to exactly this type”
#![allow(unused)]
fn main() {
fn narrow_to(&self, target: &Type) -> Type {
match (self, target) {
// Can't narrow from Dynamic — we don't know what it is
(Type::Dynamic, _) => Type::Dynamic,
// Can't narrow TO Dynamic — but don't lose what we already know.
// If someone writes type(x) == "unknowntype", the target resolves
// to Dynamic. We shouldn't degrade Number to Dynamic because
// the target type is unrecognized.
(_, Type::Dynamic) => self.clone(),
// Error stays Error
(Type::Error, _) => Type::Error,
// Var stays as-is — we don't know what it resolves to
(Type::Var(_), _) => self.clone(),
// Union: narrow each variant recursively, keep the non-Never results
(Type::Union(variants), _) => {
let narrowed: Vec<Type> = variants.iter()
.filter_map(|v| {
let n = v.narrow_to(target);
if matches!(n, Type::Never) {
None // This variant was incompatible — skip it
} else {
Some(n)
}
})
.collect();
if narrowed.is_empty() {
// No variant matches — this code path is impossible.
// Return Never, not the target. If Union(String, Boolean)
// narrows to Number, the result is Never (neither variant
// is a Number), not Number (which would be a lie — the
// condition can never be true, so the variable can never
// have that type).
Type::Never
} else {
Type::union(narrowed)
}
}
// Same type? Already narrowed.
_ if self == target => self.clone(),
// Compatible but not equal (e.g., Ref("Node") narrowed to
// a Table with the same fields) → keep self
_ if self.is_compatible_with(target) => self.clone(),
// Incompatible: this branch is impossible
_ => Type::Never,
}
}
}
For Union(Number | String).narrow_to(&Number): Number.narrow_to(&Number) returns Number (same type), String.narrow_to(&Number) returns Never (incompatible, filtered out). The result is Number.
For Number.narrow_to(&String): incompatible, so the result is Never — this branch can’t execute. The Never type was introduced above; it means “impossible code path,” which is distinct from Nil (a real Lua value).
The union arm uses recursive narrow_to instead of is_compatible_with. This matters for nested unions: Union(Number | String | Nil).narrow_to(&Number) calls narrow_to on each variant, and String.narrow_to(&Number) returns Never (filtered out), Nil.narrow_to(&Number) returns Never (filtered out), and Number.narrow_to(&Number) returns Number. If we used is_compatible_with instead, String.is_compatible_with(&Number) returns false — which happens to work here. But for Union(Union(Number | String) | Nil).narrow_to(&Number), is_compatible_with would keep the inner Number | String union intact (because Number | String is compatible with Number), while recursive narrow_to correctly narrows it to Number alone.
remove — “Remove this type from the union”
#![allow(unused)]
fn main() {
fn remove(&self, removed: &Type) -> Type {
match self {
Type::Union(variants) => {
let kept: Vec<_> = variants.iter()
.filter(|v| v != &removed)
.cloned()
.collect();
Type::union(kept)
}
// If the type IS the removed type, this is an impossible branch
_ if self == removed => Type::Never,
// Dynamic and atomic types (Number, String, etc.) are unaffected —
// if it's not a union and not the removed type, it stays the same.
_ => self.clone(),
}
}
}
For Union(Number | String).remove(&Number): Number is filtered out, leaving String.
For Union(Number | Nil).remove(&Nil): Nil is filtered out, leaving Number.
This is simpler than trying to compute a “complement type” (everything that isn’t Nil). We don’t need to know what all possible types are — we remove the one we know isn’t present.
Extracting Narrowings from Conditions
We need a function that takes a condition expression and produces two sets of narrowings: one for the then-block and one for the else-block.
#![allow(unused)]
fn main() {
enum Narrowing {
ToType(String, Type), // narrow variable to this type
RemoveType(String, Type), // remove this type from variable's union
RemoveFalsy(String), // remove Nil and Boolean
}
fn extract_narrowings(expr: &Expression) -> (Vec<Narrowing>, Vec<Narrowing>) {
match expr {
// type(x) == "number" → narrow x to Number
// Also handles "number" == type(x) via .or_else()
BinaryOp { left, op: BinOp::Eq, right } => {
if let Some((var_name, type_name)) = match_type_check(left, right)
.or_else(|| match_type_check(right, left))
{
let target = type_name_to_type(&type_name);
return (
vec![Narrowing::ToType(var_name.clone(), target.clone())],
vec![Narrowing::RemoveType(var_name, target)],
);
}
(vec![], vec![])
}
// x ~= nil → remove Nil from x
// Also handles nil ~= x via .or_else()
BinaryOp { left, op: BinOp::Ne, right } => {
if let Some(var_name) = match_nil_check(left, right)
.or_else(|| match_nil_check(right, left))
{
return (
vec![Narrowing::RemoveType(var_name.clone(), Type::Nil)],
vec![Narrowing::ToType(var_name, Type::Nil)],
);
}
(vec![], vec![])
}
// if x then → remove falsy from then-block, narrow to Nil in else-block
Name(name) => {
let then_narrow = Narrowing::RemoveFalsy(name.clone());
// else-block: x is falsy. We narrow to Nil — the primary
// falsy type. This is an approximation: `false` is also
// falsy in Lua, but our type system doesn't distinguish
// True from False within Boolean. For the common case
// (Nil checks), this is correct.
//
// Why narrow to Nil at all, rather than leaving the type unchanged?
// In practice, `if x then` is almost always used with
// `number|nil` or `string|nil` — where Nil is the only falsy
// variant. Narrowing to Nil in the else-block is correct for
// those cases, and it's strictly more informative than leaving
// the type as `number|nil` (which would make the else-block
// unable to prove x is nil).
let else_narrow = Narrowing::ToType(name.clone(), Type::Nil);
(vec![then_narrow], vec![else_narrow])
}
// not x → flip the narrowings
UnaryOp { op: UnaryOp::Not, expr: inner } => {
let (then, else_) = extract_narrowings(inner);
(else_, then) // Negation swaps then and else
},
_ => (vec![], vec![]),
}
}
}
The not case is elegant: negation swaps the then/else narrowings. If type(x) == "number" narrows x to Number in the then-block, then not (type(x) == "number") narrows x to Number in the else-block.
The Helper Functions
extract_narrowings delegates the pattern matching to four small helpers. Here they are in full.
match_type_check recognizes the type(x) == "typename" pattern — a function call to type on one side, a string literal on the other:
#![allow(unused)]
fn main() {
fn match_type_check(left: &Expression, right: &Expression) -> Option<(String, String)> {
// Left must be `type(x)`, right must be a string literal
if let Expression::FunctionCall { func, args } = left {
if let Expression::Name(fname) = func.as_ref() {
if fname == "type" {
if let Some(arg) = args.first() {
if let Expression::Name(var_name) = arg {
if let Expression::StringLiteral(type_name) = right {
// Strip surrounding quotes — the AST includes them
let unquoted = type_name.trim_matches('"');
return Some((var_name.clone(), unquoted.to_string()));
}
}
}
}
}
}
None
}
}
match_nil_check recognizes x ~= nil — a variable name on one side, the Nil expression on the other:
#![allow(unused)]
fn main() {
fn match_nil_check(var_side: &Expression, nil_side: &Expression) -> Option<String> {
if let Expression::Name(name) = var_side {
if let Expression::Nil = nil_side {
return Some(name.clone());
}
}
None
}
}
type_name_to_type converts a Lua type name (from type()) to our Type enum. This is the same logic as the parse_type_name helper from Chapter 10, with additional cases for nil, table, and function:
#![allow(unused)]
fn main() {
fn type_name_to_type(name: &str) -> Type {
match name {
"number" => Type::Number,
"string" => Type::String,
"boolean" => Type::Boolean,
"nil" => Type::Nil,
"table" => Type::Table { fields: vec![] },
"function" => Type::Function { params: vec![], ret: Box::new(Type::Dynamic) },
_ => Type::Dynamic,
}
}
}
A real type checker would emit “unknown type name” instead of falling back to Dynamic, but for our purposes this is honest: an unrecognized type is effectively unknown.
apply_narrowings takes a type environment and a list of narrowings, and produces a new environment with each narrowed variable’s type updated:
#![allow(unused)]
fn main() {
fn apply_narrowings(env: &TypeEnv, narrowings: &[Narrowing]) -> TypeEnv {
let mut new_env = env.clone();
for narrowing in narrowings {
match narrowing {
Narrowing::ToType(var_name, target) => {
let current = new_env.lookup(var_name);
let narrowed = current.narrow_to(target);
new_env = new_env.extend(var_name.clone(), narrowed);
}
Narrowing::RemoveType(var_name, removed) => {
let current = new_env.lookup(var_name);
let narrowed = current.remove(removed);
new_env = new_env.extend(var_name.clone(), narrowed);
}
Narrowing::RemoveFalsy(var_name) => {
let current = new_env.lookup(var_name);
// Remove Nil, then remove Boolean. For unions this
// strips both. For non-union types, each remove is
// a no-op if the type isn't Nil or Boolean.
let narrowed = current.remove(&Type::Nil).remove(&Type::Boolean);
new_env = new_env.extend(var_name.clone(), narrowed);
}
}
}
new_env
}
}
apply_narrowings creates a new TypeEnv for each branch rather than mutating the original. This is essential — the original environment is returned after the if, preserving the scope rule. If both branches narrowed x, those narrowings don’t leak. Chapter 16 covers what happens when you want them to.
A Subtle Bug: String Literals in the AST
There’s a trap waiting in the type(x) == "number" pattern. The analisar parser stores string literals with their quotes — so the AST token for "number" is "\"number\"", not "number". If match_type_check passes the raw token to type_name_to_type, the lookup fails (because "\"number\"" doesn’t match "number"), and the narrowing silently resolves to Dynamic.
The match_type_check helper above handles this with type_name.trim_matches('"') — strip the quotes before looking up the type name. This is the kind of parser/AST mismatch that’s invisible until you actually run the code. The narrowing logic was correct — the data was wrong. If your narrowing tests pass but the narrowed type shows up as Dynamic instead of Number, check what the parser actually gives you for the string literal.
The Scope Rule: Narrowings Don’t Leak
After the if block ends, narrowings go out of scope. We return the original environment, not the narrowed one.
---@type number|string
local x = get_value()
if type(x) == "number" then
local y = x + 1 -- x is Number here
end
local z = x + 1 -- x is Number | String again
This is correct behavior: after the if, we don’t know which branch ran. The variable x could still be either type.
What about assignments inside the if? Our checker loses them too — any assignment to an existing variable inside an if block is discarded when the block ends. A production checker would do “environment merging”: if both branches assign to x, the post-if type is the union of both branches’ types for x. That requires tracking assignments within branches, which our current architecture doesn’t support.
The Key Change in check_stmt
The Statement::If arm now looks like this:
#![allow(unused)]
fn main() {
Statement::If { test, then_block, else_block } => {
// Evaluate the condition
infer_type(db, source, test.clone(), env.clone(), diagnostics);
// Extract narrowings from the condition
let (then_narrowings, else_narrowings) = extract_narrowings(test);
// Keep the original env for after the if-else
let original_env = env.clone();
// Then-block: apply then-narrowings
let then_env = apply_narrowings(&original_env, &then_narrowings);
let mut env = then_env; // shadow env with the narrowed version for this block
for s in then_block {
env = check_stmt(db, source, s, &env, annotations, diagnostics);
}
// Else-block: apply else-narrowings
if let Some(else_block) = else_block {
let else_env = apply_narrowings(&original_env, &else_narrowings);
let mut env = else_env; // shadow env with the narrowed version for this block
for s in else_block {
env = check_stmt(db, source, s, &env, annotations, diagnostics);
}
}
// After the if-else, narrowings go out of scope
original_env
}
}
apply_narrowings produces a new TypeEnv (not mutating the existing one) where each narrowed variable has its updated type. The original environment is returned after the if, preserving the scope rule.
What We’re Simplifying
No type narrowing for assignments. If x = 42 inside an if block, we don’t propagate that information.
No environment merging. After an if-else, we don’t merge the environments from both branches. This is the next chapter — Chapter 16 builds TypeEnv::merge to handle exactly this case.
No Boolean narrowing. Our Boolean type doesn’t distinguish true from false, so truthy narrowing removes the entire Boolean type. A production checker would have literal types.
No narrowing across function calls. If f() returns number|string, we don’t narrow based on the return value.
No pattern-based narrowing. Lua’s match-like patterns (e.g., if type(x) == "table" and x.value then) aren’t supported — only single-variable guards.
Next: Chapter 16: Environment Merging After Conditionals — After an if/else, each branch might have narrowed the same variable to different types. How do you merge them? The answer isn’t “pick one” — it’s to compute the union of what each branch learned.
Chapter 16: Environment Merging — What the Type Checker Knows After an if
In Chapter 15, we built type narrowing: inside if type(x) == "number" then, x is narrowed to Number. When the if ends, we threw away the narrowing and returned the original environment.
That’s correct for narrowing — after the if, we don’t know which branch ran. But it’s wrong for assignments. If both branches assign to the same variable, the post-if type should be the union of both branches’ types for that variable.
---@type number|string
local x = 42
if type(x) == "number" then
x = "hello" -- x is now String in this branch
else
x = 99 -- x is now Number in this branch
end
-- x could be String OR Number → type is Number | String
Without merging, we’d keep the pre-if type (Number | String) or lose the assignment information entirely. Neither is right. The post-if type depends on what actually happened in each branch.
Three Cases for Merging
After an if-else, every variable falls into one of three cases:
-
Assigned in both branches → union of both types. If
xbecomesStringin the then-block andNumberin the else-block, the merged type isNumber | String. -
Assigned in only one branch → union with the pre-if type. If
xbecomesStringin the then-block but isn’t touched in the else-block, the merged type isString | Number | String(which normalizes toNumber | String). The pre-if type is still possible because the other branch might have run. -
Not assigned in either branch → pre-if type unchanged. If nobody touched
x, its type doesn’t change. Narrowing is scoped to the branch — it reverts after theif.
Case 2 is the one that surprises people. Why not use the then-block’s type? Because the else-block might have run. After an if without an else, any assignment inside the if is conditional — it might not have happened.
How Merging Works
The TypeEnv::merge function takes three environments:
pre_env— the environment before the if-elsethen_env— the environment after the then-blockelse_env— the environment after the else-block (orNoneif there’s no else)
For each variable, it compares the then-type and else-type to the pre-if type. If a type changed, that branch assigned the variable. The merged type is the union of all possible types for that variable across execution paths.
#![allow(unused)]
fn main() {
fn merge(pre_env: &TypeEnv, then_env: &TypeEnv, else_env: Option<&TypeEnv>) -> TypeEnv {
// Collect all variable names from all three environments.
// We need every name that appears in any branch — the merged
// environment must include variables that only one branch assigned.
// Note: Vec::contains is O(n) per call, making this O(n²) total.
// For the small environments in this tutorial (5-10 variables), this
// is negligible. A production checker would use a HashMap for O(1)
// lookups and a HashSet/BTreeSet for name deduplication.
let mut all_names: Vec<String> = Vec::new();
for (name, _) in &pre_env.bindings {
if !all_names.contains(name) { all_names.push(name.clone()); }
}
for (name, _) in &then_env.bindings {
if !all_names.contains(name) { all_names.push(name.clone()); }
}
if let Some(ee) = else_env {
for (name, _) in &ee.bindings {
if !all_names.contains(name) { all_names.push(name.clone()); }
}
}
// For each variable, compute the merged type.
// Key design choice: if a variable exists in a branch's environment
// but NOT in another, the "missing" branch contributes Nil (the
// variable might not exist if that branch ran). This is more correct
// than degrading to Dynamic, which loses type information.
let mut merged = TypeEnv::new();
for name in all_names {
let in_pre = pre_env.contains(&name);
let in_then = then_env.contains(&name);
let in_else = else_env.map(|e| e.contains(&name)).unwrap_or(false);
// For each environment, use its type if the variable exists there,
// or Nil if it doesn't (the variable is "absent" in that branch).
let pre_type = if in_pre { pre_env.lookup(&name) } else { Type::Nil };
let then_type = if in_then { then_env.lookup(&name) } else { Type::Nil };
let else_type = else_env.map(|e| {
if in_else { e.lookup(&name) } else { Type::Nil }
});
let then_changed = then_type != pre_type;
let merged_type = match (then_changed, else_type) {
// Assigned in then, no else block → union of then-type and pre-type
(true, None) => Type::union(vec![then_type, pre_type]),
// Assigned in then, else block exists
(true, Some(et)) => {
match et != pre_type {
// Both branches changed → union of both
true => Type::union(vec![then_type, et]),
// Only then changed → union of then-type and pre-type
false => Type::union(vec![then_type, pre_type]),
}
}
// Not assigned in then
(false, None) => pre_type,
(false, Some(et)) => {
match et != pre_type {
// Only else changed → union of else-type and pre-type
true => Type::union(vec![et, pre_type]),
// Neither changed → pre-if type
false => pre_type,
}
}
};
merged = merged.extend(name, merged_type);
}
merged
}
}
The key insight: we compare each branch’s final type to the pre-if type. If they differ, the variable was assigned in that branch.
The contains method
TypeEnv::lookup returns Dynamic for names that don’t exist in the environment — that’s the right behavior for type inference (“I don’t know what this is”). But for merging, Dynamic is wrong. If a variable exists in the then-block but not the else-block, using Dynamic for the else-block’s type would degrade the merged type to Dynamic (since Dynamic absorbs in unions). The right answer is Nil — the variable might not exist if that branch ran.
So merge uses a contains method to check whether a variable actually exists before looking it up:
#![allow(unused)]
fn main() {
impl TypeEnv {
/// Check if a variable exists in this environment.
/// Unlike `lookup` (which returns Dynamic for unknown names),
/// this returns false for variables that don't exist.
fn contains(&self, name: &str) -> bool {
self.bindings.iter().rev().any(|(n, _)| n == name)
}
}
}
When a variable is absent from a branch, merge uses Type::Nil instead of calling lookup (which would return Dynamic). This preserves type information instead of degrading it.
But wait — what about narrowing? If x starts as Number | String, narrows to Number in the then-block, and isn’t assigned, the then-block’s final type for x is Number. That’s different from the pre-if type (Number | String). Won’t the merge treat narrowing as an assignment?
Yes — and it works out correctly. The merge sees Number (then-block) vs. Number | String (pre-if), and produces Type::union(Number, Number | String) — which normalizes to Number | String. The narrowing “reverts” not because we explicitly revert it, but because the merge algorithm unions the narrowed type with the pre-if type, and union(Number, Number | String) = Number | String.
This isn’t a happy accident — it’s a natural consequence of the design. The merge’s job is to produce a type that accounts for all possible execution paths. After the if, x could be Number (if the then-block ran) or Number | String (if it didn’t). The union of those possibilities is Number | String — exactly the pre-if type. Narrowing reverts because the union operation absorbs the narrowed variant back into the original.
Narrowing + Assignment Interaction
The tricky case is when narrowing and assignment interact in the same branch:
---@type number|string
local x = 42
if type(x) == "number" then
local y = x + 1 -- OK: x narrowed to Number
x = "hello" -- x is now String in this branch
else
local z = x .. "!" -- OK: x narrowed to String
x = 99 -- x is now Number in this branch
end
-- After the if: x is Number | String
Inside the then-block, x is first narrowed to Number (so x + 1 works), then assigned String. The then-block’s final type for x is String. Inside the else-block, x is narrowed to String (so x .. "!" works), then assigned Number. The else-block’s final type for x is Number. The merge produces Number | String.
The order matters: narrowing happens first (applied to the branch environment before checking the body), then assignments override the narrowed type. The final type in each branch is whatever was last — narrowing or assignment.
No Else Block
When there’s no else, the merge uses None for the else environment:
---@type number|nil
local x = 42
if x ~= nil then
local y = x + 1 -- OK: x narrowed to Number (Nil removed)
x = "hello" -- x is now String in this branch
end
-- x could be String (from then) or Number | Nil (pre-if, because else-path might not have run)
-- Merged type: String | Number | Nil
The None else means the else-path keeps the pre-if type. If x was assigned String in the then-block, the merged type is Type::union(String, Number | Nil) — which flattens the nested union and deduplicates, producing String | Number | Nil. The String from the then-block plus all possibilities from the pre-if type.
The Full Picture: If-Statement Type Checking
Here’s how the pieces fit together in check_stmt:
- Evaluate the condition — infer its type and collect any diagnostics.
- Extract narrowings from the condition expression (Chapter 15’s
extract_narrowings). - Save the pre-if environment — we’ll need it for merging.
- Apply narrowings for the then-block — get a narrowed environment, then check each statement in the then-block body.
- Apply narrowings for the else-block — same process, with the complement narrowings.
- Merge — call
TypeEnv::merge(pre_env, final_then_env, final_else_env)to produce the post-if environment.
This replaces Chapter 15’s simpler approach of always returning the pre-if environment after an if. The pre-if environment is still used as the baseline for the merge — it’s the “neither branch ran” default.
What We’re Simplifying
No control-flow graph. A real type checker builds a CFG with basic blocks and computes join points; we’re doing it inline, which works for if-else but doesn’t handle loops, early returns, or break statements. In a loop body, the post-loop type should be the intersection of the pre-loop and post-loop-body types (the loop might not have run), but we return the post-loop-body type, which is overly optimistic.
No definite assignment analysis. If a variable is assigned in both branches of an if-else, a real checker can mark it as definitely assigned; we only track the type.
Narrowing reverts via union, not via scope. In a real checker, narrowing is scoped so exiting the if block restores the pre-if environment. In our implementation, narrowing “reverts” because the merge unions the narrowed type with the pre-if type, producing the original type. This works for simple cases but doesn’t generalize to nested control flow where you’d need explicit scoping.
No elseif demonstration. Lua’s elseif is syntactic sugar for else if, and our parser doesn’t produce elseif nodes (the else-block would contain another if statement). The merge handles this naturally via recursive checking, but we don’t show it.
New variables from branches. If a branch introduces a variable that doesn’t exist in the pre-if environment, the merged environment includes it with a union type including Nil — for example, if y is assigned String in the then-block but doesn’t exist in the else-block, the merged type is String | Nil (the variable might not exist if the else-block ran). We handle this by treating “variable absent from an environment” as Nil rather than Dynamic (which would degrade the type via Dynamic absorption).
Next: Chapter 17: Putting It All Together — Every concept from the tutorial in one place: annotations, unions, generics, narrowing, environment merging, and cross-file resolution. A working type checker that handles real Lua code.
Chapter 17: Capstone — Running Every Feature at Once and Watching What Composes
Sixteen chapters ago, we counted the lines in a Lua file and checked whether it contained the word "print". That was it — one string lookup, one Salsa query. Now we have a type checker. It handles unions, generics, classes, cross-file dependencies, narrowing, and environment merging. It catches real type errors. It re-checks incrementally when you edit a file.
This chapter doesn’t introduce new features. Instead, it runs all of them together and watches what happens when they interact. A type checker is more than its features — it’s the way they compose. Narrowing without merging is useless (the narrowed type is thrown away). Merging without narrowing is incomplete (you need narrowing to make unions tractable). Cross-file without annotations is blind (you get Dynamic everywhere). The features only work as a system.
Let’s see that system in action.
The Program
The capstone code in chapters/ch17-capstone/ is a single main.rs that combines everything from Chapters 1–16. There’s no new logic — every function and type is one you’ve seen before. What’s new is that they’re all present at once, and they call each other.
Here’s the dependency graph of the major components:
SourceFile (input)
├─→ parse (tracked) → LuaAst
└─→ extract_annotations (tracked) → Vec<Annotation>
└─→ type_check (tracked) → Vec<Diagnostic>
│ depends on both parse and extract_annotations
├─→ check_stmt → TypeEnv
│ ├─→ infer_type → Type
│ ├─→ extract_narrowings → (then, else)
│ └─ TypeEnv::merge
└─→ module_exports (tracked) → Type
└─→ resolve_require → SourceFile
Every arrow is a Salsa-tracked query or a function called from one. The data flows down; the invalidation flows up. When you change a SourceFile’s text, Salsa re-runs parse, then extract_annotations, then everything that depends on them — but only if the outputs actually changed.
Demo 1: Full Type Checking
Here’s a Lua file that uses classes, union types, and narrowing in the same program:
---@class Point
---@field x number
---@field y number
---@param a Point
---@param b Point
---@return Point
function add_points(a, b)
return a
end
---@type number|string
local x = 42
if type(x) == "number" then
local y = x + 1 -- OK: x is narrowed to Number
else
local z = x .. "!" -- OK: x is narrowed to String
end
---@type number|nil
local maybe_num = 42
if maybe_num ~= nil then
local safe = maybe_num + 1 -- OK: nil excluded
end
Zero diagnostics. Every feature is doing its job:
---@class Pointregisters a table type withx: Numberandy: Number(Ch11)---@param a Pointannotates the function’s parameter types (Ch10)---@type number|stringdeclares a union type (Ch11)type(x) == "number"narrowsxtoNumberin the then-block (Ch15)- After the
if, the narrowing reverts —xgoes back toNumber | String(Ch16) maybe_num ~= nilexcludesNilfrom the union (Ch15)
None of these features work in isolation. The narrowing is only useful because the union type exists. The union type is only useful because is_compatible_with handles it. The class annotation is only useful because check_stmt looks up parameter types. The features are layers, and each layer depends on the ones below.
Demo 2: Cross-File Type Checking
Two files: a utility module and a main program that requires it.
utils.lua:
---@param x number
---@return number
function double(x)
return x
end
return double
main.lua:
local double = require("utils")
local result = double(42)
When we type-check main.lua, the checker calls require("utils"), which triggers module_exports on utils.lua. That query parses utils.lua, infers that double has type Function(Number) -> Number, and returns it as the module’s export type. Back in main.lua, double(42) is a function call with a Number argument against a Function(Number) -> Number — it type-checks cleanly.
The cross-file dependency is tracked by Salsa. If you edit utils.lua and change double to return a string, module_exports re-runs, type_check on main.lua re-runs, and you get a diagnostic: “cannot use String in arithmetic.” Salsa figured out that main.lua depends on utils.lua because resolve_require called module_exports — the dependency is implicit in the query graph, not something you have to declare.
Demo 3: Language Server with Timing
The language server from Chapter 7, now with std::time::Instant measurements. The demo generates a ~1200-line Lua file (200 typed functions plus a narrowing section) to make the timing visible:
#![allow(unused)]
fn main() {
use salsa::Setter;
use std::path::PathBuf;
use std::time::{Duration, Instant};
struct LanguageServer {
db: Database,
files: Vec<(PathBuf, SourceFile)>, // path → SourceFile for this server's db
}
impl LanguageServer {
fn open_file(&mut self, path: &str, text: &str) -> (Vec<Diagnostic>, Duration) {
let start = Instant::now();
let source = SourceFile::new(&self.db, PathBuf::from(path), text.to_string());
self.files.push((PathBuf::from(path), source));
let diagnostics = type_check(&self.db, source);
(diagnostics, start.elapsed())
}
fn edit_file(&mut self, path: &str, new_text: &str) -> (Vec<Diagnostic>, Duration) {
if let Some((_, source)) = self.files.iter().find(|(p, _)| p == path) {
let start = Instant::now();
source.set_text(&mut self.db).to(new_text.to_string());
let diagnostics = type_check(&self.db, *source);
(diagnostics, start.elapsed())
} else {
self.open_file(path, new_text)
}
}
}
}
Three runs on the same 1200-line file:
- First open (cold cache) — full parse + type check: ~22ms
- Edit: change a value (
local result = 42→local result = 99) — re-parse + re-check: ~22ms. The source text changed, soparsere-runs. The AST changed, sotype_checkre-runs too. - Edit: add a comment (
---@class Item→--- A data item\n---@class Item) — re-parse only: ~1.5ms. The source text changed, soparsere-runs. But the AST is identical — comments aren’t part of the AST. Salsa’s value-based comparison detects this and skipstype_checkentirely.
That’s the 14x speedup (timings vary by machine; the key point is the ratio, not the absolute numbers). The comment-only edit triggers a re-parse but doesn’t propagate to type checking because Salsa compares the output, not only the revision. In a real codebase with cross-file dependencies, the difference is even more dramatic: editing file A doesn’t invalidate file B’s cached type_check.
Demo 4: Generics + Narrowing + Merging
This is where the features really start interacting. Here’s a Lua program that uses generics, narrowing, and environment merging in sequence:
---@param x T
---@return T
function identity(x)
return x
end
---@type number|string
local val = 42
if type(val) == "number" then
local narrowed = identity(val)
else
local also = val .. "!"
end
Trace what happens:
identityis declared with---@param x Tand---@return T. TheTbecomesType::Var("T")(Ch12).valisNumber | String(Ch11).type(val) == "number"narrowsvaltoNumberin the then-block (Ch15).identity(val)—valisNumber, soT = Number, return type isNumber. The generic substitution works on the narrowed type (Ch12).- After the
if, the environment merges:valgoes back toNumber | String(Ch16).
The generic substitution works on the narrowed type without any special handling. This isn’t an accident — it’s a consequence of the design. Narrowing produces a new TypeEnv. infer_type reads from that environment. Substitution applies to whatever type it finds. The features compose because they all operate on the same data structures, and none of them make assumptions about where those data structures came from.
Demo 5: Recursive Types + Narrowing
Recursive types (Ch14) and narrowing (Ch15) interact in a common Lua pattern — a linked list node that might be nil:
---@class Node
---@field value number
---@field next Node|nil
---@type Node|nil
local head = nil
if head ~= nil then
local v = head
end
Node|nil is a union of Ref("Node") and Nil. The head ~= nil check removes Nil from the union, narrowing head to Ref("Node") inside the then-block. The Ref type isn’t expanded — it stays as a named reference. This is correct: Ref is a final value, not a placeholder (Ch14). The type checker doesn’t need to know what’s inside the Node to know that head isn’t nil.
Demo 6: Environment Merging
The most recent feature, and the one that makes narrowing stick. Without merging, any assignment inside an if was lost after the block ended. With merging, the post-if type is the union of all possible types across branches:
---@type number|string
local x = 42
if type(x) == "number" then
x = "hello"
else
x = 99
end
After the if:
pre_envhasx: Number | Stringthen_envhasx: Stringelse_envhasx: Number- Merge:
Type::union(String, Number)→Number | String
The assignment changed x in each branch, but the merged type is the same as the original — because the union of the branch types equals the pre-if type. Narrowing reverts via the same mechanism: Type::union(NarrowedType, OriginalType) absorbs the narrowed variant back into the original.
What We Built
Here’s every feature, the chapter that introduced it, and what it contributes to the whole:
| Feature | Chapter | Why It Matters |
|---|---|---|
| Salsa inputs | 1 | Source of truth — everything derives from SourceFile |
| Tracked functions | 1–2 | Automatic caching and invalidation |
| Interned symbols | 3 | Cheap identity comparison for names |
| Tracked structs | 4 | Entity identity — FuncDef is a database-resident object |
| Type inference | 5 | The core query — “what type does this expression have?” |
| Diagnostics | 6 | Error reporting without polluting return types |
| Language server | 7 | The payoff — a usable tool for programmers |
| Cycle detection | 8 | Safety net — recursive queries can’t loop forever |
| Cross-file | 9 | require() resolves types across modules |
| Annotations | 10 | The programmer teaches the checker what they know |
| Union types | 11 | `number |
| Generics | 12 | identity(x: T): T — type-preserving abstractions |
| Function types | 13 | fun(number): string — higher-order function types |
| Recursive types | 14 | `Node |
| Type narrowing | 15 | type(x) == "number" — making unions tractable |
| Environment merging | 16 | Assignments inside branches survive the if |
Seventeen chapters. Sixteen features. One system.
What’s Still Missing
This tutorial builds a teaching type checker, not a production one. A real Lua type checker (like LuaLS or Teal) has plenty that we don’t.
Type narrowing in loops — we return the post-loop-body type, which is overly optimistic. The correct post-loop type is the intersection of the pre-loop and post-loop-body types, because the loop might not have run at all.
Full method desugaring — method calls (a:f()) are silently dropped. They’d need to desugar to a.f(a, ...) with the subject as an implicit first argument.
Table expression types — we have Type::Table for class instances but no way to construct a table literal. A real checker would infer { x = 1, y = 2 } as Table { x: Number, y: Number }.
Multiple return values — Lua functions can return multiple values and we only track the first.
Variadic arguments — Lua functions can take ... arguments and we don’t model variadic generics.
Live registry updates — our OnceLock registry is write-once. A real language server would watch the filesystem and update the registry on file changes, through the database, not a side channel.
Bidirectional type checking — we only check top-down (from annotations to expressions). A production checker also works bottom-up, inferring types from usage when annotations are absent.
Type aliasing — ---@alias Vec2 {x: number, y: number} style named shorthands for complex types aren’t supported.
Each of these is a real feature, not a hypothetical. If you’re building a type checker for production use, you’ll need most of them. The good news: the architecture you’ve learned — Salsa inputs, tracked queries, derived types, accumulators, narrowing, merging — scales to all of them. The hard part isn’t the Salsa plumbing. It’s the type-system semantics.
REVIEW.md — Building a Gradual Type Checker for Lua
Reviewed by Esme, 2026-04-21
Overall
This is a well-structured tutorial that builds incrementally from “hello Salsa” to a working language server simulation. The progression is natural — each chapter introduces one new Salsa concept and shows why it matters. The writing is clear and conversational. The main issues are around code correctness, significant code duplication across chapters, and a few conceptual gaps.
Chapter 1: Hello Salsa
[clarity] The spreadsheet analogy (inputs = cells, tracked functions = formulas, revisions = “after you edit a cell”) is excellent. This is exactly the kind of concrete-before-abstract approach that works.
[error] The contains_text tracked function takes needle: String as a key parameter. This means every unique needle string creates a separate cache entry. For a tutorial, this is fine to demonstrate key parameters, but it’s worth noting that using String as a query key means allocations on every call. A reader might assume this is idiomatic — it’s not, for hot paths.
[suggestion] The README links point to chapters/ch01-hello-salsa.md but the actual files are chapters/ch01-hello-salsa/README.md. The links in README.md won’t resolve correctly.
[style] The code comments are thorough and well-organized with section headers (Step 1, Step 2, etc.). This is great for learning — keep this pattern.
Chapter 2: Parsing Lua
[error] The salsa::Update derive macro is used on LuaAst, Statement, Expression, BinOp, UnaryOp. The salsa::Update trait is not a standard derive in Salsa 0.26 — it may be salsa::Update or may not exist as a derive. This needs verification against the actual Salsa 0.26 API. If it doesn’t exist, the code won’t compile.
[clarity] The README says “Why Convert the AST?” but doesn’t fully explain the 'db lifetime issue. The comment in code mentions it, but the README should note that tracked function return values must be 'static (or at least not borrow from the input), which is why analisar’s Cow<'a, str> AST can’t be returned directly.
[error] The Suffixed expression handling has a logic problem. When s.computed is true (index access a[b]), the match arm ast::Expression::Suffixed(s) if !s.computed && !s.method won’t match, and it falls through to the wildcard _ => Expression::Nil. This silently drops index expressions. Should at least have a comment noting this is intentionally simplified.
[clarity] The top_level_names function’s comment about Salsa skipping re-runs when parse returns the same LuaAst is slightly misleading. Salsa re-runs top_level_names if parse is re-invoked in a new revision — it doesn’t compare the LuaAst value. The comment says “same value” which implies value equality, but Salsa tracks at the revision level, then checks if the inputs to the derived query changed. This distinction matters for understanding Salsa’s actual invalidation strategy.
Chapter 3: Interned Symbols
[error] The Symbol<'db> interned struct is defined but never actually used in the symbol_table or lookup_name queries in a way that demonstrates the benefit. The lookup_name function interns both the lookup name and each definition name, then compares by ID — but it’s doing Symbol::new(db, def.name.clone()) inside a loop, creating new interned values on every comparison. The real benefit of interning is that you store Symbol in your data structures (not String) and then comparison is free. The current code actually does MORE work than a plain string comparison (intern + compare vs. just compare). This undermines the lesson.
[suggestion] The chapter should show Definition { name: Symbol<'db>, ... } instead of Definition { name: String, ... } to demonstrate the actual benefit: store symbols once, compare by ID forever after.
[clarity] The README’s “Interned vs Input” table is great — clear and concise. More of these comparison tables throughout.
[error] The Symbol<'db> lifetime parameter means it can’t be stored in Definition (which is plain data with salsa::Update). This is a real design tension that the tutorial doesn’t address. The reader will try to put Symbol<'db> in their data structures and hit lifetime issues. This deserves explicit discussion.
Chapter 4: Tracked Structs
[error] The FuncDef::new(db, func_name, param_count, body) constructor signature doesn’t match how #[salsa::tracked] structs work in Salsa 0.26. Tracked structs typically use a builder pattern or have specific field registration. The exact API needs verification.
[clarity] The “wrapper/data pattern” in the README is a key concept that deserves more explanation. The README mentions it but the code doesn’t really demonstrate it — FuncDef has body_text: String as a tracked field, but there’s no separate “data” struct inside it. The pattern is: tracked struct holds an ID, a separate data struct holds the content. The current code puts the content directly in the tracked struct, which is the simpler approach. Either demonstrate the actual wrapper/data pattern or remove the reference.
[error] The parsing in parse_functions is extremely fragile — it uses simple string splitting to parse function name(params) body end. This is fine for a demo, but the line if let Some(close_paren) = after_paren.find(')') will break on nested parentheses in the body. A comment acknowledging this simplification would help.
[suggestion] This chapter drops the analisar parser entirely in favor of manual string parsing. The transition is jarring — chapters 2-3 built up the analisar-based AST, and now we’re parsing by hand. A brief explanation of why (tracked structs want a different granularity than the file-level parse) would help the reader understand the architectural shift.
Chapter 5: Type Inference
[error] The infer_type function passes Expression (a large clone-heavy enum) and TypeEnv as query keys. Every recursive call clones the entire environment and expression tree. In a real Salsa program, this would be catastrophically slow for non-trivial programs. The code acknowledges this in the “IMPORTANT DESIGN NOTE” comment, but the comment says “the granularity is per-file” — this isn’t quite right. The granularity is actually per-(source, expr, env) triple, which means different expressions get separate cache entries, but the env cloning makes this extremely expensive.
[clarity] The “IMPORTANT DESIGN NOTE” is good but buried in a comment. This should be in the README and the main text — it’s a critical architectural limitation that the reader needs to understand.
[error] Expression::String was renamed to Expression::StringLiteral in this chapter (vs String in ch02-04). This is actually a good rename (avoids confusion with Rust’s String type) but it’s inconsistent with earlier chapters. Should be backported or noted.
[clarity] The Dynamic type explanation is excellent. “Dynamic is NOT the same as Error. Dynamic means ‘the programmer chose not to annotate this’ — it’s intentional. Error means ‘we found a contradiction’ — it’s a bug.” This is the kind of precise distinction that makes a tutorial valuable.
[suggestion] The is_compatible_with method treats Error as compatible with everything. This means once you have one error, it can suppress other real errors (error cascading is hidden). A brief note about this design choice would be helpful.
Chapter 6: Diagnostics
[error] The Diagnostic::emit method calls self.accumulate(db). In Salsa 0.26, the accumulator API may be different — accumulate might not be a method on the accumulated value itself. This needs verification against the actual Salsa 0.26 accumulator API.
[clarity] The “Why Not Result<Type, Diagnostic>?” section in the code comments is excellent — it clearly explains the three-option tradeoff. This belongs in the README too.
[style] The code formatting in ch06 and ch07 is significantly compressed compared to earlier chapters (e.g., fn new() -> Self { TypeEnv { bindings: Vec::new() } } on one line). This makes the code harder to read, especially in a tutorial context. The earlier chapters’ formatting was better.
Chapter 7: Language Server
[error] The LanguageServer struct holds db: Database (owned). But SourceFile::set_text requires &mut db, and after mutation, any SourceFile references obtained before the mutation are still valid (Salsa guarantees this). However, the diagnostics method takes &self, which means it calls type_check::accumulated with an immutable reference. This should work, but the pattern of holding owned Database in a struct while also holding SourceFile IDs is worth a note — it’s a common pattern in Salsa but not immediately obvious.
[suggestion] The language server simulation doesn’t demonstrate the key benefit: timing. Add std::time::Instant measurements to show that re-checking after a small edit is faster than the initial check. Without timing, the “Salsa skips work” claim is invisible to the reader.
[clarity] The architecture diagram in the code comments is great. Should be in the README.
Cross-Cutting Issues
[error] MASSIVE CODE DUPLICATION. The AST types (Statement, Expression, BinOp, UnaryOp), the convert_stmt/convert_expr functions, TypeEnv, and infer_type are copy-pasted across chapters 2-7 with minor variations. This is ~200 lines duplicated per chapter. While this makes each chapter self-contained (good), it means any bug found in one chapter exists in all of them. Consider: (a) extracting shared code into a library crate that chapters depend on, or (b) explicitly noting that this is intentional for self-containedness and that production code would share the definitions.
[error] The salsa::Update derive macro is used extensively but never explained. The reader has no idea what it does or why it’s needed. A one-paragraph explanation (it enables Salsa to compare old and new values for fine-grained invalidation) would help.
[clarity] The README chapter links use .md extensions pointing to files that don’t exist. The actual content is in chapters/chXX-name/README.md.
[suggestion] No chapter discusses how Salsa handles cycles. The README mentions “Cycle detection — recursive queries that would loop forever are caught” but no chapter demonstrates this. A brief example (even just in an appendix) would round out the coverage.
[style] The code quality degrades noticeably from ch05 onward — compressed formatting, longer lines, less whitespace. Earlier chapters are much more readable.
What’s Working
- The chapter progression is excellent — each one builds naturally on the last
- The Salsa concepts are introduced at the right pace: inputs → tracked functions → interned → tracked structs → accumulators
- The “key concepts” recaps at the end of each chapter are valuable
- The spreadsheet analogy in ch1 is memorable and accurate
- The gradual typing explanation in ch5 is precise and well-articulated
- The accumulator pattern (sentinel return + accumulated diagnostic) is clearly explained
- The per-file isolation demo in ch7 effectively shows Salsa’s incremental value
Deep Dive Additions (2026-04-21 — Second Pass)
I re-read all 7 chapters’ source code, compiled each one, and ran all binaries. The initial review raised API concerns about Salsa 0.26 and analisar 0.4 — these were verified:
salsa::Updatederive macro exists in Salsa 0.26 ✓- The accumulator API
type_check::accumulated::<Diagnostic>(&db, file)works as shown ✓ - All chapters compile and run successfully ✓
However, the deeper code-level review found these new issues:
[error] Suffixed expression regression in ch03, ch05, ch06, ch07
Chapter 2’s convert_expr handles ast::Expression::Suffixed completely: field access (a.b), index access (a[b]), and method calls (a:f()). But chapters 3, 5, 6, and 7 changed the match arm to:
#![allow(unused)]
fn main() {
ast::Expression::Suffixed(s) if !s.computed && !s.method => { ... }
}
The guard !s.computed && !s.method means:
a[b](index expression,s.computed == true) → falls to_ => Expression::Nila:f()(method call,s.method == true) → falls to_ => Expression::Nil
This is a regression from Chapter 2. Index and method expressions are silently converted to Nil across 4 chapters. No comment explains this is intentional.
Fix: Restore the full Suffixed handling from Chapter 2, or add a comment explaining that index/method expressions are intentionally unhandled.
[error] convert_args drops table arguments silently
In convert_args, the ast::Args::Table(_) arm returns vec![]:
#![allow(unused)]
fn main() {
ast::Args::Table(_) => vec![], // Table args dropped silently
}
Function calls with table arguments like foo({a = 1}) are converted to zero arguments. For a tutorial showing Lua parsing, this is misleading — readers writing Lua code with table arguments will get broken behavior without explanation.
[error] BinOp::Concat returns Type::String unconditionally
In both ch05 (type inference) and ch06 (diagnostics), the concatenation operator returns:
#![allow(unused)]
fn main() {
BinOp::Concat => Type::String,
}
This returns Type::String regardless of whether the operands are actually strings. In a gradual type checker:
String .. String→Type::String(correct)Dynamic .. anything→ should beType::Dynamic(currentlyType::String— wrong)Number .. String→ should beType::Error(currentlyType::String— wrong)
The current code has no type checking for .. — it’s a blind shortcut. By contrast, BinOp::Add (and other arithmetic ops) in ch06 correctly checks type compatibility and emits diagnostics. BinOp::Concat should do the same.
Note: There’s no test case exercising the .. operator, so this bug isn’t caught by the asserts.
[error] Unused import / dead code warnings
- ch03:
use salsa::Setter;— unused import generates warning - ch03:
let mut db = Database::default()— never calls setters,mutgenerates warning - ch07:
fn display(&self) -> &str— dead code, never called, generates warning
These generate compiler warnings that would confuse readers following along.
[suggestion] No test case exercises the .. operator
None of the chapter tests use Lua’s string concatenation operator (..). Adding a test like "hello" .. " world" and 42 .. "text" would verify the BinOp::Concat case and expose the bug described above.
[clarity] Per-file granularity claim is technically imprecise
Chapter 5’s README says “the granularity is per-file” but the actual granularity is per-statement — check_stmt is called for each statement in the AST, and infer_type is called for each expression within each statement. A change to any statement in the file invalidates all downstream queries for that file. The README should clarify: “granularity is per-file for now; tracked structs (ch04) would enable per-node granularity.”
[suggestion] Add comment explaining per-file granularity limitation
The README says “per-file isolation” in Chapter 7 but the actual behavior is: when a file is edited, ALL statements in that file are re-checked because type_check calls check_stmt for each statement in sequence. Only other files’ queries return cached values. The README should clarify that the per-file isolation means “other files are not re-checked” not “other statements in the same file are not re-checked.”
Priority Fixes from Second Pass
- Fix suffixed expression regression — ch03, ch05, ch06, ch07 silently drop
a[b]anda:f()expressions. Either restore full handling or add a comment explaining it’s intentional. - Fix
BinOp::Concattype checking — should returnType::Dynamicwhen either operand isDynamic, andType::Error+ emit diagnostic when both are incompatible concrete types. - Clean up compiler warnings — remove unused
Setterimport in ch03, changelet mut dbtolet dbin ch03, remove deaddisplaymethod in ch07. - Add table argument handling or comment —
convert_argssilently drops table arguments in all chapters. Either handle them or explain the simplification. - Add
..operator test case — currently untested, and the bug from #2 can’t be caught without it.
Chapter 10: Type Annotations (Reviewed 2026-04-24)
This chapter is structurally strong — the opening example (annotated greet vs unannotated) immediately shows the value, the LuaCATS convention is explained concretely, and the gradual guarantee is articulated with precision. The writing stays in the Lin Clark / Julia Evans register: conversational, concrete before abstract, honest about simplifications. The code compiles and all six tests pass.
But there are real bugs in the annotation-association logic that would produce wrong results in multi-function files. These need fixing.
[error] ---@param and ---@return annotations are matched globally, not per-function
The Function arm of check_stmt looks up parameter annotations like this:
#![allow(unused)]
fn main() {
let annotated = annotations.iter().find_map(|a| match a {
Annotation::Param { name: n, ty } if n == p => Some(ty.clone()),
_ => None,
});
}
annotations is the full list for the entire file. find_map returns the first Annotation::Param with that name, regardless of which function it belongs to. In a file with two functions that both have a parameter named x:
---@param x number
local function add(x) return x + 1 end
---@param x string
local function greet(x) return "hi " .. x end
Both functions would get x: number — the second function’s annotation is silently ignored. The same bug affects ---@return: find_map returns the first Annotation::Return in the entire file, so every function with a return annotation gets the first one found.
Fix: The extract_annotations function already groups params and returns by function (the pending-params/return flushing logic). But after flushing, the annotations lose their association with the function they belong to. Either: (a) add a function_name: String field to Annotation::Param and Annotation::Return so they can be matched to the right function, or (b) change the extraction to return a HashMap<String, FunctionAnnotations> keyed by function name. Option (b) is cleaner.
[error] apply_annotation matches the first empty-name ---@type to any variable
apply_annotation iterates ALL annotations and returns the first Annotation::Type { name: "", ty } it finds, regardless of which variable it should apply to. In a file with multiple ---@type annotations:
---@type string
local x = "hello"
---@type number
local y = 42
When checking y, apply_annotation finds the first ---@type string annotation and applies it, making y: string instead of y: number.
Fix: Either track which ---@type annotations have already been consumed (consume-on-read), or — since the README acknowledges that ---@type position matching isn’t implemented — don’t apply ---@type annotations at all. The current half-implementation produces wrong results silently, which is worse than not applying them.
[error] Pending annotations are never cleared when a non-function line breaks the association
The LuaCATS position rule says annotations must appear immediately above the thing they annotate. A comment or blank line between ---@param and the function should break the association. The code’s own comment says this: “stray comments between annotations and their targets — they break the association.” But the code doesn’t implement it:
#![allow(unused)]
fn main() {
if !trimmed.starts_with("---@") && !trimmed.is_empty() {
if pending_params.is_empty() && pending_return.is_none() {
// Nothing to clear.
}
}
}
This only enters the block when pending is already empty — the case where clearing is needed (pending is NOT empty) is skipped. Pending params and returns are never cleared on a breaking line.
Fix: Clear pending_params and pending_return when a non-annotation, non-function, non-empty line appears:
#![allow(unused)]
fn main() {
if !trimmed.starts_with("---@") && !trimmed.is_empty()
&& !trimmed.starts_with("local function") && !trimmed.starts_with("function") {
pending_params.clear();
pending_return = None;
}
}
[clarity] “Annotations are inputs, not derived values” — contradicted two paragraphs later
The “Annotations as Salsa Inputs” section opens with: “annotations are inputs, not derived values.” But the very next paragraph explains that we extract them via a tracked function (making them derived), and the diagram shows them as a tracked function between the input and type_check. The text eventually resolves the tension (“they’re derived from the source text — they’re a parsed view of what’s already in the input”), but the opening claim is flatly wrong and will confuse the reader before they reach the nuance.
Fix: Reword the opening to something like: “Annotations feel like inputs — the programmer writes them, they don’t change unless the source changes. But architecturally, they’re extracted from the source text by a tracked function, which means Salsa handles invalidation automatically.” Start with the accurate description, not the wrong one.
[clarity] Code snippet in README uses fe without declaration
The “How Annotations Change Type Checking” section shows this code:
#![allow(unused)]
fn main() {
for p in params {
let annotated = annotations.iter().find_map(|a| match a {
Annotation::Param { name: n, ty } if n == p => Some(ty.clone()),
_ => None,
});
let pt = annotated.unwrap_or(Type::Dynamic);
param_types.push(pt.clone());
fe = fe.extend(p.clone(), pt);
}
}
The variable fe appears from nowhere. The actual code has let mut fe = env.clone(); above the loop, but the README snippet omits it. Readers tracing through the code won’t know what fe is. Either include the declaration or rename to something self-explanatory like func_env.
[clarity] Annotation::Type { name: String, ty: Type } has a name field that’s always empty
The ---@type variant has a name field, but the parser always sets it to String::new(). The README acknowledges that ---@type doesn’t carry a variable name. So why does the data structure have the field? This will confuse readers who look at the type definition and wonder when name is non-empty. Either remove the field or add a comment on the definition explaining it’s a placeholder for future position-based matching.
[style] The opening example is excellent
The greet example at the top of the chapter is exactly right. It shows the problem concretely (passing 42 to a function that concatenates), explains why the type checker misses it (parameter is Dynamic), and motivates the whole chapter in three paragraphs. No “What You’ll Learn” section, no hand-waving. More of this.
[style] “The Gradual Guarantee” section is precise and well-articulated
The before/after example (unannotated → Dynamic → no errors vs annotated → type checking activates) is clean. The closing line — “Dynamic is not a failure of the type checker — it’s the whole point of gradual typing” — is the kind of precise distinction that makes a tutorial valuable. More of this.
[suggestion] Add a test case with multiple annotated functions
All six test cases use single-function source files. A test with two annotated functions would expose the annotation-association bugs described above. For example:
#![allow(unused)]
fn main() {
let source = SourceFile::new(&mut db, "test.lua", [
"---@param x number",
"local function add(x) return x + 1 end",
"---@param x string",
"local function greet(x) return \"hi \" .. x end",
"greet(42)",
].join("\n"));
}
This should catch greet(42) as a type error (passing number to string param), but with the current code it won’t, because greet’s x gets the number annotation from add.
[fixed] BinOp::Concat type checking is now correct in Ch10
The second-pass review flagged that BinOp::Concat unconditionally returned Type::String regardless of operand types (ch05/ch06). Ch10’s infer_type now handles all cases correctly: String..String → String, Dynamic..anything → Dynamic, Error → Error, and incompatible types emit a diagnostic and return Type::Error. This fix is in Ch10’s code only — ch05 and ch06 still have the bug.
[suggestion] The “What We’re Simplifying” section could be slightly more honest about the ---@type bug — [fixed] ✅
The section now says: “We match it to the next variable that requests an annotation, then consume it so it’s not reused.” This accurately describes the consume-on-read pattern implemented in the fix.
2026-04-25 Verification — Ch10 Annotation-Association Fixes
Lola fixed the three [error] items in ch10’s annotation logic:
-
---@param/---@returnscoped to function — [fixed] ✅ —Annotation::ParamandAnnotation::Returnnow have afunc_name: Stringfield.extract_annotationsrecords the function name when flushing pending params/returns.check_stmtmatches only annotations wherefunc_name == name. -
apply_annotationconsumes---@typeon use — [fixed] ✅ —apply_annotationnow takes&mut Vec<Annotation>and removes each---@typeannotation after applying it. This prevents the same annotation from being applied to every variable. -
Pending annotations cleared on breaking lines — [fixed] ✅ — When a non-annotation, non-function, non-empty line appears with pending params/returns, the pending state is now cleared. The previous code had the logic inverted — it only entered the clearing block when pending was already empty.
New test added: multiple_functions_with_same_param_name — verifies that greet(42) fails when greet’s x is annotated as string even though add’s x is annotated as number. All 7 tests pass.
README updated: code snippets reflect func_name field, “What We’re Simplifying” section accurately describes the consume-on-read pattern.
2026-04-29 — Bug Fixes in Chapters 5, 6, 7
While doing a review pass, I went through the flagged issues in Esme’s review and verified the actual state of the code. Several were already fixed (the suffixed expression regression in ch05/ch06/ch07 — the broken guard pattern from ch02 had already been replaced with proper if s.computed branching). Two real issues remained:
[fixed] Method calls returned Nil type in ch05, ch06, ch07 — [fixed] ✅
The convert_expr function for ast::Expression::Suffixed had this branch:
#![allow(unused)]
fn main() {
} else if s.method {
// Method calls handled by FuncCall; fall back
Expression::Nil
}
When a Lua method call (a:b()) appeared without a surrounding function call prefix, it returned Type::Nil with no diagnostic. This silently propagates — if a method reference is used anywhere, the type checker would silently treat it as nil. Fixed by using the <method-call> placeholder from ch02, which produces a clear “unknown variable” diagnostic instead.
[fixed] Table arguments silently dropped without comment in ch05, ch06, ch07 — [fixed] ✅
The convert_args match used _,_=>vec![] to catch Table args (and everything else), with no comment explaining the drop. This meant foo({a = 1}) silently passes zero arguments — potentially masking real type errors. Replaced with:
#![allow(unused)]
fn main() {
// All other arg forms (Table, etc.) are dropped. We'd need
// Expression::Table and proper table type inference to handle them.
_=>vec![]
}
Remaining known limitations (not bugs, just not yet implemented)
convert_argsdrops table args — still the case, now documented. Would needExpression::Tablevariant.Symbol<'db>not used inDefinition— ch03Definitionstill usesString, notSymbol<'db>. Using interned symbols in plain data structs requires careful lifetime handling.BinOp::Concatwrong type in ch05/ch06 — only fixed in ch10. Earlier chapters still returnType::Stringunconditionally.
All 23 tests pass. cargo build produces zero warnings.
2026-05-01 Deep Re-Review — Ch11: Union Types and Table Classes
Reviewed by Esme, 2026-05-01
No new commits since the 09:52 UTC review. Deep re-review of Ch11 — a chapter I hadn’t scrutinized closely before.
[error] is_compatible_with for Union→expected uses any() — should use all()
The is_compatible_with implementation for union types:
#![allow(unused)]
fn main() {
(Type::Union(variants), expected) => {
variants.iter().any(|v| v.is_compatible_with(expected))
}
}
This says: “a union value is compatible with the expected type if any variant is compatible.” But that’s wrong for type safety. If x: Number | Boolean and you’re assigning it to y: Number | String, the assignment should fail because Boolean isn’t compatible with Number | String. With any(), it passes — because Number IS compatible with Number | String.
The correct subtyping rule for unions is: all variants of the source union must be compatible with the target. The fix:
#![allow(unused)]
fn main() {
(Type::Union(variants), expected) => {
variants.iter().all(|v| v.is_compatible_with(expected))
}
}
The second arm (actual value → Union expected) is correct with any() — a concrete Number only needs to fit one variant of Number | String.
The chapter discusses is_compatible_with vs is_exactly but frames the problem as overly lenient for arithmetic (“a Number | String in arithmetic position passes the check”). That’s a real issue, but the union-to-union comparison bug is more fundamental — it gives wrong answers for assignments, not just operations. A Number | Boolean would be incorrectly accepted where Number | String is expected.
Why this matters pedagogically: A reader building a type checker from this chapter would implement union compatibility wrong. They’d get confusing results when unions appear on both sides of an assignment, and the bug would be hard to trace because it only manifests for multi-variant unions where the variants differ.
Fix: Change any() to all() in the Union→expected arm. Update the text’s discussion of is_compatible_with limitations to note that the fix addresses both the assignment-correctness issue AND the arithmetic-lenience issue (with all(), Number | String is NOT compatible with Number, which means arithmetic on a union value correctly fails).
[clarity] parse_type_name maps unknown type names to Dynamic without warning
The parse_type_name function maps unknown type names to Type::Dynamic with a comment: // Unknown type name → Dynamic (with warning, in a real checker). But the code doesn’t emit any warning. A reader building a checker would want to know when they’ve misspelled a type name (nmuber instead of number). The comment acknowledges the gap but doesn’t explain why the tutorial omits it or what the reader should do about it.
Fix: Either add a brief explanation (“We don’t emit warnings here because the annotation parser runs before diagnostics are collected — a production checker would buffer these as parse warnings”) or add a // TODO: emit diagnostic for unknown type name to make the omission explicit.
Status
- 0 outstanding [error] items ✅ —
is_compatible_withUnion→expectedany()→all()fixed (d94ac94) - 0 outstanding [clarity] items ✅ —
parse_type_nameunknown-type-name mapping now has inline comment explaining the omission (code line 249-250)