Chapter 5: Type Inference — The Core Query
The heart of the tutorial: implement type inference as a Salsa tracked query and see incrementality in action.
What You'll Learn
- How gradual typing works: types where you want them, flexibility where you don't
- How to implement type inference as a Salsa tracked query
- How recursive tracked queries work (each call is memoized independently)
- The relationship between query granularity and incrementality
The Mental Model: Gradual Typing
Lua is dynamically typed. You don't write local x: number = 42. But that doesn't mean we can't check types — we just do it gradually:
- If the type is known, check it.
42 + 1→ both are numbers → OK. - If the type is unknown, allow it.
unknown_var + 1→ unknown is "dynamic" → OK (we trust the programmer). - If there's a contradiction, that's an error.
42 + "hello"→ number + string → Error.
The key distinction: Dynamic ≠ Error. Dynamic means "we don't know and that's fine." Error means "we found a contradiction." One is intentional flexibility; the other is a bug.
Our Type enum:
Type::Number — we know it's a number
Type::String — we know it's a string
Type::Boolean — we know it's a boolean
Type::Nil — we know it's nil
Type::Function — we know it's a function (with param/return types)
Type::Dynamic — we don't know, and that's OK
Type::Error — type checking found a contradiction
Compatibility check: is Type::Dynamic compatible with Type::Number? Yes. Is Type::Error compatible with anything? Yes (to avoid cascading errors). Is Type::Number compatible with Type::String? No.
Step 1: The Inference Query
#![allow(unused)] fn main() { #[salsa::tracked] pub fn infer_type( db: &dyn salsa::Database, source: SourceFile, expr: Expression, env: TypeEnv, ) -> Type { match expr { Expression::Nil => Type::Nil, Expression::True | Expression::False => Type::Boolean, Expression::Number(_) => Type::Number, Expression::StringLiteral(_) => Type::String, Expression::Name(ref name) => env.lookup(name), Expression::BinaryOp { ref left, op, ref right } => { let left_type = infer_type(db, source, (**left).clone(), env.clone()); let right_type = infer_type(db, source, (**right).clone(), env.clone()); // ... type-check the operation } // ... } } }
This is a tracked function that takes an expression and an environment, and returns a type. The environment (a list of name→type bindings) is part of the query key — same expression, different environment = different result.
Recursive calls are fine. infer_type(a + b) calls infer_type(a) and infer_type(b). Salsa memoizes each call independently. If only b changes, infer_type(a) returns its cached value, and only infer_type(b) re-runs.
Step 2: The Type Environment
#![allow(unused)] fn main() { #[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)] pub struct TypeEnv { pub bindings: Vec<(String, Type)>, } }
The environment maps variable names to types. When we see local x = 42, we add ("x", Type::Number) to the environment. When we look up x later, we find it's a number.
Why is TypeEnv part of the query key? Because the same variable name in different scopes can have different types. infer_type(db, source, expr_a, env_1) and infer_type(db, source, expr_a, env_2) are different queries with potentially different results.
This is correct but has a cost: if the environment changes (because you changed a variable's type), many queries re-run. In a production system, you'd use tracked structs for the environment to get finer-grained invalidation.
Step 3: Type-Checking Statements
Type inference handles expressions. Type-checking handles statements — it walks the AST, calls infer_type for each expression, and updates the environment:
Statement::Assignment { local, targets, values }
→ infer the value type
→ extend the environment with target = value_type
Statement::Function { name, params, body }
→ create a new env with params as Dynamic
→ type-check the body in that env
→ add name = Dynamic to the outer env
Statement::If { test, then_block, else_block }
→ infer the test type
→ check both blocks
Statement checking is not a tracked query in our implementation — it's a regular function called from type_check_program. This means the granularity of our type checker is per-file: changing any statement invalidates the entire file's type check.
Is this a problem? For a tutorial, no. For a production system, yes — you'd want per-function granularity. That's where tracked structs (Chapter 4) become essential: each function gets its own tracked struct, and changing one function only invalidates that function's type check.
Step 4: The Full Pipeline
SourceFile (input)
│
▼
parse() → LuaAst (plain data)
│
▼
top_level_types() → check each statement
│ │
│ ▼
│ infer_type() — recursive, memoized
│ │
│ (recursive calls, each cached)
│
▼
Vec<(String, Type)> — the top-level bindings and their types
Step 5: Incrementality in Action
#![allow(unused)] fn main() { let source = SourceFile::new(&db, "example.lua", "local x = 42\nlocal y = \"hello\"\n"); let types = top_level_types(&db, source); // x: Number, y: String source.set_text(&mut db).to("local a = 1 + 2\n".to_string()); let new_types = top_level_types(&db, source); // a: Number — re-computed because source changed }
When we change the source text, parse() re-runs, top_level_types() re-runs, and infer_type() re-runs for each expression. Everything that depended on the old source text is invalidated.
But here's what doesn't re-run: queries for other source files. If we have two files and only change one, the other file's type check returns from cache instantly. This is per-input isolation from Chapter 1, now applied to real work.
Running
cargo run --bin ch05-type-inference
Key Takeaways
-
Gradual typing = Types + Flexibility. Known types are checked. Unknown types are allowed. Contradictions are errors. This is the sweet spot between static and dynamic typing.
-
The environment is part of the query key. Same expression, different environment = different result. This is correct:
xin one scope is notxin another. -
Recursive queries are fine.
infer_type(a + b)callsinfer_type(a)andinfer_type(b). Each is memoized independently. This is one of Salsa's most powerful features. -
Granularity matters. With a plain-data AST, the granularity is per-file. With tracked structs, it's per-node. For a tutorial, per-file is fine. For a production system, use tracked structs.
-
The pipeline is a dependency graph.
SourceFile → parse → type_check → infer_type. Salsa builds this graph automatically from what you read. You don't declare dependencies — they're inferred.
What's Next
Chapter 6: Diagnostics as Accumulators — How to report errors without poisoning the query graph. This is essential for a real type checker: you want partial results even when there are type errors.