Chapter 5: Type Inference — The Core Query

The heart of the tutorial: implement type inference as a Salsa tracked query and see incrementality in action.

What You'll Learn

  • How gradual typing works: types where you want them, flexibility where you don't
  • How to implement type inference as a Salsa tracked query
  • How recursive tracked queries work (each call is memoized independently)
  • The relationship between query granularity and incrementality

The Mental Model: Gradual Typing

Lua is dynamically typed. You don't write local x: number = 42. But that doesn't mean we can't check types — we just do it gradually:

  • If the type is known, check it. 42 + 1 → both are numbers → OK.
  • If the type is unknown, allow it. unknown_var + 1 → unknown is "dynamic" → OK (we trust the programmer).
  • If there's a contradiction, that's an error. 42 + "hello" → number + string → Error.

The key distinction: Dynamic ≠ Error. Dynamic means "we don't know and that's fine." Error means "we found a contradiction." One is intentional flexibility; the other is a bug.

Our Type enum:

Type::Number        — we know it's a number
Type::String        — we know it's a string
Type::Boolean       — we know it's a boolean
Type::Nil           — we know it's nil
Type::Function      — we know it's a function (with param/return types)
Type::Dynamic       — we don't know, and that's OK
Type::Error         — type checking found a contradiction

Compatibility check: is Type::Dynamic compatible with Type::Number? Yes. Is Type::Error compatible with anything? Yes (to avoid cascading errors). Is Type::Number compatible with Type::String? No.

Step 1: The Inference Query

#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn infer_type(
    db: &dyn salsa::Database,
    source: SourceFile,
    expr: Expression,
    env: TypeEnv,
) -> Type {
    match expr {
        Expression::Nil => Type::Nil,
        Expression::True | Expression::False => Type::Boolean,
        Expression::Number(_) => Type::Number,
        Expression::StringLiteral(_) => Type::String,
        Expression::Name(ref name) => env.lookup(name),
        Expression::BinaryOp { ref left, op, ref right } => {
            let left_type = infer_type(db, source, (**left).clone(), env.clone());
            let right_type = infer_type(db, source, (**right).clone(), env.clone());
            // ... type-check the operation
        }
        // ...
    }
}
}

This is a tracked function that takes an expression and an environment, and returns a type. The environment (a list of name→type bindings) is part of the query key — same expression, different environment = different result.

Recursive calls are fine. infer_type(a + b) calls infer_type(a) and infer_type(b). Salsa memoizes each call independently. If only b changes, infer_type(a) returns its cached value, and only infer_type(b) re-runs.

Step 2: The Type Environment

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct TypeEnv {
    pub bindings: Vec<(String, Type)>,
}
}

The environment maps variable names to types. When we see local x = 42, we add ("x", Type::Number) to the environment. When we look up x later, we find it's a number.

Why is TypeEnv part of the query key? Because the same variable name in different scopes can have different types. infer_type(db, source, expr_a, env_1) and infer_type(db, source, expr_a, env_2) are different queries with potentially different results.

This is correct but has a cost: if the environment changes (because you changed a variable's type), many queries re-run. In a production system, you'd use tracked structs for the environment to get finer-grained invalidation.

Step 3: Type-Checking Statements

Type inference handles expressions. Type-checking handles statements — it walks the AST, calls infer_type for each expression, and updates the environment:

Statement::Assignment { local, targets, values }
  → infer the value type
  → extend the environment with target = value_type

Statement::Function { name, params, body }
  → create a new env with params as Dynamic
  → type-check the body in that env
  → add name = Dynamic to the outer env

Statement::If { test, then_block, else_block }
  → infer the test type
  → check both blocks

Statement checking is not a tracked query in our implementation — it's a regular function called from type_check_program. This means the granularity of our type checker is per-file: changing any statement invalidates the entire file's type check.

Is this a problem? For a tutorial, no. For a production system, yes — you'd want per-function granularity. That's where tracked structs (Chapter 4) become essential: each function gets its own tracked struct, and changing one function only invalidates that function's type check.

Step 4: The Full Pipeline

SourceFile (input)
     │
     ▼
   parse()           → LuaAst (plain data)
     │
     ▼
   top_level_types() → check each statement
     │                    │
     │                    ▼
     │                 infer_type() — recursive, memoized
     │                    │
     │              (recursive calls, each cached)
     │
     ▼
   Vec<(String, Type)> — the top-level bindings and their types

Step 5: Incrementality in Action

#![allow(unused)]
fn main() {
let source = SourceFile::new(&db, "example.lua", "local x = 42\nlocal y = \"hello\"\n");
let types = top_level_types(&db, source);
// x: Number, y: String

source.set_text(&mut db).to("local a = 1 + 2\n".to_string());
let new_types = top_level_types(&db, source);
// a: Number — re-computed because source changed
}

When we change the source text, parse() re-runs, top_level_types() re-runs, and infer_type() re-runs for each expression. Everything that depended on the old source text is invalidated.

But here's what doesn't re-run: queries for other source files. If we have two files and only change one, the other file's type check returns from cache instantly. This is per-input isolation from Chapter 1, now applied to real work.

Running

cargo run --bin ch05-type-inference

Key Takeaways

  1. Gradual typing = Types + Flexibility. Known types are checked. Unknown types are allowed. Contradictions are errors. This is the sweet spot between static and dynamic typing.

  2. The environment is part of the query key. Same expression, different environment = different result. This is correct: x in one scope is not x in another.

  3. Recursive queries are fine. infer_type(a + b) calls infer_type(a) and infer_type(b). Each is memoized independently. This is one of Salsa's most powerful features.

  4. Granularity matters. With a plain-data AST, the granularity is per-file. With tracked structs, it's per-node. For a tutorial, per-file is fine. For a production system, use tracked structs.

  5. The pipeline is a dependency graph. SourceFile → parse → type_check → infer_type. Salsa builds this graph automatically from what you read. You don't declare dependencies — they're inferred.

What's Next

Chapter 6: Diagnostics as Accumulators — How to report errors without poisoning the query graph. This is essential for a real type checker: you want partial results even when there are type errors.