Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 5: Type Inference — The Core Query

You’ve built the pieces: inputs (source files), interned symbols (fast name comparison), and tracked structs (entities with stable identity). Now the question is: given an expression, what type does it produce?

That’s type inference. 42 produces Number. "hello" produces String. x + 1 produces Number if x is compatible with Number. The type checker’s job is to walk the AST and assign a type to every expression — and when it finds a contradiction (adding a string to a number), emit a diagnostic.

This chapter builds the core infer_type query. It’s the query that everything else depends on: diagnostics, cross-file checking, and the language server all start here.

The Mental Model: Gradual Typing

Lua is dynamically typed. You don’t write local x: number = 42. But that doesn’t mean we can’t check types — we do it gradually. If the type is known, we check it (42 + 1 → both are numbers → OK). If the type is unknown, we allow it (unknown_var + 1 → unknown is “dynamic” → OK, we trust the programmer). If there’s a contradiction, that’s an error (42 + "hello" → number + string → Error).

The key distinction: Dynamic ≠ Error. Dynamic means “we don’t know and that’s fine.” Error means “we found a contradiction.” One is intentional flexibility; the other is a bug.

Our Type enum:

#![allow(unused)]
fn main() {
pub enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Function { params: Vec<Type>, ret: Box<Type> },
    Dynamic,
    Error,
}
}

Seven variants, two that need explanation. Function { params, ret } is a record variant — it carries the parameter types and return type, not a unit or tuple. ret is Box<Type> (not Type directly) because Rust requires box indirection for recursive types: Function contains Type, which can be Function, which contains Type, and so on. Without the Box, the enum would have infinite size. Dynamic means “we don’t know and that’s OK” — it’s the gradual typing escape hatch. Error means “we found a contradiction” — a type check failed. They’re not the same: Dynamic is intentional flexibility, Error is a bug.

Compatibility check: is Type::Dynamic compatible with Type::Number? Yes. Is Type::Error compatible with anything? Yes (to avoid cascading errors). Is Type::Number compatible with Type::String? No.

Here’s the method that implements those rules:

#![allow(unused)]
fn main() {
impl Type {
    fn is_compatible_with(&self, target: &Type) -> bool {
        match (self, target) {
            (Type::Dynamic, _) | (_, Type::Dynamic) => true,
            (Type::Error, _) | (_, Type::Error) => true,
            (Type::Number, Type::Number) => true,
            (Type::String, Type::String) => true,
            (Type::Boolean, Type::Boolean) => true,
            (Type::Nil, Type::Nil) => true,
            (Type::Function { params: p1, ret: r1 },
             Type::Function { params: p2, ret: r2 }) => {
                p1.len() == p2.len()
                    && p1.iter().zip(p2.iter()).all(|(a, b)| a.is_compatible_with(b))
                    && r1.is_compatible_with(r2)
            }
            _ => false,
        }
    }
}
}

Two things to notice. Dynamic is compatible with everything, in both directions(Type::Dynamic, _) covers Dynamic on the left, (_, Type::Dynamic) covers it on the right. This is the gradual typing guarantee: if we don’t know the type, we don’t complain. Error is also compatible with everything. A type error in one expression shouldn’t cascade into errors in every expression that references it. Once you have an Error, all downstream checks pass silently — the real error was reported where it happened.

We’ll extend this method in Chapter 11 when we add union types (Number | String). For now, every type either matches itself, is Dynamic, or is Error.

Step 1: The Inference Query

#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn infer_type(
    db: &dyn salsa::Database,
    source: SourceFile,
    expr: Expression,
    env: TypeEnv,
) -> Type {
    match expr {
        Expression::Nil => Type::Nil,
        Expression::True | Expression::False => Type::Boolean,
        Expression::Number(_) => Type::Number,
        Expression::StringLiteral(_) => Type::String,
        Expression::Name(ref name) => env.lookup(name),
        Expression::BinaryOp { ref left, op, ref right } => {
            let left_type = infer_type(db, source, (**left).clone(), env.clone());
            let right_type = infer_type(db, source, (**right).clone(), env.clone());
            match op {
                BinOp::Add | BinOp::Sub | BinOp::Mul | BinOp::Div
                | BinOp::Mod | BinOp::Pow => {
                    // Arithmetic: both sides must be compatible with Number.
                    if left_type.is_compatible_with(&Type::Number)
                        && right_type.is_compatible_with(&Type::Number)
                    {
                        Type::Number
                    } else {
                        Type::Error
                    }
                }
                BinOp::Concat => {
                    // Lua's `..` operator coerces numbers to strings.
                    // So `1 .. "x"` is valid — the 1 becomes "1" at runtime.
                    // If either operand is Dynamic, we can't know whether
                    // coercion will happen or fail — Dynamic is the honest answer.
                    //
                    // For known types: Number and String are both valid operands.
                    // Anything else (Boolean, Nil, Function) is an error — Lua
                    // only coerces numbers, not arbitrary values.
                    if left_type == Type::Dynamic || right_type == Type::Dynamic {
                        Type::Dynamic
                    } else if is_concat_compatible(&left_type) && is_concat_compatible(&right_type) {
                        Type::String
                    } else {
                        Type::Error
                    }
                }
}

The is_concat_compatible helper accepts both String and Number — Lua coerces numbers to strings for .., but not booleans, nil, or functions:

#![allow(unused)]
fn main() {
/// Lua coerces numbers to strings for concatenation: `1 .. "x"` is valid.
/// But it does NOT coerce booleans, nil, or functions — those are errors.
fn is_concat_compatible(ty: &Type) -> bool {
    matches!(ty, Type::String | Type::Number)
}
}

You might notice an inconsistency: arithmetic returns Number when either operand is Dynamic (because Dynamic.is_compatible_with(&Number) is true), but concat returns Dynamic. Why the difference? For arithmetic, the gradual typing convention is that Dynamic + 1 optimistically returns Number — if the Dynamic turns out to be a string at runtime, that’s a type error, but the type checker assumes the best case. For concat, the coercion possibility makes this optimism less safe: 1 .. "x" works (Number coerces to String), but true .. "x" fails — and we can’t tell at compile time which case we’re in. So concat is more conservative. Both choices are defensible for gradual typing; the key is being consistent within each operator.

Back in the BinOp match, the comparison and logical operators are straightforward:

#![allow(unused)]
fn main() {
                BinOp::Lt | BinOp::Gt | BinOp::Le | BinOp::Ge
                | BinOp::Eq | BinOp::Ne => Type::Boolean,
                // Lua's `and`/`or` return one of their operands (not a boolean).
                // `true and 42` returns 42 (Number), `nil or 1` returns 1 (Number).
                // When either operand is Dynamic, we return Dynamic — we can't
                // know which operand the runtime will choose, so Dynamic is the
                // honest answer. The real type would be a union of both operand
                // types — Chapter 11 introduces the union type machinery you'd
                // need to express this properly.
                BinOp::And | BinOp::Or => {
                    if left_type == Type::Dynamic || right_type == Type::Dynamic {
                        Type::Dynamic
                    } else {
                        left_type
                    }
                }
            }
        }
        Expression::UnaryOp { op, ref expr } => {
            let inner = infer_type(db, source, (**expr).clone(), env);
            match op {
                UnaryOp::Negate => {
                    if inner.is_compatible_with(&Type::Number) { Type::Number } else { Type::Error }
                }
                UnaryOp::Not => Type::Boolean,
                UnaryOp::Length => {
                    if inner.is_compatible_with(&Type::String) { Type::Number } else { Type::Error }
                }
            }
        }
        Expression::FunctionCall { ref func, ref args } => {
            let func_type = infer_type(db, source, (**func).clone(), env.clone());
            match func_type {
                Type::Function { params, ret } => {
                    for (i, arg) in args.iter().enumerate() {
                        let arg_type = infer_type(db, source, arg.clone(), env.clone());
                        if let Some(param_type) = params.get(i) {
                            if !arg_type.is_compatible_with(param_type) {
                                return Type::Error;
                            }
                        }
                    }
                    *ret
                }
                Type::Dynamic => Type::Dynamic,
                Type::Error => Type::Error,
                _ => Type::Error,
            }
        }
        Expression::FieldAccess { .. } | Expression::Index { .. } => Type::Dynamic,
    }
}
}

The UnaryOp arms follow the same pattern as binary operators: check compatibility, return the result type or Error. Negate requires a Number, Not always returns Boolean, and Length requires a String and returns Number (Lua’s # operator returns the length of a string).

The FunctionCall arm is where inference and the type environment finally connect to function definitions. When the function has a known Function type, we check each argument against its parameter type using is_compatible_with — a Dynamic argument is always compatible, but a String argument against a Number parameter is an error. When the function is Dynamic, the return type is Dynamic too. When the function is Error, we propagate the error. Any other type being called (calling a Number?) is also an error.

FieldAccess and Index return Dynamic — we don’t model Lua tables yet, so we can’t say anything about what’s inside them.

This is a tracked function that takes an expression and an environment, and returns a type. The environment (a list of name→type bindings) is part of the query key — same expression, different environment = different result.

Recursive calls are fine. infer_type(a + b) calls infer_type(a) and infer_type(b). Salsa memoizes each call independently. If only b changes, infer_type(a) returns its cached value, and only infer_type(b) re-runs.

Step 2: The Type Environment

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct TypeEnv {
    pub bindings: Vec<(String, Type)>,
}
}

The salsa::Update derive is not optional decoration. It lets Salsa compare old and new values of this type for fine-grained invalidation. Every type that Salsa needs to compare for change detection — whether as an input, an output, or an intermediate value — needs this derive. When a tracked function takes or returns a TypeEnv, Salsa uses the Update impl to check whether the value actually changed. If it didn’t, downstream queries don’t re-run.

The environment maps variable names to types. When we see local x = 42, we add ("x", Type::Number) to the environment. When we look up x later, we find it’s a number. TypeEnv::lookup does a linear scan of bindings and returns the matching type, or Type::Dynamic if the name isn’t found — the same pattern as the lookup_name function in Chapter 3, but for types instead of definitions.

Why is TypeEnv part of the query key? Because the same variable name in different scopes can have different types. infer_type(db, source, expr_a, env_1) and infer_type(db, source, expr_a, env_2) are different queries with potentially different results.

This is correct but has a cost: if the environment changes (because you changed a variable’s type), many queries re-run. In a production system, you’d use tracked structs for the environment to get finer-grained invalidation.

Step 3: Type-Checking Statements

Type inference handles expressions. Type-checking handles statements — it walks the AST, calls infer_type for each expression, and updates the environment:

Statement::Assignment { local, targets, values }
  → infer the value type
  → extend the environment with target = value_type

Statement::Function { name, params, body }
  → create a new env with params as Dynamic
  → type-check the body in that env
  → add name = Dynamic to the outer env

Statement::If { test, then_block, else_block }
  → infer the test type
  → check both blocks

Statement checking is not a tracked query in our implementation — it’s a regular function called from type_check_program. This means the granularity of our type checker is per-file: changing any statement invalidates the entire file’s type check. Other files’ cached results are preserved (that’s the per-file isolation), but all statements within the changed file are re-checked from scratch — even the ones that didn’t change.

Is this a problem? For a tutorial, no. For a production system, yes — you’d want per-function granularity. That’s where tracked structs (Chapter 4) become essential: each function gets its own tracked struct, and changing one function only invalidates that function’s type check.

Here are the two tracked queries that tie it all together:

#![allow(unused)]
fn main() {
/// Type-check an entire program.
#[salsa::tracked]
pub fn type_check_program(db: &dyn salsa::Database, source: SourceFile) {
    let ast = parse(db, source);
    let mut env = TypeEnv::new();
    for stmt in &ast.statements {
        env = check_stmt(db, source, stmt, &env);
    }
}

/// Get all inferred types for top-level bindings.
#[salsa::tracked]
pub fn top_level_types(db: &dyn salsa::Database, source: SourceFile) -> Vec<(String, Type)> {
    let ast = parse(db, source);
    let mut env = TypeEnv::new();
    for stmt in &ast.statements {
        env = check_stmt(db, source, stmt, &env);
    }
    env.bindings
}
}

Two things to notice. type_check_program walks all statements and updates the environment, but doesn’t return a value — its job is to trigger infer_type for every expression and let Salsa record the dependencies. If you later query individual expression types, they’ll be cached. top_level_types does the same walk but returns the final environment’s bindings, the top-level name→type pairs. This is the query you’d call from a language server to display hover types.

Yes, they duplicate the same loop. In a production system, type_check_program would exist as the entry point and top_level_types would read its result instead of recomputing it. For the tutorial, keeping them separate makes the dependency chain clearer: one is side-effect-driven (check everything), the other is value-driven (return the bindings).

Step 4: The Full Pipeline

SourceFile (input)
     │
     ▼
   parse()           → LuaAst (plain data)
     │
     ▼
   top_level_types() → check each statement
     │                    │
     │                    ▼
     │                 infer_type() — recursive, memoized
     │                    │
     │              (recursive calls, each cached)
     │
     ▼
   Vec<(String, Type)> — the top-level bindings and their types

Step 5: Incrementality in Action

#![allow(unused)]
fn main() {
use salsa::Setter;
use std::path::PathBuf;

let source = SourceFile::new(&db, PathBuf::from("example.lua"), "local x = 42\nlocal y = \"hello\"\n");
let types = top_level_types(&db, source);
// x: Number, y: String

source.set_text(&mut db).to("local a = 1 + 2\n".to_string());
let new_types = top_level_types(&db, source);
// a: Number — re-computed because source changed
}

When we change the source text, parse() re-runs, top_level_types() re-runs, and infer_type() re-runs for each expression. Everything that depended on the old source text is invalidated.

But here’s what doesn’t re-run: queries for other source files. If we have two files and only change one, the other file’s type check returns from cache instantly. This is per-input isolation from Chapter 1, now applied to real work.

Running

cargo run --bin ch05-type-inference

This chapter also has a #[cfg(test)] module — the idiomatic Rust way to write tests. The demo assertions in main() show results immediately, but cargo test is what you’d use in a real project:

cargo test --bin ch05-type-inference

Next: Chapter 6: Diagnostics as Accumulators — How to report errors without poisoning the query graph. This is essential for a real type checker: you want partial results even when there are type errors.