Chapter 11: Union Types — When a Value Can Be More Than One Thing
Our type checker knows about Number, String, Boolean, and Dynamic. But real Lua code doesn’t always fit into single boxes. A function parameter might accept either a string or a number. A variable might hold nil or a table. A return value might be true or an error message.
That’s what union types are for. A union type says “this value is one of these types” — Number | String means “could be a number, could be a string.” LuaCATS uses the | syntax: ---@param x number|string. Our parser doesn’t handle it yet. Let’s fix that.
And then there’s ---@class. Real Lua programs pass around tables with known shapes — a Point has x and y, a User has name and email. LuaCATS lets you declare these shapes with ---@class, and our Type::Table variant has been sitting there unused since Chapter 9. Time to connect them.
Union Types
A union type is a set of possibilities. When the type checker encounters a union, it needs to answer two questions:
Is this value compatible with this type? If a function expects Number | String and you pass 42, that’s fine — Number is one of the possibilities. If you pass true, that’s an error — Boolean isn’t in the union.
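What operations are safe? If x is Number | String, you can't safely do arithmetic on it: it might be a string, and Lua only coerces strings that look like numbers. A few operations accept both, such as .. (Lua converts numbers to strings when concatenating), but in general you need to narrow the type first: check which possibility it actually is.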
Let’s start with the representation.
Adding Type::Union
```rust
#[derive(Clone, Debug, PartialEq, Eq, Hash, salsa::Update)]
pub enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Dynamic,
    Error,
    Function {
        params: Vec<Type>,
        ret: Box<Type>,
    },
    Table {
        fields: Vec<(String, Type)>,
    },
    Union(Vec<Type>), // <-- NEW: one of these types
}
```
Union(Vec<Type>) holds the variants. The order doesn’t matter — Union([Number, String]) is the same type as Union([String, Number]). We’ll normalize the order when we construct unions so that equality checks work correctly.
Design note: Why Vec<Type> and not HashSet<Type>? Two reasons. First, HashSet<T> does not implement Hash, so a Union(HashSet<Type>) variant would break the derive(Hash) on Type. A Vec kept in a canonical order gives the same order-independent equality without that problem, and Vec comparisons are simpler and more predictable than HashSet comparisons when salsa::Update compares old and new values for fine-grained invalidation. Second, union types are typically small (2-3 variants), so the linear-scan cost of a Vec is negligible. If you had unions with hundreds of variants, you'd want a different representation, but that's not a realistic use case.
Constructing Unions
We don’t want Union([Number]) (a union of one type — Number itself) or Union([Number, Number]) (duplicates). Let’s add a constructor that normalizes:
```rust
impl Type {
    pub fn union(types: Vec<Type>) -> Type {
        // Flatten nested unions: Union([Number, Union([String, Boolean])])
        // → Union([Number, String, Boolean])
        let mut flat = Vec::new();
        for t in types {
            match t {
                Type::Union(inner) => flat.extend(inner),
                other => flat.push(other),
            }
        }
        // Remove duplicates and sort for consistent equality.
        // Sort by Debug representation for deterministic order.
        // This is a simplification: Debug output isn't guaranteed stable
        // across Rust versions, and format! inside a sort comparator
        // allocates on every comparison. For the small unions in this
        // tutorial (2-3 variants), it's fine.
        //
        // A production checker would avoid the string key. Type can derive
        // PartialOrd and Ord directly, because all of its fields (String,
        // Vec<Type>, Box<Type>, Vec<(String, Type)>) are orderable; no AST
        // type is involved. Alternatively, sort by a canonical key derived
        // from the type's structure, such as a discriminant method that
        // returns 0 for Number, 1 for String, and so on.
        flat.sort_by(|a, b| format!("{a:?}").cmp(&format!("{b:?}")));
        flat.dedup();
        // Simplify: Union of one type → that type
        // Union containing Error → Error (tainted)
        // Union containing Dynamic → Dynamic (absorbs alternatives)
        if flat.len() == 1 {
            return flat.pop().unwrap();
        }
        if flat.iter().any(|t| matches!(t, Type::Error)) {
            return Type::Error;
        }
        if flat.iter().any(|t| matches!(t, Type::Dynamic)) {
            return Type::Dynamic;
        }
        Type::Union(flat)
    }
}
```
The normalization handles four cases:
- Nested unions get flattened. Number | (String | Boolean) becomes Number | String | Boolean.
- Duplicates get removed. Number | Number | String becomes Number | String.
- Singletons get unwrapped. A union of one type collapses to that type.
- Error and Dynamic absorb the rest. A union containing Error simplifies to Error: if something went wrong, the whole type is wrong. A union containing Dynamic simplifies to Dynamic: if anything goes, there's no point listing the alternatives.
Why does Dynamic absorb the union? Because Dynamic means "I don't know and I don't care." If one branch of a union is Dynamic, the type checker can't make any guarantees about what the value is, so the union provides no more information than Dynamic alone. The same logic applies to Error: if one branch is Error, the whole type is tainted. This is the same reasoning as TypeScript, where number | any simplifies to any.
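Type can also derive PartialOrd and Ord: every field it holds is a String, a Box<Type>, or a Vec of orderable values, so the derive compiles without touching any AST type, and the derived order (variant declaration order, then fields) is stable and allocation-free. Here is a minimal sketch of the constructor written that way, under two assumptions: salsa::Update is dropped from the derive list so the example builds without the salsa crate, and sort() replaces the Debug-string comparator. The normalization rules are the ones listed above.

```rust
// Sketch: Type::union with a derived Ord instead of a Debug-string sort key.
// salsa::Update is omitted so this compiles without the salsa crate.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Dynamic,
    Error,
    Function { params: Vec<Type>, ret: Box<Type> },
    Table { fields: Vec<(String, Type)> },
    Union(Vec<Type>),
}

impl Type {
    fn union(types: Vec<Type>) -> Type {
        // Flatten nested unions (one level is enough, because unions
        // built by this constructor are already flat).
        let mut flat = Vec::new();
        for t in types {
            match t {
                Type::Union(inner) => flat.extend(inner),
                other => flat.push(other),
            }
        }
        // Derived Ord compares variants in declaration order, then their
        // fields, so sort() is deterministic and allocates nothing.
        flat.sort();
        flat.dedup();
        if flat.len() == 1 {
            return flat.pop().unwrap();
        }
        if flat.contains(&Type::Error) {
            return Type::Error;
        }
        if flat.contains(&Type::Dynamic) {
            return Type::Dynamic;
        }
        Type::Union(flat)
    }
}

fn main() {
    let nested = Type::union(vec![
        Type::Number,
        Type::union(vec![Type::String, Type::Boolean]),
    ]);
    println!("{nested:?}"); // Union([Number, String, Boolean])

    // Argument order doesn't matter after normalization.
    let a = Type::union(vec![Type::String, Type::Number]);
    let b = Type::union(vec![Type::Number, Type::String]);
    println!("{}", a == b); // true

    println!("{:?}", Type::union(vec![Type::Number, Type::Number])); // Number
    println!("{:?}", Type::union(vec![Type::String, Type::Dynamic])); // Dynamic
}
```

Because both sides of a comparison are normalized the same way, reordering the enum's variants changes how unions print but not which unions compare equal.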
Compatibility: Unions on Both Sides
is_compatible_with needs to handle unions as both the “expected” type and the “actual” type:
```rust
impl Type {
    pub fn is_compatible_with(&self, other: &Type) -> bool {
        match (self, other) {
            // Existing cases...
            (Type::Dynamic, _) | (_, Type::Dynamic) => true,
            (Type::Error, _) | (_, Type::Error) => true,
            (a, b) if a == b => true,
            // NEW: value is a union → ALL variants must be compatible with expected
            (Type::Union(variants), expected) => {
                // Every variant of the source union must fit the target type.
                // If x: Number | Boolean and we need Number, it fails: Boolean
                // isn't compatible. With `any()`, it would incorrectly pass
                // because Number IS compatible. Union-to-union works the same
                // way: (Number | Boolean) is NOT compatible with (Number | String)
                // because Boolean doesn't fit the target.
                variants.iter().all(|v| v.is_compatible_with(expected))
            }
            // NEW: expected is a union → value must fit at least one variant
            (actual, Type::Union(variants)) => {
                // A concrete value only needs to fit one variant of the union.
                // Number IS compatible with Number | String because it fits
                // the Number arm.
                variants.iter().any(|v| actual.is_compatible_with(v))
            }
            // Everything else: not compatible
            _ => false,
        }
    }
}
```
A Number is compatible with Union([Number, String]): it fits one of the slots. A Boolean is NOT compatible with Union([Number, String]): it doesn't fit any slot. In the other direction, Union([Number, String]) is NOT compatible with Number, because the String variant doesn't fit; it is compatible with Union([Number, String, Boolean]), because each of its variants fits some slot of the wider union.
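A minimal, self-contained sketch that runs the examples above through the two union arms. It assumes a Type enum trimmed to the primitive variants plus Union; the match arms are the chapter's, unchanged.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Dynamic,
    Error,
    Union(Vec<Type>),
}

impl Type {
    fn is_compatible_with(&self, other: &Type) -> bool {
        match (self, other) {
            (Type::Dynamic, _) | (_, Type::Dynamic) => true,
            (Type::Error, _) | (_, Type::Error) => true,
            (a, b) if a == b => true,
            // Source is a union: every variant must fit the target.
            (Type::Union(variants), expected) => {
                variants.iter().all(|v| v.is_compatible_with(expected))
            }
            // Target is a union: the value must fit at least one variant.
            (actual, Type::Union(variants)) => {
                variants.iter().any(|v| actual.is_compatible_with(v))
            }
            _ => false,
        }
    }
}

fn main() {
    let num_or_str = Type::Union(vec![Type::Number, Type::String]);
    let num_or_bool = Type::Union(vec![Type::Number, Type::Boolean]);
    let wider = Type::Union(vec![Type::Number, Type::String, Type::Boolean]);

    println!("{}", Type::Number.is_compatible_with(&num_or_str)); // true
    println!("{}", Type::Boolean.is_compatible_with(&num_or_str)); // false
    println!("{}", num_or_str.is_compatible_with(&Type::Number)); // false
    println!("{}", num_or_str.is_compatible_with(&wider)); // true
    println!("{}", num_or_bool.is_compatible_with(&num_or_str)); // false
}
```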
Parsing Union Annotations
The annotation parser needs to handle ---@param x number|string:
```rust
fn parse_type_name(s: &str) -> Type {
    let s = s.trim();
    // Union type: "number|string"
    if s.contains('|') {
        let variants: Vec<Type> = s.split('|')
            .map(|part| parse_type_name(part.trim()))
            .collect();
        return Type::union(variants);
    }
    match s {
        "number" => Type::Number,
        "string" => Type::String,
        "boolean" => Type::Boolean,
        "nil" => Type::Nil,
        "any" => Type::Dynamic,
        _ => Type::Dynamic, // Unknown type name → Dynamic. A production checker
                            // would emit a diagnostic here ("unknown type 'nmuber'").
                            // We don't because the annotation parser runs before
                            // diagnostics are collected; buffering parse warnings
                            // would require a separate collection pass.
    }
}
```
Splitting on | handles any number of variants: number|string|boolean splits into three parts, and the recursive call parses each one as a single type name. Type::union() normalizes the result.
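A runnable sketch of parse_type_name together with a copy of the union normalization, assuming a trimmed Type enum that derives Ord for sorting (the chapter sorts by Debug text instead, so the variant order inside a printed union can differ). It also shows a failure mode that matters later in the chapter: a class name inside a union parses as Dynamic, which absorbs the whole union.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Dynamic,
    Error,
    Union(Vec<Type>),
}

impl Type {
    // Same normalization rules as the chapter's Type::union; the sort uses
    // the derived Ord (an assumption of this sketch).
    fn union(types: Vec<Type>) -> Type {
        let mut flat = Vec::new();
        for t in types {
            match t {
                Type::Union(inner) => flat.extend(inner),
                other => flat.push(other),
            }
        }
        flat.sort();
        flat.dedup();
        if flat.len() == 1 {
            return flat.pop().unwrap();
        }
        if flat.contains(&Type::Error) {
            return Type::Error;
        }
        if flat.contains(&Type::Dynamic) {
            return Type::Dynamic;
        }
        Type::Union(flat)
    }
}

fn parse_type_name(s: &str) -> Type {
    let s = s.trim();
    if s.contains('|') {
        // Each part contains no '|', so the recursive call goes straight
        // to the primitive match below.
        let variants = s.split('|').map(parse_type_name).collect();
        return Type::union(variants);
    }
    match s {
        "number" => Type::Number,
        "string" => Type::String,
        "boolean" => Type::Boolean,
        "nil" => Type::Nil,
        _ => Type::Dynamic, // "any", class names, and typos all land here
    }
}

fn main() {
    println!("{:?}", parse_type_name("number|string")); // Union([Number, String])
    println!("{:?}", parse_type_name(" string | number ")); // Union([Number, String])
    println!("{:?}", parse_type_name("number|nil|number")); // Union([Number, Nil])
    println!("{:?}", parse_type_name("Point|nil")); // Dynamic
}
```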
What Operations Work on Unions?
Here’s the subtle part. If x is Number | String, what can you do with it?
```lua
---@param x number|string
local function inspect(x)
  return x + 1 -- ERROR: x might be a string
end
```
Arithmetic requires Number. But x might be String. So the type checker must reject this. In our implementation, infer_type for BinOp::Add checks that both operands are Number:
```rust
BinOp::Add | BinOp::Sub | BinOp::Mul | BinOp::Div => {
    let lt = infer_type(db, source, left.clone(), env.clone());
    let rt = infer_type(db, source, right.clone(), env.clone());
    // For arithmetic, operands must be compatible with Number.
    // With the all() rule on Union → expected, Number|String is NOT
    // compatible with Number: String isn't, so the union fails the
    // "all variants must fit" check. This is the correct behavior:
    // arithmetic on a maybe-String value is a type error.
    if lt.is_compatible_with(&Type::Number) && rt.is_compatible_with(&Type::Number) {
        Type::Number
    } else {
        // Emit diagnostic: cannot use <type> in arithmetic
        Type::Error
    }
}
```
With the all() rule on the Union → expected arm, Number | String is NOT compatible with Number, so x + 1 correctly produces a type error when x is Number | String: x isn't definitely a Number.
Assignments follow the same rule. If y: Number and x: Number | String, y = x fails, because you can't assign a maybe-String value to a definitely-Number variable. A gradual type checker would require narrowing (if type(x) == "number" then y = x end) before the assignment.
So is_compatible_with, with all() on the first union arm, gives the right answer for both operations and assignments.
A production type checker might still add a stricter is_exactly method, for example when deciding whether to emit a specific "type mismatch" diagnostic rather than a generic "types not compatible" message. For this tutorial, is_compatible_with is sufficient.
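The arithmetic arm above leaves the diagnostic as a comment. One way to produce it, sketched with a hypothetical check_arith helper that returns the result type together with an optional message; a plain String stands in for the chapter's Diagnostic type, whose fields aren't shown in this chapter.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Dynamic,
    Error,
    Union(Vec<Type>),
}

impl Type {
    // Same rules as the chapter's is_compatible_with.
    fn is_compatible_with(&self, other: &Type) -> bool {
        match (self, other) {
            (Type::Dynamic, _) | (_, Type::Dynamic) => true,
            (Type::Error, _) | (_, Type::Error) => true,
            (a, b) if a == b => true,
            (Type::Union(vs), expected) => vs.iter().all(|v| v.is_compatible_with(expected)),
            (actual, Type::Union(vs)) => vs.iter().any(|v| actual.is_compatible_with(v)),
            _ => false,
        }
    }
}

// Hypothetical helper for + - * /: the result type and, when an operand
// might not be a number, a diagnostic message naming that operand's type.
fn check_arith(lt: &Type, rt: &Type) -> (Type, Option<String>) {
    for operand in [lt, rt] {
        if !operand.is_compatible_with(&Type::Number) {
            return (Type::Error, Some(format!("cannot use {operand:?} in arithmetic")));
        }
    }
    (Type::Number, None)
}

fn main() {
    // x + 1 where x: number|string
    let x = Type::Union(vec![Type::Number, Type::String]);
    let (ty, diag) = check_arith(&x, &Type::Number);
    println!("{ty:?} {diag:?}");
    // Dynamic operands pass, as gradual typing intends.
    println!("{:?}", check_arith(&Type::Dynamic, &Type::Number).0); // Number
}
```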
Table Classes with ---@class
Lua uses tables as objects. A Point table has x and y fields. A User table has name and email fields. Without type annotations, every table is Type::Table { fields: Vec<(String, Type)> } — but the type checker doesn’t know what shape a table should have until it sees the fields.
---@class lets the programmer declare a table’s shape up front:
```lua
---@class Point
---@field x number
---@field y number
local p = { x = 1, y = 2 }
```
This tells the type checker: “any value of type Point has fields x: number and y: number.” It’s a named table type — like a struct, but built from Lua’s universal table mechanism.
Parsing ---@class and ---@field
Add two new annotation variants:
```rust
#[derive(Clone, Debug, PartialEq, Eq, Hash, salsa::Update)]
pub enum Annotation {
    Type { name: String, ty: Type, raw_type_name: String },
    Param { func_name: String, name: String, ty: Type, raw_type_name: String },
    Return { func_name: String, ty: Type, raw_type_name: String },
    Class { name: String, fields: Vec<(String, Type)> }, // NEW
    Field { class_name: String, name: String, ty: Type }, // NEW
}
```
The raw_type_name field on Type, Param, and Return preserves the original type name string from the annotation — "Point", "number|string", etc. We need it because parse_type_name runs during annotation extraction, before class types are registered in the environment. When it encounters an unknown name like "Point", it returns Type::Dynamic — and the original name is lost. raw_type_name keeps the string around so the resolution pass (shown below) can look it up in the environment after classes are registered.
If you’re wondering why Class and Field don’t have raw_type_name: Class stores its fields directly (no type name to resolve), and Field’s ty is always a primitive type like number (field types in our simplified LuaCATS don’t reference other classes).
The parser handles ---@class Point and ---@field x number:
```rust
"@class" => {
    let name = rest.trim().to_string();
    Annotation::Class { name, fields: vec![] }
}
"@field" => {
    // Format: ---@field <name> <type>
    let parts: Vec<&str> = rest.trim().splitn(2, ' ').collect();
    if parts.len() == 2 {
        Annotation::Field {
            class_name: String::new(), // filled in when we see which class this belongs to
            name: parts[0].to_string(),
            ty: parse_type_name(parts[1]),
        }
    } else {
        continue; // malformed, skip
    }
}
```
The ---@field annotations are collected under the most recent ---@class, similar to how ---@param collects under the next function:
```text
---@class Point      ← starts class "Point"
---@field x number   ← field of Point
---@field y number   ← field of Point
```
When extract_annotations finishes processing a class block, it produces a single Annotation::Class { name: "Point", fields: [("x", Number), ("y", Number)] }.
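A minimal sketch of that grouping step, assuming the extractor first emits one Class or Field annotation per comment line and then folds each Field into the Class block directly above it. The group_fields name is hypothetical, and silently dropping a ---@field with no preceding ---@class is an assumption; a production extractor would report it.

```rust
// Hypothetical sketch: fold ---@field annotations into the ---@class
// block directly above them. Both enums are trimmed to this step.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Number,
    String,
}

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Annotation {
    Class { name: String, fields: Vec<(String, Type)> },
    Field { class_name: String, name: String, ty: Type },
}

fn group_fields(raw: Vec<Annotation>) -> Vec<Annotation> {
    let mut out: Vec<Annotation> = Vec::new();
    for ann in raw {
        match ann {
            Annotation::Field { name, ty, .. } => {
                // Attach the field to the class it follows. Fields are never
                // pushed to `out`, so consecutive fields all reach the class.
                // A field with no class directly above it is dropped.
                if let Some(Annotation::Class { fields, .. }) = out.last_mut() {
                    fields.push((name, ty));
                }
            }
            other => out.push(other),
        }
    }
    out
}

fn main() {
    let raw = vec![
        Annotation::Class { name: "Point".to_string(), fields: vec![] },
        Annotation::Field { class_name: String::new(), name: "x".to_string(), ty: Type::Number },
        Annotation::Field { class_name: String::new(), name: "y".to_string(), ty: Type::Number },
    ];
    println!("{:?}", group_fields(raw));
}
```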
Storing Named Classes in the Environment
A ---@class Point declaration adds a named type to the environment. When the type checker later sees ---@type Point, it looks up the class by name:
```rust
fn resolve_type_by_name(name: &str, env: &TypeEnv) -> Type {
    match name {
        "number" => Type::Number,
        "string" => Type::String,
        "boolean" => Type::Boolean,
        "nil" => Type::Nil,
        "any" => Type::Dynamic,
        other => {
            // Look up named class in the environment.
            // lookup returns Dynamic for unknown names, which is
            // the right default for an undeclared class reference.
            env.lookup(other)
        }
    }
}
```
Wait, this is a chicken-and-egg problem. The class type needs to be in the environment before it's referenced, but extract_annotations is a tracked function that produces annotations, and annotations are processed during check_stmt. The class declaration itself is an annotation, so where does the type come from?
The answer: ---@class annotations are processed first, before any statements.
There's a related problem: parse_type_name, called by extract_annotations, doesn't know about the environment. When it encounters "Point", it returns Type::Dynamic, and the name itself would be lost. That's why each annotation stores raw_type_name alongside the parsed type: after the classes are registered, resolve_annotation_type checks whether a type is still Dynamic and, if so, calls resolve_type_by_name with the now-populated environment to look up the saved name.
Put together, this is a two-pass approach:
- First pass: Scan all ---@class annotations and add their types to the environment.
- Resolve: Re-resolve annotation types that parse_type_name returned Dynamic for, using resolve_annotation_type (which calls resolve_type_by_name with the now-populated environment).
- Second pass: Type-check statements, using the resolved annotation types.
In our implementation, this means check_stmt for the first statement in a file already has the class types available:
```rust
#[salsa::tracked]
pub fn type_check(db: &dyn salsa::Database, source: SourceFile) -> Vec<Diagnostic> {
    let ast = parse(db, source);
    let annotations = extract_annotations(db, source);
    let mut env = TypeEnv::new();

    // First pass: register class types
    for ann in &annotations {
        if let Annotation::Class { name, fields } = ann {
            env.extend(name.clone(), Type::Table { fields: fields.clone() });
        }
    }

    // Resolve: re-resolve annotation types that parse_type_name
    // returned Dynamic for (e.g., ---@type Point → Table { x: Number, y: Number })
    let resolved_annotations: Vec<Annotation> = annotations
        .into_iter()
        .map(|ann| match ann {
            Annotation::Type { name, ty, raw_type_name } => Annotation::Type {
                name,
                ty: resolve_annotation_type(&ty, &raw_type_name, &env),
                raw_type_name,
            },
            Annotation::Param { func_name, name, ty, raw_type_name } => Annotation::Param {
                func_name,
                name,
                ty: resolve_annotation_type(&ty, &raw_type_name, &env),
                raw_type_name,
            },
            Annotation::Return { func_name, ty, raw_type_name } => Annotation::Return {
                func_name,
                ty: resolve_annotation_type(&ty, &raw_type_name, &env),
                raw_type_name,
            },
            other => other,
        })
        .collect();

    // Second pass: type-check statements
    let mut diagnostics = Vec::new();
    for stmt in &ast.statements {
        // ... check each statement ...
    }
    diagnostics
}
```
This is the same two-pass pattern that real type checkers use: declarations first, then checks. It's not specific to Salsa; it's a consequence of forward references. Note that Lua's own local does not work this way: a local is not hoisted, and is visible only from the statement after its declaration. But ---@class declares a type, not a runtime binding, so it's reasonable for a named type to be usable anywhere in the file, including above the line that declares it.
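A condensed, self-contained sketch of the two passes, assuming a HashMap-backed stand-in for TypeEnv and only two annotation kinds. The retry step mirrors what resolve_annotation_type does for non-union types. In main, the ---@type Point annotation comes before the ---@class Point declaration, and it still resolves because every class is registered before any annotation is resolved.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Number,
    String,
    Dynamic,
    Table { fields: Vec<(String, Type)> },
}

// Hypothetical stand-in for TypeEnv: unknown names look up as Dynamic.
#[derive(Default)]
struct TypeEnv {
    bindings: HashMap<String, Type>,
}

impl TypeEnv {
    fn extend(&mut self, name: String, ty: Type) {
        self.bindings.insert(name, ty);
    }
    fn lookup(&self, name: &str) -> Type {
        self.bindings.get(name).cloned().unwrap_or(Type::Dynamic)
    }
}

#[derive(Clone, Debug, PartialEq)]
enum Annotation {
    Type { name: String, ty: Type, raw_type_name: String },
    Class { name: String, fields: Vec<(String, Type)> },
}

fn resolve_type_by_name(name: &str, env: &TypeEnv) -> Type {
    match name {
        "number" => Type::Number,
        "string" => Type::String,
        "any" => Type::Dynamic,
        other => env.lookup(other),
    }
}

fn resolve_annotations(annotations: Vec<Annotation>) -> Vec<Annotation> {
    let mut env = TypeEnv::default();
    // First pass: register every class, wherever it sits in the file.
    for ann in &annotations {
        if let Annotation::Class { name, fields } = ann {
            env.extend(name.clone(), Type::Table { fields: fields.clone() });
        }
    }
    // Resolve: retry annotations that parsing left as Dynamic.
    annotations
        .into_iter()
        .map(|ann| match ann {
            Annotation::Type { name, ty: Type::Dynamic, raw_type_name } => {
                let ty = resolve_type_by_name(&raw_type_name, &env);
                Annotation::Type { name, ty, raw_type_name }
            }
            other => other,
        })
        .collect()
}

fn main() {
    // The ---@type Point use appears before the ---@class Point declaration.
    let annotations = vec![
        Annotation::Type { name: "p".to_string(), ty: Type::Dynamic, raw_type_name: "Point".to_string() },
        Annotation::Class {
            name: "Point".to_string(),
            fields: vec![("x".to_string(), Type::Number), ("y".to_string(), Type::Number)],
        },
    ];
    for ann in resolve_annotations(annotations) {
        println!("{ann:?}");
    }
}
```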
The resolve_annotation_type helper re-resolves types that parse_type_name returned Dynamic for, using the saved raw_type_name:
```rust
fn resolve_annotation_type(ty: &Type, raw_name: &str, env: &TypeEnv) -> Type {
    if let Type::Union(variants) = ty {
        // A union that survived parsing contains only primitive types:
        // an unresolved class name parses as Dynamic, and Dynamic
        // collapses the whole union. So there is nothing to look up here,
        // and re-normalizing returns the same union. A union like
        // ---@param x Point|nil never reaches this branch, because its
        // ty is already Dynamic. Resolving it would need the raw name of
        // each variant separately.
        return Type::union(variants.to_vec());
    }
    if matches!(ty, Type::Dynamic) {
        let resolved = resolve_type_by_name(raw_name, env);
        if !matches!(resolved, Type::Dynamic) {
            return resolved;
        }
    }
    ty.clone()
}
```
When parse_type_name("Point") returned Dynamic (because the Point class wasn’t registered yet), resolve_annotation_type now looks up "Point" in the environment and finds the Type::Table { fields: [("x", Number), ("y", Number)] } that the class declaration stored. If the name still isn’t found, it stays Dynamic — we don’t have enough information to type it, and that’s okay.
A limitation to be aware of: union annotations that mix class references with other types, like ---@param x Point|nil, degrade to Dynamic silently. Here's why: parse_type_name("Point") returns Dynamic (the class isn't registered yet), and Type::union([Dynamic, Nil]) collapses to Dynamic because Dynamic absorbs. The raw name stored is the whole union string "Point|nil", and resolve_type_by_name("Point|nil", env) won't find it in the environment, because "Point|nil" isn't a name anything was registered under. The fix would be storing per-variant raw names so each can be resolved independently. For now, avoid union annotations with class references, or annotate with the class alone (---@type Point) and accept that the nil case isn't tracked.
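Because raw_type_name already holds the full string "Point|nil", one alternative to storing per-variant names is to split that string on | and resolve each part against the registered classes. A sketch of that idea, with a hypothetical resolve_raw_union helper and a plain HashMap standing in for TypeEnv; it only handles flat unions, not nested syntax such as parentheses or generics.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Type {
    Number,
    String,
    Boolean,
    Nil,
    Dynamic,
    Error,
    Table { fields: Vec<(String, Type)> },
    Union(Vec<Type>),
}

impl Type {
    // Same normalization as the chapter's Type::union (derived Ord for sorting).
    fn union(types: Vec<Type>) -> Type {
        let mut flat = Vec::new();
        for t in types {
            match t {
                Type::Union(inner) => flat.extend(inner),
                other => flat.push(other),
            }
        }
        flat.sort();
        flat.dedup();
        if flat.len() == 1 {
            return flat.pop().unwrap();
        }
        if flat.contains(&Type::Error) {
            return Type::Error;
        }
        if flat.contains(&Type::Dynamic) {
            return Type::Dynamic;
        }
        Type::Union(flat)
    }
}

fn resolve_type_by_name(name: &str, classes: &HashMap<String, Type>) -> Type {
    match name {
        "number" => Type::Number,
        "string" => Type::String,
        "boolean" => Type::Boolean,
        "nil" => Type::Nil,
        "any" => Type::Dynamic,
        other => classes.get(other).cloned().unwrap_or(Type::Dynamic),
    }
}

// Hypothetical helper: resolve each `|`-separated part of the saved raw
// name after classes are registered, then rebuild the union.
fn resolve_raw_union(raw_name: &str, classes: &HashMap<String, Type>) -> Type {
    let parts = raw_name
        .split('|')
        .map(|part| resolve_type_by_name(part.trim(), classes))
        .collect();
    Type::union(parts)
}

fn main() {
    let mut classes = HashMap::new();
    classes.insert(
        "Point".to_string(),
        Type::Table { fields: vec![("x".to_string(), Type::Number), ("y".to_string(), Type::Number)] },
    );
    // An optional Point instead of Dynamic.
    println!("{:?}", resolve_raw_union("Point|nil", &classes));
    // Unknown class names still degrade to Dynamic.
    println!("{:?}", resolve_raw_union("Pointt|nil", &classes)); // Dynamic
}
```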
Named Table Types vs Anonymous Tables
There’s a subtle difference between a named class (Point) and an anonymous table ({ x: number, y: number }). Both have the same fields. But the named class carries identity — Point and Size might have the same fields, but they’re different types.
Our implementation treats named classes as Type::Table { fields }, so Point and Size with identical fields would be compatible. (Since is_compatible_with compares tables with ==, "identical" here means the same fields declared in the same order.) This is structural typing, not nominal typing. For Lua, this is the right default: Lua doesn't have classes in the language, and ---@class is a documentation convention, not a language feature. Structural typing matches how Lua actually works.
If you wanted nominal typing (where Point ≠ Size even with the same fields), you’d add a name field to Type::Table:
```rust
Table {
    name: Option<String>, // None = anonymous, Some("Point") = named
    fields: Vec<(String, Type)>,
}
```
Then compatibility would check the name first: same name → compatible, different name → not compatible (even with the same fields). This is a design choice, not a correctness issue. Our tutorial uses structural typing because it’s simpler and matches Lua’s philosophy.
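A minimal sketch of that rule, assuming the Option<String> name field above and a Type enum trimmed to what the example needs. The chapter doesn't say what should happen when only one side is named; this sketch falls back to structural comparison in that case, which is an assumption.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Number,
    String,
    Table { name: Option<String>, fields: Vec<(String, Type)> },
}

impl Type {
    fn is_compatible_with(&self, other: &Type) -> bool {
        match (self, other) {
            // Both tables are named: only the names matter (nominal).
            (Type::Table { name: Some(a), .. }, Type::Table { name: Some(b), .. }) => a == b,
            // At least one side is anonymous: compare fields (structural).
            (Type::Table { fields: fa, .. }, Type::Table { fields: fb, .. }) => fa == fb,
            (a, b) => a == b,
        }
    }
}

// Builds a table with fields x and y, optionally named.
fn table(name: Option<&str>) -> Type {
    Type::Table {
        name: name.map(String::from),
        fields: vec![("x".to_string(), Type::Number), ("y".to_string(), Type::Number)],
    }
}

fn main() {
    let point = table(Some("Point"));
    let size = table(Some("Size"));
    let anon = table(None);
    println!("{}", point.is_compatible_with(&size)); // false: different names
    println!("{}", point.is_compatible_with(&point)); // true
    println!("{}", anon.is_compatible_with(&point)); // true: structural fallback
}
```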
Field Access on Named Classes
The Expression::FieldAccess arm of infer_type already handles Type::Table:
```rust
Expression::FieldAccess { object, field } => {
    let obj_type = infer_type(db, source, object.clone(), env.clone());
    match obj_type {
        Type::Table { fields } => {
            fields.iter()
                .find(|(name, _)| name == field)
                .map(|(_, ty)| ty.clone())
                .unwrap_or(Type::Dynamic)
        }
        Type::Dynamic => Type::Dynamic,
        _ => Type::Error,
    }
}
```
This works for named classes automatically — a ---@class Point produces a Type::Table { fields: [("x", Number), ("y", Number)] }, and field access on it returns the field’s type. No changes needed.
The type checker now knows the field types of class-typed variables, which is what catching field-name typos builds on:
```lua
---@class Point
---@field x number
---@field y number

---@type Point
local p = { x = 1, y = 2 }
print(p.x) -- Number ✓ (resolved via resolve_type_by_name)
print(p.z) -- Dynamic (field not found; a real checker would emit a diagnostic)
```
Before the resolve_type_by_name step, ---@type Point would have resolved to Dynamic, and even p.x would return Dynamic. The resolution step is what makes class-typed variables actually useful for field access.
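The chapter's FieldAccess arm returns Dynamic for a missing field, so p.z passes silently. A sketch of how a stricter checker could report it, using a hypothetical field_access helper that returns the field type plus an optional diagnostic message (a String standing in for the chapter's Diagnostic type):

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Number,
    String,
    Dynamic,
    Error,
    Table { fields: Vec<(String, Type)> },
}

// Hypothetical variant of the FieldAccess arm: same result types as the
// chapter, plus a message where a stricter checker would emit a diagnostic.
fn field_access(obj: &Type, field: &str) -> (Type, Option<String>) {
    match obj {
        Type::Table { fields } => match fields.iter().find(|(name, _)| name == field) {
            Some((_, ty)) => (ty.clone(), None),
            None => (Type::Dynamic, Some(format!("unknown field '{field}'"))),
        },
        Type::Dynamic => (Type::Dynamic, None),
        _ => (Type::Error, Some(format!("cannot access field '{field}' on a non-table value"))),
    }
}

fn main() {
    let point = Type::Table {
        fields: vec![("x".to_string(), Type::Number), ("y".to_string(), Type::Number)],
    };
    println!("{:?}", field_access(&point, "x")); // (Number, None)
    println!("{:?}", field_access(&point, "z"));
}
```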
What We’re Simplifying
No type narrowing yet. A real gradual type checker would narrow union types after type checks: if type(x) == "number" then ... x + 1 ... end — inside the if, x is known to be Number. Our type checker doesn’t track control flow yet. This is the biggest missing feature for a production checker, and it’s where most of the complexity lives. Chapter 15 adds narrowing — the if type(x) == "number" pattern is exactly what it handles.
No ---@type on table constructors. When you write ---@type Point above local p = { x = 1, y = 2 }, the type checker should verify that the table constructor matches the Point class: all required fields present, all field types compatible. Our checker applies the annotation to p but doesn't validate the constructor against it.
No optional fields. LuaCATS supports ---@field x? number (field might be absent). Our Type::Table has no representation for optional fields — every field is required. Adding optional fields is straightforward: change fields: Vec<(String, Type)> to fields: Vec<(String, Type, bool)> where the bool indicates optionality.
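Combining the two missing pieces, here is a sketch that checks a table constructor's inferred fields against a class described by the (String, Type, bool) triples from the previous paragraph. check_constructor is a hypothetical helper, and plain equality (with Dynamic accepted) stands in for is_compatible_with to keep it short. Extra fields that the class doesn't declare are not flagged.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Number,
    String,
    Dynamic,
}

// Hypothetical check of a table constructor against a class. Each class
// field is (name, type, optional); optional = true models ---@field name? type.
fn check_constructor(class: &[(String, Type, bool)], ctor: &[(String, Type)]) -> Vec<String> {
    let mut errors = Vec::new();
    for (name, expected, optional) in class {
        match ctor.iter().find(|(n, _)| n == name) {
            Some((_, actual)) => {
                // Equality stands in for is_compatible_with; Dynamic passes.
                if actual != expected && *actual != Type::Dynamic {
                    errors.push(format!("field '{name}': expected {expected:?}, found {actual:?}"));
                }
            }
            None if !*optional => errors.push(format!("missing required field '{name}'")),
            None => {}
        }
    }
    errors
}

fn main() {
    // ---@class Point, ---@field x number, ---@field y number, ---@field label? string
    let point = vec![
        ("x".to_string(), Type::Number, false),
        ("y".to_string(), Type::Number, false),
        ("label".to_string(), Type::String, true),
    ];
    // local p = { x = 1 }
    let ctor = vec![("x".to_string(), Type::Number)];
    for err in check_constructor(&point, &ctor) {
        println!("{err}"); // missing required field 'y'
    }
}
```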
Structural, not nominal typing. Two classes with the same fields are compatible. Nominal typing (where class names matter) is a design choice, not a bug — but it’s worth knowing the tradeoff.
Next: Chapter 12: Generic Functions — ---@generic lets you write type-parameterized functions like ---@generic T on function identity(x). We’ll parse generic annotations, represent type variables in our type system, and substitute them during inference.