Chapter 13: Function Types — Specifying What Goes In and What Comes Out
In Chapter 12, we wrote a generic map function with this annotation:
---@param f fun(x: T): U
But our parser couldn’t handle fun(x: T): U. It saw “fun” — not a known type name — and fell through to Dynamic. The type checker never learned what the callback does. f was Dynamic, and map couldn’t connect the callback’s parameter type to the array’s element type.
This chapter fixes that. We add a parser for function type syntax, which lets us express higher-order functions with full type precision.
The Problem: fun(...) Falls Through to Dynamic
Our parse_type_name function from Chapter 10 handles simple names:
#![allow(unused)]
fn main() {
fn parse_type_name(s: &str) -> Type {
match s.trim() {
"number" => Type::Number,
"string" => Type::String,
// ... other known types
_ if looks_like_type_var(s) => Type::Var(s.to_string()),
_ => Type::Dynamic, // ← "fun(x: T): U" hits this branch
}
}
}
The fun(...) syntax isn’t a simple name — it’s a structured type with parameters and a return type. We need a recursive parser that can handle it.
The Grammar
LuaCATS function type syntax:
fun_type = "fun" "(" param_list ")" ":" type
param_list = param ("," param)*
param = name ":" type | type
type = simple_name | fun_type | union_type
union_type = type ("|" type)*
Examples and what they produce:
| Annotation | Parsed Type |
|---|---|
fun(x: number): string | Function([Number], String) |
fun(x: T): U | Function([Var("T")], Var("U")) — Var("U") is the return type, not a param |
fun(number): boolean | Function([Number], Boolean) — unnamed params |
fun(f: fun(T): U): table | Function([Function([Var("T")], Var("U"))], Dynamic) — nested |
fun(x: number, y: string): boolean | Function([Number, String], Boolean) — multiple params |
The parser is recursive: fun(f: fun(T): U): table has a function type inside a function type. This is how you express higher-order functions — functions that take other functions as arguments.
Step 1: From parse_type_name to parse_type
The old function handled one thing: simple names. The new function handles three:
#![allow(unused)]
fn main() {
fn parse_type(s: &str) -> Type {
let s = s.trim();
// Function type: fun(...): return_type
if s.starts_with("fun(") || s.starts_with("fun (") {
return parse_function_type(s);
}
// Union type: "number|string"
//
// Caveat: this naive split on '|' breaks on nested unions inside
// function types, e.g. "fun(x: number|string): boolean". The '|' inside
// the function type's parameter gets split incorrectly. A production
// parser would track parenthesis depth and only split on '|' at depth 0
// — the same pattern we use for commas in parse_param_types below.
// For the annotations in this tutorial (no nested unions in function
// params), the naive split works. See "What We're Simplifying" at the end.
if s.contains('|') {
let variants: Vec<Type> = s.split('|')
.map(|part| parse_type(part.trim()))
.collect();
return Type::union(variants);
}
// Simple type name (the old parse_type_name logic)
parse_type_name(s)
}
}
The order matters: fun( must be checked first because function type syntax can contain | in its return type (like fun(T): U|nil). If we checked unions first, we’d split incorrectly on the | inside the return type. Then check for top-level unions, then fall back to simple names.
Wait — does fun(T): U|nil work with our parser? Let’s trace through it:
s = "fun(T): U|nil"— starts withfun(, so we callparse_function_typeparse_function_typefinds the matching), extractsTas the param- It finds
: U|nilafter the closing paren - It calls
parse_type("U|nil")for the return type - That hits the union branch:
parse_type("U")→Var("U"),parse_type("nil")→Nil - Result:
Function([Var("T")], Union([Var("U"), Nil]))
It works because the return type is parsed by the same parse_type function — recursion handles the union automatically.
Step 2: Parsing the Function Type
The function type parser has three jobs:
- Find the matching closing paren (respecting nesting)
- Parse parameter types from the content inside the parens
- Parse the return type after the colon
#![allow(unused)]
fn main() {
fn parse_function_type(s: &str) -> Type {
// Skip "fun" and optional space
let after_fun = if s.starts_with("fun(") { &s[3..] } else { &s[4..] };
// Find matching closing paren
let paren_content = match find_matching_paren(after_fun) {
Some(content) => content,
None => return Type::Dynamic, // malformed
};
// Find the return type after the closing paren
// Byte offset of the character after the closing ')'
let after_paren = if s.starts_with("fun(") {
4 + paren_content.len() + 1 // "fun(" + content + ")"
} else {
5 + paren_content.len() + 1 // "fun (" + content + ")"
};
let return_type = if let Some(rest) = s.get(after_paren..) {
let rest = rest.trim();
if let Some(rest) = rest.strip_prefix(':') {
parse_type(rest)
} else {
Type::Dynamic // no return type → Dynamic
}
} else {
Type::Dynamic
};
// Parse parameter types from paren_content
let params = parse_param_types(paren_content);
Type::Function { params, ret: Box::new(return_type) }
}
}
Finding the matching paren. We can’t find the first ) and call it done — function types can be nested. fun(f: fun(T): U): V has two closing parens, and we need the outer one. So we count nesting depth:
#![allow(unused)]
fn main() {
fn find_matching_paren(s: &str) -> Option<&str> {
if !s.starts_with('(') { return None; }
let mut depth = 1;
let mut i = 1;
while i < s.len() && depth > 0 {
match s.as_bytes()[i] {
b'(' => depth += 1,
b')' => depth -= 1,
_ => {}
}
i += 1;
}
if depth == 0 { Some(&s[1..i-1]) } else { None }
}
}
Step 3: Parameter Parsing — Named vs Unnamed
LuaCATS supports both named and unnamed parameters:
fun(x: number): string — named: x is number
fun(number): string — unnamed: only the type
For type checking, we only care about the type — the name is documentation. Our parser handles both:
#![allow(unused)]
fn main() {
fn parse_param_type(s: &str) -> Type {
let s = s.trim();
// Find the colon that isn't inside parens
let mut depth = 0;
for (i, c) in s.char_indices() {
match c {
'(' => depth += 1,
')' => depth -= 1,
':' if depth == 0 => {
// Named param: "x: type" — take everything after the colon
return parse_type(&s[i + 1..]);
}
_ => {}
}
}
// Unnamed param: the whole thing is a type
parse_type(s)
}
}
Why track depth when looking for the colon? Because a parameter could be a function type: f: fun(T): U. The first colon (after f) is the name separator. The second colon (after fun(T)) is inside the function type. We only want the first one at depth 0.
Step 4: Commas Need Depth-Tracking Too
Parameter lists are comma-separated: fun(x: number, y: string): boolean. But a parameter can be a function type: fun(f: fun(T): U, x: T): U. The comma between U and x is a parameter separator. But there’s no comma inside fun(T): U — so a naive split(',') would break.
We split on commas at depth 0:
#![allow(unused)]
fn main() {
fn parse_param_types(s: &str) -> Vec<Type> {
if s.trim().is_empty() {
return Vec::new(); // fun() — no parameters
}
let mut params = Vec::new();
let mut depth = 0;
let mut start = 0;
for (i, c) in s.char_indices() {
match c {
'(' => depth += 1,
')' => depth -= 1,
',' if depth == 0 => {
let param = s[start..i].trim();
if !param.is_empty() {
params.push(parse_param_type(param));
}
start = i + 1;
}
_ => {}
}
}
// Last param (after the final comma, or the only param if no commas)
let last = s[start..].trim();
if !last.is_empty() {
params.push(parse_param_type(last));
}
params
}
}
This pattern — track depth, split at depth 0 — shows up everywhere in recursive parsing. It’s the same technique you’d use for parsing nested parentheses in any language.
The Result: map Finally Has a Real Callback Type
Before (Chapter 12):
---@param f fun(x: T): U
-- parse_type_name("fun(x: T): U") → Dynamic
-- f's type: Dynamic (useless)
After (this chapter):
---@param f fun(x: T): U
-- parse_type("fun(x: T): U") → Function([Var("T")], Var("U"))
-- f's type: Function([Var("T")], Var("U"))
Now when map is called with a fun(number): string callback, the substitution correctly maps T → Number and U → String. The type checker can verify that the callback’s parameter type matches the array’s element type.
Higher-Order Function Composition
A higher-order function is a function that takes another function as an argument or returns one as a result. map is a higher-order function — it takes a callback. So is compose, which takes two functions and returns a new one. Function types in annotations let us express more than map alone. Here’s composition:
---@generic T, U, V
---@param f fun(x: U): V
---@param g fun(x: T): U
---@return fun(x: T): V
local function compose(f, g)
return f
end
The return type fun(x: T): V is a function type with type variables. When we call compose with concrete functions:
f: fun(string): booleang: fun(number): string
The substitution maps T → Number, U → String, V → Boolean, and the return type resolves to fun(number): boolean — exactly what you’d expect from composing those two functions.
This is the power of combining generics (Chapter 12) with function types (this chapter): the type checker can reason about higher-order functions end-to-end, not treat every callback as Dynamic by default.
What We’re Simplifying
No table types in function params. You can’t write fun(t: {x: number}): number yet — our parser doesn’t handle inline table types inside function annotations. You’d need to define a ---@class and use the class name instead.
No optional parameters. LuaCATS supports fun(x?: number): string for optional parameters. We don’t parse the ? syntax.
No variadic parameters. fun(...: number) is valid LuaCATS but not handled here.
No overloads. fun(number): string | fun(string): number (multiple signatures for the same function) requires more work — you’d need to parse it as a union of function types and check each branch during inference.
Union splitting doesn’t track depth. The parse_type function splits on | with s.split('|'), which doesn’t respect nesting. This works for number|string and even number|fun(x: number): boolean (no | inside the function type). But number|fun(x: number|string): boolean would split incorrectly on the | inside the function type’s parameter. A depth-aware split — the same pattern we use for commas and colons — would fix this. For the annotations in this tutorial, the naive split is sufficient; if you’re building a production parser, add depth-tracking to the union branch.
These are all solvable with the same recursive parsing approach — each adds more branches to parse_type.
Running
cargo run --bin ch13-function-types
This chapter has 10 test cases covering the parser, annotation extraction, and substitution with function types:
cargo test --bin ch13-function-types
Next: Chapter 14: Recursive Types and Self-Reference — A linked list node that references itself. A tree where each branch is another tree. These types show up constantly in real Lua code — and they break naive type checking because typeof(field) points back to the type we’re still defining. We’ll handle self-reference with named type references — instead of expanding Node forever, we stop at Ref("Node") and compare by name.