Chapter 3: Interned Symbols — Fast Comparison by Giving Names Identity
Our AST uses String for every identifier. That works — until you realize what happens at scale. Every name lookup compares strings byte-by-byte. "print" == "print" walks five characters on both sides. In a file with hundreds of references, that adds up. And when Salsa caches a query result keyed by a string, it hashes the entire string to check the cache.
The Fix: Interning
Compilers solve this with interning: allocate each unique string once, and refer to it by ID. The first time you see "print", you store it and get back ID 42. Every subsequent "print" maps to the same ID 42. Name comparison becomes a single integer comparison — O(1), no string walking, no hashing. Two symbols are equal if and only if they have the same database-assigned ID.
This isn’t a new idea — Lisp systems have done it since the 1960s. Salsa gives you a first-class way to do it.
Without interning: With interning:
"print" == "print" Symbol(42) == Symbol(42)
│ │ │ │
▼ ▼ ▼ ▼
p == p → yes 42 == 42 → yes
r == r → yes Done. One comparison.
i == i → yes
n == n → yes
t == t → yes
5 character comparisons 1 integer comparison
(and a hash for Salsa caching) (and no hash — the ID IS the hash)
Same answer. Different cost. When your type checker compares thousands of names per file, that difference compounds.
#[salsa::interned] vs #[salsa::input]
Both live in the database. Both get IDs. But they’re different:
#[salsa::input] | #[salsa::interned] | |
|---|---|---|
| Created by | Outside code (your main function) | Inside tracked functions |
| Deduplication | No — each new() call creates a new entry | Yes — same field values → same ID |
| Mutable? | Yes — setters for each field | No — immutable once created |
| Lifecycle | Survives across revisions | Survives across revisions |
| Lifetime | No 'db parameter | Has 'db lifetime |
Inputs are for data that comes from outside the system and can change (source files, configuration). Interned values are for data that’s derived inside queries and never changes (identifiers, keywords).
Step 1: Define an Interned Symbol
#![allow(unused)]
fn main() {
#[salsa::interned(debug)]
pub struct Symbol<'db> {
#[returns(ref)]
pub text: String,
}
}
The 'db lifetime ties the symbol to the database — you can’t hold a Symbol after the database is dropped. The #[returns(ref)] on text means symbol.text(db) returns &str instead of String, avoiding a clone.
When you call Symbol::new(db, "print".to_string()):
- Salsa checks: have I seen
"print"before? - If yes → return the existing ID. No allocation.
- If no → store the string, assign a new ID, return it.
Step 2: The Guarantee
#![allow(unused)]
fn main() {
let s1 = Symbol::new(&db, "print".to_string());
let s2 = Symbol::new(&db, "print".to_string());
let s3 = Symbol::new(&db, "write".to_string());
assert_eq!(s1, s2); // same string → same ID → equal
assert_ne!(s1, s3); // different string → different ID → not equal
}
s1 == s2 is a single integer comparison. Not a string comparison. In a type checker doing millions of name lookups, this matters.
Step 3: Definitions and the Symbol Table
Before we can look up names, we need to represent what a definition is — the thing a name refers to. A definition has a name, a kind (variable, function, parameter), and whether it’s local:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct Definition {
pub name: String,
pub is_local: bool,
pub kind: DefKind,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, salsa::Update)]
pub enum DefKind {
Variable,
Function,
Parameter,
}
}
Why salsa::Update on these? Because they’re stored inside tracked query results (like Vec<Definition>). Salsa needs to compare old and new values when inputs change, and salsa::Update provides the recursive equality check. If you forget this derive, the compiler will catch it — you’ll get an error when you use Definition in a tracked function’s return type.
Why String instead of Symbol<'db> in Definition? That’s the design tension we’ll get to in a moment.
Now we can build the symbol table itself:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn symbol_table(db: &dyn salsa::Database, source: SourceFile) -> Vec<Definition> {
let ast = parse(db, source);
let mut defs = Vec::new();
for stmt in &ast.statements {
match stmt {
Statement::Assignment { local, targets, .. } => {
for target in targets {
if let Expression::Name(name) = target {
defs.push(Definition {
name: name.clone(),
is_local: *local,
kind: DefKind::Variable,
});
}
}
}
Statement::Function { local, name, params, .. } => {
defs.push(Definition {
name: name.clone(),
is_local: *local,
kind: DefKind::Function,
});
for param in params {
defs.push(Definition {
name: param.clone(),
is_local: true,
kind: DefKind::Parameter,
});
}
}
_ => {}
}
}
defs
}
}
This walks the AST and collects every name binding — local variables, function names, and function parameters. It’s a derived query: it depends on parse(), so when the source changes, the symbol table is recomputed.
Step 4: Looking Up Names with Interned Comparison
Now we can look up a name using interned comparison:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn lookup_name(db: &dyn salsa::Database, source: SourceFile, name: String) -> Option<Definition> {
// Intern the lookup name — this gives us an integer ID for fast comparison
let symbol = Symbol::new(db, name.clone());
let defs = symbol_table(db, source);
for def in &defs {
// Intern each definition name, then compare Symbols by ID
let def_symbol = Symbol::new(db, def.name.clone());
if def_symbol == symbol {
return Some(def.clone());
}
}
None
}
}
Each comparison is a single integer equality check — Symbol’s Eq impl compares the underlying Salsa IDs, not the string contents. The == on two Symbol values compiles to the same thing as comparing two u32 values.
You might notice: for a linear scan, interning doesn’t save much. We’re converting each String to a Symbol on every call, and the scan itself is O(n). The real payoff comes when the symbol table is indexed — a HashMap<Symbol, Definition> where the key is the interned ID. Symbol implements Hash via its Salsa ID, so HashMap lookups are integer hashes, not string hashes. But there’s a catch: building a HashMap for a single lookup is more work than a linear scan. The pattern only pays off when the map is cached as its own tracked query, so multiple lookups reuse it. Chapter 4 adds that cached map.
Why not use a HashMap<String, Definition> now and skip interning? Because String hashing walks every byte. Symbol hashing is a single integer. In a file with hundreds of lookups, that difference compounds. And there’s a conceptual benefit: with String, “equality” means “same bytes.” With Symbol, equality means “same identity” — the database said these are the same name. In practice, for well-formed Lua, the distinction doesn’t matter (“print” is always “print”). But the mental model shift matters: names have identity, not mere spelling.
We’re setting up the infrastructure here — the actual HashMap-indexed lookup arrives in Chapter 4, where the symbol table becomes a cached tracked query.
The design tension. You might wonder: why not store Symbol<'db> directly in Definition instead of String? Then we wouldn’t need to intern the definition names at map-construction time. The problem: Definition is returned from a tracked function, so Salsa stores it inside the database. Stored values can’t hold references into the database — and Symbol<'db> is exactly that (the 'db lifetime means “borrowed from the database”). You can’t store a database reference inside the database. This is a real constraint in Salsa — interned values give you fast comparison, but they can’t be stored in other tracked query results. Chapter 4’s tracked structs solve this: they’re first-class database entities, not plain data, so they CAN hold interned values.
When to Intern vs When Not To
Intern: identifiers compared frequently (variable names, function names, type names). The whole point is deduplication + fast comparison.
Don’t intern: one-off strings, large strings (you’d waste memory storing them forever), strings rarely compared. Not everything needs to be interned because you can.
A good rule of thumb: if you’d put it in a HashSet or use it as a HashMap key, intern it. If you’d display it once and forget it, don’t.
Running
cargo run --bin ch03-interned-symbols
Next: Chapter 4: Tracked Structs — The backbone of a Salsa-aware AST. We’ll convert our String-based AST into one that uses Salsa IDs, giving each AST node stable identity that survives edits.