Chapter 3: Interned Symbols and Name Resolution
Use #[salsa::interned] to deduplicate identifiers and enable fast comparison by ID.
What You'll Learn
- What interning is and why it matters for identifiers
- How #[salsa::interned] works vs #[salsa::input]
- How to build a symbol table (name → definition) as a tracked query
- Why interning enables fast comparison by ID for names
The Problem: String Comparison Is Expensive
In Chapter 2, our AST used String for every identifier. When the type checker looks up a variable name, it compares strings byte-by-byte. In a real codebase with thousands of name lookups, that adds up.
Compilers solve this with interning: allocate each unique string once, and refer to it by ID. The first time you see "print", you store it and get back ID 42. Every subsequent "print" maps to the same ID 42. Now name comparison is a single integer comparison.
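The idea can be sketched in plain Rust without Salsa. The `Interner` and `SymbolId` names below are hypothetical, chosen only for illustration. This is not Salsa's API, and Salsa's real storage works differently.

```rust
use std::collections::HashMap;

/// Hypothetical ID type for this sketch: a small, copyable handle to an interned string.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// A toy string interner (illustration only, not how Salsa stores data).
/// For simplicity it keeps two copies of each string: one as a map key, one in the list.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, SymbolId>, // text -> ID, used to deduplicate
    texts: Vec<String>,             // ID -> text, used to resolve an ID back to its string
}

impl Interner {
    /// Returns the existing ID for `text`, or stores the text and assigns the next ID.
    pub fn intern(&mut self, text: &str) -> SymbolId {
        if let Some(&id) = self.ids.get(text) {
            return id; // seen before: nothing new is stored
        }
        let id = SymbolId(self.texts.len() as u32);
        self.texts.push(text.to_owned());
        self.ids.insert(text.to_owned(), id);
        id
    }

    /// Maps an ID back to the text it was created from.
    pub fn resolve(&self, id: SymbolId) -> &str {
        &self.texts[id.0 as usize]
    }
}
```

Calling `intern("print")` twice on the same `Interner` yields the same `SymbolId`, so comparing the two results compares two `u32` values rather than two strings.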
This isn't a new idea — Lisp systems have done it since the 1960s. Salsa gives you a first-class way to do it.
#[salsa::interned] vs #[salsa::input]
Both live in the database. Both get IDs. But they're different:
| | #[salsa::input] | #[salsa::interned] |
|---|---|---|
| Created by | Outside code (your main function) | Inside tracked functions |
| Deduplication | No — each new() call creates a new entry | Yes — same field values → same ID |
| Mutable? | Yes — setters for each field | No — immutable once created |
| Lifecycle | Survives across revisions | Survives across revisions |
| Lifetime | No 'db parameter | Has 'db lifetime |
Inputs are for data that comes from outside the system and can change (source files, configuration). Interned values are for data that's derived inside queries and never changes (identifiers, keywords).
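To make the "Deduplication" and "Mutable?" rows concrete, here is a toy model in plain Rust. `InputStore` and `InternStore` are hypothetical stand-ins, not Salsa types. They only mimic the ID behavior described in the table and say nothing about how Salsa tracks revisions.

```rust
use std::collections::HashMap;

/// Toy stand-in for input-like storage: every `new_entry` call creates a fresh
/// entry, and an existing entry can be overwritten later.
#[derive(Default)]
pub struct InputStore {
    values: Vec<String>,
}

impl InputStore {
    /// Always allocates a new ID, even if an equal string is already stored.
    pub fn new_entry(&mut self, text: &str) -> usize {
        self.values.push(text.to_owned());
        self.values.len() - 1
    }

    /// Overwrites the value behind an existing ID (the "setter").
    pub fn set(&mut self, id: usize, text: &str) {
        self.values[id] = text.to_owned();
    }

    pub fn get(&self, id: usize) -> &str {
        &self.values[id]
    }
}

/// Toy stand-in for interned storage: equal values share one ID,
/// and there is deliberately no setter.
#[derive(Default)]
pub struct InternStore {
    ids: HashMap<String, usize>,
}

impl InternStore {
    /// Returns the ID already assigned to `text`, or assigns the next free one.
    pub fn intern(&mut self, text: &str) -> usize {
        let next = self.ids.len();
        *self.ids.entry(text.to_owned()).or_insert(next)
    }
}
```

In this model, two `new_entry("main.rs")` calls produce two distinct IDs whose contents can later diverge through `set`, while two `intern("print")` calls produce the same ID.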
Step 1: Define an Interned Symbol
    #[salsa::interned(debug)]
    pub struct Symbol<'db> {
        #[returns(ref)]
        pub text: String,
    }
The 'db lifetime ties the symbol to the database: you can't hold a Symbol after the database is dropped. The #[returns(ref)] on text means symbol.text(db) returns a reference (&String, which can be used wherever a &str is expected) instead of an owned String, avoiding a clone.
When you call Symbol::new(db, "print".to_string()):
- Salsa checks: have I seen "print" before?
- If yes → return the existing ID. No allocation.
- If no → store the string, assign a new ID, return it.
Step 2: The Guarantee
    let s1 = Symbol::new(&db, "print".to_string());
    let s2 = Symbol::new(&db, "print".to_string());
    let s3 = Symbol::new(&db, "write".to_string());

    assert_eq!(s1, s2); // same string → same ID → equal
    assert_ne!(s1, s3); // different string → different ID → not equal
s1 == s2 is a single integer comparison. Not a string comparison. In a type checker doing millions of name lookups, this matters.
Step 3: Symbol Table as a Tracked Query
Now we can build a symbol table that uses interned comparison:
    #[salsa::tracked]
    pub fn lookup_name(
        db: &dyn salsa::Database,
        source: SourceFile,
        name: String,
    ) -> Option<Definition> {
        let symbol = Symbol::new(db, name.clone()); // intern the lookup name
        let defs = symbol_table(db, source);
        for def in defs {
            let def_symbol = Symbol::new(db, def.name.clone()); // intern each def name
            if def_symbol == symbol { // integer comparison!
                return Some(def);
            }
        }
        None
    }
In this chapter's code, we're still using String in the AST (we'll switch to Symbol in the AST itself in Chapter 4). But the lookup already uses interned comparison at the boundary.
When to Intern vs When Not To
Intern: identifiers compared frequently (variable names, function names, type names). The whole point is deduplication + fast comparison.
Don't intern: one-off strings, large strings (you'd waste memory storing them forever), strings rarely compared. Not everything needs to be interned just because you can.
A good rule of thumb: if you'd put it in a HashSet or use it as a HashMap key, intern it. If you'd just display it once and forget it, don't.
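As a rough illustration of the HashMap-key rule of thumb, the sketch below keys a table by an interned ID instead of by the string itself. It is plain Rust with no Salsa dependency. `SymbolId`, `Interner`, `Definition` (with its `line` field), and `SymbolTable` are hypothetical names for this example and are not the types used in this chapter's code.

```rust
use std::collections::HashMap;

/// Hypothetical interned-name handle: just a number.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SymbolId(u32);

/// Minimal toy interner: equal strings map to the same `SymbolId`.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, SymbolId>,
}

impl Interner {
    pub fn intern(&mut self, text: &str) -> SymbolId {
        let next = SymbolId(self.ids.len() as u32);
        *self.ids.entry(text.to_owned()).or_insert(next)
    }
}

/// Hypothetical definition record; the `line` field is made up for the example.
#[derive(Debug, PartialEq)]
pub struct Definition {
    pub line: u32,
}

/// A table keyed by interned ID: hashing and comparing keys touches a `u32`,
/// not the bytes of the identifier.
#[derive(Default)]
pub struct SymbolTable {
    defs: HashMap<SymbolId, Definition>,
}

impl SymbolTable {
    pub fn define(&mut self, name: SymbolId, def: Definition) {
        self.defs.insert(name, def);
    }

    pub fn lookup(&self, name: SymbolId) -> Option<&Definition> {
        self.defs.get(&name)
    }
}
```

The string is hashed once, when it is interned. Every later `define` or `lookup` works on the small ID, which is where frequently compared names benefit and one-off strings do not.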
Running
cargo run --bin ch03-interned-symbols
Key Takeaways
- Interned = deduplicated. Same text → same ID. Always. This is how compilers handle identifiers efficiently.
- Comparison by ID, not by value. Two Symbol values with the same text have the same ID, so equality is O(1).
- Created inside queries. Unlike inputs (which come from outside), interned values are created inside tracked functions. They live in the database and persist across revisions.
- Lifetime-tied. Symbol<'db> can't outlive the database. This prevents stale references after database mutations.
- Inputs vs interned: different jobs. Inputs are mutable facts from outside. Interned values are immutable deduplicated data from inside. Don't confuse them.
What's Next
Chapter 4: Tracked Structs — The backbone of a Salsa-aware AST. We'll convert our String-based AST into one that uses Salsa IDs, giving each AST node stable identity that survives edits.