MLIR for Lox: Part 2 - Garbage Collection (From Scratch)
This guide starts from absolute zero and builds up GC understanding one concept at a time. No prior GC knowledge assumed.
Chapter 1: The Problem
What Happens to Memory Without GC?
When you allocate memory, it comes from a big pool called the heap. In C, you do:
void* obj = malloc(16); // Take 16 bytes from heap
// ... use obj ...
free(obj); // Put it back
The problem: When do you call free()?
Consider this Lox code:
var a = "hello";
var b = a;
a = nil;
// Is "hello" still needed? b still references it!
The string "hello" is referenced by b. If we freed it when a was set to nil, b would point to garbage.
The core question GC answers:
When is an object no longer needed, so it's safe to free?
Chapter 2: The Intuition
Think of it like a party at your house:
- Objects = guests
- References = guests holding a "ticket" to stay
- GC = you checking who still has a ticket
Guest A enters (allocate)
↓
Guest A gives ticket to Guest B (reference)
↓
Guest A leaves but Guest B still has ticket
↓
You check: "Does anyone have Guest A's ticket?"
↓
No? → Guest A can leave (free)
Key insight: An object is "live" if it's reachable from your program's variables. If nothing points to it, it's garbage.
Chapter 3: The Two Questions
GC must answer:
- What objects are "roots"? (starting points for reachability)
- What objects does each object reference? (follow the chain)
What Are Roots?
A "root" is something your program can directly access:
var x = 1; // 'x' is a root (global variable)
print x; // We can reach '1' from 'x'
Roots include:
- Global variables
- Local variables on the stack
- Temporaries in expressions (intermediate results)
What Is Reachability?
var a = "hello"; // "hello" is reachable from root 'a'
var b = a; // "hello" is reachable from 'b' too
a = nil; // Still reachable from 'b'
b = nil; // Now unreachable! Garbage!
Visual representation:
BEFORE: AFTER b = nil:
┌─────────┐ ┌─────────┐
│ root: a │──► "hello" │ root: a │──► nil
└─────────┘ └─────────┘
┌─────────┐ ┌─────────┐
│ root: b │──► "hello" │ root: b │──► nil
└─────────┘ └─────────┘
│
"hello" ← GARBAGE!
(nothing points to it)
Chapter 4: The Algorithm (Mark-Sweep)
The simplest GC algorithm is mark-sweep. It has two phases:
Phase 1: Mark
┌─────────────────────────────────────────────┐
│ MARK PHASE │
│ │
│ 1. Start with all roots │
│ 2. For each root, mark the object │
│ 3. Recursively mark objects it references │
│ 4. Repeat until nothing new to mark │
└─────────────────────────────────────────────┘
Phase 2: Sweep
┌─────────────────────────────────────────────┐
│ SWEEP PHASE │
│ │
│ 1. Walk all allocated objects │
│ 2. If marked: clear mark (still live) │
│ 3. If not marked: FREE IT (garbage) │
└─────────────────────────────────────────────┘
Visual Example
Before collection:
Objects in heap:
[A] ──► [B] ──► [C]
↑
root (global variable)
[D] ──► [E]
↑
nothing! (D was a local that went out of scope)
After mark phase:
[A*] ──► [B*] ──► [C*] (* = marked, still live)
↑
root
[D] ──► [E] (not marked, these are garbage)
After sweep phase:
[A] ──► [B] ──► [C] (live, marks cleared for next cycle)
[freed] [freed] (D and E are gone! memory reclaimed)
Chapter 5: The Implementation Challenge
Now we know WHAT to do. But HOW does the GC know:
- Where are the roots? (what variables are currently in scope)
- What does each object reference? (which other objects it points to)
These are the two hard problems. Let's tackle them one at a time.
Problem 1: Finding Roots
When the GC runs, your program is paused mid-execution. The stack has local variables. Those are roots.
fun example() {
var a = 1; // 'a' is on the stack → root
var b = 2; // 'b' is on the stack → root
// GC runs here!
// It needs to know about 'a' and 'b'
}
The challenge: In compiled code, variables are just CPU registers or stack slots. The GC doesn't know which ones are pointers vs integers.
Solutions:
- Conservative GC: Treat everything that LOOKS like a pointer as a pointer
- Precise GC: Tell the GC exactly where pointers are (this is what we'll do)
Problem 2: Finding Object References
Each object may reference other objects:
class Pair {
init(first, second) {
this.first = first; // this object references 'first'
this.second = second; // this object references 'second'
}
}
var pair = Pair(1, 2);
// The 'pair' object references two number objects
The challenge: The GC needs to know: given an object, what other objects does it point to?
Solution: Store type information with each object so the GC knows how to walk it.
Chapter 6: Our First Data Structure
Before writing any GC code, we need a way to represent objects. Every object needs:
- A header (metadata for the GC)
- The actual data (the object's fields)
┌────────────────────────────┐
│ Object Header │
│ - marked: bool │
│ - type: ObjType │
│ - size: usize │
├────────────────────────────┤
│ Object Data │
│ - field 1 │
│ - field 2 │
│ - ... │
└────────────────────────────┘
Why the Header?
When the GC walks all objects, it needs to:
- Know if an object is already marked (avoid infinite loops)
- Know what type it is (to find references)
- Know how big it is (for sweeping)
The Type Enum
Different Lox objects have different data:
| Lox Type | What It Stores | What It References |
|---|---|---|
| Number | f64 | Nothing |
| String | bytes | Nothing |
| Closure | function ptr, env | Environment variables |
| Instance | class ptr, fields | Field values |
Chapter 7: Let's Write Some Code
Now we understand the concepts. Let's implement the absolute minimum:
Step 1: Define the Object Header
#![allow(unused)] fn main() { // src/runtime/object.rs /// Type of a Lox object #[repr(u8)] #[derive(Debug, Clone, Copy, PartialEq)] pub enum ObjType { Number = 0, String = 1, Closure = 2, Instance = 3, } /// Header prepended to every heap object /// /// Memory layout: /// [header: ObjHeader][data: depends on ObjType] #[repr(C)] pub struct ObjHeader { /// Has this object been marked by the GC? pub marked: bool, /// What type of object is this? pub obj_type: ObjType, /// Size of the data portion (not including header) pub size: u32, /// Pointer to next object in the allocation list /// (the GC needs to walk all objects during sweep) pub next: Option<*mut ObjHeader>, } }
Step 2: The Global Object List
The GC needs to know about ALL objects to sweep them:
#![allow(unused)] fn main() { // src/runtime/gc.rs use std::cell::RefCell; /// All allocated objects, linked list style thread_local! { static ALL_OBJECTS: RefCell<Option<*mut ObjHeader>> = RefCell::new(None); static OBJECT_COUNT: RefCell<usize> = RefCell::new(0); } }
Why thread_local!? Because Lox is single-threaded, and this avoids passing a GC context everywhere.
Step 3: Allocation (Without GC Yet)
#![allow(unused)] fn main() { /// Allocate a new object /// /// This is like `malloc` but: /// 1. Adds a header for GC metadata /// 2. Tracks the object in a global list pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 { // Calculate total size: header + data let total_size = std::mem::size_of::<ObjHeader>() + size; // Allocate raw memory let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap(); let ptr = unsafe { std::alloc::alloc(layout) }; if ptr.is_null() { panic!("Out of memory!"); } // Write the header let header = ptr as *mut ObjHeader; unsafe { (*header).marked = false; (*header).obj_type = obj_type; (*header).size = size as u32; // Add to global list ALL_OBJECTS.with(|list| { (*header).next = *list.borrow(); *list.borrow_mut() = Some(header); }); } // Return pointer to DATA (after header) let data_ptr = unsafe { ptr.add(std::mem::size_of::<ObjHeader>()) }; // Increment count OBJECT_COUNT.with(|count| { *count.borrow_mut() += 1; }); data_ptr } }
Memory Layout Deep Dive
Let's trace through exactly what happens when we allocate:
#![allow(unused)] fn main() { // Calling: let ptr = alloc(16, ObjType::String); // Step 1: Calculate total size // sizeof(ObjHeader) = 16 bytes (bool + padding + u8 + padding + u32 + padding + pointer) // data_size = 16 bytes // total = 32 bytes // Step 2: Allocate raw memory // ptr = 0x1000 (example address) // Step 3: Write header at ptr // 0x1000: marked = false // 0x1001: obj_type = String (1) // 0x1004: size = 16 // 0x1008: next = previous head of list // Step 4: Return data pointer // data_ptr = 0x1010 (ptr + 16) }
Visual Memory Layout
Address Content
──────────────────────────────────────
0x1000 ┌─────────────────────────┐ ← Header starts here
0x1001 │ marked: false (1 byte) │
0x1002 │ padding (1 byte) │
0x1003 │ obj_type: 1 (1 byte) │
0x1004 │ padding (1 byte) │
0x1005 │ size: 16 (4 bytes) │
0x1008 │ padding (4 bytes) │
0x1010 │ next: 0x2000 (8 bytes) │ ← Points to previous allocation
├─────────────────────────┤
0x1018 │ │ ← Data starts here (returned ptr)
0x1020 │ Your 16 bytes of │
0x1028 │ string data │
0x1030 │ │
└─────────────────────────┘
Key insight: When you receive a pointer from alloc(), you can always find the header by subtracting 16 bytes (sizeof header).
Chapter 7.5: Practice Exercise
Before moving on, let's verify your understanding:
Exercise 1: Trace the Allocations
Given this code:
#![allow(unused)] fn main() { let a = alloc(8, ObjType::Number); // Allocates a number let b = alloc(24, ObjType::String); // Allocates a string // What does ALL_OBJECTS look like? // What are the header addresses for a and b? }
Click to reveal answer
ALL_OBJECTS → b's header → a's header → None
a's header:
- address: b's data ptr - 24 - 16 = depends on b's allocation
- size: 8
- type: Number
- next: None (it was first)
b's header:
- address: b_data_ptr - 16
- size: 24
- type: String
- next: a's header
The linked list is built in reverse order. Each new allocation becomes the new head.
Exercise 2: Why the Header?
Why do we need marked, obj_type, and size in the header? What would happen if we removed each one?
Click to reveal answer
-
Without
marked: We couldn't track which objects are live. The sweep phase wouldn't know what to free. We might free live objects or keep garbage forever. -
Without
obj_type: We couldn't walk object references. When marking a closure, we wouldn't know it has an environment to also mark. -
Without
size: We couldn't properly free memory (need to know how much to deallocate). We also couldn't debug allocation issues.
Exercise 3: Thread Safety
Why do we use thread_local! instead of a regular static?
Click to reveal answer
A regular static would be shared across all threads, requiring synchronization (mutexes, atomics). Since Lox is single-threaded, thread_local! gives us:
- No synchronization overhead
- Each thread could have its own GC (if we later add threads)
- Simpler code
If we used a regular static, we'd need static ALL_OBJECTS: Mutex<...> which adds overhead.
Checkpoint: Before Moving On
We have:
- Object header definition
- Global object list
- Allocation function
- Understanding of memory layout
Before we continue to Part 3, make sure you can answer:
- Where is the header relative to the returned pointer?
- How does the linked list of objects work?
- Why do we need each field in the header?
Next: How do we find roots during garbage collection?