Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

MLIR for Lox: Part 2 — Garbage Collection from Scratch — Because Lox Values Can’t Clean Up After Themselves

var a = "hello";
var b = a;
a = nil;
// Is "hello" still needed? b still references it!

When a was set to "hello", the runtime allocated memory for that string. When a was set to nil, the string is still reachable through b. Free it too early and b points to garbage. Free it too late and your program’s memory grows forever.

In C, you’d call free() yourself and get it wrong. In a language like Lox, the runtime needs to figure out when it’s safe — and that’s what garbage collection does.

The core question GC answers:

When is an object no longer needed, so it’s safe to free?

This part builds the foundation: the mark-sweep algorithm, object headers, allocation, and the global object list. Part 3 adds shadow stacks (for finding roots) and Part 4 wires the whole thing into the MLIR code generator so the compiler emits GC bookkeeping automatically. By the end of Part 4, you’ll have a real mark-sweep collector. The kind you’d find in a production interpreter, scaled down.

A note on the code examples. Parts 2 and 3 use Rust for their code examples because it’s more readable than the equivalent C — pattern matching, Option instead of null, and thread_local! instead of __thread globals all make the logic easier to follow. The actual runtime (Part 9) is C, and the data structures are the same: a linked list of ObjHeaders, a shadow stack of StackFrames. If you’re reading ahead, the C versions of alloc, gc_push_frame, gc_collect, and friends in Part 9 are the real implementations. The Rust here is the conceptual skeleton.


The Intuition

Think of it like a party at your house:

  1. Objects = guests
  2. References = guests holding a “ticket” to stay
  3. GC = you checking who still has a ticket
Guest A enters (allocate)
    ↓
Guest A gives ticket to Guest B (reference)
    ↓
Guest A leaves but Guest B still has ticket
    ↓
You check: "Does anyone have Guest A's ticket?"
    ↓
No? → Guest A can leave (free)

Key insight: An object is “live” if it’s reachable from your program’s variables. If nothing points to it, it’s garbage.


The Two Questions

GC must answer:

  1. What objects are “roots”? (starting points for reachability)
  2. What objects does each object reference? (follow the chain)

What Are Roots?

A “root” is something your program can directly access:

var x = 1;    // 'x' is a root (global variable)
print x;      // We can reach '1' from 'x'

Roots come from three places: global variables, local variables on the stack, and temporaries — the intermediate results that expressions produce while they’re being computed.

What Is Reachability?

var a = "hello";   // "hello" is reachable from root 'a'
var b = a;         // "hello" is reachable from 'b' too
a = nil;           // Still reachable from 'b'
b = nil;           // Now unreachable! Garbage!

Visual representation:

BEFORE:                      AFTER b = nil:

    ┌─────────┐                  ┌─────────┐
    │ root: a │──► "hello"       │ root: a │──► nil
    └─────────┘                  └─────────┘
    ┌─────────┐                  ┌─────────┐
    │ root: b │──► "hello"       │ root: b │──► nil
    └─────────┘                  └─────────┘
                                      │
                                 "hello" ← GARBAGE!
                                 (nothing points to it)

The Algorithm: Mark-Sweep

The simplest GC algorithm is mark-sweep. It has two phases:

Phase 1: Mark

┌─────────────────────────────────────────────┐
│ MARK PHASE                                  │
│                                             │
│   1. Start with all roots                   │
│   2. For each root, mark the object         │
│   3. Recursively mark objects it references │
│   4. Repeat until nothing new to mark       │
└─────────────────────────────────────────────┘

Phase 2: Sweep

┌─────────────────────────────────────────────┐
│ SWEEP PHASE                                 │
│                                             │
│   1. Walk all allocated objects             │
│   2. If marked: clear mark (still live)     │
│   3. If not marked: FREE IT (garbage)       │
└─────────────────────────────────────────────┘

Visual Example

Before collection:

Objects in heap:
  
  [A] ──► [B] ──► [C]
   ↑
  root (global variable)
  
  [D] ──► [E]
   ↑
  nothing! (D was a local that went out of scope)

After mark phase:

  [A*] ──► [B*] ──► [C*]    (* = marked, still live)
   ↑
  root
  
  [D] ──► [E]              (not marked, these are garbage)

After sweep phase:

  [A] ──► [B] ──► [C]       (live, marks cleared for next cycle)
  
  [freed] [freed]           (D and E are gone! memory reclaimed)

The Implementation Challenge

Now we know WHAT to do. But HOW does the GC know:

  1. Where are the roots? (what variables are currently in scope)
  2. What does each object reference? (which other objects it points to)

These are the two hard problems. Let’s tackle them one at a time.

Problem 1: Finding Roots

When the GC runs, your program is paused mid-execution. The stack has local variables. Those are roots.

fun example() {
    var a = 1;    // 'a' is on the stack → root
    var b = 2;    // 'b' is on the stack → root
    // GC runs here!
    // It needs to know about 'a' and 'b'
}

The challenge: In compiled code, variables are CPU registers or stack slots. The GC doesn’t know which ones are pointers vs integers.

Solutions: Conservative GC treats everything that looks like a pointer as a pointer — an integer whose value happens to equal a heap address will keep that object alive even though nothing actually references it. Simple, but you retain garbage you don’t need to. Precise GC tells the GC exactly where the pointers are — more work to set up, but no false positives. We’ll use precise GC.

Problem 2: Finding Object References

Each object may reference other objects:

class Pair {
    init(first, second) {
        this.first = first;   // this object references 'first'
        this.second = second; // this object references 'second'
    }
}

var pair = Pair(1, 2);
// The 'pair' object references two number objects

The challenge: The GC needs to know: given an object, what other objects does it point to?

Solution: Store type information with each object so the GC knows how to walk it.


Our First Data Structure

Before writing any GC code, we need a way to represent objects. Every object needs:

  1. A header (metadata for the GC)
  2. The actual data (the object’s fields)
┌────────────────────────────┐
│ Object Header              │
│   - marked: bool           │
│   - type: ObjType          │
│   - size: u32              │
├────────────────────────────┤
│ Object Data                │
│   - field 1                │
│   - field 2                │
│   - ...                    │
└────────────────────────────┘

Why the Header?

When the GC walks all objects, it needs to know whether an object is already marked (to avoid infinite loops), what type it is (to find references to other objects), and how big it is (so the sweep phase knows how much memory to free).

The Type Enum

Different Lox objects have different data:

Lox TypeWhat It StoresWhat It References
Numberf64Nothing
StringbytesNothing
Closurefunction ptr, envEnvironment variables
Instanceclass ptr, fieldsField values

Closure and Instance don’t appear until Parts 5 and 7. They’re listed here because the ObjType enum needs to account for every kind of object the runtime will handle — even types we haven’t built yet. Think of this table as the full map; we’ll fill in each row as we go.


Writing the Collector

Now we understand the concepts. Let’s implement the absolute minimum:

Step 1: Define the Object Header

#![allow(unused)]
fn main() {
// src/runtime/object.rs

/// Type of a Lox object
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Closure = 2,
    Instance = 3,
}
}

This enum grows as the tutorial progresses. Part 5 (Closures) adds Environment = 2 for closure capture, which shifts Closure to 3 and Instance to 4. Part 7 (Classes) adds Class = 5 and BoundMethod = 6. Each part shows the cumulative enum with explicit discriminants so you can see which values are in use.

#![allow(unused)]
fn main() {
/// Header prepended to every heap object
/// 
/// Memory layout:
///   [header: ObjHeader][data: depends on ObjType]
#[repr(C)]
pub struct ObjHeader {
    /// Has this object been marked by the GC?
    pub marked: bool,
    
    /// What type of object is this?
    pub obj_type: ObjType,
    
    /// Size of the data portion (not including header)
    pub size: u32,
    
    /// Pointer to next object in the allocation list
    /// (the GC needs to walk all objects during sweep)
    /// null = end of list
    pub next: *mut ObjHeader,
}
}

Step 2: The Global Object List

The GC needs to know about ALL objects to sweep them:

#![allow(unused)]
fn main() {
// src/runtime/gc.rs
use std::cell::RefCell;

/// All allocated objects, linked list style
thread_local! {
    static ALL_OBJECTS: RefCell<*mut ObjHeader> = RefCell::new(std::ptr::null_mut());
    static OBJECT_COUNT: RefCell<usize> = RefCell::new(0);
}
}

Why thread_local!? Because Lox is single-threaded, and this avoids passing a GC context everywhere.

Step 3: Allocation (Without GC Yet)

#![allow(unused)]
fn main() {
/// Allocate a new object
/// 
/// This is like `malloc` but:
///   1. Adds a header for GC metadata
///   2. Tracks the object in a global list
pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 {
    // Calculate total size: header + data
    let total_size = std::mem::size_of::<ObjHeader>() + size;
    
    // Allocate raw memory
    // Note: size is usize but we store it as u32 in the header,
    // which means allocations larger than 4 GB would silently
    // truncate. Fine for Lox objects — nothing in this language
    // needs that much contiguous memory.
    let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap();
    let ptr = unsafe { std::alloc::alloc(layout) };
    
    if ptr.is_null() {
        panic!("Out of memory!");
    }
    
    // Write the header
    let header = ptr as *mut ObjHeader;
    unsafe {
        (*header).marked = false;
        (*header).obj_type = obj_type;
        (*header).size = size as u32;
        
        // Add to global list
        ALL_OBJECTS.with(|list| {
            (*header).next = *list.borrow();
            *list.borrow_mut() = header;
        });
    }
    
    // Return pointer to DATA (after header)
    let data_ptr = unsafe { ptr.add(std::mem::size_of::<ObjHeader>()) };
    
    // Increment count
    OBJECT_COUNT.with(|count| {
        *count.borrow_mut() += 1;
    });
    
    data_ptr
}
}

Memory Layout Deep Dive

Let’s trace through exactly what happens when we allocate:

#![allow(unused)]
fn main() {
// Calling:
let ptr = alloc(16, ObjType::String);

// Step 1: Calculate total size
//   sizeof(ObjHeader) = 16 bytes (bool + u8 + 2 padding + u32 + pointer)
//   data_size = 16 bytes
//   total = 32 bytes

// Step 2: Allocate raw memory
//   ptr = 0x1000 (example address)

// Step 3: Write header at ptr
//   0x1000: marked = false
//   0x1001: obj_type = String (1)
//   0x1004: size = 16
//   0x1008: next = previous head of list (null if first)

// Step 4: Return data pointer
//   data_ptr = 0x1010 (ptr + 16)
}

Visual Memory Layout

With repr(C), the compiler lays out fields in declaration order with natural alignment. bool and u8 are both 1-byte aligned, so obj_type goes right after marked. Then 2 bytes of padding align size: u32 to a 4-byte boundary. next is a raw pointer (8 bytes, 8-byte aligned), which is already satisfied at offset 8.

Offset  Field                   Bytes
──────────────────────────────────────────
  +0    marked: false           1 byte
  +1    obj_type: String (1)    1 byte
  +2    (padding)               2 bytes  ← align size to 4
  +4    size: 16                4 bytes
  +8    next: 0x2000            8 bytes  ← next object in allocation list
 +16    ── data starts here ──
        Your 16 bytes of string data

Total header: 16 bytes. Data pointer returned by alloc() points to offset +16.

Why isn’t next bigger? The next field is a raw pointer (8 bytes), not Option<*mut ObjHeader> (which you might expect since the list needs a sentinel for “no next object”). Rust’s null pointer optimization means Option<*mut T> is the same size as a raw pointer — None maps to a null pointer, Some(ptr) maps to the pointer value. No extra byte for a discriminant. That’s why the header is 16 bytes, not 24. (Our code uses a raw *mut ObjHeader with null as the sentinel, which is the C-compatible equivalent of the same idea.)

Key insight: When you receive a pointer from alloc(), you can always find the header by subtracting 16 bytes (sizeof header).


Practice Exercise

Before moving on, let’s verify your understanding:

Exercise 1: Trace the Allocations

Given this code:

#![allow(unused)]
fn main() {
let a = alloc(8, ObjType::Number);   // Allocates a number
let b = alloc(24, ObjType::String);  // Allocates a string

// What does ALL_OBJECTS look like?
// What are the header addresses for a and b?
}
Click to reveal answer
ALL_OBJECTS → b's header → a's header → null

a's header:
  - address: a_data_ptr - 16  (header precedes data by sizeof(ObjHeader))
  - size: 8
  - type: Number
  - next: null (it was first)

b's header:
  - address: b_data_ptr - 16
  - size: 24
  - type: String
  - next: a's header

The linked list is built in reverse order. Each new allocation becomes the new head.

Exercise 2: Why the Header?

Why do we need marked, obj_type, and size in the header? What would happen if we removed each one?

Click to reveal answer
  • Without marked: We couldn’t track which objects are live. The sweep phase wouldn’t know what to free. We might free live objects or keep garbage forever.

  • Without obj_type: We couldn’t walk object references. When marking a closure, we wouldn’t know it has an environment to also mark.

  • Without size: We couldn’t properly free memory (need to know how much to deallocate). We also couldn’t debug allocation issues.

Exercise 3: What Breaks Without thread_local!?

Suppose we changed the global object list to a plain static:

#![allow(unused)]
fn main() {
static mut ALL_OBJECTS: *mut ObjHeader = std::ptr::null_mut();
}

What would go wrong if two threads ran Lox programs simultaneously? (Hint: think about the alloc function — specifically the line that updates next and the line that sets the list head.)

Click to reveal answer

Two problems:

  1. Data race on the linked list. Thread A allocates, sets (*header).next = current_head. Thread B allocates concurrently, does the same. Both read the same current_head and both write their header as the new head. One of them loses — its object is no longer reachable from the list, so it’ll never be freed. That’s a memory leak.

  2. Data race on the count. Both threads read OBJECT_COUNT, both increment, both write back. Final count is one less than the actual number of objects.

thread_local! avoids both: each thread gets its own list and its own counter. No sharing, no races, no locks needed.


Next: Part 3 — Finding Roots — Mark-sweep can free garbage, but it has a prerequisite: knowing which objects are still alive. In compiled code, that means knowing which stack slots hold pointers. We’ll build a shadow stack that tells the GC exactly where the roots are — and wire it into the MLIR code generator so the compiler emits root registration automatically.