MLIR for Lox: Part 2 - Garbage Collection (From Scratch)

This guide starts from absolute zero and builds up GC understanding one concept at a time. No prior GC knowledge assumed.


Chapter 1: The Problem

What Happens to Memory Without GC?

When you allocate memory, it comes from a big pool called the heap. In C, you do:

void* obj = malloc(16);  // Take 16 bytes from heap
// ... use obj ...
free(obj);                // Put it back

The problem: When do you call free()?

Consider this Lox code:

var a = "hello";
var b = a;
a = nil;
// Is "hello" still needed? b still references it!

The string "hello" is referenced by b. If we freed it when a was set to nil, b would point to garbage.

The core question GC answers:

When is an object no longer needed, so it's safe to free?


Chapter 2: The Intuition

Think of it like a party at your house:

  1. Objects = guests
  2. References = guests holding a "ticket" to stay
  3. GC = you checking who still has a ticket
Guest A enters (allocate)
    ↓
Guest A gives ticket to Guest B (reference)
    ↓
Guest A leaves but Guest B still has ticket
    ↓
You check: "Does anyone have Guest A's ticket?"
    ↓
No? → Guest A can leave (free)

Key insight: An object is "live" if it's reachable from your program's variables. If nothing points to it, it's garbage.


Chapter 3: The Two Questions

GC must answer:

  1. What objects are "roots"? (starting points for reachability)
  2. What objects does each object reference? (follow the chain)

What Are Roots?

A "root" is something your program can directly access:

var x = 1;    // 'x' is a root (global variable)
print x;      // We can reach '1' from 'x'

Roots include:

  • Global variables
  • Local variables on the stack
  • Temporaries in expressions (intermediate results)

What Is Reachability?

var a = "hello";   // "hello" is reachable from root 'a'
var b = a;         // "hello" is reachable from 'b' too
a = nil;           // Still reachable from 'b'
b = nil;           // Now unreachable! Garbage!

Visual representation:

BEFORE:                      AFTER b = nil:

    ┌─────────┐                  ┌─────────┐
    │ root: a │──► "hello"       │ root: a │──► nil
    └─────────┘                  └─────────┘
    ┌─────────┐                  ┌─────────┐
    │ root: b │──► "hello"       │ root: b │──► nil
    └─────────┘                  └─────────┘
                                      │
                                 "hello" ← GARBAGE!
                                 (nothing points to it)

Chapter 4: The Algorithm (Mark-Sweep)

The simplest GC algorithm is mark-sweep. It has two phases:

Phase 1: Mark

┌─────────────────────────────────────────────┐
│ MARK PHASE                                  │
│                                             │
│   1. Start with all roots                   │
│   2. For each root, mark the object         │
│   3. Recursively mark objects it references │
│   4. Repeat until nothing new to mark       │
└─────────────────────────────────────────────┘

Phase 2: Sweep

┌─────────────────────────────────────────────┐
│ SWEEP PHASE                                 │
│                                             │
│   1. Walk all allocated objects             │
│   2. If marked: clear mark (still live)     │
│   3. If not marked: FREE IT (garbage)       │
└─────────────────────────────────────────────┘

Visual Example

Before collection:

Objects in heap:
  
  [A] ──► [B] ──► [C]
   ↑
  root (global variable)
  
  [D] ──► [E]
   ↑
  nothing! (D was a local that went out of scope)

After mark phase:

  [A*] ──► [B*] ──► [C*]    (* = marked, still live)
   ↑
  root
  
  [D] ──► [E]              (not marked, these are garbage)

After sweep phase:

  [A] ──► [B] ──► [C]       (live, marks cleared for next cycle)
  
  [freed] [freed]           (D and E are gone! memory reclaimed)

Chapter 5: The Implementation Challenge

Now we know WHAT to do. But HOW does the GC know:

  1. Where are the roots? (what variables are currently in scope)
  2. What does each object reference? (which other objects it points to)

These are the two hard problems. Let's tackle them one at a time.

Problem 1: Finding Roots

When the GC runs, your program is paused mid-execution. The stack has local variables. Those are roots.

fun example() {
    var a = 1;    // 'a' is on the stack → root
    var b = 2;    // 'b' is on the stack → root
    // GC runs here!
    // It needs to know about 'a' and 'b'
}

The challenge: In compiled code, variables are just CPU registers or stack slots. The GC doesn't know which ones are pointers vs integers.

Solutions:

  • Conservative GC: Treat everything that LOOKS like a pointer as a pointer
  • Precise GC: Tell the GC exactly where pointers are (this is what we'll do)

Problem 2: Finding Object References

Each object may reference other objects:

class Pair {
    init(first, second) {
        this.first = first;   // this object references 'first'
        this.second = second; // this object references 'second'
    }
}

var pair = Pair(1, 2);
// The 'pair' object references two number objects

The challenge: The GC needs to know: given an object, what other objects does it point to?

Solution: Store type information with each object so the GC knows how to walk it.


Chapter 6: Our First Data Structure

Before writing any GC code, we need a way to represent objects. Every object needs:

  1. A header (metadata for the GC)
  2. The actual data (the object's fields)
┌────────────────────────────┐
│ Object Header              │
│   - marked: bool           │
│   - type: ObjType          │
│   - size: usize            │
├────────────────────────────┤
│ Object Data                │
│   - field 1                │
│   - field 2                │
│   - ...                    │
└────────────────────────────┘

Why the Header?

When the GC walks all objects, it needs to:

  • Know if an object is already marked (avoid infinite loops)
  • Know what type it is (to find references)
  • Know how big it is (for sweeping)

The Type Enum

Different Lox objects have different data:

Lox TypeWhat It StoresWhat It References
Numberf64Nothing
StringbytesNothing
Closurefunction ptr, envEnvironment variables
Instanceclass ptr, fieldsField values

Chapter 7: Let's Write Some Code

Now we understand the concepts. Let's implement the absolute minimum:

Step 1: Define the Object Header

#![allow(unused)]
fn main() {
// src/runtime/object.rs

/// Type of a Lox object
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Closure = 2,
    Instance = 3,
}

/// Header prepended to every heap object
/// 
/// Memory layout:
///   [header: ObjHeader][data: depends on ObjType]
#[repr(C)]
pub struct ObjHeader {
    /// Has this object been marked by the GC?
    pub marked: bool,
    
    /// What type of object is this?
    pub obj_type: ObjType,
    
    /// Size of the data portion (not including header)
    pub size: u32,
    
    /// Pointer to next object in the allocation list
    /// (the GC needs to walk all objects during sweep)
    pub next: Option<*mut ObjHeader>,
}
}

Step 2: The Global Object List

The GC needs to know about ALL objects to sweep them:

#![allow(unused)]
fn main() {
// src/runtime/gc.rs
use std::cell::RefCell;

/// All allocated objects, linked list style
thread_local! {
    static ALL_OBJECTS: RefCell<Option<*mut ObjHeader>> = RefCell::new(None);
    static OBJECT_COUNT: RefCell<usize> = RefCell::new(0);
}
}

Why thread_local!? Because Lox is single-threaded, and this avoids passing a GC context everywhere.

Step 3: Allocation (Without GC Yet)

#![allow(unused)]
fn main() {
/// Allocate a new object
/// 
/// This is like `malloc` but:
///   1. Adds a header for GC metadata
///   2. Tracks the object in a global list
pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 {
    // Calculate total size: header + data
    let total_size = std::mem::size_of::<ObjHeader>() + size;
    
    // Allocate raw memory
    let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap();
    let ptr = unsafe { std::alloc::alloc(layout) };
    
    if ptr.is_null() {
        panic!("Out of memory!");
    }
    
    // Write the header
    let header = ptr as *mut ObjHeader;
    unsafe {
        (*header).marked = false;
        (*header).obj_type = obj_type;
        (*header).size = size as u32;
        
        // Add to global list
        ALL_OBJECTS.with(|list| {
            (*header).next = *list.borrow();
            *list.borrow_mut() = Some(header);
        });
    }
    
    // Return pointer to DATA (after header)
    let data_ptr = unsafe { ptr.add(std::mem::size_of::<ObjHeader>()) };
    
    // Increment count
    OBJECT_COUNT.with(|count| {
        *count.borrow_mut() += 1;
    });
    
    data_ptr
}
}

Memory Layout Deep Dive

Let's trace through exactly what happens when we allocate:

#![allow(unused)]
fn main() {
// Calling:
let ptr = alloc(16, ObjType::String);

// Step 1: Calculate total size
//   sizeof(ObjHeader) = 16 bytes (bool + padding + u8 + padding + u32 + padding + pointer)
//   data_size = 16 bytes
//   total = 32 bytes

// Step 2: Allocate raw memory
//   ptr = 0x1000 (example address)

// Step 3: Write header at ptr
//   0x1000: marked = false
//   0x1001: obj_type = String (1)
//   0x1004: size = 16
//   0x1008: next = previous head of list

// Step 4: Return data pointer
//   data_ptr = 0x1010 (ptr + 16)
}

Visual Memory Layout

Address   Content
──────────────────────────────────────
0x1000    ┌─────────────────────────┐ ← Header starts here
0x1001    │ marked: false (1 byte)  │
0x1002    │ padding (1 byte)        │
0x1003    │ obj_type: 1 (1 byte)    │
0x1004    │ padding (1 byte)        │
0x1005    │ size: 16 (4 bytes)      │
0x1008    │ padding (4 bytes)       │
0x1010    │ next: 0x2000 (8 bytes)  │ ← Points to previous allocation
          ├─────────────────────────┤
0x1018    │                         │ ← Data starts here (returned ptr)
0x1020    │   Your 16 bytes of      │
0x1028    │   string data           │
0x1030    │                         │
          └─────────────────────────┘

Key insight: When you receive a pointer from alloc(), you can always find the header by subtracting 16 bytes (sizeof header).


Chapter 7.5: Practice Exercise

Before moving on, let's verify your understanding:

Exercise 1: Trace the Allocations

Given this code:

#![allow(unused)]
fn main() {
let a = alloc(8, ObjType::Number);   // Allocates a number
let b = alloc(24, ObjType::String);  // Allocates a string

// What does ALL_OBJECTS look like?
// What are the header addresses for a and b?
}
Click to reveal answer
ALL_OBJECTS → b's header → a's header → None

a's header:
  - address: b's data ptr - 24 - 16 = depends on b's allocation
  - size: 8
  - type: Number
  - next: None (it was first)

b's header:
  - address: b_data_ptr - 16
  - size: 24
  - type: String
  - next: a's header

The linked list is built in reverse order. Each new allocation becomes the new head.

Exercise 2: Why the Header?

Why do we need marked, obj_type, and size in the header? What would happen if we removed each one?

Click to reveal answer
  • Without marked: We couldn't track which objects are live. The sweep phase wouldn't know what to free. We might free live objects or keep garbage forever.

  • Without obj_type: We couldn't walk object references. When marking a closure, we wouldn't know it has an environment to also mark.

  • Without size: We couldn't properly free memory (need to know how much to deallocate). We also couldn't debug allocation issues.

Exercise 3: Thread Safety

Why do we use thread_local! instead of a regular static?

Click to reveal answer

A regular static would be shared across all threads, requiring synchronization (mutexes, atomics). Since Lox is single-threaded, thread_local! gives us:

  1. No synchronization overhead
  2. Each thread could have its own GC (if we later add threads)
  3. Simpler code

If we used a regular static, we'd need static ALL_OBJECTS: Mutex<...> which adds overhead.


Checkpoint: Before Moving On

We have:

  • Object header definition
  • Global object list
  • Allocation function
  • Understanding of memory layout

Before we continue to Part 3, make sure you can answer:

  1. Where is the header relative to the returned pointer?
  2. How does the linked list of objects work?
  3. Why do we need each field in the header?

Next: How do we find roots during garbage collection?