Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

MLIR for Lox: Part 3 — Finding GC Roots — The Compiler Knows Where the Pointers Are

When GC runs, it needs to know: what variables are currently holding object references?

fun example() {
    var a = "hello";    // 'a' is on the stack
    var b = 42;         // 'b' is on the stack (but it's a number, not a reference)
    var c = a;          // 'c' is on the stack, points to "hello"
    
    // GC RUNS HERE
    
    // The GC needs to know:
    //   - 'a' is a reference to a string object
    //   - 'b' is NOT a reference (it's a number, not a pointer)
    //   - 'c' is a reference to the same string object
}

Part 2 built a mark-sweep collector that knows how to collect garbage. But mark-sweep has a prerequisite: you need to know where to start marking. That’s the root-finding problem, and it’s harder than it looks.

Why This Is Hard

In compiled code, a, b, and c are bytes in memory (stack slots or registers). The GC sees:

Stack memory:
  [slot 0]: 0x7fff1234   <-- Is this a pointer? A number? Who knows!
  [slot 1]: 42           <-- This looks like a number, but could be a pointer!
  [slot 2]: 0x7fff1234   <-- Same as slot 0

The GC can’t tell the difference between a pointer and an integer that happens to look like a valid address.

Two Approaches

ApproachHow It WorksProsCons
ConservativeTreat anything that LOOKS like a pointer as a pointerSimple, no compiler changesCan’t move objects, may keep garbage alive
PreciseTell GC exactly where pointers areEfficient, enables moving GCCompiler must track pointer locations

We’ll use precise GC because:

  1. It’s more educational
  2. It’s what real language implementations use
  3. It enables better collectors later

The Shadow Stack

The simplest way to do precise GC: maintain a linked list of stack frames.

The Concept

┌─────────────────────────────────────────────────────────────┐
│                        SHADOW STACK                         │
│                                                             │
│   Each function call pushes a "frame" that lists:           │
│     - How many roots it has                                 │
│     - Pointers to each root                                 │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────┐                                   │
│   │ Frame for main()    │                                   │
│   │   roots: [a, c]     │──┐                                │
│   └─────────────────────┘  │                                │
│                            ▼                                │
│   ┌─────────────────────┐                                   │
│   │ Frame for example() │                                   │
│   │   roots: [x, y]     │──┐                                │
│   └─────────────────────┘  │                                │
│                            ▼                                │
│   ┌─────────────────────┐                                   │
│   │ Frame for helper()  │                                   │
│   │   roots: [temp]     │──► NULL                           │
│   └─────────────────────┘                                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

When GC runs:

  1. Walk the linked list of frames
  2. For each frame, mark each root
  3. Done! All reachable objects are marked

Each root is a data pointer — the same pointer alloc() returns (pointing to the object’s data, after the header). The GC can find the header by subtracting sizeof(ObjHeader) from any data pointer. This convention is important: if you accidentally store a header pointer in the shadow stack, mark_object will go back too far and read garbage.

The Frame Structure

#![allow(unused)]
fn main() {
// src/runtime/shadow_stack.rs

/// A frame in the shadow stack
/// 
/// This is pushed when a function is called,
/// and popped when a function returns.
#[repr(C)]
pub struct StackFrame {
    /// Pointer to the next frame (up the call stack)
    /// null = bottom of stack
    pub next: *mut StackFrame,
    
    /// Number of roots in this frame
    pub root_count: usize,
    
    /// Array of root pointers
    /// (this is a flexible array member - actual size = root_count)
    pub roots: [*mut u8; 0],
}

impl StackFrame {
    /// Get a pointer to the roots array
    pub fn roots_ptr(&mut self) -> *mut *mut u8 {
        self.roots.as_mut_ptr()
    }
    
    /// Get a root by index
    pub fn get_root(&self, index: usize) -> *mut u8 {
        assert!(index < self.root_count);
        unsafe { *self.roots.as_ptr().add(index) }
    }
    
    /// Set a root by index
    pub fn set_root(&mut self, index: usize, value: *mut u8) {
        assert!(index < self.root_count);
        unsafe {
            *self.roots.as_mut_ptr().add(index) = value;
        }
    }
}
}

Global Stack Head

#![allow(unused)]
fn main() {
// src/runtime/shadow_stack.rs (continued)

use std::cell::RefCell;

/// The head of the shadow stack (most recent frame)
thread_local! {
    pub static SHADOW_STACK_HEAD: RefCell<*mut StackFrame> = RefCell::new(std::ptr::null_mut());
}
}

Push and Pop Operations

#![allow(unused)]
fn main() {
/// Push a new frame onto the shadow stack
/// 
/// This allocates a frame with space for `root_count` roots.
/// Returns a pointer to the roots array so the compiler can fill it in.
/// 
/// # Safety
/// The caller must call `pop_frame` before the function returns.
#[no_mangle]
pub unsafe extern "C" fn gc_push_frame(root_count: usize) -> *mut *mut u8 {
    // Calculate frame size: header + root pointers
    let frame_size = std::mem::size_of::<StackFrame>() 
                   + root_count * std::mem::size_of::<*mut u8>();
    
    // Allocate the frame
    let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap();
    let frame_ptr = std::alloc::alloc(layout) as *mut StackFrame;
    
    if frame_ptr.is_null() {
        panic!("Out of memory allocating stack frame!");
    }
    
    // Initialize the frame
    SHADOW_STACK_HEAD.with(|head| {
        (*frame_ptr).next = *head.borrow();
        (*frame_ptr).root_count = root_count;
        
        // Initialize all roots to null
        for i in 0..root_count {
            (*frame_ptr).set_root(i, std::ptr::null_mut());
        }
        
        // Update head
        *head.borrow_mut() = frame_ptr;
    });
    
    // Return pointer to roots array
    (*frame_ptr).roots_ptr()
}

/// Pop a frame from the shadow stack
/// 
/// # Safety
/// Must be called exactly once for each `gc_push_frame`.
#[no_mangle]
pub unsafe extern "C" fn gc_pop_frame() {
    SHADOW_STACK_HEAD.with(|head| {
        let frame_ptr = *head.borrow();
        if frame_ptr.is_null() {
            panic!("No frame to pop!");
        }
        
        // Calculate frame size for deallocation
        let root_count = (*frame_ptr).root_count;
        let frame_size = std::mem::size_of::<StackFrame>() 
                       + root_count * std::mem::size_of::<*mut u8>();
        
        // Update head to next frame
        *head.borrow_mut() = (*frame_ptr).next;
        
        // Free the frame
        let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap();
        std::alloc::dealloc(frame_ptr as *mut u8, layout);
    });
}

/// Set a root in the current frame
#[no_mangle]
pub unsafe extern "C" fn gc_set_root(root_index: usize, value: *mut u8) {
    SHADOW_STACK_HEAD.with(|head| {
        let frame_ptr = *head.borrow();
        if frame_ptr.is_null() {
            panic!("No frame!");
        }
        (*frame_ptr).set_root(root_index, value);
    });
}
}

Putting It Together — The Full GC

Now we have:

  • Allocation (Part 2)
  • Shadow stack for roots (this part)

Let’s write the actual gc_collect() function.

Mark Phase

#![allow(unused)]
fn main() {
// src/runtime/gc.rs (continued)

use crate::runtime::object::{ObjHeader, ObjType};
use crate::runtime::shadow_stack::{SHADOW_STACK_HEAD, StackFrame};

/// Mark all reachable objects
fn gc_mark() {
    // Walk the shadow stack
    SHADOW_STACK_HEAD.with(|head| {
        let mut current = *head.borrow();
        
        while !current.is_null() {
            let frame = unsafe { &*current };
            
            // Mark each root in this frame
            for i in 0..frame.root_count {
                let root = frame.get_root(i);
                if !root.is_null() {
                    mark_object(root);
                }
            }
            
            // Move to next frame
            current = frame.next;
        }
    });
}

/// Recursively mark an object and everything it references
///
/// `obj_ptr` is a DATA pointer — the same kind `alloc()` returns
/// (pointing after the header). The shadow stack stores these
/// data pointers, not header pointers. That's why we need to
/// go back to find the header.
fn mark_object(obj_ptr: *mut u8) {
    // Get the header (before the data pointer)
    // sub(1) moves back by sizeof(ObjHeader) bytes — not 1 byte.
    // Rust pointer arithmetic is typed: sub(1) on a *mut ObjHeader
    // moves back by one ObjHeader (16 bytes), the same way that
    // ptr - 1 on a T* in C moves back by sizeof(T). The key insight
    // is that the data pointer and the header are adjacent in memory —
    // the data starts immediately after the header, so stepping back
    // one ObjHeader lands on the header. (See Part 2's layout table.)
    let header = unsafe { (obj_ptr as *mut ObjHeader).sub(1) };
    
    // Already marked? Skip (avoid infinite loops)
    if unsafe { (*header).marked } {
        return;
    }
    
    // Mark it
    unsafe { (*header).marked = true; }
    
    // Now mark any objects this one references
    // (depends on object type)
    mark_references(header);
}

/// Mark objects referenced by this object
/// (Extended in Part 5 to add the Environment arm for closures)
fn mark_references(header: *mut ObjHeader) {
    let obj_type = unsafe { (*header).obj_type };
    // The data starts right after the header — same layout as Part 2's
    // memory diagram. We go forward from the header (whereas mark_object
    // goes backward from the data pointer). add(sizeof(ObjHeader)) lands
    // at offset +16, which is exactly where alloc() returns.
    let data = unsafe { (header as *mut u8).add(std::mem::size_of::<ObjHeader>()) };
    
    match obj_type {
        ObjType::Number | ObjType::String => {
            // These don't reference other objects.
            // You might wonder why Number is here — aren't numbers stack
            // values, not heap objects? In this model, they are. This arm
            // exists for exhaustiveness: ObjType has a Number variant, so
            // the match must handle it. In practice, the GC never traces
            // a Number object because numbers are never heap-allocated.
        }
        
        ObjType::Closure => {
            // A closure references its captured variables.
            // This layout is the simple model: a function pointer followed by a flat
            // capture array. Part 5 (Closures) revises the layout to use a function
            // index + environment pointer — and the GC trace logic changes to walk
            // the environment's slots instead of a capture array. Both approaches
            // do the same thing (walk the closure's outgoing references and mark
            // them); the difference is how those references are stored.
            // Layout (this part): [function_ptr: *mut u8 (8 bytes)][capture_count: usize (8 bytes)][capture_0][capture_1]...
            //                       offset 0                   offset 8                         offset 16
            unsafe {
                let capture_count = *(data.add(8) as *const usize);
                let captures = data.add(16) as *const *mut u8;
                
                for i in 0..capture_count {
                    let captured = *captures.add(i);
                    if !captured.is_null() {
                        mark_object(captured);
                    }
                }
            }
        }
        
        ObjType::Instance => {
            // An instance references its field values
            // Layout: [class_ptr: *mut u8 (8 bytes)][field_count: usize (8 bytes)][field_0_key][field_0_value]...
            //           offset 0                    offset 8                          offset 16
            unsafe {
                let field_count = *(data.add(8) as *const usize);
                let fields = data.add(16) as *const *mut u8;
                
                // Each field is (key_ptr, value_ptr)
                // We only trace field values, not class_ptr or field keys.
                //
                // Why skip class_ptr? The class pointer references a class object
                // that's reachable through global variables (class declarations are
                // rooted in the compiler's global scope). As long as classes are
                // always global declarations — never dynamically allocated and
                // assigned to a local variable — the class object is already live.
                // If you add dynamically-allocated classes, you'd need to trace
                // class_ptr here; otherwise, a class stored in a local could be
                // collected while instances still reference it.
                //
                // Why skip field keys? Field keys are ObjString pointers owned by
                // the class definition. They're reachable through the class (if
                // you traced class_ptr, you'd find them). Since the class is
                // always reachable (see above), the keys are transitively live.
                //
                // Part 7 (Classes and Instances) refines this model — instances
                // gain a more detailed layout with method tables and the class
                // object stores its own GC trace. The GC logic changes to walk the
                // class's method table, which is the right place to trace class_ptr
                // and keys for the general case.
                for i in 0..field_count {
                    let value = *fields.add(i * 2 + 1);
                    if !value.is_null() {
                        mark_object(value);
                    }
                }
            }
        }
    }
}
}

Sweep Phase

#![allow(unused)]
fn main() {
/// Free all unmarked objects
fn gc_sweep() {
    let mut freed_count = 0;
    
    ALL_OBJECTS.with(|objects| {
        let mut current = *objects.borrow();
        // Collect surviving objects, then rebuild the linked list.
        // This uses a Vec because unlinking in-place (pointer
        // manipulation while walking) is error-prone: you need a
        // `prev` pointer to splice out the current node, a special
        // case for the list head (no `prev` to update), and careful
        // handling when `prev.next` should skip the freed node.
        // The Vec approach separates the two phases (identify
        // survivors, rebuild list) and avoids the pointer
        // gymnastics entirely.
        //
        // The Vec allocation goes through Rust's system allocator
        // (the same allocator that backs Vec in any Rust program),
        // not our custom `alloc()`. So this won't trigger a
        // re-entrant GC — `maybe_gc` only fires from `alloc()`,
        // not from the system allocator. A production GC would
        // use in-place pointer manipulation to avoid even this
        // system allocation, but clarity wins in a tutorial.
        let mut keep: Vec<*mut ObjHeader> = Vec::new();
        
        while !current.is_null() {
            let header = unsafe { &*current };
            let next = header.next;
            
            if header.marked {
                // Still alive! Clear mark for next cycle
                unsafe { (*current).marked = false; }
                keep.push(current);
            } else {
                // Dead! Free it
                let total_size = std::mem::size_of::<ObjHeader>() + header.size as usize;
                let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap();
                unsafe {
                    std::alloc::dealloc(current as *mut u8, layout);
                }
                freed_count += 1;
            }
            
            current = next;
        }
        
        // Rebuild linked list from surviving objects
        for (i, &header_ptr) in keep.iter().enumerate() {
            let next = if i + 1 < keep.len() {
                keep[i + 1]
            } else {
                std::ptr::null_mut()
            };
            unsafe { (*header_ptr).next = next; }
        }
        
        *objects.borrow_mut() = keep.first().copied().unwrap_or(std::ptr::null_mut());
    });
    
    OBJECT_COUNT.with(|count| {
        *count.borrow_mut() -= freed_count;
    });
    
    eprintln!("GC: Freed {} objects, {} remaining", freed_count, 
              OBJECT_COUNT.with(|c| *c.borrow()));
}
}

The Main Entry Point

#![allow(unused)]
fn main() {
/// Run a garbage collection cycle
/// 
/// This is automatically called when allocation threshold is hit,
/// but can also be called manually.
#[no_mangle]
pub extern "C" fn gc_collect() {
    gc_mark();
    gc_sweep();
}
}

When to Collect

We need to decide: when should gc_collect() run?

Options:

  1. Every allocation - Simple but slow
  2. Every N allocations - Simple, tunable
  3. When memory exceeds threshold - More sophisticated
  4. Never automatically - Manual only (good for debugging)

We’ll use option 2 (every N allocations):

#![allow(unused)]
fn main() {
// src/runtime/gc.rs

/// Trigger collection after this many allocations
const GC_THRESHOLD: usize = 1024;

/// Check if we should collect
fn maybe_gc() {
    OBJECT_COUNT.with(|count| {
        if *count.borrow() >= GC_THRESHOLD {
            gc_collect();
        }
    });
}

/// Modified alloc to trigger GC
pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 {
    maybe_gc();  // Check before allocating
    
    // ... rest of alloc code ...
}
}

Step-by-Step Mark-Sweep Walkthrough

Let’s trace through a complete GC cycle with a concrete example:

Setup: A Running Program

fun main() {
    var a = "hello";    // Object A: String "hello"
    var b = 42;         // Not an object (number is inline)
    var c = a;          // c points to same string as a
    
    // ... some code that creates garbage ...
    var temp = "garbage";  // Object G: String "garbage"
    temp = nil;            // G is now garbage!
    
    // GC RUNS HERE
    
    print a;  // Should still work
    print c;  // Should still work
}

State Before GC

Shadow Stack:                     ← roots store DATA pointers
┌─────────────────────────────────┐    (the pointer alloc() returns)
│ Frame for main()                │
│   root[0]: 0x0110 (a)           │
│   root[1]: null (b)             │
│   root[2]: 0x0110 (c)           │
└─────────────────────────────────┘

Heap Objects:                     ← ALL_OBJECTS links HEADERS
ALL_OBJECTS → 0x0200 (G) → 0x0100 (A) → None
                                   (headers, not data pointers)

Object A:
  header (at 0x0100): { marked: false, type: String, size: 5, next: None }
  data  (at 0x0110): "hello"        ← 0x0100 + 16 bytes = 0x0110

Object G:
  header (at 0x0200): { marked: false, type: String, size: 7, next: 0x0100 }
  data  (at 0x0210): "garbage"      ← 0x0200 + 16 bytes = 0x0210

Mark Phase: Step by Step

Step 1: Walk to frame (main)
        
Step 2: Process root[0] = 0x0110 (a)
        - This is a data pointer. Get header by going back 16 bytes:
          0x0110 - 16 = 0x0100
        - (0x0100).marked = false → mark it!
        - Type is String, no references to mark
        Result: Object A is now marked

Step 3: Process root[1] = null
        - Skip null values
        Result: No action

Step 4: Process root[2] = 0x0110 (c)
        - Get header at 0x0110 - 16 = 0x0100
        - (0x0100).marked = true → already marked, skip
        Result: Already marked, no action

Step 5: No more frames
        Mark phase complete!

State After Mark Phase

Shadow Stack: (unchanged)
┌─────────────────────────────────┐
│ Frame for main()                │
│   root[0]: 0x0110               │
│   root[1]: null                 │
│   root[2]: 0x0110               │
└─────────────────────────────────┘

Heap Objects:
ALL_OBJECTS → 0x0200 (G) → 0x0100 (A) → None

Object A:
  header (at 0x0100): { marked: TRUE, type: String, size: 5, next: None }
  data  (at 0x0110): "hello"

Object G:
  header (at 0x0200): { marked: FALSE, type: String, size: 7, next: 0x0100 }
  data  (at 0x0210): "garbage"

Sweep Phase: Step by Step

Step 1: Walk to object 0x0200 (G — header address)
        - marked = false → FREE IT!
        - Free 23 bytes (16-byte header + 7 data)
        - freed_count = 1

Step 2: Walk to object 0x0100 (A — header address)
        - marked = true → KEEP IT
        - Clear mark for next cycle
        - marked = false
        - Add to keep list

Step 3: Rebuild linked list
        - keep = [0x0100]
        - ALL_OBJECTS = 0x0100
        - 0x0100.next = null

Result: 1 object freed, 1 remaining

State After Sweep Phase

Shadow Stack: (unchanged)

Heap Objects:
ALL_OBJECTS → 0x0100 (A) → null

Object A:
  header (at 0x0100): { marked: false, type: String, size: 5, next: null }
  data  (at 0x0110): "hello"

Object G: FREED! Memory returned to system.

Common Mistakes and Debugging

Mistake 1: Forgetting to Register a Root

Code:

#![allow(unused)]
fn main() {
fn broken_function() {
    let obj = alloc(16, ObjType::String);
    // Oops! Forgot gc_set_root(0, obj)
    
    gc_collect();  // BAD! obj might be freed!
    
    // Now obj points to freed memory
    use_object(obj);  // CRASH or corruption!
}
}

Fix:

#![allow(unused)]
fn main() {
fn correct_function() {
    gc_push_frame(1);  // One root slot
    let obj = alloc(16, ObjType::String);
    gc_set_root(0, obj);  // Register as root!
    
    gc_collect();  // Safe! obj is protected
    
    // ... use obj ...
    
    gc_pop_frame();  // Clean up
}
}

Mistake 2: Push/Pop Mismatch

Code:

#![allow(unused)]
fn main() {
fn broken() {
    gc_push_frame(2);
    // Oops! Forgot gc_pop_frame()
}  // Frame leaks!

fn next_call() {
    gc_push_frame(1);  // Creates NEW frame
    // But old frame is still in the list!
    // GC will mark stale pointers
}
}

Fix: Always use RAII pattern or ensure pop on all paths:

#![allow(unused)]
fn main() {
struct FrameGuard;

impl FrameGuard {
    fn new(root_count: usize) -> Self {
        // gc_push_frame returns the roots array pointer, but
        // FrameGuard doesn't save it — you'd call gc_set_root
        // instead. The return value matters when the *compiler*
        // fills in roots directly (see "From Runtime to Compiler"
        // at the end of this part).
        unsafe { gc_push_frame(root_count) };
        FrameGuard
    }
}

impl Drop for FrameGuard {
    fn drop(&mut self) {
        unsafe { gc_pop_frame() };
    }
}

fn correct() {
    let _guard = FrameGuard::new(2);
    // gc_pop_frame called automatically when _guard drops
    // Even if we panic or return early!
}
}

What About Circular References?

If object A references B and B references A, won’t mark_object loop forever? No — the marked flag prevents that:

Object A references Object B
Object B references Object A

mark(A):
  A.marked = true
  mark(B)  // A's reference
    B.marked = true
    mark(A)  // B's reference
      A.marked = true → SKIP (already marked)
    return
  return

No infinite loop!

Debugging Tips

Add these diagnostic functions:

#![allow(unused)]
fn main() {
/// Check if the GC state is consistent
pub fn gc_validate() {
    // Check shadow stack
    SHADOW_STACK_HEAD.with(|head| {
        let mut current = *head.borrow();
        let mut frame_count = 0;
        
        while !current.is_null() {
            frame_count += 1;
            if frame_count > 1000 {
                panic!("Shadow stack too deep! Infinite loop?");
            }
            current = unsafe { (*current).next };
        }
    });
    
    // Check object list
    ALL_OBJECTS.with(|objects| {
        let mut current = *objects.borrow();
        let mut obj_count = 0;
        
        while !current.is_null() {
            obj_count += 1;
            if obj_count > 100000 {
                panic!("Object list too long! Infinite loop?");
            }
            current = unsafe { (*current).next };
        }
        
        // Verify count matches
        OBJECT_COUNT.with(|count| {
            assert_eq!(*count.borrow(), obj_count, "Object count mismatch!");
        });
    });
}
}

Practice Exercises

Exercise 1: Trace the Mark Phase

Given this state:

Shadow Stack:                                    ← roots store DATA pointers
  Frame(main): roots = [0x1010, null, 0x2010]     (the pointer alloc() returns)
  Frame(helper): roots = [0x3010]

Heap:                                            ← header addresses listed first;
  0x1000: String "a" (not marked)   data at 0x1010    data is 16 bytes after
  0x2000: String "b" (not marked)   data at 0x2010    (same convention as the
  0x3000: String "c" (not marked)   data at 0x3010     walkthrough above)
  0x4000: String "garbage" (not marked)   data at 0x4010

Which objects get marked?

Click to reveal answer
  • 0x1000: Marked (root 0x1010 in main frame → header at 0x1010 − 16 = 0x1000)
  • 0x2000: Marked (root 0x2010 in main frame → header at 0x2010 − 16 = 0x2000)
  • 0x3000: Marked (root 0x3010 in helper frame → header at 0x3010 − 16 = 0x3000)
  • 0x4000: NOT marked (no root points to it) → will be freed

The mark phase walks all frames, subtracts 16 from each non-null data pointer to find the header, then marks it.

Exercise 2: Root Counting for Nested Scopes

How many root slots does this function need?

fun makeGreeter(greeting) {
    var counter = 0;
    
    fun greet(name) {
        counter = counter + 1;
        print greeting + " " + name;
    }
    
    return greet;
}

Count every variable that could hold a heap reference — parameters, locals, and anything captured by the inner function. Remember: in the numbers-only model, we root everything uniformly (see “Why We Root Numbers” in Part 4 for the full reasoning).

Click to reveal answer

makeGreeter needs 3 root slots:

  1. greeting (parameter) — could be a string
  2. counter (local) — a number now, but we root everything uniformly (see “Why We Root Numbers” in Part 4), and later when closures capture it, it will need to survive GC
  3. greet (local) — the closure object itself, allocated on the heap

greet needs 1 root slot:

  1. name (parameter) — could be a string

Wait — what about greeting and counter? Those are captured from the outer scope. But captured variables are reachable through the closure’s environment, which is itself a root in makeGreeter. The GC traces roots → objects → references, so it will find greeting and counter through the environment. A variable only needs its own root slot if it can’t be reached by tracing from another root.

The takeaway: captured variables don’t need separate root slots in the inner function — they’re already reachable through the closure object.

Exercise 3: Why Not Mark During Sweep?

Why do we separate mark and sweep into two phases? Why not free objects as soon as we find they’re unreachable?

Click to reveal answer

We need to mark ALL live objects before freeing ANY.

Consider:

Object A references Object B
We're walking the heap in order: A, then B

If we freed A immediately when we saw it was unmarked:
  - We'd lose the reference to B
  - B would become unreachable
  - But B might be reachable from a root!
  
By marking first, we ensure we've found ALL live objects.
Then sweep can safely free everything else.

Two-phase approach ensures correctness.

From Runtime to Compiler

The shadow stack API we built in this part — gc_push_frame, gc_set_root, gc_pop_frame — is designed to be called from compiled code. Right now, we’re calling it by hand from Rust. But in a real compiler, the code generator emits the calls automatically: every function gets a push_frame at entry and a pop_frame before every return, and every variable store gets a set_root.

Part 4 (MLIR Integration) shows how the MLIR code generator does exactly that — emitting lox.push_frame, lox.set_root, and lox.pop_frame operations that lower to the runtime calls we defined here. The concept is the same; the difference is who writes the calls.


Next: Part 4 — MLIR Integration — The shadow stack works, but calling gc_push_frame by hand in every function is error-prone and verbose. The compiler should do it for us. We’ll define custom MLIR operations (lox.alloc, lox.set_root) that emit GC bookkeeping during lowering — the GC becomes invisible to the programmer and automatic in the generated code.