MLIR for Lox: Part 3 - Finding Roots
Continuing from Part 2. We can allocate objects. Now we need to find them during GC.
Chapter 8: The Root Problem
When GC runs, it needs to know: what variables are currently holding object references?
fun example() {
var a = "hello"; // 'a' is on the stack
var b = 42; // 'b' is on the stack (but it's a number, not a reference)
var c = a; // 'c' is on the stack, points to "hello"
// GC RUNS HERE
// The GC needs to know:
// - 'a' is a reference to a string object
// - 'b' is NOT a reference (just a number)
// - 'c' is a reference to the same string object
}
Why This Is Hard
In compiled code, a, b, and c are just bytes in memory (stack slots or registers). The GC sees:
Stack memory:
[slot 0]: 0x7fff1234 <-- Is this a pointer? A number? Who knows!
[slot 1]: 42 <-- This looks like a number, but could be a pointer!
[slot 2]: 0x7fff1234 <-- Same as slot 0
The GC can't tell the difference between a pointer and an integer that happens to look like a valid address.
Two Approaches
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Conservative | Treat anything that LOOKS like a pointer as a pointer | Simple, no compiler changes | Can't move objects, may keep garbage alive |
| Precise | Tell GC exactly where pointers are | Efficient, enables moving GC | Compiler must track pointer locations |
We'll use precise GC because:
- It's more educational
- It's what real language implementations use
- It enables better collectors later
Chapter 9: The Shadow Stack
The simplest way to do precise GC: maintain a linked list of stack frames.
The Concept
┌─────────────────────────────────────────────────────────────┐
│ SHADOW STACK │
│ │
│ Each function call pushes a "frame" that lists: │
│ - How many roots it has │
│ - Pointers to each root │
│ │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ │
│ │ Frame for main() │ │
│ │ roots: [a, c] │──┐ │
│ └─────────────────────┘ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Frame for example() │ │
│ │ roots: [x, y] │──┐ │
│ └─────────────────────┘ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Frame for helper() │ │
│ │ roots: [temp] │──► NULL │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
When GC runs:
- Walk the linked list of frames
- For each frame, mark each root
- Done! All reachable objects are marked
The Frame Structure
#![allow(unused)] fn main() { // src/runtime/shadow_stack.rs /// A frame in the shadow stack /// /// This is pushed when a function is called, /// and popped when a function returns. #[repr(C)] pub struct StackFrame { /// Pointer to the next frame (up the call stack) pub next: Option<*mut StackFrame>, /// Number of roots in this frame pub root_count: usize, /// Array of root pointers /// (this is a flexible array member - actual size = root_count) pub roots: [*mut u8; 0], } impl StackFrame { /// Get a pointer to the roots array pub fn roots_ptr(&mut self) -> *mut *mut u8 { self.roots.as_mut_ptr() } /// Get a root by index pub fn get_root(&self, index: usize) -> *mut u8 { assert!(index < self.root_count); unsafe { *self.roots.as_ptr().add(index) } } /// Set a root by index pub fn set_root(&mut self, index: usize, value: *mut u8) { assert!(index < self.root_count); unsafe { *self.roots.as_mut_ptr().add(index) = value; } } } }
Global Stack Head
#![allow(unused)] fn main() { // src/runtime/shadow_stack.rs (continued) use std::cell::RefCell; /// The head of the shadow stack (most recent frame) thread_local! { pub static SHADOW_STACK_HEAD: RefCell<Option<*mut StackFrame>> = RefCell::new(None); } }
Push and Pop Operations
#![allow(unused)] fn main() { /// Push a new frame onto the shadow stack /// /// This allocates a frame with space for `root_count` roots. /// Returns a pointer to the roots array so the compiler can fill it in. /// /// # Safety /// The caller must call `pop_frame` before the function returns. #[no_mangle] pub unsafe extern "C" fn gc_push_frame(root_count: usize) -> *mut *mut u8 { // Calculate frame size: header + root pointers let frame_size = std::mem::size_of::<StackFrame>() + root_count * std::mem::size_of::<*mut u8>(); // Allocate the frame let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap(); let frame_ptr = std::alloc::alloc(layout) as *mut StackFrame; if frame_ptr.is_null() { panic!("Out of memory allocating stack frame!"); } // Initialize the frame SHADOW_STACK_HEAD.with(|head| { (*frame_ptr).next = *head.borrow(); (*frame_ptr).root_count = root_count; // Initialize all roots to null for i in 0..root_count { (*frame_ptr).set_root(i, std::ptr::null_mut()); } // Update head *head.borrow_mut() = Some(frame_ptr); }); // Return pointer to roots array (*frame_ptr).roots_ptr() } /// Pop a frame from the shadow stack /// /// # Safety /// Must be called exactly once for each `gc_push_frame`. #[no_mangle] pub unsafe extern "C" fn gc_pop_frame() { SHADOW_STACK_HEAD.with(|head| { let frame_ptr = (*head.borrow()).expect("No frame to pop!"); // Calculate frame size for deallocation let root_count = (*frame_ptr).root_count; let frame_size = std::mem::size_of::<StackFrame>() + root_count * std::mem::size_of::<*mut u8>(); // Update head to next frame *head.borrow_mut() = (*frame_ptr).next; // Free the frame let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap(); std::alloc::dealloc(frame_ptr as *mut u8, layout); }); } /// Set a root in the current frame #[no_mangle] pub unsafe extern "C" fn gc_set_root(root_index: usize, value: *mut u8) { SHADOW_STACK_HEAD.with(|head| { let frame_ptr = (*head.borrow()).expect("No frame!"); (*frame_ptr).set_root(root_index, value); }); } }
Chapter 10: Putting It Together - The Full GC
Now we have:
- Allocation (Part 2)
- Shadow stack for roots (this part)
Let's write the actual gc_collect() function.
Mark Phase
#![allow(unused)] fn main() { // src/runtime/gc.rs (continued) use crate::runtime::object::{ObjHeader, ObjType}; use crate::runtime::shadow_stack::{SHADOW_STACK_HEAD, StackFrame}; /// Mark all reachable objects fn gc_mark() { // Walk the shadow stack SHADOW_STACK_HEAD.with(|head| { let mut current = *head.borrow(); while let Some(frame_ptr) = current { let frame = unsafe { &*frame_ptr }; // Mark each root in this frame for i in 0..frame.root_count { let root = frame.get_root(i); if !root.is_null() { mark_object(root); } } // Move to next frame current = frame.next; } }); } /// Recursively mark an object and everything it references fn mark_object(obj_ptr: *mut u8) { // Get the header (before the data pointer) let header = unsafe { (obj_ptr as *mut ObjHeader).sub(1) }; // Already marked? Skip (avoid infinite loops) if unsafe { (*header).marked } { return; } // Mark it unsafe { (*header).marked = true; } // Now mark any objects this one references // (depends on object type) mark_references(header); } /// Mark objects referenced by this object fn mark_references(header: *mut ObjHeader) { let obj_type = unsafe { (*header).obj_type }; let data = unsafe { (header as *mut u8).add(std::mem::size_of::<ObjHeader>()) }; match obj_type { ObjType::Number | ObjType::String => { // These don't reference other objects } ObjType::Closure => { // A closure references its captured variables // Layout: [function_ptr][capture_count][capture_0][capture_1]... unsafe { let capture_count = *(data.add(8) as *const usize); let captures = data.add(16) as *const *mut u8; for i in 0..capture_count { let captured = *captures.add(i); if !captured.is_null() { mark_object(captured); } } } } ObjType::Instance => { // An instance references its field values // Layout: [class_ptr][field_count][field_0_key][field_0_value]... unsafe { let field_count = *(data.add(8) as *const usize); let fields = data.add(16) as *const *mut u8; // Each field is (key_ptr, value_ptr) for i in 0..field_count { let value = *fields.add(i * 2 + 1); if !value.is_null() { mark_object(value); } } } } } } }
Sweep Phase
#![allow(unused)] fn main() { /// Free all unmarked objects fn gc_sweep() { use std::collections::HashSet; // We need to rebuild the object list as we sweep let mut new_head: Option<*mut ObjHeader> = None; let mut freed_count = 0; ALL_OBJECTS.with(|objects| { let mut current = *objects.borrow(); let mut keep = Vec::new(); while let Some(header_ptr) = current { let header = unsafe { &*header_ptr }; let next = header.next; if header.marked { // Still alive! Clear mark for next cycle unsafe { (*header_ptr).marked = false; } keep.push(header_ptr); } else { // Dead! Free it let total_size = std::mem::size_of::<ObjHeader>() + header.size as usize; let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap(); unsafe { std::alloc::dealloc(header_ptr as *mut u8, layout); } freed_count += 1; } current = next; } // Rebuild linked list for (i, &header_ptr) in keep.iter().enumerate() { let next = if i + 1 < keep.len() { Some(keep[i + 1]) } else { None }; unsafe { (*header_ptr).next = next; } } *objects.borrow_mut() = keep.first().copied(); }); OBJECT_COUNT.with(|count| { *count.borrow_mut() -= freed_count; }); eprintln!("GC: Freed {} objects, {} remaining", freed_count, OBJECT_COUNT.with(|c| *c.borrow())); } }
The Main Entry Point
#![allow(unused)] fn main() { /// Run a garbage collection cycle /// /// This is automatically called when allocation threshold is hit, /// but can also be called manually. #[no_mangle] pub extern "C" fn gc_collect() { gc_mark(); gc_sweep(); } }
Chapter 11: When to Collect
We need to decide: when should gc_collect() run?
Options:
- Every allocation - Simple but slow
- Every N allocations - Simple, tunable
- When memory exceeds threshold - More sophisticated
- Never automatically - Manual only (good for debugging)
We'll use option 2 (every N allocations):
#![allow(unused)] fn main() { // src/runtime/gc.rs /// Trigger collection after this many allocations const GC_THRESHOLD: usize = 1024; /// Check if we should collect fn maybe_gc() { OBJECT_COUNT.with(|count| { if *count.borrow() >= GC_THRESHOLD { gc_collect(); } }); } /// Modified alloc to trigger GC pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 { maybe_gc(); // Check before allocating // ... rest of alloc code ... } }
Chapter 10.5: Step-by-Step Mark-Sweep Walkthrough
Let's trace through a complete GC cycle with a concrete example:
Setup: A Running Program
fun main() {
var a = "hello"; // Object A: String "hello"
var b = 42; // Not an object (number is inline)
var c = a; // c points to same string as a
// ... some code that creates garbage ...
var temp = "garbage"; // Object G: String "garbage"
temp = nil; // G is now garbage!
// GC RUNS HERE
print a; // Should still work
print c; // Should still work
}
State Before GC
Shadow Stack:
┌─────────────────────────┐
│ Frame for main() │
│ root[0]: 0x1000 (a) │
│ root[1]: null (b) │
│ root[2]: 0x1000 (c) │
└─────────────────────────┘
Heap Objects:
ALL_OBJECTS → 0x2000 (G) → 0x1000 (A) → None
Object A (at 0x1000):
header: { marked: false, type: String, size: 5, next: None }
data: "hello"
Object G (at 0x2000):
header: { marked: false, type: String, size: 7, next: 0x1000 }
data: "garbage"
Mark Phase: Step by Step
Step 1: Walk to frame (main)
Step 2: Process root[0] = 0x1000 (a)
- Get header at 0x1000 - 16 = 0x0FF0
- marked = false → mark it!
- Type is String, no references to mark
Result: Object A is now marked
Step 3: Process root[1] = null
- Skip null values
Result: No action
Step 4: Process root[2] = 0x1000 (c)
- Get header at 0x1000 - 16 = 0x0FF0
- marked = true → already marked, skip
Result: Already marked, no action
Step 5: No more frames
Mark phase complete!
State After Mark Phase
Shadow Stack: (unchanged)
┌─────────────────────────┐
│ Frame for main() │
│ root[0]: 0x1000 │
│ root[1]: null │
│ root[2]: 0x1000 │
└─────────────────────────┘
Heap Objects:
ALL_OBJECTS → 0x2000 (G) → 0x1000 (A) → None
Object A (at 0x1000):
header: { marked: TRUE, type: String, size: 5, next: None }
data: "hello"
Object G (at 0x2000):
header: { marked: FALSE, type: String, size: 7, next: 0x1000 }
data: "garbage"
Sweep Phase: Step by Step
Step 1: Walk to object 0x2000 (G)
- marked = false → FREE IT!
- Free 23 bytes (header + 7 data)
- freed_count = 1
Step 2: Walk to object 0x1000 (A)
- marked = true → KEEP IT
- Clear mark for next cycle
- marked = false
- Add to keep list
Step 3: Rebuild linked list
- keep = [0x1000]
- ALL_OBJECTS = 0x1000
- 0x1000.next = None
Result: 1 object freed, 1 remaining
State After Sweep Phase
Shadow Stack: (unchanged)
Heap Objects:
ALL_OBJECTS → 0x1000 (A) → None
Object A (at 0x1000):
header: { marked: false, type: String, size: 5, next: None }
data: "hello"
Object G: FREED! Memory returned to system.
Chapter 10.6: Common Mistakes and Debugging
Mistake 1: Forgetting to Register a Root
Code:
#![allow(unused)] fn main() { fn broken_function() { let obj = alloc(16, ObjType::String); // Oops! Forgot gc_set_root(0, obj) gc_collect(); // BAD! obj might be freed! // Now obj points to freed memory use_object(obj); // CRASH or corruption! } }
Fix:
#![allow(unused)] fn main() { fn correct_function() { gc_push_frame(1); // One root slot let obj = alloc(16, ObjType::String); gc_set_root(0, obj); // Register as root! gc_collect(); // Safe! obj is protected // ... use obj ... gc_pop_frame(); // Clean up } }
Mistake 2: Push/Pop Mismatch
Code:
#![allow(unused)] fn main() { fn broken() { gc_push_frame(2); // Oops! Forgot gc_pop_frame() } // Frame leaks! fn next_call() { gc_push_frame(1); // Creates NEW frame // But old frame is still in the list! // GC will mark stale pointers } }
Fix: Always use RAII pattern or ensure pop on all paths:
#![allow(unused)] fn main() { struct FrameGuard; impl FrameGuard { fn new(root_count: usize) -> Self { unsafe { gc_push_frame(root_count) }; FrameGuard } } impl Drop for FrameGuard { fn drop(&mut self) { unsafe { gc_pop_frame() }; } } fn correct() { let _guard = FrameGuard::new(2); // gc_pop_frame called automatically when _guard drops // Even if we panic or return early! } }
Mistake 3: Circular References Not Handled
Actually this IS handled! The marked flag prevents infinite loops:
Object A references Object B
Object B references Object A
mark(A):
A.marked = true
mark(B) // A's reference
B.marked = true
mark(A) // B's reference
A.marked = true → SKIP (already marked)
return
return
No infinite loop!
Debugging Tips
Add these diagnostic functions:
#![allow(unused)] fn main() { /// Check if the GC state is consistent pub fn gc_validate() { // Check shadow stack SHADOW_STACK_HEAD.with(|head| { let mut current = *head.borrow(); let mut frame_count = 0; while let Some(frame_ptr) = current { frame_count += 1; if frame_count > 1000 { panic!("Shadow stack too deep! Infinite loop?"); } current = unsafe { (*frame_ptr).next }; } }); // Check object list ALL_OBJECTS.with(|objects| { let mut current = *objects.borrow(); let mut obj_count = 0; while let Some(header_ptr) = current { obj_count += 1; if obj_count > 100000 { panic!("Object list too long! Infinite loop?"); } current = unsafe { (*header_ptr).next }; } // Verify count matches OBJECT_COUNT.with(|count| { assert_eq!(*count.borrow(), obj_count, "Object count mismatch!"); }); }); } }
Chapter 10.7: Practice Exercises
Exercise 1: Trace the Mark Phase
Given this state:
Shadow Stack:
Frame(main): roots = [0x1000, null, 0x2000]
Frame(helper): roots = [0x3000]
Heap:
0x1000: String "a" (not marked)
0x2000: String "b" (not marked)
0x3000: String "c" (not marked)
0x4000: String "garbage" (not marked)
Which objects get marked?
Click to reveal answer
- 0x1000: Marked (root in main frame, slot 0)
- 0x2000: Marked (root in main frame, slot 2)
- 0x3000: Marked (root in helper frame, slot 0)
- 0x4000: NOT marked (no root points to it) → will be freed
The mark phase walks all frames and marks every non-null root.
Exercise 2: Implement a Stack Frame Guard
Write a Rust struct that automatically pops the frame when it goes out of scope:
#![allow(unused)] fn main() { // TODO: Implement this struct StackFrameGuard { // What fields? } impl StackFrameGuard { fn new(root_count: usize) -> Self { // Push frame, return guard } fn set_root(&self, index: usize, value: *mut u8) { // Set a root } } impl Drop for StackFrameGuard { fn drop(&mut self) { // Pop frame } } }
Click to reveal answer
#![allow(unused)] fn main() { struct StackFrameGuard { root_count: usize, } impl StackFrameGuard { fn new(root_count: usize) -> Self { unsafe { gc_push_frame(root_count) }; Self { root_count } } fn set_root(&self, index: usize, value: *mut u8) { assert!(index < self.root_count); unsafe { gc_set_root(index, value) }; } } impl Drop for StackFrameGuard { fn drop(&mut self) { unsafe { gc_pop_frame() }; } } // Usage: fn example() { let frame = StackFrameGuard::new(2); let obj = alloc(16, ObjType::String); frame.set_root(0, obj); // ... use obj ... // frame automatically pops when function returns } }
Exercise 3: Why Not Mark During Sweep?
Why do we separate mark and sweep into two phases? Why not free objects as soon as we find they're unreachable?
Click to reveal answer
We need to mark ALL live objects before freeing ANY.
Consider:
Object A references Object B
We're walking the heap in order: A, then B
If we freed A immediately when we saw it was unmarked:
- We'd lose the reference to B
- B would become unreachable
- But B might be reachable from a root!
By marking first, we ensure we've found ALL live objects.
Then sweep can safely free everything else.
Two-phase approach ensures correctness.
Checkpoint 2
We now have a complete garbage collector:
- Object allocation with headers
- Shadow stack for tracking roots
- Mark phase (walk roots, recursively mark)
- Sweep phase (free unmarked objects)
- Automatic collection trigger
- Step-by-step walkthrough
- Common mistakes and debugging
- Practice exercises
Files created so far:
mlir-lox-guide-rust-part2.md- Concepts + allocationmlir-lox-guide-rust-part3.md- Roots + full GC (this file)
Next: Integration with MLIR code generation