MLIR for Lox: Part 3 - Finding Roots

Continuing from Part 2. We can allocate objects. Now we need to find them during GC.


Chapter 8: The Root Problem

When GC runs, it needs to know: what variables are currently holding object references?

fun example() {
    var a = "hello";    // 'a' is on the stack
    var b = 42;         // 'b' is on the stack (but it's a number, not a reference)
    var c = a;          // 'c' is on the stack, points to "hello"
    
    // GC RUNS HERE
    
    // The GC needs to know:
    //   - 'a' is a reference to a string object
    //   - 'b' is NOT a reference (just a number)
    //   - 'c' is a reference to the same string object
}

Why This Is Hard

In compiled code, a, b, and c are just bytes in memory (stack slots or registers). The GC sees:

Stack memory:
  [slot 0]: 0x7fff1234   <-- Is this a pointer? A number? Who knows!
  [slot 1]: 42           <-- This looks like a number, but could be a pointer!
  [slot 2]: 0x7fff1234   <-- Same as slot 0

The GC can't tell the difference between a pointer and an integer that happens to look like a valid address.

Two Approaches

ApproachHow It WorksProsCons
ConservativeTreat anything that LOOKS like a pointer as a pointerSimple, no compiler changesCan't move objects, may keep garbage alive
PreciseTell GC exactly where pointers areEfficient, enables moving GCCompiler must track pointer locations

We'll use precise GC because:

  1. It's more educational
  2. It's what real language implementations use
  3. It enables better collectors later

Chapter 9: The Shadow Stack

The simplest way to do precise GC: maintain a linked list of stack frames.

The Concept

┌─────────────────────────────────────────────────────────────┐
│                        SHADOW STACK                         │
│                                                             │
│   Each function call pushes a "frame" that lists:           │
│     - How many roots it has                                 │
│     - Pointers to each root                                 │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────┐                                   │
│   │ Frame for main()    │                                   │
│   │   roots: [a, c]     │──┐                                │
│   └─────────────────────┘  │                                │
│                            ▼                                │
│   ┌─────────────────────┐                                   │
│   │ Frame for example() │                                   │
│   │   roots: [x, y]     │──┐                                │
│   └─────────────────────┘  │                                │
│                            ▼                                │
│   ┌─────────────────────┐                                   │
│   │ Frame for helper()  │                                   │
│   │   roots: [temp]     │──► NULL                           │
│   └─────────────────────┘                                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

When GC runs:

  1. Walk the linked list of frames
  2. For each frame, mark each root
  3. Done! All reachable objects are marked

The Frame Structure

#![allow(unused)]
fn main() {
// src/runtime/shadow_stack.rs

/// A frame in the shadow stack
/// 
/// This is pushed when a function is called,
/// and popped when a function returns.
#[repr(C)]
pub struct StackFrame {
    /// Pointer to the next frame (up the call stack)
    pub next: Option<*mut StackFrame>,
    
    /// Number of roots in this frame
    pub root_count: usize,
    
    /// Array of root pointers
    /// (this is a flexible array member - actual size = root_count)
    pub roots: [*mut u8; 0],
}

impl StackFrame {
    /// Get a pointer to the roots array
    pub fn roots_ptr(&mut self) -> *mut *mut u8 {
        self.roots.as_mut_ptr()
    }
    
    /// Get a root by index
    pub fn get_root(&self, index: usize) -> *mut u8 {
        assert!(index < self.root_count);
        unsafe { *self.roots.as_ptr().add(index) }
    }
    
    /// Set a root by index
    pub fn set_root(&mut self, index: usize, value: *mut u8) {
        assert!(index < self.root_count);
        unsafe {
            *self.roots.as_mut_ptr().add(index) = value;
        }
    }
}
}

Global Stack Head

#![allow(unused)]
fn main() {
// src/runtime/shadow_stack.rs (continued)

use std::cell::RefCell;

/// The head of the shadow stack (most recent frame)
thread_local! {
    pub static SHADOW_STACK_HEAD: RefCell<Option<*mut StackFrame>> = RefCell::new(None);
}
}

Push and Pop Operations

#![allow(unused)]
fn main() {
/// Push a new frame onto the shadow stack
/// 
/// This allocates a frame with space for `root_count` roots.
/// Returns a pointer to the roots array so the compiler can fill it in.
/// 
/// # Safety
/// The caller must call `pop_frame` before the function returns.
#[no_mangle]
pub unsafe extern "C" fn gc_push_frame(root_count: usize) -> *mut *mut u8 {
    // Calculate frame size: header + root pointers
    let frame_size = std::mem::size_of::<StackFrame>() 
                   + root_count * std::mem::size_of::<*mut u8>();
    
    // Allocate the frame
    let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap();
    let frame_ptr = std::alloc::alloc(layout) as *mut StackFrame;
    
    if frame_ptr.is_null() {
        panic!("Out of memory allocating stack frame!");
    }
    
    // Initialize the frame
    SHADOW_STACK_HEAD.with(|head| {
        (*frame_ptr).next = *head.borrow();
        (*frame_ptr).root_count = root_count;
        
        // Initialize all roots to null
        for i in 0..root_count {
            (*frame_ptr).set_root(i, std::ptr::null_mut());
        }
        
        // Update head
        *head.borrow_mut() = Some(frame_ptr);
    });
    
    // Return pointer to roots array
    (*frame_ptr).roots_ptr()
}

/// Pop a frame from the shadow stack
/// 
/// # Safety
/// Must be called exactly once for each `gc_push_frame`.
#[no_mangle]
pub unsafe extern "C" fn gc_pop_frame() {
    SHADOW_STACK_HEAD.with(|head| {
        let frame_ptr = (*head.borrow()).expect("No frame to pop!");
        
        // Calculate frame size for deallocation
        let root_count = (*frame_ptr).root_count;
        let frame_size = std::mem::size_of::<StackFrame>() 
                       + root_count * std::mem::size_of::<*mut u8>();
        
        // Update head to next frame
        *head.borrow_mut() = (*frame_ptr).next;
        
        // Free the frame
        let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap();
        std::alloc::dealloc(frame_ptr as *mut u8, layout);
    });
}

/// Set a root in the current frame
#[no_mangle]
pub unsafe extern "C" fn gc_set_root(root_index: usize, value: *mut u8) {
    SHADOW_STACK_HEAD.with(|head| {
        let frame_ptr = (*head.borrow()).expect("No frame!");
        (*frame_ptr).set_root(root_index, value);
    });
}
}

Chapter 10: Putting It Together - The Full GC

Now we have:

  • Allocation (Part 2)
  • Shadow stack for roots (this part)

Let's write the actual gc_collect() function.

Mark Phase

#![allow(unused)]
fn main() {
// src/runtime/gc.rs (continued)

use crate::runtime::object::{ObjHeader, ObjType};
use crate::runtime::shadow_stack::{SHADOW_STACK_HEAD, StackFrame};

/// Mark all reachable objects
fn gc_mark() {
    // Walk the shadow stack
    SHADOW_STACK_HEAD.with(|head| {
        let mut current = *head.borrow();
        
        while let Some(frame_ptr) = current {
            let frame = unsafe { &*frame_ptr };
            
            // Mark each root in this frame
            for i in 0..frame.root_count {
                let root = frame.get_root(i);
                if !root.is_null() {
                    mark_object(root);
                }
            }
            
            // Move to next frame
            current = frame.next;
        }
    });
}

/// Recursively mark an object and everything it references
fn mark_object(obj_ptr: *mut u8) {
    // Get the header (before the data pointer)
    let header = unsafe { (obj_ptr as *mut ObjHeader).sub(1) };
    
    // Already marked? Skip (avoid infinite loops)
    if unsafe { (*header).marked } {
        return;
    }
    
    // Mark it
    unsafe { (*header).marked = true; }
    
    // Now mark any objects this one references
    // (depends on object type)
    mark_references(header);
}

/// Mark objects referenced by this object
fn mark_references(header: *mut ObjHeader) {
    let obj_type = unsafe { (*header).obj_type };
    let data = unsafe { (header as *mut u8).add(std::mem::size_of::<ObjHeader>()) };
    
    match obj_type {
        ObjType::Number | ObjType::String => {
            // These don't reference other objects
        }
        
        ObjType::Closure => {
            // A closure references its captured variables
            // Layout: [function_ptr][capture_count][capture_0][capture_1]...
            unsafe {
                let capture_count = *(data.add(8) as *const usize);
                let captures = data.add(16) as *const *mut u8;
                
                for i in 0..capture_count {
                    let captured = *captures.add(i);
                    if !captured.is_null() {
                        mark_object(captured);
                    }
                }
            }
        }
        
        ObjType::Instance => {
            // An instance references its field values
            // Layout: [class_ptr][field_count][field_0_key][field_0_value]...
            unsafe {
                let field_count = *(data.add(8) as *const usize);
                let fields = data.add(16) as *const *mut u8;
                
                // Each field is (key_ptr, value_ptr)
                for i in 0..field_count {
                    let value = *fields.add(i * 2 + 1);
                    if !value.is_null() {
                        mark_object(value);
                    }
                }
            }
        }
    }
}
}

Sweep Phase

#![allow(unused)]
fn main() {
/// Free all unmarked objects
fn gc_sweep() {
    use std::collections::HashSet;
    
    // We need to rebuild the object list as we sweep
    let mut new_head: Option<*mut ObjHeader> = None;
    let mut freed_count = 0;
    
    ALL_OBJECTS.with(|objects| {
        let mut current = *objects.borrow();
        let mut keep = Vec::new();
        
        while let Some(header_ptr) = current {
            let header = unsafe { &*header_ptr };
            let next = header.next;
            
            if header.marked {
                // Still alive! Clear mark for next cycle
                unsafe { (*header_ptr).marked = false; }
                keep.push(header_ptr);
            } else {
                // Dead! Free it
                let total_size = std::mem::size_of::<ObjHeader>() + header.size as usize;
                let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap();
                unsafe {
                    std::alloc::dealloc(header_ptr as *mut u8, layout);
                }
                freed_count += 1;
            }
            
            current = next;
        }
        
        // Rebuild linked list
        for (i, &header_ptr) in keep.iter().enumerate() {
            let next = if i + 1 < keep.len() {
                Some(keep[i + 1])
            } else {
                None
            };
            unsafe { (*header_ptr).next = next; }
        }
        
        *objects.borrow_mut() = keep.first().copied();
    });
    
    OBJECT_COUNT.with(|count| {
        *count.borrow_mut() -= freed_count;
    });
    
    eprintln!("GC: Freed {} objects, {} remaining", freed_count, 
              OBJECT_COUNT.with(|c| *c.borrow()));
}
}

The Main Entry Point

#![allow(unused)]
fn main() {
/// Run a garbage collection cycle
/// 
/// This is automatically called when allocation threshold is hit,
/// but can also be called manually.
#[no_mangle]
pub extern "C" fn gc_collect() {
    gc_mark();
    gc_sweep();
}
}

Chapter 11: When to Collect

We need to decide: when should gc_collect() run?

Options:

  1. Every allocation - Simple but slow
  2. Every N allocations - Simple, tunable
  3. When memory exceeds threshold - More sophisticated
  4. Never automatically - Manual only (good for debugging)

We'll use option 2 (every N allocations):

#![allow(unused)]
fn main() {
// src/runtime/gc.rs

/// Trigger collection after this many allocations
const GC_THRESHOLD: usize = 1024;

/// Check if we should collect
fn maybe_gc() {
    OBJECT_COUNT.with(|count| {
        if *count.borrow() >= GC_THRESHOLD {
            gc_collect();
        }
    });
}

/// Modified alloc to trigger GC
pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 {
    maybe_gc();  // Check before allocating
    
    // ... rest of alloc code ...
}
}

Chapter 10.5: Step-by-Step Mark-Sweep Walkthrough

Let's trace through a complete GC cycle with a concrete example:

Setup: A Running Program

fun main() {
    var a = "hello";    // Object A: String "hello"
    var b = 42;         // Not an object (number is inline)
    var c = a;          // c points to same string as a
    
    // ... some code that creates garbage ...
    var temp = "garbage";  // Object G: String "garbage"
    temp = nil;            // G is now garbage!
    
    // GC RUNS HERE
    
    print a;  // Should still work
    print c;  // Should still work
}

State Before GC

Shadow Stack:
┌─────────────────────────┐
│ Frame for main()        │
│   root[0]: 0x1000 (a)   │
│   root[1]: null (b)     │
│   root[2]: 0x1000 (c)   │
└─────────────────────────┘

Heap Objects:
ALL_OBJECTS → 0x2000 (G) → 0x1000 (A) → None

Object A (at 0x1000):
  header: { marked: false, type: String, size: 5, next: None }
  data: "hello"

Object G (at 0x2000):
  header: { marked: false, type: String, size: 7, next: 0x1000 }
  data: "garbage"

Mark Phase: Step by Step

Step 1: Walk to frame (main)
        
Step 2: Process root[0] = 0x1000 (a)
        - Get header at 0x1000 - 16 = 0x0FF0
        - marked = false → mark it!
        - Type is String, no references to mark
        Result: Object A is now marked

Step 3: Process root[1] = null
        - Skip null values
        Result: No action

Step 4: Process root[2] = 0x1000 (c)
        - Get header at 0x1000 - 16 = 0x0FF0
        - marked = true → already marked, skip
        Result: Already marked, no action

Step 5: No more frames
        Mark phase complete!

State After Mark Phase

Shadow Stack: (unchanged)
┌─────────────────────────┐
│ Frame for main()        │
│   root[0]: 0x1000       │
│   root[1]: null         │
│   root[2]: 0x1000       │
└─────────────────────────┘

Heap Objects:
ALL_OBJECTS → 0x2000 (G) → 0x1000 (A) → None

Object A (at 0x1000):
  header: { marked: TRUE, type: String, size: 5, next: None }
  data: "hello"

Object G (at 0x2000):
  header: { marked: FALSE, type: String, size: 7, next: 0x1000 }
  data: "garbage"

Sweep Phase: Step by Step

Step 1: Walk to object 0x2000 (G)
        - marked = false → FREE IT!
        - Free 23 bytes (header + 7 data)
        - freed_count = 1

Step 2: Walk to object 0x1000 (A)
        - marked = true → KEEP IT
        - Clear mark for next cycle
        - marked = false
        - Add to keep list

Step 3: Rebuild linked list
        - keep = [0x1000]
        - ALL_OBJECTS = 0x1000
        - 0x1000.next = None

Result: 1 object freed, 1 remaining

State After Sweep Phase

Shadow Stack: (unchanged)

Heap Objects:
ALL_OBJECTS → 0x1000 (A) → None

Object A (at 0x1000):
  header: { marked: false, type: String, size: 5, next: None }
  data: "hello"

Object G: FREED! Memory returned to system.

Chapter 10.6: Common Mistakes and Debugging

Mistake 1: Forgetting to Register a Root

Code:

#![allow(unused)]
fn main() {
fn broken_function() {
    let obj = alloc(16, ObjType::String);
    // Oops! Forgot gc_set_root(0, obj)
    
    gc_collect();  // BAD! obj might be freed!
    
    // Now obj points to freed memory
    use_object(obj);  // CRASH or corruption!
}
}

Fix:

#![allow(unused)]
fn main() {
fn correct_function() {
    gc_push_frame(1);  // One root slot
    let obj = alloc(16, ObjType::String);
    gc_set_root(0, obj);  // Register as root!
    
    gc_collect();  // Safe! obj is protected
    
    // ... use obj ...
    
    gc_pop_frame();  // Clean up
}
}

Mistake 2: Push/Pop Mismatch

Code:

#![allow(unused)]
fn main() {
fn broken() {
    gc_push_frame(2);
    // Oops! Forgot gc_pop_frame()
}  // Frame leaks!

fn next_call() {
    gc_push_frame(1);  // Creates NEW frame
    // But old frame is still in the list!
    // GC will mark stale pointers
}
}

Fix: Always use RAII pattern or ensure pop on all paths:

#![allow(unused)]
fn main() {
struct FrameGuard;

impl FrameGuard {
    fn new(root_count: usize) -> Self {
        unsafe { gc_push_frame(root_count) };
        FrameGuard
    }
}

impl Drop for FrameGuard {
    fn drop(&mut self) {
        unsafe { gc_pop_frame() };
    }
}

fn correct() {
    let _guard = FrameGuard::new(2);
    // gc_pop_frame called automatically when _guard drops
    // Even if we panic or return early!
}
}

Mistake 3: Circular References Not Handled

Actually this IS handled! The marked flag prevents infinite loops:

Object A references Object B
Object B references Object A

mark(A):
  A.marked = true
  mark(B)  // A's reference
    B.marked = true
    mark(A)  // B's reference
      A.marked = true → SKIP (already marked)
    return
  return

No infinite loop!

Debugging Tips

Add these diagnostic functions:

#![allow(unused)]
fn main() {
/// Check if the GC state is consistent
pub fn gc_validate() {
    // Check shadow stack
    SHADOW_STACK_HEAD.with(|head| {
        let mut current = *head.borrow();
        let mut frame_count = 0;
        
        while let Some(frame_ptr) = current {
            frame_count += 1;
            if frame_count > 1000 {
                panic!("Shadow stack too deep! Infinite loop?");
            }
            current = unsafe { (*frame_ptr).next };
        }
    });
    
    // Check object list
    ALL_OBJECTS.with(|objects| {
        let mut current = *objects.borrow();
        let mut obj_count = 0;
        
        while let Some(header_ptr) = current {
            obj_count += 1;
            if obj_count > 100000 {
                panic!("Object list too long! Infinite loop?");
            }
            current = unsafe { (*header_ptr).next };
        }
        
        // Verify count matches
        OBJECT_COUNT.with(|count| {
            assert_eq!(*count.borrow(), obj_count, "Object count mismatch!");
        });
    });
}
}

Chapter 10.7: Practice Exercises

Exercise 1: Trace the Mark Phase

Given this state:

Shadow Stack:
  Frame(main): roots = [0x1000, null, 0x2000]
  Frame(helper): roots = [0x3000]

Heap:
  0x1000: String "a" (not marked)
  0x2000: String "b" (not marked)
  0x3000: String "c" (not marked)
  0x4000: String "garbage" (not marked)

Which objects get marked?

Click to reveal answer
  • 0x1000: Marked (root in main frame, slot 0)
  • 0x2000: Marked (root in main frame, slot 2)
  • 0x3000: Marked (root in helper frame, slot 0)
  • 0x4000: NOT marked (no root points to it) → will be freed

The mark phase walks all frames and marks every non-null root.

Exercise 2: Implement a Stack Frame Guard

Write a Rust struct that automatically pops the frame when it goes out of scope:

#![allow(unused)]
fn main() {
// TODO: Implement this
struct StackFrameGuard {
    // What fields?
}

impl StackFrameGuard {
    fn new(root_count: usize) -> Self {
        // Push frame, return guard
    }
    
    fn set_root(&self, index: usize, value: *mut u8) {
        // Set a root
    }
}

impl Drop for StackFrameGuard {
    fn drop(&mut self) {
        // Pop frame
    }
}
}
Click to reveal answer
#![allow(unused)]
fn main() {
struct StackFrameGuard {
    root_count: usize,
}

impl StackFrameGuard {
    fn new(root_count: usize) -> Self {
        unsafe { gc_push_frame(root_count) };
        Self { root_count }
    }
    
    fn set_root(&self, index: usize, value: *mut u8) {
        assert!(index < self.root_count);
        unsafe { gc_set_root(index, value) };
    }
}

impl Drop for StackFrameGuard {
    fn drop(&mut self) {
        unsafe { gc_pop_frame() };
    }
}

// Usage:
fn example() {
    let frame = StackFrameGuard::new(2);
    let obj = alloc(16, ObjType::String);
    frame.set_root(0, obj);
    
    // ... use obj ...
    
    // frame automatically pops when function returns
}
}

Exercise 3: Why Not Mark During Sweep?

Why do we separate mark and sweep into two phases? Why not free objects as soon as we find they're unreachable?

Click to reveal answer

We need to mark ALL live objects before freeing ANY.

Consider:

Object A references Object B
We're walking the heap in order: A, then B

If we freed A immediately when we saw it was unmarked:
  - We'd lose the reference to B
  - B would become unreachable
  - But B might be reachable from a root!
  
By marking first, we ensure we've found ALL live objects.
Then sweep can safely free everything else.

Two-phase approach ensures correctness.


Checkpoint 2

We now have a complete garbage collector:

  • Object allocation with headers
  • Shadow stack for tracking roots
  • Mark phase (walk roots, recursively mark)
  • Sweep phase (free unmarked objects)
  • Automatic collection trigger
  • Step-by-step walkthrough
  • Common mistakes and debugging
  • Practice exercises

Files created so far:

  1. mlir-lox-guide-rust-part2.md - Concepts + allocation
  2. mlir-lox-guide-rust-part3.md - Roots + full GC (this file)

Next: Integration with MLIR code generation