MLIR for Lox: Part 5 - Closures
Closures are the most challenging part of garbage collection. They capture variables from outer scopes, and those captured variables must stay alive as long as the closure exists.
Chapter 18: The Closure Problem
Consider this Lox code:
fun makeCounter() {
var count = 0;
fun counter() {
count = count + 1; // 'count' is from makeCounter's scope!
return count;
}
return counter;
}
var c = makeCounter(); // makeCounter returns, but 'count' must live on!
print c(); // 1
print c(); // 2
print c(); // 3
The problem:
makeCounter()returns- Its stack frame is destroyed
- But
countmust still exist becausecountercaptures it! - Where does
countlive?
Stack vs Heap
┌─────────────────────────────────────────────────────────────┐
│ WRONG: count on the stack │
│ │
│ makeCounter() called │
│ ┌─────────────────────┐ │
│ │ count = 0 │ ← on the stack │
│ │ return counter │ │
│ └─────────────────────┘ │
│ ↓ │
│ makeCounter() returns │
│ ┌─────────────────────┐ │
│ │ (freed!) │ ← count is gone! │
│ └─────────────────────┘ │
│ ↓ │
│ c() is called │
│ counter tries to access count... CRASH! │
│ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ RIGHT: count on the heap (in a closure environment) │
│ │
│ makeCounter() called │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ env = alloc() │──────│ count = 0 │ │
│ │ return counter │ │ (on the heap!) │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ ↓ ↑ │
│ makeCounter() returns │ │
│ (stack frame freed) │ │
│ ↓ │ │
│ c() is called ─────────────────────┘ │
│ counter accesses count via env pointer │
│ count is still alive! │
│ │
└─────────────────────────────────────────────────────────────┘
Chapter 19: Closure Environments
A closure environment is a heap-allocated structure that holds captured variables.
Structure
Closure Environment:
┌────────────────────────────────────┐
│ Header (ObjHeader) │
│ marked: bool │
│ obj_type: ObjType::Environment │
│ size: ... │
├────────────────────────────────────┤
│ Data │
│ enclosing: *mut Env (or null) │ ← for nested closures
│ count: usize │ ← number of slots
│ slot[0]: value │
│ slot[1]: value │
│ ... │
└────────────────────────────────────┘
Closure Object
Closure Object:
┌────────────────────────────────────┐
│ Header (ObjHeader) │
│ marked: bool │
│ obj_type: ObjType::Closure │
│ size: ... │
├────────────────────────────────────┤
│ Data │
│ function: *mut Function │ ← the code to execute
│ environment: *mut Env │ ← captured variables
└────────────────────────────────────┘
Chapter 20: Implementing Environments
Let's add environment support to our runtime:
#![allow(unused)] fn main() { // src/runtime/object.rs (extended) #[repr(u8)] #[derive(Debug, Clone, Copy, PartialEq)] pub enum ObjType { Number = 0, String = 1, Environment = 2, // NEW: closure environment Closure = 3, // NEW: closure object Instance = 4, } /// An environment (holds captured variables) #[repr(C)] pub struct Environment { /// Pointer to enclosing environment (for nested closures) pub enclosing: Option<*mut Environment>, /// Number of variable slots pub slot_count: usize, /// Variable slots (flexible array member) pub slots: [*mut u8; 0], } impl Environment { /// Get a slot value pub fn get(&self, index: usize) -> *mut u8 { assert!(index < self.slot_count); unsafe { *self.slots.as_ptr().add(index) } } /// Set a slot value pub fn set(&mut self, index: usize, value: *mut u8) { assert!(index < self.slot_count); unsafe { *self.slots.as_mut_ptr().add(index) = value; } } } /// A closure (function + environment) #[repr(C)] pub struct Closure { /// Pointer to the function code (an index into our function table) pub function_index: usize, /// Pointer to the captured environment pub environment: *mut Environment, } }
Allocating an Environment
#![allow(unused)] fn main() { // src/runtime/gc.rs (extended) /// Allocate a new environment with the given slot count pub fn alloc_environment(slot_count: usize, enclosing: Option<*mut Environment>) -> *mut Environment { // Calculate size: header + environment struct + slots let env_data_size = std::mem::size_of::<Environment>() + slot_count * std::mem::size_of::<*mut u8>(); let data_ptr = alloc(env_data_size, ObjType::Environment); let env = data_ptr as *mut Environment; unsafe { (*env).enclosing = enclosing; (*env).slot_count = slot_count; // Initialize all slots to null for i in 0..slot_count { (*env).set(i, std::ptr::null_mut()); } } env } /// Allocate a new closure pub fn alloc_closure(function_index: usize, environment: *mut Environment) -> *mut Closure { let data_ptr = alloc(std::mem::size_of::<Closure>(), ObjType::Closure); let closure = data_ptr as *mut Closure; unsafe { (*closure).function_index = function_index; (*closure).environment = environment; } closure } }
Chapter 21: Marking Environments
When we mark a closure, we must also mark its environment:
#![allow(unused)] fn main() { // src/runtime/gc.rs (extended) fn mark_references(header: *mut ObjHeader) { let obj_type = unsafe { (*header).obj_type }; let data = unsafe { (header as *mut u8).add(std::mem::size_of::<ObjHeader>()) }; match obj_type { ObjType::Number | ObjType::String => { // No references } ObjType::Environment => { let env = data as *mut Environment; unsafe { // Mark enclosing environment if let Some(enclosing) = (*env).enclosing { mark_object(enclosing as *mut u8); } // Mark all slots for i in 0..(*env).slot_count { let slot = (*env).get(i); if !slot.is_null() { mark_object(slot); } } } } ObjType::Closure => { let closure = data as *mut Closure; unsafe { // Mark the environment let env = (*closure).environment; if !env.is_null() { mark_object(env as *mut u8); } } } ObjType::Instance => { // (same as before) } } } }
Chapter 22: Code Generation for Closures
Now let's generate code for closures in our compiler:
The Challenge
When compiling a closure, we need to:
- Identify captured variables - which variables from outer scopes are used?
- Allocate an environment - create heap storage for captured variables
- Store captured values - copy values into the environment
- Access via environment - when the closure reads/writes captured vars, go through the environment
Step 1: Variable Analysis
#![allow(unused)] fn main() { // src/analysis/captures.rs use crate::ast::*; /// A variable that is captured by a closure #[derive(Debug, Clone)] pub struct CapturedVar { pub name: String, pub depth: usize, // How many scopes up? (0 = immediate, 1 = one level up, etc.) pub slot_index: usize, // Index in the environment } /// Analyze a function to find captured variables pub fn find_captures(func: &FunctionStmt) -> Vec<CapturedVar> { let mut analyzer = CaptureAnalyzer::new(); analyzer.analyze_function(func); analyzer.captures } struct CaptureAnalyzer { scopes: Vec<Vec<String>>, // Stack of local variables in each scope captures: Vec<CapturedVar>, current_slot: usize, } impl CaptureAnalyzer { fn new() -> Self { Self { scopes: vec![vec![]], // Start with one scope for parameters captures: Vec::new(), current_slot: 0, } } fn analyze_function(&mut self, func: &FunctionStmt) { // Parameters are in scope 0 for param in &func.params { self.scopes[0].push(param.clone()); } // Analyze body for stmt in &func.body { self.analyze_stmt(stmt); } } fn analyze_stmt(&mut self, stmt: &Stmt) { match stmt { Stmt::Var(v) => { self.scopes.last_mut().unwrap().push(v.name.clone()); self.analyze_expr(&v.init); } Stmt::Block(b) => { self.scopes.push(vec![]); for s in &b.statements { self.analyze_stmt(s); } self.scopes.pop(); } // ... other cases ... _ => {} } } fn analyze_expr(&mut self, expr: &Expr) { match expr { Expr::Variable(v) => { // Is this variable captured? (not in current scope) if !self.is_local(&v.name) { // It's a capture! if !self.captures.iter().any(|c| c.name == v.name) { self.captures.push(CapturedVar { name: v.name.clone(), depth: self.depth_of(&v.name), slot_index: self.current_slot, }); self.current_slot += 1; } } } // ... other cases ... _ => {} } } fn is_local(&self, name: &str) -> bool { self.scopes.last().unwrap().contains(&name.to_string()) } fn depth_of(&self, name: &str) -> usize { for (depth, scope) in self.scopes.iter().rev().enumerate() { if scope.contains(&name.to_string()) { return depth; } } panic!("Variable not found: {}", name); } } }
Step 2: Environment Allocation
#![allow(unused)] fn main() { // src/codegen/generator.rs (extended) impl<'c> CodeGenerator<'c> { fn compile_function(&mut self, func: &FunctionStmt) { // ... setup ... // Find captured variables let captures = find_captures(func); // If we have captures, we need an environment if !captures.is_empty() { let env = self.alloc_environment(captures.len()); // Store captured values into the environment for capture in &captures { let value = self.get_capture_value(&capture.name, capture.depth); self.store_to_environment(env, capture.slot_index, value); } // The environment is now available for inner functions self.current_environment = Some(env); } // ... compile body ... } fn alloc_environment(&mut self, slot_count: usize) -> Value<'c> { let location = Location::unknown(self.context); // Call lox_runtime_alloc_environment(slot_count, enclosing) let slot_count_val = self.const_i64(slot_count as i64); let enclosing_val = self.current_environment .unwrap_or_else(|| self.const_null()); let call = func::call( self.context, melior::ir::attribute::FlatSymbolRefAttribute::new( self.context, "lox_runtime_alloc_environment" ), &[slot_count_val, enclosing_val], &[Type::parse(self.context, "!llvm.ptr").unwrap()], location, ); if let Some(block) = &self.current_block { block.append_operation(call.clone()); } call.result(0).unwrap().into() } fn store_to_environment(&mut self, env: Value<'c>, index: usize, value: Value<'c>) { let location = Location::unknown(self.context); // Calculate offset: sizeof(Environment) + index * sizeof(ptr) // Then store value at env + offset // For simplicity, call a runtime function let call = func::call( self.context, melior::ir::attribute::FlatSymbolRefAttribute::new( self.context, "lox_runtime_env_set" ), &[env, self.const_i64(index as i64), value], &[], location, ); if let Some(block) = &self.current_block { block.append_operation(call); } } fn load_from_environment(&mut self, env: Value<'c>, index: usize) -> Value<'c> { let location = Location::unknown(self.context); let call = func::call( self.context, melior::ir::attribute::FlatSymbolRefAttribute::new( self.context, "lox_runtime_env_get" ), &[env, self.const_i64(index as i64)], &[Type::parse(self.context, "!llvm.ptr").unwrap()], location, ); if let Some(block) = &self.current_block { block.append_operation(call.clone()); } call.result(0).unwrap().into() } } }
Chapter 23: The Complete Picture
Here's how everything connects for our makeCounter example:
Lox Source
fun makeCounter() {
var count = 0;
fun counter() {
count = count + 1;
return count;
}
return counter;
}
Generated MLIR (Simplified)
module {
// makeCounter creates an environment for 'count'
func.func @makeCounter() -> !llvm.ptr {
// Push frame with 0 roots (count goes in environment, not stack)
%frame = lox.push_frame root_count = 0 : !llvm.ptr
// Allocate environment with 1 slot
%env = call @lox_runtime_alloc_environment(1, null) : (!llvm.ptr)
// Initialize count = 0
%zero = arith.constant 0.0 : f64
call @lox_runtime_env_set(%env, 0, %zero) : (!llvm.ptr, i64, f64)
// Create closure for counter
// (counter_index = 1, env = %env)
%closure = call @lox_runtime_alloc_closure(1, %env) : (i64, !llvm.ptr)
lox.pop_frame
return %closure : !llvm.ptr
}
// counter accesses 'count' via environment
func.func @counter(%env: !llvm.ptr) -> f64 {
%frame = lox.push_frame root_count = 1 : !llvm.ptr
lox.set_root index = 0, %env : !llvm.ptr
// Load count from environment
%count = call @lox_runtime_env_get(%env, 0) : (!llvm.ptr, i64) -> f64
// count = count + 1
%one = arith.constant 1.0 : f64
%new_count = arith.addf %count, %one : f64
// Store back to environment
call @lox_runtime_env_set(%env, 0, %new_count) : (!llvm.ptr, i64, f64)
lox.pop_frame
return %new_count : f64
}
}
Memory Layout After var c = makeCounter();
Stack:
┌─────────────────┐
│ c = 0x1000 │──┐
└─────────────────┘ │
▼
Heap:
┌─────────────────────────┐ 0x1000: Closure
│ header: { Closure } │
│ function_index: 1 │
│ environment: 0x2000 ────│──┐
└─────────────────────────┘ │
▼
┌─────────────────────────┐ 0x2000: Environment
│ header: { Environment } │
│ enclosing: null │
│ slot_count: 1 │
│ slot[0]: 1.0 ───────────│──┐
└─────────────────────────┘ │
│
(count = 1.0) ◄──────────────┘
Chapter 23.5: Nested Closures Deep Dive
Let's trace through a more complex example with nested closures:
The Code
fun makeAdder(x) {
fun adder(y) {
return x + y; // Captures 'x' from makeAdder
}
return adder;
}
var add5 = makeAdder(5);
var add10 = makeAdder(10);
print add5(3); // 8
print add10(3); // 13
Step-by-Step Execution
1. makeAdder(5) is called:
Stack:
┌─────────────────────────────┐
│ Frame: makeAdder │
│ x = 5 (parameter) │
└─────────────────────────────┘
│
▼ Creates environment
┌─────────────────────────────┐
│ Environment 0x1000 │
│ slot[0]: 5 (x) │
└─────────────────────────────┘
│
▼ Creates closure
┌─────────────────────────────┐
│ Closure 0x2000 (adder) │
│ function_index: 1 │
│ environment: 0x1000 ──────│──► env with x=5
└─────────────────────────────┘
Returns closure 0x2000, assigned to add5
makeAdder stack frame is destroyed, but environment lives on!
2. makeAdder(10) is called:
Creates NEW environment 0x3000 with x=10
Creates NEW closure 0x4000 pointing to environment 0x3000
add5 → closure 0x2000 → env 0x1000 (x=5)
add10 → closure 0x4000 → env 0x3000 (x=10)
3. add5(3) is called:
Stack:
┌─────────────────────────────┐
│ Frame: adder │
│ y = 3 (parameter) │
│ env = 0x1000 (from closure)│
└─────────────────────────────┘
Load x from env[0] = 5
Compute 5 + 3 = 8
Return 8
4. add10(3) is called:
Same process, but env = 0x3000
Load x from env[0] = 10
Compute 10 + 3 = 13
Return 13
Key Insight: Each Call Creates Separate Environment
makeAdder(5):
→ Creates env {x: 5}
→ Creates closure pointing to that env
makeAdder(10):
→ Creates NEW env {x: 10}
→ Creates NEW closure pointing to NEW env
The two closures share no state!
Chapter 23.6: Multiple Captured Variables
What if we capture multiple variables?
fun makePoint(x, y) {
fun translate(dx, dy) {
// Captures both x and y
return (x + dx, y + dy);
}
return translate;
}
Environment Layout
Environment:
slot[0]: x
slot[1]: y
When translate is called:
1. Load x from env[0]
2. Load y from env[1]
3. Compute x + dx
4. Compute y + dy
Generated Code
func.func @translate(%env: !llvm.ptr, %dx: f64, %dy: f64) -> (f64, f64) {
// Load captured variables
%x = call @lox_runtime_env_get(%env, 0) : (!llvm.ptr, i64) -> f64
%y = call @lox_runtime_env_get(%env, 1) : (!llvm.ptr, i64) -> f64
// Compute
%new_x = arith.addf %x, %dx : f64
%new_y = arith.addf %y, %dy : f64
return %new_x, %new_y : f64, f64
}
Chapter 23.7: Nested Environments
What if a closure captures a variable from two scopes up?
fun outer() {
var a = 1;
fun middle() {
var b = 2;
fun inner() {
return a + b; // a from outer, b from middle
}
return inner;
}
return middle;
}
Environment Chain
outer() creates:
env_outer { a: 1 }
middle() creates:
env_middle { b: 2, enclosing: env_outer }
↑
Points to outer's environment
inner() closure:
environment: env_middle
When inner() runs:
1. Look up 'a': not in env_middle → follow enclosing → found in env_outer
2. Look up 'b': found in env_middle
Variable Lookup Algorithm
#![allow(unused)] fn main() { /// Look up a variable by traversing the environment chain fn lookup_variable(env: *mut Environment, depth: usize, slot: usize) -> *mut u8 { let mut current = env; // Walk up 'depth' levels for _ in 0..depth { current = unsafe { (*current).enclosing.unwrap() }; } // Now access the slot unsafe { (*current).get(slot) } } }
Generated MLIR
func.func @inner(%env: !llvm.ptr) -> f64 {
// Load 'b' from current environment (depth=0, slot=0)
%b = call @lox_runtime_env_get(%env, 0) : (!llvm.ptr, i64) -> f64
// Load 'a' from enclosing environment (depth=1, slot=0)
%env_outer = call @lox_runtime_env_get_enclosing(%env) : (!llvm.ptr) -> !llvm.ptr
%a = call @lox_runtime_env_get(%env_outer, 0) : (!llvm.ptr, i64) -> f64
// Compute
%sum = arith.addf %a, %b : f64
return %sum : f64
}
Chapter 23.8: Practice Exercises
Exercise 1: Trace Memory Layout
For this code:
fun factory(value) {
fun get() {
return value;
}
fun set(new_value) {
value = new_value;
}
return (get, set);
}
var (getter, setter) = factory(10);
print getter(); // 10
setter(20);
print getter(); // 20
Draw the memory layout after var (getter, setter) = factory(10);
Click to reveal answer
Stack:
┌─────────────────────────────┐
│ getter = 0x1000 (closure) │
│ setter = 0x2000 (closure) │
└─────────────────────────────┘
Heap:
0x1000: Closure (get)
function_index: @get
environment: 0x3000 ──────────┐
│
0x2000: Closure (set) │
function_index: @set │
environment: 0x3000 ──────────│── Same environment!
│
0x3000: Environment ◄───────────┘
slot[0]: 10 (value)
Key insight: Both closures share the SAME environment. That's why setter(20) affects what getter() returns!
Exercise 2: Variable Analysis
For this function, what variables are captured and where do they go?
fun outer(x, y) {
var a = x + y;
fun inner(b) {
return a + b + x; // Captures a and x
}
return inner;
}
Click to reveal answer
Captured variables:
a(local in outer) → slot 0x(parameter in outer) → slot 1
Environment for inner:
env_inner:
slot[0]: a
slot[1]: x
Note: y is NOT captured (not used by inner), so it's not in the environment.
Exercise 3: Why Not Copy Values?
Why can't we just copy captured values into the closure? Why do we need an environment?
fun example() {
var count = 0;
fun increment() {
count = count + 1; // MODIFIES count!
return count;
}
return increment;
}
Click to reveal answer
If we copied count into the closure:
- Each call to
increment()would modify its OWN copy - The outer
countwould never change - Multiple calls wouldn't accumulate
By using an environment:
- All closures share the same environment
- Modifications are visible to all
- State is properly shared
The environment is essentially a shared "box" that holds the variable.
Checkpoint 4
We've covered the hardest part: closures!
- Why captured variables can't live on the stack
- Environment structure (heap-allocated variable storage)
- Closure structure (function + environment pointer)
- Marking environments during GC
- Code generation for closures
- Nested closures deep dive
- Multiple captured variables
- Environment chains
- Practice exercises
Files created:
mlir-lox-guide-rust-part2.md- Concepts + allocationmlir-lox-guide-rust-part3.md- Roots + full GCmlir-lox-guide-rust-part4.md- MLIR integrationmlir-lox-guide-rust-part5.md- Closures (this file)
Next: Final review and complete reference