Part 2 — Project Setup and Minimal eBPF
Before we write any instrumentation, we need a working project that compiles and runs. Aya projects have two halves: the user-space Rust program and the eBPF programs. Getting the build tooling right is the first thing people get stuck on, so let’s do it carefully.
Prerequisites
You’ll need:
# Rust stable and nightly (needed for eBPF compilation)
rustup install stable
rustup toolchain install nightly --component rust-src
# No `rustup target add` step is needed for the eBPF target:
# bpfel-unknown-none is a tier 3 target with no prebuilt standard library,
# so the build compiles `core` from the rust-src component installed above
# bpf-linker: links the LLVM bitcode emitted by rustc into a BPF object file
cargo install bpf-linker
# bpftool: inspects loaded programs and maps, and dumps the BTF info used to generate Rust bindings
# On Ubuntu, install from your package manager first, or build from source:
# https://github.com/libbpf/bpftool
sudo apt install linux-tools-$(uname -r)
# cargo-generate: scaffolds the Aya template
cargo install cargo-generate
Check your kernel version. The BPF ring buffer used in this part was added in kernel 5.8, so treat that as the minimum; BTF support is more complete on 5.10+:
uname -r
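If you'd rather check this from code than by eye, here is a minimal sketch using only the standard library. The helper names `parse_kernel_version` and `kernel_at_least` are made up for illustration, and the parser assumes the usual `major.minor.patch-suffix` shape of `uname -r` output.

```rust
/// Parse the leading "major.minor" out of a `uname -r` style string.
/// Hypothetical helper: not part of Aya or any other crate.
fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split('.');
    let major = parts.next()?.parse::<u32>().ok()?;
    // The minor component may carry a suffix such as "-rc1"; keep the digits only.
    let minor_raw = parts.next()?;
    let digits: String = minor_raw
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor = digits.parse::<u32>().ok()?;
    Some((major, minor))
}

/// True if the release string parses and is at least `min` (major, minor).
fn kernel_at_least(release: &str, min: (u32, u32)) -> bool {
    parse_kernel_version(release).map_or(false, |v| v >= min)
}

fn main() {
    // On Linux the running kernel's release string is exposed in procfs.
    let release = std::fs::read_to_string("/proc/sys/kernel/osrelease")
        .unwrap_or_else(|_| String::from("unknown"));
    println!(
        "kernel {} >= 5.8: {}",
        release.trim(),
        kernel_at_least(&release, (5, 8))
    );
}
```

Tuples compare lexicographically, so `(5, 10) >= (5, 8)` and `(6, 1) >= (5, 8)` both hold without any extra logic.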
⚠️ One Version Trap to Watch For
Aya has two separate crates with independent version tracks: aya (user-space) and aya-ebpf (eBPF kernel programs). They don’t share a version number. When you see aya = "0.13", the companion eBPF crate might be 0.1, 0.2, or something else entirely — check crates.io to confirm the current version.
The tutorial uses the latest compatible versions. If cargo update pulls in mismatched versions, pin them explicitly in Cargo.toml.
Scaffolding the Project
The Aya team provides a template. Use it with the program type and tracepoint details pre-filled — otherwise cargo generate will prompt you interactively, which breaks the copy-paste flow:
cargo generate --name perf-monitor \
-d program_type=tracepoint \
-d tracepoint_category=sched \
-d tracepoint_name=sched_switch \
https://github.com/aya-rs/aya-template
What each argument does:
--name perf-monitor: the name of the directory and the Rust workspace. This becomes the workspace root, and the three crates are named after it.
-d program_type=tracepoint: tells the template to generate a tracepoint program. The template supports many types (xdp, kprobe, uprobe, tracepoint, etc.). Each type changes the generated code: a tracepoint program reads from a TracePointContext, an XDP program reads from an XdpContext, and so on. We pick tracepoint because our first instrument targets a kernel tracepoint.
-d tracepoint_category=sched: the tracepoint category (also called the subsystem). Kernel tracepoints are organized as category:name. The sched category contains scheduler events: sched_switch, sched_wakeup, sched_waking, etc.
-d tracepoint_name=sched_switch: the specific tracepoint event. sched_switch fires every time the kernel switches from one task to another. It's the most fundamental scheduler tracepoint: it tells you what ran and when.
https://github.com/aya-rs/aya-template: the template repository. cargo generate downloads this, replaces placeholders with the values you passed (-d flags), and writes the result to a new directory.
The -d flags are how you answer the template’s questions ahead of time. Without them, cargo generate prompts you interactively.
Important: The template uses git dependencies by default. After generating, switch them to crates.io versions for stability. Edit Cargo.toml in the workspace root and perf-monitor-ebpf/Cargo.toml:
# Before (git — moves, may break)
aya = { git = "https://github.com/aya-rs/aya" }
# After (crates.io — stable, tested)
aya = "0.13" # resolves to 0.13.1 (0.13.2 was yanked)
aya-build = "0.1"
aya-ebpf = "0.1"
aya-log = "0.2"
aya-log-ebpf = "0.1"
Then run cargo update to resolve.
The generated workspace has three crates:
perf-monitor/                ← workspace root (Cargo.toml at root)
├── perf-monitor/            ← user-space program (what we write)
│   ├── Cargo.toml
│   ├── build.rs
│   └── src/main.rs
├── perf-monitor-ebpf/       ← eBPF programs (compiled to BPF bytecode)
│   ├── Cargo.toml
│   ├── build.rs
│   └── src/main.rs
└── perf-monitor-common/     ← code shared between userspace and eBPF
    ├── Cargo.toml
    └── src/lib.rs
The eBPF build is handled automatically: the user-space crate's build script (perf-monitor/build.rs) invokes the Aya build tooling, which compiles the eBPF crate to BPF bytecode and writes the object into that crate's OUT_DIR, where include_bytes_aligned! picks it up. One cargo build compiles both halves.
The tracepoint template scaffolds a working program already wired to sched:sched_switch. We’ll replace the generated body with our own code, but the structure — the workspace, the three crates, the build configuration — is what we need.
The Two Halves
eBPF Programs (perf-monitor-ebpf/)
The eBPF programs are written in Rust but compiled to BPF bytecode by the build.rs script. The Aya aya-ebpf crate provides the Rust API for eBPF maps, programs, and context objects — no standard library, no heap, strict verifier.
A simple tracepoint program looks like this:
// perf-monitor-ebpf/src/main.rs
#![no_std]
#![no_main]

use aya_ebpf::helpers::{bpf_get_smp_processor_id, bpf_ktime_get_ns};
use aya_ebpf::macros::{map, tracepoint};
use aya_ebpf::maps::RingBuf;
use aya_ebpf::programs::TracePointContext;

// Events we emit to userspace
#[derive(Clone, Copy)]
#[repr(C)]
struct SchedulerEvent {
    cpu_id: u32,
    prev_pid: u32,
    next_pid: u32,
    timestamp: u64,
}

// Declare the ring buffer as a static: this is how maps work in eBPF.
// Userspace looks the map up by the static's name, "EVENTS".
#[map]
static EVENTS: RingBuf = RingBuf::with_byte_size(8 * 4096, 0);

// Attach to sched:sched_switch. The function name, "sched_switch",
// is the program name userspace passes to program_mut().
#[tracepoint]
pub fn sched_switch(ctx: TracePointContext) -> u32 {
    // Record layout for sched_switch on x86_64 Linux 5.x. read_at offsets are
    // measured from the start of the record, including the 8-byte common
    // header, so they match the format file exactly:
    //   offset  8: prev_comm   char[16] (TASK_COMM_LEN)
    //   offset 24: prev_pid    pid_t (4 bytes)
    //   offset 28: prev_prio   int
    //   offset 32: prev_state  long (TASK_* state mask)
    //   offset 40: next_comm   char[16]
    //   offset 56: next_pid    pid_t (4 bytes)
    //   offset 60: next_prio   int
    //
    // Verify on your system: cat /sys/kernel/tracing/events/sched/sched_switch/format
    let prev_pid = unsafe { ctx.read_at::<u32>(24).unwrap_or(0) };
    let next_pid = unsafe { ctx.read_at::<u32>(56).unwrap_or(0) };
    let cpu_id = unsafe { bpf_get_smp_processor_id() };
    let timestamp = unsafe { bpf_ktime_get_ns() };
    let event = SchedulerEvent {
        cpu_id,
        prev_pid,
        next_pid,
        timestamp,
    };
    // Send to the ring buffer; userspace reads from the EVENTS map.
    // output() fails when the buffer is full, in which case the event is dropped.
    let _ = EVENTS.output(&event, 0);
    0
}

// no_std binaries need a panic handler; the template includes one like this.
#[cfg(not(test))]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
A few things to notice here:
#![no_std] and #![no_main]: eBPF programs don't use the standard library (no heap, no I/O) and don't have a main function. The entry point is the function marked with #[tracepoint].
#[map]: The #[map] attribute registers the static as an eBPF map, named after the static (here EVENTS). The ring buffer is declared at the top of the file and lives for the lifetime of the program. You don't access it through a context object.
#[tracepoint]: The eBPF macro marks the function as a tracepoint program, named after the function (here sched_switch). It takes no arguments: the category and name are provided from userspace via program.attach(). The eBPF macros are lowercase; userspace program types are PascalCase.
ctx: TracePointContext: The context is passed by value (not &mut). The TracePointContext gives you access to the tracepoint record via read_at::<T>(offset).
unsafe { ctx.read_at::<T>(offset) }: read_at is an unsafe function because Rust can't check that the offset and type you pass match the record layout. The read goes through a BPF helper, so a wrong offset gives you garbage values or an error rather than a crash.
bpf_get_smp_processor_id() and bpf_ktime_get_ns(): These are BPF helpers exposed through aya_ebpf::helpers. Both are base helpers that every program type can call.
Verifying tracepoint offsets on your kernel. The sched_switch layout above is correct for Linux 5.x on x86_64, but kernel versions can change field sizes, add new fields, or reorder them. Before you trust any hardcoded offset, check the format file for your running kernel:
cat /sys/kernel/tracing/events/sched/sched_switch/format
You'll see output like this (your exact offsets may differ):
name: sched_switch
ID: 314
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pad; offset:4; size:4; signed:1;
    field:char prev_comm[16]; offset:8; size:16; signed:0;
    field:pid_t prev_pid; offset:24; size:4; signed:1;
    field:int prev_prio; offset:28; size:4; signed:1;
    field:long prev_state; offset:32; size:8; signed:1;
    field:char next_comm[16]; offset:40; size:16; signed:0;
    field:pid_t next_pid; offset:56; size:4; signed:1;
    field:int next_prio; offset:60; size:4; signed:1;
The first four fields (common_*) are the tracepoint header: 8 bytes of metadata present in every tracepoint record. The context pointer that read_at works from points at the start of the record, header included, so read_at takes the offsets from the format file unchanged. For example, the format file shows prev_pid at offset 24, and the corresponding call is ctx.read_at::<u32>(24). Subtracting the header size would be a mistake: offset 16 lands in the middle of prev_comm.
The key cross-check: find the field you want in the format output and compare its offset value with the one in your read_at call. If they don't match, the field layout has changed on your kernel, and you'll read garbage unless you update the offset. This applies to every tracepoint in every part of this tutorial. When in doubt, check the format file.
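Looking up a field in the format file is easy to script. The sketch below parses a single `field:` line into its name, offset, and size using only the standard library. `FieldInfo` and `parse_field` are hypothetical names, and the parser assumes the `field:...; offset:N; size:N; signed:N;` shape shown in the sample output.

```rust
/// One `field:` entry from a tracepoint format file.
#[derive(Debug, PartialEq)]
struct FieldInfo {
    name: String,
    offset: usize,
    size: usize,
}

/// Parse one line of the format file; returns None for non-field lines.
fn parse_field(line: &str) -> Option<FieldInfo> {
    let mut decl: Option<&str> = None;
    let mut offset: Option<usize> = None;
    let mut size: Option<usize> = None;
    for part in line.split(';') {
        let part = part.trim();
        if let Some(v) = part.strip_prefix("field:") {
            decl = Some(v.trim());
        } else if let Some(v) = part.strip_prefix("offset:") {
            offset = v.trim().parse().ok();
        } else if let Some(v) = part.strip_prefix("size:") {
            size = v.trim().parse().ok();
        }
    }
    // The declaration is "type name" or "type name[len]"; keep only the name.
    let last = decl?.split_whitespace().last()?;
    let name = last.split('[').next()?.trim_start_matches('*').to_string();
    Some(FieldInfo {
        name,
        offset: offset?,
        size: size?,
    })
}

fn main() {
    let path = "/sys/kernel/tracing/events/sched/sched_switch/format";
    // Reading this file usually requires root; print nothing if it is unreadable.
    let text = std::fs::read_to_string(path).unwrap_or_default();
    for field in text.lines().filter_map(parse_field) {
        println!("{:<24} offset={:<4} size={}", field.name, field.offset, field.size);
    }
}
```

Run it with sudo and compare the printed offsets with the ones your eBPF program relies on.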
User-Space Program (perf-monitor/)
The user-space program loads the compiled eBPF object, creates maps, attaches programs, and reads data from ring buffers. It runs as a normal Rust binary:
// perf-monitor/src/main.rs
use anyhow::Context as _;
use aya::maps::RingBuf;
use aya::programs::TracePoint;

// Must match the layout of the struct in the eBPF crate
#[derive(Clone, Copy, Debug)]
#[repr(C)]
struct SchedulerEvent {
    cpu_id: u32,
    prev_pid: u32,
    next_pid: u32,
    timestamp: u64,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // The eBPF object is embedded at compile time from OUT_DIR,
    // so Ebpf::load() needs no file path at run time.
    let mut ebpf = aya::Ebpf::load(aya::include_bytes_aligned!(concat!(
        env!("OUT_DIR"),
        "/perf-monitor"
    )))?;

    // Look the program up by its eBPF function name, then load and attach it
    let program: &mut TracePoint = ebpf
        .program_mut("sched_switch")
        .context("program sched_switch not found")?
        .try_into()?;
    program.load()?;
    program.attach("sched", "sched_switch")?;

    // Open the ring buffer; the map is named after the EVENTS static.
    // map_mut() returns an Option, so convert None into an error first.
    let mut ring_buf =
        RingBuf::try_from(ebpf.map_mut("EVENTS").context("map EVENTS not found")?)?;

    // Poll the ring buffer
    loop {
        tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
        while let Some(item) = ring_buf.next() {
            // item derefs to &[u8]; skip records too short to hold an event
            if item.len() < std::mem::size_of::<SchedulerEvent>() {
                continue;
            }
            let event = unsafe {
                std::ptr::read_unaligned(item.as_ptr() as *const SchedulerEvent)
            };
            println!(
                "cpu={} prev_pid={} next_pid={} ts={}",
                event.cpu_id, event.prev_pid, event.next_pid, event.timestamp
            );
        }
    }
}
Note: Aya names each program after its eBPF function and each map after its static, so the lookups above use sched_switch and EVENTS. The unmodified template names its function perf_monitor (underscore, from the crate name); if you keep that name, pass it to program_mut() instead. When in doubt, check perf-monitor-ebpf/src/main.rs for the function and map definitions.
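The polling loop turns raw bytes into a SchedulerEvent with a pointer cast. An alternative that avoids unsafe altogether is to decode field by field, sketched here with only the standard library. The function name `decode_event` is hypothetical, and the byte offsets assume the `#[repr(C)]` struct shown above, where 4 bytes of padding follow `next_pid` so that `timestamp` is 8-byte aligned.

```rust
use std::convert::TryInto;
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
struct SchedulerEvent {
    cpu_id: u32,
    prev_pid: u32,
    next_pid: u32,
    timestamp: u64,
}

/// Decode one ring buffer record, returning None if it is too short.
fn decode_event(bytes: &[u8]) -> Option<SchedulerEvent> {
    if bytes.len() < size_of::<SchedulerEvent>() {
        return None;
    }
    // Native-endian reads: the kernel side wrote the struct on the same machine.
    let u32_at = |off: usize| u32::from_ne_bytes(bytes[off..off + 4].try_into().unwrap());
    let u64_at = |off: usize| u64::from_ne_bytes(bytes[off..off + 8].try_into().unwrap());
    Some(SchedulerEvent {
        cpu_id: u32_at(0),
        prev_pid: u32_at(4),
        next_pid: u32_at(8),
        // Bytes 12..16 are padding inserted by repr(C).
        timestamp: u64_at(16),
    })
}

fn main() {
    // Build a fake 24-byte record the way the eBPF side would lay it out.
    let mut record = [0u8; 24];
    record[0..4].copy_from_slice(&3u32.to_ne_bytes());
    record[4..8].copy_from_slice(&100u32.to_ne_bytes());
    record[8..12].copy_from_slice(&200u32.to_ne_bytes());
    record[16..24].copy_from_slice(&123_456u64.to_ne_bytes());
    println!("{:?}", decode_event(&record));
    println!("{:?}", decode_event(&record[..10]));
}
```

In the real loop you would call `decode_event(&item)` on each ring buffer item. The trade-off is that the offsets are now written out by hand, so they have to be updated whenever the struct changes.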
Running Locally
You'll need a Linux machine with eBPF support (kernel 5.8+). If you're developing on a VM, eBPF may or may not work depending on the hypervisor and the guest kernel's configuration. On real hardware with a suitable kernel it generally works.
# Build both userspace and eBPF — build.rs compiles eBPF and embeds it
cargo build
# Run as root (eBPF programs require elevated privileges)
sudo ./target/debug/perf-monitor
The eBPF programs are compiled automatically by the user-space crate's build script (perf-monitor/build.rs), so cargo build handles everything in one step.
Troubleshooting
Build fails with exit status around 25856 → The nightly toolchain can't compile for bpfel-unknown-none, usually because the rust-src component is missing. Run:
rustup toolchain install nightly --component rust-src
Build fails with “target not found” for bpfel-unknown-none → Same fix. The BPF target is tier 3 and has no prebuilt standard library, so it can't be installed with rustup target add; the build compiles core from rust-src instead.
Permission denied when running → eBPF programs require root. Use sudo.
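On the first item: the number 25856 looks arbitrary, but it is probably a raw wait status rather than an exit code. Assuming the usual Linux encoding, where a normally exiting process stores its exit code in bits 8 to 15, it decodes to 101, which is the code a Rust process exits with after a panic. That suggests a Rust tool in the build, such as a build script, panicked. A small sketch of the arithmetic (the helper name is hypothetical):

```rust
/// Extract the exit code from a raw Linux wait status, like C's WEXITSTATUS.
/// Returns None if the low 7 bits are set, meaning the process did not exit normally.
fn exit_code_from_wait_status(status: i32) -> Option<i32> {
    if status & 0x7f != 0 {
        return None;
    }
    Some((status >> 8) & 0xff)
}

fn main() {
    // 25856 == 0x6500, so the exit code is 0x65 == 101
    println!("{:?}", exit_code_from_wait_status(25856));
}
```

Scroll up in the build output to find the panic message itself; it usually names the missing tool or component.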
Project Structure for This Tutorial
For the full performance monitor, we’ll extend the scaffold with multiple eBPF programs and multiple data sources. The structure we’ll build:
perf-monitor/                    ← workspace root
├── perf-monitor/                ← user-space program
│   └── src/
│       ├── main.rs              ← event loop, loads programs, aggregates data
│       ├── pmc.rs               ← perf_event_open wrapper, PMC event reading
│       ├── numa.rs              ← procfs/sysfs readers for NUMA stats
│       ├── thermal.rs           ← sysfs thermal zone reader
│       └── types.rs             ← shared event structs
├── perf-monitor-ebpf/           ← eBPF programs
│   └── src/
│       ├── scheduler.rs         ← sched tracepoints: switch, waking, stat_wait
│       ├── blockio.rs           ← block I/O tracepoints
│       ├── vhost.rs             ← kprobes on vhost/virtio ring functions
│       └── lib.rs               ← map definitions, program registration
└── perf-monitor-common/         ← code shared between userspace and eBPF
    └── src/lib.rs
Next: Part 3 — Hardware PMCs with perf_event_open — Open a counter, read it, and compute instructions per cycle.