Part 1 — Three Sources of Signal

Performance data comes from three places on Linux. Understanding which source a metric comes from tells you a lot about how to capture it, what it means, and where its limits are.

Source 1: Hardware Performance Counters (PMCs)

Modern CPUs have dedicated hardware counters on-die. They’re called PMCs — Performance Monitoring Counters. They count things like:

How many times a cache line was requested and missed
How many branches executed and how many were mispredicted
How many instructions retired and how many cycles were stalled
How many TLB lookups missed and how many of those misses triggered a page-table walk

PMCs are read through the perf_event_open syscall. On x86, the underlying hardware is a set of performance-monitoring MSRs (Model-Specific Registers). On ARM, it is the PMU (Performance Monitoring Unit). The syscall abstracts the differences, but the available events still depend on your CPU microarchitecture.

Raw register access is possible through /dev/cpu/*/msr (requires root), but the usual route is the kernel's curated list of events, which perf list prints (most of them are user-accessible). If you only need the universal hardware events (instructions, cycles, cache references), the perf-event crate wraps perf_event_open with a safe Rust interface. We don’t use it here because we need raw PMC events (cache misses, TLB walks, branch mispredicts) that vary by CPU microarchitecture, and the crate doesn’t expose PERF_TYPE_RAW on all platforms. Part 3 shows the hand-rolled struct and the direct syscall.
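Because raw event codes depend on the microarchitecture, a monitor usually has to identify the CPU before it picks them. The sketch below shows one way to do that on x86 by parsing the vendor_id, cpu family, and model fields of /proc/cpuinfo. The CpuId type and both function names are ours, made up for illustration, and ARM systems expose different fields (there the parser returns None).

```rust
/// Vendor/family/model triple that identifies an x86 microarchitecture.
/// (Hypothetical helper type, not part of any library.)
#[derive(Debug, PartialEq)]
pub struct CpuId {
    pub vendor: String,
    pub family: u32,
    pub model: u32,
}

/// Parse the first processor entry of /proc/cpuinfo-style text.
/// Returns None if any of the three fields is missing.
pub fn parse_cpu_id(cpuinfo: &str) -> Option<CpuId> {
    let mut vendor = None;
    let mut family: Option<u32> = None;
    let mut model: Option<u32> = None;
    for line in cpuinfo.lines() {
        // A blank line separates processors; the first entry is enough.
        if line.trim().is_empty() {
            break;
        }
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "vendor_id" => vendor = Some(value.trim().to_string()),
                "cpu family" => family = value.trim().parse().ok(),
                // "model name" is a different key and is ignored here.
                "model" => model = value.trim().parse().ok(),
                _ => {}
            }
        }
    }
    Some(CpuId {
        vendor: vendor?,
        family: family?,
        model: model?,
    })
}

/// Read and parse the live file; returns an I/O error on systems without procfs.
pub fn read_cpu_id() -> std::io::Result<Option<CpuId>> {
    Ok(parse_cpu_id(&std::fs::read_to_string("/proc/cpuinfo")?))
}
```

A caller would match on the (vendor, family, model) triple to select a table of raw event codes, and fall back to the generic events when the CPU is not recognized.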

Constraints:

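Access is gated by /proc/sys/kernel/perf_event_paranoid; system-wide and kernel-level counting usually requires root, CAP_PERFMON (Linux 5.8+), or CAP_SYS_ADMIN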
Not all events are available on all CPUs
On hypervisors, some events may not reflect guest-observed behavior accurately
Counting across multiple CPUs requires one file descriptor per CPU, and requesting more events than the hardware has counters makes the kernel multiplex them
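Two of these constraints can be checked or compensated for in a few lines. The sketch below first interprets the perf_event_paranoid sysctl, which controls what an unprivileged process may count, and then scales a multiplexed count using the time_enabled and time_running values that perf_event_open returns when an event is opened with PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING. The function names are ours, and the level descriptions follow mainline kernel semantics; some distributions add stricter levels.

```rust
/// Read the current perf_event_paranoid level.
pub fn read_paranoid() -> std::io::Result<i32> {
    let text = std::fs::read_to_string("/proc/sys/kernel/perf_event_paranoid")?;
    text.trim()
        .parse()
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}

/// Summarise what an unprivileged process may do at a given level
/// (mainline semantics; distro patches may define extra levels).
pub fn describe_paranoid(level: i32) -> &'static str {
    match level {
        l if l < 0 => "unrestricted: almost all events are open to all users",
        0 => "CPU-wide and kernel counting allowed; raw tracepoints restricted",
        1 => "per-process user and kernel counting allowed; CPU-wide counting restricted",
        _ => "per-process user-space counting only",
    }
}

/// Estimate the full-interval count for an event that was multiplexed.
/// `enabled_ns` is how long the event was enabled, `running_ns` how long
/// it actually occupied a hardware counter.
pub fn scale_count(raw: u64, enabled_ns: u64, running_ns: u64) -> Option<u64> {
    if running_ns == 0 {
        return None; // the event was never scheduled onto a counter
    }
    // u128 keeps the intermediate product from overflowing.
    Some((raw as u128 * enabled_ns as u128 / running_ns as u128) as u64)
}
```

When running_ns equals enabled_ns the event was never multiplexed and the raw count is returned unchanged. A scaled value is an estimate, so it is worth recording the ratio alongside it.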

Source 2: Kernel Tracepoints and Kprobes

The Linux kernel emits events at interesting points in its execution. These come in two forms:

Tracepoints are static hooks that kernel developers place at fixed points in the code. Their names and argument formats are kept largely stable across kernel releases. Examples:

sched:sched_waking — a task is about to be woken
sched:sched_switch — the scheduler switched from one task to another
block:block_bio_queue — a block I/O request was submitted
irq:softirq_entry — a softirq started executing
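Each tracepoint publishes its argument layout in a format file under tracefs, typically /sys/kernel/tracing/events/<category>/<name>/format (reading it usually needs root). The sketch below parses the field lines of such a file into offsets and sizes, which is a handy way to check what a handler will actually see. The TraceField type and parse_format function are illustrative names of ours.

```rust
/// One argument of a tracepoint, as described by its tracefs format file.
#[derive(Debug, PartialEq)]
pub struct TraceField {
    pub decl: String, // C declaration, e.g. "pid_t prev_pid"
    pub offset: usize,
    pub size: usize,
    pub signed: bool,
}

/// Extract every `field:...; offset:...; size:...; signed:...;` line.
/// Lines that are not field descriptions (name, ID, print fmt) are skipped.
pub fn parse_format(text: &str) -> Vec<TraceField> {
    let mut fields = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if !line.starts_with("field:") {
            continue;
        }
        let mut decl: Option<String> = None;
        let mut offset: Option<usize> = None;
        let mut size: Option<usize> = None;
        let mut signed: Option<bool> = None;
        for part in line.split(';') {
            if let Some((key, value)) = part.trim().split_once(':') {
                match key {
                    "field" => decl = Some(value.trim().to_string()),
                    "offset" => offset = value.trim().parse().ok(),
                    "size" => size = value.trim().parse().ok(),
                    "signed" => signed = Some(value.trim() == "1"),
                    _ => {}
                }
            }
        }
        // Keep the line only if all four attributes were present and valid.
        if let (Some(decl), Some(offset), Some(size), Some(signed)) =
            (decl, offset, size, signed)
        {
            fields.push(TraceField { decl, offset, size, signed });
        }
    }
    fields
}
```

Feeding it the contents of the sched_switch format file, for example, yields one entry per argument, including the common_* header fields that precede the tracepoint-specific ones.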

Kprobes are dynamic probes that can be placed at almost any kernel function entry or return. They’re less stable (function names change between kernel versions) but much more powerful — you can probe any function, not just the ones with tracepoints.

Both tracepoints and kprobes are programmable via eBPF. This is the core of what Aya lets us do in Rust — write eBPF programs that read data from these hooks and push it into maps that userspace can read.

Key maps:

BPF_MAP_TYPE_PERF_EVENT_ARRAY — per-CPU perf buffers for sending structured events to userspace
BPF_MAP_TYPE_HASH — key-value store for counters and state
BPF_MAP_TYPE_ARRAY — indexed array, good for histograms
BPF_MAP_TYPE_RINGBUF — single ring buffer shared across CPUs (Linux 5.8+), newer and generally more efficient than the perf event array
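To make the "array as histogram" idea concrete, here is a small model of the power-of-two bucketing that latency histograms commonly use. It is ordinary userspace Rust, not Aya map code: the fixed-size counts array stands in for a BPF_MAP_TYPE_ARRAY, and the bucket count of 32 is an arbitrary choice for the example.

```rust
/// Number of slots in the array (an assumption for this sketch).
pub const BUCKETS: usize = 32;

/// Map a value to a power-of-two bucket:
/// 0 and 1 land in bucket 0, 2..=3 in bucket 1, 4..=7 in bucket 2, and so on.
/// Values too large for the array are clamped into the last bucket.
pub fn log2_bucket(value: u64) -> usize {
    if value == 0 {
        return 0;
    }
    let idx = (63 - value.leading_zeros()) as usize; // floor(log2(value))
    idx.min(BUCKETS - 1)
}

/// Userspace stand-in for an array map holding one counter per bucket.
pub struct Histogram {
    pub counts: [u64; BUCKETS],
}

impl Histogram {
    pub fn new() -> Self {
        Histogram { counts: [0; BUCKETS] }
    }

    /// Record one observation, e.g. a wakeup latency in nanoseconds.
    pub fn record(&mut self, value: u64) {
        self.counts[log2_bucket(value)] += 1;
    }

    /// Total number of observations recorded so far.
    pub fn total(&self) -> u64 {
        self.counts.iter().sum()
    }
}
```

The bucket index is bounded by construction, which is the property an in-kernel version needs too: the verifier has to be able to prove that every array access stays in range.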

Constraints:

Kprobe function names are kernel-version specific
eBPF programs are verified — you can’t write to arbitrary memory or loop unboundedly
The eBPF VM has a 512-byte stack limit (no heap)
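CO-RE (Compile Once, Run Everywhere) with BTF makes kprobe programs portable across kernel struct-layout changes, although a renamed or removed function still breaks the probe; without it, you need kernel headers for each target version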
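The stack limit and the lack of a heap shape how events are defined: they are small, fixed-size, plain-data structs with a C layout that both halves agree on. The sketch below shows a hypothetical SchedEvent (the field names are our assumption, not taken from the project), a compile-time check that it fits within the 512-byte stack, and a userspace decoder for the raw bytes.

```rust
use std::mem::size_of;

/// eBPF stack size limit in bytes.
pub const BPF_STACK_LIMIT: usize = 512;

/// Hypothetical event shared between the eBPF and userspace halves.
/// `repr(C)` fixes the field order and padding on both sides.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SchedEvent {
    pub timestamp_ns: u64,
    pub prev_pid: u32,
    pub next_pid: u32,
    pub comm: [u8; 16], // NUL-padded task name
}

// Fails the build if the event could not be staged on the eBPF stack.
const _: () = assert!(size_of::<SchedEvent>() <= BPF_STACK_LIMIT);

/// Decode one event from bytes handed over by a ring buffer.
/// Native endianness is used because producer and consumer share the machine.
pub fn decode(bytes: &[u8]) -> Option<SchedEvent> {
    if bytes.len() < size_of::<SchedEvent>() {
        return None;
    }
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&bytes[0..8]);
    let mut prev = [0u8; 4];
    prev.copy_from_slice(&bytes[8..12]);
    let mut next = [0u8; 4];
    next.copy_from_slice(&bytes[12..16]);
    let mut comm = [0u8; 16];
    comm.copy_from_slice(&bytes[16..32]);
    Some(SchedEvent {
        timestamp_ns: u64::from_ne_bytes(ts),
        prev_pid: u32::from_ne_bytes(prev),
        next_pid: u32::from_ne_bytes(next),
        comm,
    })
}

/// Turn the NUL-padded name into a printable string.
pub fn comm_str(comm: &[u8; 16]) -> String {
    let end = comm.iter().position(|&b| b == 0).unwrap_or(comm.len());
    String::from_utf8_lossy(&comm[..end]).into_owned()
}
```

Decoding field by field avoids unaligned reads and unsafe code in userspace, at the cost of keeping the offsets in sync with the struct by hand.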

Source 3: Procfs and Sysfs

The kernel exposes a huge amount of state through two virtual filesystems that don’t exist on disk:

/proc/ — process and system information. Relevant files:

/proc/cpuinfo — CPU model name, family/model/stepping numbers (which identify the microarchitecture), feature flags
/proc/vmstat — virtual memory statistics, including NUMA page stats
/proc/schedstat — scheduler statistics per CPU
/proc/interrupts — interrupt counts per CPU
/proc/loadavg — load average
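Most of these files are simple line-oriented text. As a minimal sketch, /proc/vmstat is a list of "name value" pairs, so it can be parsed into a map and sampled twice to turn cumulative counters into per-interval deltas. The function names are ours, and note that not every vmstat field is a counter: entries such as nr_free_pages are gauges and should be read as-is rather than differenced.

```rust
use std::collections::HashMap;

/// Parse /proc/vmstat-style text ("name value" per line) into a map.
/// Malformed lines are skipped.
pub fn parse_vmstat(text: &str) -> HashMap<String, u64> {
    text.lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let key = parts.next()?;
            let value: u64 = parts.next()?.parse().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}

/// Difference of a cumulative counter between two samples.
/// Returns None if the key is absent from either sample.
pub fn counter_delta(
    prev: &HashMap<String, u64>,
    curr: &HashMap<String, u64>,
    key: &str,
) -> Option<u64> {
    // wrapping_sub tolerates a counter that wrapped between samples.
    Some(curr.get(key)?.wrapping_sub(*prev.get(key)?))
}

/// Take one sample from the live file.
pub fn read_vmstat() -> std::io::Result<HashMap<String, u64>> {
    Ok(parse_vmstat(&std::fs::read_to_string("/proc/vmstat")?))
}
```

Calling read_vmstat once per polling interval and passing consecutive samples to counter_delta gives, for example, page faults per interval from the pgfault field.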

/sys/ — kernel data structures organized as a tree. Relevant paths:

/sys/devices/system/cpu/ — per-CPU attributes
/sys/class/thermal/thermal_zone*/ — thermal zones with current temperature
/sys/bus/event_source/devices/ — available perf events
/sys/kernel/mm/ — hugepages and transparent hugepage settings
/sys/class/block/ — per-block-device statistics
/sys/devices/system/node/ — NUMA node memory statistics

These are readable with standard file I/O — no root required for most files, and no eBPF required.
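As an example of how little code a sysfs source needs, the sketch below walks a thermal directory and reads each zone's type and temp files. The kernel reports temp in millidegrees Celsius, so the value is divided by 1000. The function names are ours, and the root path is a parameter so the same code can be pointed at /sys/class/thermal or at a test fixture.

```rust
use std::fs;
use std::path::Path;

/// Convert the contents of a `temp` file (millidegrees Celsius) to degrees.
pub fn parse_millidegrees(text: &str) -> Option<f64> {
    let raw: i64 = text.trim().parse().ok()?;
    Some(raw as f64 / 1000.0)
}

/// Return (zone type, temperature in degrees Celsius) for every readable
/// thermal_zone* directory under `root`. Unreadable zones are skipped.
pub fn read_thermal_zones(root: &Path) -> Vec<(String, f64)> {
    let mut zones = Vec::new();
    let entries = match fs::read_dir(root) {
        Ok(entries) => entries,
        Err(_) => return zones, // no thermal support, or not Linux
    };
    for entry in entries.flatten() {
        let path = entry.path();
        let is_zone = path
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.starts_with("thermal_zone"));
        if !is_zone {
            continue; // e.g. cooling_device* entries
        }
        let kind = fs::read_to_string(path.join("type")).unwrap_or_default();
        let temp = fs::read_to_string(path.join("temp"))
            .ok()
            .and_then(|text| parse_millidegrees(&text));
        if let Some(temp) = temp {
            zones.push((kind.trim().to_string(), temp));
        }
    }
    zones
}
```

On a typical machine, read_thermal_zones(Path::new("/sys/class/thermal")) returns one entry per zone; virtual machines and containers often expose none, in which case the result is simply empty.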

The Hybrid Architecture

Our monitoring system uses all three sources together. The architecture looks like this:

┌──────────────────────────────────────────────────┐
│                Userspace (monitor)              │
│                                                 │
│  ┌─────────────┐  ┌──────────────┐  ┌────────┐ │
│  │ perf_event  │  │ ring buffer  │  │procfs/ │ │
│  │ open poll   │  │ reader       │  │sysfs   │ │
│  └──────┬──────┘  └──────┬───────┘  └───┬────┘ │
│         │                │              │       │
└─────────┼────────────────┼──────────────┼───────┘
          │                │              │
   perf_event_open()   eBPF maps     file I/O
          ▲                ▲              ▲
          │                │              │
┌─────────┴────────────────┼──────────────┼───────┐
│    Linux Kernel          │              │        │
│                          │              │        │
│  ┌─────────────────┐    │              │        │
│  │ PMC counters    │    │              │        │
│  │ (perf_event_open│    │              │        │
│  │  file desc.)    │    │              │        │
│  └─────────────────┘    │              │        │
│         read via fd     │              │        │
│                          │              │        │
│  ┌─────────────────────────────────┐  │        │
│  │       eBPF programs            │  │        │
│  │  ┌──────────────────────────┐  │  │        │
│  │  │ scheduler tracepoints   │  │  │        │
│  │  │ (waking, switch)        │  │  │        │
│  │  └──────────────────────────┘  │  │        │
│  │  ┌──────────────────────────┐  │  │        │
│  │  │ block I/O, vhost, etc.  │  │  │        │
│  │  └──────────────────────────┘  │  │        │
│  │  writes to eBPF maps ──────────┘  │        │
│  └─────────────────────────────────┘          │
│                                                 │
│  ┌─────────────┐  ┌────────────────┐           │
│  │ /proc/vmstat│  │/sys/class/     │           │
│  │ /proc/stat  │  │thermal/        │           │
│  └─────────────┘  └────────────────┘           │
└─────────────────────────────────────────────────┘

PMC counters are not eBPF programs — they’re file descriptors opened via perf_event_open and read directly. eBPF programs are the ones attached to kernel tracepoints and kprobes. They write aggregated data (histograms, counters) into eBPF maps, which the ring buffer reader in userspace consumes. Procfs and sysfs are plain file reads — no special API, no eBPF.

The user-space program polls all three sources in a single event loop. The eBPF programs handle the in-kernel aggregation so we only get summaries over the ring buffer rather than a firehose of raw events.
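The shape of that loop can be sketched with nothing but the standard library. Everything here is hypothetical scaffolding rather than the project's actual code: the Source trait, the run_loop function, and the FakeSource stand-in are names we made up, and a real implementation would back each source with a perf file descriptor, an eBPF map reader, or a procfs/sysfs parser.

```rust
use std::time::{Duration, Instant};

/// Anything that can be sampled once per tick (hypothetical trait).
pub trait Source {
    fn name(&self) -> &str;
    /// Return (metric, value) pairs collected since the last call.
    fn poll(&mut self) -> Vec<(String, u64)>;
}

/// Poll every source once per `interval`, for `ticks` iterations,
/// handing each sample to `sink` as (source, metric, value).
pub fn run_loop(
    sources: &mut [Box<dyn Source>],
    interval: Duration,
    ticks: usize,
    mut sink: impl FnMut(&str, &str, u64),
) {
    let mut next = Instant::now();
    for _ in 0..ticks {
        for source in sources.iter_mut() {
            let name = source.name().to_string();
            for (metric, value) in source.poll() {
                sink(&name, &metric, value);
            }
        }
        // Sleep until the next deadline so polling time does not cause drift.
        next += interval;
        if let Some(wait) = next.checked_duration_since(Instant::now()) {
            std::thread::sleep(wait);
        }
    }
}

/// Stand-in source that reports an incrementing counter.
pub struct FakeSource {
    label: &'static str,
    value: u64,
}

impl FakeSource {
    pub fn new(label: &'static str) -> Self {
        FakeSource { label, value: 0 }
    }
}

impl Source for FakeSource {
    fn name(&self) -> &str {
        self.label
    }
    fn poll(&mut self) -> Vec<(String, u64)> {
        self.value += 1;
        vec![("ticks".to_string(), self.value)]
    }
}
```

Putting all three sources behind one trait keeps the loop itself trivial, and the deadline-based sleep keeps samples evenly spaced even when one source is slow to read.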

What Aya Provides

Aya is the Rust library that ties the eBPF and userspace halves together. It handles:

Compiling eBPF programs via aya-build (runs automatically in build.rs — no separate build step)
Loading and attaching programs from Rust
Creating and populating maps (ring buffers, hash maps, arrays used as histograms)
Reading from maps in the user-space half

Aya is unusual among eBPF libraries because it doesn’t depend on libbpf, BCC, or a C toolchain. Everything is Rust, end to end.

Next: Part 2 — Project Setup and Minimal eBPF — Scaffold the Aya project, write a tracepoint handler, and read scheduler events in userspace.