Part 4 — CPU Microarchitecture Detection

The same PMC event number means different things on different CPUs. On Intel Skylake, event 0xD1 with umask 0x08 counts L1 data cache load misses. On AMD Zen 2, that same event number might not even exist. On ARM, the encoding scheme is completely different. If you hardcode event numbers, your code breaks on every machine that isn’t yours.

The fix isn’t complicated: before you open any PMC, figure out what CPU you’re running on, then pick the right event numbers for that chip. This part builds the detection layer that Parts 5 through 12 will rely on.

Reading /proc/cpuinfo

The file has one block per logical CPU (one per hardware thread). For our purposes, the relevant fields are the same across all cores of the same physical CPU, so we’ll read the first block:

use std::{fs, io};

/// Returns the full text of /proc/cpuinfo (every processor block).
fn read_cpuinfo() -> io::Result<String> {
    fs::read_to_string("/proc/cpuinfo")
}

Here’s what the first block looks like on an Intel Skylake desktop (fields abridged, flags line truncated):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
stepping        : 3
microcode       : 0x96
cpu MHz         : 4000.000
cache size      : 8192 KB
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov

The fields we care about:

  • vendor_id: "GenuineIntel" or "AuthenticAMD" (the strings Linux uses)
  • cpu family: decimal, the CPUID family with the extended family bits already combined in
  • model: decimal, the CPUID model with the extended model bits already folded in
  • stepping: decimal, the CPUID stepping field
  • flags: a space-separated list of CPU feature flags

The family and model numbers are how we identify the microarchitecture. For Intel, family 6 means “P6 family or later” (which covers everything from Pentium Pro through modern Skylake/Ice Lake). The model field then distinguishes the specific generation.
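As a quick illustration of pulling one of these fields out of the text, here is a minimal sketch. The helper name `cpuinfo_field` is hypothetical (nothing else in this part depends on it), and it assumes the x86 layout shown above, with blocks separated by a blank line. Note the exact key comparison: a prefix match on "model" would also hit "model name".

```rust
/// Hypothetical helper: return the value of `key` from the first
/// processor block of /proc/cpuinfo-style text, or None if it is absent.
fn cpuinfo_field<'a>(raw: &'a str, key: &str) -> Option<&'a str> {
    raw.lines()
        // Blocks are separated by a blank line; stop at the first one.
        .take_while(|line| !line.trim().is_empty())
        // Keep only lines shaped like "key : value".
        .filter_map(|line| line.split_once(':'))
        // Compare the whole key so "model" does not match "model name".
        .find(|(k, _)| k.trim() == key)
        .map(|(_, v)| v.trim())
}
```

Calling `cpuinfo_field(&raw, "model")` on the Skylake output above would yield the string "94", which still needs to be parsed as a number.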

Parsing vendor and family/model

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Vendor {
    Intel,
    Amd,
    Unknown,
}

#[derive(Debug, Clone)]
pub struct CpuInfo {
    pub vendor: Vendor,
    pub family: u32,
    pub model: u32,
    pub stepping: u32,
    pub flags: Vec<String>,
}

fn parse_cpuinfo(raw: &str) -> Option<CpuInfo> {
    let mut vendor = Vendor::Unknown;
    let mut family: u32 = 0;
    let mut model: u32 = 0;
    let mut stepping: u32 = 0;
    let mut flags: Vec<String> = Vec::new();

    for line in raw.lines() {
        // Skip blank separator lines and anything that is not "key : value".
        // Using `?` here would abort the whole parse at the first blank line.
        let Some((key, value)) = line.split_once(':') else {
            continue;
        };
        let (key, value) = (key.trim(), value.trim());

        // Stop after the first processor block. "processor" is the first
        // field of each CPU block in /proc/cpuinfo. When we see the second
        // "processor" line and we've already collected flags from the first
        // block, we've read everything we need (vendor, family, model,
        // stepping, and flags are the same across all cores of the same CPU).
        if key == "processor" && !flags.is_empty() {
            break;
        }

        match key {
            "vendor_id" | "vendor" => {
                vendor = match value {
                    "GenuineIntel" => Vendor::Intel,
                    "AuthenticAMD" => Vendor::Amd,
                    _ => Vendor::Unknown,
                };
            }
            // A non-numeric value in any of these fields fails the parse.
            "cpu family" => {
                family = value.parse().ok()?;
            }
            "model" => {
                model = value.parse().ok()?;
            }
            "stepping" => {
                stepping = value.parse().ok()?;
            }
            "flags" => {
                flags = value.split_whitespace().map(String::from).collect();
            }
            _ => {}
        }
    }

    // An unrecognized (or missing) vendor is reported as a parse failure.
    if vendor == Vendor::Unknown {
        return None;
    }

    Some(CpuInfo { vendor, family, model, stepping, flags })
}

Mapping to microarchitecture names

The family and model numbers combine to produce a “microarchitecture identifier.” Here’s how to build that mapping for Intel and AMD:

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Microarch {
    // Intel
    Skylake,
    SkylakeSp,
    KabyLake,        // includes Coffee Lake, Comet Lake (same PMC event encodings)
    IceLake,         // also returned for Tiger Lake (same core PMC events)
    IceLakeSp,
    RocketLake,
    AlderLake,
    SapphireRapids,
    // AMD
    Zen,
    ZenPlus,
    Zen2,
    Zen3,
    Zen4,
    // Generic
    Unknown,
}

pub fn detect_intel_microarch(family: u32, model: u32, _stepping: u32) -> Microarch {
    // The "model" field in /proc/cpuinfo is the CPUID model value with the
    // extended model already folded in (family 6 Intel: model = ext_model << 4 | base_model).
    //
    // `_stepping` is not consulted yet: model 85 covers both Skylake SP and
    // Cascade Lake SP, which differ by stepping but map to the same variant here.
    //
    // Why do some very different CPUs share a match arm? Because they share PMC events.
    // Kaby Lake, Coffee Lake, and Comet Lake all use Skylake-core PMCs. The perfmon
    // repo (github.com/intel/perfmon) puts them all in the /SKL/ event directory.
    // For a monitoring tool, the PMC mapping is what matters, not the marketing name.
    match (family, model) {
        // Skylake family: all use the same PMC event encodings
        (6, 85) => Microarch::SkylakeSp,              // Skylake SP / Cascade Lake SP (0x55)
        (6, 94) => Microarch::Skylake,                // Desktop Skylake (0x5E, i7-6700K)
        (6, 142 | 158) => Microarch::KabyLake,        // Kaby Lake / Coffee Lake (0x8E, 0x9E)
        (6, 165 | 166) => Microarch::KabyLake,        // Comet Lake (0xA5, 0xA6; same PMCs as KBL)

        // Ice Lake: some events moved, L3 miss not available on all SKUs
        (6, 125 | 126) => Microarch::IceLake,         // Ice Lake client (0x7D, 0x7E)
        (6, 106 | 108) => Microarch::IceLakeSp,       // Ice Lake server (0x6A, 0x6C)

        // Tiger Lake: Ice Lake PMC events, plus some uncore changes
        (6, 140 | 141) => Microarch::IceLake,         // Tiger Lake (0x8C, 0x8D; ICL PMCs)

        // Rocket Lake
        (6, 167) => Microarch::RocketLake,            // Rocket Lake (0xA7)

        // Alder Lake: hybrid P-core (Golden Cove) + E-core (Gracemont)
        (6, 151 | 154) => Microarch::AlderLake,       // Alder Lake desktop (0x97) / mobile (0x9A)

        // Sapphire Rapids
        (6, 143) => Microarch::SapphireRapids,        // Sapphire Rapids (0x8F)

        _ => Microarch::Unknown,
    }
}

pub fn detect_amd_microarch(family: u32, model: u32) -> Microarch {
    // AMD family encoding: when the CPUID base family is 0xF, the reported
    // family is base_family + extended_family (a sum, not a shift).
    // Zen, Zen+, and Zen 2: 0xF + 0x8 = family 23 (0x17).
    // Zen 3 and Zen 4:      0xF + 0xA = family 25 (0x19).
    //
    // Model numbers come from the AMD Processor Programming Reference (PPR) and
    // the /proc/cpuinfo "model" field. Server and desktop parts within the same
    // Zen generation use different model numbers but share PMC event encodings.
    match (family, model) {
        // Zen 1
        (23, 1) => Microarch::Zen,              // Naples (EPYC 1st gen)
        (23, 17) => Microarch::Zen,             // Raven Ridge (Zen 1 APU, model 0x11)
        (23, 32) => Microarch::Zen,             // Raven2 / Dali (Zen 1 APU, model 0x20)

        // Zen+
        (23, 8) => Microarch::ZenPlus,          // Pinnacle Ridge / Colfax (Ryzen and Threadripper 2000)
        (23, 24) => Microarch::ZenPlus,         // Picasso (Zen+ APU, model 0x18)

        // Zen 2
        (23, 49) => Microarch::Zen2,            // Rome (EPYC 2nd gen, model 0x31)
        (23, 113) => Microarch::Zen2,           // Matisse (Ryzen 3000 desktop, model 0x71)

        // Zen 3
        (25, 1) => Microarch::Zen3,             // Milan (EPYC 3rd gen)
        (25, 33) => Microarch::Zen3,            // Vermeer (Ryzen 5000 desktop, model 0x21)

        // Zen 4
        (25, 17) => Microarch::Zen4,            // Genoa (EPYC 4th gen, model 0x11)
        (25, 97) => Microarch::Zen4,            // Raphael (Ryzen 7000 desktop, model 0x61)

        _ => Microarch::Unknown,
    }
}

pub fn detect_microarch(info: &CpuInfo) -> Microarch {
    match info.vendor {
        Vendor::Intel => detect_intel_microarch(info.family, info.model, info.stepping),
        Vendor::Amd => detect_amd_microarch(info.family, info.model),
        Vendor::Unknown => Microarch::Unknown,
    }
}

This mapping covers common Intel parts from Skylake through Sapphire Rapids and AMD parts from Zen through Zen 4; anything else falls through to Unknown. Two caveats for production use:

  1. Intel hybrid cores (Alder Lake and later): P-cores use Golden Cove PMC encodings; E-cores use Gracemont encodings. The detect_intel_microarch function returns AlderLake for both, but a production tool would need to detect which core type it’s running on (via the hybrid CPUID leaf, leaf 0x1A) and select events per-core. Intel documents both encoding sets; see the performance-monitoring chapters of the Intel SDM, Volume 3B (Chapter 19 in some revisions; the numbering changes between editions).

  2. Extended model numbers: The model field in /proc/cpuinfo already folds in the extended model bits for family 6 Intel CPUs (the kernel does this for you). If you’re reading CPUID directly, you need to combine ext_model << 4 | base_model yourself.
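For the second caveat, the combination rule can be sketched as a pure function over the EAX value returned by CPUID leaf 1. The function name `decode_cpuid_signature` is an assumption made for this illustration; the bit positions follow the standard CPUID signature layout (stepping in bits 3:0, base model in 7:4, base family in 11:8, extended model in 19:16, extended family in 27:20).

```rust
/// Decode (family, model, stepping) from the EAX value of CPUID leaf 1.
/// Hypothetical helper for illustration; it does not execute CPUID itself.
fn decode_cpuid_signature(eax: u32) -> (u32, u32, u32) {
    let stepping = eax & 0xF;
    let base_model = (eax >> 4) & 0xF;
    let base_family = (eax >> 8) & 0xF;
    let ext_model = (eax >> 16) & 0xF;
    let ext_family = (eax >> 20) & 0xFF;

    // The extended family is added (not shifted), and only when the
    // base family is 0xF.
    let family = if base_family == 0xF {
        base_family + ext_family
    } else {
        base_family
    };

    // The extended model is shifted in for base family 6 and 0xF.
    let model = if base_family == 0x6 || base_family == 0xF {
        (ext_model << 4) | base_model
    } else {
        base_model
    };

    (family, model, stepping)
}
```

For the i7-6700K shown earlier, the signature 0x000506E3 decodes to family 6, model 94, stepping 3, which matches the /proc/cpuinfo fields.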

For production code, cross-reference against the performance-monitoring chapters of the Intel SDM (Software Developer’s Manual, Volume 3B; Chapter 19 in some revisions) or use a maintained lookup table from a project like the Intel perfmon repository (github.com/intel/perfmon). But for a monitoring tool, the key is having some mapping, not a complete one.
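The hybrid-core caveat can be illustrated the same way. CPUID leaf 0x1A reports the core type in EAX bits 31:24 (0x40 for a Core/P-core, 0x20 for an Atom/E-core) and returns zero on parts without hybrid information. The sketch below only decodes a value you have already read; the type and function names are hypothetical, and obtaining EAX (for example with the `__cpuid` intrinsic from a thread pinned to the core of interest) is left out.

```rust
/// Core type as reported by CPUID leaf 0x1A (names chosen for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum CoreType {
    Performance, // "Intel Core" type, e.g. Golden Cove
    Efficient,   // "Intel Atom" type, e.g. Gracemont
    Other(u8),   // any value this sketch does not recognize
}

/// Decode the core type from the EAX value of CPUID leaf 0x1A.
/// Returns None when the leaf reports nothing (EAX == 0).
fn core_type_from_leaf_1a(eax: u32) -> Option<CoreType> {
    if eax == 0 {
        return None;
    }
    // Bits 31:24 hold the core type; bits 23:0 hold a native model ID.
    Some(match (eax >> 24) as u8 {
        0x40 => CoreType::Performance,
        0x20 => CoreType::Efficient,
        other => CoreType::Other(other),
    })
}
```

Because CPUID answers for whichever core executes it, the result is only meaningful if the calling thread stays on that core while the matching events are opened.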

Why this matters for PMC events

Here’s a concrete example. On Intel Skylake, L1 data cache load miss is:

  • Type: PERF_TYPE_RAW (4)
  • Event: 0xD1
  • Umask: 0x08

On Ice Lake, the same counter exists, but other events moved to different encodings, so a table that is right for Skylake is not automatically right there. The safe approach is to enumerate available events on the target machine rather than hardcoding.
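To make the Skylake example concrete: for PERF_TYPE_RAW on Intel core PMUs, the event select occupies config bits 0-7 and the unit mask bits 8-15. A minimal sketch of that packing follows; the function name `intel_raw_config` is an assumption for illustration, and AMD uses a wider event field, so this layout should not be reused there without checking.

```rust
/// Value of PERF_TYPE_RAW, as listed above.
const PERF_TYPE_RAW: u32 = 4;

/// Pack an Intel core-PMU event select and unit mask into the `config`
/// value used with PERF_TYPE_RAW: event in bits 0-7, umask in bits 8-15.
/// Hypothetical helper; other fields (cmask, inv, edge) are left at zero.
fn intel_raw_config(event: u8, umask: u8) -> u64 {
    (u64::from(umask) << 8) | u64::from(event)
}
```

With the values above, `intel_raw_config(0xD1, 0x08)` gives 0x08D1, the same number you would pass on the command line as `perf stat -e r08d1`.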

Enumerating available events with perf list

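Before opening any raw event, you can ask perf which events it knows about on this machine:

perf list

This prints a categorized list: hardware, software, and cache events, tracepoints, and the PMU events perf has named for this CPU model. To narrow it down:

perf list | grep -i cache | head -20

The default listing shows event names, not encodings. To see how a named event resolves, add --details:

perf list --details pmu | head -40

On recent perf versions, the detailed output includes the resolved form of each event, along the lines of cpu/event=0xd1,umask=0x8/. The event and umask values there are what you combine into the config value to use with PERF_TYPE_RAW.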

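From a Rust program, you can call perf list via std::process::Command and parse the output. Or read the kernel’s event aliases directly from sysfs. Each file in the events directory is named after an event and contains its encoding. This directory holds only the aliases the kernel itself defines, not the full vendor event list:

use std::{fs, io};

/// Returns one "name: encoding" string per event alias the kernel exports.
fn list_raw_events() -> io::Result<Vec<String>> {
    // Hybrid Intel systems expose cpu_core/ and cpu_atom/ instead of cpu/.
    let path = "/sys/bus/event_source/devices/cpu/events";
    let mut events = Vec::new();

    for entry in fs::read_dir(path)?.flatten() {
        // Each file holds one comma-separated definition on a single line,
        // for example "event=0x2e,umask=0x41" for cache-misses on Intel.
        if let Ok(content) = fs::read_to_string(entry.path()) {
            let name = entry.file_name().to_string_lossy().into_owned();
            events.push(format!("{name}: {}", content.trim()));
        }
    }

    Ok(events)
}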
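The sysfs definitions are plain text, so turning one into numbers takes a small parser. The following is a minimal sketch: both function names are hypothetical, only the `event` and `umask` terms are handled, and other terms that can appear (such as cmask or inv) are ignored.

```rust
/// Parse "0x2e" as hex or "46" as decimal.
fn parse_hex_or_dec(s: &str) -> Option<u64> {
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u64::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}

/// Extract (event, umask) from a definition such as "event=0x2e,umask=0x41".
/// A missing umask defaults to 0; a missing event yields None.
fn parse_event_definition(def: &str) -> Option<(u64, u64)> {
    let mut event = None;
    let mut umask = 0;

    for term in def.trim().split(',') {
        // Terms without '=' are skipped rather than treated as errors.
        let Some((key, value)) = term.split_once('=') else {
            continue;
        };
        match key.trim() {
            "event" => event = Some(parse_hex_or_dec(value.trim())?),
            "umask" => umask = parse_hex_or_dec(value.trim())?,
            _ => {} // cmask, inv, edge, ... are ignored in this sketch
        }
    }

    event.map(|e| (e, umask))
}
```

Parsing "event=0x2e,umask=0x41" gives the pair (0x2e, 0x41), which can then be packed into a raw config value.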

Putting it together

A helper that returns everything we need:

use std::{fs, io};

pub struct CpuMicroarch {
    pub cpuinfo: CpuInfo,
    pub microarch: Microarch,
    // Copy of cpuinfo.vendor, kept at the top level for convenience.
    pub vendor: Vendor,
}

pub fn detect() -> io::Result<CpuMicroarch> {
    let raw = fs::read_to_string("/proc/cpuinfo")?;
    let cpuinfo = parse_cpuinfo(&raw).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "could not parse cpuinfo")
    })?;
    let microarch = detect_microarch(&cpuinfo);
    let vendor = cpuinfo.vendor.clone();

    Ok(CpuMicroarch { cpuinfo, microarch, vendor })
}
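To show how the detected microarchitecture feeds event selection, here is a minimal sketch under stated assumptions. It uses a trimmed stand-in for the Microarch enum so the block compiles on its own, and the `RawEvent` type and `l1d_load_miss_event` function are hypothetical. The only encoding filled in is the Skylake one quoted in this part; every other microarchitecture returns None rather than a guess.

```rust
/// Trimmed stand-in for the Microarch enum defined earlier in this part.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Microarch {
    Skylake,
    SkylakeSp,
    KabyLake,
    Zen2,
    Unknown,
}

/// Event select and unit mask for a raw PMC event (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct RawEvent {
    event: u8,
    umask: u8,
}

/// Pick the L1 data cache load miss event for a microarchitecture.
/// Returns None when no verified encoding is known, so the caller can
/// skip the metric instead of opening a counter that measures something else.
fn l1d_load_miss_event(arch: Microarch) -> Option<RawEvent> {
    match arch {
        // Skylake-core parts share one encoding (event 0xD1, umask 0x08).
        Microarch::Skylake | Microarch::SkylakeSp | Microarch::KabyLake => {
            Some(RawEvent { event: 0xD1, umask: 0x08 })
        }
        // Left empty on purpose until checked against vendor event tables.
        Microarch::Zen2 | Microarch::Unknown => None,
    }
}
```

Returning an Option keeps the "unknown CPU" case explicit: a monitoring tool can report the metric as unavailable instead of silently counting the wrong thing.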

Next: Part 5 — Cache and TLB Metrics from PMC — Open cache miss and TLB walk counters, compute miss rates, and select the right events for your CPU.