Part 9 — Thermal Monitoring

A CPU running hot doesn’t jump straight to throttling — there are graduated stages, and each one leaves a signal you can read.

Thermal throttling is the last resort. Before the CPU hits its critical temperature and starts skipping cycles, the kernel has already entered passive cooling — reducing the clock speed to lower heat output. If you’re watching CPU utilization stay flat while your benchmark scores drop, passive cooling is probably the cause. The good news: you can see it happening in real time.

Linux exposes thermal zone readings through sysfs. Each thermal zone corresponds to a physical temperature sensor somewhere in the system.

What thermal zones are

The kernel abstracts temperature sensors as thermal zones. A thermal zone has a type (what it measures), a temp (the current temperature in millidegrees Celsius), and a set of trip_point_* thresholds.

Common zone types on x86:

  • x86_pkg_temp — the CPU package (whole-chip temperature)
  • acpitz — ACPI thermal zone (usually near the CPU)
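  • coretemp — per-core temperature (Intel); this is a hwmon driver, so on many systems its readings appear under /sys/class/hwmon rather than as a thermal zone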
  • nvme — NVMe drive temperature
  • tztsx — various other sensors

On ARM servers:

  • soc-thermal — the SoC temperature
  • cpu-thermal — the CPU cluster temperature

The zone naming and quantity vary by hardware. The kernel creates whatever zones the hardware exposes.

Reading thermal zones from sysfs

The sysfs layout:

/sys/class/thermal/
├── thermal_zone0/
│   ├── type            # "x86_pkg_temp", "acpitz", etc.
│   ├── temp            # current temperature in millidegrees Celsius (e.g., 72000 = 72.0°C)
│   ├── trip_point_0_temp  # threshold in m°C
│   ├── trip_point_0_type # "active", "passive", "hot", "critical"
│   └── ...
├── thermal_zone1/
│   └── ...
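
The temp file holds a single integer followed by a newline. As a minimal sketch of turning that raw text into degrees Celsius (parse_millicelsius is a hypothetical helper name used only for this illustration):

```rust
/// Parse the raw contents of a sysfs `temp` file (e.g. "72000\n")
/// into degrees Celsius. Returns None if the text is not an integer.
/// `parse_millicelsius` is a hypothetical helper name for this sketch.
fn parse_millicelsius(raw: &str) -> Option<f64> {
    // sysfs values end with a newline, so trim before parsing.
    let milli: i64 = raw.trim().parse().ok()?;
    Some(milli as f64 / 1000.0)
}
```

Returning an Option keeps a malformed reading distinguishable from a genuine 0°C value.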

Each zone has multiple trip points — temperature thresholds that trigger different cooling responses. The types:

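  • active: a cooling device such as a fan turns on or speeds up (no performance impact)
  • passive: the kernel reduces power draw, typically by lowering CPU clock speed (performance drops)
  • hot: the zone is approaching critical; the kernel notifies the platform driver or userspace
  • critical: the kernel initiates an emergency shutdown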
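
In the kernel's thermal framework, passive is the trip type that asks for reduced device performance, while active engages cooling hardware such as fans. One way to carry that distinction in code is a small enum instead of raw strings. This is a sketch; TripKind and is_throttling are names invented for this example:

```rust
/// The four trip point types reported in `trip_point_N_type`.
/// `TripKind` is a hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TripKind {
    Active,
    Passive,
    Hot,
    Critical,
}

impl TripKind {
    /// Parse the contents of a `trip_point_N_type` file.
    /// Unknown strings yield None rather than a guess.
    fn parse(raw: &str) -> Option<TripKind> {
        match raw.trim() {
            "active" => Some(TripKind::Active),
            "passive" => Some(TripKind::Passive),
            "hot" => Some(TripKind::Hot),
            "critical" => Some(TripKind::Critical),
            _ => None,
        }
    }

    /// True for the trip type that requests reduced device performance
    /// (for a CPU zone, typically a lower clock speed).
    fn is_throttling(self) -> bool {
        matches!(self, TripKind::Passive)
    }
}
```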

Parsing a thermal zone

use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, Clone)]
pub struct ThermalZone {
    pub name: String,          // zone name in sysfs (e.g., "thermal_zone0")
    pub zone_type: String,     // what this zone measures (e.g., "x86_pkg_temp")
    pub temp_millicelsius: i64,
    pub trip_points: Vec<TripPoint>,
}

#[derive(Debug, Clone)]
pub struct TripPoint {
    pub temp_millicelsius: i64,
    pub trip_type: String,     // "passive", "active", "hot", "critical"
}

// Read a sysfs file that holds a single integer. Malformed content is
// reported as an error instead of being silently read as 0 m°C.
fn read_i64(path: &Path) -> io::Result<i64> {
    fs::read_to_string(path)?
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn read_thermal_zone(zone_path: &Path) -> io::Result<ThermalZone> {
    let name = zone_path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("unknown")
        .to_string();

    let zone_type = fs::read_to_string(zone_path.join("type"))?.trim().to_string();
    let temp_millicelsius = read_i64(&zone_path.join("temp"))?;

    let mut trip_points = Vec::new();

    // Trip points are numbered starting at 0; stop at the first missing index.
    let mut idx = 0;
    loop {
        let trip_type_path = zone_path.join(format!("trip_point_{}_type", idx));
        let trip_temp_path = zone_path.join(format!("trip_point_{}_temp", idx));

        if !trip_type_path.exists() {
            break;
        }

        let trip_type = fs::read_to_string(&trip_type_path)?.trim().to_string();
        let trip_temp = read_i64(&trip_temp_path)?;

        trip_points.push(TripPoint {
            temp_millicelsius: trip_temp,
            trip_type,
        });

        idx += 1;
    }

    Ok(ThermalZone {
        name,
        zone_type,
        temp_millicelsius,
        trip_points,
    })
}

fn read_all_thermal_zones() -> io::Result<Vec<ThermalZone>> {
    let thermal_path = Path::new("/sys/class/thermal");
    let entries = fs::read_dir(thermal_path)?;
    let mut zones = Vec::new();

    for entry in entries.flatten() {
        let path = entry.path();
        // The directory also holds cooling_device* entries; keep only zones.
        if path.file_name().and_then(|n| n.to_str())
            .map(|n| n.starts_with("thermal_zone"))
            .unwrap_or(false)
        {
            // A zone that cannot be read is skipped rather than failing the scan.
            if let Ok(zone) = read_thermal_zone(&path) {
                zones.push(zone);
            }
        }
    }

    Ok(zones)
}
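
Directory iteration does not guarantee any particular order, and sorting the names as plain strings would place thermal_zone10 before thermal_zone2. If stable, numeric ordering matters for your output, a sketch like the following could be applied to the directory names before reading each zone (zone_index and sort_zone_names are hypothetical helpers, not part of the code above):

```rust
/// Extract N from a directory name of the form "thermal_zoneN".
/// Returns None for other entries (e.g. "cooling_device0").
fn zone_index(dir_name: &str) -> Option<u32> {
    dir_name.strip_prefix("thermal_zone")?.parse().ok()
}

/// Keep only zone directories and order them by numeric index.
fn sort_zone_names(mut names: Vec<String>) -> Vec<String> {
    names.retain(|n| zone_index(n).is_some());
    names.sort_by_key(|n| zone_index(n));
    names
}
```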

Computing thermal headroom

Thermal headroom is the gap between the current temperature and the critical threshold:

fn thermal_headroom(zone: &ThermalZone) -> Option<i64> {
    // Use the lowest critical trip point if the zone reports more than one.
    let critical = zone.trip_points.iter()
        .filter(|tp| tp.trip_type == "critical")
        .map(|tp| tp.temp_millicelsius)
        .min()?;

    // headroom = critical - current (both in m°C)
    Some(critical - zone.temp_millicelsius)
}

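Headroom is your safety margin. If a workload pushes the CPU toward critical temperature, the headroom shrinks. Throttling starts well before headroom reaches zero, at the passive trip point; if headroom does reach zero, the kernel initiates an emergency shutdown.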

Thermal headroom in degrees Celsius:

fn headroom_celsius(zone: &ThermalZone) -> Option<f64> {
    thermal_headroom(zone).map(|hm| hm as f64 / 1000.0)
}
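
Because clock reduction begins at the passive trip point, the distance to that threshold is often the more useful early warning for performance work. A sketch of the same headroom calculation generalized to any trip type follows; Trip and headroom_to are illustrative names, and the struct is a simplified stand-in for TripPoint:

```rust
/// Minimal stand-in for a trip point, mirroring the fields used earlier.
struct Trip {
    temp_millicelsius: i64,
    trip_type: &'static str,
}

/// Distance in m°C from `current` to the lowest trip point of type `kind`.
/// A negative result means that trip point has already been crossed.
/// Returns None when the zone has no trip point of that type.
fn headroom_to(trips: &[Trip], kind: &str, current: i64) -> Option<i64> {
    trips
        .iter()
        .filter(|t| t.trip_type == kind)
        .map(|t| t.temp_millicelsius)
        .min()
        .map(|limit| limit - current)
}
```

Calling headroom_to(&trips, "passive", current) alongside the critical headroom shows how close the zone is to losing clock speed, not just how close it is to shutdown.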

Identifying the package sensor vs. per-core sensors

The most useful zone for CPU performance is the package-level zone. It’s typically x86_pkg_temp (Intel) or the ACPI zone near the CPU. Per-core readings (from the coretemp driver) are more granular, but the package-level temperature is what you watch for overall thermal throttle risk.

fn find_package_zone(zones: &[ThermalZone]) -> Option<&ThermalZone> {
    // x86_pkg_temp is the canonical package-level sensor on Intel
    zones.iter()
        .find(|z| z.zone_type == "x86_pkg_temp")
        // Fall back to any zone whose type mentions "pkg", then to ACPI.
        .or_else(|| zones.iter().find(|z| z.zone_type.contains("pkg")))
        .or_else(|| zones.iter().find(|z| z.zone_type == "acpitz"))
}

Polling interval

Package temperature can move quickly when load changes, but the conditions that lead to sustained throttling build up over seconds, not milliseconds. A 1-second polling interval is more than enough, and even 5 seconds is fine for thermal monitoring.

The important thing is to watch for the trend, not individual readings. If the package temperature is creeping up over a 30-second window, something is building heat.

use std::time::Duration;

// Requires the `tokio` and `anyhow` crates.
async fn poll_thermal(interval: Duration) -> anyhow::Result<()> {
    loop {
        // sysfs reads are small, so calling them directly from async code is acceptable here.
        let zones = read_all_thermal_zones()?;
        let package = find_package_zone(&zones);

        if let Some(pkg) = package {
            let current = pkg.temp_millicelsius as f64 / 1000.0;
            // Zones without a critical trip point have no headroom figure.
            let headroom = match thermal_headroom(pkg) {
                Some(h) => format!("{:.1}°C", h as f64 / 1000.0),
                None => "n/a".to_string(),
            };

            println!(
                "package_temp={:.1}°C  headroom={}  type={}",
                current,
                headroom,
                pkg.zone_type,
            );
        }

        tokio::time::sleep(interval).await;
    }
}
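
The loop above prints individual readings. To watch the trend over a window such as 30 seconds, one option is a small rolling buffer that reports the rate of change between its oldest and newest samples. This is a sketch under simple assumptions: TrendWindow is a hypothetical helper, and timestamps are plain seconds (for example from Instant::elapsed().as_secs_f64()):

```rust
use std::collections::VecDeque;

/// Keeps recent samples and reports the average rate of change.
/// `TrendWindow` is a hypothetical helper, not part of any crate.
struct TrendWindow {
    span_secs: f64,
    samples: VecDeque<(f64, f64)>, // (time in seconds, temperature in °C)
}

impl TrendWindow {
    fn new(span_secs: f64) -> Self {
        TrendWindow { span_secs, samples: VecDeque::new() }
    }

    /// Record a sample and drop anything older than the window span.
    fn push(&mut self, t_secs: f64, temp_c: f64) {
        self.samples.push_back((t_secs, temp_c));
        while let Some(&(oldest, _)) = self.samples.front() {
            if t_secs - oldest > self.span_secs {
                self.samples.pop_front();
            } else {
                break;
            }
        }
    }

    /// Rate of change in °C per second between the oldest and newest sample.
    /// Returns None until two samples with distinct timestamps exist.
    fn slope(&self) -> Option<f64> {
        let &(t0, c0) = self.samples.front()?;
        let &(t1, c1) = self.samples.back()?;
        if t1 > t0 {
            Some((c1 - c0) / (t1 - t0))
        } else {
            None
        }
    }
}
```

A persistently positive slope over the window suggests heat is building; a two-point slope is deliberately crude, and a least-squares fit would be less sensitive to a single noisy reading.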

Cross-architecture differences

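On AMD EPYC, the thermal zones may be named differently, and CPU temperature is often exposed through the k10temp hwmon driver under /sys/class/hwmon rather than as a thermal zone. Use ls /sys/class/thermal/ on the target system to see what’s available. ACPI thermal zones (acpitz) are a common fallback, but the ACPI spec does not require firmware to define one, so check that the zone exists before relying on it.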

On ARM servers, the sensor landscape is more fragmented. soc-thermal and cpu-thermal are common names. Some ARM platforms expose only one thermal zone for the whole SoC.

Thresholds at a glance

For quick reference, the thermal throttle scale (Intel desktop/server):

Temperature   What it means
< 70°C        Normal operation, no throttling
70-85°C       Active cooling engaged, performance nominal
85-95°C       Passive cooling: clock speed reduced
95-100°C      Hot: aggressive throttling
> 100°C       Critical: emergency shutdown

These thresholds are approximate and vary by SKU. The trip points from sysfs are the authoritative source for your specific hardware.
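
Since the sysfs trip points are authoritative, a monitor can classify the current reading against the zone's own thresholds rather than a fixed table. A minimal sketch, assuming trips are passed as (temperature in m°C, type) pairs; highest_trip_crossed is an illustrative name:

```rust
/// Return the type of the highest trip point at or below `current`,
/// or None if the zone is below all of its trip points.
/// Trips are (temperature in m°C, type) pairs as read from sysfs.
fn highest_trip_crossed<'a>(trips: &[(i64, &'a str)], current: i64) -> Option<&'a str> {
    trips
        .iter()
        .filter(|t| t.0 <= current)
        .max_by_key(|t| t.0)
        .map(|t| t.1)
}
```

The temperatures used to exercise this function are up to you; on real hardware they come from the trip_point_N_temp files.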

Next: Part 10 — Block I/O Tracing — Trace block I/O requests and compute IOPS, throughput, and access pattern entropy.