Part 5: Running in Production — Logging, Metrics, and Not Breaking at 3 AM

You’ve built a proxy that load-balances, filters requests, and handles TLS. It works great on your laptop. But your laptop isn’t production. In production, things are different: the process needs to run in the background, it needs to survive machine restarts, and — the hard one — it needs to update without dropping connections.

Let’s talk about the operations side of running a Pingora proxy.

The bridge from Part 4 to here is a two-line change in main():

// Before (Parts 1-4)
let mut server = Server::new(None).unwrap();

// After (this part)
let opt = Some(Opt::parse_args());          // add: CLI + config file support
let mut server = Server::new(opt).unwrap(); // change: pass opt instead of None

One line added, one line changed. That single change unlocks config files, daemon mode, CLI flags, and zero-downtime upgrades. Everything else in this chapter uses what that one line enables.

Configuration Files

So far, we’ve hardcoded everything: listen addresses, upstream backends, TLS certificate paths. That works for a tutorial. It doesn’t work when you need different settings per environment (dev, staging, production) or when you want to change settings without recompiling.

Pingora uses YAML configuration files. Create a file called conf.yaml:

---
version: 1
threads: 4
pid_file: /tmp/load_balancer.pid
upgrade_sock: /tmp/load_balancer.sock
error_log: /tmp/load_balancer_err.log

Then pass it to your server:

cargo run -- -c conf.yaml

The version: 1 is required — it tells Pingora which config format to expect. The other settings:

Setting	What It Does
`threads`	Number of worker threads per service. Default is 1. A common starting point in production is one thread per CPU core; tune from there under load.
`pid_file`	Where to write the process ID. Essential for scripting and monitoring.
`upgrade_sock`	Unix socket for graceful upgrades (we’ll get to this).
`error_log`	Where to write errors. If not set, goes to stderr.
`daemon`	Run in the background. Default: false.
`user` / `group`	Drop privileges to this user/group after daemonizing. Start as root to bind port 443, then run as the unprivileged user.

Any setting you don’t include uses its default. And here’s a nice detail: unknown settings are ignored, not rejected. This means you can add your own custom settings to the same file and read them in your code. Pingora won’t complain.

Reading Custom Settings

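Want to put your upstream backends in the config file instead of hardcoding them? Pingora parses the file into its own typed configuration (available as server.configuration), which in current versions holds only the fields Pingora knows about. To read your own fields, keep the config path from the parsed options and deserialize the same file into your own struct:

use pingora::server::configuration::Opt;
use pingora::server::Server;

let opt = Opt::parse_args();
// Keep the path so we can read our own fields from the same file
let conf_path = opt.conf.clone();

let mut server = Server::new(Some(opt)).unwrap();
server.bootstrap();

// server.configuration holds Pingora's own settings (threads, pid_file, ...)
// For custom fields, parse the file at conf_path yourself (e.g. with a YAML crate)

How you parse the custom fields is up to you. The key insight is: the config file is your config file too. Pingora uses what it understands and ignores the rest.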
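
The "uses what it understands, ignores the rest" split can be sketched with nothing but the standard library. This is an illustration under loud assumptions: it is not a YAML parser (it only handles flat key: value lines), and the helper name split_settings and the upstreams key used in the usage note are hypothetical. In a real project you would more likely deserialize the file with a YAML crate.

```rust
use std::collections::BTreeMap;

/// Keys from this chapter's settings table; anything else counts as custom.
const KNOWN_KEYS: [&str; 8] = [
    "version", "threads", "pid_file", "upgrade_sock",
    "error_log", "daemon", "user", "group",
];

/// Split flat `key: value` lines into (known, custom) maps.
/// Hypothetical helper: handles top-level scalar entries only, not real YAML.
pub fn split_settings(text: &str) -> (BTreeMap<String, String>, BTreeMap<String, String>) {
    let mut known = BTreeMap::new();
    let mut custom = BTreeMap::new();
    for line in text.lines() {
        let line = line.trim();
        // Skip blanks, comments, and the YAML document marker.
        if line.is_empty() || line.starts_with('#') || line == "---" {
            continue;
        }
        // Split at the first colon only, so values like host:port stay intact.
        if let Some((key, value)) = line.split_once(':') {
            let (key, value) = (key.trim().to_string(), value.trim().to_string());
            if KNOWN_KEYS.contains(&key.as_str()) {
                known.insert(key, value);
            } else {
                custom.insert(key, value);
            }
        }
    }
    (known, custom)
}
```

Given a conf.yaml with an extra line such as upstreams: 1.1.1.1:443,1.0.0.1:443, the first map holds the settings Pingora would read and the second holds yours.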

Command-Line Arguments

Even without a config file, Pingora’s Server gives you command-line argument parsing for free. Change your main():

// Before: no CLI args
let mut server = Server::new(None).unwrap();

// After: Pingora handles CLI parsing
let mut server = Server::new(Some(Opt::parse_args())).unwrap();

Now your binary supports these flags:

Flag	Effect
`-d` / `--daemon`	Run in the background
`-c` / `--conf`	Path to config file
`-u` / `--upgrade`	Graceful upgrade mode (more on this below)
`-t` / `--test`	Test the config and exit

This is free functionality. You don’t write the arg parser, you don’t handle the flags. Pingora does it.

Running as a Daemon

With --daemon (or daemon: true in the config), the process forks into the background. A few things to know:

The pid_file becomes essential. You need to know the PID to send signals. Check it with cat /tmp/load_balancer.pid.
Privilege dropping happens as part of daemonizing. If you set user and group in the config, Pingora starts as root (to bind privileged ports like 443), loads certificates and keys, then drops to the unprivileged user before accepting connections. This is the correct pattern: do privileged things early, then run unprivileged.
Forking means threads don’t survive. The daemon fork happens inside run_forever(). Threads you spawn before that call, including any started right after bootstrap(), are lost in the fork. Do your setup first, and hand background work to Pingora as a service (as the health check in this part’s code does) so that it starts after the fork.
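
Since so much of the scripting around a daemon hinges on the PID file, here is a small sketch of reading one defensively. The helper read_pid is hypothetical and not part of Pingora; it only assumes the file contains a decimal PID, optionally followed by a newline.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read a PID from a file such as the one named by `pid_file`.
/// Hypothetical helper: returns an error if the file is missing
/// or does not contain a plain decimal number.
pub fn read_pid(path: &Path) -> io::Result<u32> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Reading the PID once and holding on to the value is useful during upgrades, when a second instance may rewrite the same file.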

Signals: How to Stop and Restart

Pingora listens for three signals, each with different behavior:

SIGINT (Ctrl+C): Fast Shutdown

The process exits immediately. All in-flight requests are dropped. This is the “something is very wrong, kill it now” option.

kill -INT $(cat /tmp/load_balancer.pid)

SIGTERM: Graceful Shutdown

The process stops accepting new connections, waits for in-flight requests to finish, then exits. This is the “I want to stop, but I don’t want to break anything” option.

kill -TERM $(cat /tmp/load_balancer.pid)

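How long does it wait? That is governed by a grace period, which you can set in the config file (recent Pingora versions expose grace_period_seconds and graceful_shutdown_timeout_seconds). Check the configuration documentation for the release you’re running for the defaults rather than assuming a particular number.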
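
Conceptually, a graceful shutdown is a bounded wait: stop taking new work, then wait until in-flight work reaches zero or the grace period runs out. The sketch below shows only that idea. It is not Pingora's implementation; the drain function, the shared counter, and the 10 ms polling interval are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Wait until `in_flight` reaches zero or `grace` elapses.
/// Returns true if everything drained in time, false if the deadline hit first.
pub fn drain(in_flight: &AtomicUsize, grace: Duration) -> bool {
    let deadline = Instant::now() + grace;
    loop {
        if in_flight.load(Ordering::Acquire) == 0 {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Poll at a short, arbitrary interval.
        thread::sleep(Duration::from_millis(10));
    }
}
```

A false return is the case where requests outlive the grace period and get cut off, which is why the period should be sized to your slowest legitimate request.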

SIGQUIT: Graceful Upgrade

This is the interesting one. SIGQUIT triggers a graceful shutdown and transfers the listening sockets to a new instance. We’ll cover this in detail next.
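
The three signals differ along two axes: whether in-flight requests get to finish, and whether the listening sockets are handed to another process. A compact way to keep that straight is a lookup like the one below. The Signal and Action types are hypothetical names for this summary, not Pingora types.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Signal {
    Int,
    Term,
    Quit,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Action {
    /// In-flight requests are allowed to complete within the grace period.
    pub finish_in_flight: bool,
    /// Listening sockets are transferred to a new instance first.
    pub hand_over_listeners: bool,
}

/// Summarize the behavior this chapter describes for each signal.
pub fn action_for(signal: Signal) -> Action {
    match signal {
        Signal::Int => Action { finish_in_flight: false, hand_over_listeners: false },
        Signal::Term => Action { finish_in_flight: true, hand_over_listeners: false },
        Signal::Quit => Action { finish_in_flight: true, hand_over_listeners: true },
    }
}
```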

Graceful Upgrades: Zero-Downtime Deployment

Here’s the problem: you found a bug in your proxy code. You fixed it, recompiled, and now you want to deploy the new binary. The naive approach:

1. Stop the old binary → connections drop → errors
2. Start the new binary → it binds the port → traffic resumes

During step 1, any request in flight gets an error. Clients see 502s or connection refused. For a proxy handling millions of requests, even a few seconds of errors is unacceptable.

Pingora solves this with graceful upgrades. The mechanism works like this:

New Instance (PID 5678)           Old Instance (PID 1234)
        │                               │
        │  Start with -u flag            │
        │  → create upgrade socket        │
        │  → wait for FDs                 │
        │                               │
        │                       SIGQUIT received
        │                       → connect to upgrade socket
        │◄──────────────────────────────│
        │                               │
        │  (receives listening FDs,       │  (finishes in-flight
        │   accepts new connections       │   requests, then exits)
        │   immediately)                  │
        │                               │
        │  handles all traffic            ✗ exits

How to Do It

Step by step:

1. Configure the upgrade socket. Both instances need to agree on where to transfer the sockets. This goes in conf.yaml:

upgrade_sock: /tmp/load_balancer.sock

2. Start the new instance in upgrade mode:

cargo run -- -c conf.yaml -d -u

The -u flag tells the new instance: “don’t try to bind the ports yourself. Instead, wait to acquire the listening sockets from the old instance.” The new process creates the upgrade socket and listens on it, waiting for the old process to connect.

3. Send SIGQUIT to the old instance:

kill -QUIT $(cat /tmp/load_balancer.pid)

4. What happens next:

The old instance receives SIGQUIT and connects to the upgrade socket on the new instance
It transfers its listening sockets
The new instance receives the sockets and starts accepting connections immediately
The old instance enters graceful shutdown: it finishes in-flight requests, then exits
The new instance handles all new traffic

From a client’s perspective, the proxy never stopped. The listening socket was never closed. There was no gap where connections would be refused.

The Guarantee

Pingora’s graceful upgrade guarantees two things:

No connection refused. Every request is handled by either the old instance or the new one. The listening socket is handed over while it is still open, so there is no moment when nobody is listening.
No terminated requests. Any request that can finish within the grace period is allowed to complete. The old instance doesn’t kill in-flight work.

These are strong guarantees. They’re why Cloudflare can deploy new versions of their proxy infrastructure without affecting the 40M+ requests per second flowing through it.

One-Liner Upgrade

In practice, the new instance needs to be running before the old one sends its sockets. The order matters:

# Start the new instance first — it listens on the upgrade socket
RUST_LOG=INFO cargo run -- -c conf.yaml -d -u && \
kill -QUIT $(cat /tmp/load_balancer.pid)

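With -d (daemon mode), the process forks into the background and the command returns. Then we send SIGQUIT to the old process, which connects to the upgrade socket and transfers its listening FDs. The chained form depends on daemon mode: without -d, cargo run blocks the terminal and the kill -QUIT never runs, so you’d need two terminal sessions, one running the new instance and one sending the signal.

Two caveats before you script this. First, both instances use the same pid_file, so once the new daemon has written its PID, $(cat /tmp/load_balancer.pid) names the new process rather than the old one; capturing the old PID in a shell variable before starting the new instance avoids that. Second, the code in this part calls bootstrap() before run_forever(), and the daemon fork happens in run_forever(). Depending on your Pingora version, the new instance may wait for the sockets during bootstrap(), in the foreground, and the chained kill never gets its turn. If you see that, run the two steps from separate shells, or send SIGQUIT first and let the old process’s retry logic cover the gap until the new instance is up.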

Why this order? The new process creates the upgrade socket and listens on it. When the old process receives SIGQUIT, it connects to that socket and sends its file descriptors. If you signal the old process first, it tries to connect to the upgrade socket before the new process has created it — the old process will retry for a few seconds (Pingora has built-in retry logic), but starting the new process first is more reliable.
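
The connect-and-retry handshake described above can be sketched with standard-library Unix sockets. One loud caveat: this sketch passes a text message, not file descriptors. Real descriptor passing uses SCM_RIGHTS ancillary data, which the stable standard library does not expose, so treat this as a model of the ordering and retry behavior only. The function names, retry count, and wait interval are all hypothetical.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;
use std::time::Duration;

/// New-instance side: create the upgrade socket and wait for one hand-off.
pub fn receive_handoff(sock: &Path) -> io::Result<String> {
    // Remove a stale socket file left by a previous run, if any.
    let _ = std::fs::remove_file(sock);
    let listener = UnixListener::bind(sock)?;
    let (mut stream, _) = listener.accept()?;
    let mut msg = String::new();
    // Reads until the sender closes its end of the connection.
    stream.read_to_string(&mut msg)?;
    Ok(msg)
}

/// Old-instance side: connect to the upgrade socket, retrying while it is absent.
pub fn send_handoff(sock: &Path, msg: &str, retries: u32, wait: Duration) -> io::Result<()> {
    let mut attempt = 0;
    loop {
        match UnixStream::connect(sock) {
            // The stream is dropped on return, which signals end-of-message.
            Ok(mut stream) => return stream.write_all(msg.as_bytes()),
            Err(e) if attempt >= retries => return Err(e),
            Err(_) => {
                attempt += 1;
                thread::sleep(wait);
            }
        }
    }
}
```

If the sender starts first, it simply burns a few retries until the receiver has created the socket. If the retry budget runs out before the receiver appears, the hand-off fails, which is the risk of signaling the old process too early.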

The Code

The code changes for production are minimal — mostly it’s about using the APIs we’ve been ignoring. Here’s our load balancer with config file support, CLI args, and graceful upgrade readiness:

use async_trait::async_trait;
use pingora::prelude::*;
use pingora::proxy::{ProxyHttp, Session};
use pingora::upstreams::peer::HttpPeer;
use pingora::lb::{LoadBalancer, selection::RoundRobin, health_check::TcpHealthCheck};
use pingora::server::configuration::Opt;
use std::sync::Arc;

pub struct LB(Arc<LoadBalancer<RoundRobin>>);

#[async_trait]
impl ProxyHttp for LB {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        let upstream = self.0
            .select(b"", 256)
            .ok_or_else(|| Error::new_str("no healthy upstream available"))?;
        let peer = Box::new(HttpPeer::new(
            upstream,
            true,
            "one.one.one.one".to_string(),
        ));
        Ok(peer)
    }

    async fn upstream_request_filter(
        &self,
        _session: &mut Session,
        upstream_request: &mut pingora::http::RequestHeader,
        _ctx: &mut Self::CTX,
    ) -> Result<()> {
        upstream_request.insert_header("Host", "one.one.one.one")?;
        Ok(())
    }
}

fn main() {
    // Parse CLI args — gives us -c, -d, -u, -t for free
    let opt = Some(Opt::parse_args());
    let mut server = Server::new(opt).unwrap();
    server.bootstrap();

    let mut upstreams = LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap();

    // Health checks — detect and skip broken backends
    let hc = TcpHealthCheck::new();
    upstreams.set_health_check(hc);
    upstreams.health_check_frequency = Some(std::time::Duration::from_secs(10));

    let background = background_service("health check", upstreams);
    let upstreams = background.task();

    let lb = LB(upstreams);
    let mut service = http_proxy_service(&server.configuration, lb);
    service.add_tcp("0.0.0.0:6188");

    server.add_service(background);
    server.add_service(service);

    // run_forever() handles:
    // - daemonization (if -d or daemon: true)
    // - signal handling (SIGINT, SIGTERM, SIGQUIT)
    // - graceful upgrade socket transfer (if -u)
    server.run_forever();
}

The key change from earlier parts: Server::new(Some(Opt::parse_args())). That one change gives you config file support, daemonization, CLI args, and graceful upgrade capability. Everything else — the proxy logic, the load balancing, the health checks — is the same.

Running in Production

Here’s a typical production workflow:

Start the proxy as a daemon:

RUST_LOG=INFO cargo run --release -- -c conf.yaml -d

Check it’s running:

cat /tmp/load_balancer.pid
curl http://localhost:6188 -svo /dev/null

Deploy a new version (zero downtime):

# Rebuild with your changes
cargo build --release

# Start the new process FIRST — it creates the upgrade socket
# Then signal the old process to transfer its sockets
RUST_LOG=INFO ./target/release/part-05-production -c conf.yaml -d -u && \
kill -QUIT $(cat /tmp/load_balancer.pid)

Stop it gracefully (no new connections, finish in-flight):

kill -TERM $(cat /tmp/load_balancer.pid)

Emergency stop (drop everything):

kill -INT $(cat /tmp/load_balancer.pid)
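
The "check it’s running" step in this workflow can also be scripted, for example in a deploy tool that waits for the listener to come up before moving on. The sketch below only checks that something accepts TCP connections; it says nothing about whether the proxy answers HTTP correctly. Both helper names are hypothetical.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

/// Return true if something accepts TCP connections at `addr` within `timeout`.
/// `timeout` must be non-zero.
pub fn is_listening(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}

/// Poll `addr` up to `attempts` times, sleeping `interval` between tries.
/// Returns true as soon as one connection attempt succeeds.
pub fn wait_until_listening(addr: &SocketAddr, attempts: u32, interval: Duration) -> bool {
    for attempt in 0..attempts {
        if is_listening(addr, interval) {
            return true;
        }
        // Sleep between tries, but not after the last one.
        if attempt + 1 < attempts {
            thread::sleep(interval);
        }
    }
    false
}
```

For the proxy in this part, addr would be 127.0.0.1:6188, the port passed to add_tcp.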

Systemd Integration

For production, you’ll likely run under systemd. Here’s a minimal service file:

[Unit]
Description=Pingora Load Balancer
After=network.target

[Service]
Type=forking
PIDFile=/tmp/load_balancer.pid
ExecStart=/usr/local/bin/load_balancer -c /etc/load_balancer/conf.yaml -d
ExecReload=/bin/sh -c '/usr/local/bin/load_balancer -c /etc/load_balancer/conf.yaml -d -u && kill -QUIT $(cat /tmp/load_balancer.pid)'
KillSignal=SIGTERM
TimeoutStopSec=30

[Install]
WantedBy=multi-user.target

The ExecReload line does exactly what the one-liner above does: starts the new process in upgrade mode (which creates the upgrade socket and waits), then sends SIGQUIT to the old process. The old process connects to the upgrade socket, transfers its listening file descriptors, and enters graceful shutdown.

This works because the ExecReload command invokes /bin/sh -c explicitly; systemd does not run commands through a shell on its own. Inside that shell we can chain the new process startup with the signal in one command. With -d (daemon mode), the new process forks to the background and the shell moves on to kill -QUIT. The old process receives SIGQUIT, transfers its sockets, and exits. The new process starts accepting connections with no gap.

With the unit file installed, the whole upgrade becomes a single command: systemctl reload load_balancer.

⚠️ systemctl restart is NOT a graceful upgrade. systemctl restart sends SIGTERM (stop) then starts a new process — there’s a gap between the old process stopping and the new process binding the port. During that gap, connections are refused. Use systemctl reload for zero-downtime deployment. Use systemctl restart only when you want a full stop-and-start (e.g., after a config change that can’t be picked up by reload).

The Type=forking tells systemd that the process will daemonize. The PIDFile lets systemd track the daemon’s PID.

What We’re Simplifying

A few things this part doesn’t cover in depth:

Observability. Pingora ships a built-in Prometheus HTTP service, but it only exposes metrics that your code registers. You define the counters and histograms you care about (request counts, error rates, latency) and update them as requests complete, typically in the logging phase we showed briefly in Part 3. For production, you want dashboards and alerts on top of that.
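
To make the metrics idea concrete, here is a sketch of per-request counters rendered in the Prometheus text format. It uses plain atomics as a stand-in for a metrics crate, so the Metrics type, its method names, and the metric names are all hypothetical rather than anything Pingora provides.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process counters; a stand-in for a metrics crate.
pub struct Metrics {
    requests: AtomicU64,
    errors: AtomicU64,
}

impl Metrics {
    pub const fn new() -> Self {
        Metrics {
            requests: AtomicU64::new(0),
            errors: AtomicU64::new(0),
        }
    }

    /// Call once per finished request, e.g. from the logging phase.
    /// Responses with status 500 and above count as errors.
    pub fn record(&self, status: u16) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        if status >= 500 {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Render the counters in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        format!(
            "# TYPE proxy_requests_total counter\nproxy_requests_total {}\n\
             # TYPE proxy_errors_total counter\nproxy_errors_total {}\n",
            self.requests.load(Ordering::Relaxed),
            self.errors.load(Ordering::Relaxed),
        )
    }
}
```

Because new is a const fn, a Metrics value can live in a static and be shared by every worker thread without extra wrapping.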

Hot config reload. Pingora reads the config file at startup. Changing the config requires a restart (graceful or otherwise). For dynamic configuration — like adding backends without restarting — you’d maintain an in-memory data structure and update it through your own mechanism (a config service, a file watcher, etc.).
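
One way to picture the "in-memory data structure updated through your own mechanism" is a shared backend list that gets swapped when a file changes. This is a minimal sketch under assumptions: the file format (one address per line, # for comments), the Backends alias, and both function names are hypothetical, and what triggers the reload (a timer, a file watcher, an admin endpoint) is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::{Arc, RwLock};

/// Shared, swappable list of backend addresses.
pub type Backends = Arc<RwLock<Vec<String>>>;

/// Parse one backend address per line, skipping blanks and `#` comments.
pub fn parse_backends(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Re-read `path` and replace the shared list. Returns the new backend count.
/// An empty list is rejected so a bad edit cannot remove every backend.
pub fn reload_backends(path: &Path, shared: &Backends) -> io::Result<usize> {
    let fresh = parse_backends(&fs::read_to_string(path)?);
    if fresh.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "backend list is empty"));
    }
    let count = fresh.len();
    // Swap the whole list under the write lock; readers see old or new, never a mix.
    *shared.write().unwrap() = fresh;
    Ok(count)
}
```

Request handlers would take a read lock, pick a backend, and release the lock before doing any I/O, so a reload never waits on a slow upstream.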

Multiple services. A single Pingora Server can host multiple Service instances — different proxies on different ports, a metrics endpoint, an admin API. Each service has its own listeners and proxy logic.

What You’ve Built

Across all five parts, you’ve built a production-ready reverse proxy:

Part	What You Added
1	A working reverse proxy
2	Load balancing and health checks
3	Request filtering, response modification, per-request state
4	TLS termination and certificate verification
5	Config files, daemonization, zero-downtime upgrades

That’s a real proxy. Not a toy — the same framework powers 40M+ requests per second at Cloudflare. The APIs you’ve learned are the ones they use.

Where to Go Next

The Pingora ecosystem has more to explore:

Caching: pingora-cache provides HTTP caching with cache-control, varying, and purge support
Rate limiting: the pingora-limits crate provides rate limiter utilities
Custom protocols: Pingora does more than HTTP. You can build TCP proxies, tunneling services, or custom protocols on the same framework
Connection pooling: Pingora reuses upstream connections automatically. The pooling behavior is configurable per-peer.

The Pingora GitHub repository has examples for all of these. The user guide covers the internals in more depth than we did here.

The hardest part of building a proxy isn’t the code — it’s the operational concerns. Handling slow clients, backpressure, connection limits, retry storms, and the long tail of edge cases that only show up at scale. Pingora handles most of these for you. Your job is to configure it correctly and write the proxy logic that makes sense for your use case.
