Part 5: Running in Production — Logging, Metrics, and Not Breaking at 3 AM
You’ve built a proxy that load-balances, filters requests, and handles TLS. It works great on your laptop. But your laptop isn’t production. In production, things are different: the process needs to run in the background, it needs to survive machine restarts, and — the hard one — it needs to update without dropping connections.
Let’s talk about the operations side of running a Pingora proxy.
The bridge from Part 4 to here is one line in main():
// Before (Parts 1-4)
let mut server = Server::new(None).unwrap();
// After (this part)
let opt = Some(Opt::parse_args()); // add: CLI + config file support
let mut server = Server::new(opt).unwrap(); // change: pass opt instead of None
One line added, one line changed. That single change unlocks config files, daemon mode, CLI flags, and zero-downtime upgrades. Everything else in this chapter uses what that one line enables.
Configuration Files
So far, we’ve hardcoded everything: listen addresses, upstream backends, TLS certificate paths. That works for a tutorial. It doesn’t work when you need different settings per environment (dev, staging, production) or when you want to change settings without recompiling.
Pingora uses YAML configuration files. Create a file called conf.yaml:
---
version: 1
threads: 4
pid_file: /tmp/load_balancer.pid
upgrade_sock: /tmp/load_balancer.sock
error_log: /tmp/load_balancer_err.log
Then pass it to your server:
cargo run -- -c conf.yaml
The version: 1 is required — it tells Pingora which config format to expect. The other settings:
| Setting | What It Does |
|---|---|
| threads | Number of worker threads per service. Default is 1. Production deployments usually size this relative to the CPU core count. |
| pid_file | Where to write the process ID. Essential for scripting and monitoring. |
| upgrade_sock | Unix socket for graceful upgrades (we’ll get to this). |
| error_log | Where to write errors. If not set, goes to stderr. |
| daemon | Run in the background. Default: false. |
| user / group | User and group to run as after daemonizing. Start as root to bind port 443, then drop to an unprivileged user. |
Any setting you don’t include uses its default. And here’s a nice detail: unknown settings are ignored, not rejected. This means you can add your own custom settings to the same file and read them in your code. Pingora won’t complain.
Reading Custom Settings
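Want to put your upstream backends in the config file instead of hardcoding them? Keep in mind that server.configuration holds only the settings Pingora itself understands; your own keys are skipped when the file is parsed. Opt keeps the path passed with -c, though, so your code can read the same file:
use pingora::server::configuration::Opt;
use pingora::server::Server;
let opt = Opt::parse_args();
// Remember the -c path so your own code can read the same file later
let conf_path: Option<String> = opt.conf.clone();
let mut server = Server::new(Some(opt)).unwrap();
server.bootstrap();
// server.configuration: Pingora's own settings (threads, pid_file, ...)
// conf_path: parse this file yourself for custom keys
Check Opt and ServerConf for the exact fields in your Pingora version. The key insight is: the config file is your config file too. Pingora uses what it understands and ignores the rest, so your code can parse the same file for its own settings.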
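One way to read such a key, sketched with only the standard library. The `upstreams` key and the `read_list` / `read_upstreams` helpers are hypothetical illustrations, not Pingora APIs; `read_upstreams` takes the path you passed with -c. A real project would deserialize into its own struct with serde and a YAML crate such as serde_yaml; this hand-rolled reader handles only a flat, top-level list.

```rust
use std::fs;
use std::io;

/// Collect the items of a flat, top-level YAML list such as
///   upstreams:
///     - "1.1.1.1:443"
/// Hand-rolled to stay dependency-free: no nesting, inline lists, or comments.
fn read_list(yaml: &str, key: &str) -> Vec<String> {
    let header = format!("{key}:");
    let mut items = Vec::new();
    let mut in_list = false;
    for line in yaml.lines() {
        if !in_list {
            // Skip lines until we reach the "key:" line itself
            in_list = line.trim_end() == header;
            continue;
        }
        let trimmed = line.trim();
        if let Some(item) = trimmed.strip_prefix("- ") {
            // Strip optional double quotes around the value
            items.push(item.trim().trim_matches('"').to_string());
        } else if !trimmed.is_empty() {
            // The first line that is not a list item ends the list
            break;
        }
    }
    items
}

/// Read the hypothetical `upstreams` key from the file given with -c.
fn read_upstreams(conf_path: &str) -> io::Result<Vec<String>> {
    let text = fs::read_to_string(conf_path)?;
    Ok(read_list(&text, "upstreams"))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("conf-demo-{}.yaml", std::process::id()));
    fs::write(
        &path,
        "---\nversion: 1\nthreads: 4\nupstreams:\n  - \"1.1.1.1:443\"\n  - \"1.0.0.1:443\"\npid_file: /tmp/load_balancer.pid\n",
    )?;
    let upstreams = read_upstreams(path.to_str().expect("temp path is UTF-8"))?;
    println!("{upstreams:?}");
    fs::remove_file(&path)?;
    Ok(())
}
```

In main(), the returned list could then replace the hardcoded addresses passed to LoadBalancer::try_from_iter, since it accepts any iterator of socket-address-like values such as String.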
Command-Line Arguments
Even without a config file, Pingora’s Server gives you command-line argument parsing for free. Change your main():
// Before: no CLI args
let mut server = Server::new(None).unwrap();
// After: Pingora handles CLI parsing
let mut server = Server::new(Some(Opt::parse_args())).unwrap();
Now your binary supports these flags:
| Flag | Effect |
|---|---|
| -d / --daemon | Run in the background |
| -c / --conf | Path to config file |
| -u / --upgrade | Graceful upgrade mode (more on this below) |
| -t / --test | Test the config and exit |
This is free functionality. You don’t write the arg parser, you don’t handle the flags. Pingora does it.
Running as a Daemon
With --daemon (or daemon: true in the config), the process forks into the background. A few things to know:
- The pid_file becomes essential. You need to know the PID to send signals. Check it with cat /tmp/load_balancer.pid.
- Privilege dropping happens as part of daemonizing. If you set user and group in the config and start the process as root, Pingora can bind privileged ports like 443 and load certificates and keys, then switch to the unprivileged user before accepting connections. This is the correct pattern: do privileged things early, then run unprivileged.
- Forking means threads don’t survive. The daemon fork happens inside run_forever(), after bootstrap(). Threads you spawn before run_forever() are likely lost in the fork, so don’t start background threads of your own in main(); register background work as a service (as the health check does below) and let Pingora start it.
Signals: How to Stop and Restart
Pingora listens for three signals, each with different behavior:
SIGINT (Ctrl+C): Fast Shutdown
The process exits immediately. All in-flight requests are dropped. This is the “something is very wrong, kill it now” option.
kill -INT $(cat /tmp/load_balancer.pid)
SIGTERM: Graceful Shutdown
The process stops accepting new connections, waits for in-flight requests to finish, then exits. This is the “I want to stop, but I don’t want to break anything” option.
kill -TERM $(cat /tmp/load_balancer.pid)
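How long does it wait? Pingora exposes this through config settings: grace_period_seconds (how long in-flight requests get before the final shutdown step begins) and graceful_shutdown_timeout_seconds (the timeout for that final step). Check the defaults for your Pingora version, and set them explicitly if your requests can run long.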
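To make the grace period concrete, here is a std-only sketch of the drain pattern SIGTERM triggers: stop admitting new work, wait up to a deadline for in-flight work to finish, and report what was still running. `Drain` and its methods are hypothetical names; this models the behavior described above rather than reproducing Pingora’s implementation.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Tracks whether new requests are admitted and how many are still running.
struct Drain {
    accepting: AtomicBool,
    in_flight: AtomicUsize,
}

impl Drain {
    fn new() -> Self {
        Drain { accepting: AtomicBool::new(true), in_flight: AtomicUsize::new(0) }
    }

    /// Register a new request; refused once shutdown has begun.
    fn try_begin(&self) -> bool {
        // Increment before checking the flag: with SeqCst, any request that
        // sees `accepting == true` is already counted when shutdown() looks.
        self.in_flight.fetch_add(1, Ordering::SeqCst);
        if self.accepting.load(Ordering::SeqCst) {
            true
        } else {
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            false
        }
    }

    /// Mark a request as finished.
    fn end(&self) {
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }

    /// Stop accepting, then wait up to `grace` for in-flight requests.
    /// Returns how many were still running when the wait ended.
    fn shutdown(&self, grace: Duration) -> usize {
        self.accepting.store(false, Ordering::SeqCst);
        let deadline = Instant::now() + grace;
        loop {
            let remaining = self.in_flight.load(Ordering::SeqCst);
            if remaining == 0 || Instant::now() >= deadline {
                return remaining;
            }
            thread::sleep(Duration::from_millis(5));
        }
    }
}

fn main() {
    let drain = Arc::new(Drain::new());
    // One slow request (200 ms) is in flight when the shutdown starts
    assert!(drain.try_begin());
    let worker = {
        let drain = Arc::clone(&drain);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(200));
            drain.end();
        })
    };
    let cut_off = drain.shutdown(Duration::from_secs(2));
    worker.join().unwrap();
    println!("requests cut off: {cut_off}");
    println!("new request admitted after shutdown: {}", drain.try_begin());
}
```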
SIGQUIT: Graceful Upgrade
This is the interesting one. SIGQUIT triggers a graceful shutdown and transfers the listening sockets to a new instance. We’ll cover this in detail next.
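Scripts that manage the daemon usually do the same two things as the kill commands in this section: read the PID from pid_file, then send one of the three signals. A std-only Rust sketch follows; `Stop`, `read_pid`, and `stop_command` are hypothetical helpers, and it shells out to kill(1) rather than calling libc. It only builds the command; calling `.status()` on the result would actually send the signal.

```rust
use std::fs;
use std::io;
use std::process::Command;

/// The three ways to stop Pingora described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stop {
    /// SIGINT: exit now, drop in-flight requests
    Fast,
    /// SIGTERM: stop accepting, finish in-flight requests, exit
    Graceful,
    /// SIGQUIT: hand listening sockets to a new instance, then drain and exit
    Upgrade,
}

impl Stop {
    /// Signal name as kill(1) accepts it after a dash.
    fn signal(self) -> &'static str {
        match self {
            Stop::Fast => "INT",
            Stop::Graceful => "TERM",
            Stop::Upgrade => "QUIT",
        }
    }
}

/// Read the PID the daemon wrote to its pid_file.
fn read_pid(pid_file: &str) -> io::Result<u32> {
    let text = fs::read_to_string(pid_file)?;
    text.trim()
        .parse::<u32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Build, without running, the equivalent of `kill -<SIG> $(cat pid_file)`.
fn stop_command(mode: Stop, pid_file: &str) -> io::Result<Command> {
    let pid = read_pid(pid_file)?;
    let mut cmd = Command::new("kill");
    cmd.arg(format!("-{}", mode.signal())).arg(pid.to_string());
    Ok(cmd)
}

fn main() {
    // pid_file as set in conf.yaml
    match stop_command(Stop::Graceful, "/tmp/load_balancer.pid") {
        // Printed, not executed
        Ok(cmd) => println!("would run: {cmd:?}"),
        Err(e) => eprintln!("cannot read pid file: {e}"),
    }
}
```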
Graceful Upgrades: Zero-Downtime Deployment
Here’s the problem: you found a bug in your proxy code. You fixed it, recompiled, and now you want to deploy the new binary. The naive approach:
- Stop the old binary → connections drop → errors
- Start the new binary → it binds the port → traffic resumes
During step 1, any request in flight gets an error. Clients see 502s or connection refused. For a proxy handling millions of requests, even a few seconds of errors is unacceptable.
Pingora solves this with graceful upgrades. The mechanism works like this:
New Instance (PID 5678) Old Instance (PID 1234)
│ │
│ Start with -u flag │
│ → create upgrade socket │
│ → wait for FDs │
│ │
│ SIGQUIT received
│ → connect to upgrade socket
│◄──────────────────────────────│
│ │
│ (receives listening FDs, │ (finishes in-flight
│ accepts new connections │ requests, then exits)
│ immediately) │
│ │
│ handles all traffic ✗ exits
How to Do It
Step by step:
1. Configure the upgrade socket. Both instances need to agree on where to transfer the sockets. This goes in conf.yaml:
upgrade_sock: /tmp/load_balancer.sock
2. Start the new instance in upgrade mode:
cargo run -- -c conf.yaml -d -u
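The -u flag tells the new instance: “don’t try to bind the ports yourself. Instead, wait to acquire the listening sockets from the old instance.” The new process creates the upgrade socket and listens on it, waiting for the old process to connect. That wait happens in bootstrap(), before run_forever() daemonizes, so the command stays in the foreground even with -d, and it only lasts a few seconds before the new instance gives up.
3. Send SIGQUIT to the old instance (from a second terminal, while the new instance is still waiting):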
kill -QUIT $(cat /tmp/load_balancer.pid)
4. What happens next:
- The old instance receives SIGQUIT and connects to the upgrade socket on the new instance
- It transfers its listening sockets over that connection
- The new instance receives the sockets and starts accepting connections immediately
- The old instance enters graceful shutdown: it finishes in-flight requests, then exits
- The new instance handles all new traffic
From a client’s perspective, the proxy never stopped. The listening socket was never closed. There was no gap where connections would be refused.
The Guarantee
Pingora’s graceful upgrade guarantees two things:
-
No connection refused. Every request is handled by either the old instance or the new one. The listening socket transfers atomically.
-
No terminated requests. Any request that can finish within the grace period is allowed to complete. The old instance doesn’t kill in-flight work.
These are strong guarantees. They’re why Cloudflare can deploy new versions of their proxy infrastructure without affecting the 40M+ requests per second flowing through it.
One-Liner Upgrade
In practice, the order matters, and in a single shell line the safe order is to signal the old instance first, then start the new one:
# Signal the old instance while the pid file still holds its PID,
# then start the new instance, which picks up the sockets
kill -QUIT $(cat /tmp/load_balancer.pid) && \
RUST_LOG=INFO cargo run -- -c conf.yaml -d -u
Both sides retry for a few seconds: after SIGQUIT, the old process keeps trying to connect to the upgrade socket (Pingora has built-in retry logic), while the new process creates that socket and waits for the connection. Because that window is short, run cargo build first so cargo run doesn’t spend it compiling.
Why not start the new instance first? Two reasons. The new instance receives the sockets inside bootstrap(), before run_forever() daemonizes it, so even with -d the command doesn’t return until the transfer happens; chaining new && kill would wait for a signal that never comes, and the new instance would give up. And both instances share pid_file: once the new daemon starts, it writes its own PID there, so a cat run afterwards would point at the wrong process. Starting the new instance first only works from a second terminal, sending SIGQUIT while the new instance is still waiting.
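Whichever side starts first, the handshake only works because the waiting side tolerates a short delay. This std-only sketch (Unix only) models that with two threads: the “old instance” starts first and retries `UnixStream::connect` until the “new instance” has bound the socket. `connect_with_retry`, the attempt count, and the interval are illustrative choices, not Pingora’s code or values. It sends plain bytes; passing the listening descriptors themselves needs SCM_RIGHTS ancillary data, which stable std does not expose.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Old-instance side: after SIGQUIT, keep trying the upgrade socket,
/// because the new instance may not have created it yet.
fn connect_with_retry(path: &Path, attempts: u32, interval: Duration) -> io::Result<UnixStream> {
    for attempt in 1..=attempts {
        match UnixStream::connect(path) {
            Ok(stream) => return Ok(stream),
            // Out of attempts: report the last error
            Err(e) if attempt == attempts => return Err(e),
            // Socket not there yet: wait, then try again
            Err(_) => thread::sleep(interval),
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidInput, "attempts must be at least 1"))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("upgrade-demo-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path); // bind() fails if the path already exists

    // "New instance": creates the upgrade socket 100 ms after the old side starts trying
    let sock_path = path.clone();
    let new_instance = thread::spawn(move || -> io::Result<String> {
        thread::sleep(Duration::from_millis(100));
        let listener = UnixListener::bind(&sock_path)?;
        let (mut conn, _) = listener.accept()?;
        let mut msg = String::new();
        conn.read_to_string(&mut msg)?;
        Ok(msg)
    });

    // "Old instance": signalled first, retries until the socket exists
    let mut stream = connect_with_retry(&path, 50, Duration::from_millis(20))?;
    // Pingora would send the listening FDs here; this sketch sends bytes
    stream.write_all(b"(listening sockets would go here)")?;
    drop(stream); // close so read_to_string sees EOF

    let received = new_instance.join().expect("new instance thread panicked")?;
    println!("new instance received: {received}");
    std::fs::remove_file(&path)?;
    Ok(())
}
```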
The Code
The code changes for production are minimal — mostly it’s about using the APIs we’ve been ignoring. Here’s our load balancer with config file support, CLI args, and graceful upgrade readiness:
use async_trait::async_trait;
use pingora::prelude::*;
use pingora::proxy::{ProxyHttp, Session};
use pingora::upstreams::peer::HttpPeer;
use pingora::lb::{LoadBalancer, selection::RoundRobin, health_check::TcpHealthCheck};
use pingora::server::configuration::Opt;
use std::sync::Arc;
pub struct LB(Arc<LoadBalancer<RoundRobin>>);
#[async_trait]
impl ProxyHttp for LB {
type CTX = ();
fn new_ctx(&self) -> Self::CTX {}
async fn upstream_peer(
&self,
_session: &mut Session,
_ctx: &mut Self::CTX,
) -> Result<Box<HttpPeer>> {
let upstream = self.0
.select(b"", 256)
.ok_or_else(|| Error::new_str("no healthy upstream available"))?;
let peer = Box::new(HttpPeer::new(
upstream,
true,
"one.one.one.one".to_string(),
));
Ok(peer)
}
async fn upstream_request_filter(
&self,
_session: &mut Session,
upstream_request: &mut pingora::http::RequestHeader,
_ctx: &mut Self::CTX,
) -> Result<()> {
upstream_request.insert_header("Host", "one.one.one.one")?;
Ok(())
}
}
fn main() {
// Parse CLI args — gives us -c, -d, -u, -t for free
let opt = Some(Opt::parse_args());
let mut server = Server::new(opt).unwrap();
server.bootstrap();
let mut upstreams = LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap();
// Health checks — detect and skip broken backends
let hc = TcpHealthCheck::new();
upstreams.set_health_check(hc);
upstreams.health_check_frequency = Some(std::time::Duration::from_secs(10));
let background = background_service("health check", upstreams);
let upstreams = background.task();
let lb = LB(upstreams);
let mut service = http_proxy_service(&server.configuration, lb);
service.add_tcp("0.0.0.0:6188");
server.add_service(background);
server.add_service(service);
// run_forever() handles:
// - daemonization (if -d or daemon: true)
// - signal handling (SIGINT, SIGTERM, SIGQUIT)
// - on SIGQUIT, sending the listening sockets to a new instance
//   (an instance started with -u receives them earlier, in bootstrap())
server.run_forever();
}
The key change from earlier parts: Server::new(Some(Opt::parse_args())). That one change gives you config file support, daemonization, CLI args, and graceful upgrade capability. Everything else — the proxy logic, the load balancing, the health checks — is the same.
Running in Production
Here’s a typical production workflow:
Start the proxy as a daemon:
RUST_LOG=INFO cargo run --release -- -c conf.yaml -d
Check it’s running:
cat /tmp/load_balancer.pid
curl http://localhost:6188 -svo /dev/null
Deploy a new version (zero downtime):
# Rebuild with your changes
cargo build --release
# Signal the old process FIRST, while the pid file still holds its PID,
# then start the new process, which receives the sockets via the upgrade socket
kill -QUIT $(cat /tmp/load_balancer.pid) && \
RUST_LOG=INFO ./target/release/part-05-production -c conf.yaml -d -u
Stop it gracefully (no new connections, finish in-flight):
kill -TERM $(cat /tmp/load_balancer.pid)
Emergency stop (drop everything):
kill -INT $(cat /tmp/load_balancer.pid)
Systemd Integration
For production, you’ll likely run under systemd. Here’s a minimal service file:
[Unit]
Description=Pingora Load Balancer
After=network.target
[Service]
Type=forking
PIDFile=/tmp/load_balancer.pid
ExecStart=/usr/local/bin/load_balancer -c /etc/load_balancer/conf.yaml -d
ExecReload=/bin/kill -QUIT $MAINPID
ExecReload=/usr/local/bin/load_balancer -c /etc/load_balancer/conf.yaml -d -u
KillSignal=SIGTERM
TimeoutStopSec=30
[Install]
WantedBy=multi-user.target
The two ExecReload lines run in order and perform the same upgrade as the one-liner above: first SIGQUIT goes to the running instance ($MAINPID is the PID systemd tracks via PIDFile), then the new binary starts in upgrade mode. The old process connects to the upgrade socket, transfers its listening file descriptors, and enters graceful shutdown. The new process receives them, daemonizes, and starts accepting connections with no gap.
systemd substitutes $MAINPID itself, so no shell wrapper is needed, and you avoid racing the new instance, which writes its own PID to the same pid file.
The unit file automates the same operation: systemctl reload load_balancer.
⚠️
systemctl restart is NOT a graceful upgrade. systemctl restart sends SIGTERM (stop), then starts a new process, so there’s a gap between the old process stopping and the new process binding the port. During that gap, connections are refused. Use systemctl reload for zero-downtime deployment. Use systemctl restart only when you want a full stop-and-start, e.g., after changing upgrade_sock, which the old and new instances must agree on.
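The Type=forking tells systemd that the process will daemonize. The PIDFile lets systemd track the daemon’s PID, so it must match pid_file in conf.yaml. TimeoutStopSec caps how long systemctl stop waits after SIGTERM before killing the process outright, so keep it longer than your graceful shutdown period.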
What We’re Simplifying
A few things this part doesn’t cover in depth:
Observability. Pingora has a built-in Prometheus endpoint. Add a Prometheus service alongside your proxy, and the metrics you register with the prometheus crate (request counts, error rates, latency histograms) are exposed for scraping. We showed this briefly in Part 3’s logging phase. For production, you want dashboards and alerts.
Hot config reload. Pingora reads the config file at startup. Changing the config requires a restart (graceful or otherwise). For dynamic configuration — like adding backends without restarting — you’d maintain an in-memory data structure and update it through your own mechanism (a config service, a file watcher, etc.).
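A minimal sketch of such an in-memory structure, assuming the round-robin policy used in this series: the backend list sits behind an `RwLock<Arc<...>>`, so a watcher or admin endpoint swaps it in one write while requests keep reading cheap snapshots. `DynamicBackends` is a hypothetical type, not part of Pingora; in the proxy you would hold it in an `Arc` inside your ProxyHttp struct and call `select()` from upstream_peer().

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// A backend list that can be replaced at runtime without a restart.
struct DynamicBackends {
    current: RwLock<Arc<Vec<String>>>,
    next: AtomicUsize,
}

impl DynamicBackends {
    fn new(initial: Vec<String>) -> Self {
        DynamicBackends { current: RwLock::new(Arc::new(initial)), next: AtomicUsize::new(0) }
    }

    /// Replace the whole list, e.g. from a file watcher or an admin API.
    /// Requests already holding the old snapshot keep using it.
    fn replace(&self, backends: Vec<String>) {
        *self.current.write().expect("lock poisoned") = Arc::new(backends);
    }

    /// Round-robin over the latest snapshot; None if there are no backends.
    fn select(&self) -> Option<String> {
        // Clone the Arc and release the read lock immediately
        let snapshot = Arc::clone(&*self.current.read().expect("lock poisoned"));
        if snapshot.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % snapshot.len();
        Some(snapshot[i].clone())
    }
}

fn main() {
    let backends = Arc::new(DynamicBackends::new(vec![
        "1.1.1.1:443".to_string(),
        "1.0.0.1:443".to_string(),
    ]));
    println!("{:?}", backends.select());
    // Later, a config watcher swaps the list without restarting the proxy
    backends.replace(vec!["10.0.0.5:443".to_string()]); // hypothetical address
    println!("{:?}", backends.select());
}
```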
Multiple services. A single Pingora Server can host multiple Service instances — different proxies on different ports, a metrics endpoint, an admin API. Each service has its own listeners and proxy logic.
What You’ve Built
Across all five parts, you’ve built a production-ready reverse proxy:
| Part | What You Added |
|---|---|
| 1 | A working reverse proxy |
| 2 | Load balancing and health checks |
| 3 | Request filtering, response modification, per-request state |
| 4 | TLS termination and certificate verification |
| 5 | Config files, daemonization, zero-downtime upgrades |
That’s a real proxy. Not a toy — the same framework powers 40M+ requests per second at Cloudflare. The APIs you’ve learned are the ones they use.
Where to Go Next
The Pingora ecosystem has more to explore:
- Caching: pingora-cache provides HTTP caching with Cache-Control, Vary, and purge support
- Rate limiting: pingora-limits provides rate estimators you can build rate limiters on
- Custom protocols: Pingora does more than HTTP. You can build TCP proxies, tunneling services, or custom protocols on the same framework
- Connection pooling: Pingora reuses upstream connections automatically. The pooling behavior is configurable per-peer.
The Pingora GitHub repository has examples for all of these. The user guide covers the internals in more depth than we did here.
The hardest part of building a proxy isn’t the code — it’s the operational concerns. Handling slow clients, backpressure, connection limits, retry storms, and the long tail of edge cases that only show up at scale. Pingora handles most of these for you. Your job is to configure it correctly and write the proxy logic that makes sense for your use case.