Part 2: Load Balancing — Picking the Right Backend When There Are Many
One backend isn’t enough. If it goes down, everything stops. If it’s slow, everyone waits. You need multiple backends, and you need to pick the right one for each request. That’s load balancing.
Your proxy will distribute requests across multiple backends using round-robin selection and automatically skip backends that are unhealthy.
The Problem with One Backend
In Part 1, every request went to 1.1.1.1. That works fine — until it doesn’t. What if 1.1.1.1 is down for maintenance? What if traffic spikes and one server can’t handle it?
We need two things:
- Multiple backends — so traffic can be spread across them
- Health checks — so we stop sending traffic to backends that are down
Round-Robin Selection
The simplest load balancing algorithm is round-robin: rotate through the backends in order. First request goes to backend A, second to B, third to A again, and so on.
Request 1 → 1.1.1.1:443
Request 2 → 1.0.0.1:443
Request 3 → 1.1.1.1:443
Request 4 → 1.0.0.1:443
...
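The rotation above can be sketched in a few lines of plain Rust. This is only an illustration of the idea, not Pingora's implementation: the RoundRobinPicker type and its pick method are hypothetical names invented for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical minimal round-robin picker (not Pingora's implementation).
pub struct RoundRobinPicker {
    backends: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobinPicker {
    pub fn new(backends: Vec<String>) -> Self {
        Self { backends, next: AtomicUsize::new(0) }
    }

    /// Returns the next backend in rotation, or None if the list is empty.
    pub fn pick(&self) -> Option<&str> {
        if self.backends.is_empty() {
            return None;
        }
        // The counter only ever grows; the modulo wraps it onto the list.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
        Some(self.backends[i].as_str())
    }
}
```

An atomic counter lets pick take &self, so many request handlers could share one picker without a lock.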
Pingora provides this via the LoadBalancer type with a RoundRobin selection algorithm:
use pingora::lb::{LoadBalancer, selection::RoundRobin};
// The type annotation is what selects the RoundRobin algorithm.
let upstreams: LoadBalancer<RoundRobin> = LoadBalancer::try_from_iter([
    "1.1.1.1:443",
    "1.0.0.1:443",
]).unwrap();
try_from_iter creates a load balancer from a list of backend addresses. It resolves DNS names and validates the addresses. The RoundRobin type parameter tells it which selection algorithm to use.
The Proxy with Load Balancing
Our proxy struct wraps the LoadBalancer in an Arc:
pub struct LB(Arc<LoadBalancer<RoundRobin>>);
Why Arc? Because the load balancer is shared between two things:
- The proxy handler (which calls select() to pick a backend)
- The health check service (which marks backends healthy/unhealthy)
Both need to see the same state. Arc gives us shared ownership without copying.
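As a rough illustration of that shared ownership, here is a std-only sketch. SharedState and writer_and_reader_share_state are hypothetical names, and a single AtomicBool stands in for the load balancer's real health state.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the load balancer's shared health state.
pub struct SharedState {
    pub healthy: AtomicBool,
}

/// A "health checker" thread flips the flag through its own Arc clone;
/// the caller then reads the flag back through the original handle.
pub fn writer_and_reader_share_state() -> bool {
    let state = Arc::new(SharedState { healthy: AtomicBool::new(true) });
    // Cloning the Arc copies the pointer, not the state behind it.
    let for_checker = Arc::clone(&state);
    let checker = thread::spawn(move || {
        for_checker.healthy.store(false, Ordering::SeqCst);
    });
    checker.join().expect("checker thread panicked");
    state.healthy.load(Ordering::SeqCst)
}
```

The function returns false: the write made through one handle is visible through the other, because both point at the same allocation.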
The upstream_peer method is almost the same as Part 1, except now we select from the pool:
async fn upstream_peer(
    &self,
    _session: &mut Session,
    _ctx: &mut Self::CTX,
) -> Result<Box<HttpPeer>> {
    // Pick the next healthy backend; None means nothing is available.
    let upstream = self.0
        .select(b"", 256)
        .ok_or_else(|| Error::new_str("no healthy upstream available"))?;
    let peer = Box::new(HttpPeer::new(
        upstream,
        true,
        "one.one.one.one".to_string(),
    ));
    Ok(peer)
}
Two differences from Part 1:
- self.0.select(b"", 256) instead of a hardcoded address. The select method picks a backend using the selection algorithm. The first argument (b"") is a hash key: it’s used for consistent hashing but ignored by round-robin. The second argument is the maximum number of iterations, which bounds the search for a healthy backend when many backends are unhealthy.
- .ok_or_else(|| ...) instead of .unwrap(). select() returns Option<Backend>: it returns None if no healthy backend is found. We convert that to an error, so Pingora answers the client with an error response instead of panicking.
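To make the two points above concrete, here is a std-only sketch of a bounded search that skips unhealthy backends and turns "nothing found" into an error. Pool, Backend, and pick_addr are hypothetical names for this illustration, and the sketch does not reproduce how Pingora's select is actually implemented.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical backend record with a precomputed health flag.
pub struct Backend {
    pub addr: &'static str,
    pub healthy: bool,
}

/// Hypothetical pool that rotates through backends in order.
pub struct Pool {
    backends: Vec<Backend>,
    next: AtomicUsize,
}

impl Pool {
    pub fn new(backends: Vec<Backend>) -> Self {
        Self { backends, next: AtomicUsize::new(0) }
    }

    /// Tries at most `max_iterations` candidates and returns the first healthy one.
    pub fn select(&self, max_iterations: usize) -> Option<&Backend> {
        if self.backends.is_empty() {
            return None;
        }
        for _ in 0..max_iterations {
            let i = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
            let candidate = &self.backends[i];
            if candidate.healthy {
                return Some(candidate);
            }
        }
        None // gave up: no healthy backend within the iteration budget
    }
}

/// Converts the Option into a Result, mirroring the ok_or_else step.
pub fn pick_addr(pool: &Pool) -> Result<&'static str, String> {
    pool.select(256)
        .map(|b| b.addr)
        .ok_or_else(|| "no healthy upstream available".to_string())
}
```

The iteration cap matters because the loop would otherwise spin forever when every backend is unhealthy.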
Health Checks: Stopping Traffic to Dead Backends
Here’s the problem: suppose we add a third backend and it is down. Round-robin will still try to send traffic to it, so every third request fails.
We need health checks. Pingora provides TcpHealthCheck, which periodically tries to connect to each backend:
let mut upstreams = LoadBalancer::try_from_iter([
    "1.1.1.1:443",
    "1.0.0.1:443",
    "127.0.0.1:343", // broken: nothing listens here
]).unwrap();
let hc = TcpHealthCheck::new();
upstreams.set_health_check(hc);
upstreams.health_check_frequency = Some(std::time::Duration::from_secs(1));
Three things happen here:
- Create the health check. TcpHealthCheck::new() creates a check that attempts a TCP connection. If the connection succeeds, the backend is healthy. If it fails (for example, the connection is refused), the backend is unhealthy.
- Attach it to the load balancer. set_health_check(hc) tells the load balancer to use this check. Once the check has run, select() skips backends marked unhealthy.
- Set the frequency. We check every second. If health_check_frequency is left as None, the check runs only once at startup. Faster checks mean faster failover but more overhead.
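The core of a TCP health check is small enough to sketch with the standard library alone. The helper below is a hypothetical illustration of the idea (connect, then report success or failure), not the code behind TcpHealthCheck; the name tcp_is_healthy and the explicit timeout parameter are assumptions made for this example.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: true if a TCP connection to `addr` can be
/// opened within `timeout`, false if it is refused or times out.
pub fn tcp_is_healthy(addr: &SocketAddr, timeout: Duration) -> bool {
    // The stream is dropped right away; only the connect result matters.
    TcpStream::connect_timeout(addr, timeout).is_ok()
}
```

A check like this only proves that something accepts connections on the port. It says nothing about whether the backend can serve a valid HTTP response.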
Running Health Checks in the Background
Health checks need to run continuously, independently of request handling. Pingora handles this with the background service pattern:
let background = pingora::services::background::background_service(
    "health_check",
    upstreams,
);
let upstreams = background.task(); // shadows the earlier `upstreams` variable
Note: the second let upstreams shadows the first — we’re replacing the owned LoadBalancer with the shared Arc<LoadBalancer> returned by task(). Same name, but now it’s a reference the proxy can use.
This is subtle, so let’s break it down:
- background_service("health_check", upstreams) takes ownership of the LoadBalancer and creates a background task. The string is a label for logging: it appears in health check log messages so you can identify which service generated them.
- background.task() returns an Arc<LoadBalancer> that points to the same instance the background service is managing. This is how we share state: the health checker updates the load balancer’s internal state, and our proxy reads it, both through the same Arc.
Finally, register both services:
server.add_service(background);    // health check runner
server.add_service(proxy_service); // our proxy
The background service runs in the server’s thread pool. It doesn’t block request handling.
The Complete Code
use async_trait::async_trait;
use pingora::prelude::*;
use pingora::proxy::{ProxyHttp, Session};
use pingora::upstreams::peer::HttpPeer;
use pingora::lb::{LoadBalancer, selection::RoundRobin, health_check::TcpHealthCheck};
use std::sync::Arc;
pub struct LB(Arc<LoadBalancer<RoundRobin>>);
#[async_trait]
impl ProxyHttp for LB {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}
    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        // Pick the next healthy backend from the shared load balancer.
        let upstream = self.0
            .select(b"", 256)
            .ok_or_else(|| Error::new_str("no healthy upstream available"))?;
        let peer = Box::new(HttpPeer::new(
            upstream,
            true,
            "one.one.one.one".to_string(),
        ));
        Ok(peer)
    }
    async fn upstream_request_filter(
        &self,
        _session: &mut Session,
        upstream_request: &mut pingora::http::RequestHeader,
        _ctx: &mut Self::CTX,
    ) -> Result<()> {
        upstream_request.insert_header("Host", "one.one.one.one")?;
        Ok(())
    }
}
fn main() {
    let mut server = Server::new(None).unwrap();
    server.bootstrap();
    let mut upstreams = LoadBalancer::try_from_iter([
        "1.1.1.1:443",
        "1.0.0.1:443",
        "127.0.0.1:343", // broken: nothing listens here
    ]).unwrap();
    let hc = TcpHealthCheck::new();
    upstreams.set_health_check(hc);
    upstreams.health_check_frequency = Some(std::time::Duration::from_secs(1));
    // Hand the load balancer to a background service that runs the health checks.
    let background = pingora::services::background::background_service(
        "health_check",
        upstreams,
    );
    let upstreams = background.task(); // Arc shared with the proxy
    let mut service = http_proxy_service(&server.configuration, LB(upstreams));
    service.add_tcp("0.0.0.0:6188");
    server.add_service(background);
    server.add_service(service);
    server.run_forever();
}
Running It
cargo run
Then test with multiple requests:
for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:6188; done
Before health checks kick in, you might see some 502s (the broken backend). After a second or two, all requests should return 200 — the health checker has detected that 127.0.0.1:343 is down and removed it from rotation.
The Background Service Pattern
This pattern — a background task that manages shared state — is worth understanding because it shows up everywhere in Pingora:
┌─────────────────────────────────────┐
│          Arc<LoadBalancer>          │
│                                     │
│  ┌──────────────┐ ┌──────────────┐  │
│  │ Health Check │ │ Proxy (LB)   │  │
│  │ Background   │ │ reads via    │  │
│  │ Service      │ │ select()     │  │
│  │ writes via   │ │              │  │
│  │ health_check │ │              │  │
│  └──────────────┘ └──────────────┘  │
│       task() returns the Arc        │
└─────────────────────────────────────┘
The background_service() function takes ownership of the state, runs it in the background, and gives you back an Arc via .task(). Your proxy uses that Arc. This is Pingora’s way of sharing state between services without mutex hell.
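The shape of this pattern can be imitated with plain threads. The sketch below is an analogy under stated assumptions, not Pingora's code: Checker, Background, and background_service_sketch are hypothetical names, and a bounded loop on an OS thread stands in for Pingora's long-running service.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical state owned by the background task.
pub struct Checker {
    pub runs: AtomicUsize,
}

impl Checker {
    /// Stand-in for one round of health checking.
    fn check_once(&self) {
        self.runs.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical handle: keeps the shared state and the worker thread.
pub struct Background {
    task: Arc<Checker>,
    handle: JoinHandle<()>,
}

impl Background {
    /// Returns another handle to the same state the worker updates.
    pub fn task(&self) -> Arc<Checker> {
        Arc::clone(&self.task)
    }

    /// Waits for the worker to finish its rounds.
    pub fn join(self) {
        self.handle.join().expect("background thread panicked");
    }
}

/// Takes ownership of `state`, runs `rounds` checks in the background,
/// and hands back a handle from which the shared Arc can be obtained.
pub fn background_service_sketch(state: Checker, rounds: usize, every: Duration) -> Background {
    let task = Arc::new(state);
    let worker = Arc::clone(&task);
    let handle = thread::spawn(move || {
        for _ in 0..rounds {
            worker.check_once();
            thread::sleep(every);
        }
    });
    Background { task, handle }
}
```

The caller never touches the state directly after handing it over. It only ever sees it through the Arc, which is the same discipline the LB struct follows.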
Other Selection Algorithms
Round-robin is the simplest algorithm. Pingora also provides:
| Algorithm | When to Use | How It Selects |
|---|---|---|
| RoundRobin | Equal backends, no session affinity | Rotate through in order |
| Consistent (consistent hashing) | Session affinity needed (sticky sessions) | Hash the key to a backend |
| Random | Simple distribution, no order guarantees | Pick randomly |
For consistent hashing, the key parameter in select(key, max) matters — it determines which backend a given key maps to. Same key always maps to the same backend (until backends are added or removed).
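To see why the key matters, here is a small hash-ring sketch using only the standard library. It is a generic illustration of consistent hashing, not the algorithm Pingora ships: HashRing, hash_of, and the replicas parameter are hypothetical names chosen for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hashes any hashable value to a position on the ring.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical hash ring: each backend owns several points on a circle.
pub struct HashRing {
    points: BTreeMap<u64, String>,
}

impl HashRing {
    /// Places `replicas` points per backend on the ring.
    pub fn new(backends: &[&str], replicas: u32) -> Self {
        let mut points = BTreeMap::new();
        for backend in backends {
            for replica in 0..replicas {
                points.insert(hash_of(&(*backend, replica)), (*backend).to_string());
            }
        }
        Self { points }
    }

    /// Maps a key to the first point at or after its hash, wrapping around.
    pub fn select(&self, key: &[u8]) -> Option<&str> {
        let position = hash_of(&key);
        self.points
            .range(position..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, backend)| backend.as_str())
    }
}
```

Because a key is owned by the next point on the ring, removing some other backend's points leaves that key's mapping untouched. Only keys that belonged to the removed backend move.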
What’s Next
This proxy can balance across multiple backends and skip unhealthy ones. But every request gets the same treatment — no matter who’s asking or what they’re asking for. In Part 3: Filters and Middleware, we’ll intercept requests, modify headers, and add custom logic that runs before and after each request.