Part 8: The Complete Proxy — Wiring Seven Parts Into One Service That Actually Works

Seven parts. Seven pieces. Each one works on its own. But a real proxy doesn’t run in pieces — it runs as one program, with load balancing and TLS and caching and rate limiting and filters, all wired together.

This part builds the complete proxy. No new concepts — the work of combining everything you’ve already learned into a single, coherent service.

What We’re Building

A production-grade reverse proxy with:

Load balancing across multiple backends with health checks (Part 2)
API key authentication as a request filter (Part 3)
Response header cleanup to hide internal details (Part 3)
Rate limiting per client IP (Part 7)
TLS on the downstream connection (Part 4)
Configuration from a YAML file (Part 5)
Caching for GET requests (Part 6) — omitted from this capstone; see note below

Each piece you’ve already built. The skill here is composition — making them work together without tangling the logic.

Why no caching? Part 6’s cache uses Pingora’s built-in session.cache API, which integrates differently from a simple struct-based approach. The cache hooks (request_cache_filter, cache_key_callback, response_cache_filter) are methods on the ProxyHttp trait, not custom structs. Adding Part 6’s caching on top of this capstone is a good exercise — start from Part 6’s working code and add the other features on top.

The Architecture

Here’s how the pieces fit into Pingora’s request lifecycle:

  Client (HTTPS) → :6188
         │
         ▼
  TLS termination              ← Part 4
         │
         ▼
  request_filter()
    ├─ Rate limit check        ← Part 7 (reject → 429)
    └─ API key authentication  ← Part 3 (reject → 401)
         │
         ▼
  upstream_peer()
    └─ Round-robin selection   ← Part 2
         │
         ▼
  upstream_request_filter()
    └─ Add Host header         ← Part 1
         │
         ▼
  Connect to upstream (TLS)    ← Part 4
         │
         ▼
  upstream_response_filter()
    └─ (cache goes here)       ← Part 6 (not wired in this capstone)
         │
         ▼
  response_filter()
    └─ Replace Server header,  ← Part 3
       strip X-Powered-By
         │
         ▼
  logging()
    └─ Record rate-limited
       and authenticated
       request stats           ← Part 7
         │
         ▼
  Client gets response

The key insight: each concern lives in exactly one phase. Rate limiting happens in request_filter because you want to reject before connecting upstream. Header cleanup happens in response_filter because you want to strip headers the client shouldn’t see. Caching would happen in upstream_response_filter because you want to cache before the cleanup strips the headers. (We omit caching here — see the note above.) The phase determines the order. The method determines the job.

Per-Request State

In Part 3, we introduced CTX — per-request state shared between phases. Now we need it for real:

#![allow(unused)]
fn main() {
pub struct ProxyCtx {
    // Rate limiting (Part 7)
    rate_limited: bool,
    client_key: String,

    // Authentication (Part 3)
    authenticated: bool,
}
}

Each field is set in one phase and read in a later phase. rate_limited is set in request_filter and read in logging. authenticated is set in request_filter — a real proxy would use it in upstream_request_filter to add user-specific headers, but our example doesn’t need that.

This is what CTX is for — passing data between phases without global mutable state.

The Proxy Struct

Our proxy holds three pieces of shared state:

#![allow(unused)]
fn main() {
pub struct FullProxy {
    upstreams: Arc<LoadBalancer<RoundRobin>>,  // Part 2
    limiter: Arc<RateLimiterRegistry>,          // Part 7
}
}

Both are wrapped in Arc because they’re shared between the request handler and background services (health checks for the load balancer).

The Request Filter: Two Gates, One Method

Both rate limiting and authentication are gatekeepers — they decide whether a request should proceed. They both belong in request_filter, and the order matters:

#![allow(unused)]
fn main() {
async fn request_filter(
    &self,
    session: &mut Session,
    ctx: &mut Self::CTX,
) -> Result<bool> {
    // Gate 1: Rate limiting (cheaper check — no DB lookup)
    let client_addr = session.client_addr();
    let key = match client_addr {
        Some(addr) => addr.to_string(),
        None => "unknown".to_string(),
    };
    ctx.client_key = key.clone();

    if !self.limiter.is_allowed(&key) {
        ctx.rate_limited = true;
        let _ = session.respond_error(429).await;
        return Ok(true);
    }

    // Gate 2: API key authentication (more expensive — header parsing)
    let has_valid_key = session
        .req_header()
        .headers
        .get("X-API-Key")
        .map(|v| v.as_bytes())
        == Some(b"secret123");

    if !has_valid_key {
        let _ = session.respond_error(401).await;
        return Ok(true);
    }

    ctx.authenticated = true;
    Ok(false)
}
}

Rate limiting comes first because it’s cheaper — a hashmap lookup versus header parsing. More importantly, if a client is over their rate limit, there’s no point checking their API key. Rejecting early saves work on every subsequent phase.

This ordering — cheaper rejection first — is a general principle. Put your fastest, most common rejections at the top.

The Complete Code

use async_trait::async_trait;
use pingora::lb::{LoadBalancer, selection::RoundRobin, health_check::TcpHealthCheck};
use pingora::prelude::*;
use pingora::proxy::{ProxyHttp, Session};
use pingora::upstreams::peer::HttpPeer;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Instant;

// ─── Per-Request Context ────────────────────────────────────────────

pub struct ProxyCtx {
    rate_limited: bool,
    client_key: String,
    authenticated: bool,
}

// ─── Token Bucket Rate Limiter (Part 7) ─────────────────────────────

struct TokenBucket {
    max: usize,
    rate: usize,
    tokens: usize,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate: usize, burst: usize) -> Self {
        TokenBucket {
            max: burst,
            rate,
            tokens: burst,
            last_refill: Instant::now(),
        }
    }

    fn try_consume(&mut self) -> bool {
        self.refill();
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }

    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill);
        let tokens_to_add = (elapsed.as_secs_f64() * self.rate as f64) as usize;
        if tokens_to_add > 0 {
            self.tokens = (self.tokens + tokens_to_add).min(self.max);
            self.last_refill = now;
        }
    }
}

struct RateLimiterRegistry {
    buckets: Mutex<HashMap<String, TokenBucket>>,
    rate: usize,
    burst: usize,
}

impl RateLimiterRegistry {
    fn new(rate: usize, burst: usize) -> Self {
        RateLimiterRegistry {
            buckets: Mutex::new(HashMap::new()),
            rate,
            burst,
        }
    }

    fn is_allowed(&self, key: &str) -> bool {
        let mut buckets = self.buckets.lock().unwrap();
        let bucket = buckets
            .entry(key.to_string())
            .or_insert_with(|| TokenBucket::new(self.rate, self.burst));
        bucket.try_consume()
    }
}

// ─── The Proxy ──────────────────────────────────────────────────────

pub struct FullProxy {
    upstreams: Arc<LoadBalancer<RoundRobin>>,
    limiter: Arc<RateLimiterRegistry>,
}

#[async_trait]
impl ProxyHttp for FullProxy {
    type CTX = ProxyCtx;
    fn new_ctx(&self) -> Self::CTX {
        ProxyCtx {
            rate_limited: false,
            client_key: String::new(),
            authenticated: false,
        }
    }

    // Gate 1: Rate limiting (cheap, do first)
    // Gate 2: API key authentication
    async fn request_filter(
        &self,
        session: &mut Session,
        ctx: &mut Self::CTX,
    ) -> Result<bool> {
        // ── Rate limiting ──
        let client_addr = session.client_addr();
        let key = match client_addr {
            Some(addr) => addr.to_string(),
            None => "unknown".to_string(),
        };
        ctx.client_key = key.clone();

        if !self.limiter.is_allowed(&key) {
            ctx.rate_limited = true;
            let _ = session.respond_error(429).await;
            return Ok(true);
        }

        // ── Authentication ──
        let has_valid_key = session
            .req_header()
            .headers
            .get("X-API-Key")
            .map(|v| v.as_bytes())
            == Some(b"secret123");

        if !has_valid_key {
            let _ = session.respond_error(401).await;
            return Ok(true);
        }

        ctx.authenticated = true;
        Ok(false)
    }

    // Round-robin across healthy backends
    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        let upstream = self.upstreams
            .select(b"", 256)
            .ok_or_else(|| Error::new_str("no healthy upstream available"))?;

        let peer = Box::new(HttpPeer::new(
            upstream,
            true,
            "one.one.one.one".to_string(),
        ));
        Ok(peer)
    }

    // Add required headers before sending upstream
    async fn upstream_request_filter(
        &self,
        _session: &mut Session,
        upstream_request: &mut pingora::http::RequestHeader,
        _ctx: &mut Self::CTX,
    ) -> Result<()> {
        upstream_request.insert_header("Host", "one.one.one.one")?;
        Ok(())
    }

    // Replace the Server header and strip internal headers
    // before sending to the client. Same pattern as Part 3.
    async fn response_filter(
        &self,
        _session: &mut Session,
        upstream_response: &mut pingora::http::ResponseHeader,
        _ctx: &mut Self::CTX,
    ) -> Result<()> {
        // Remove headers that expose internal infrastructure
        upstream_response.remove_header("X-Powered-By");
        // Replace the Server header to identify our proxy
        upstream_response.insert_header("Server", "Gateway")?;
        upstream_response.insert_header("X-Proxy", "pingora-tutorial")?;
        Ok(())
    }

    // Log rate-limited requests
    async fn logging(
        &self,
        session: &mut Session,
        _e: Option<&pingora::Error>,
        ctx: &mut Self::CTX,
    ) {
        if ctx.rate_limited {
            eprintln!(
                "[rate-limited] client={} path={}",
                ctx.client_key,
                session.req_header().uri.path()
            );
        }
    }
}

fn main() {
    let opt = Some(Opt::parse_args());
    let mut server = Server::new(opt).unwrap();
    server.bootstrap();

    // ── Load balancer with health checks (Part 2) ──
    let mut upstreams = LoadBalancer::try_from_iter([
        "1.1.1.1:443",
        "1.0.0.1:443",
    ]).unwrap();

    let hc = TcpHealthCheck::new();
    upstreams.set_health_check(hc);
    upstreams.health_check_frequency = Some(std::time::Duration::from_secs(1));

    let background = pingora::services::background::background_service(
        "health_check",
        upstreams,
    );
    let upstreams = background.task();

    // ── Rate limiter: 10 req/s per client, burst of 20 (Part 7) ──
    let limiter = Arc::new(RateLimiterRegistry::new(10, 20));

    let proxy = FullProxy {
        upstreams,
        limiter,
    };

    let mut service = http_proxy_service(&server.configuration, proxy);
    service.add_tcp("0.0.0.0:6188");

    server.add_service(background);
    server.add_service(service);

    println!("Full proxy starting on :6188");
    println!("  - Load balancing: round-robin across 2 backends");
    println!("  - Authentication: X-API-Key header required");
    println!("  - Rate limiting: 10 req/s per client, burst 20");
    println!("  - Response cleanup: internal headers stripped");
    println!();
    println!("Test: curl -H 'X-API-Key: secret123' http://127.0.0.1:6188 -sv");

    server.run_forever();
}

What Changed from the Individual Parts

The capstone code looks similar to each part’s standalone version, but there are a few differences worth noting:

request_filter now has two gates. In Part 3, authentication was the only gate. In Part 7, rate limiting was the only gate. Now they’re both in request_filter, with rate limiting first (cheaper check). This is the composition pattern: when multiple concerns share a phase, order them by cost — cheapest rejection first.

CTX carries data for multiple concerns. In Part 3, CTX held authenticated. In Part 7, it held rate_limited and client_key. Now it holds all three. The CTX struct is the communication bus between phases — each phase sets its fields, and logging reads them all.

logging reports on all concerns. Each part’s logging was specific to that part. The capstone’s logging reports on rate limits in one place. This is where observability comes together — a single log line can tell you whether a request was rate-limited or proxied successfully.

The main() function wires everything together. Each part’s main() was simple. The capstone’s main() creates the load balancer, attaches health checks, creates the rate limiter, and passes them to FullProxy. This is where the architecture becomes visible — main() is the composition root.

What We’re Simplifying

No TLS in this code. Adding TLS requires certificate files. The Part 4 code works as-is — you’d add service.add_tls_with_settings() in main() and configure TlsSettings on the HttpPeer. The pattern is identical to Part 4; we’re omitting it here so the code runs without setup.

Custom settings are hardcoded. The capstone uses Opt::parse_args() (Part 5), so Pingora’s built-in config file and CLI flags work — you can pass -c conf.yaml or -d and they’ll take effect. But our custom values (upstream backends, rate limiter thresholds) are hardcoded in main() rather than read from the config. To make them configurable, you’d read the raw config from server.configuration and parse your custom fields.

No caching. Part 6’s cache uses Pingora’s built-in session.cache API, which integrates through trait methods (request_cache_filter, cache_key_callback, response_cache_filter) rather than a custom struct. Adding it to this capstone requires a different integration pattern than the other features. Start from Part 6’s working code and add the other features on top — it’s a good exercise.

No graceful restart. Part 5’s zero-downtime upgrade requires the upgrade feature flag and a running process to send the signal to. The capstone uses the standard run_forever() loop.

What Comes Next

This proxy is production-capable for many use cases. If you need more, here’s where to look:

Need	Approach
Higher throughput	More worker threads (`-t` flag), connection tuning
Distributed rate limiting	Redis-backed counters, or `pingora-limits` with aggregation
Request body filtering	`request_body_filter()` — read the body before forwarding
Websocket proxying	Pingora handles it; your `upstream_peer` selects the backend
HTTP/2 upstream	Set `HttpPeer::options.http_version` to `Http2`
Custom metrics	`logging()` phase → Prometheus or StatsD
Plugin system	Dynamic `ProxyHttp` dispatch based on route rules

The framework is Pingora. The logic is yours. You’ve seen every major piece. Go build something real.

Keyboard shortcuts