Part 6: HTTP Caching — Making Your Proxy Remember So Your Backends Don’t Have To

Your proxy sits between clients and backends. Every request goes upstream, every response comes back. That’s fine when your backends are fast and your traffic is low. But at scale, a large share of requests ask for the same content: the same images, the same API responses, the same static assets. Why fetch them from the upstream every time?

Caching is the answer, and it’s also the problem. HTTP caching is deceptively complex. It’s not “store the response, serve it again.” You need to decide what to cache, how long to cache it, who gets which version, and when to throw it away. HTTP caching has an RFC of its own (RFC 9111, building on decades of earlier RFCs). Pingora’s caching layer provides the machinery for these semantics; your job is to configure and customize it.

Let’s start with the concepts, then wire up a real cache.

Why Cache at the Proxy?

You could cache at the application layer — your backend could use Redis, or serve cached responses from memory. So why cache at the proxy?

Speed. The proxy usually sits closer to the client than the backend does. A cache hit at the proxy means zero network hops to the backend. In a globally distributed setup, that can be the difference between a response in roughly 10ms and one in 200ms.

Load reduction. Every cached response is a request your backend doesn’t see. During traffic spikes, the cache absorbs the load. Your backend stays healthy because it only handles cache misses.

Independence. The proxy cache works regardless of your backend technology. Whether you’re proxying to a Go service, a Python app, or a legacy Java monolith, the caching logic is the same. You don’t need every backend to implement its own caching.

Correctness. The proxy understands HTTP caching semantics — Cache-Control, Vary, ETag, conditional requests. It can serve stale content while revalidating in the background (stale-while-revalidate). These are the same semantics browsers use, enforced consistently at the proxy layer.

How HTTP Caching Works (The Short Version)

Before we get to Pingora’s API, let’s make sure we’re on the same page about HTTP caching.

Cache-Control

The Cache-Control header is the primary mechanism. The backend sends it with responses:

Cache-Control: max-age=300, public

This means: “cache this for 300 seconds, and any cache (browser, CDN, proxy) can store it.”

Common directives:

Directive	Meaning
`max-age=N`	Cache is fresh for N seconds
`s-maxage=N`	Like max-age, but only for shared caches (like your proxy), where it overrides max-age
`public`	Any cache can store this
`private`	Only a private cache such as the browser may store this; shared caches like your proxy must not
`no-cache`	Cache can store, but must revalidate before using
`no-store`	Don’t cache at all
`stale-while-revalidate=N`	Serve stale content for up to N seconds while fetching fresh content in the background

Vary

The Vary header tells the cache which request headers affect the response:

Vary: Accept-Encoding

This means: “I serve different content based on Accept-Encoding. Store separate cache entries for gzip vs. brotli vs. uncompressed.”

If Vary is missing, the cache assumes the response is the same regardless of request headers. Getting Vary wrong is a common source of cache bugs — serving compressed content to clients that can’t handle it, or serving the wrong language variant.
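Here is what “separate cache entries” means in practice. The sketch below, in plain Rust, turns a Vary header into an extra key component. The function names are hypothetical, and this is not how pingora-cache computes its variance. It also assumes header values are compared verbatim, with no normalization of whitespace or quality values.

```rust
use std::collections::HashMap;

/// Hypothetical: builds a "variance" suffix for a cache key from the
/// response's Vary header and the request's headers. Header names are
/// case-insensitive, so both sides are lowercased.
fn variance_key(vary: &str, request_headers: &HashMap<String, String>) -> String {
    let mut names: Vec<String> = vary
        .split(',')
        .map(|name| name.trim().to_ascii_lowercase())
        .filter(|name| !name.is_empty())
        .collect();
    // Sort so "Vary: A, B" and "Vary: B, A" produce the same key.
    names.sort();
    names.dedup();
    names
        .iter()
        .map(|name| {
            // A missing header is recorded as empty, which is itself a variant.
            let value = request_headers.get(name).map(String::as_str).unwrap_or("");
            format!("{name}={value}")
        })
        .collect::<Vec<_>>()
        .join("&")
}

/// Helper that stores request headers under lowercased names.
fn headers(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_ascii_lowercase(), v.to_string()))
        .collect()
}

fn main() {
    let gzip = headers(&[("Accept-Encoding", "gzip")]);
    let br = headers(&[("Accept-Encoding", "br")]);
    println!("{}", variance_key("Accept-Encoding", &gzip)); // accept-encoding=gzip
    println!("{}", variance_key("Accept-Encoding", &br)); // accept-encoding=br
}
```

Sorting and lowercasing the header names has a useful effect. `Vary: Accept-Encoding, Accept-Language` and `Vary: accept-language, accept-encoding` produce the same key, so equivalent responses are not stored twice.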

Conditional Requests

When a cached response is stale (past its max-age), the proxy doesn’t have to throw it away. It can send a conditional request to check whether the content has changed:

If-None-Match: "abc123" (the ETag the backend sent with the cached response): “I have version abc123. Only send the full response if it’s changed.” If the backend responds with 304 Not Modified, the proxy refreshes the cache metadata and serves the cached body.
If-Modified-Since: Wed, 21 Oct 2015 07:28:00 GMT (the cached response’s Last-Modified value): same idea, but based on the last modification time.

Conditional requests save bandwidth — the backend doesn’t need to send the full response body if nothing changed.
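Conditional requests can be evaluated in two places. The backend evaluates one when the proxy revalidates, and the proxy evaluates one when a client sends If-None-Match for a cached entry. Both make the same comparison. The sketch below shows the If-None-Match check. It assumes entity tags contain no commas, and the function names are hypothetical. If-None-Match uses weak comparison, so a `W/` prefix is ignored.

```rust
/// Strips an optional weak-validator prefix (W/) from an entity tag.
fn opaque_tag(tag: &str) -> &str {
    let tag = tag.trim();
    tag.strip_prefix("W/").unwrap_or(tag)
}

/// Returns true when an If-None-Match header matches the cached ETag,
/// meaning the responder can send 304 Not Modified instead of the body.
fn if_none_match_matches(if_none_match: &str, cached_etag: &str) -> bool {
    let header = if_none_match.trim();
    if header == "*" {
        // "*" matches any current representation.
        return true;
    }
    let cached = opaque_tag(cached_etag);
    // Naive split: assumes no commas inside the quoted tags.
    header.split(',').any(|candidate| opaque_tag(candidate) == cached)
}

fn main() {
    let cached_etag = "\"abc123\"";
    for header in ["\"abc123\"", "W/\"abc123\"", "\"old\", \"abc123\"", "\"zzz\""] {
        let status = if if_none_match_matches(header, cached_etag) { 304 } else { 200 };
        println!("If-None-Match: {header} -> {status}");
    }
}
```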

The Three Hooks

Pingora’s cache integrates with the same ProxyHttp trait you’ve been using since Part 3. There are three methods you implement to make caching work:

Request arrives
     │
     ▼
┌──────────────────────┐
│  request_cache_filter │  ← Should this request use the cache?
│  (enable caching)     │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  cache_key_callback   │  ← What's the cache key for this request?
│  (generate the key)   │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Pingora looks up    │  ← Internal: check storage for this key
│  the key in storage  │
└──────────┬───────────┘
           │
     ┌─────┴─────┐
     │            │
 Cache Hit    Cache Miss
     │            │
     ▼            ▼
┌──────────┐ ┌────────────────────────┐
│ Serve it │ │  response_cache_filter │ ← Is the upstream response cacheable?
│ from     │ │  (decide what to store)│
│ cache    │ └──────────┬─────────────┘
└──────────┘            │
                        ▼
                  ┌──────────────┐
                  │  Store it    │
                  │  in cache    │
                  └──────────────┘
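The same control flow fits in a toy, synchronous model written in plain Rust. The `CachePolicy` trait and its methods are hypothetical stand-ins for the three ProxyHttp hooks. The real hooks receive a Session and work with Pingora’s own key and metadata types. This model also ignores expiry entirely.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the three ProxyHttp hooks (not Pingora's API).
trait CachePolicy {
    fn request_cacheable(&self, method: &str) -> bool; // ~ request_cache_filter
    fn cache_key(&self, path: &str) -> String; // ~ cache_key_callback
    fn response_cacheable(&self, status: u16) -> bool; // ~ response_cache_filter
}

struct Simple;

impl CachePolicy for Simple {
    fn request_cacheable(&self, method: &str) -> bool {
        method == "GET" || method == "HEAD"
    }
    fn cache_key(&self, path: &str) -> String {
        format!("tutorial:{path}")
    }
    fn response_cacheable(&self, status: u16) -> bool {
        status == 200
    }
}

/// Serves one request, returning the body and whether it was a cache hit.
fn serve<P: CachePolicy>(
    policy: &P,
    cache: &mut HashMap<String, String>,
    method: &str,
    path: &str,
    upstream: impl Fn(&str) -> (u16, String),
) -> (String, bool) {
    if !policy.request_cacheable(method) {
        return (upstream(path).1, false); // caching not enabled: straight upstream
    }
    let key = policy.cache_key(path);
    if let Some(body) = cache.get(&key) {
        return (body.clone(), true); // hit: the upstream is never contacted
    }
    let (status, body) = upstream(path); // miss: fetch from the upstream
    if policy.response_cacheable(status) {
        cache.insert(key, body.clone()); // store for the next request
    }
    (body, false)
}

fn main() {
    let mut cache = HashMap::new();
    let upstream = |path: &str| (200u16, format!("body of {path}"));
    let first = serve(&Simple, &mut cache, "GET", "/a", upstream);
    let second = serve(&Simple, &mut cache, "GET", "/a", upstream);
    println!("{first:?} then {second:?}"); // a miss, then a hit
}
```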

Each hook corresponds to a method on ProxyHttp. Let’s walk through them.

Step 1: Enable Caching with `request_cache_filter`

By default, caching is off. You have to opt in. request_cache_filter is where you decide: “should this request even use the cache?”

use pingora::cache::MemCache;
use pingora::prelude::*;
use pingora::proxy::{ProxyHttp, Session};

// This method goes inside `impl ProxyHttp for LB { ... }`,
// where `self.storage` is a `&'static MemCache`.
fn request_cache_filter(
    &self,
    session: &mut Session,
    _ctx: &mut Self::CTX,
) -> Result<()> {
    // Only cache GET and HEAD requests.
    // POST, PUT, DELETE are mutations; caching them is almost always wrong.
    let method = &session.req_header().method;
    if method != "GET" && method != "HEAD" {
        return Ok(());
    }

    // Enable the cache with our storage backend.
    session.cache.enable(
        self.storage,  // the Storage implementation
        None,          // eviction manager (entries live until process restart)
        None,          // predictor (every eligible request goes through cache logic)
        None,          // cache lock (concurrent misses fetch independently)
        None,          // option overrides
    );

    Ok(())
}

The key call is session.cache.enable(). It takes a storage backend and four optional parameters:

storage — Where cached data lives. We’re using MemCache, an in-memory hashtable. The pingora-cache crate docs mark it “for testing only,” but it’s the right tool for learning the API.
eviction — How to decide what to throw away when the cache is full. Without one, nothing gets evicted — entries live until the process restarts.
predictor — A hint about whether a request is likely cacheable. Without one, every eligible request goes through the full cache lookup.
cache_lock — What to do when multiple requests miss the cache for the same key. Without one, they all fetch independently. With one, the first request fetches and the rest wait for its result.

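The last argument, option overrides, adjusts the cache’s default options for this request; None keeps the defaults. For a learning setup, None for everything except storage is fine. Production deployments would configure eviction and cache locking.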

One thing to notice: storage is &'static (dyn Storage + Sync). The cache outlives any single request, so Pingora needs a reference that’s valid for the program’s lifetime. In our main(), we get this with Box::leak:

let storage: &'static MemCache = Box::leak(Box::new(MemCache::new()));

A common alternative is a LazyLock static, which avoids the leak entirely:

use pingora::cache::MemCache;
use std::sync::LazyLock;

static STORAGE: LazyLock<MemCache> = LazyLock::new(MemCache::new);

// Inside main(): the static derefs to a &'static MemCache.
let storage: &'static MemCache = &STORAGE;

LazyLock (in the standard library since Rust 1.80) creates the MemCache on first access and keeps it in the static for the rest of the program. So &STORAGE gives the same 'static lifetime as Box::leak without leaking a Box. Either approach works; Box::leak is simply the shortest route for a tutorial.

Why &'static? The cache storage must outlive every individual request. Pingora can’t hold a regular reference — it would need a lifetime parameter on every struct that touches the cache, which would propagate through the entire API. Box::leak gives us a static reference by intentionally “leaking” the allocation. The memory lives until the process exits, which is exactly how long we need it. This isn’t a memory leak in the traditional sense — it’s a deliberate, one-time allocation that we never intend to free.

Step 2: Generate the Cache Key with `cache_key_callback`

The cache key determines whether two requests map to the same cached response. This is the most security-sensitive part of caching. Get the key wrong and you can serve one user’s data to another. You can also let an attacker plant a response that other users then receive, which is cache poisoning.

One critical detail before we build the key: the origin has the final say on whether anything gets cached. If one.one.one.one sent Cache-Control: no-store, the response_cache_filter we write in Step 3 would refuse to store the response, however carefully the key is built. The contract is: the proxy decides how to store and retrieve; the origin decides whether storing is allowed. one.one.one.one works here because it explicitly sends Cache-Control: public, max-age=60.

A simple starting point: key on the request path. This means /path/a and /path/b are cached separately.

use pingora::cache::CacheKey;
use pingora::prelude::*;
use pingora::proxy::Session;

static CACHE_NAMESPACE: &[u8] = b"pingora-tutorial";

// This method goes inside `impl ProxyHttp for LB { ... }`.
fn cache_key_callback(
    &self,
    session: &Session,
    _ctx: &mut Self::CTX,
) -> Result<CacheKey> {
    let uri = session.req_header().uri.path().as_bytes();
    Ok(CacheKey::new(
        CACHE_NAMESPACE,     // namespace: distinguishes this proxy's cache
        uri,                 // primary: the request path
        "tutorial-cache",    // user_tag: for logging and debugging
    ))
}

The CacheKey has three parts:

namespace — Separates cache entries from different proxies sharing the same storage. Like a table prefix in a shared database.
primary — The main identifier. Usually the request URI. Both namespace and primary are hashed together to form the lookup key.
user_tag — A human-readable label for debugging. Not part of the hash, but appears in logs and tracing.

Why is the key security-sensitive? Consider two proxies that share the same MemCache and use the same namespace. Identical paths map to the same entry, so each proxy can serve, or overwrite, the other’s responses. Or suppose you include only the host but not the path: every path on the same host would share one cache entry. The key has to capture everything that makes one response different from another.

Our example uses .uri.path(), which excludes query strings. A request to /api/users?page=1 and one to /api/users?page=2 would share the same cache entry, so the second would receive the first page’s response. For an API with query parameters, use .uri.path_and_query() instead to include the ?page=2 portion in the key. We use .path() here because one.one.one.one doesn’t vary responses by query string, but a real API almost certainly would. Our key also leaves out the Host header. That’s fine with a single upstream, but a proxy fronting several sites would need the host in the key too.
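The difference between the two keying strategies shows up with plain strings, so the sketch below needs no HTTP crate. Both helpers are hypothetical. In the proxy, the host would come from the request’s Host header and the path and query from uri.path_and_query(). The result would then become the primary part passed to CacheKey::new.

```rust
/// Hypothetical: keeps only the path, like `uri.path()` (query dropped).
fn key_from_path(path_and_query: &str) -> String {
    // `split` always yields at least one item, so `next()` is never None here.
    path_and_query.split('?').next().unwrap_or("").to_string()
}

/// Hypothetical: host plus path and query. Host names are case-insensitive,
/// so the host is lowercased; the path and query are kept exactly.
fn key_from_host_path_query(host: &str, path_and_query: &str) -> String {
    format!("{}{}", host.to_ascii_lowercase(), path_and_query)
}

fn main() {
    let page1 = "/api/users?page=1";
    let page2 = "/api/users?page=2";

    // Path-only keys collide: page 2 would get page 1's cached body.
    assert_eq!(key_from_path(page1), key_from_path(page2));

    // Including the query string keeps the pages apart...
    assert_ne!(
        key_from_host_path_query("api.example.com", page1),
        key_from_host_path_query("api.example.com", page2),
    );
    // ...and including the host separates identical paths on different sites.
    assert_ne!(
        key_from_host_path_query("a.example.com", "/index.html"),
        key_from_host_path_query("b.example.com", "/index.html"),
    );
    println!("{}", key_from_host_path_query("API.example.com", page2)); // api.example.com/api/users?page=2
}
```

Keying on the raw query string has one consequence: ?a=1&b=2 and ?b=2&a=1 become separate entries. Sorting query parameters before building the key would merge them. That only works if the backend really treats parameter order as insignificant.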

The default implementation of cache_key_callback panics — you must override it when caching is enabled. This is by design. There’s no safe default because the correct key depends on your application.

Step 3: Decide What’s Cacheable with `response_cache_filter`

After the upstream responds, Pingora asks: “should we store this?” The default answer is no — you have to opt in.

use pingora::cache::{CacheMeta, RespCacheable::*, NoCacheReason};
use pingora::prelude::*;
use pingora::proxy::Session;
use std::time::{Duration, SystemTime};

// This method goes inside `impl ProxyHttp for LB { ... }`.
fn response_cache_filter(
    &self,
    _session: &Session,
    resp: &pingora::http::ResponseHeader,
    _ctx: &mut Self::CTX,
) -> Result<pingora::cache::RespCacheable> {
    // Cache 200 responses for 60 seconds.
    // Skip everything else: 404s, 500s, redirects.
    if resp.status != 200 {
        return Ok(Uncacheable(NoCacheReason::Custom("non-200")));
    }

    // Respect Cache-Control from the upstream.
    if let Some(cc) = resp.headers.get("cache-control") {
        let cc_str = cc.to_str().unwrap_or("");
        if cc_str.contains("no-store") || cc_str.contains("private") {
            return Ok(Uncacheable(NoCacheReason::OriginNotCache));
        }
    }

    // Build the cache metadata.
    let now = SystemTime::now();
    let meta = CacheMeta::new(
        now + Duration::from_secs(60),  // fresh_until: when the entry becomes stale
        now,                             // created: when it was cached
        0,                               // stale-while-revalidate seconds
        0,                               // stale-if-error seconds
        resp.clone(),                    // the response header to cache
    );

    Ok(Cacheable(meta))
}

The return type is RespCacheable — either Cacheable(CacheMeta) or Uncacheable(NoCacheReason). The CacheMeta tells Pingora how long to keep the entry and when to revalidate.

Our implementation is conservative:

Only 200s get cached. A 404 might be worth caching briefly (to avoid hammering the backend for a known-missing resource), but it depends on the use case.
We check Cache-Control from the upstream. If the origin says no-store or private, we respect it. This is important — the proxy shouldn’t cache things the origin didn’t intend to be cached.
The TTL is a hardcoded 60 seconds, regardless of any max-age the origin sends. Short enough that stale data doesn’t linger, long enough to absorb traffic spikes.

This is a simplification. A production cache would also check Vary, handle stale-while-revalidate, respect s-maxage, and derive the TTL from the origin’s headers instead of hardcoding it. pingora-cache provides building blocks for this, such as its own Cache-Control parsing and the cache_vary_filter hook for Vary, so you rarely need to write all of it by hand. Our example builds CacheMeta manually because we’re demonstrating the API at a level where you can see every piece.
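Instead of a fixed 60 seconds, response_cache_filter could derive the TTL from the origin’s headers. The sketch below shows that decision on its own and assumes only Cache-Control matters, with no Expires or Age handling. shared_cache_ttl is hypothetical; in practice you would likely prefer pingora-cache’s own Cache-Control parsing. Unlike the contains() check above, it matches whole directive names case-insensitively. It also conservatively treats `private` as uncacheable even when it carries a field list.

```rust
use std::time::Duration;

/// Hypothetical: how long a shared cache (like our proxy) may treat a response
/// as fresh, based only on its Cache-Control value. None means "don't store".
fn shared_cache_ttl(cache_control: &str, default_ttl: Duration) -> Option<Duration> {
    let mut max_age: Option<u64> = None;
    let mut s_maxage: Option<u64> = None;
    let mut no_cache = false;
    for directive in cache_control.split(',') {
        // Directive names are case-insensitive; values may be quoted.
        let (name, value) = match directive.split_once('=') {
            Some((n, v)) => (n.trim().to_ascii_lowercase(), Some(v.trim().trim_matches('"'))),
            None => (directive.trim().to_ascii_lowercase(), None),
        };
        match (name.as_str(), value) {
            ("no-store" | "private", _) => return None,
            ("no-cache", _) => no_cache = true,
            ("max-age", Some(v)) => max_age = v.parse().ok(),
            ("s-maxage", Some(v)) => s_maxage = v.parse().ok(),
            _ => {} // other directives are ignored in this sketch
        }
    }
    if no_cache {
        // Storable, but stale immediately: every reuse needs revalidation.
        return Some(Duration::ZERO);
    }
    // s-maxage overrides max-age for shared caches; otherwise use our default.
    Some(s_maxage.or(max_age).map(Duration::from_secs).unwrap_or(default_ttl))
}

fn main() {
    let default = Duration::from_secs(60);
    for cc in ["public, max-age=300", "max-age=300, s-maxage=30", "No-Store", "public"] {
        println!("{cc:?} -> {:?}", shared_cache_ttl(cc, default));
    }
}
```

Inside response_cache_filter, a None would map to Uncacheable(NoCacheReason::OriginNotCache). A Some(ttl) would give a fresh_until of now + ttl for CacheMeta::new.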

The Full Picture

Here’s what the caching proxy looks like when you put it all together:

use async_trait::async_trait;
use pingora::cache::{CacheKey, CacheMeta, MemCache, NoCacheReason, RespCacheable::*};
use pingora::lb::{LoadBalancer, selection::RoundRobin};
use pingora::prelude::*;
use pingora::proxy::{ProxyHttp, Session};
use pingora::upstreams::peer::HttpPeer;
use std::sync::Arc;
use std::time::{Duration, SystemTime};

static CACHE_NAMESPACE: &[u8] = b"pingora-tutorial";

pub struct LB {
    upstreams: Arc<LoadBalancer<RoundRobin>>,
    storage: &'static MemCache,
}

#[async_trait]
impl ProxyHttp for LB {
    type CTX = ();
    fn new_ctx(&self) -> Self::CTX {}

    async fn upstream_peer(
        &self,
        _session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<Box<HttpPeer>> {
        let upstream = self.upstreams
            .select(b"", 256)
            .ok_or_else(|| Error::new_str("no healthy upstream available"))?;
        Ok(Box::new(HttpPeer::new(
            upstream, true, "one.one.one.one".to_string(),
        )))
    }

    fn request_cache_filter(
        &self,
        session: &mut Session,
        _ctx: &mut Self::CTX,
    ) -> Result<()> {
        let method = &session.req_header().method;
        if method != "GET" && method != "HEAD" {
            return Ok(());
        }
        session.cache.enable(self.storage, None, None, None, None);
        Ok(())
    }

    fn cache_key_callback(
        &self,
        session: &Session,
        _ctx: &mut Self::CTX,
    ) -> Result<CacheKey> {
        let uri = session.req_header().uri.path().as_bytes();
        Ok(CacheKey::new(CACHE_NAMESPACE, uri, "tutorial-cache"))
    }

    fn response_cache_filter(
        &self,
        _session: &Session,
        resp: &pingora::http::ResponseHeader,
        _ctx: &mut Self::CTX,
    ) -> Result<pingora::cache::RespCacheable> {
        if resp.status != 200 {
            return Ok(Uncacheable(NoCacheReason::Custom("non-200")));
        }
        if let Some(cc) = resp.headers.get("cache-control") {
            let cc_str = cc.to_str().unwrap_or("");
            if cc_str.contains("no-store") || cc_str.contains("private") {
                return Ok(Uncacheable(NoCacheReason::OriginNotCache));
            }
        }
        let now = SystemTime::now();
        let meta = CacheMeta::new(
            now + Duration::from_secs(60), now, 0, 0, resp.clone(),
        );
        Ok(Cacheable(meta))
    }
}

fn main() {
    let opt = Some(Opt::parse_args());
    let mut server = Server::new(opt).unwrap();
    server.bootstrap();

    let storage: &'static MemCache = Box::leak(Box::new(MemCache::new()));
    let upstreams = LoadBalancer::try_from_iter(["1.1.1.1:443", "1.0.0.1:443"]).unwrap();
    let lb = LB { upstreams: Arc::new(upstreams), storage };

    let mut service = http_proxy_service(&server.configuration, lb);
    service.add_tcp("0.0.0.0:6188");

    server.add_service(service);
    server.run_forever();
}

This is a working caching proxy. The first GET to any path goes upstream. The second serves from cache. After 60 seconds, the entry goes stale and the next request fetches fresh content.

Cache Strategies

The technical setup is one thing. The harder question is: what should you cache?

Cache Static Assets Aggressively

Images, CSS, JavaScript — these change infrequently. Cache them with a long max-age. Let the upstream set the headers, or override them in response_cache_filter:

Cache-Control: public, max-age=86400

Cache API Responses Carefully

API responses might be user-specific (private) or change frequently. Cache with short TTLs and revalidation:

Cache-Control: public, max-age=10, stale-while-revalidate=60

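This serves cached content as fresh for 10 seconds. For up to 60 more seconds after that, it serves the stale copy while fetching a fresh one in the background. Requests inside that window get a fast response from cache, and what they receive is at most 70 seconds old. To honor this in our proxy, response_cache_filter would need to pass the stale-while-revalidate value to CacheMeta::new instead of 0.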
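The timeline is plain arithmetic on the entry’s age. The sketch below assumes age is counted in whole seconds from when the response was stored and ignores any Age header from upstream. classify and Freshness are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Freshness {
    Fresh,                // serve from cache, no upstream traffic
    StaleWhileRevalidate, // serve from cache, refresh in the background
    MustFetch,            // too old: the request waits for the upstream
}

/// Classifies a cached entry by age, given max-age and
/// stale-while-revalidate (all in seconds).
fn classify(age: u64, max_age: u64, swr: u64) -> Freshness {
    if age < max_age {
        Freshness::Fresh
    } else if age < max_age + swr {
        Freshness::StaleWhileRevalidate
    } else {
        Freshness::MustFetch
    }
}

fn main() {
    // Cache-Control: public, max-age=10, stale-while-revalidate=60
    for age in [5, 10, 69, 70] {
        println!("age {age}s -> {:?}", classify(age, 10, 60));
    }
}
```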

Don’t Cache Authenticated Content

If the response depends on who’s asking (e.g., a user profile), either:

Set Cache-Control: private so the proxy doesn’t cache it
Vary on the Authorization header (but this means every unique token gets a separate cache entry — usually wrong)
Skip caching in request_cache_filter for authenticated requests
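The third option can be sketched as a plain predicate that request_cache_filter would call before session.cache.enable(). should_use_cache is a hypothetical helper. Treating Cookie as a credential is an assumption that fits most session-based sites. The check also lines up with RFC 9111: a shared cache must not reuse a response to a request carrying Authorization unless the response explicitly allows it, for example with public or s-maxage.

```rust
/// Hypothetical request-side check: should this request use the cache at all?
fn should_use_cache(method: &str, headers: &[(&str, &str)]) -> bool {
    // Only safe, idempotent reads are candidates.
    if method != "GET" && method != "HEAD" {
        return false;
    }
    // Requests carrying credentials likely get user-specific responses.
    let has_credentials = headers.iter().any(|(name, _)| {
        name.eq_ignore_ascii_case("authorization") || name.eq_ignore_ascii_case("cookie")
    });
    !has_credentials
}

fn main() {
    let anonymous = [("Accept", "text/html")];
    let logged_in = [("Accept", "text/html"), ("Authorization", "Bearer t0ken")];
    println!("{}", should_use_cache("GET", &anonymous)); // true
    println!("{}", should_use_cache("GET", &logged_in)); // false
    println!("{}", should_use_cache("POST", &anonymous)); // false
}
```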

Cache 404s Briefly

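Caching a 404 for a non-existent resource keeps repeated requests for it from hammering the backend. But cache it briefly, because the resource might be created later. (Our response_cache_filter rejects every non-200 status, so it would need to let 404s through first.)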

Cache-Control: public, max-age=30
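The strategies above can be folded into a single TTL table that response_cache_filter consults. override_ttl is hypothetical. The extension list, the /api/ prefix, and the TTL values are illustrative assumptions taken from the example headers in this section. In a real filter, the origin’s no-store or private should still win over any override.

```rust
use std::time::Duration;

/// Hypothetical override policy: long TTLs for static assets, short ones
/// for API responses and 404s, no caching for other statuses.
fn override_ttl(status: u16, path: &str) -> Option<Duration> {
    const STATIC_EXTENSIONS: [&str; 5] = ["css", "js", "png", "jpg", "svg"];
    // Look only at the last path segment, so "/v1.2/users" has no extension.
    let file_name = path.rsplit('/').next().unwrap_or(path);
    let extension = file_name.rsplit_once('.').map(|(_, ext)| ext);
    match status {
        200 if extension.is_some_and(|ext| STATIC_EXTENSIONS.contains(&ext)) => {
            Some(Duration::from_secs(86_400)) // static assets: one day
        }
        200 if path.starts_with("/api/") => Some(Duration::from_secs(10)), // API: short
        200 => Some(Duration::from_secs(60)), // everything else: the tutorial default
        404 => Some(Duration::from_secs(30)), // brief: the resource might be created
        _ => None,                            // other statuses stay uncacheable
    }
}

fn main() {
    let cases = [(200, "/assets/app.js"), (200, "/api/users"), (404, "/missing"), (500, "/")];
    for (status, path) in cases {
        println!("{status} {path} -> {:?}", override_ttl(status, path));
    }
}
```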

Cacheability Checklist

Before you wire in caching, run through this:

✅ The origin sends Cache-Control (not no-store or private)
✅ The request method is cacheable (GET or HEAD)
✅ The cache key uniquely identifies the response (if the upstream varies on headers, include them in the key)

Cache Invalidation

The hardest part of caching isn’t storing things — it’s knowing when to throw them away.

Time-based expiration. The simplest approach. Set a max-age, and the entry goes stale when it expires; the next request fetches or revalidates a fresh copy. Works well for content that changes predictably.

Purge. Explicitly remove a cached entry. Useful when you know content has changed (e.g., after a deploy). Pingora supports purge via the Storage::purge() method.

Revalidation. When a cached entry is stale, the proxy checks with the upstream before serving it. Conditional requests (ETag, If-Modified-Since) make this efficient — the upstream only sends the full response if the content actually changed.

Stale-while-revalidate. Serve stale content immediately while fetching fresh content in the background. The client gets a fast response (from cache), and the cache is updated for the next request. This is the best default for content that should be fresh but where a few seconds of staleness is acceptable.

What We’re Simplifying

Our cache works, but it’s a teaching implementation. A production cache would add several things we’re skipping.

Eviction. Without an eviction manager, cached entries live until the process restarts. Under load, MemCache grows without bound. A production cache would evict stale entries and enforce a size limit.
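An eviction manager’s job can be illustrated with a toy cache that holds at most `capacity` entries and drops the least recently used one when full. This is a sketch under simplifying assumptions, not pingora-cache’s eviction API. LruCache is hypothetical, capacity counts entries rather than bytes, and the O(n) reordering trades speed for brevity.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical size-bounded cache with least-recently-used eviction.
struct LruCache {
    capacity: usize,
    entries: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        LruCache { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &str) {
        // O(n) scan keeps the sketch short; real LRU caches avoid this.
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).expect("position is in bounds");
            self.order.push_back(k);
        }
    }

    fn get(&mut self, key: &str) -> Option<&String> {
        if self.entries.contains_key(key) {
            self.touch(key);
        }
        self.entries.get(key)
    }

    fn put(&mut self, key: &str, value: String) {
        if self.entries.insert(key.to_string(), value).is_some() {
            self.touch(key); // existing key: just refresh its position
            return;
        }
        self.order.push_back(key.to_string());
        if self.entries.len() > self.capacity {
            // Over capacity: evict the least recently used entry.
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put("/a", "A".into());
    cache.put("/b", "B".into());
    cache.get("/a"); // /a is now the most recently used
    cache.put("/c", "C".into()); // over capacity: /b is evicted
    println!(
        "{:?} {:?} {:?}",
        cache.get("/a").cloned(),
        cache.get("/b").cloned(),
        cache.get("/c").cloned()
    );
}
```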

Cache locking. Without a cache lock, multiple concurrent requests for the same uncached resource all fetch from upstream independently. A cache lock makes the first request the “writer” and the rest wait for its result — avoiding the “thundering herd” problem.
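The heart of a cache lock is request coalescing: concurrent misses for the same key share one upstream fetch. The sketch below uses only the standard library and relies on OnceLock, which guarantees that only one initializer runs while other callers block until the value is ready. Coalescer is hypothetical, not Pingora’s cache lock. It never releases or expires a slot, and it has no timeout if the writer stalls; a real lock needs both.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;
use std::time::Duration;

/// Hypothetical coalescer: one slot per key. The first caller runs `fetch`;
/// concurrent callers for the same key wait for that result.
#[derive(Default)]
struct Coalescer {
    slots: Mutex<HashMap<String, Arc<OnceLock<String>>>>,
}

impl Coalescer {
    fn get_or_fetch(&self, key: &str, fetch: impl FnOnce() -> String) -> String {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            Arc::clone(slots.entry(key.to_string()).or_default())
        };
        // OnceLock runs at most one initializer; other threads block on it.
        slot.get_or_init(fetch).clone()
    }
}

fn main() {
    let coalescer = Arc::new(Coalescer::default());
    let upstream_calls = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let coalescer = Arc::clone(&coalescer);
            let calls = Arc::clone(&upstream_calls);
            thread::spawn(move || {
                coalescer.get_or_fetch("/hot", || {
                    calls.fetch_add(1, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(50)); // a slow origin
                    "response body".to_string()
                })
            })
        })
        .collect();

    for handle in handles {
        assert_eq!(handle.join().unwrap(), "response body");
    }
    println!("upstream calls: {}", upstream_calls.load(Ordering::SeqCst)); // 1
}
```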

Full Vary support. Our cache_key_callback uses the URI path. A production cache would also vary on headers like Accept-Encoding when the upstream sends a Vary header. Pingora handles this via the cache_vary_filter method.

Predictive caching. A cache predictor learns which request patterns are likely cacheable and short-circuits uncacheable requests before they hit the storage layer. This reduces latency for requests that would miss anyway.
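A predictor can be as simple as remembering which keys recently produced uncacheable responses. Predictor and its methods are hypothetical and do not mirror pingora-cache’s predictor interface. In this sketch, a key marked uncacheable stays that way until a cacheable response is recorded for it. A real predictor would also bound its memory and let old verdicts expire.

```rust
use std::collections::HashSet;

/// Hypothetical predictor: remembers keys whose responses were uncacheable.
#[derive(Default)]
struct Predictor {
    known_uncacheable: HashSet<String>,
}

impl Predictor {
    /// Called before the lookup: false means "skip the cache, go upstream".
    fn cacheable_prediction(&self, key: &str) -> bool {
        !self.known_uncacheable.contains(key)
    }

    /// Called after the response: record the outcome for next time.
    fn record(&mut self, key: &str, was_cacheable: bool) {
        if was_cacheable {
            self.known_uncacheable.remove(key);
        } else {
            self.known_uncacheable.insert(key.to_string());
        }
    }
}

fn main() {
    let mut predictor = Predictor::default();
    println!("{}", predictor.cacheable_prediction("/private/profile")); // true: no history yet
    predictor.record("/private/profile", false); // e.g. the origin said no-store
    println!("{}", predictor.cacheable_prediction("/private/profile")); // false: skip the lookup
}
```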

Conditional requests. We don’t implement cache_hit_filter, which lets you customize how cache hits are served — for example, checking conditional request headers like If-None-Match before serving from cache and returning 304 Not Modified. The default behavior serves cached responses directly, which is correct for our use case.

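These features are all available in pingora-cache. Our example opts out of them to keep the API visible. In a real deployment, eviction, cache locking, and prediction are configured through the parameters we passed as None in session.cache.enable(). Vary handling and hit customization come from the cache_vary_filter and cache_hit_filter hooks.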

Where to Learn More

pingora-cache source code — The crate is the authoritative reference. Start with lib.rs for the HttpCache state machine.
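RFC 9111 — “HTTP Caching.” The current spec for HTTP caching.
RFC 9211 — “The Cache-Status HTTP Response Header Field.” How caches report hits and misses in responses.
RFC 7234 — The previous HTTP caching spec, obsoleted by RFC 9111. Dense but definitive.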
Cloudflare’s caching documentation — Practical advice from the team that built Pingora.

What You’ve Built

You now have a working caching proxy:

Part	What You Added
1	A working reverse proxy
2	Load balancing and health checks
3	Request filtering, response modification, per-request state
4	TLS termination and certificate verification
5	Config files, daemonization, zero-downtime upgrades
6	HTTP caching — with real cache hits and misses

The framework is Pingora. The logic is yours. Go build something.
