feat: L2 Cache with Valkey Client Side Hash Ring #4033
Conversation
a47f35e to 730af75
Signed-off-by: Larry D Almeida <hello@larrydalmeida.com>
feat: wire ValkeyStorage into NewCacheFilter (nil = in-memory only)
Signed-off-by: Larry D Almeida <hello@larrydalmeida.com>
feat: wire Valkey ring into cache() filter when swarm Valkey is configured
Signed-off-by: Larry D Almeida <hello@larrydalmeida.com>
7047126 to 573ebdf
Replace the single valkey_fallback counter with three distinct counters for better observability:
- valkey_miss: clean cache miss (key absent in Valkey)
- valkey_get_fallback: Valkey error on Get; L1 consulted instead
- valkey_set_fallback: Valkey error on Set; L1 written instead

Inject metrics.Metrics into ValkeyStorage via NewValkeyStorage so tests can assert counter values without relying on the global metrics.Default singleton.

Introduce a valkeyClient interface (Get/SetWithExpire/Expire) so unit tests can use an in-memory stub instead of a live Valkey/Docker connection. Two new tests, RecordsValkeyMiss and SplitFallbackCounters, exercise the counter logic with stubs.

Signed-off-by: Larry D Almeida <hello@larrydalmeida.com>
fix(cache): add compile-time interface guards; assert valkey_get_fallback fires on fallback
Signed-off-by: Larry D Almeida <hello@larrydalmeida.com>
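A minimal sketch of how a stubbed client and injected counters could make the split counters testable. The method signatures on valkeyClient, the errNotFound sentinel, the counters type, and recordGet are all assumptions for illustration; only the interface name, its three method names, and the counter names come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinels: the real client may signal a clean miss differently.
var (
	errNotFound = errors.New("key not found")
	errDown     = errors.New("connection refused")
)

// valkeyClient lists the three methods named in the commit message.
// The signatures are assumed, not taken from the PR.
type valkeyClient interface {
	Get(key string) ([]byte, error)
	SetWithExpire(key string, val []byte, ttl time.Duration) error
	Expire(key string, ttl time.Duration) error
}

// counters is a hypothetical stand-in for an injected metrics.Metrics.
type counters map[string]int

func (c counters) IncCounter(name string) { c[name]++ }

// stubClient is an in-memory test double; err makes every call fail.
type stubClient struct {
	data map[string][]byte
	err  error
}

// Compile-time guard: the build fails if stubClient stops satisfying valkeyClient.
var _ valkeyClient = (*stubClient)(nil)

func (s *stubClient) Get(key string) ([]byte, error) {
	if s.err != nil {
		return nil, s.err
	}
	v, ok := s.data[key]
	if !ok {
		return nil, errNotFound
	}
	return v, nil
}

func (s *stubClient) SetWithExpire(key string, val []byte, _ time.Duration) error {
	if s.err != nil {
		return s.err
	}
	s.data[key] = val
	return nil
}

func (s *stubClient) Expire(string, time.Duration) error { return s.err }

// recordGet classifies a Get outcome: a clean miss and an error bump different counters.
func recordGet(c valkeyClient, m counters, key string) ([]byte, bool) {
	v, err := c.Get(key)
	switch {
	case err == nil:
		return v, true
	case errors.Is(err, errNotFound):
		m.IncCounter("valkey_miss")
	default:
		m.IncCounter("valkey_get_fallback")
	}
	return nil, false
}

func main() {
	m := counters{}
	recordGet(&stubClient{data: map[string][]byte{}}, m, "k") // clean miss
	recordGet(&stubClient{err: errDown}, m, "k")              // Valkey error
	fmt.Println(m["valkey_miss"], m["valkey_get_fallback"])   // prints "1 1"
}
```

Because the counters are passed in rather than read from a global, each test can own its own instance and assert exact values.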
Signed-off-by: Larry D Almeida <hello@larrydalmeida.com>
55ba444 to c73eb72
// Shallow copy so NewValkeyRingClient can mutate opt.Addrs without
// racing against the ratelimit registry's copy of the same pointer.
cacheValkeyOpts := *valkeyOptions
cacheValkeyRing, err = skpnet.NewValkeyRingClient(&cacheValkeyOpts)
Why not pass the same client?
I know you could not have known this, but I want to move some of this code around and delete some of it later, so in the end we can refactor and make this nice. :)
Please merge larry-dalmeida#2 if you agree that this makes sense.
…we can reuse the valkey ring client in the cache filter creation. Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
…ients refactor: ratelimit registry creation to pass ring clients
👍
👍
That actually implies that the L1 cache will be empty if Set was successful and only Get fails.
Good point, I wondered about this and forgot to ask the question.
Intended behavior: L1 is a degraded-mode fallback only, not a hot cache layer. The happy path always reads from and writes to L2. L1 is only populated when a Valkey operation fails.

The CPU L1/L2 cache hierarchy works because:
1) L2 is on-chip with 5-30 ns latency. L1 saves nanoseconds, not milliseconds.
2) Cache coherence is handled at the hardware level (MESI protocol): when a core writes to L1, hardware broadcasts an invalidation to the other cores' L1 automatically.

This does not transfer here:
- L1 is an LRU per Skipper pod. There can be n pods, each with its own private LRU.
- There is no hardware coherence protocol.
- A Valkey round trip is ~0.5-1 ms over the local cluster network. Latency is higher than for L1, but the problem being solved is not sub-ms latency but cross-pod cache sharing.

Primary goal: cross-pod consistency
Without L2, every pod maintains an independent LRU. A cold miss on pod A fetches from origin; pod B's LRU is empty and fetches independently. Under load (e.g. a popular campaign going live), this produces an N-way thundering herd: one upstream fetch per pod, not one per cluster.
Valkey solves this because the consistent-hash ring maps a given cache key to the same shard regardless of which pod is making the request. A cold miss coalesced by pod A writes to Valkey shard S. Pod B's next request for the same key hits shard S directly, with no upstream fetch.

Why warming L1 on write undermines this goal
If L1 is warmed on a successful Valkey …
The tradeoff is: faster reads on the hot path vs. stale content served after invalidation, with non-trivial invalidation infrastructure.

Why Valkey-miss does not consult L1
A Valkey miss ( …
L1 is only consulted when Valkey returns an error, because in that case we have no authoritative answer. Serving a potentially stale L1 entry is preferable to a 5xx.

Considered alternatives
Write-through with TTL-bounded staleness
Warm L1 on every successful …
Not chosen because explicit …
Write-through + Valkey pub/sub invalidation
Warm L1, subscribe each pod to a Valkey pub/sub channel for invalidation events. This matches the CPU L1/L2 mental model most closely.
Not chosen for this PR. The complexity cost is high (subscribe lifecycle, reconnect handling, message delivery guarantees, lag-bounded staleness), and the latency benefit does not yet justify it. This is the natural next step if Valkey read latency becomes a bottleneck.
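To make the read and write rules above concrete, here is a rough sketch of the write-around behaviour with a fake L2. The tiered and fakeValkey types and both sentinel errors are hypothetical names for illustration, and a plain map stands in for the per-pod LRU; this is not the PR's ValkeyStorage.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real client may report these differently.
var (
	errMiss        = errors.New("valkey: key absent")
	errUnavailable = errors.New("valkey: unavailable")
)

// fakeValkey stands in for the shared L2 store and can be told to fail.
type fakeValkey struct {
	data    map[string]string
	failGet bool
	failSet bool
}

func (f *fakeValkey) Get(key string) (string, error) {
	if f.failGet {
		return "", errUnavailable
	}
	v, ok := f.data[key]
	if !ok {
		return "", errMiss
	}
	return v, nil
}

func (f *fakeValkey) Set(key, val string) error {
	if f.failSet {
		return errUnavailable
	}
	f.data[key] = val
	return nil
}

// tiered combines the shared L2 with a per-pod L1 (a map here, an LRU in the PR).
type tiered struct {
	l1 map[string]string
	l2 *fakeValkey
}

func newTiered() *tiered {
	return &tiered{
		l1: make(map[string]string),
		l2: &fakeValkey{data: make(map[string]string)},
	}
}

// Set is write-around: L1 is written only when the L2 write fails.
func (s *tiered) Set(key, val string) {
	if err := s.l2.Set(key, val); err != nil {
		s.l1[key] = val
	}
}

// Get treats an L2 miss as authoritative and consults L1 only on an L2 error.
func (s *tiered) Get(key string) (string, bool) {
	v, err := s.l2.Get(key)
	switch {
	case err == nil:
		return v, true
	case errors.Is(err, errMiss):
		return "", false
	default:
		v, ok := s.l1[key]
		return v, ok
	}
}

func main() {
	s := newTiered()
	s.Set("a", "1")       // L2 write succeeds, so L1 stays empty
	s.l2.failGet = true   // L2 becomes unreachable for reads
	_, ok := s.Get("a")   // L1 has nothing to fall back to
	fmt.Println(ok)       // prints "false"
}
```

The main function reproduces the case raised in review: a successful Set followed by a failing Get finds an empty L1.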
We could give L1 entries a TTL of a fixed, acceptable amount of time.
Good point. tokeninfo data is a strong precedent. The write-around choice was conservative: L1 and Valkey TTLs are independent, and if a Valkey entry expires or gets evicted, an L1 entry with a longer TTL would silently serve stale content with no signal. The intent was to keep Valkey authoritative for the lifetime of every entry. That said, your proposal sidesteps the problem cleanly.
The trade-off is: the cache filter TTL must be meaningfully longer than the L1 TTL for the L1 layer to be useful. For now I will proceed with a fixed TTL of 60s as a start.
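A small sketch of what a fixed L1 TTL could look like, assuming the 60s value applies to L1 entries. The l1Cache type, the injectable clock, and fakeClock are hypothetical helpers, and a map again stands in for the LRU.

```go
package main

import (
	"fmt"
	"time"
)

// l1TTL is the fixed lifetime of every L1 entry (60s, as proposed above).
const l1TTL = 60 * time.Second

type l1Entry struct {
	val     string
	expires time.Time
}

// l1Cache is a hypothetical per-pod cache with an injectable clock for tests.
type l1Cache struct {
	now   func() time.Time
	items map[string]l1Entry
}

func newL1Cache(now func() time.Time) *l1Cache {
	return &l1Cache{now: now, items: make(map[string]l1Entry)}
}

// Set stamps the entry with a fixed expiry, independent of the Valkey TTL.
func (c *l1Cache) Set(key, val string) {
	c.items[key] = l1Entry{val: val, expires: c.now().Add(l1TTL)}
}

// Get returns the value only while the fixed TTL has not elapsed.
func (c *l1Cache) Get(key string) (string, bool) {
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	if !c.now().Before(e.expires) {
		delete(c.items, key) // expired: drop it so it cannot be served again
		return "", false
	}
	return e.val, true
}

// fakeClock lets the example move time forward deterministically.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time { return f.t }

func (f *fakeClock) AdvanceSeconds(n int) {
	f.t = f.t.Add(time.Duration(n) * time.Second)
}

func main() {
	clk := &fakeClock{t: time.Unix(0, 0)}
	c := newL1Cache(clk.Now)
	c.Set("k", "v")

	clk.AdvanceSeconds(59)
	_, atFiftyNine := c.Get("k")

	clk.AdvanceSeconds(1)
	_, atSixty := c.Get("k")

	fmt.Println(atFiftyNine, atSixty) // prints "true false"
}
```

With a bound like this, the worst-case staleness of an L1 entry is the fixed TTL, regardless of what happens to the corresponding Valkey entry.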
After every successful read from L1 you can call the Valkey EXPIRE command to:
You can also consider using GETEX instead of GET if you want TTLs to be updated on read ops.
You can't really do this because the L2 cache is shared by all Skipper instances.
@szuecs @a4180p @MustafaSaber thanks a lot for the feedback and patience 🧡 I will review it thoroughly and get back to you with a concrete proposal. I am currently busy with a business-critical reliability topic, but I will resume this on Monday 8th June.
Related Issue
#3991
Description
Extends the cache() filter with an optional Valkey-backed L2 cache using a client-side consistent hash ring.
When --swarm-valkey-urls is configured, responses are stored in Valkey (L2) with automatic fallback to the in-process LRU (L1) on any Valkey error.
Storage architecture
Write path: successful Valkey Set does not warm L1 (write-around). L1 is only populated on Valkey errors.
Read path: Valkey miss → nil (clean miss, no L1 consulted). Valkey error → L1 consulted as fallback.
Valkey ring topology
All pods share the same ring, so a cold miss coalesced by pod A lands in the same shard that pod B would read from: no thundering herd across pods.
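The property the ring relies on is that every pod computes the same key-to-shard mapping without coordination. The sketch below illustrates that with a generic consistent hash ring using virtual nodes; it is not the ring implementation behind skpnet.ValkeyRingClient, and the ring type, the FNV hash, and the shard addresses are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a generic consistent hash ring with virtual nodes (illustrative only).
type ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // position -> shard address
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places vnodes positions per shard on the ring.
func newRing(shards []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, s := range shards {
		for i := 0; i < vnodes; i++ {
			p := hashOf(fmt.Sprintf("%s#%d", s, i))
			if prev, ok := r.owner[p]; ok {
				// Position collision: pick deterministically so the result
				// does not depend on the order of the shard list.
				if s < prev {
					r.owner[p] = s
				}
				continue
			}
			r.owner[p] = s
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// shardFor returns the shard owning the first position at or after the key's hash.
func (r *ring) shardFor(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashOf(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	// Two pods build their rings independently, even from differently ordered lists.
	podA := newRing([]string{"valkey-0:6379", "valkey-1:6379", "valkey-2:6379"}, 64)
	podB := newRing([]string{"valkey-2:6379", "valkey-0:6379", "valkey-1:6379"}, 64)

	key := "GET|example.org|/campaign"
	fmt.Println(podA.shardFor(key) == podB.shardFor(key)) // prints "true"
}
```

Since both pods resolve the key to the same shard, an entry written by pod A after a cold miss is the one pod B reads.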
Observability
Three counters track Valkey operation outcomes:
- valkey_miss: clean cache miss (key absent in Valkey)
- valkey_get_fallback: Valkey error on Get; L1 consulted instead
- valkey_set_fallback: Valkey error on Set; L1 written instead
The lru_bytes gauge is now updated by a background scraper every 10 s instead of only on eviction, so it stays accurate when capacity is not exceeded.
Storage Set and Delete errors are now logged at Warn instead of being silently discarded.
Additional changes
- cache_status, cache_key, and cache_ttl_remaining_ms are tagged on the active OpenTracing span for HIT, MISS, and STALE paths.
- lru_eviction and lru_oversized counters are now injectable via constructor parameters, removing the hidden metrics.Default dependency and enabling test-scoped counter assertions without global state mutation. See filters/cache/observability-gaps.md for why full per-route namespacing requires a Skipper core change.
- NewCacheFilter signature change: a fourth parameter valkeyRing *skpnet.ValkeyRingClient is added. Pass nil to preserve the existing LRU-only behaviour. Call sites embedding Skipper as a library must be updated.
Bug fixes
- Response() was dead code: coalesce() always calls ctx.Serve(), causing Response() to return early. SIE logic moved inside coalesce() with the pre-fetch snapshot captured before f.fetch runs.
- only-if-cached + SWR: entries in the stale-while-revalidate window are still being served to other clients and must not return 504 to only-if-cached requests. Fixed via IsUsable() replacing IsStale().
- Response() nil-safe key assertion: bare type assertion on stateBagKey panicked on route misconfiguration; replaced with comma-ok form.
- Expire(-1) truncation: time.Duration(-1) (-1 ns) truncated to EXPIRE key 0 in Valkey. Changed to -1*time.Second which correctly sends EXPIRE key -1.
Regression tests
- TestCacheFilter_MustRevalidate_ForcesCoalesceWhenStale: must-revalidate forces origin fetch even within SWR window
- TestCacheFilter_UnsafeMethod_4xx_DoesNotInvalidate
- TestLRUStorage_OversizedEntry: … lru_oversized and is not stored
- TestCacheFilter_RevalDropped_WhenQueueFull: reval_dropped counter fires when revalJobs channel is at capacity
- TestValkeyStorage_FallsBackToL1OnValkeyUnavailable (strengthened): Get …
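Two of the bug fixes listed in the description, the Expire(-1) truncation and the nil-safe key assertion, come down to plain Go semantics and can be reproduced in isolation. In the sketch below, wholeSeconds is an assumed model of how a client might turn a Duration into the integer argument of EXPIRE, and cacheKey with its "cacheKey" bag entry is a hypothetical stand-in for the stateBagKey lookup.

```go
package main

import (
	"fmt"
	"time"
)

var (
	buggyTTL = time.Duration(-1) // -1 nanosecond, not -1 second
	fixedTTL = -1 * time.Second  // -1 second, as in the fix
)

// wholeSeconds models a Duration-to-seconds conversion by integer division.
// Go integer division truncates toward zero, so -1ns becomes 0.
func wholeSeconds(d time.Duration) int64 {
	return int64(d / time.Second)
}

// cacheKey reads a string from a state bag using the comma-ok form,
// so a missing or mistyped entry yields ok == false instead of a panic.
func cacheKey(bag map[string]interface{}) (string, bool) {
	key, ok := bag["cacheKey"].(string)
	return key, ok
}

func main() {
	fmt.Println(wholeSeconds(buggyTTL), wholeSeconds(fixedTTL)) // prints "0 -1"

	_, ok := cacheKey(map[string]interface{}{}) // key was never set
	fmt.Println(ok)                             // prints "false"
}
```

A bare assertion such as bag["cacheKey"].(string) on the same empty bag would panic at runtime, which is the failure mode the comma-ok form avoids.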