feat(core): per-run event-log LRU cache to avoid full reloads on warm resume #2210
TooTallNate wants to merge 1 commit into
… resume

On warm function instances, the workflow runtime now keeps a process-wide LRU cache (keyed by runId) of loaded event logs. On resume we delta-fetch from the cached cursor instead of reloading from event 0 on every invocation, which cuts the total event reads on the warm-replay path over a long, step-heavy run's life from O(N²) to O(N). The cache is bounded by ~64 MiB of approximate bytes and 500 entries (both tunable via WORKFLOW_EVENT_CACHE_MAX_BYTES / WORKFLOW_EVENT_CACHE_MAX_ENTRIES). We always delta against the server (never replay purely from cache) so we stay correct against writers on other instances, and the existing shouldRetryWithoutEventCursor path self-heals if a cached cursor is ever stale. Set WORKFLOW_DISABLE_EVENT_CACHE=1 to disable the cache as a kill switch.
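The commit message describes a process-wide LRU keyed by `runId` and bounded by both approximate bytes and entry count, and the PR description adds that it is hand-rolled on `Map`. A minimal sketch of that kind of structure follows. `EventLogCache`, `CachedLog`, and `estimateSize` are hypothetical names, and estimating "approximate bytes" as JSON string length is an assumption, not necessarily what `event-cache.ts` does.

```typescript
// Hypothetical sketch of a Map-based LRU bounded by entry count and approximate bytes.
// EventLogCache, CachedLog, and estimateSize are illustrative names, not the PR's exports.

interface CachedLog<E> {
  events: E[];
  cursor: string | undefined;
}

// Assumption: approximate size = length of the JSON encoding (UTF-16 code units).
function estimateSize(log: CachedLog<unknown>): number {
  return JSON.stringify(log).length;
}

class EventLogCache<E> {
  private readonly entries = new Map<string, { log: CachedLog<E>; size: number }>();
  private readonly maxBytes: number;
  private readonly maxEntries: number;
  private totalSize = 0;

  constructor(maxBytes: number, maxEntries: number) {
    this.maxBytes = maxBytes;
    this.maxEntries = maxEntries;
  }

  get(runId: string): CachedLog<E> | undefined {
    const entry = this.entries.get(runId);
    if (entry === undefined) return undefined;
    // Map iterates in insertion order, so re-inserting marks the entry most recently used.
    this.entries.delete(runId);
    this.entries.set(runId, entry);
    return entry.log;
  }

  set(runId: string, log: CachedLog<E>): void {
    this.delete(runId);
    const size = estimateSize(log);
    if (size > this.maxBytes) return; // never cache a log larger than the whole budget
    this.entries.set(runId, { log, size });
    this.totalSize += size;
    // Evict from the least recently used end (first key) until both bounds hold.
    while (this.entries.size > this.maxEntries || this.totalSize > this.maxBytes) {
      const oldest = this.entries.keys().next();
      if (oldest.done) break;
      this.delete(oldest.value);
    }
  }

  delete(runId: string): void {
    const entry = this.entries.get(runId);
    if (entry === undefined) return;
    this.entries.delete(runId);
    this.totalSize -= entry.size;
  }

  clear(): void {
    this.entries.clear();
    this.totalSize = 0;
  }

  get count(): number {
    return this.entries.size;
  }

  get bytes(): number {
    return this.totalSize;
  }
}
```

Because a JavaScript `Map` iterates keys in insertion order, deleting and re-inserting on every hit keeps the least recently used key at the front, so eviction needs no separate linked list.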
🦋 Changeset detected
Latest commit: 71e52b2
The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages.
📊 Benchmark Results
(Result tables not captured.) Each benchmark ran in 💻 Local Development and ▲ Production (Vercel), with a 🔍 Observability: Nitro link:
- workflow with no steps, and with 1, 10, 25, and 50 sequential steps
- Promise.all and Promise.race with 10, 25, and 50 concurrent steps
- workflow with 10, 25, and 50 sequential or concurrent data payload steps (10KB)
- Stream benchmarks (include TTFB metrics): workflow with stream; stream pipeline with 5 transform steps (1MB); 10 parallel streams (1MB each); fan-out fan-in 10 streams (1MB each)
Summary: Fastest Framework by World and Fastest World by Framework (winner determined by most benchmark wins).
❌ Some benchmark jobs failed. Check the workflow run for details.
🧪 E2E Test Results
❌ Some tests failed

Failed tests:
- ▲ Vercel Production (13 failed): astro (1), example (8), express (1), nextjs-webpack (1), nitro (1), vite (1)
- 📋 Other (2 failed): e2e-vercel-prod-tanstack-start (2)

Details by category:
- ❌ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 📋 Other

❌ Some E2E test jobs failed. Check the workflow run for details.
Summary
Adds a process-wide LRU cache keyed by `runId` so warm function instances delta-fetch only the new events on resume instead of reloading the full event log from event 0 on every invocation.

Long, step-heavy runs that resume frequently (once per step) previously re-read a growing log each time, for O(N²) total event reads over the run's life. With this change each warm resume reads only the O(delta) new events, so the total becomes O(N).
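To make the O(N²) versus O(N) claim concrete, here is a back-of-the-envelope model. It assumes, as a simplification, that every step appends exactly one event and that every resume is warm and hits the cache; real runs append several events per step, which changes the constants but not the growth rates. `totalEventReads` is a hypothetical helper for illustration only.

```typescript
// Simplified model (assumption): one event appended per step, one resume per step,
// so resume k sees a log of k events.
function totalEventReads(steps: number, mode: "full" | "delta"): number {
  let reads = 0;
  for (let k = 1; k <= steps; k++) {
    // A full reload re-reads all k events; a warm delta fetch reads only the new one.
    reads += mode === "full" ? k : 1;
  }
  return reads;
}

console.log(totalEventReads(100, "full")); // 5050, i.e. N(N + 1) / 2
console.log(totalEventReads(100, "delta")); // 100, i.e. N
```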
What changed
- `packages/core/src/runtime/event-cache.ts`: module-level LRU bounded by approximate bytes (~64 MiB) and entry count (500). Both are tunable via `WORKFLOW_EVENT_CACHE_MAX_BYTES` / `WORKFLOW_EVENT_CACHE_MAX_ENTRIES`. Hand-rolled `Map`-based LRU (no new dependency).
- `packages/core/src/runtime/event-cache.test.ts`: 17 unit tests (round-trip, byte-budget eviction, `totalSize` accounting, clear, feature flag, size estimation).
- `packages/core/src/runtime.ts`: integrated the cache into the resume path:
  - … `eventId`); fall back to a full load on miss.
  - … `{events: [...events], cursor}` so the next resume's baseline includes `wait_completed` appends.
  - … `run_completed` / `run_failed` / `run_cancelled`).
  - … `workflow.events.cache_hit` span attribute on every replay.
- `packages/core/src/runtime/helpers.ts`: exported `appendUniqueEvents` so the cache merge path reuses the existing dedup helper.
- `packages/core/src/runtime/wait-completion-replay.test.ts`: `clearEventCache()` in `afterEach` so the process-wide cache doesn't leak between scenarios that share a `runId`.

Correctness
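The `runtime.ts` and `helpers.ts` entries above describe the resume flow: delta-fetch from the cached cursor on a hit, full load on a miss or stale cursor, merge with dedup by `eventId`, write back, and evict once the run reaches a terminal event. A minimal sketch of that flow follows, under stated assumptions: `WorkflowEvent`, `Page`, `FetchEvents`, `StaleCursorError`, `appendUnique`, and `loadEventsForResume` are hypothetical stand-ins (not the runtime's API), a plain `Map` replaces the bounded LRU, and a `StaleCursorError` catch stands in for the existing `shouldRetryWithoutEventCursor` check. In the PR, write-back happens after the merge block so locally appended `wait_completed` events are included; this sketch writes back immediately for brevity.

```typescript
// Hypothetical sketch of a cache-aware resume load; all names here are illustrative.
interface WorkflowEvent {
  eventId: string;
  type: string;
}

interface Page {
  events: WorkflowEvent[];
  cursor: string | undefined;
}

// Stand-in for the server call: events after `cursor` in ascending order,
// or the whole log when `cursor` is undefined.
type FetchEvents = (runId: string, cursor: string | undefined) => Promise<Page>;

// Assumption: the server signals an unusable cursor with this error type.
class StaleCursorError extends Error {}

const TERMINAL_TYPES = new Set(["run_completed", "run_failed", "run_cancelled"]);
const eventCache = new Map<string, Page>(); // plain Map instead of the bounded LRU

// Dedup by eventId while keeping existing order (in the spirit of appendUniqueEvents).
function appendUnique(base: WorkflowEvent[], delta: WorkflowEvent[]): WorkflowEvent[] {
  const seen = new Set(base.map((e) => e.eventId));
  const merged = [...base];
  for (const e of delta) {
    if (!seen.has(e.eventId)) {
      seen.add(e.eventId);
      merged.push(e);
    }
  }
  return merged;
}

async function loadEventsForResume(
  runId: string,
  fetchEvents: FetchEvents,
): Promise<{ events: WorkflowEvent[]; cacheHit: boolean }> {
  const cached = eventCache.get(runId);
  let page: Page | undefined;
  let cacheHit = false;
  if (cached !== undefined) {
    try {
      // Always delta against the server; never replay purely from the cache.
      const delta = await fetchEvents(runId, cached.cursor);
      page = {
        events: appendUnique(cached.events, delta.events),
        cursor: delta.cursor ?? cached.cursor,
      };
      cacheHit = true;
    } catch (err) {
      if (!(err instanceof StaleCursorError)) throw err;
      // Stale cursor: fall through to a full reload, which replaces the baseline.
    }
  }
  if (page === undefined) {
    page = await fetchEvents(runId, undefined);
  }
  const last = page.events[page.events.length - 1];
  if (last !== undefined && TERMINAL_TYPES.has(last.type)) {
    eventCache.delete(runId); // a finished run will not resume again
  } else {
    // Store a copy so later in-place appends by the caller don't alias the cache.
    eventCache.set(runId, { events: [...page.events], cursor: page.cursor });
  }
  return { events: page.events, cacheHit };
}
```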
- … `eventId` and preserves ascending order (delta uses `sortOrder: 'asc'`).
- … `helpers.ts` semantics).
- … `wait_completed` appends (write-back happens after the merge block).
- … events.
- … `shouldRetryWithoutEventCursor` path performs a full reload; the cache simply repopulates with the new baseline, so no new invalidation protocol is needed.

Rollout / risk
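The commit message names three environment variables: two size knobs with ~64 MiB and 500-entry defaults, plus `WORKFLOW_DISABLE_EVENT_CACHE=1` as a kill switch. A minimal sketch of resolving them follows. Only the variable names and defaults come from the PR text; `EventCacheConfig`, `positiveInt`, and `resolveEventCacheConfig` are hypothetical, and treating only the exact value `"1"` as "disabled" and falling back to defaults on invalid input are assumptions.

```typescript
// Hypothetical sketch: resolving event-cache settings from environment variables.
// Only the variable names and defaults come from the PR; the helper names are illustrative.

interface EventCacheConfig {
  enabled: boolean;
  maxBytes: number;
  maxEntries: number;
}

const DEFAULT_MAX_BYTES = 64 * 1024 * 1024; // ~64 MiB
const DEFAULT_MAX_ENTRIES = 500;

// Parse a positive integer; fall back to the default when missing or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function resolveEventCacheConfig(env: Record<string, string | undefined>): EventCacheConfig {
  return {
    // Assumption: only the exact value "1" disables the cache.
    enabled: env["WORKFLOW_DISABLE_EVENT_CACHE"] !== "1",
    maxBytes: positiveInt(env["WORKFLOW_EVENT_CACHE_MAX_BYTES"], DEFAULT_MAX_BYTES),
    maxEntries: positiveInt(env["WORKFLOW_EVENT_CACHE_MAX_ENTRIES"], DEFAULT_MAX_ENTRIES),
  };
}
```

In Node.js this would be called as `resolveEventCacheConfig(process.env)`; when `enabled` is false, the runtime would skip the cache entirely and take the cold full-load path.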
- … `@workflow/core` release.
- `WORKFLOW_DISABLE_EVENT_CACHE=1` forces the cold full-load path, so the cache can be disabled without a rollback.

Verification
- `pnpm typecheck` clean
- `pnpm test`: all 1100 tests pass (including 17 new event-cache tests)
- `pnpm build` clean