Add docs recommending autoscaling setup by carlydf · Pull Request #324 · temporalio/temporal-worker-controller

carlydf · 2026-05-14T01:52:36Z

Adds documentation outlining the tradeoffs between two autoscaling solutions:

- HPA+prometheus adapter
- KEDA Temporal Scaler

Documentation focuses on straightforward descriptions of the pros and cons of each solution.

jaypipes

Thanks @carlydf, I've done a first go-around reviewing this documentation and adding (quite a few) suggested changes and removals to "de-Claude" some of it and make it (hopefully) a bit more readable for a general audience.

Shivs11

a couple of nits -- looks good to me otherwise

carlydf

So close! Thank you for all your hard work on this @jaypipes. Let me know what you think of my suggestions :)

The backlog metric pipeline goes from prometheus-adapter directly to the raw temporal_cloud_v1_approximate_backlog_count series, eliminating the temporal_approximate_backlog_count recording rule.

Adapter rule:
- seriesQuery filters out temporal_worker_build_id="__unversioned__" so discovery doesn't choke on the 5000+ unversioned series in typical accounts.
- metricsQuery sum(...) collapses labels the HPA doesn't select on at query time (instance/job/region/task_priority/temporal_account).
- metricsRelistInterval is bumped to 5m to accommodate the ~3-minute embedded-timestamp lag in Temporal Cloud's OpenMetrics emission.

WRT example, prometheus-stack-values, and demo README are updated to match.

Add docs/scaling-recommendations.md covering the empirically measured reactivity model (steady-state ~3:15 dominated by Cloud aggregation lag), task-queue-unload behavior, scale-from-zero limits, and when to pick KEDA over the metric path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
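As a rough illustration of what the adapter rule in this commit does to the data, the sketch below models the two steps in plain Python: skip series whose temporal_worker_build_id is `__unversioned__`, then sum values across the labels the commit says are collapsed. It is not the prometheus-adapter configuration itself. The function name `collapse_backlog` and the input shape (label dict plus value) are assumptions made only for this example.

```python
from collections import defaultdict

# Labels the commit message lists as collapsed at query time.
DROPPED_LABELS = {"instance", "job", "region", "task_priority", "temporal_account"}
UNVERSIONED = "__unversioned__"


def collapse_backlog(samples):
    """Sum backlog values over the dropped labels, skipping unversioned series.

    `samples` is an iterable of (labels_dict, value) pairs (a hypothetical
    shape chosen for this sketch). Returns a dict keyed by the frozenset of
    remaining (label, value) pairs.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Mirrors the seriesQuery filter: unversioned series are ignored.
        if labels.get("temporal_worker_build_id") == UNVERSIONED:
            continue
        # Mirrors sum(...): drop labels the HPA does not select on.
        kept = frozenset(
            (name, val) for name, val in labels.items() if name not in DROPPED_LABELS
        )
        totals[kept] += value
    return dict(totals)
```

With two versioned series for the same build ID in different regions, the result holds a single entry whose value is their sum. Any unversioned series does not contribute.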

Initial scaling-recommendations.md framed steady-state HPA reactivity as ~3:15, citing a "Temporal Cloud aggregation lag." That was wrong. The actual sample-age distribution on the OpenMetrics endpoint is:
- p50 30s (matches ~1/min emission cadence, age oscillates 0-60s)
- p95 50s
- p99 ~tail of occasional gateway-wide stalls

So typical end-to-end reactivity is ~85s (emission + scrape + HPA poll), not ~3:15. The 3-minute figures came from observations made during the occasional periods when the OpenMetrics gateway returns frozen timestamps across every series in the account simultaneously - those stalls are real but not steady-state.

Doc now:
- Replaces the 3:15 figure with empirically-derived ~85s typical.
- Adds a "Gateway-wide stalls" caveat describing the frozen-timestamp behavior observationally (no speculation about cause).
- Keeps the metricsRelistInterval: 5m recommendation, now justified by the need to exceed stall duration rather than the misattributed "aggregation lag."
- Demo README updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
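One way a sample-age distribution like the one in this commit could be measured is sketched below: record the scrape time and the embedded sample timestamp for each observation, take the difference as the age, and read off percentiles. The helper names and the nearest-rank percentile method are assumptions for illustration. The PR does not say how its p50/p95 figures were computed.

```python
import math


def sample_ages(observations):
    """Return sample ages in seconds.

    `observations` is a list of (scrape_time, sample_timestamp) pairs in
    Unix seconds (a hypothetical input shape for this sketch).
    """
    return [scrape_time - sample_ts for scrape_time, sample_ts in observations]


def percentile(values, pct):
    """Nearest-rank percentile of `values`, with pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Feeding in observations collected over a long enough window would show whether the tail (p99) is dominated by rare delay events rather than by the steady-state emission cadence.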

Earlier wording implied multiple stall events ("occasional periods") when we have only directly characterized one such event during this investigation. Reword to describe exactly what was seen, note that frequency is not yet known, and that the behavior is open with the Observability team. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Verified directly: across a 3-hour window including one of the observed "stall" events, every gap between consecutive sample timestamps in Prometheus's storage is exactly 60 seconds. So the OpenMetrics endpoint isn't dropping or freezing emissions - it's delivering them late, in bursts after a delay, with their original minute-aligned timestamps. The retrospective record looks complete (good for dashboards), but live HPA consumers see the delay as real staleness because they query the latest available timestamp at decision time.

Reframe the caveat in the scaling doc and demo README accordingly. Also note we observed two such delay events in ~2 hours of close observation - frequency in normal operation is still open with the Observability team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
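The distinction drawn in this commit, a gap-free retrospective record versus staleness at decision time, can be sketched as follows. The first function checks gaps between stored sample timestamps. The second computes how old the newest visible sample is at a given moment when samples may arrive late. The function names and the (arrival_time, sample_timestamp) input shape are assumptions for this example, not part of the PR.

```python
def timestamp_gaps(timestamps):
    """Gaps in seconds between consecutive sample timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def staleness_at(decision_time, arrivals):
    """Age of the newest sample visible at `decision_time`.

    `arrivals` is a list of (arrival_time, sample_timestamp) pairs. Only
    samples that had already arrived are visible. Returns None when no
    sample has arrived yet.
    """
    visible = [sample_ts for arrived, sample_ts in arrivals if arrived <= decision_time]
    if not visible:
        return None
    return decision_time - max(visible)
```

For example, if samples stamped 120, 180, and 240 all arrive together at time 250, the stored timestamps still show uniform 60-second gaps, yet a consumer deciding at time 240 sees only the sample stamped 60 and therefore a staleness of 180 seconds.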

Co-authored-by: Jay Pipes <jaypipes@gmail.com> Co-authored-by: Stefan Richter <stefan@02strich.de>

Removes a bunch of overly verbose Claude-generated stuff that will likely confuse readers. Reworded a few places where Claude was using some odd terminology -- e.g. "typical end-to-end reactivity" -- to use more straightforward verbiage. Added a brief WRT example HPA template that shows the stabilization window that is referred to in multiple sections of the doc. Signed-off-by: Jay Pipes <jay.pipes@temporal.io>

Signed-off-by: Jay Pipes <jay.pipes@temporal.io>

carlydf

approved! (I can't actually approve because this is technically my PR)

carlydf requested review from a team and jlegrone as code owners May 14, 2026 01:52

carlydf marked this pull request as draft May 14, 2026 02:03

jaypipes requested changes May 15, 2026


02strich reviewed May 19, 2026
