Add docs recommending autoscaling setup #324
Open
carlydf wants to merge 9 commits into
jaypipes
requested changes
May 15, 2026
jaypipes
left a comment
Contributor
Thanks @carlydf, I've done a first go-around reviewing this documentation and adding (quite a few) suggested changes and removals to "de-Claude" some of it and make it (hopefully) a bit more readable for a general audience.
02strich
reviewed
May 19, 2026
02strich
reviewed
May 19, 2026
02strich
reviewed
May 19, 2026
02strich
reviewed
May 19, 2026
carlydf
commented
May 28, 2026
Shivs11
approved these changes
Jun 3, 2026
Shivs11
left a comment
Member
a couple of nits -- looks good to me otherwise
718471b to f8335de
eniko-dif
reviewed
Jun 9, 2026
d452af5 to d1e2d02
jaypipes
approved these changes
Jun 11, 2026
carlydf
commented
Jun 17, 2026
carlydf
commented
Jun 17, 2026
The backlog metric pipeline goes from prometheus-adapter directly to the raw temporal_cloud_v1_approximate_backlog_count series, eliminating the temporal_approximate_backlog_count recording rule.
Adapter rule:
- seriesQuery filters out temporal_worker_build_id="__unversioned__" so discovery doesn't choke on the 5000+ unversioned series in typical accounts.
- metricsQuery sum(...) collapses labels the HPA doesn't select on at query time (instance/job/region/task_priority/temporal_account).
- metricsRelistInterval is bumped to 5m to accommodate the ~3-minute embedded-timestamp lag in Temporal Cloud's OpenMetrics emission.
WRT example, prometheus-stack-values, and demo README are updated to match.
Add docs/scaling-recommendations.md covering the empirically measured reactivity model (steady-state ~3:15 dominated by Cloud aggregation lag), task-queue-unload behavior, scale-from-zero limits, and when to pick KEDA over the metric path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
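To make the adapter rule concrete, here is a minimal Python sketch of what the two queries do to the data. It is not how prometheus-adapter implements them. It assumes series are held in memory as (labels, value) pairs, and that the task queue is carried in a label named temporal_task_queue (a hypothetical name). filter_versioned and sum_by are illustrative helpers. The __unversioned__ value and the collapsed labels come from the commit message above.

```python
from collections import defaultdict

# A series is modeled as (labels, value); this in-memory form is an assumption.
Series = tuple[dict[str, str], float]

UNVERSIONED = "__unversioned__"


def filter_versioned(series: list[Series]) -> list[Series]:
    """Drop series whose temporal_worker_build_id is '__unversioned__'."""
    return [(labels, value) for labels, value in series
            if labels.get("temporal_worker_build_id") != UNVERSIONED]


def sum_by(series: list[Series], group_by: tuple[str, ...]) -> dict[tuple[str, ...], float]:
    """Like PromQL sum(...) by (group_by): labels outside group_by are collapsed."""
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        # A missing label groups as the empty string, as in PromQL.
        key = tuple(labels.get(name, "") for name in group_by)
        totals[key] += value
    return dict(totals)


# Hypothetical sample data; "temporal_task_queue" is an assumed label name.
raw = [
    ({"temporal_task_queue": "orders", "temporal_worker_build_id": "v2",
      "region": "us-east-1", "instance": "a"}, 40.0),
    ({"temporal_task_queue": "orders", "temporal_worker_build_id": "v2",
      "region": "us-east-1", "instance": "b"}, 2.0),
    ({"temporal_task_queue": "orders", "temporal_worker_build_id": UNVERSIONED,
      "region": "us-east-1", "instance": "a"}, 99.0),
]

per_version = sum_by(filter_versioned(raw), ("temporal_task_queue", "temporal_worker_build_id"))
print(per_version)  # {('orders', 'v2'): 42.0}
```

The two versioned series that differ only in instance collapse into one value per task queue and build ID. The unversioned series never reaches the sum.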
Initial scaling-recommendations.md framed steady-state HPA reactivity as ~3:15, citing a "Temporal Cloud aggregation lag." That was wrong. The actual sample-age distribution on the OpenMetrics endpoint is:
- p50: 30s (matches ~1/min emission cadence, age oscillates 0-60s)
- p95: 50s
- p99: ~tail of occasional gateway-wide stalls
So typical end-to-end reactivity is ~85s (emission + scrape + HPA poll), not ~3:15. The 3-minute figures came from observations made during the occasional periods when the OpenMetrics gateway returns frozen timestamps across every series in the account simultaneously - those stalls are real but not steady-state.
Doc now:
- Replaces the 3:15 figure with empirically-derived ~85s typical.
- Adds a "Gateway-wide stalls" caveat describing the frozen-timestamp behavior observationally (no speculation about cause).
- Keeps the metricsRelistInterval: 5m recommendation, now justified by the need to exceed stall duration rather than the misattributed "aggregation lag."
- Demo README updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
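A sample-age distribution like the one above can be measured by recording, at each query time, how old the newest visible sample is and then taking percentiles. The Python below is an idealized sketch. It assumes minute-aligned samples that become visible the instant they are timestamped and a consumer that queries once per second. It uses a nearest-rank percentile, which may not match how the original measurements were computed. Under this model the p50 lands near 30s, consistent with the ~1/min cadence. Real percentiles also reflect delivery delays that the model ignores.

```python
import math


def sample_ages(query_times: list[int], newest_sample_ts: list[int]) -> list[int]:
    """Age, in seconds, of the newest visible sample at each query time."""
    return [q - ts for q, ts in zip(query_times, newest_sample_ts)]


def percentile(values: list[int], p: int) -> int:
    """Nearest-rank percentile for p in 1..100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Idealized model: one sample per minute, visible as soon as it is timestamped,
# with a consumer querying once per second for ten minutes.
queries = list(range(600))
newest = [(q // 60) * 60 for q in queries]
ages = sample_ages(queries, newest)
print(percentile(ages, 50), percentile(ages, 95))  # 29 56
```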
Earlier wording implied multiple stall events ("occasional periods")
when we have only directly characterized one such event during this
investigation. Reword to describe exactly what was seen, note that
frequency is not yet known, and that the behavior is open with the
Observability team.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verified directly: across a 3-hour window including one of the observed "stall" events, every gap between consecutive sample timestamps in Prometheus's storage is exactly 60 seconds. So the OpenMetrics endpoint isn't dropping or freezing emissions - it's delivering them late, in bursts after a delay, with their original minute-aligned timestamps. The retrospective record looks complete (good for dashboards), but live HPA consumers see the delay as real staleness because they query the latest available timestamp at decision time.
Reframe the caveat in the scaling doc and demo README accordingly. Also note we observed two such delay events in ~2 hours of close observation - frequency in normal operation is still open with the Observability team.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
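A toy model can show the difference between a complete stored record and live staleness. This Python sketch assumes each sample carries its original minute-aligned timestamp plus a separate arrival time. The delayed window, the burst time, and the 30s normal delivery delay are made-up values for illustration, not measurements. gaps_are_uniform mirrors the 60-second gap check described above. staleness_at is a hypothetical helper that computes what an HPA-style consumer sees at decision time.

```python
from typing import Optional


def gaps_are_uniform(timestamps: list[int], step: int = 60) -> bool:
    """True if consecutive sample timestamps are exactly `step` seconds apart."""
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))


def staleness_at(decision_time: int, samples: list[tuple[int, int]]) -> Optional[int]:
    """Age of the newest sample that had arrived by decision_time.

    samples holds (sample_timestamp, arrival_time) pairs in seconds.
    Returns None if nothing had arrived yet.
    """
    visible = [ts for ts, arrived in samples if arrived <= decision_time]
    return decision_time - max(visible) if visible else None


# Hypothetical timeline: samples normally arrive 30s after their timestamp,
# but three consecutive samples are held back and delivered together at t=320.
timestamps = list(range(0, 361, 60))
delayed = {120, 180, 240}
samples = [(ts, 320 if ts in delayed else ts + 30) for ts in timestamps]

print(gaps_are_uniform(timestamps))   # True: the stored record has no holes
print(staleness_at(300, samples))     # 240: a live consumer mid-delay sees old data
print(staleness_at(400, samples))     # 40: caught up once the burst has landed
```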
Co-authored-by: Jay Pipes <jaypipes@gmail.com>
Co-authored-by: Stefan Richter <stefan@02strich.de>
Removes a bunch of overly verbose Claude-generated stuff that will likely confuse readers. Reworded a few places where Claude was using some odd terminology -- e.g. "typical end-to-end reactivity" -- to use more straightforward verbiage. Added a brief WRT example HPA template that shows the stabilization window that is referred to in multiple sections of the doc.
Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
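The stabilization window delays scale-down. By default, the Kubernetes HPA uses a 300-second scale-down stabilization window and acts on the highest replica recommendation made within that window. The Python below is a simplified sketch of that behavior, not the controller's code. It ignores HPA tolerance, min/max replicas, and scaling policies, and it uses roughly ceil(metric / target) for an AverageValue-style target. The 60s window, 15s evaluation interval, and target of 10 backlog tasks per replica are hypothetical values, not the ones in the WRT example template. ScaleDownStabilizer and raw_recommendation are illustrative names.

```python
import math
from collections import deque


def raw_recommendation(backlog: float, target_per_replica: float) -> int:
    """Replica count for an AverageValue-style target: ceil(metric / target)."""
    return math.ceil(backlog / target_per_replica)


class ScaleDownStabilizer:
    """Approximates HPA scale-down stabilization: act on the highest
    recommendation made within the last `window_s` seconds."""

    def __init__(self, window_s: int):
        self.window_s = window_s
        self.history: deque = deque()  # (time, recommendation) pairs, oldest first

    def decide(self, now: int, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Forget recommendations older than the window.
        while self.history[0][0] < now - self.window_s:
            self.history.popleft()
        # A higher new recommendation wins immediately (scale up);
        # a lower one only wins after older, higher ones age out (scale down).
        return max(rec for _, rec in self.history)


# Hypothetical run: target of 10 backlog tasks per replica, a 60s window,
# and one evaluation every 15s while the backlog drops from 100 to 20.
stabilizer = ScaleDownStabilizer(window_s=60)
backlogs = [100, 100, 20, 20, 20, 20, 20]
decisions = [stabilizer.decide(15 * i, raw_recommendation(b, 10))
             for i, b in enumerate(backlogs)]
print(decisions)  # [10, 10, 10, 10, 10, 10, 2]
```

The replica count stays at 10 for a full window after the backlog drops. This is the trade-off the window buys: fewer flapping scale-downs, at the cost of slower release of idle workers.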
Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
d1e2d02 to a80374f
carlydf
commented
Jun 22, 2026
carlydf
left a comment
Collaborator
Author
approved! (I can't actually approve because this is technically my PR)
Adds documentation outlining the tradeoffs between two autoscaling solutions:
Documentation focuses on straightforward descriptions of the pros and cons of each solution.