loadgen: batched pacer to remove the RPS ramp ceiling + actionable shortfall diagnosis by hmchangw · Pull Request #270 · hmchangw/chat

hmchangw · 2026-06-03T14:27:57Z

The open-loop generators drove load with a single time.Ticker at 1s/rate.
At high rates that interval goes sub-millisecond, which the Go runtime can't
honor — the ticker silently coalesces ticks (buffer-of-1) and one event is
released per delivered tick. Achieved RPS plateaued at a few thousand
regardless of --steps, so most max-rps ramps reported INCONCLUSIVE
("load box limited") well below the real service capacity.

Fix A — batched pacer (pacer.go): tick on a coarse, reliably-schedulable
interval (clamped to a floor) and release rate*interval events per tick,
driven by wall-clock elapsed so the long-run rate stays exact. Wired into
both Generator.runPaced and HistoryGenerator.runPaced; MaxInFlight=0 keeps
the legacy serial path for bisection.

Fix E — make INCONCLUSIVE actionable: the pacer reports events it could not
release on schedule as "emit underrun", recorded distinctly from pool
"saturation". evaluateRPSStep now names the dominant load-box limit in the
shortfall reason — emit underrun (add CPU/shards, lower per-box rate) vs
saturation (raise MaxInFlight) — and both surface as CSV columns.

TDD: pacer math, verdict shortfall attribution, collector/build-inputs
plumbing, and a relative serial-vs-paced throughput regression (race-safe).

https://claude.ai/code/session_01JTVrS6qxLaYjA1S837L8Dp

Summary by CodeRabbit

New Features
- Batched rate pacer to improve high-rate throughput and bounded concurrent dispatch.
- Emit underrun and saturation metrics surfaced and used in reporting.
Documentation
- README updated with rate-pacing behavior and explanations of emit_underrun and saturation CSV signals.
Tests
- Added tests covering pacer behavior, underrun accounting, reporting, and shortfall diagnosis.

coderabbitai · 2026-06-03T14:30:35Z

Warning

Review limit reached

@hmchangw, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 34 minutes and 44 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e0d149d-506d-49ea-8dc2-01adf361524e

📥 Commits

Reviewing files that changed from the base of the PR and between 17dac7c and 3ff7f44.

📒 Files selected for processing (17)

tools/loadgen/README.md
tools/loadgen/generator.go
tools/loadgen/generator_test.go
tools/loadgen/history_collector.go
tools/loadgen/history_generator.go
tools/loadgen/main.go
tools/loadgen/main_test.go
tools/loadgen/maxrps_history.go
tools/loadgen/maxrps_history_test.go
tools/loadgen/maxrps_messages.go
tools/loadgen/maxrps_messages_test.go
tools/loadgen/maxrps_report.go
tools/loadgen/maxrps_report_test.go
tools/loadgen/pacer.go
tools/loadgen/pacer_test.go
tools/loadgen/verdict.go
tools/loadgen/verdict_test.go

📝 Walkthrough

Walkthrough

This PR adds rate-based pacing to the loadgen tool, replacing single-ticker emission with a pacer that computes per-tick event counts. Generator and HistoryGenerator are split into serial (MaxInFlight ≤ 0) and paced (MaxInFlight > 0) execution paths. Underrun (when pacer target exceeds load capacity) and saturation (when worker pool is full) are tracked as load-box signals, propagated through verdict evaluation, and exported in CSV output.

Changes

Load generator pacing and underrun metrics

Layer / File(s)	Summary
Pacer rate-limiting algorithm `tools/loadgen/pacer.go`, `tools/loadgen/pacer_test.go`	New pacer struct calculates per-tick event emission based on target rate, tracks underrun when target exceeds capacity, and handles burst capping and fractional remainder accumulation. Tests validate interval clamping, batching, steady-state tracking, underrun, and catch-up behavior.
Generator serial and paced execution paths `tools/loadgen/generator.go`, `tools/loadgen/generator_test.go`	Generator.Run now dispatches to runSerial (MaxInFlight ≤ 0, one message per tick) or runPaced (MaxInFlight > 0, batched via pacer). Paced path records underrun and saturation metrics when worker pool exhaustion occurs. Regression test verifies paced path out-dispatches serial ceiling under high rate.
HistoryCollector and HistoryGenerator refactoring `tools/loadgen/history_collector.go`, `tools/loadgen/history_generator.go`	HistoryCollector adds underrun field and thread-safe RecordUnderrun/UnderrunCount methods. HistoryGenerator.Run mirrors Generator with serial and paced paths, delegating to serialDispatch/pacedDispatch and wiring Collector.RecordUnderrun/RecordSaturation.
Metrics gathering and hard publish error counting `tools/loadgen/main.go`, `tools/loadgen/main_test.go`, `tools/loadgen/maxrps_messages.go`, `tools/loadgen/maxrps_messages_test.go`, `tools/loadgen/maxrps_history.go`, `tools/loadgen/maxrps_history_test.go`	Adds hardPublishErrorCount to sum only publish-side hard errors and makes runRun resilient to Gather errors. Includes "underrun" in tracked message reasons and surfaces EmitUnderrun from message/history builders; tests validate filtering and propagation.
Verdict evaluation of underrun and saturation `tools/loadgen/verdict.go`, `tools/loadgen/verdict_test.go`	rpsStepInputs and rpsStepResult gain EmitUnderrun; shortfallReason refactored to differentiate underrun vs saturation with specific diagnostic messages. Tests assert verdict content and fields for both cases.
CSV reporting with emit_underrun column `tools/loadgen/maxrps_report.go`, `tools/loadgen/maxrps_report_test.go`	emit_underrun column added to CSV output after saturation. Row construction and header assertions updated to include r.EmitUnderrun.
Documentation of rate-pacing signals `tools/loadgen/README.md`	New section documents emit_underrun and saturation load-box signals and explains rate-pacing behavior vs single-ticker serial mode.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

hmchangw/chat#117: Related earlier work on worker-pool dispatch and reason="saturated" publish-error handling.
hmchangw/chat#230: Related changes to HistoryCollector and its metrics plumbing.
hmchangw/chat#240: Related SLO/evaluation changes that consume emit_underrun/verdict inputs.

Suggested labels

ready

Suggested reviewers

mliu33
Joey0538

Poem

🐰 I hop the tick and count each run,
A pacer’s math beneath the sun.
When bursts fall short it cries “underrun,”
When pools are full the lights will stun.
Batch and beat — the metrics hum.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the main changes: introducing a batched pacer to overcome RPS ceiling limitations and adding diagnostic information for rate shortfalls.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/loadgen-rps-optimization-Jr2JW

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

tools/loadgen/maxrps_report_test.go (1)
58-59: ⚡ Quick win

Assert exact CSV header order, not just presence.

These checks only confirm columns exist. Since column order is part of the report contract, assert the full header string (or parsed slice equality) to prevent silent ordering regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/loadgen/maxrps_report_test.go` around lines 58 - 59, Replace the loose
presence checks on the CSV header (the two assert.Contains calls that inspect
lines[0]) with an exact-order assertion: either compare lines[0] to the expected
full header string or parse lines[0] into a slice and assert slice equality
against the exact expected column order; update the assertions that reference
lines and the assert package accordingly so the test enforces header order
rather than just presence.
tools/loadgen/verdict_test.go (1)
194-226: ⚡ Quick win

Add a tie-case shortfall test for equal underrun/saturation counts.

The new tests cover both dominant paths, but not the boundary where EmitUnderrun == Saturation > 0. Adding this case will lock in expected attribution behavior and prevent regressions in the main diagnosis path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/loadgen/verdict_test.go` around lines 194 - 226, Add a test that covers
the tie case where EmitUnderrun == Saturation > 0: create an rpsStepInputs with
TargetRPS/AttemptedOps similar to existing tests, set EmitUnderrun and
Saturation to the same non-zero value (e.g., 250), and call evaluateRPSStep;
assert the returned verdict.Kind is verdictInconclusive and assert the diagnosis
consistently attributes the shortfall to the chosen path (for example assert
Reasons[0] contains "underrun" and "shard" and that got.EmitUnderrun equals the
tie value) so the behavior is locked in and will fail if attribution changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/loadgen/generator.go`:
- Around line 91-94: The current logic treats any MaxInFlight <= 0 as serial;
instead reject negative values and reserve serial only for exactly 0: in
Generator.Run check g.cfg.MaxInFlight and if it's negative return an error
(don’t fall back to runSerial), if it equals 0 call runSerial(ctx), otherwise
call runPaced(ctx); apply the same change to HistoryGenerator.Run which mirrors
this logic so both return a clear error for negative MaxInFlight rather than
silently switching modes.

In `@tools/loadgen/history_generator.go`:
- Around line 229-232: The current branch treats any non-positive
g.cfg.MaxInFlight as serial, which hides invalid negative settings; change the
logic to explicitly validate g.cfg.MaxInFlight: if g.cfg.MaxInFlight < 0 return
an error (fail fast) describing the invalid concurrency value, if == 0 call
g.runSerial(ctx) and otherwise call g.runPaced(ctx); update the calling site or
signature to propagate the error like Generator.Run does so invalid negatives
are rejected rather than silently using runSerial.

In `@tools/loadgen/verdict.go`:
- Around line 175-183: The switch that returns fmt.Sprintf(...) misattributes
when in.EmitUnderrun == in.Saturation and both are non-zero, because it falls
through to the saturation branch; add a dedicated equality case before the
existing comparisons that checks in.EmitUnderrun == in.Saturation &&
in.Saturation > 0 and returns a neutral/tie message (using base) without
suggesting a single dominant remediation, then keep the current branches for
in.EmitUnderrun > in.Saturation and in.Saturation > 0 as-is.

---

Nitpick comments:
In `@tools/loadgen/maxrps_report_test.go`:
- Around line 58-59: Replace the loose presence checks on the CSV header (the
two assert.Contains calls that inspect lines[0]) with an exact-order assertion:
either compare lines[0] to the expected full header string or parse lines[0]
into a slice and assert slice equality against the exact expected column order;
update the assertions that reference lines and the assert package accordingly so
the test enforces header order rather than just presence.

In `@tools/loadgen/verdict_test.go`:
- Around line 194-226: Add a test that covers the tie case where EmitUnderrun ==
Saturation > 0: create an rpsStepInputs with TargetRPS/AttemptedOps similar to
existing tests, set EmitUnderrun and Saturation to the same non-zero value
(e.g., 250), and call evaluateRPSStep; assert the returned verdict.Kind is
verdictInconclusive and assert the diagnosis consistently attributes the
shortfall to the chosen path (for example assert Reasons[0] contains "underrun"
and "shard" and that got.EmitUnderrun equals the tie value) so the behavior is
locked in and will fail if attribution changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 39d1a90d-cba4-4485-b268-b47adf879d57

📥 Commits

Reviewing files that changed from the base of the PR and between 1c68ea2 and fc550c5.

📒 Files selected for processing (15)

tools/loadgen/README.md
tools/loadgen/generator.go
tools/loadgen/generator_test.go
tools/loadgen/history_collector.go
tools/loadgen/history_generator.go
tools/loadgen/maxrps_history.go
tools/loadgen/maxrps_history_test.go
tools/loadgen/maxrps_messages.go
tools/loadgen/maxrps_messages_test.go
tools/loadgen/maxrps_report.go
tools/loadgen/maxrps_report_test.go
tools/loadgen/pacer.go
tools/loadgen/pacer_test.go
tools/loadgen/verdict.go
tools/loadgen/verdict_test.go

coderabbitai · 2026-06-03T14:37:22Z

+	if g.cfg.MaxInFlight <= 0 {
+		return g.runSerial(ctx)
+	}
+	return g.runPaced(ctx)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject negative MaxInFlight instead of silently switching to serial mode.

MaxInFlight is the concurrency knob for paced runs, but <= 0 currently routes invalid negative values into the legacy bisection path. That will make a misconfigured run look artificially slow and skew the new underrun/saturation diagnostics instead of failing fast. Reserve serial mode for exactly 0 and reject negatives here (and in HistoryGenerator.Run, which mirrors this logic).

Suggested fix

func (g *Generator) Run(ctx context.Context) error { if g.cfg.Rate <= 0 { return fmt.Errorf("rate must be > 0") } - if g.cfg.MaxInFlight <= 0 { + if g.cfg.MaxInFlight < 0 { + return fmt.Errorf("max in flight must be >= 0") + } + if g.cfg.MaxInFlight == 0 { return g.runSerial(ctx) } return g.runPaced(ctx) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if g.cfg.MaxInFlight <= 0 {

return g.runSerial(ctx)

}

return g.runPaced(ctx)

if g.cfg.MaxInFlight < 0 {

return fmt.Errorf("max in flight must be >= 0")

}

if g.cfg.MaxInFlight == 0 {

return g.runSerial(ctx)

}

return g.runPaced(ctx)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/generator.go` around lines 91 - 94, The current logic treats any MaxInFlight <= 0 as serial; instead reject negative values and reserve serial only for exactly 0: in Generator.Run check g.cfg.MaxInFlight and if it's negative return an error (don’t fall back to runSerial), if it equals 0 call runSerial(ctx), otherwise call runPaced(ctx); apply the same change to HistoryGenerator.Run which mirrors this logic so both return a clear error for negative MaxInFlight rather than silently switching modes.

coderabbitai · 2026-06-03T14:37:22Z

+	if g.cfg.MaxInFlight <= 0 {
+		return g.runSerial(ctx)
+	}
+	return g.runPaced(ctx)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on negative MaxInFlight here as well.

This path has the same problem as Generator.Run: <= 0 quietly turns an invalid negative concurrency setting into the legacy serial mode. For a benchmarking tool that changes run semantics materially, that should be a validation error, not an implicit fallback.

Suggested fix

func (g *HistoryGenerator) Run(ctx context.Context) error { if g.cfg.Rate <= 0 { return fmt.Errorf("rate must be > 0") } - if g.cfg.MaxInFlight <= 0 { + if g.cfg.MaxInFlight < 0 { + return fmt.Errorf("max in flight must be >= 0") + } + if g.cfg.MaxInFlight == 0 { return g.runSerial(ctx) } return g.runPaced(ctx) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if g.cfg.MaxInFlight <= 0 {

return g.runSerial(ctx)

}

return g.runPaced(ctx)

func (g *HistoryGenerator) Run(ctx context.Context) error {

if g.cfg.Rate <= 0 {

return fmt.Errorf("rate must be > 0")

}

if g.cfg.MaxInFlight < 0 {

return fmt.Errorf("max in flight must be >= 0")

}

if g.cfg.MaxInFlight == 0 {

return g.runSerial(ctx)

}

return g.runPaced(ctx)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/history_generator.go` around lines 229 - 232, The current branch treats any non-positive g.cfg.MaxInFlight as serial, which hides invalid negative settings; change the logic to explicitly validate g.cfg.MaxInFlight: if g.cfg.MaxInFlight < 0 return an error (fail fast) describing the invalid concurrency value, if == 0 call g.runSerial(ctx) and otherwise call g.runPaced(ctx); update the calling site or signature to propagate the error like Generator.Run does so invalid negatives are rejected rather than silently using runSerial.

coderabbitai · 2026-06-03T14:37:23Z

+	switch {
+	case in.EmitUnderrun > in.Saturation:
+		return fmt.Sprintf("%s — load box could not emit on schedule (emit underrun=%d, saturation=%d); "+
+			"reduce per-box rate, add load shards, or give the load box more CPU",
+			base, in.EmitUnderrun, in.Saturation)
+	case in.Saturation > 0:
+		return fmt.Sprintf("%s — in-flight pool saturated (saturation=%d, emit underrun=%d); "+
+			"raise MaxInFlight (and/or reduce backend latency)",
+			base, in.Saturation, in.EmitUnderrun)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle tie case explicitly to avoid misleading “dominant” attribution.

When EmitUnderrun == Saturation and both are non-zero, the current switch falls into the saturation branch, even though neither signal dominates. This can misdirect remediation guidance in INCONCLUSIVE diagnostics.

Suggested fix

switch { case in.EmitUnderrun > in.Saturation: return fmt.Sprintf("%s — load box could not emit on schedule (emit underrun=%d, saturation=%d); "+ "reduce per-box rate, add load shards, or give the load box more CPU", base, in.EmitUnderrun, in.Saturation) +case in.EmitUnderrun > 0 && in.EmitUnderrun == in.Saturation: + return fmt.Sprintf("%s — load box limited by both emit scheduling and in-flight capacity equally "+ + "(emit underrun=%d, saturation=%d); reduce per-box rate/add shards/CPU and raise MaxInFlight as needed", + base, in.EmitUnderrun, in.Saturation) case in.Saturation > 0: return fmt.Sprintf("%s — in-flight pool saturated (saturation=%d, emit underrun=%d); "+ "raise MaxInFlight (and/or reduce backend latency)", base, in.Saturation, in.EmitUnderrun)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

switch {

case in.EmitUnderrun > in.Saturation:

return fmt.Sprintf("%s — load box could not emit on schedule (emit underrun=%d, saturation=%d); "+

"reduce per-box rate, add load shards, or give the load box more CPU",

base, in.EmitUnderrun, in.Saturation)

case in.Saturation > 0:

return fmt.Sprintf("%s — in-flight pool saturated (saturation=%d, emit underrun=%d); "+

"raise MaxInFlight (and/or reduce backend latency)",

base, in.Saturation, in.EmitUnderrun)

switch {

case in.EmitUnderrun > in.Saturation:

return fmt.Sprintf("%s — load box could not emit on schedule (emit underrun=%d, saturation=%d); "+

"reduce per-box rate, add load shards, or give the load box more CPU",

base, in.EmitUnderrun, in.Saturation)

case in.EmitUnderrun > 0 && in.EmitUnderrun == in.Saturation:

return fmt.Sprintf("%s — load box limited by both emit scheduling and in-flight capacity equally "+

"(emit underrun=%d, saturation=%d); reduce per-box rate/add shards/CPU and raise MaxInFlight as needed",

base, in.EmitUnderrun, in.Saturation)

case in.Saturation > 0:

return fmt.Sprintf("%s — in-flight pool saturated (saturation=%d, emit underrun=%d); "+

"raise MaxInFlight (and/or reduce backend latency)",

base, in.Saturation, in.EmitUnderrun)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/verdict.go` around lines 175 - 183, The switch that returns fmt.Sprintf(...) misattributes when in.EmitUnderrun == in.Saturation and both are non-zero, because it falls through to the saturation branch; add a dedicated equality case before the existing comparisons that checks in.EmitUnderrun == in.Saturation && in.Saturation > 0 and returns a neutral/tie message (using base) without suggesting a single dominant remediation, then keep the current branches for in.EmitUnderrun > in.Saturation and in.Saturation > 0 as-is.

coderabbitai

🧹 Nitpick comments (1)

tools/loadgen/main_test.go (1)
20-37: ⚡ Quick win

Consider adding edge case tests for nil and empty inputs.

The test validates the happy path well, but the coding guidelines require edge case coverage. Adding test cases for hardPublishErrorCount(nil) and hardPublishErrorCount([]*dto.MetricFamily{}) would document the function's safe handling of these inputs.
📋 Proposed edge case tests
+func TestHardPublishErrorCount_NilInput(t *testing.T) {
+	assert.Equal(t, 0, hardPublishErrorCount(nil))
+}
+
+func TestHardPublishErrorCount_EmptyMetrics(t *testing.T) {
+	assert.Equal(t, 0, hardPublishErrorCount([]*dto.MetricFamily{}))
+}
As per coding guidelines: "Test coverage must include: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/loadgen/main_test.go` around lines 20 - 37, Add edge-case assertions to
TestHardPublishErrorCount_ExcludesSelfLimitAndGatekeeper: call
hardPublishErrorCount(nil) and hardPublishErrorCount([]*dto.MetricFamily{}) and
assert both return 0; place these two checks alongside the existing happy-path
assertion in the same test (or as subtests) so the function
hardPublishErrorCount is validated for nil and empty-slice inputs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tools/loadgen/main_test.go`:
- Around line 20-37: Add edge-case assertions to
TestHardPublishErrorCount_ExcludesSelfLimitAndGatekeeper: call
hardPublishErrorCount(nil) and hardPublishErrorCount([]*dto.MetricFamily{}) and
assert both return 0; place these two checks alongside the existing happy-path
assertion in the same test (or as subtests) so the function
hardPublishErrorCount is validated for nil and empty-slice inputs.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 535ca8c7-e77c-41da-a079-bb0c2e52eb22

📥 Commits

Reviewing files that changed from the base of the PR and between fc550c5 and 17dac7c.

📒 Files selected for processing (17)

tools/loadgen/README.md
tools/loadgen/generator.go
tools/loadgen/generator_test.go
tools/loadgen/history_collector.go
tools/loadgen/history_generator.go
tools/loadgen/main.go
tools/loadgen/main_test.go
tools/loadgen/maxrps_history.go
tools/loadgen/maxrps_history_test.go
tools/loadgen/maxrps_messages.go
tools/loadgen/maxrps_messages_test.go
tools/loadgen/maxrps_report.go
tools/loadgen/maxrps_report_test.go
tools/loadgen/pacer.go
tools/loadgen/pacer_test.go
tools/loadgen/verdict.go
tools/loadgen/verdict_test.go

✅ Files skipped from review due to trivial changes (2)

tools/loadgen/README.md
tools/loadgen/maxrps_report.go

🚧 Files skipped from review as they are similar to previous changes (10)

tools/loadgen/pacer.go
tools/loadgen/history_collector.go
tools/loadgen/maxrps_messages_test.go
tools/loadgen/maxrps_history_test.go
tools/loadgen/maxrps_history.go
tools/loadgen/maxrps_messages.go
tools/loadgen/verdict.go
tools/loadgen/verdict_test.go
tools/loadgen/generator_test.go
tools/loadgen/pacer_test.go

mliu33

Thanks!

…ortfall diagnosis The open-loop generators drove load with a single time.Ticker at 1s/rate. At high rates that interval goes sub-millisecond, which the Go runtime can't honor — the ticker silently coalesces ticks (buffer-of-1) and one event is released per delivered tick. Achieved RPS plateaued at a few thousand regardless of --steps, so most max-rps ramps reported INCONCLUSIVE ("load box limited") well below the real service capacity. Fix A — batched pacer (pacer.go): tick on a coarse, reliably-schedulable interval (clamped to a floor) and release rate*interval events per tick, driven by wall-clock elapsed so the long-run rate stays exact. Wired into both Generator.runPaced and HistoryGenerator.runPaced; MaxInFlight=0 keeps the legacy serial path for bisection. Fix E — make INCONCLUSIVE actionable: the pacer reports events it could not release on schedule as "emit underrun", recorded distinctly from pool "saturation". evaluateRPSStep now names the dominant load-box limit in the shortfall reason — emit underrun (add CPU/shards, lower per-box rate) vs saturation (raise MaxInFlight) — and both surface as CSV columns. TDD: pacer math, verdict shortfall attribution, collector/build-inputs plumbing, and a relative serial-vs-paced throughput regression (race-safe). https://claude.ai/code/session_01JTVrS6qxLaYjA1S837L8Dp

…ned-run summary The `loadgen run` summary computed PublishErrors as "sum of all loadgen_publish_errors_total series minus gatekeeper". That bucket includes the load-box self-limit reasons, which are pacing diagnostics rather than publish failures: "saturated" (pre-existing) and "underrun" (added with the batched pacer). A plain sustained run therefore over-reported publish errors by however much the box saturated the in-flight pool or fell behind schedule. Replace the fragile sum-all-minus-gatekeeper pattern with an explicit enumeration of the real error reasons (publish, marshal, bad_reply), mirroring buildMessagesInputs. This fixes the new underrun miscount and the pre-existing saturated miscount, and won't silently break when future non-error reasons are added. https://claude.ai/code/session_01JTVrS6qxLaYjA1S837L8Dp

…allReason args Cleanup pass — no behavior change. - Extract the batched-pacer worker-pool loop and the legacy serial-ticker loop into shared serialDispatch/pacedDispatch helpers in pacer.go. Both Generator and HistoryGenerator now delegate via thin adapters, removing ~40 lines of byte-identical loop duplication that differed only in the dispatched method and the underrun sink (Prometheus label vs HistoryCollector tally). - shortfallReason no longer takes *rpsStepInputs: every field it reads (TargetRPS, AchievedRPS, Saturation, EmitUnderrun) is already on the *rpsStepResult it was passed, so the redundant input arg is dropped. https://claude.ai/code/session_01JTVrS6qxLaYjA1S837L8Dp

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

hmchangw force-pushed the claude/loadgen-rps-optimization-Jr2JW branch from 3dd88cd to 17dac7c Compare June 4, 2026 05:06

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

mliu33 approved these changes Jun 4, 2026

View reviewed changes

claude added 3 commits June 4, 2026 06:40

hmchangw force-pushed the claude/loadgen-rps-optimization-Jr2JW branch from 17dac7c to 3ff7f44 Compare June 4, 2026 06:45

hmchangw merged commit de844af into main Jun 4, 2026
6 checks passed

hmchangw deleted the claude/loadgen-rps-optimization-Jr2JW branch June 4, 2026 07:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

loadgen: batched pacer to remove the RPS ramp ceiling + actionable shortfall diagnosis#270

loadgen: batched pacer to remove the RPS ramp ceiling + actionable shortfall diagnosis#270
hmchangw merged 3 commits into
mainfrom
claude/loadgen-rps-optimization-Jr2JW

hmchangw commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

mliu33 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-	if g.cfg.MaxInFlight <= 0 {
-		return g.runSerial(ctx)
-	}
-	return g.runPaced(ctx)
+func (g *HistoryGenerator) Run(ctx context.Context) error {
+	if g.cfg.Rate <= 0 {
+		return fmt.Errorf("rate must be > 0")
+	}
+	if g.cfg.MaxInFlight < 0 {
+		return fmt.Errorf("max in flight must be >= 0")
+	}
+	if g.cfg.MaxInFlight == 0 {
+		return g.runSerial(ctx)
+	}
+	return g.runPaced(ctx)
+}

Conversation

hmchangw commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

mliu33 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hmchangw commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading