feat(loadgen): add read-receipt workload to max-rps by hmchangw · Pull Request #264 · hmchangw/chat

hmchangw · 2026-06-02T11:20:43Z

Summary

Adds a read-receipt workload to the loadgen max-rps ramp command. It drives the room-service read-receipt RPC (chat.user.{account}.request.room.{roomID}.{siteID}.message.read-receipt) — a synchronous request/reply read ("who has read message X") — at increasing RPS steps to find the maximum sustainable rate under the latency/error SLOs.

Mirrors the existing history workload (synchronous read, no JetStream consumer): plugs into the rpsWorkload interface and reuses the ramp engine, verdict logic, and report rendering unchanged.
Reuses the history seed and presets. Targets are derived from the message plan (top-level messages only); the requester is each message's sender, satisfying the handler's msgSender == requesterAccount guard.
New seed --workload=read-receipt --read-ratio=0.7 step stamps lastSeenAt on a configurable fraction of each room's subscribers so ListReadReceipts exercises its real $match/$lookup/$unwind path instead of short-circuiting on an empty match.
--slo-pending-growth is ignored (no consumer); --request-timeout sets the per-request timeout.

New files: readreceipt_collector.go, readreceipt_requester.go, readreceipt_generator.go, maxrps_readreceipt.go, readreceipt_seed.go (+ unit tests and an integration test). Wiring in maxrps.go, main.go, history_main.go. Docs in tools/loadgen/README.md. Design + plan under docs/superpowers/.

Test Plan

make test SERVICE=tools/loadgen — unit suite passes (race detector)
make lint — 0 issues
make fmt — no changes
gosec — PASS (govulncheck/semgrep run in CI; blocked locally by network policy)
make test-integration SERVICE=tools/loadgen — TestReadReceiptWorkload_EndToEnd (compiles + go vet -tags integration clean; runs in CI — Docker unavailable in the dev environment)
Manual: loadgen seed --workload=read-receipt --preset=history-medium --read-ratio=0.7 then loadgen max-rps --workload=read-receipt --preset=history-medium --steps=200,500,1k,2k,5k

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

Generated by Claude Code

Summary by CodeRabbit

New Features
- Added read-receipt workload option to loadgen max-rps for capacity testing
- Introduced --read-ratio flag to configure the fraction of subscribers marked as readers during workload preparation
- Added read-receipt seeding support via loadgen seed command
Documentation
- Updated loadgen README with read-receipt workload usage, examples, and CLI reference
- Added design specification and implementation plan documentation

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

Moves the production natsReadReceiptRequester impl out of the interface file (which caused unused-lint failures) to align with the history pattern where the concrete NATS requester lives in the main wiring file. https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

Wires the readReceiptWorkload into the max-rps ramp engine: restores the production natsReadReceiptRequester, adds the read-receipt case to runMaxRPS, and introduces buildReadReceiptInputs/runReadReceiptFor. https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

Implements selectReaders (ceil-based uniform random selection) and latestTopLevelByRoom helpers, then composes them in SeedReadReceiptState to stamp lastSeenAt on a configurable fraction of subscribers per room. Wires a new "read-receipt" workload into the seed subcommand via runSeedReadReceipt, which seeds the full history fixture set (Mongo + Cassandra) before stamping reader state. https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

coderabbitai · 2026-06-02T11:20:55Z

📝 Walkthrough

Walkthrough

This pull request implements a complete read-receipt workload for the loadgen max-rps capacity-testing tool. It adds seeding logic to mark subscribers as readers in MongoDB, a generator to issue concurrent RPC requests at target rate, and CLI wiring to integrate the workload into existing seed and max-rps commands within the ramp/verdict/report pipeline.

Changes

Read-receipt max-rps workload implementation

Layer / File(s)	Summary
Design and planning documentation `docs/superpowers/plans/2026-06-02-loadgen-read-receipt-maxrps.md`, `docs/superpowers/specs/2026-06-02-loadgen-read-receipt-maxrps-design.md`	Two new design documents specify the workload purpose (drive read-receipt RPC at increasing RPS and report max sustainable rate under SLOs), architecture (reuse ramp/verdict/report pipeline), seeding strategy (stamp `lastSeenAt` on configurable subscriber fraction), component breakdown (collector, requester, generator, workload adapter, seed helpers), testing plan (unit and integration tests), and CLI wiring changes.
Collector and requester transport `tools/loadgen/readreceipt_collector.go`, `tools/loadgen/readreceipt_collector_test.go`, `tools/loadgen/readreceipt_requester.go`	`ReadReceiptCollector` provides thread-safe latency/error/saturation recording via `RecordSample`, `RecordError`, and `RecordSaturation` methods; defensive-copy accessors return latencies, failed counts, and saturation ticks. `ReadReceiptRequester` interface and `natsReadReceiptRequester` implementation issue synchronous request/reply RPCs with per-call timeouts, wrapping NATS errors contextually.
Read-receipt seeding logic `tools/loadgen/readreceipt_seed.go`, `tools/loadgen/readreceipt_seed_test.go`	`SeedReadReceiptState` function deterministically selects `ceil(readRatio * n)` subscribers per room using seeded RNG permutation, derives the latest top-level message timestamp per room (excluding threaded replies), and stamps `lastSeenAt` in MongoDB via `subscriptions.UpdateMany`. Tests validate selection ratio, determinism, and edge cases; `latestTopLevelByRoom` correctness on mixed thread/top-level messages.
Read-receipt generator for concurrent RPC dispatch `tools/loadgen/readreceipt_generator.go`, `tools/loadgen/readreceipt_generator_test.go`	`ReadReceiptGenerator` issues requests at configured rate in open-loop pattern, deriving targets from top-level messages only. Optional semaphore-based concurrency bounding (with `MaxInFlight`) records saturation when capacity exhausted. Each request classifies failures (request errors, bad replies, embedded errors) and records latencies on success using collector. Tests cover target derivation, rate validation, error classification, and request issuance.
Read-receipt workload adapter and step execution `tools/loadgen/maxrps_readreceipt.go`, `tools/loadgen/maxrps_readreceipt_test.go`, `tools/loadgen/readreceipt_integration_test.go`	`readReceiptWorkload` implements `rpsWorkload` interface: `RunStep` executes warmup (discarded) and hold (measured) phases sequentially via generator runs, drains trailing replies, and returns normalized `rpsStepInputs` with latency samples, operation counts, and saturation. `newReadReceiptWorkload` wires NATS, metrics HTTP server, and requester; cleanup shuts down server and drains NATS. Unit test validates input field population from collector state; integration test seeds MongoDB, stubs NATS responder, runs generator, and asserts seeded count and non-zero samples with zero failures.
Seed command CLI integration `tools/loadgen/history_main.go`, `tools/loadgen/main.go`	Adds `--read-ratio` flag (default 0.7) to `loadgen seed`; extends workload dispatcher with `read-receipt` case that calls `runSeedReadReceipt`. `runSeedReadReceipt` validates `readRatio` range, loads history preset, connects to Mongo/Cassandra, builds/seeds history fixtures, then calls `SeedReadReceiptState`, returning exit code on success/error.
Max-rps command CLI integration `tools/loadgen/maxrps.go`	Expands `--workload` flag to include `read-receipt`; routes both `history` and `read-receipt` through same default step sequence. Adds `read-receipt` case in `runMaxRPS` that validates preset/timeout, constructs `readReceiptWorkload` via `newReadReceiptWorkload`, and wires cleanup. Updates `--request-timeout` help to apply to both `history` and `read-receipt`.
README documentation `tools/loadgen/README.md`	Extends `max-rps` synopsis and quick-start examples to include `read-receipt` workload with preparatory `loadgen seed --read-ratio` step. Updates `--workload` flag list and `--request-timeout` notes to reference `read-receipt`. Adds dedicated "Read-receipt workload" section documenting RPC target, seeding prerequisites, timeout applicability, and teardown mapping.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The PR combines heterogeneous new components (collector, requester, generator, seeding helpers, workload adapter) across multiple files with logic density in the generator concurrency bounding and seeding selection. However, each component is self-contained with unit tests, and the architecture mirrors the existing history workload pattern. The integration points are straightforward wiring into existing CLI dispatch tables and the rpsWorkload interface.

Possibly related PRs

hmchangw/chat#240: Extends the existing max-rps ramp/SLO framework introduced in PR #240 by adding the new read-receipt workload into the same verdict/report pipeline.
hmchangw/chat#167: Depends on the room-service read-receipt RPC interface (subjects, request/response types, reader list behavior) introduced in PR #167.
hmchangw/chat#117: Extends the loadgen CLI seed/workload dispatch structure established in PR #117 with the new read-receipt seed and max-rps paths.

Suggested labels

ready

Suggested reviewers

mliu33
Joey0538

Poem

🐰 A workload born to test the load,
Read-receipts marching down the road,
With collectors, seeds, and rates so neat,
The max-rps pipeline stays complete!
hops off with a scroll 📜

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 38.71% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding a new read-receipt workload to the loadgen max-rps command, which is the primary objective of this PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/magical-dirac-1ytWo

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/plans/2026-06-02-loadgen-read-receipt-maxrps.md`:
- Around line 700-713: The RunStep method in readReceiptWorkload uses time.Sleep
to "drain" in-flight replies; replace this with explicit synchronization by
having the generator created by w.newGenerator provide a deterministic
completion signal (for example a done channel or sync.WaitGroup) and waiting on
that signal instead of time.Sleep. Concretely: update
w.newGenerator/runReadReceiptFor contract so runReadReceiptFor returns (or
exposes) a completion indicator tied to the generator instance (e.g., a done
chan struct{} or *sync.WaitGroup) that is closed/marked when all reply
goroutines are finished; then in RunStep (the function shown) remove
time.Sleep(2 * time.Second) and wait on that completion signal before calling
buildReadReceiptInputs with the collector; ensure NewReadReceiptCollector and
collector usage remain unchanged but are only read after the generator signals
completion.

In `@docs/superpowers/specs/2026-06-02-loadgen-read-receipt-maxrps-design.md`:
- Around line 125-128: The RunStep description currently tells implementers to
"sleep briefly to drain trailing in-flight replies" — replace that with explicit
synchronization: when mirroring historyWorkload.RunStep (functions/methods
referenced: RunStep, buildReadReceiptInputs, warmup, hold, collector) require
the warmup and hold generators to run as distinct goroutines that signal
completion via channels or a sync.WaitGroup and ensure the measured collector
drains by waiting for a done signal from the generator/collector rather than
sleeping; document that the throwaway collector used for warmup discards samples
and does not block the synchronization, and specify the exact signal (e.g.,
generator done channel or WaitGroup) that callers must wait on before calling
buildReadReceiptInputs(targetRPS, hold, collector).
- Around line 93-95: Update the spec to reflect the actual CLI contract: replace
references to the non-existent `seed-read-receipt` subcommand with the real
invocation `loadgen seed --workload read-receipt` and show the same parameters
(`--preset`, `--seed`, `--read-ratio`) passed to `loadgen seed`; specifically
edit the sections that describe the command (currently mentioning
`seed-read-receipt`) so they present the correct command shape (e.g., `loadgen
seed --workload read-receipt --preset ... --seed ... --read-ratio ...`) and any
examples or docs that repeat `seed-read-receipt` to avoid operator confusion.
- Line 77: Remove the dangling Reset() references from the collector spec:
update the descriptions for the readreceipt_collector and history_collector to
omit "Reset()"-able and instead state the collector provides an in-memory
latency tape, timeout/reply-error/bad-reply/saturation counters and is
thread-safe (mutex); ensure any other occurrences (e.g., the second mention
around line 153) are likewise cleaned up and optionally add a short note that
Reset() will be documented when/if a Reset API is actually introduced.

In `@tools/loadgen/history_main.go`:
- Around line 82-84: The validation for the command-line flag readRatio
incorrectly allows NaN because flag.Float64 parses "NaN" and the current check
(readRatio <= 0 || readRatio > 1) doesn't catch it; update the validation in the
same block that references readRatio (before calling SeedReadReceiptState) to
explicitly reject NaN (use math.IsNaN(readRatio)) and treat it as invalid,
printing the same error message "--read-ratio must be in (0, 1]" and returning
2.

In `@tools/loadgen/maxrps_readreceipt.go`:
- Around line 71-76: In the cleanup closure, don't ignore errors from
srv.Shutdown and nc.Drain; capture each return value (e.g., err :=
srv.Shutdown(shutCtx) and err := nc.Drain()) and log a warning or error with
context (for example "server shutdown failed" and "nats drain failed") including
the error value so teardown failures are visible; keep the existing context
cancel call and ensure logging happens before returning from the cleanup func.
- Around line 112-119: The goroutine currently discards the error returned by
gen.Run(genCtx), causing failures to be hidden; change it to capture the result
(e.g., send the error to a buffered channel or store it in a variable
synchronized with the WaitGroup) and after wg.Wait() check that error and return
it instead of the nil from waitOrCancel; ensure you still call cancel() and
wait, and return the gen.Run error if present (whether it finished before or
after waitOrCancel). Reference: gen.Run, genCtx, waitOrCancel, cancel, wg.

In `@tools/loadgen/README.md`:
- Around line 320-325: The README example is misleading because
runTeardownHistory derives room IDs from the preset+seed, so teardown must be
run with the same --seed used to create the fixtures; update the docs to state
this requirement, mention that failing to pass the original --seed will leave
Valkey room keys behind, and change the example for loadgen teardown
--workload=history --preset=history-medium to include a matching --seed (e.g.,
--seed=1234) and a short note referencing runTeardownHistory so readers know why
the seed must match.

In `@tools/loadgen/readreceipt_integration_test.go`:
- Around line 78-84: Replace the fixed time.Sleep by waiting on a deterministic
signal from the generator/responder: modify the test to create a done channel
(or use a sync.WaitGroup) that the stub responder closes or signals when it has
delivered all trailing replies, then replace time.Sleep(500*time.Millisecond)
with a select that waits on that done channel (or WaitGroup completion) or the
runCtx.Done() for timeout; update the stub responder (or generator) to signal
the done channel when it finishes emitting replies so the test uses
gen.Run(runCtx) plus a synchronous wait on that signal before asserting on
collector.Samples() and collector.Failed().

In `@tools/loadgen/readreceipt_seed.go`:
- Around line 69-77: byRoom is a map so iterating over it uses a
nondeterministic order and consumes the shared RNG (rng) unpredictably; to
restore deterministic behavior for SeedReadReceiptState, collect the room IDs
from byRoom, sort them (e.g., lexicographically), and iterate over the sorted
slice when generating lastSeen and calling selectReaders so the same rng
sequence is consumed in a stable order for every run.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d0270cc3-2bf2-4650-a477-4f67c93bdacb

📥 Commits

Reviewing files that changed from the base of the PR and between 9c5d14a and 7695977.

📒 Files selected for processing (16)

docs/superpowers/plans/2026-06-02-loadgen-read-receipt-maxrps.md
docs/superpowers/specs/2026-06-02-loadgen-read-receipt-maxrps-design.md
tools/loadgen/README.md
tools/loadgen/history_main.go
tools/loadgen/main.go
tools/loadgen/maxrps.go
tools/loadgen/maxrps_readreceipt.go
tools/loadgen/maxrps_readreceipt_test.go
tools/loadgen/readreceipt_collector.go
tools/loadgen/readreceipt_collector_test.go
tools/loadgen/readreceipt_generator.go
tools/loadgen/readreceipt_generator_test.go
tools/loadgen/readreceipt_integration_test.go
tools/loadgen/readreceipt_requester.go
tools/loadgen/readreceipt_seed.go
tools/loadgen/readreceipt_seed_test.go

coderabbitai · 2026-06-02T11:31:19Z

+func (w *readReceiptWorkload) RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error) {
+	if warmup > 0 {
+		warmCollector := NewReadReceiptCollector()
+		if err := runReadReceiptFor(ctx, w.newGenerator(warmCollector, targetRPS), warmup); err != nil {
+			return rpsStepInputs{}, err
+		}
+	}
+	collector := NewReadReceiptCollector()
+	if err := runReadReceiptFor(ctx, w.newGenerator(collector, targetRPS), hold); err != nil {
+		return rpsStepInputs{}, err
+	}
+	time.Sleep(2 * time.Second) // drain trailing in-flight replies into the collector
+	return buildReadReceiptInputs(targetRPS, hold, collector), nil
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace sleep-based draining with explicit synchronization in the prescribed RunStep flow.

Line 711 currently instructs time.Sleep(2 * time.Second) to drain replies. That pattern is brittle and violates the project’s concurrency rule; make the plan require deterministic completion signaling (e.g., wait on generator-owned sync primitives) instead.

As per coding guidelines: "Never use time.Sleep for goroutine synchronization — use proper sync primitives (channels, sync.WaitGroup, sync.Mutex)".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/plans/2026-06-02-loadgen-read-receipt-maxrps.md` around lines 700 - 713, The RunStep method in readReceiptWorkload uses time.Sleep to "drain" in-flight replies; replace this with explicit synchronization by having the generator created by w.newGenerator provide a deterministic completion signal (for example a done channel or sync.WaitGroup) and waiting on that signal instead of time.Sleep. Concretely: update w.newGenerator/runReadReceiptFor contract so runReadReceiptFor returns (or exposes) a completion indicator tied to the generator instance (e.g., a done chan struct{} or *sync.WaitGroup) that is closed/marked when all reply goroutines are finished; then in RunStep (the function shown) remove time.Sleep(2 * time.Second) and wait on that completion signal before calling buildReadReceiptInputs with the collector; ensure NewReadReceiptCollector and collector usage remain unchanged but are only read after the generator signals completion.

coderabbitai · 2026-06-02T11:31:19Z

+|------|---------|----------------|
+| `tools/loadgen/maxrps_readreceipt.go` | `maxrps_history.go` | `readReceiptWorkload` implementing `rpsWorkload`. `newReadReceiptWorkload` wires NATS, the metrics HTTP server, the requester, and derives targets from `BuildHistoryFixtures`. `RunStep` runs warmup (discarded) then hold (measured) and returns `rpsStepInputs`. `Label()` returns `"read-receipt"`. |
+| `tools/loadgen/readreceipt_generator.go` | `history_generator.go` | `ReadReceiptGenerator` with a `Rate` ticker and a `MaxInFlight` semaphore. Each tick picks a random target, issues the request via the requester, and records the result in the collector. Saturation (pool full on tick) is recorded, not dropped silently. |
+| `tools/loadgen/readreceipt_collector.go` | `history_collector.go` | In-memory latency tape plus `timeout`, `reply-error`, `bad-reply`, and `saturation` counters. Thread-safe (mutex), `Reset()`-able. |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove Reset() references from collector behavior unless that API is actually added.

Line 77 and Line 153 describe a Reset()-able collector, but the implemented collector contract does not include Reset(). Align the spec to the real API to avoid misleading test/usage expectations.

Also applies to: 153-153

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-02-loadgen-read-receipt-maxrps-design.md` at line 77, Remove the dangling Reset() references from the collector spec: update the descriptions for the readreceipt_collector and history_collector to omit "Reset()"-able and instead state the collector provides an in-memory latency tape, timeout/reply-error/bad-reply/saturation counters and is thread-safe (mutex); ensure any other occurrences (e.g., the second mention around line 153) are likewise cleaned up and optionally add a short note that Reset() will be documented when/if a Reset API is actually introduced.

coderabbitai · 2026-06-02T11:31:19Z

+- `main.go`: add a `seed-read-receipt` subcommand that runs the full history
+  seed (`Seed`, `SeedRoomKeys`, `SeedThreadRooms`, `SeedHistoryCassandra`) then
+  `SeedReadReceiptState`, parameterized by `--preset`, `--seed`, `--read-ratio`.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the seed command contract: this spec documents a non-existent CLI shape.

Lines 93–95 and Line 101 describe seed-read-receipt, but this PR’s contract is loadgen seed --workload read-receipt .... Keeping the wrong command here will cause immediate operator error.

Also applies to: 101-101

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-02-loadgen-read-receipt-maxrps-design.md` around lines 93 - 95, Update the spec to reflect the actual CLI contract: replace references to the non-existent `seed-read-receipt` subcommand with the real invocation `loadgen seed --workload read-receipt` and show the same parameters (`--preset`, `--seed`, `--read-ratio`) passed to `loadgen seed`; specifically edit the sections that describe the command (currently mentioning `seed-read-receipt`) so they present the correct command shape (e.g., `loadgen seed --workload read-receipt --preset ... --seed ... --read-ratio ...`) and any examples or docs that repeat `seed-read-receipt` to avoid operator confusion.

coderabbitai · 2026-06-02T11:31:19Z

+Mirrors `historyWorkload.RunStep`: run a fresh generator for `warmup` against a
+throwaway collector (samples discarded), then run a fresh generator for `hold`
+against the measured collector, sleep briefly to drain trailing in-flight
+replies, and return `buildReadReceiptInputs(targetRPS, hold, collector)`.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid prescribing sleep-based draining in RunStep behavior.

The “sleep briefly to drain trailing replies” guidance should be replaced with explicit synchronization semantics; sleep-based coordination is nondeterministic and policy-inconsistent.

As per coding guidelines: "Never use time.Sleep for goroutine synchronization — use proper sync primitives (channels, sync.WaitGroup, sync.Mutex)".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-02-loadgen-read-receipt-maxrps-design.md` around lines 125 - 128, The RunStep description currently tells implementers to "sleep briefly to drain trailing in-flight replies" — replace that with explicit synchronization: when mirroring historyWorkload.RunStep (functions/methods referenced: RunStep, buildReadReceiptInputs, warmup, hold, collector) require the warmup and hold generators to run as distinct goroutines that signal completion via channels or a sync.WaitGroup and ensure the measured collector drains by waiting for a done signal from the generator/collector rather than sleeping; document that the throwaway collector used for warmup discards samples and does not block the synchronization, and specify the exact signal (e.g., generator done channel or WaitGroup) that callers must wait on before calling buildReadReceiptInputs(targetRPS, hold, collector).

coderabbitai · 2026-06-02T11:31:19Z

+	if readRatio <= 0 || readRatio > 1 {
+		fmt.Fprintln(os.Stderr, "--read-ratio must be in (0, 1]")
+		return 2


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does Go's flag package accept "NaN" for a float64 flag, and do comparisons like x <= 0 || x > 1 evaluate to false when x is NaN?

💡 Result:

Yes. The standard library flag package’s float64 flag ultimately uses strconv.ParseFloat to parse the flag value, and strconv.ParseFloat recognizes the string "NaN" (case-insensitive) as a NaN float64. [1][2] For comparisons: in Go, any comparison operation (<, <=, >, >=, ==,!=) involving a NaN produces results that follow IEEE 754 semantics: x <= 0 is false when x is NaN, and x > 1 is also false when x is NaN; therefore the boolean expression (x <= 0 || x > 1) evaluates to false when x is NaN. This is explicitly tested by Go’s float comparison test suite (nan <= f, nan > f, and nan <= nan are all false). [3] Note:!= is the only “true” comparison when comparing NaN to anything (including itself), consistent with IEEE 754. [3]

Citations:

1: https://github.com/golang/go/blob/996b9850/src/flag/flag.go

2: https://pkg.go.dev/strconv

3: https://go.googlesource.com/go/+/2af974777d8e771e5c58f172876ac4c6d9fb7256/test/floatcmp.go

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "readRatio|read-ratio|SeedReadReceiptState|runSeedReadReceipt" tools/loadgen/history_main.go tools/loadgen/main.go || true echo "---- history_main.go (around guard) ----" sed -n '60,110p' tools/loadgen/history_main.go echo "---- history_main.go (around runSeedReadReceipt) ----" sed -n '110,190p' tools/loadgen/history_main.go echo "---- main.go (flag wiring) ----" sed -n '1,220p' tools/loadgen/main.go

Repository: hmchangw/chat

Length of output: 12596

Reject NaN for --read-ratio.

flag.Float64 accepts "NaN", and readRatio <= 0 || readRatio > 1 evaluates to false when readRatio is NaN, so an invalid value can reach SeedReadReceiptState.

Suggested fix

- if readRatio <= 0 || readRatio > 1 { + if !(readRatio > 0 && readRatio <= 1) { fmt.Fprintln(os.Stderr, "--read-ratio must be in (0, 1]") return 2 }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if readRatio <= 0 || readRatio > 1 {

fmt.Fprintln(os.Stderr, "--read-ratio must be in (0, 1]")

return 2

if !(readRatio > 0 && readRatio <= 1) {

fmt.Fprintln(os.Stderr, "--read-ratio must be in (0, 1]")

return 2

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/history_main.go` around lines 82 - 84, The validation for the command-line flag readRatio incorrectly allows NaN because flag.Float64 parses "NaN" and the current check (readRatio <= 0 || readRatio > 1) doesn't catch it; update the validation in the same block that references readRatio (before calling SeedReadReceiptState) to explicitly reject NaN (use math.IsNaN(readRatio)) and treat it as invalid, printing the same error message "--read-ratio must be in (0, 1]" and returning 2.

coderabbitai · 2026-06-02T11:31:19Z

+	cleanup := func() {
+		shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		_ = srv.Shutdown(shutCtx)
+		cancel()
+		_ = nc.Drain()
+	}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log teardown failures instead of discarding them.

Lines 73 and 75 silently drop Shutdown/Drain errors, which hides failed listener teardown or NATS drain problems during loadgen cleanup. At least warn with context so the run output reflects partial shutdowns.

💡 Suggested fix

cleanup := func() { shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) - _ = srv.Shutdown(shutCtx) + if err := srv.Shutdown(shutCtx); err != nil { + slog.Warn("shutdown metrics server", "error", err) + } cancel() - _ = nc.Drain() + if err := nc.Drain(); err != nil { + slog.Warn("drain nats connection", "error", err) + } }

As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cleanup := func() {

shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)

_ = srv.Shutdown(shutCtx)

cancel()

_ = nc.Drain()

}

cleanup := func() {

shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)

if err := srv.Shutdown(shutCtx); err != nil {

slog.Warn("shutdown metrics server", "error", err)

}

cancel()

if err := nc.Drain(); err != nil {

slog.Warn("drain nats connection", "error", err)

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/maxrps_readreceipt.go` around lines 71 - 76, In the cleanup closure, don't ignore errors from srv.Shutdown and nc.Drain; capture each return value (e.g., err := srv.Shutdown(shutCtx) and err := nc.Drain()) and log a warning or error with context (for example "server shutdown failed" and "nats drain failed") including the error value so teardown failures are visible; keep the existing context cancel call and ensure logging happens before returning from the cleanup func.

coderabbitai · 2026-06-02T11:31:20Z

+	go func() {
+		defer wg.Done()
+		_ = gen.Run(genCtx)
+	}()
+	err := waitOrCancel(ctx, d)
+	cancel()
+	wg.Wait()
+	return err


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate gen.Run failures instead of dropping them.

Line 114 discards the generator error, so an early requester/setup failure turns into a full-duration wait and a misleadingly successful step result. Capture the goroutine result and return it when gen.Run exits before or after cancellation.

💡 Suggested fix

func runReadReceiptFor(ctx context.Context, gen *ReadReceiptGenerator, d time.Duration) error { genCtx, cancel := context.WithCancel(ctx) - var wg sync.WaitGroup - wg.Add(1) + defer cancel() + errCh := make(chan error, 1) + var wg sync.WaitGroup + wg.Add(1) go func() { defer wg.Done() - _ = gen.Run(genCtx) + errCh <- gen.Run(genCtx) }() - err := waitOrCancel(ctx, d) - cancel() - wg.Wait() - return err + timer := time.NewTimer(d) + defer timer.Stop() + + select { + case err := <-errCh: + cancel() + wg.Wait() + return err + case <-ctx.Done(): + cancel() + wg.Wait() + return ctx.Err() + case <-timer.C: + cancel() + wg.Wait() + return <-errCh + } }

As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/maxrps_readreceipt.go` around lines 112 - 119, The goroutine currently discards the error returned by gen.Run(genCtx), causing failures to be hidden; change it to capture the result (e.g., send the error to a buffered channel or store it in a variable synchronized with the WaitGroup) and after wg.Wait() check that error and return it instead of the nil from waitOrCancel; ensure you still call cancel() and wait, and return the gen.Run error if present (whether it finished before or after waitOrCancel). Reference: gen.Run, genCtx, waitOrCancel, cancel, wg.

coderabbitai · 2026-06-02T11:31:20Z

+To tear down, use the history teardown — read-receipt seeds the identical
+history fixtures, so `loadgen teardown --workload=history --preset=<name>` drops
+everything (dropping `subscriptions` removes the stamped `lastSeenAt` too):
+
+```bash
+loadgen teardown --workload=history --preset=history-medium


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mention that teardown must reuse the seed.

This example implies any seed works, but runTeardownHistory derives room IDs from the preset+seed before deleting Valkey room keys. If someone seeded read-receipt data with a non-default seed, this command will drop Mongo/Cassandra state but leave the corresponding room keys behind unless the same --seed is passed.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/README.md` around lines 320 - 325, The README example is misleading because runTeardownHistory derives room IDs from the preset+seed, so teardown must be run with the same --seed used to create the fixtures; update the docs to state this requirement, mention that failing to pass the original --seed will leave Valkey room keys behind, and change the example for loadgen teardown --workload=history --preset=history-medium to include a matching --seed (e.g., --seed=1234) and a short note referencing runTeardownHistory so readers know why the seed must match.

coderabbitai · 2026-06-02T11:31:20Z

+	runCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
+	defer cancel()
+	require.NoError(t, gen.Run(runCtx))
+	time.Sleep(500 * time.Millisecond) // drain trailing replies
+
+	assert.NotEmpty(t, collector.Samples(), "generator produced zero samples")
+	assert.Equal(t, 0, collector.Failed(), "stub responder should yield zero failures")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace the fixed sleep with a real synchronization signal.

time.Sleep(500 * time.Millisecond) makes this test timing-dependent: slower CI can still observe the collector too early, while faster runs just pay an unnecessary delay. Wait on a concrete condition from the responder/generator instead of sleeping for an arbitrary interval. As per coding guidelines, "Never use time.Sleep for goroutine synchronization — use proper sync primitives (channels, sync.WaitGroup, sync.Mutex)."

💡 One way to make the assertion deterministic

runCtx, cancel := context.WithTimeout(ctx, 2*time.Second) defer cancel() require.NoError(t, gen.Run(runCtx)) - time.Sleep(500 * time.Millisecond) // drain trailing replies + require.Eventually(t, func() bool { + return len(collector.Samples()) > 0 || collector.Failed() > 0 + }, time.Second, 10*time.Millisecond) assert.NotEmpty(t, collector.Samples(), "generator produced zero samples") assert.Equal(t, 0, collector.Failed(), "stub responder should yield zero failures")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

runCtx, cancel := context.WithTimeout(ctx, 2*time.Second)

defer cancel()

require.NoError(t, gen.Run(runCtx))

time.Sleep(500 * time.Millisecond) // drain trailing replies

assert.NotEmpty(t, collector.Samples(), "generator produced zero samples")

assert.Equal(t, 0, collector.Failed(), "stub responder should yield zero failures")

runCtx, cancel := context.WithTimeout(ctx, 2*time.Second)

defer cancel()

require.NoError(t, gen.Run(runCtx))

require.Eventually(t, func() bool {

return len(collector.Samples()) > 0 || collector.Failed() > 0

}, time.Second, 10*time.Millisecond)

assert.NotEmpty(t, collector.Samples(), "generator produced zero samples")

assert.Equal(t, 0, collector.Failed(), "stub responder should yield zero failures")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/readreceipt_integration_test.go` around lines 78 - 84, Replace the fixed time.Sleep by waiting on a deterministic signal from the generator/responder: modify the test to create a done channel (or use a sync.WaitGroup) that the stub responder closes or signals when it has delivered all trailing replies, then replace time.Sleep(500*time.Millisecond) with a select that waits on that done channel (or WaitGroup completion) or the runCtx.Done() for timeout; update the stub responder (or generator) to signal the done channel when it finishes emitting replies so the test uses gen.Run(runCtx) plus a synchronous wait on that signal before asserting on collector.Samples() and collector.Failed().

coderabbitai · 2026-06-02T11:31:20Z

+	rng := rand.New(rand.NewSource(seed))
+	coll := db.Collection("subscriptions")
+	for roomID, roomSubs := range byRoom {
+		floor, ok := latest[roomID]
+		if !ok {
+			continue // room has no top-level messages — nothing to read
+		}
+		lastSeen := floor.Add(time.Millisecond).UTC()
+		chosen := selectReaders(roomSubs, readRatio, rng)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sort rooms before consuming the seeded RNG.

byRoom is a Go map, so this loop runs in nondeterministic order. Because every room pulls from the same rng, the chosen readers change across runs even with the same seed, which breaks the deterministic seeding behavior promised by SeedReadReceiptState.

💡 Proposed fix

import ( "context" "fmt" "math" "math/rand" + "sort" "time" @@ rng := rand.New(rand.NewSource(seed)) coll := db.Collection("subscriptions") - for roomID, roomSubs := range byRoom { + roomIDs := make([]string, 0, len(byRoom)) + for roomID := range byRoom { + roomIDs = append(roomIDs, roomID) + } + sort.Strings(roomIDs) + for _, roomID := range roomIDs { + roomSubs := byRoom[roomID] floor, ok := latest[roomID] if !ok { continue // room has no top-level messages — nothing to read }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

rng := rand.New(rand.NewSource(seed))

coll := db.Collection("subscriptions")

for roomID, roomSubs := range byRoom {

floor, ok := latest[roomID]

if !ok {

continue // room has no top-level messages — nothing to read

}

lastSeen := floor.Add(time.Millisecond).UTC()

chosen := selectReaders(roomSubs, readRatio, rng)

rng := rand.New(rand.NewSource(seed))

coll := db.Collection("subscriptions")

roomIDs := make([]string, 0, len(byRoom))

for roomID := range byRoom {

roomIDs = append(roomIDs, roomID)

}

sort.Strings(roomIDs)

for _, roomID := range roomIDs {

roomSubs := byRoom[roomID]

floor, ok := latest[roomID]

if !ok {

continue // room has no top-level messages — nothing to read

}

lastSeen := floor.Add(time.Millisecond).UTC()

chosen := selectReaders(roomSubs, readRatio, rng)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/readreceipt_seed.go` around lines 69 - 77, byRoom is a map so iterating over it uses a nondeterministic order and consumes the shared RNG (rng) unpredictably; to restore deterministic behavior for SeedReadReceiptState, collect the room IDs from byRoom, sort them (e.g., lexicographically), and iterate over the sorted slice when generating lastSeen and calling selectReaders so the same rng sequence is consumed in a stable order for every run.

Resolve conflicts between the read-receipt max-rps workload and the room-read workload + bottleneck-attribution work from main: - main.go / maxrps.go: keep both read-receipt and room-read workloads in the seed and max-rps dispatch, flag help, and default-steps list. - README.md: keep both the read-receipt section and main's bottleneck-attribution + daily-IM guide additions. - Adapt read-receipt seed/workload to main's refactored HistoryFixtures API: SeedThreadRooms/SeedHistoryCassandra now take *HistoryFixtures (SeedHistoryCassandra returns msgCount), and the in-memory plan is materialized via res.FullPlan() instead of the removed res.Plan field.

Resolve README.md conflict: keep main's new INCONCLUSIVE reasons (emit-underrun vs saturation columns + rate-pacing note) ahead of the read-receipt workload subsection. main.go auto-merged cleanly with both the read-receipt and room-read workloads present.

claude added 11 commits June 2, 2026 09:18

docs: design spec for loadgen max-rps read-receipt workload

c8829f7

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

docs: implementation plan for loadgen max-rps read-receipt workload

752faec

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

feat(loadgen): add read-receipt collector

a1cc476

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

feat(loadgen): add read-receipt requester

5f204f0

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

chore(loadgen): tidy defaultSteps switch and request-timeout flag help

63f447b

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

test(loadgen): add read-receipt workload integration test

8af86ed

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

docs(loadgen): document read-receipt max-rps workload

f7cb7cc

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

docs(loadgen): note history teardown covers read-receipt fixtures

7695977

https://claude.ai/code/session_019TuJE9Y5nbsTMnXdycLiWo

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

claude added 2 commits June 4, 2026 06:54

Conversation

hmchangw commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hmchangw commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading