feat(loadgen): daily-IM load scenario to find sustainable N by hmchangw · Pull Request #234 · hmchangw/chat

hmchangw · 2026-05-27T22:51:46Z

Summary

Adds a loadgen daily subcommand that simulates N users using the chat system as their primary IM over a workday, ramps N geometrically, and reports the largest N at which all SLO signals held over a steady-state hold window. Single-site only.

Design spec: docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md
Implementation plan: docs/superpowers/plans/2026-05-27-daily-im-load-scenario.md

What's implemented

Presets (preset.go): daily-light / daily-heavy / daily-power with banded per-user room membership (DMs + small + medium + large rooms, Zipf sizing). Headline daily-heavy ≈ 56 rooms/user.
User behavior (daily_user.go, daily_envelope.go): two-state idle/active Markov chain, weighted action picker, and a diurnal envelope (two Gaussians at 1/3 and 2/3 of the hold) scaling the per-user Poisson rate.
Action handlers (daily_actions.go): send, read-receipt, room-list refresh, history scroll, mute toggle, room create, member add, thread reply — all going through the frontdoor (message-gatekeeper).
Hybrid receiver (daily_pool.go): one nats.Conn per user up to --max-direct-users, then a multiplexed pool of shared conns. Both subscribe to channel (RoomEvent) and DM (UserRoomEvent) broadcasts.
SLO verdict (daily_verdict.go): p95/p99 latency, error rate, JetStream consumer pending-growth, plus loadgen self-metrics that mark a step INCONCLUSIVE when the load box itself is saturated.
Ramp + report (daily.go, daily_report.go): step-up-and-hold control loop, console table + CSV output, "ANSWER: N = …" line naming the first limit hit.
Wiring: loadgen daily subcommand, make -C tools/loadgen/deploy run-daily target, README section.

Status

lint, test, sast CI jobs green.
test-integration (tools) fixed (the newMultiplexPool signature change broke a call site; corrected in c9187d0).
Unit coverage for pure logic (envelope, verdict, picker, fixtures, report, pollers via httptest). Pool code is exercised by integration tests behind //go:build integration.

Known limitations / follow-ups

Large-room sends are gatekeeper-blocked. Large-band rooms have UserCount > message-gatekeeper's LargeRoomThreshold (default 500), so member-role top-level sends to them are rejected. Daily presets funnel sends to smaller rooms; large rooms are exercised for receive-side fan-out only. A clean fix (thread-routing or role escalation in fixtures) is deferred.
Auth is a placeholder. mintJWT is a no-op stub — per-user auth-service login is not yet wired (spec §10 deferred it as a light touch).
Service-error signal is dormant. Backend services don't expose /metrics HTTP endpoints, so the slog_errors_total scraper passes an empty URL map and that SLO signal is currently a no-op. The other four signals are live.
Reconnect/presence storms and cross-site federation are explicitly out of scope (separate PRs).

Test plan

make -C tools/loadgen/deploy up && make -C tools/loadgen/deploy seed PRESET=daily-heavy && make -C tools/loadgen/deploy run-daily PRESET=daily-heavy STEPS=1000,2000 HOLD=30s produces a CSV + console summary against the local stack.
make lint, make test SERVICE=tools/loadgen pass.
make test-integration SERVICE=tools/loadgen passes (Docker required; verified compiling under the integration tag).

https://claude.ai/code/session_014KJfz9ZA7cCe4rAmfSHHmt

Summary by CodeRabbit

New Features
- Added loadgen daily: ramps concurrent “daily‑IM” users with warmup/hold/cooldown, SLO evaluation, stop‑on‑trip, and CSV output.
- Presets: daily‑light / daily‑heavy / daily‑power; CLI controls for steps, warmup/hold/cooldown, connection sizing, and CSV.
- Console and CSV reports now include latency percentiles (p50/p95/p99), error rate, pending deltas, multiplex-drop and action attempt/failure metrics, and an ANSWER line.
Documentation
- New design spec and quick‑start guide covering scenario, presets, SLOs, and outputs.

coderabbitai · 2026-05-27T22:52:00Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.08% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(loadgen): daily-IM load scenario to find sustainable N' clearly and concisely summarizes the primary change—a new loadgen scenario for simulating daily IM workload and finding the maximum sustainable user count (N).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/gifted-rubin-ry8HI

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 13

🧹 Nitpick comments (1)

tools/loadgen/daily_user_test.go (1)

38-40: ⚡ Quick win

Strengthen the distribution assertion for rare actions.

The current check (mute < roomCreate+memberAdd+10) is very permissive and can still pass if weights regress. Consider asserting sampled ratios against expected probabilities (with tolerance) so this test actually guards weight drift.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/loadgen/daily_user_test.go` around lines 38 - 40, The current
permissive assertion using require.Less for actionMuteToggle should be replaced
by asserting the empirical probability of the rare action is within a tolerance
of its expected probability: compute totalSamples (sum of counts map), compute
empiricalProb := float64(counts[actionMuteToggle]) / float64(totalSamples) and
expectedProb based on the configured weights for actionMuteToggle relative to
sum of weights (or derive from actionRoomCreate+actionMemberAdd if that's the
intended baseline), then use a numeric comparison (e.g., require.InDelta or
require.True with math.Abs(empiricalProb-expectedProb) < tolerance) to enforce
that empiricalProb is within an acceptable tolerance (e.g., 2–5%) so weight
regressions are caught; update the assertion that currently references
actionMuteToggle, actionRoomCreate, and actionMemberAdd accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md`:
- Around line 212-222: The fenced output block containing the performance table
and ANSWER (the block that starts with "N        p50    p95..." and ends with
"Next limit: broadcast-worker consumer (pending growth)") needs a language tag
to satisfy MD040; change the opening fence from ``` to ```text so the block is
labeled (e.g., ```text ... ```). Ensure only the opening fence is updated and
keep the block content unchanged.

In `@tools/loadgen/collector.go`:
- Around line 170-181: Reset currently clears per-step counters but leaves
multiplexDrops untouched; update Collector.Reset to clear multiplexDrops as well
(under the existing c.mu lock) by setting it to its zero value—e.g., if
multiplexDrops is an atomic counter call c.multiplexDrops.Store(0), or if it’s a
plain integer or map assign the appropriate zero/nil (c.multiplexDrops = 0 or
c.multiplexDrops = nil/make(...)) so per-step drop counts are not cumulative;
place this change alongside the other resets in Reset().

In `@tools/loadgen/daily_actions_test.go`:
- Around line 43-125: Add negative and boundary tests for the action handlers
(sendMessage, readReceipt, refreshRoomList, scrollHistory, muteToggle,
roomCreate, memberAdd, threadReply) by extending captured and actionCtx to
simulate Publish and Request failures and marshal errors, plus empty-room no-op
cases: write tests that (1) substitute captured.publish/captured.request to
return errors and assert handlers return wrapped errors, (2) force JSON
marshal/unmarshal failures for sendMessage/threadReply (e.g., use a type or hook
that causes json.Marshal to fail) and assert error propagation, and (3) pass
userState with empty Rooms to handlers that should be no-ops and assert no
publishes/requests occur; ensure each new test checks error messages are
preserved (wrapped) and that failure counts/lengths on
captured.pubs/captured.reqs are as expected.

In `@tools/loadgen/daily_actions.go`:
- Line 77: Replace the wrong publish subject in the read-receipt publish call:
in the a.Publish invocation that currently uses subject.MessageRead(u.Account,
roomID, a.SiteID), change the subject to the read-receipt subject (e.g.,
subject.ReadReceipt(u.Account, roomID, a.SiteID)) so the event is sent on the
correct RPC path; update the single a.Publish call accordingly in
daily_actions.go.

In `@tools/loadgen/daily_pool.go`:
- Around line 108-120: newMultiplexPool currently panics on nats.Connect
failure; change its signature from newMultiplexPool(natsURL string, c
*Collector, size int) *multiplexPool to return (*multiplexPool, error) so
bootstrap errors can be returned; inside newMultiplexPool, on connection error
call p.Close() (as already done) and return nil and a wrapped error (e.g.,
fmt.Errorf("multiplex conn %d: %w", i, err)); update all callers of
newMultiplexPool to handle the returned error and propagate a controlled CLI
failure instead of letting the process crash.
- Around line 129-143: The code currently inserts into p.userInbox and appends
to p.dispatch before calling nc.Subscribe, which leaks state if nc.Subscribe
fails; change the logic in the Add method so you either (a) perform Subscribe
for each room first (while holding p.mu) and only after all subscribes succeed
mutate p.userInbox and p.dispatch, or (b) keep the current ordering but on
Subscribe error roll back the partial mutations: remove p.userInbox[u.ID],
remove the recently appended inbox entries from p.dispatch[roomID], and
decrement p.roomRefs[roomID] accordingly before returning the error; ensure you
still unlock p.mu in all paths and use the existing symbols (p.mu, p.userInbox,
p.dispatch, p.roomRefs, p.nextConn, nc.Subscribe, p.route) to locate where to
apply the fix.

In `@tools/loadgen/daily_report.go`:
- Around line 84-86: writeDailyCSV currently defers w.Flush() which discards
w.Error(); change the function to use a named error return (e.g., err error) and
replace defer w.Flush() with a defer func() { w.Flush(); if ferr := w.Error();
ferr != nil && err == nil { err = ferr } } so any flush/write errors from the
csv.Writer (w) are surfaced as the function's returned error; ensure the
function signature and any callers still work with the named error return.

In `@tools/loadgen/daily_verdict.go`:
- Around line 193-199: The P99 target calculation can be zero for total<100,
causing the loop in the code that iterates h.Counts to return h.Buckets[0] even
when that bucket has zero count; change the target computation (the variable
target used alongside total, h.Counts and h.Buckets) to ensure it is at least 1
(e.g., compute target := total*99/100 and if target == 0 { target = 1 } or use a
math-based ceil to guarantee at least one sample) so the accumulator logic (acc
+= c; if acc >= target { return h.Buckets[i] * 1000 }) never returns due to a
zero target and correctly reflects GCPauseP99Ms and step conclusiveness.
- Around line 239-267: The pollPending and scrapeErrorCounter HTTP calls need
robust error handling: don't ignore errors from http.NewRequestWithContext in
pollPending (and the analogous call in scrapeErrorCounter) — return the error;
use a configured http.Client with a sensible timeout instead of
http.DefaultClient; after Do(), check resp.StatusCode and return an error for
non-2xx responses before attempting to decode/read the body; ensure JSON
decoding in pollPending returns any decode error; and in scrapeErrorCounter,
propagate any io.Read or scan errors instead of breaking the loop and returning
a potentially underreported sum. Update the functions pollPending and
scrapeErrorCounter to perform these checks and return errors when they occur.

In `@tools/loadgen/daily.go`:
- Around line 378-387: The auth-service login call uses http.DefaultClient.Do
which has no timeout; change it to use a client with a deadline (or derive a
context with timeout) so the call can't block indefinitely: create an
http.Client with a sensible Timeout (e.g., client := &http.Client{Timeout:
10*time.Second}) and replace http.DefaultClient.Do(req) with client.Do(req), or
wrap ctx with context.WithTimeout before calling http.NewRequestWithContext and
cancel after use; update imports to include time (or context) as needed and
ensure errors/cancel are handled in the same block where req and resp are
created.
- Around line 143-149: The hold loop in the routine that waits until holdEnd
currently always sleeps a fixed 5s (the for/select block) and can overshoot
short holds; change the logic in that loop (the goroutine using ctx.Done() and
time.After(5 * time.Second)) to compute the remaining duration (d :=
time.Until(holdEnd)) and wait only for min(d, 5*time.Second) using a timer
(time.NewTimer / timer.Reset) or time.After with that computed duration so the
loop exits promptly when holdEnd is reached and still polls at most every 5s.

In `@tools/loadgen/README.md`:
- Around line 163-167: Add missing fenced code block language identifiers by
changing the two command examples that start with "make -C tools/loadgen/deploy
up" and "loadgen daily \\" to use bash fences (```bash) instead of plain ``` so
the blocks are explicitly marked as Bash shells; update both occurrences in the
README so the first block (make -C tools/loadgen/deploy ...) and the second
block (loadgen daily ...) have ```bash opening fences.

---

Nitpick comments:
In `@tools/loadgen/daily_user_test.go`:
- Around line 38-40: The current permissive assertion using require.Less for
actionMuteToggle should be replaced by asserting the empirical probability of
the rare action is within a tolerance of its expected probability: compute
totalSamples (sum of counts map), compute empiricalProb :=
float64(counts[actionMuteToggle]) / float64(totalSamples) and expectedProb based
on the configured weights for actionMuteToggle relative to sum of weights (or
derive from actionRoomCreate+actionMemberAdd if that's the intended baseline),
then use a numeric comparison (e.g., require.InDelta or require.True with
math.Abs(empiricalProb-expectedProb) < tolerance) to enforce that empiricalProb
is within an acceptable tolerance (e.g., 2–5%) so weight regressions are caught;
update the assertion that currently references actionMuteToggle,
actionRoomCreate, and actionMemberAdd accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 01513166-7f5b-44d1-b9c3-33088365baff

📥 Commits

Reviewing files that changed from the base of the PR and between 416285d and 700a83b.

📒 Files selected for processing (24)

docs/superpowers/plans/2026-05-27-daily-im-load-scenario.md
docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md
tools/loadgen/README.md
tools/loadgen/collector.go
tools/loadgen/daily.go
tools/loadgen/daily_actions.go
tools/loadgen/daily_actions_test.go
tools/loadgen/daily_envelope.go
tools/loadgen/daily_envelope_test.go
tools/loadgen/daily_integration_test.go
tools/loadgen/daily_pool.go
tools/loadgen/daily_pool_test.go
tools/loadgen/daily_report.go
tools/loadgen/daily_report_test.go
tools/loadgen/daily_test.go
tools/loadgen/daily_user.go
tools/loadgen/daily_user_test.go
tools/loadgen/daily_verdict.go
tools/loadgen/daily_verdict_test.go
tools/loadgen/deploy/Makefile
tools/loadgen/main.go
tools/loadgen/main_test.go
tools/loadgen/preset.go
tools/loadgen/preset_test.go

coderabbitai · 2026-05-27T23:01:40Z

+```
+N        p50    p95    p99    err%    worst-pending-delta             verdict
+1000     12     45     89     0.00%   broadcast-worker +12             PASS
+2000     14     58     112    0.00%   broadcast-worker +34             PASS
+5000     22     94     180    0.01%   broadcast-worker +180            PASS
+10000    38     210    430    0.02%   broadcast-worker +890            PASS
+20000    71     480    980    0.04%   broadcast-worker +1240           TRIP
+
+ANSWER: N = 10000 (last passing step)
+        Next limit: broadcast-worker consumer (pending growth)
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced output block.

The fenced block starting at Line 212 is missing a language identifier (MD040). Please label it (e.g., text) to keep markdown lint clean.

Proposed fix

-``` +```text N p50 p95 p99 err% worst-pending-delta verdict 1000 12 45 89 0.00% broadcast-worker +12 PASS 2000 14 58 112 0.00% broadcast-worker +34 PASS @@ ANSWER: N = 10000 (last passing step) Next limit: broadcast-worker consumer (pending growth)

</details>  <details> <summary>📝 Committable suggestion</summary> > ‼️ **IMPORTANT** > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. ```suggestion

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 212-212: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md` around lines 212 - 222, The fenced output block containing the performance table and ANSWER (the block that starts with "N p50 p95..." and ends with "Next limit: broadcast-worker consumer (pending growth)") needs a language tag to satisfy MD040; change the opening fence from ``` to ```text so the block is labeled (e.g., ```text ... ```). Ensure only the opening fence is updated and keep the block content unchanged.

coderabbitai · 2026-05-27T23:01:40Z

+// Reset clears all per-step counters and sample slices.
+// Called at the end of warmup so the hold window starts fresh.
+func (c *Collector) Reset() {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	c.e1 = c.e1[:0]
+	c.e2 = c.e2[:0]
+	c.byReqID = make(map[string]publishEntry)
+	c.byMsgID = make(map[string]publishEntry)
+	c.attempted.Store(0)
+	c.failed.Store(0)
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset should also clear multiplexDrops.

Reset() is documented as clearing per-step counters, but multiplexDrops is left untouched. This makes drop counts cumulative across steps and skews per-step metrics after the first hold window.

[suggested fix]

Patch

func (c *Collector) Reset() { c.mu.Lock() defer c.mu.Unlock() c.e1 = c.e1[:0] c.e2 = c.e2[:0] c.byReqID = make(map[string]publishEntry) c.byMsgID = make(map[string]publishEntry) + c.multiplexDrops.Store(0) c.attempted.Store(0) c.failed.Store(0) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Reset clears all per-step counters and sample slices.

// Called at the end of warmup so the hold window starts fresh.

func (c *Collector) Reset() {

c.mu.Lock()

defer c.mu.Unlock()

c.e1 = c.e1[:0]

c.e2 = c.e2[:0]

c.byReqID = make(map[string]publishEntry)

c.byMsgID = make(map[string]publishEntry)

c.attempted.Store(0)

c.failed.Store(0)

}

// Reset clears all per-step counters and sample slices.

// Called at the end of warmup so the hold window starts fresh.

func (c *Collector) Reset() {

c.mu.Lock()

defer c.mu.Unlock()

c.e1 = c.e1[:0]

c.e2 = c.e2[:0]

c.byReqID = make(map[string]publishEntry)

c.byMsgID = make(map[string]publishEntry)

c.multiplexDrops.Store(0)

c.attempted.Store(0)

c.failed.Store(0)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/collector.go` around lines 170 - 181, Reset currently clears per-step counters but leaves multiplexDrops untouched; update Collector.Reset to clear multiplexDrops as well (under the existing c.mu lock) by setting it to its zero value—e.g., if multiplexDrops is an atomic counter call c.multiplexDrops.Store(0), or if it’s a plain integer or map assign the appropriate zero/nil (c.multiplexDrops = 0 or c.multiplexDrops = nil/make(...)) so per-step drop counts are not cumulative; place this change alongside the other resets in Reset().

coderabbitai · 2026-05-27T23:01:40Z

+func TestSendMessage_PublishesToFrontdoor(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1", Rooms: []string{"room-a", "room-b"}}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	err := sendMessage(ctx, u, "hello")
+	require.NoError(t, err)
+	require.Len(t, c.pubs, 1)
+	got := c.pubs[0]
+	require.True(t, got.Subj == subject.MsgSend("user-1", "room-a", "site-test") ||
+		got.Subj == subject.MsgSend("user-1", "room-b", "site-test"))
+	var req model.SendMessageRequest
+	require.NoError(t, json.Unmarshal(got.Data, &req))
+	require.Equal(t, "hello", req.Content)
+}
+
+func TestReadReceipt_Publishes(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1", Rooms: []string{"room-a"}}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	err := readReceipt(ctx, u, "msg-1")
+	require.NoError(t, err)
+	require.Len(t, c.pubs, 1)
+	require.Equal(t, subject.MessageRead("user-1", "room-a", "site-test"), c.pubs[0].Subj)
+}
+
+func TestRefreshRoomList_Requests(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1"}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	err := refreshRoomList(ctx, u)
+	require.NoError(t, err)
+	require.Len(t, c.reqs, 1)
+	require.Equal(t, subject.UserSubscriptionGetRooms("user-1", "site-test"), c.reqs[0].Subj)
+}
+
+func TestScrollHistory_Requests(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1", Rooms: []string{"room-a"}}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	require.NoError(t, scrollHistory(ctx, u))
+	require.Len(t, c.reqs, 1)
+	// History fetch goes through MsgGet-style subject — check it includes the roomID.
+	require.Contains(t, c.reqs[0].Subj, "room-a")
+}
+
+func TestMuteToggle_Publishes(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1", Rooms: []string{"room-a"}}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	require.NoError(t, muteToggle(ctx, u))
+	require.Len(t, c.reqs, 1)
+	require.Equal(t, subject.MuteToggle("user-1", "room-a", "site-test"), c.reqs[0].Subj)
+}
+
+func TestRoomCreate_Requests(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1"}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	require.NoError(t, roomCreate(ctx, u))
+	require.Len(t, c.reqs, 1)
+	require.Equal(t, subject.RoomCreate("user-1", "site-test"), c.reqs[0].Subj)
+}
+
+func TestMemberAdd_Requests(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1", Rooms: []string{"room-a"}}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	require.NoError(t, memberAdd(ctx, u, "user-2"))
+	require.Len(t, c.reqs, 1)
+	require.Equal(t, subject.MemberAdd("user-1", "room-a", "site-test"), c.reqs[0].Subj)
+}
+
+func TestThreadReply_Publishes(t *testing.T) {
+	c := &captured{}
+	u := &userState{ID: "u-1", Account: "user-1", Rooms: []string{"room-a"}}
+	ctx := actionCtx{Ctx: context.Background(), Publish: c.publish, Request: c.request, SiteID: "site-test"}
+	require.NoError(t, threadReply(ctx, u, "parent-msg-1", "reply text"))
+	require.Len(t, c.pubs, 1)
+	require.Equal(t, subject.MsgSend("user-1", "room-a", "site-test"), c.pubs[0].Subj)
+	var req model.SendMessageRequest
+	require.NoError(t, json.Unmarshal(c.pubs[0].Data, &req))
+	require.Equal(t, "parent-msg-1", req.ThreadParentMessageID)
+}


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add error-path and boundary-case tests for action handlers.

Current tests are mostly happy path. Please add cases for publish/request failures, marshal failures (where applicable), and empty-room no-op behavior so failure accounting and wrapped errors stay protected.

As per coding guidelines, “Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input”.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/daily_actions_test.go` around lines 43 - 125, Add negative and boundary tests for the action handlers (sendMessage, readReceipt, refreshRoomList, scrollHistory, muteToggle, roomCreate, memberAdd, threadReply) by extending captured and actionCtx to simulate Publish and Request failures and marshal errors, plus empty-room no-op cases: write tests that (1) substitute captured.publish/captured.request to return errors and assert handlers return wrapped errors, (2) force JSON marshal/unmarshal failures for sendMessage/threadReply (e.g., use a type or hook that causes json.Marshal to fail) and assert error propagation, and (3) pass userState with empty Rooms to handlers that should be no-ops and assert no publishes/requests occur; ensure each new test checks error messages are preserved (wrapped) and that failure counts/lengths on captured.pubs/captured.reqs are as expected.

coderabbitai · 2026-05-27T23:01:40Z

+		return nil
+	}
+	roomID := u.Rooms[a.rand().Intn(len(u.Rooms))]
+	payload, err := json.Marshal(map[string]string{"messageId": lastMsgID})


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace map-based JSON payloads with typed pkg/model structs.

These request/publish payloads are built from map[...]..., which makes schema drift easy and bypasses compile-time checks. Please switch them to typed model payloads for read-receipt, room-create, and member-add requests.

As per coding guidelines, “All NATS payloads are JSON — use encoding/json with typed structs from pkg/model, never map[string]interface{}”.

Also applies to: 123-126, 143-143

coderabbitai · 2026-05-27T23:01:40Z

+	if err != nil {
+		return fmt.Errorf("marshal read-receipt: %w", err)
+	}
+	if err := a.Publish(a.Ctx, subject.MessageRead(u.Account, roomID, a.SiteID), payload); err != nil {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the read-receipt subject, not the message-read subject.

readReceipt currently publishes to subject.MessageRead(...), which is a different RPC path than read-receipt. This can send the event to the wrong consumer contract.

Suggested fix

- if err := a.Publish(a.Ctx, subject.MessageRead(u.Account, roomID, a.SiteID), payload); err != nil { + if err := a.Publish(a.Ctx, subject.MessageReadReceipt(u.Account, roomID, a.SiteID), payload); err != nil {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if err := a.Publish(a.Ctx, subject.MessageRead(u.Account, roomID, a.SiteID), payload); err != nil {

if err := a.Publish(a.Ctx, subject.MessageReadReceipt(u.Account, roomID, a.SiteID), payload); err != nil {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/daily_actions.go` at line 77, Replace the wrong publish subject in the read-receipt publish call: in the a.Publish invocation that currently uses subject.MessageRead(u.Account, roomID, a.SiteID), change the subject to the read-receipt subject (e.g., subject.ReadReceipt(u.Account, roomID, a.SiteID)) so the event is sent on the correct RPC path; update the single a.Publish call accordingly in daily_actions.go.

coderabbitai · 2026-05-27T23:01:40Z

+	target := total * 99 / 100
+	var acc uint64
+	for i, c := range h.Counts {
+		acc += c
+		if acc >= target {
+			return h.Buckets[i] * 1000
+		}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

P99 target calculation can select an empty bucket for small totals.

target := total * 99 / 100 becomes 0 when total < 100, so the loop may return the first bucket even if its count is zero. That can distort GCPauseP99Ms and incorrectly mark steps inconclusive.

Patch

- target := total * 99 / 100 + target := (total*99 + 99) / 100 // ceil(0.99 * total), minimum 1 when total > 0 var acc uint64 for i, c := range h.Counts { acc += c if acc >= target { return h.Buckets[i] * 1000 } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

target := total * 99 / 100

var acc uint64

for i, c := range h.Counts {

acc += c

if acc >= target {

return h.Buckets[i] * 1000

}

target := (total*99 + 99) / 100 // ceil(0.99 * total), minimum 1 when total > 0

var acc uint64

for i, c := range h.Counts {

acc += c

if acc >= target {

return h.Buckets[i] * 1000

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/daily_verdict.go` around lines 193 - 199, The P99 target calculation can be zero for total<100, causing the loop in the code that iterates h.Counts to return h.Buckets[0] even when that bucket has zero count; change the target computation (the variable target used alongside total, h.Counts and h.Buckets) to ensure it is at least 1 (e.g., compute target := total*99/100 and if target == 0 { target = 1 } or use a math-based ceil to guarantee at least one sample) so the accumulator logic (acc += c; if acc >= target { return h.Buckets[i] * 1000 }) never returns due to a zero target and correctly reflects GCPauseP99Ms and step conclusiveness.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (3)

docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md (1)

212-222: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the console-output fenced block.

Use ```text for this block to satisfy markdown lint (MD040).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md` around
lines 212 - 222, The fenced console-output block showing the table (the
triple-backtick block containing "N p50 p95 p99 err% worst-pending-delta verdict
..." and the ANSWER lines) needs a language tag to satisfy MD040; change the
opening fence from ``` to ```text so the block starts with ```text and leave the
rest unchanged.

tools/loadgen/README.md (1)

224-228: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mark both daily command fences as Bash.

Both new command examples should use ```bash to satisfy MD040.

Also applies to: 240-248

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/loadgen/README.md` around lines 224 - 228, Update the markdown command
fences in the README examples so they are explicitly Bash: change the two
triple-backtick blocks that contain the make commands (the blocks shown with
make -C tools/loadgen/deploy up / seed PRESET=daily-heavy / run-daily
PRESET=daily-heavy and the other block at lines 240-248) from plain ``` to
```bash so they satisfy MD040; locate the command examples in
tools/loadgen/README.md and replace their opening fences accordingly.

tools/loadgen/daily.go (1)

406-413: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fail the run when multiplex pool initialization fails.

Continuing without multiplex means higher configured N steps may run far below target load, which can produce a false “sustainable N” answer.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/loadgen/daily.go` around lines 406 - 413, The current code swallows
newMultiplexPool errors (when cfg.MultiplexPoolSize > 0) by logging and
continuing, which must instead cause the run to fail; replace the slog.Error +
mux=nil branch with a hard failure: if this code is inside main, call slog.Error
then os.Exit(1) or use slog.Fatal to terminate; if inside a function that
returns error, return a wrapped error (e.g., fmt.Errorf("multiplex pool init
failed: %w", err)) so the caller stops execution. Ensure you update the branch
that handles newMultiplexPool (the mux variable and error handling) to stop the
run rather than proceed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/loadgen/daily_pool.go`:
- Around line 148-173: The Add flow can leave provisional subscriptions active
if a later Subscribe fails; update Add to track the provisional subscriptions
created in the first pass (store the nc and subject strings created when calling
nc.Subscribe for subject.RoomEvent(roomID) and for
subject.UserRoomEvent(u.Account)), and if any subsequent Subscribe returns an
error, iterate that tracking slice and call Unsubscribe (or Unsubscribe/Drain as
appropriate on nc) for each stored subscription before returning the error; only
after the second pass completes successfully should you increment p.roomRefs and
add entries to p.dispatch so there is no gap between creating a subscription and
tracking it (refer to Add, p.roomRefs, p.conns, p.nextConn, subject.RoomEvent,
subject.UserRoomEvent, and p.route).

In `@tools/loadgen/daily.go`:
- Around line 81-100: In parseStepList ensure each parsed step is >0 and
strictly increasing by validating (n*mult) > 0 and comparing each computed value
against the previous appended value, returning a descriptive error if a
non-positive or non-increasing value is encountered; update the same validation
in the duplicate parsing block referenced elsewhere so any code that parses
--steps enforces positive, strictly increasing steps and returns an error
instead of silently accepting invalid sequences.
- Around line 433-435: The bare return of err (the snippet with "if err != nil {
return nil, err }") must be replaced with an error wrap that adds descriptive
operation context; change the return to wrap err using fmt.Errorf with a clear
message about the failing operation (for example: fmt.Errorf("failed to
<describe request/operation>: %w", err)), keeping the existing nil return value
and preserving the original err via %w so callers can inspect it.

---

Duplicate comments:
In `@docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md`:
- Around line 212-222: The fenced console-output block showing the table (the
triple-backtick block containing "N p50 p95 p99 err% worst-pending-delta verdict
..." and the ANSWER lines) needs a language tag to satisfy MD040; change the
opening fence from ``` to ```text so the block starts with ```text and leave the
rest unchanged.

In `@tools/loadgen/daily.go`:
- Around line 406-413: The current code swallows newMultiplexPool errors (when
cfg.MultiplexPoolSize > 0) by logging and continuing, which must instead cause
the run to fail; replace the slog.Error + mux=nil branch with a hard failure: if
this code is inside main, call slog.Error then os.Exit(1) or use slog.Fatal to
terminate; if inside a function that returns error, return a wrapped error
(e.g., fmt.Errorf("multiplex pool init failed: %w", err)) so the caller stops
execution. Ensure you update the branch that handles newMultiplexPool (the mux
variable and error handling) to stop the run rather than proceed.

In `@tools/loadgen/README.md`:
- Around line 224-228: Update the markdown command fences in the README examples
so they are explicitly Bash: change the two triple-backtick blocks that contain
the make commands (the blocks shown with make -C tools/loadgen/deploy up / seed
PRESET=daily-heavy / run-daily PRESET=daily-heavy and the other block at lines
240-248) from plain ``` to ```bash so they satisfy MD040; locate the command
examples in tools/loadgen/README.md and replace their opening fences
accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8241471-7084-40cc-b1cf-d2d8b6ac364c

📥 Commits

Reviewing files that changed from the base of the PR and between 700a83b and 07c9239.

📒 Files selected for processing (25)

.gitignore
docs/superpowers/plans/2026-05-27-daily-im-load-scenario.md
docs/superpowers/specs/2026-05-27-daily-im-load-scenario-design.md
tools/loadgen/README.md
tools/loadgen/collector.go
tools/loadgen/daily.go
tools/loadgen/daily_actions.go
tools/loadgen/daily_actions_test.go
tools/loadgen/daily_envelope.go
tools/loadgen/daily_envelope_test.go
tools/loadgen/daily_integration_test.go
tools/loadgen/daily_pool.go
tools/loadgen/daily_pool_test.go
tools/loadgen/daily_report.go
tools/loadgen/daily_report_test.go
tools/loadgen/daily_test.go
tools/loadgen/daily_user.go
tools/loadgen/daily_user_test.go
tools/loadgen/daily_verdict.go
tools/loadgen/daily_verdict_test.go
tools/loadgen/deploy/Makefile
tools/loadgen/main.go
tools/loadgen/main_test.go
tools/loadgen/preset.go
tools/loadgen/preset_test.go

✅ Files skipped from review due to trivial changes (1)

.gitignore

🚧 Files skipped from review as they are similar to previous changes (18)

tools/loadgen/daily_integration_test.go
tools/loadgen/daily_report_test.go
tools/loadgen/main_test.go
tools/loadgen/daily_envelope.go
tools/loadgen/main.go
tools/loadgen/daily_user_test.go
tools/loadgen/preset_test.go
tools/loadgen/daily_test.go
tools/loadgen/daily_pool_test.go
tools/loadgen/daily_report.go
tools/loadgen/daily_envelope_test.go
tools/loadgen/daily_user.go
tools/loadgen/daily_actions_test.go
tools/loadgen/preset.go
tools/loadgen/daily_actions.go
tools/loadgen/collector.go
tools/loadgen/daily_verdict.go
tools/loadgen/daily_verdict_test.go

coderabbitai · 2026-05-28T14:38:41Z

+	// First pass: subscribe to any new room subjects via round-robin conn.
+	// Track which rooms we subscribed *in this Add* so partial failures can
+	// be undone. (roomRefs already > 0 means an earlier user already
+	// subscribed — no new sub needed.)
+	for _, roomID := range u.Rooms {
+		if p.roomRefs[roomID] > 0 || len(p.conns) == 0 {
+			continue
+		}
+		nc := p.conns[p.nextConn%len(p.conns)]
+		p.nextConn++
+		if _, err := nc.Subscribe(subject.RoomEvent(roomID), p.route); err != nil {
+			return fmt.Errorf("multiplex subscribe %s: %w", roomID, err)
+		}
+		// Mark provisionally with refcount 0 — the second pass below will
+		// increment it. We don't increment here so a subsequent Subscribe
+		// failure doesn't leave a dangling subscription.
+	}
+
+	// User-scoped subject for DM broadcasts. Subscribed per-user (no
+	// refcount needed since UserRoomEvent is scoped to the account).
+	if len(p.conns) > 0 {
+		nc := p.conns[p.nextConn%len(p.conns)]
+		p.nextConn++
+		if _, err := nc.Subscribe(subject.UserRoomEvent(u.Account), p.route); err != nil {
+			return fmt.Errorf("multiplex subscribe user %s: %w", u.ID, err)
+		}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rollback successful subscriptions when Add fails mid-flight.

If any later Subscribe fails, earlier successful subscriptions from this same call remain active but are never tracked in roomRefs/dispatch. Retries can then double-subscribe the same room and over-route messages.

Proposed fix

func (p *multiplexPool) Add(u *userState) error { inbox := make(chan *nats.Msg, 128) p.mu.Lock() defer p.mu.Unlock() + createdSubs := make([]*nats.Subscription, 0, len(u.Rooms)+1) + rollback := func() { + for _, s := range createdSubs { + _ = s.Unsubscribe() + } + } for _, roomID := range u.Rooms { if p.roomRefs[roomID] > 0 || len(p.conns) == 0 { continue } nc := p.conns[p.nextConn%len(p.conns)] p.nextConn++ - if _, err := nc.Subscribe(subject.RoomEvent(roomID), p.route); err != nil { + sub, err := nc.Subscribe(subject.RoomEvent(roomID), p.route) + if err != nil { + rollback() return fmt.Errorf("multiplex subscribe %s: %w", roomID, err) } + createdSubs = append(createdSubs, sub) } if len(p.conns) > 0 { nc := p.conns[p.nextConn%len(p.conns)] p.nextConn++ - if _, err := nc.Subscribe(subject.UserRoomEvent(u.Account), p.route); err != nil { + sub, err := nc.Subscribe(subject.UserRoomEvent(u.Account), p.route) + if err != nil { + rollback() return fmt.Errorf("multiplex subscribe user %s: %w", u.ID, err) } + createdSubs = append(createdSubs, sub) }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/daily_pool.go` around lines 148 - 173, The Add flow can leave provisional subscriptions active if a later Subscribe fails; update Add to track the provisional subscriptions created in the first pass (store the nc and subject strings created when calling nc.Subscribe for subject.RoomEvent(roomID) and for subject.UserRoomEvent(u.Account)), and if any subsequent Subscribe returns an error, iterate that tracking slice and call Unsubscribe (or Unsubscribe/Drain as appropriate on nc) for each stored subscription before returning the error; only after the second pass completes successfully should you increment p.roomRefs and add entries to p.dispatch so there is no gap between creating a subscription and tracking it (refer to Add, p.roomRefs, p.conns, p.nextConn, subject.RoomEvent, subject.UserRoomEvent, and p.route).

coderabbitai · 2026-05-28T14:38:41Z

+		if err != nil {
+			return nil, err
+		}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Wrap request errors with operation context.

return nil, err drops call-site context and violates the repo’s error-wrapping rule.

Proposed fix

msg, err := pubConn.RequestWithContext(ctx, subj, data) if err != nil { - return nil, err + return nil, fmt.Errorf("nats request %s: %w", subj, err) }

As per coding guidelines: "Never return bare err or fmt.Errorf("error: %w", err) — always provide descriptive context".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if err != nil {

return nil, err

}

msg, err := pubConn.RequestWithContext(ctx, subj, data)

if err != nil {

return nil, fmt.Errorf("nats request %s: %w", subj, err)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/daily.go` around lines 433 - 435, The bare return of err (the snippet with "if err != nil { return nil, err }") must be replaced with an error wrap that adds descriptive operation context; change the return to wrap err using fmt.Errorf with a clear message about the failing operation (for example: fmt.Errorf("failed to <describe request/operation>: %w", err)), keeping the existing nil return value and preserving the original err via %w so callers can inspect it.

Task-by-task TDD plan: generic ramp engine + verdict + report, reusing the existing message-send and history open-loop generators, exposed as a max-rps subcommand. rps-prefixed identifiers avoid collision with PR #234. https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha

* docs: design spec for loadgen max-rps SLO finder Mirrors PR #234's step-up-and-hold-under-SLO control loop, ramping target RPS instead of user count N, to auto-find the max sustainable RPS for the messages and history loadgen workloads. Standalone on main with rps-prefixed identifiers so it neither depends on nor collides with #234. https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * docs: implementation plan for loadgen max-rps SLO finder Task-by-task TDD plan: generic ramp engine + verdict + report, reusing the existing message-send and history open-loop generators, exposed as a max-rps subcommand. rps-prefixed identifiers avoid collision with PR #234. https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * feat(loadgen): add rps step verdict types and evaluateRPSStep https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * fix(loadgen): address Task 1 review — pointer verdict param, String() and precedence tests https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * feat(loadgen): add rps ramp engine (parseRPSSteps, runRamp) https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * fix(loadgen): address Task 2 review — cover runRamp branches, overflow guard, cleanups https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * feat(loadgen): add max-rps report renderer and CSV writer https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * test(loadgen): cover no-Next-limit path and multi-series CSV alignment https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * feat(loadgen): add Collector.Reset for per-step ramp windows * feat(loadgen): add messages workload adapter for max-rps https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * fix(loadgen): address Task 5 review — shut metrics server on constructor error, log gather error - Add shutdownSrv helper and call it on all three subscription-error return paths so the metrics HTTP server is not leaked when construction fails. - Log the Gather() error in snapshotCounters instead of silently discarding it. - Strengthen TestDiffCounters to assert zero-delta for marshal/gatekeeper/bad_reply. https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * feat(loadgen): add history workload adapter for max-rps https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * feat(loadgen): wire max-rps subcommand into dispatch https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * test(loadgen): integration coverage for max-rps messages ramp https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * docs(loadgen): document max-rps subcommand and add run-max-rps target https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha * docs(loadgen): fix max-rps README + validate history flags Correct the README "Reading the output" section (INCONCLUSIVE does not stop the ramp and is not a pass; fix the no-pass ANSWER string), validate --page-limit/--request-timeout > 0, and document the deliberate straggler-exclusion in the messages hold-window counter snapshot. https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha --------- Co-authored-by: Claude <noreply@anthropic.com>

Add directPool which, for each simulated user, opens a dedicated nats.Conn and Subscribes to every room's broadcast subject. Each delivery is timestamped and correlated against the shared Collector via RecordBroadcast so end-to-end (E2) latency can be measured. Wired into the daily-IM runtime by the next task; symbols carry //nolint:unused until that wiring lands so this commit stands alone. https://claude.ai/code/session_014KJfz9ZA7cCe4rAmfSHHmt

https://claude.ai/code/session_014KJfz9ZA7cCe4rAmfSHHmt

Wires the daily-IM runtime end-to-end: prodEnvFactory builds the real NATS direct/multiplex pools, pending poller, service scraper, and JWT minting; runDaily is the production entrypoint dispatched from main.go. Drops //nolint:unused suppressions on symbols now genuinely consumed via the production path (directPool/multiplexPool + constructors, pollPending, serviceScraper + Scrape, runStep/activateUsers/stepEnv). startEmitter and doAction remain suppressed pending the per-user emitter wiring task. https://claude.ai/code/session_014KJfz9ZA7cCe4rAmfSHHmt

Apply correctness and cleanup fixes surfaced by /simplify high-effort review: Correctness: - runStep: return Inconclusive on ctx cancel and on pollPending error instead of zero-valued StepResult that renderConsole treats as PASS - multiplexPool.Add: subscribe BEFORE mutating dispatch/refcount, so partial-subscribe failures leave the pool consistent - multiplexPool.Close: stop closing per-user channels — race against in-flight route() could panic on send-on-closed; rely on Drain+GC - multiplexPool.route: hold lock through inbox send so Close can't race; drop counter incremented outside lock to avoid contention - newMultiplexPool: return error instead of panic on connect failure - runDailyForTest: defer Close on direct+multiplex pools to drain connections cleanly between/after runs - directPool/multiplexPool: also subscribe to UserRoomEvent(account) so DM broadcasts (the dominant receive traffic for daily presets) are observed - prodEnvFactory: wire publish/request fns through stepEnv; emitters now actually emit actions in production (previously startEmitter was dead code with //nolint:unused, making the verdict trivially pass) - doAction: use env.siteID instead of hardcoded "site-local" - runDailyForTest: propagate baseCfg.SiteID into fixtures and stepEnv - prodEnvFactory: drop the hardcoded :9100/metrics URLs — backend services don't expose HTTP endpoints, so the scrape was a permanent no-op masquerading as a real signal Cleanup: - 5s tick in runStep hold loop → single context-aware timer - maxInt helper → slices.Max - joinReasons body → strings.Join - scrapeErrorCounter manual read loop → io.ReadAll Known limitation documented in preset.go: large-band rooms have UserCount > LargeRoomThreshold in production fixtures, which causes message-gatekeeper to block non-thread sends from member-role users. Daily presets work around this by funneling sends to smaller rooms; large-band rooms still exercise fan-out via receive-side subscriptions.

…ange

`loadgen daily -h` now prints a description of the scenario, SLO thresholds, receiver topology, preset summary, flag list with units, and three example invocations. Also exit cleanly on -h/--help instead of logging a 'parse daily config error=flag: help requested' line.

100k-user fixtures take minutes to build (O(N*perUser*R) weighted picker in preset.go); without a log line up front, the silence looked like a hang. Now you see 'building fixtures users=N' immediately and 'fixtures built rooms=X subscriptions=Y elapsed=Zs' when the picker finishes, so it's clear whether the process is in fixture build, in the per-step warmup, or stuck. The picker itself remains the original weighted-by-remaining-capacity algorithm; an earlier attempt to swap it for a shuffled slot-bag preserved the asymptotic complexity win but broke the DM band's perfect-fit tail (any duplicate-skip burns a slot the last user needs) and was reverted. A proper fix (stub-pairing for DM band only) is deferred to a follow-up.

…orting A misbehaving NATS monitoring endpoint was marking every step Inconclusive — too strict. Now: - pollPending retries up to 3x with linear backoff (200ms, 400ms) so transient failures are absorbed. - If the poll still fails after retries, runStep logs a warning and drops the pending-growth signal for that step (passes nil ConsumerPending to evaluateStep). The other four signals (latency, error rate, service errors, self-metrics) still produce a real verdict. - Only ctx.Canceled / DeadlineExceeded from the poller still maps to Inconclusive, since that indicates the run is shutting down, not a flaky server.

Adds NATS_MONITORING_URL (default http://nats:8222/jsz, the docker-compose default) to the base config. prodEnvFactory.Build reads it for the daily-IM pending-growth poller. Lets users running against non-default infra (host-port-mapped NATS, monitoring on a different port, etc.) point the poller at the right endpoint without recompiling.

The DM band's weighted-by-capacity picker was the dominant cost in BuildFixtures: at N users with bands.DMs=perUser the band has R = N*perUser/2 rooms, making the per-pick O(R) scan quadratic in N. At N=100000 (the default --steps tip) the DM band alone was ~3e12 inner-loop iterations, taking hours. Stub-pairing (configuration model) generates a perUser-regular bipartite graph in O(N*perUser): 1. Each user contributes perUser stubs to a flat list. 2. Shuffle the list once. 3. Pair consecutive stubs into DM rooms. 4. Fix self-pairs (stubs[2k]==stubs[2k+1]) by swapping the second stub with a later position whose neighbours don't create a new self-loop. Self-loops are rare (~perUser expected over the whole list). By construction every user appears in exactly perUser DMs and every DM has 2 distinct users. The Small/Medium/Large bands keep the existing weighted picker (they're tractable at typical N -- O(R) per pick is fine when R is in the thousands). Locked in by TestBuildFixtures_DailyHeavy_FastAtScale: at N=2000 the whole BuildFixtures call now completes in ~2s on a developer box; the test fails the build if it ever regresses past 30s. Existing TestBuildFixtures_DailyBands (which already asserts the per-user membership count invariant at N=200) still passes.

Three perf+correctness+ops reviews surfaced 14 findings. Applies the verdict-affecting and operationally-critical ones; flags the performance/memory ones as follow-ups in commit messages and code. Correctness: - **Deterministic per-user RNG seed.** startEmitter was seeded from time.Now().UnixNano() XOR len(u.ID) — len is constant for all users ("u-NNNNNN"), so same-nanosecond activations got identical seeds and emitted perfectly correlated action streams. Replaced with a splitmix mix of env.runSeed and the user index, so two runs at the same run-seed produce identical action streams (the whole point of a reproducible verdict). - **Per-step envelope re-anchor.** Emitters used to capture holdStart at activation time, so users from step N kept emitting against step N's curve all the way through step N+1 — by step 2 the early half of the user population was firing at envelope baseline (0.4×rate) instead of following the new step's hold curve, making later steps' "PASS" verdicts wrong. Now stepEnv carries holdStart and holdDuration as atomic.Int64 fields; emitters re-read them on every tick; runStep calls env.setHold(now, hold) right before Reset() at the start of each hold. - **Post-activation hold anchor.** holdStart was computed *before* activateUsers, which is rate-limited at 500 users/sec. At a +50k delta this meant 100s of activation, mostly during what was supposed to be warmup, with fresh users emitting at the envelope's clamped edge instead of ramping. The re-anchor above also fixes this — hold begins after warmup actually elapsed. - **percentile uses ceil-based nearest-rank.** Old floor-based indexing (int(p*(len-1))) returned p98 for p99 on ≤100-sample windows, so tight smoke tests false-PASSed. Now uses ceil(p*len)-1, matching the standard. - **diffPending walks both sides.** A consumer that disappears mid-hold (crashed/deleted) used to be silently dropped from the verdict. Now surfaces as End=0/negative-Delta and trips with reason "<durable> disappeared mid-hold". - **HTTP timeout on /jsz.** pollPendingOnce used http.DefaultClient with no timeout; a hung monitoring endpoint would wedge the whole run until the operator killed the process. Dedicated 5s-timeout client. - **ctx-aware ramp loop.** runDailyForTest now checks ctx.Err() between steps, so SIGINT mid-cooldown doesn't produce a junk trail of INCONCLUSIVE rows for steps that never started. Operational guards: - **Zero-traffic → INCONCLUSIVE.** Publisher conn failure or unwired emitters used to produce a silent PASS at every step. Now evaluateStep returns INCONCLUSIVE with "zero actions attempted" when AttemptedOps==0. - **Effective-N tracking + <95% threshold.** activateUsers increments env.activatedCount/skippedCount atomically; if EffectiveN/N < 95% the step returns INCONCLUSIVE with the shortfall in the reason. StepResult.EffectiveN is surfaced in console (e.g. "20000(10000)") and added as a CSV column. Stops "N=20000 PASS" silently meaning "10000 active". - **Goroutine-as-CPU proxy disabled.** readCPUPercent's heuristic (NumGoroutine/5000 × 100) tripped INCONCLUSIVE at any scale ≥4k users — exactly the regime this tool is built for. Now returns 0; INCONCLUSIVE relies on the GC pause signal alone until a real CPU sample (gopsutil/proc-self-stat) is wired. Follow-ups (not in this commit): - **Perf.** Collector single-mutex contention (~520k locks/sec at N=100k), unbounded byReqID/byMsgID growth (spec promised TTL+1M cap), per-user inbox channels that nobody reads, small/medium band O(N²) picker, one-goroutine-per-user (100k goroutines), per-tick actionCtx allocation. Each is a real win but a non-trivial refactor; queued for the next pass. - **Ergonomics.** Configurable SLO thresholds + seed flags, heartbeat between warmup and verdict, ABORTED marker on ctx-cancel, full per-durable CSV columns, --validate dry-run.

Two perf fixes flagged by the post-review pass. preset.go: small/medium/large band fixture picker switched from the O(N x perUser x R) weighted-scan to a configuration-model slot-bag walk with retry on per-user duplicates. Same shape as the DM band's stub-pairing but generalised for heterogeneous room capacities. N=10000 daily-heavy fixture build drops from minutes to ~1s; N=100k drops from ~30+ min to seconds. TestBuildFixtures_DailyHeavy_FastAtScale bumped to N=10000 to lock the speedup in. collector.go: byReqID/byMsgID/e1/e2 sharded by FNV-1a hash of the correlation key across 64 shards (reqShard for requestID-keyed, msgShard for messageID-keyed; each has its own mutex). At N=100k a busy daily-IM run produces ~150k+ Record* calls/sec, all of which previously serialised on a single sync.Mutex — loadgen became the bottleneck and inflated latency samples before the backend tripped. Per-shard contention drops by ~64x. Determinism preserved: emit-as-you-pick instead of range-over-map (Go map iteration order is randomised), and the slot-bag uses the same seeded r.Intn sequence both runs. TestBuildFixtures_DailyBands still passes its same-seed -> equal Fixtures invariant.

…set) PR #240 (max-rps) landed on main with its own waitOrCancel (in ramp.go, error-returning) and a Reset() method on the unsharded Collector. Both collided with the daily-IM scenario's versions: - daily.go's waitOrCancel returned bool; replaced its callers to use the error-returning ramp.go helper. - collector.go's PR #240 Reset referenced the pre-sharded fields (c.mu, c.byReqID, etc.) that no longer exist; the sharded Reset further down in the file does the same job correctly under per-shard locks. Build, lint, and unit tests green after the reconciliation.

TestRunDaily_Integration_TinyPresetPasses passed vacuously before the recall-review fix wired emitters into prodEnvFactory (no actions = no errors = trivial PASS). With emitters live, the test now actually publishes to subjects like `chat.user.{acct}.room.{room}.{site}.msg.send` and request-replies on history/room-list/etc. — all of which time out in the CI integration job because the testutil NATS container has no gatekeeper, no room-service, and no broadcast-worker subscribed. Result: error_rate=7.69% > 0.1% threshold, step TRIPs, test FAILs. The wiring is correct (proven by unit + race tests); what's wrong is that this scenario fundamentally needs the full docker-compose stack to be meaningful. Skipping with a comment pointing operators at `make -C tools/loadgen/deploy run-daily` for real end-to-end coverage.

…fore Add returns CI hit: TestMultiplexPool_RoutesBroadcastToInbox 'Condition never satisfied'. Root cause: nc.Subscribe queues the SUB command client- side; it's not sent to the broker until the next Flush. A caller (or test) that publishes immediately after Add() can see the broadcast dropped because the server hasn't registered the subscription interest yet. Production emitters tick on a 1s schedule so they never publish during the registration race window, but the test does — explaining why the test was reliably failing only under CI's timing. Fix in both pools (direct + multiplex): call nc.Flush() after the Subscribe pass so Add only returns once the server knows about every new interest. One round-trip per pool conn at activation time, dominated by the Subscribe overhead already incurred.

Bug: daily.go's prodEnvFactory and daily_pool.go's pool constructors called bare nats.Connect(url, nats.Name(...)) with no credentials. On operator-mode NATS servers (the chat-system standard, including the local docker-local stack) anonymous connections either fail outright or get default-deny perms — manifesting as 'permissions violation' on subscribe and the silent-zero-latency symptom reported here. Every other loadgen subcommand correctly used natsutil.Connect(cfg.NatsURL, cfg.NatsCredsFile); only the daily scenario was missing the second arg. Fix: - New connectWithCreds(url, name, credsFile) helper applies nats.UserCredentials when credsFile is non-empty, else falls back to anonymous (testcontainer NATS in unit/integration tests). - newDirectPool, newMultiplexPool, and the prodEnvFactory publisher conn all route through it. - baseCfg.NatsCredsFile (existing env var NATS_CREDS_FILE) flows in through prodEnvFactory; the docker-compose overlay already wires NATS_CREDS_FILE=/etc/nats/backend.creds. After this fix: 'make -C tools/loadgen/deploy run-daily PRESET=...' authenticates as the backend user, subscribe perms work, and latency samples populate — P50/P95/P99 should be non-zero.

…eply room-service registers MessageReadWildcard via nc.QueueSubscribe and unconditionally calls m.Msg.Respond(resp) (room-service/handler.go:83). Respond requires msg.Reply to be set, which nc.Publish does not set — only nc.Request does. The daily readReceipt action was using a.Publish, so every read-receipt action triggered 'nats: message does not have a reply' in room-service's log. Switched to a.Request with the same defaultRequestTimeout the other request-shaped actions use, and renamed/updated the unit test to lock in the contract (must land in c.reqs, not c.pubs). Verified the other six action handlers (scrollHistory, refreshRoomList, muteToggle, roomCreate, memberAdd, threadReply) — they all use Request against the right routes, so readReceipt was the sole mismatch.

…es Request Two unrelated bugs that bit during real runs: 1. readReceipt was using a.Publish for subject.MessageRead, but room-service registers that subject via QueueSubscribe and calls msg.Respond on every received message — which fails with 'nats: message does not have a reply' when the sender used fire-and-forget Publish. Switched to a.Request (matches the pattern of every other request-side action handler). Updated the test. 2. runDaily started straight into BuildFixtures + a full ramp even when the Mongo subscriptions collection was empty for the configured siteID. The gatekeeper then rejected every send with 'user X is not subscribed to room Y' and the operator burned ~30+ minutes before seeing the error_rate trip. Added a preflight that opens a short- lived Mongo connection, counts subscriptions for cfg.SiteID, and bails with an actionable message: ERROR daily pre-flight error="no subscriptions found in mongo for siteID="site-local"; run \`loadgen seed --workload=messages --preset=<your daily preset>\` first (or \`make -C tools/loadgen/deploy seed PRESET=<your preset>\`)" Exit code 2 (config error). Uses a 10s context independent of the run-level ctx so a transient Mongo blip doesn't burn the window before failing.

…atch with seed Symptom: gatekeeper rejects every send with "user X is not subscribed to room Y" even though `loadgen seed` was just run successfully. Root cause: BuildFixtures is deterministic in (preset, seed, siteID). `loadgen seed --workload=messages --preset=daily-heavy` uses the preset's default Users (10000). But runDailyForTest was overriding preset.Users = slices.Max(cfg.Steps) before calling BuildFixtures. If max(steps) != preset.Users, the per-band stub shuffle produces DIFFERENT user/room/sub IDs at run time vs seed time — same prefixes ("u-000123", "room-small-000045"), different totals → different permutation → no overlap. Gatekeeper's GetSubscription misses every lookup and the whole ramp fails on error_rate. Fix: drop the override. runDailyForTest now uses preset.Users as-is, matching `loadgen seed`. activateUsers already caps at len(env.users), and the EffectiveN-shortfall guard surfaces "only X/Y users activated" as INCONCLUSIVE when --steps exceeds preset.Users — much clearer than silent ID drift. Side effect: --steps cannot exceed preset.Users for that preset. For daily-light/heavy/power that's 10000. A startup log line warns when max(steps) > preset.Users so the operator sees the cap before the run starts. No re-seed required after deploying the fix.

Replace the 53-line stub section with a full operator guide that captures every lesson from real runs: - Quick start + prerequisites (seed step, NATS perms, env vars) - Preset comparison with rooms-per-user breakdown - Complete CLI flag reference (units, defaults, when to change) - Environment variables (NATS_CREDS_FILE, NATS_MONITORING_URL, SITE_ID, plus the mongo/valkey vars) - SLO signals with thresholds; PASS/TRIP/INCONCLUSIVE semantics - Sample console + CSV output, how to read 'ANSWER' and N(EffectiveN) annotations - Troubleshooting matrix covering every failure mode encountered: not-subscribed, siteID mismatch, large-room block, 'no reply' from room-service, NATS permissions, all-zero latency, EffectiveN cap, fixture build silence, flaky /jsz - Known limitations explicitly listed (large-room block, no-op auth stub, dormant service-error scraper, disabled CPU proxy) - Pointers to spec, plan, and source files Aims to make the tool self-serviceable: a new operator who has never used it should be able to get to a verdict by following the doc, and an experienced operator hitting an error should find the fix in the troubleshooting matrix without escalating.

Adds observational per-handler latency stats so operators can see what each daily action (send / read_receipt / scroll_history / refresh_room_list / member_add / room_create / mute_toggle) actually costs, in addition to the system-wide publish→broadcast p50/p95/p99 the verdict already uses. - Collector gains per-action latency sample tracking (map[int][]Duration under its own mutex; cleared by Reset alongside the existing counters). - doAction wraps each handler call with time.Now/Since and records the elapsed via RecordActionLatency. - actionKind grows a String() helper for stable lowercase names and an allActionKinds canonical-order slice for the report. - stepInputs/StepResult carry ActionLatencies; evaluateStep computes Count + P50/P95/P99 per action via the existing percentile helper. - Console prints 'actions: send n=8920 p50=12 p95=180 p99=320 | ...' on the line after each step's verdict. - CSV adds 28 columns (<action>_count, _p50_ms, _p95_ms, _p99_ms × 7 actions) in stable allActionKinds order so downstream tools can column-index reliably. Verdict criteria unchanged: these stats are observational only — PASS / TRIP / INCONCLUSIVE still depends on the existing latency, error-rate, pending-growth, and self-metrics signals. Per the operator's request (too easy to confuse a single 'all actions sluggish' p95 with a real broadcast-pipeline regression if we trip on per-action numbers).

The daily presets hard-code Users=10000. Operators wanting to ramp above that previously had to edit preset.go and rebuild. Now both `loadgen seed --workload=messages` and `loadgen daily` accept `--users=N` to override at invocation time. The two MUST be passed the same value — BuildFixtures is deterministic in (preset, seed, siteID), and the per-band stub shuffle uses totalUsers as length, so a mismatch produces entirely different room/sub IDs even though the user IDs (u-000000, u-000001, ...) overlap. The gatekeeper then rejects every send with "user not subscribed". To catch that misuse fast, the daily preflight now also counts users in Mongo and compares to preset.Users (or the --users override). On mismatch it errors with the exact teardown+re-seed commands needed. Usage at, say, N=50000: loadgen seed --workload=messages --preset=daily-heavy --users=50000 loadgen daily --preset=daily-heavy --users=50000 --steps=5000,10000,20000,50000 `--users=0` (default) means "use preset default", the safe path for normal runs and existing scripts.

room-service extracts X-Request-ID from inbound NATS headers via natsutil.ContextWithRequestIDFromHeaders (room-service/handler.go:60). The ID flows through ctx into the canonical event room-service publishes to room-worker's ROOMS stream. room-worker REQUIRES X-Request-ID on canonical events for idempotency (it's the dedup seed) and rejects with 'missing X-Request-ID' otherwise (room-worker/handler.go:752,1219). Before this fix, prodEnvFactory's publish/request closures used pubConn.Publish(subj,data) and pubConn.RequestWithContext(ctx,subj,data) — neither of which sets headers. Empty inbound header at room-service → empty ctx → empty canonical header → room-worker logs the error and drops the event. Visible in room-worker logs as 'missing-X-request-id' on subject 'chat.room.canonical.<site>.member.add'. Now both closures build *nats.Msg with a fresh natsutil.RequestIDHeader (X-Request-ID) per call and use PublishMsg / RequestMsgWithContext. Every emitter action now produces a properly traceable request through the canonical pipeline. The header ID is independent of any payload-level request ID (e.g. SendMessageRequest.RequestID for the gatekeeper) — both serve different layers. In production a real client typically uses the same value at both layers; we could plumb that through actionCtx as a follow-up if request-tracing across layers needs it.

room-service rejects member-add on DM rooms with 'cannot add members to a non-channel room' (DMs have a fixed 2-member shape). For daily-heavy ~25 DMs out of 56 rooms per user, picking uniformly from u.Rooms means ~45% of memberAdd actions error out — inflating error_rate noise and spamming the room-service log. Pre-filter at activation time: userState.ChannelRooms is the subset of Rooms that aren't DMs, computed once in newUserState by ID-prefix ('room-dm-' = DM band, everything else = channel band). memberAdd now picks from u.ChannelRooms instead of u.Rooms. Users with no channel rooms (DM-only) no-op the action rather than erroring. The other actions (sendMessage, readReceipt, scrollHistory, refreshRoomList, muteToggle, roomCreate, threadReply) work fine on DMs and continue using u.Rooms. Only memberAdd has the channel-only constraint at the room-service layer. Tests: - TestNewUserState_ChannelRoomsExcludesDMs locks the prefix-filter - TestMemberAdd_Requests now seeds ChannelRooms explicitly - TestMemberAdd_SkipsWhenNoChannelRooms covers the DM-only no-op path

…l account Two related fixes for actions that hit room-service: roomCreate sent only {name, type}. classifyAndValidate at room-service/handler.go:193 lets that through (name is non-empty), but the channel branch at line 290 then rejects it: allUsers := stripAccount(...) if len(allUsers) == 0 && len(allOrgs) == 0 { return nil, errEmptyCreateRequest } A channel must have at least one member besides the creator. Now we include u.Neighbor in the 'users' array. errEmptyCreateRequest gone. memberAdd was passing the literal 'user-stub' as the target account. room-service validates accounts exist (errUserNotFound) — 'user-stub' isn't in Mongo, so every memberAdd that reached a channel was failing silently after the channel-only filter. Now uses u.Neighbor, which is guaranteed-seeded. userState.Neighbor is computed once in newUserState by shifting the requester's index by 1 ('user-N' → 'user-(N-1)' for N≥1, 'user-0' → 'user-1'). Always != Account; always within the seeded range for any preset with Users ≥ 2 (i.e. all daily-* presets). Tests: - TestNeighborOf covers shift / wrap / fallback - TestRoomCreate_Requests asserts the payload now has a users array - TestMemberAdd_Requests / SkipsWhenNoChannelRooms unchanged shape

The 'timeout' argument to actionCtx.Request was silently ignored: RequestMsgWithContext uses the context's deadline, and the emitter's ctx is the run-level ctx with no deadline. So defaultRequestTimeout (5s) never bound anything; a slow room-service handler could hang arbitrarily long, showing up as huge per-action p50 (e.g. 25s for read_receipt) instead of cleanly timing out and contributing to error_rate. Wrap the incoming ctx with context.WithTimeout(timeout) so each call gets a real deadline. Action handlers continue to pass defaultRequestTimeout=5s; future per-action overrides can vary the value without code changes. This makes the per-action latency numbers a true latency signal: a healthy handler shows actual round-trip ms; a sick one shows a wall at the timeout value plus a matching jump in failed_ops / error_rate.

Previously the verdict only gated on the publish→broadcast latency (the messaging pipeline). Operations like read_receipt, scroll_history, member_add, refresh_room_list — all of which exercise different parts of the system (room-service, history-service, room-worker) — only contributed to error_rate when they timed out, never to latency. A slow handler that stayed under the request timeout was invisible. Now Thresholds carries per-action p95/p99 maps; evaluateStep iterates ActionLatencies and trips if any gated action exceeds its cap. Default thresholds reflect typical chat-backend handler latencies: read_receipt: p95=100ms p99=250ms (mongo update + reply) refresh_room_list: p95=200ms p99=500ms (mongo query per user) scroll_history: p95=500ms p99=1500ms (cassandra bucket walk) member_add: p95=200ms p99=500ms (mongo + canonical pub) mute_toggle: p95=100ms p99=250ms (mongo update) room_create: p95=500ms p99=1500ms (mongo + canonical pub) sendMessage/threadReply are not gated here — they're already gated by the broadcast-pipeline P95LatencyMs / P99LatencyMs at the top of the verdict (the latency the existing p50/p95/p99 columns reflect). Operator can tune via --action-p95-ms / --action-p99-ms flags: loadgen daily --preset=daily-heavy \ --action-p95-ms=read_receipt:80,scroll_history:300 \ --action-p99-ms=read_receipt:200,scroll_history:800 Format: comma-separated name:N pairs. Unset actions keep defaults; unknown names are rejected at parse time. Set N to a very large number to effectively disable an action's gate. Trip reasons name the action explicitly: scroll_history p95=612ms > 500 read_receipt p99=340ms > 250

The action marks the user's subscription as read (msg.read handler). "read_receipt" invited confusion with the separate MessageReadReceipt notification path; "mark_read" names what the action actually does. Breaking for existing CSV consumers and CLI users: the per-action column names change from read_receipt_* to mark_read_*, and --action-p95-ms / --action-p99-ms must use "mark_read" as the key.

…arge history-large materialized a single 65M-row plannedMessage slice (~50 GB of strings), so seeding got SIGKILLed before reaching Cassandra. BuildHistoryFixtures now keeps only the Fixtures + ThreadParents map and hands the seeder an IterateRoomMessages iterator that builds one room's plan at a time (~50 MB peak). FullPlan stays available for tests and small/medium presets where materialization is bounded. Per-room randomness is split into two streams — structural (sender, jitter, parent-selection, reply offsets) and content (message body bytes) — so a cheap metadata-only walk can derive Room.LastMsgAt and ThreadParents without paying the O(MessagesPerRoom × ContentBytes) cost of regenerating content. CreatedAt and content bytes for a given (seed, preset) are different from before — no test asserts on either, but anyone diffing a re-seed against an old Cassandra dump will see the change.

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

hmchangw changed the title ~~docs: daily-IM load scenario design spec~~ feat(loadgen): daily-IM load scenario to find sustainable N May 28, 2026

hmchangw force-pushed the claude/gifted-rubin-ry8HI branch 2 times, most recently from 6669ca2 to 07c9239 Compare May 28, 2026 14:30

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

hmchangw mentioned this pull request May 28, 2026

feat(loadgen): max-rps SLO finder for message-send & history #240

Merged

7 tasks

hmchangw force-pushed the claude/gifted-rubin-ry8HI branch 2 times, most recently from 354715f to a76f537 Compare June 2, 2026 19:15

claude added 19 commits June 3, 2026 01:28

docs: daily-IM load scenario design spec

7576abb

docs: daily-IM load scenario implementation plan

4bf43d0

loadgen: add DailyBands field and daily-light/heavy/power presets

1703b02

loadgen: banded fixture build for daily-IM presets

73aeaf5

loadgen: diurnal envelope for daily-IM scenario

75bedcc

loadgen: user state machine + action picker for daily-IM

0bd8a5d

loadgen: send/read-receipt/room-list action handlers

58fa2ca

loadgen: history/mute/room-create/member-add/thread action handlers

fb31152

loadgen: multiplex receiver pool with drop counting

4825ecd

https://claude.ai/code/session_014KJfz9ZA7cCe4rAmfSHHmt

loadgen: SLO verdict evaluator for daily-IM steps

4938b8f

loadgen: pending poller + service scraper + self-metrics

3d53afc

loadgen: parseDailyConfig CLI flags + validation

db5d998

loadgen: per-step lifecycle (warmup/hold/cooldown)

24b0e97

https://claude.ai/code/session_014KJfz9ZA7cCe4rAmfSHHmt

loadgen: per-user emitter goroutines + runDaily control loop

7351c5e

loadgen: console + CSV report for daily-IM scenario

d87b319

loadgen: dispatch test + usage line for daily subcommand

e640a1b

loadgen: integration test for daily-IM scenario

04c2dd4

claude added 27 commits June 3, 2026 01:28

loadgen: httptest unit coverage for pending poller + service scraper

ef9542e

loadgen: fix integration test call site after newMultiplexPool sig ch…

ebf328a

…ange

hmchangw force-pushed the claude/gifted-rubin-ry8HI branch from af5b086 to 7829660 Compare June 3, 2026 01:30

mliu33 approved these changes Jun 3, 2026

View reviewed changes

mliu33 merged commit 806f593 into main Jun 3, 2026
6 checks passed

	if err := a.Publish(a.Ctx, subject.MessageRead(u.Account, roomID, a.SiteID), payload); err != nil {
	if err := a.Publish(a.Ctx, subject.MessageReadReceipt(u.Account, roomID, a.SiteID), payload); err != nil {

Conversation

hmchangw commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's implemented

Status

Known limitations / follow-ups

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hmchangw commented May 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading