From a83be2ca9d8babc2fe00219a266099e83b800777 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 15:50:12 +0000
Subject: [PATCH 01/16] docs: design spec for loadgen max-rps SLO finder

Mirrors PR #234's step-up-and-hold-under-SLO control loop, ramping target
RPS instead of user count N, to auto-find the max sustainable RPS for the
messages and history loadgen workloads. Standalone on main with
rps-prefixed identifiers so it neither depends on nor collides with #234.

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 .../2026-05-28-max-rps-slo-loadgen-design.md  | 257 ++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-28-max-rps-slo-loadgen-design.md
diff --git a/docs/superpowers/specs/2026-05-28-max-rps-slo-loadgen-design.md b/docs/superpowers/specs/2026-05-28-max-rps-slo-loadgen-design.md
new file mode 100644
index 000000000..4a430178c
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-28-max-rps-slo-loadgen-design.md
@@ -0,0 +1,257 @@
+# Design: `loadgen max-rps` — auto-find Max RPS under SLO
+
+**Status:** Approved (brainstorming complete)
+**Date:** 2026-05-28
+**Scope:** `tools/loadgen/`
+
+## 1. Background & goal
+
+PR #234 ("feat(loadgen): daily-IM load scenario to find sustainable N") introduced a
+**step-up-and-hold-under-SLO control loop**: it ramps a load parameter through a series of
+steps, holds at each step, evaluates SLO signals, and reports the largest step value at which
+all signals held (`ANSWER: N = …`).
+
+This design applies the **same control loop** to a different axis: instead of ramping the
+number of simulated users `N`, we ramp the **target request rate (RPS)** of the existing
+**open-loop** load generators, to automatically find the **maximum RPS each workload can
+sustain under an SLO**.
+
+Two existing workloads are covered:
+
+- **messages** — the `run` subcommand: an open-loop publisher into the messaging pipeline
+  (`message-gatekeeper` → `MESSAGES_CANONICAL` → `message-worker` + `broadcast-worker`),
+  measuring E1 (gatekeeper ack) and E2 (broadcast visibility) latency and sampling the
+  `message-worker` / `broadcast-worker` consumer backlog.
+- **history** — the `history-sustained` subcommand: an open-loop NATS request/reply workload
+  against history-service's synchronous read handlers (`LoadHistory` + `GetThreadMessages`),
+  backed by Cassandra + MongoDB. No JetStream consumer queue is involved.
+
+### Relationship to PR #234 (decided)
+
+PR #234's verdict / report / pending-poller code lives only on its unmerged branch
+(`claude/gifted-rubin-ry8HI`); none of it is on `main`. This work is built **standalone on
+`main`**, mirroring #234's design with **workload-agnostic helpers** and **`rps`-prefixed
+identifiers** so there is:
+
+- **no dependency** on the unmerged PR, and
+- **no symbol collision** in `package main` whether #234 merges before or after this branch
+  (#234 uses `Thresholds`, `StepResult`, `evaluateStep`, `percentile`, `parseStepList`,
+  `renderConsole`, `writeDailyCSV`; this work uses `rpsThresholds`, `rpsStepResult`,
+  `evaluateRPSStep`, `rpsPercentile`, `parseRPSSteps`, `renderRPSReport`, `writeRPSCSV`).
+
+If/when #234 merges, converging the two implementations into shared helpers (or a small
+`pkg/` library) is a mechanical refactor, not a rewrite.
+
+## 2. CLI surface
+
+```
+loadgen max-rps --workload=messages|history --preset=<name> [flags]
+```
+
+| Flag | Default | Notes |
+|------|---------|-------|
+| `--workload` | `messages` | `messages` or `history` |
+| `--preset` | (required) | an existing preset for the chosen workload (`BuiltinPreset` / `BuiltinHistoryPreset`) |
+| `--steps` | messages `500,1k,2k,5k,10k` / history `200,500,1k,2k,5k` | explicit ordered RPS list; `k` suffix = ×1000 |
+| `--warmup` | `10s` | per-step warmup (samples discarded) |
+| `--hold` | `30s` | per-step measurement window |
+| `--cooldown` | `5s` | per-step settle gap before next step |
+| `--slo-p95` | `100ms` | applied to **every** gated latency series |
+| `--slo-p99` | `250ms` | applied to **every** gated latency series |
+| `--slo-error-rate` | `0.001` | `failed / attempted` (0.1%) |
+| `--slo-pending-growth` | `1000` | **messages only**: per-durable end−start `NumPending` delta |
+| `--rate-tolerance` | `0.05` | achieved-vs-target shortfall band for the INCONCLUSIVE guard |
+| `--stop-on-trip` | `true` | stop the ramp at the first TRIP (does **not** stop on INCONCLUSIVE) |
+| `--seed` | `42` | RNG seed (parity with existing subcommands) |
+| `--csv` | "" | optional CSV output path |
+
+Per-workload defaults for `--steps` are chosen because messages are fire-and-forget publishes
+(can sustain high RPS) while history requests are bounded-concurrency request/reply (lower
+ceiling). Both lists are fully overridable.
+
+Validation: `--preset` required; `--steps` must parse to a non-empty ascending list of
+positive ints; latency/error/tolerance thresholds must be > 0. `history` workload requires
+`CASSANDRA_HOSTS` (same fail-fast as `history-sustained`).
+
+## 3. Architecture
+
+A generic engine drives a per-workload adapter. Everything lives in `tools/loadgen`
+(`package main`), consistent with the existing flat loadgen layout.
+
+### New files
+
+- **`ramp.go`** — generic driver. `parseRPSSteps(string) ([]int, error)` (comma split,
+  `k` suffix, ascending-positive validation); the step iterator that calls the adapter per
+  step, applies `--stop-on-trip`, and tracks `lastPass`. Knows nothing about NATS/Mongo.
+- **`verdict.go`** — `rpsThresholds`, `rpsStepInputs`, `rpsStepResult`, the pure
+  `evaluateRPSStep(in rpsStepInputs, th rpsThresholds) rpsStepResult`, and `rpsPercentile`.
+  Latency is modeled as **named series** so "E1+E2" and per-endpoint gate uniformly.
+- **`maxrps_report.go`** — `renderRPSReport` (console table + `ANSWER:` line) and
+  `writeRPSCSV` (one row per step).
+- **`maxrps_messages.go`** — `messagesWorkload` adapter (implements `rpsWorkload`): reuses
+  `Generator`, `Collector`, the E1/E2 subscriptions and `ConsumerSampler` to run the
+  messaging pipeline at a given RPS for the hold window and harvest `rpsStepInputs`
+  (E1+E2 latency series, attempted/failed counts, saturation count, achieved RPS, and
+  consumer-pending deltas).
+- **`maxrps_history.go`** — `historyWorkload` adapter: reuses `HistoryGenerator` and
+  `HistoryCollector`; harvests per-endpoint latency series (LoadHistory, GetThreadMessages),
+  error/timeout counts, saturation count, achieved RPS; **no** pending deltas.
+- **`maxrps.go`** — `runMaxRPS(ctx, cfg, args)`: flag parsing, dependency wiring (NATS,
+  Mongo, Valkey, and Cassandra for history), builds the adapter, runs the ramp, renders the
+  report. Wired into `dispatch` in `main.go` as the `max-rps` case.
+
+### The `rpsWorkload` interface (engine ↔ adapter seam)
+
+```go
+type rpsWorkload interface {
+    // RunStep drives open-loop load at targetRPS. The engine handles phase
+    // timing; RunStep blocks for (warmup+hold), resetting measurement at the
+    // hold boundary, and returns the harvested inputs for this step.
+    RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error)
+    // Label is used in the ANSWER line / report header.
+    Label() string
+}
+```
+
+The engine owns warmup/hold/cooldown timing, `--stop-on-trip`, and `lastPass`; the adapter
+owns "how to emit load and harvest a normalized result." This is the convergence seam that
+maps onto #234's `envFactory`/`stepEnv` split.
+
+### Normalized step inputs
+
+```go
+type latencySeries struct {
+    Name    string          // "E1","E2"  OR  "history","thread"
+    Samples []time.Duration
+}
+
+type rpsStepInputs struct {
+    TargetRPS    int
+    AchievedRPS  float64
+    Latencies    []latencySeries
+    AttemptedOps int
+    FailedOps    int
+    Saturation   int                     // open-loop self-saturation tally
+    Pending      []consumerPendingDelta  // empty for history
+}
+```
+
+## 4. Per-step lifecycle
+
+For each RPS step, the engine runs:
+
+```
+activate rate  →  warmup  →  [hold start: reset collector + snapshot pending]
+              →  hold (accumulate samples)
+              →  [hold end: snapshot pending + harvest inputs]
+              →  evaluate verdict  →  cooldown
+```
+
+NATS connection, subscriptions, consumer samplers, and the collector stay alive across
+steps; each step simply re-points the generator at the new RPS. The run has no `--duration`
+flag — its length is the sum over steps of `warmup + hold + cooldown`, plus an early stop on
+the first TRIP when `--stop-on-trip` is set. A SIGINT/SIGTERM during any phase ends the run
+after printing whatever results exist so far. A failed pending snapshot at either boundary
+marks that step INCONCLUSIVE (cannot trust the backlog signal).
+
+Measurement covers the **full hold window** (collector reset at hold start, read at hold
+end). #234's documented "middle 60% of hold" window was never implemented and is unnecessary
+here because the offered rate is stationary within a step.
+
+## 5. SLO verdict
+
+`evaluateRPSStep` applies this precedence (the **ordering is the key correctness point** and
+deliberately differs from #234):
+
+1. **TRIP** if any of:
+   - any latency series p95 > `--slo-p95`, **or** any series p99 > `--slo-p99`;
+   - error rate (`FailedOps / AttemptedOps`) > `--slo-error-rate`;
+   - (messages only) any `consumerPendingDelta.Delta` > `--slo-pending-growth`.
+   Each tripped condition appends a human-readable reason
+   (e.g. `"E2 p95=143ms > 100ms"`, `"broadcast-worker pending +1240 > +1000"`).
+2. **else INCONCLUSIVE** if `AchievedRPS < (1 − rateTolerance) × TargetRPS`
+   (corroborated by a non-zero `Saturation` tally) — meaning *"the system looked healthy but
+   the harness could not push the target rate, so the limiting factor is the load box, not
+   the service under test."*
+3. **else PASS** — record `lastPass = TargetRPS`.
+
+### Why TRIP must precede the shortfall guard (differs from #234)
+
+#234 evaluates its harness-health signal **first** and returns early, because its GC-pause /
+goroutine-count proxy is independent of the server under test.
+
+Our shortfall signal is **entangled** with server health: when the service saturates, it
+backpressures the open-loop generator and `AchievedRPS` drops *because the server is slow*.
+If the shortfall guard ran first, we would wrongly mark the very step that found the limit as
+INCONCLUSIVE. Therefore server-induced backpressure (latency/pending/error over threshold)
+must be classified as **TRIP**, and only a *healthy-but-cannot-push* step becomes
+INCONCLUSIVE. This single rule is correct for both workloads:
+
+- **messages** — publishes are fire-and-forget, so `AchievedRPS ≈ TargetRPS` almost always;
+  the real ceiling shows up as consumer pending-growth and rising E2 latency → TRIP.
+  INCONCLUSIVE here is rare (only if the NATS client/CPU can't emit fast enough).
+- **history** — request/reply holds an in-flight slot until the reply, so as the server
+  slows, slots fill, ticks drop, latency climbs → TRIP (correctly attributing the plateau to
+  the server). A genuine box limit (healthy latency but can't push rate) → INCONCLUSIVE.
+
+A real box-CPU signal (gopsutil) is a possible future corroborator but is unnecessary given
+the shortfall rule.
+
+## 6. Reporting
+
+Console table, one row per step:
+
+```
+target_rps  achieved_rps  <per-series p95/p99>  err%  worst_pending_delta  verdict
+```
+
+followed by:
+
+```
+ANSWER: max RPS = <largest passing step> (workload=<messages|history>, preset=<name>)
+        Next limit: <reasons of the FIRST tripped step>
+```
+
+`ANSWER: no step passed` when nothing held. CSV mirrors the table (one row per step) with
+columns: `target_rps,achieved_rps,<series>_p95_ms,<series>_p99_ms,error_rate,attempted,
+failed,worst_durable,worst_pending_delta,verdict,reasons`.
+
+Exit code: reuse `DetermineExitCode` semantics — non-zero if no step passed or the run
+errored; zero otherwise.
+
+## 7. Testing (TDD — Red→Green→Refactor, commit per green step)
+
+- **`verdict_test.go`** — table-driven `evaluateRPSStep`: PASS; TRIP on each signal (E1 p95,
+  E2 p99, error rate, pending growth, per-endpoint latency); shortfall → INCONCLUSIVE;
+  TRIP-beats-shortfall (high latency + low achieved → TRIP, not INCONCLUSIVE); boundary
+  values (exactly at threshold); empty sample sets.
+- **`ramp_test.go`** — `parseRPSSteps` (k-suffix, whitespace, bad tokens, non-ascending,
+  empty); the step iterator against a **fake `rpsWorkload`**: stops on first TRIP, does
+  **not** stop on INCONCLUSIVE, records every result, computes `lastPass` correctly.
+- **`maxrps_report_test.go`** — console table format; ANSWER line for (some pass),
+  (none pass), (last pass then trip); CSV header + row formatting.
+- **Adapter pure-logic tests** — latency-series assembly and achieved-rate computation with
+  fakes / no live NATS.
+- **`integration_test.go`** (`//go:build integration`) — end-to-end `max-rps` 2-step ramp
+  against testcontainers, reusing `pkg/testutil` (NATS for messages; NATS + Mongo + Cassandra
+  for history), asserting a report is produced and the verdict classification is sane.
+
+Coverage: meet the repo's 80% floor; target 90%+ on `verdict.go` / `ramp.go` (pure logic).
+
+## 8. Deliverables beyond code
+
+- `tools/loadgen/README.md` — a `max-rps` section (quick-start for both workloads,
+  flag table, how to read the ANSWER line).
+- `tools/loadgen/deploy/Makefile` — a `run-max-rps` target (parameterized by
+  `WORKLOAD`/`PRESET`/`STEPS`).
+- No `docs/client-api.md` change: this is tooling, not a client-facing handler.
+
+## 9. Out of scope (YAGNI)
+
+- Binary-search refinement between last-pass and first-trip (chose explicit-steps/last-pass).
+- Auto-geometric ramp (`--rps-start/--rps-factor/--rps-max`).
+- The `members` workload (could be a follow-up using the same engine).
+- Cross-site / federation RPS.
+- Real-CPU box-health via gopsutil.
+- Grafana dashboard panels for the ramp.
+- Per-user auth (keep the existing stub behavior of the underlying generators).

From 83ef2ef7d44c085811b7fe850b743750221b62f6 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 16:05:31 +0000
Subject: [PATCH 02/16] docs: implementation plan for loadgen max-rps SLO
 finder

Task-by-task TDD plan: generic ramp engine + verdict + report, reusing
the existing message-send and history open-loop generators, exposed as a
max-rps subcommand. rps-prefixed identifiers avoid collision with PR #234.

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 .../plans/2026-05-28-max-rps-slo-loadgen.md   | 1986 +++++++++++++++++
 1 file changed, 1986 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-28-max-rps-slo-loadgen.md

diff --git a/docs/superpowers/plans/2026-05-28-max-rps-slo-loadgen.md b/docs/superpowers/plans/2026-05-28-max-rps-slo-loadgen.md
new file mode 100644
index 000000000..664d93f14
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-28-max-rps-slo-loadgen.md
@@ -0,0 +1,1986 @@
+# loadgen max-rps SLO finder — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a `loadgen max-rps --workload=messages|history` subcommand that ramps target RPS through an explicit step list, holds at each step under an SLO, and reports the largest RPS that held.
+
+**Architecture:** A generic, workload-agnostic engine (`ramp.go` + `verdict.go` + `maxrps_report.go`) drives a per-workload adapter (`maxrps_messages.go` / `maxrps_history.go`) through the `rpsWorkload` interface. Adapters reuse the existing `Generator` / `HistoryGenerator`, `Collector` / `HistoryCollector`, presets, subscriptions, and `ComputePercentiles`. All identifiers are `rps`-prefixed to avoid symbol collisions with PR #234's `daily_*` code in the same `package main`.
+
+**Tech Stack:** Go 1.25, NATS + JetStream (`nats.go`), Prometheus client, testify, testcontainers (integration). Build/test via `make` targets only.
+
+**Spec:** `docs/superpowers/specs/2026-05-28-max-rps-slo-loadgen-design.md`
+
+---
+
+## File structure
+
+All new files live in `tools/loadgen/` (`package main`), consistent with the existing flat layout.
+
+- **Create `tools/loadgen/verdict.go`** — `rpsThresholds`, `seriesSamples`, `consumerPendingDelta`, `rpsStepInputs`, `verdictKind`, `seriesPercentile`, `rpsStepResult`, `evaluateRPSStep`. Pure logic, no I/O.
+- **Create `tools/loadgen/ramp.go`** — `parseRPSSteps`, `waitOrCancel`, `rpsWorkload` interface, `rampConfig`, `runRamp`, `maxRPSExitCode`. Engine only; no NATS.
+- **Create `tools/loadgen/maxrps_report.go`** — `renderRPSReport`, `writeRPSCSV`, `lastPassRPS`, `firstTrip`.
+- **Create `tools/loadgen/maxrps_messages.go`** — `messagesWorkload` adapter + `newMessagesWorkload` constructor + counter/pending snapshot helpers.
+- **Create `tools/loadgen/maxrps_history.go`** — `historyWorkload` adapter + `newHistoryWorkload` constructor.
+- **Create `tools/loadgen/maxrps.go`** — `runMaxRPS` (flag parsing, wiring, ramp, report).
+- **Modify `tools/loadgen/main.go`** — add the `max-rps` case to `dispatch` (`main.go:82-100`).
+- **Modify `tools/loadgen/collector.go`** — add `Collector.Reset()`.
+- **Create tests:** `verdict_test.go`, `ramp_test.go`, `maxrps_report_test.go`, `maxrps_messages_test.go`, `maxrps_history_test.go`, `maxrps_test.go`; extend `integration_test.go`.
+- **Modify `tools/loadgen/README.md`** and **`tools/loadgen/deploy/Makefile`**.
+
+---
+
+## Task 1: Verdict types and `evaluateRPSStep`
+
+**Files:**
+- Create: `tools/loadgen/verdict.go`
+- Test: `tools/loadgen/verdict_test.go`
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `tools/loadgen/verdict_test.go`:
+
+```go
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }
+
+// nLatencies returns a slice of n identical-latency samples.
+func nLatencies(n int, d time.Duration) []time.Duration {
+	out := make([]time.Duration, n)
+	for i := range out {
+		out[i] = d
+	}
+	return out
+}
+
+func defaultRPSThresholds() rpsThresholds {
+	return rpsThresholds{
+		P95:           ms(100),
+		P99:           ms(250),
+		ErrorRate:     0.001,
+		PendingGrowth: 1000,
+		RateTolerance: 0.05,
+	}
+}
+
+func TestEvaluateRPSStep(t *testing.T) {
+	th := defaultRPSThresholds()
+	tests := []struct {
+		name     string
+		in       rpsStepInputs
+		wantKind verdictKind
+	}{
+		{
+			name: "all healthy passes",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000, FailedOps: 0,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+				Pending:   []consumerPendingDelta{{Durable: "message-worker", Start: 0, End: 10}},
+			},
+			wantKind: verdictPass,
+		},
+		{
+			name: "p95 over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(150))}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "p99 over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				// 99 samples at 20ms, 1 at 300ms -> p95=20ms (ok), p99=300ms (>250ms).
+				Latencies: []seriesSamples{{Name: "E1", Samples: append(nLatencies(99, ms(20)), ms(300))}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "error rate over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000, FailedOps: 5, // 0.5% > 0.1%
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "pending growth over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+				Pending:   []consumerPendingDelta{{Durable: "broadcast-worker", Start: 0, End: 1500}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "per-endpoint: slow thread trips even if history fast",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{
+					{Name: "history", Samples: nLatencies(100, ms(20))},
+					{Name: "thread", Samples: nLatencies(100, ms(180))},
+				},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "healthy but rate shortfall is inconclusive",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 800, // 80% < 95%
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(80, ms(20))}},
+			},
+			wantKind: verdictInconclusive,
+		},
+		{
+			name: "trip beats shortfall: high latency AND low achieved is a TRIP",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 800, // shortfall...
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(80, ms(400))}}, // ...but slow
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "explicit inconclusive flag short-circuits",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Inconclusive: true, InconclusiveReason: "pending snapshot failed",
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+			},
+			wantKind: verdictInconclusive,
+		},
+		{
+			name: "p95 exactly at threshold passes (boundary)",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(100))}},
+			},
+			wantKind: verdictPass,
+		},
+		{
+			name: "empty samples does not panic and passes on other signals",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nil}},
+			},
+			wantKind: verdictPass,
+		},
+	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := evaluateRPSStep(tt.in, th)
+			assert.Equal(t, tt.wantKind, got.Kind, "reasons=%v", got.Reasons)
+		})
+	}
+}
+
+func TestEvaluateRPSStep_AchievedAndErrorRate(t *testing.T) {
+	th := defaultRPSThresholds()
+	in := rpsStepInputs{
+		TargetRPS: 1000, Hold: 2 * time.Second, AttemptedOps: 1000, FailedOps: 100,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}},
+	}
+	got := evaluateRPSStep(in, th)
+	assert.InDelta(t, 500.0, got.AchievedRPS, 0.01) // 1000 ops / 2s
+	assert.InDelta(t, 0.1, got.ErrorRate, 0.0001)   // 100/1000
+}
+
+func TestEvaluateRPSStep_WorstPendingReported(t *testing.T) {
+	th := defaultRPSThresholds()
+	in := rpsStepInputs{
+		TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}},
+		Pending: []consumerPendingDelta{
+			{Durable: "message-worker", Start: 0, End: 50},
+			{Durable: "broadcast-worker", Start: 100, End: 700}, // delta 600, the worst
+		},
+	}
+	got := evaluateRPSStep(in, th)
+	assert.Equal(t, "broadcast-worker", got.WorstDurable)
+	assert.Equal(t, int64(600), got.WorstDelta)
+	assert.Equal(t, verdictPass, got.Kind) // 600 < 1000
+}
+```
+
+- [ ] **Step 2: Run the tests to verify they fail**
+
+Run: `cd tools/loadgen && go test -run 'TestEvaluateRPSStep' . 2>&1 | head -20`
+Expected: FAIL — `undefined: rpsThresholds`, `undefined: evaluateRPSStep`, etc.
+
+- [ ] **Step 3: Write the implementation**
+
+Create `tools/loadgen/verdict.go`:
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// rpsThresholds holds the SLO limits a step is judged against. Every gated
+// latency series shares the same P95/P99 limits.
+type rpsThresholds struct {
+	P95, P99      time.Duration
+	ErrorRate     float64
+	PendingGrowth uint64 // messages only; per-durable end-start NumPending delta
+	RateTolerance float64
+}
+
+// seriesSamples is one named latency tape (e.g. "E1","E2" or "history","thread").
+type seriesSamples struct {
+	Name    string
+	Samples []time.Duration
+}
+
+// consumerPendingDelta is one durable's NumPending at the hold boundaries.
+type consumerPendingDelta struct {
+	Durable    string
+	Start, End uint64
+}
+
+// Delta returns End-Start as a signed value (it can be negative if the backlog drained).
+func (d consumerPendingDelta) Delta() int64 { return int64(d.End) - int64(d.Start) }
+
+// rpsStepInputs is the normalized, workload-agnostic measurement of one step.
+type rpsStepInputs struct {
+	TargetRPS    int
+	Hold         time.Duration
+	AttemptedOps int
+	FailedOps    int
+	Saturation   int // open-loop self-saturation tally (corroborates shortfall)
+	Latencies    []seriesSamples
+	Pending      []consumerPendingDelta // empty for history
+	// Inconclusive is set by the adapter when measurement itself failed (e.g. a
+	// pending snapshot errored), independent of the system under test.
+	Inconclusive       bool
+	InconclusiveReason string
+}
+
+type verdictKind int
+
+const (
+	verdictPass verdictKind = iota
+	verdictTrip
+	verdictInconclusive
+)
+
+func (k verdictKind) String() string {
+	switch k {
+	case verdictTrip:
+		return "TRIP"
+	case verdictInconclusive:
+		return "INCONCLUSIVE"
+	default:
+		return "PASS"
+	}
+}
+
+// seriesPercentile is a named series' computed percentiles, for reporting.
+type seriesPercentile struct {
+	Name string
+	Pct  Percentiles
+}
+
+// rpsStepResult is the verdict for one step.
+type rpsStepResult struct {
+	TargetRPS    int
+	AchievedRPS  float64
+	AttemptedOps int
+	FailedOps    int
+	Saturation   int
+	ErrorRate    float64
+	Latencies    []seriesPercentile
+	WorstDurable string
+	WorstDelta   int64
+	Kind         verdictKind
+	Reasons      []string
+}
+
+// evaluateRPSStep classifies a step PASS / TRIP / INCONCLUSIVE.
+//
+// Precedence (deliberately differs from PR #234):
+//  1. explicit measurement-failure inconclusive (harness/measurement issue),
+//  2. TRIP if any SLO signal is over threshold — server-induced backpressure
+//     must NOT be misread as a harness limit,
+//  3. INCONCLUSIVE if the harness could not push the target rate while the
+//     system looked healthy (the load box is the limit, not the service),
+//  4. PASS.
+func evaluateRPSStep(in rpsStepInputs, th rpsThresholds) rpsStepResult {
+	res := rpsStepResult{
+		TargetRPS:    in.TargetRPS,
+		AttemptedOps: in.AttemptedOps,
+		FailedOps:    in.FailedOps,
+		Saturation:   in.Saturation,
+	}
+	if in.Hold > 0 {
+		res.AchievedRPS = float64(in.AttemptedOps) / in.Hold.Seconds()
+	}
+	if in.AttemptedOps > 0 {
+		res.ErrorRate = float64(in.FailedOps) / float64(in.AttemptedOps)
+	}
+
+	// Compute percentiles for every series (always, so the report has data).
+	for _, s := range in.Latencies {
+		res.Latencies = append(res.Latencies, seriesPercentile{Name: s.Name, Pct: ComputePercentiles(s.Samples)})
+	}
+	// Worst pending delta (always, for the report column).
+	res.WorstDelta = 0
+	for _, p := range in.Pending {
+		if d := p.Delta(); d > res.WorstDelta || res.WorstDurable == "" {
+			res.WorstDelta = d
+			res.WorstDurable = p.Durable
+		}
+	}
+
+	// (1) Measurement failure short-circuits.
+	if in.Inconclusive {
+		res.Kind = verdictInconclusive
+		res.Reasons = []string{in.InconclusiveReason}
+		return res
+	}
+
+	// (2) TRIP conditions (accumulate human-readable reasons).
+	var reasons []string
+	for _, sp := range res.Latencies {
+		if sp.Pct.P95 > th.P95 {
+			reasons = append(reasons, fmt.Sprintf("%s p95=%s > %s", sp.Name, sp.Pct.P95, th.P95))
+		}
+		if sp.Pct.P99 > th.P99 {
+			reasons = append(reasons, fmt.Sprintf("%s p99=%s > %s", sp.Name, sp.Pct.P99, th.P99))
+		}
+	}
+	if res.ErrorRate > th.ErrorRate {
+		reasons = append(reasons, fmt.Sprintf("error rate %.3f%% > %.3f%%", res.ErrorRate*100, th.ErrorRate*100))
+	}
+	for _, p := range in.Pending {
+		if d := p.Delta(); d > int64(th.PendingGrowth) {
+			reasons = append(reasons, fmt.Sprintf("%s pending +%d > +%d", p.Durable, d, th.PendingGrowth))
+		}
+	}
+	if len(reasons) > 0 {
+		res.Kind = verdictTrip
+		res.Reasons = reasons
+		return res
+	}
+
+	// (3) Healthy-but-cannot-push -> INCONCLUSIVE.
+	if th.RateTolerance > 0 && res.AchievedRPS < float64(in.TargetRPS)*(1-th.RateTolerance) {
+		res.Kind = verdictInconclusive
+		res.Reasons = []string{fmt.Sprintf(
+			"achieved %.0f rps < %.0f%% of target %d rps (saturation=%d) — load box limited",
+			res.AchievedRPS, (1-th.RateTolerance)*100, in.TargetRPS, in.Saturation)}
+		return res
+	}
+
+	// (4) PASS.
+	res.Kind = verdictPass
+	return res
+}
+```
+
+- [ ] **Step 4: Run the tests to verify they pass**
+
+Run: `cd tools/loadgen && go test -run 'TestEvaluateRPSStep' . 2>&1 | tail -5`
+Expected: PASS (`ok  github.com/hmchangw/chat/tools/loadgen`).
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/verdict.go tools/loadgen/verdict_test.go
+git commit -m "feat(loadgen): add rps step verdict types and evaluateRPSStep"
+```
+
+---
+
+## Task 2: Ramp engine — `parseRPSSteps`, `waitOrCancel`, `runRamp`
+
+**Files:**
+- Create: `tools/loadgen/ramp.go`
+- Test: `tools/loadgen/ramp_test.go`
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `tools/loadgen/ramp_test.go`:
+
+```go
+package main
+
+import (
+	"context"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestParseRPSSteps(t *testing.T) {
+	tests := []struct {
+		in      string
+		want    []int
+		wantErr bool
+	}{
+		{in: "500,1000,2000", want: []int{500, 1000, 2000}},
+		{in: "1k,2k,5k", want: []int{1000, 2000, 5000}},
+		{in: " 500 , 1k ", want: []int{500, 1000}},
+		{in: "1000", want: []int{1000}},
+		{in: "", wantErr: true},
+		{in: "abc", wantErr: true},
+		{in: "1000,500", wantErr: true},  // not ascending
+		{in: "0,1000", wantErr: true},    // not positive
+		{in: "1000,1000", wantErr: true}, // not strictly ascending
+	}
+	for _, tt := range tests {
+		t.Run(tt.in, func(t *testing.T) {
+			got, err := parseRPSSteps(tt.in)
+			if tt.wantErr {
+				assert.Error(t, err)
+				return
+			}
+			require.NoError(t, err)
+			assert.Equal(t, tt.want, got)
+		})
+	}
+}
+
+// fakeWorkload returns canned inputs, one per step, in order.
+type fakeWorkload struct {
+	inputs []rpsStepInputs
+	calls  int
+}
+
+func (f *fakeWorkload) Label() string { return "fake" }
+func (f *fakeWorkload) RunStep(_ context.Context, _ int, _, _ time.Duration) (rpsStepInputs, error) {
+	in := f.inputs[f.calls]
+	f.calls++
+	return in, nil
+}
+
+func passInputs(target int) rpsStepInputs {
+	return rpsStepInputs{TargetRPS: target, Hold: time.Second, AttemptedOps: target,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}}}
+}
+
+func tripInputs(target int) rpsStepInputs {
+	return rpsStepInputs{TargetRPS: target, Hold: time.Second, AttemptedOps: target,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(400))}}}
+}
+
+func inconclusiveInputs(target int) rpsStepInputs {
+	return rpsStepInputs{TargetRPS: target, Hold: time.Second, AttemptedOps: target / 2,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}}}
+}
+
+func TestRunRamp_StopsOnTrip(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), tripInputs(1000), passInputs(2000)}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second,
+		Thresholds: defaultRPSThresholds(), StopOnTrip: true}
+	results := runRamp(context.Background(), w, cfg)
+	require.Len(t, results, 2) // stopped after the trip at 1000
+	assert.Equal(t, verdictPass, results[0].Kind)
+	assert.Equal(t, verdictTrip, results[1].Kind)
+	assert.Equal(t, 2, w.calls)
+}
+
+func TestRunRamp_DoesNotStopOnInconclusive(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), inconclusiveInputs(1000), passInputs(2000)}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second,
+		Thresholds: defaultRPSThresholds(), StopOnTrip: true}
+	results := runRamp(context.Background(), w, cfg)
+	require.Len(t, results, 3)
+	assert.Equal(t, verdictInconclusive, results[1].Kind)
+	assert.Equal(t, verdictPass, results[2].Kind)
+}
+
+func TestRunRamp_NoStopOnTripRunsAll(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), tripInputs(1000), tripInputs(2000)}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second,
+		Thresholds: defaultRPSThresholds(), StopOnTrip: false}
+	results := runRamp(context.Background(), w, cfg)
+	require.Len(t, results, 3)
+}
+
+func TestMaxRPSExitCode(t *testing.T) {
+	pass := []rpsStepResult{{Kind: verdictPass}, {Kind: verdictTrip}}
+	none := []rpsStepResult{{Kind: verdictInconclusive}, {Kind: verdictTrip}}
+	assert.Equal(t, 0, maxRPSExitCode(pass))
+	assert.Equal(t, 1, maxRPSExitCode(none))
+	assert.Equal(t, 1, maxRPSExitCode(nil))
+}
+
+func TestWaitOrCancel(t *testing.T) {
+	require.NoError(t, waitOrCancel(context.Background(), time.Millisecond))
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel()
+	assert.Error(t, waitOrCancel(ctx, time.Hour))
+}
+```
+
+- [ ] **Step 2: Run the tests to verify they fail**
+
+Run: `cd tools/loadgen && go test -run 'TestParseRPSSteps|TestRunRamp|TestMaxRPSExitCode|TestWaitOrCancel' . 2>&1 | head -20`
+Expected: FAIL — `undefined: parseRPSSteps`, `undefined: runRamp`, etc.
+
+- [ ] **Step 3: Write the implementation**
+
+Create `tools/loadgen/ramp.go`:
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strconv"
+	"strings"
+	"time"
+)
+
+// rpsWorkload is the engine<->adapter seam. RunStep drives open-loop load at
+// targetRPS, owning its own warmup/hold measurement boundaries, and returns the
+// normalized inputs for the hold window. The engine owns cooldown, stop-on-trip
+// and last-pass tracking.
+type rpsWorkload interface {
+	RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error)
+	Label() string
+}
+
+// rampConfig parameterizes a ramp.
+type rampConfig struct {
+	Steps                    []int
+	Warmup, Hold, Cooldown   time.Duration
+	Thresholds               rpsThresholds
+	StopOnTrip               bool
+}
+
+// parseRPSSteps parses a comma-separated, strictly-ascending list of positive
+// RPS values. A trailing "k" multiplies by 1000 (e.g. "5k" -> 5000).
+func parseRPSSteps(s string) ([]int, error) {
+	parts := strings.Split(s, ",")
+	out := make([]int, 0, len(parts))
+	prev := 0
+	for _, raw := range parts {
+		tok := strings.TrimSpace(raw)
+		if tok == "" {
+			return nil, fmt.Errorf("empty step in %q", s)
+		}
+		mult := 1
+		if strings.HasSuffix(tok, "k") || strings.HasSuffix(tok, "K") {
+			mult = 1000
+			tok = tok[:len(tok)-1]
+		}
+		n, err := strconv.Atoi(strings.TrimSpace(tok))
+		if err != nil {
+			return nil, fmt.Errorf("bad step %q: %w", raw, err)
+		}
+		n *= mult
+		if n <= 0 {
+			return nil, fmt.Errorf("step must be > 0, got %d", n)
+		}
+		if n <= prev {
+			return nil, fmt.Errorf("steps must be strictly ascending, got %d after %d", n, prev)
+		}
+		prev = n
+		out = append(out, n)
+	}
+	if len(out) == 0 {
+		return nil, fmt.Errorf("no steps parsed from %q", s)
+	}
+	return out, nil
+}
+
+// waitOrCancel sleeps for d or returns early with ctx.Err() if ctx is cancelled.
+func waitOrCancel(ctx context.Context, d time.Duration) error {
+	if d <= 0 {
+		return ctx.Err()
+	}
+	t := time.NewTimer(d)
+	defer t.Stop()
+	select {
+	case <-ctx.Done():
+		return ctx.Err()
+	case <-t.C:
+		return nil
+	}
+}
+
+// runRamp executes each step in order. It stops early on the first TRIP when
+// StopOnTrip is set (an INCONCLUSIVE step never stops the ramp), and on ctx
+// cancellation, returning whatever results were gathered.
+func runRamp(ctx context.Context, w rpsWorkload, cfg rampConfig) []rpsStepResult {
+	var results []rpsStepResult
+	for i, n := range cfg.Steps {
+		if ctx.Err() != nil {
+			break
+		}
+		in, err := w.RunStep(ctx, n, cfg.Warmup, cfg.Hold)
+		if err != nil {
+			if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+				break
+			}
+			slog.Warn("step run failed", "rps", n, "error", err)
+			break
+		}
+		res := evaluateRPSStep(in, cfg.Thresholds)
+		results = append(results, res)
+		slog.Info("step complete", "rps", n, "verdict", res.Kind.String(),
+			"achieved", res.AchievedRPS, "reasons", res.Reasons)
+		if cfg.StopOnTrip && res.Kind == verdictTrip {
+			break
+		}
+		if i < len(cfg.Steps)-1 {
+			if err := waitOrCancel(ctx, cfg.Cooldown); err != nil {
+				break
+			}
+		}
+	}
+	return results
+}
+
+// maxRPSExitCode returns 0 if any step PASSed, else 1.
+func maxRPSExitCode(results []rpsStepResult) int {
+	for i := range results {
+		if results[i].Kind == verdictPass {
+			return 0
+		}
+	}
+	return 1
+}
+```
+
+- [ ] **Step 4: Run the tests to verify they pass**
+
+Run: `cd tools/loadgen && go test -run 'TestParseRPSSteps|TestRunRamp|TestMaxRPSExitCode|TestWaitOrCancel' . 2>&1 | tail -5`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/ramp.go tools/loadgen/ramp_test.go
+git commit -m "feat(loadgen): add rps ramp engine (parseRPSSteps, runRamp)"
+```
+
+---
+
+## Task 3: Report — `renderRPSReport`, `writeRPSCSV`
+
+**Files:**
+- Create: `tools/loadgen/maxrps_report.go`
+- Test: `tools/loadgen/maxrps_report_test.go`
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `tools/loadgen/maxrps_report_test.go`:
+
+```go
+package main
+
+import (
+	"bytes"
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func sampleResults() []rpsStepResult {
+	return []rpsStepResult{
+		{TargetRPS: 500, AchievedRPS: 499, ErrorRate: 0, Kind: verdictPass,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(20), P99: ms(40)}}}},
+		{TargetRPS: 1000, AchievedRPS: 998, ErrorRate: 0, Kind: verdictPass,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(60), P99: ms(90)}}}},
+		{TargetRPS: 2000, AchievedRPS: 1900, ErrorRate: 0.02, Kind: verdictTrip,
+			WorstDurable: "broadcast-worker", WorstDelta: 1500,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(160), P99: ms(300)}}},
+			Reasons:   []string{"E1 p95=160ms > 100ms", "broadcast-worker pending +1500 > +1000"}},
+	}
+}
+
+func TestRenderRPSReport_ReportsLastPass(t *testing.T) {
+	var buf bytes.Buffer
+	require.NoError(t, renderRPSReport(&buf, sampleResults(), "messages", "medium"))
+	out := buf.String()
+	assert.Contains(t, out, "ANSWER: max RPS = 1000")
+	assert.Contains(t, out, "workload=messages")
+	assert.Contains(t, out, "preset=medium")
+	assert.Contains(t, out, "Next limit:")
+	assert.Contains(t, out, "broadcast-worker pending +1500 > +1000")
+	assert.Contains(t, out, "E1 p95") // dynamic series column header
+}
+
+func TestRenderRPSReport_NoStepPassed(t *testing.T) {
+	results := []rpsStepResult{{TargetRPS: 500, Kind: verdictTrip, Reasons: []string{"E1 p95=400ms > 100ms"}}}
+	var buf bytes.Buffer
+	require.NoError(t, renderRPSReport(&buf, results, "history", "history-medium"))
+	assert.Contains(t, buf.String(), "ANSWER: no step passed")
+}
+
+func TestLastPassRPS(t *testing.T) {
+	assert.Equal(t, 1000, lastPassRPS(sampleResults()))
+	assert.Equal(t, 0, lastPassRPS([]rpsStepResult{{Kind: verdictTrip}}))
+}
+
+func TestWriteRPSCSV(t *testing.T) {
+	var buf bytes.Buffer
+	require.NoError(t, writeRPSCSV(&buf, sampleResults()))
+	lines := strings.Split(strings.TrimSpace(buf.String()), "\n")
+	require.Len(t, lines, 4) // header + 3 rows
+	assert.Contains(t, lines[0], "target_rps")
+	assert.Contains(t, lines[0], "achieved_rps")
+	assert.Contains(t, lines[0], "E1_p95_ms")
+	assert.Contains(t, lines[0], "verdict")
+	assert.Contains(t, lines[3], "2000")
+	assert.Contains(t, lines[3], "TRIP")
+}
+```
+
+- [ ] **Step 2: Run the tests to verify they fail**
+
+Run: `cd tools/loadgen && go test -run 'TestRenderRPSReport|TestLastPassRPS|TestWriteRPSCSV' . 2>&1 | head -20`
+Expected: FAIL — `undefined: renderRPSReport`, etc.
+
+- [ ] **Step 3: Write the implementation**
+
+Create `tools/loadgen/maxrps_report.go`:
+
+```go
+package main
+
+import (
+	"encoding/csv"
+	"fmt"
+	"io"
+	"strconv"
+	"strings"
+	"text/tabwriter"
+)
+
+// lastPassRPS returns the largest TargetRPS whose step PASSed, or 0 if none.
+// Assumes results are in ascending step order.
+func lastPassRPS(results []rpsStepResult) int {
+	last := 0
+	for i := range results {
+		if results[i].Kind == verdictPass {
+			last = results[i].TargetRPS
+		}
+	}
+	return last
+}
+
+// firstTrip returns the first tripped step, or nil if none tripped.
+func firstTrip(results []rpsStepResult) *rpsStepResult {
+	for i := range results {
+		if results[i].Kind == verdictTrip {
+			return &results[i]
+		}
+	}
+	return nil
+}
+
+// seriesNames returns the ordered union of latency-series names across results.
+func seriesNames(results []rpsStepResult) []string {
+	var names []string
+	seen := map[string]bool{}
+	for i := range results {
+		for _, sp := range results[i].Latencies {
+			if !seen[sp.Name] {
+				seen[sp.Name] = true
+				names = append(names, sp.Name)
+			}
+		}
+	}
+	return names
+}
+
+// pctFor returns the percentiles for a named series in a result (zero if absent).
+func pctFor(r *rpsStepResult, name string) Percentiles {
+	for _, sp := range r.Latencies {
+		if sp.Name == name {
+			return sp.Pct
+		}
+	}
+	return Percentiles{}
+}
+
+// renderRPSReport writes the per-step table and the ANSWER line.
+func renderRPSReport(w io.Writer, results []rpsStepResult, workload, preset string) error {
+	fmt.Fprintf(w, "=== loadgen max-rps complete (workload=%s, preset=%s) ===\n\n", workload, preset)
+	names := seriesNames(results)
+
+	tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
+	header := []string{"target_rps", "achieved_rps"}
+	for _, n := range names {
+		header = append(header, n+" p95", n+" p99")
+	}
+	header = append(header, "err%", "worst_pending", "verdict")
+	fmt.Fprintln(tw, strings.Join(header, "\t"))
+
+	for i := range results {
+		r := &results[i]
+		row := []string{strconv.Itoa(r.TargetRPS), fmt.Sprintf("%.0f", r.AchievedRPS)}
+		for _, n := range names {
+			p := pctFor(r, n)
+			row = append(row, p.P95.String(), p.P99.String())
+		}
+		pending := "-"
+		if r.WorstDurable != "" {
+			pending = fmt.Sprintf("%s +%d", r.WorstDurable, r.WorstDelta)
+		}
+		row = append(row, fmt.Sprintf("%.3f", r.ErrorRate*100), pending, r.Kind.String())
+		fmt.Fprintln(tw, strings.Join(row, "\t"))
+	}
+	if err := tw.Flush(); err != nil {
+		return fmt.Errorf("flush table: %w", err)
+	}
+
+	fmt.Fprintln(w)
+	pass := lastPassRPS(results)
+	if pass == 0 {
+		fmt.Fprintf(w, "ANSWER: no step passed (workload=%s, preset=%s)\n", workload, preset)
+		return nil
+	}
+	fmt.Fprintf(w, "ANSWER: max RPS = %d (workload=%s, preset=%s)\n", pass, workload, preset)
+	if trip := firstTrip(results); trip != nil {
+		fmt.Fprintf(w, "        Next limit: %s\n", strings.Join(trip.Reasons, "; "))
+	}
+	return nil
+}
+
+// writeRPSCSV writes one row per step. Series percentile columns are emitted in
+// the union order of series names across all steps.
+func writeRPSCSV(w io.Writer, results []rpsStepResult) error {
+	cw := csv.NewWriter(w)
+	names := seriesNames(results)
+
+	header := []string{"target_rps", "achieved_rps"}
+	for _, n := range names {
+		header = append(header, n+"_p95_ms", n+"_p99_ms")
+	}
+	header = append(header, "error_rate", "attempted", "failed", "saturation", "worst_durable", "worst_pending_delta", "verdict", "reasons")
+	if err := cw.Write(header); err != nil {
+		return fmt.Errorf("write csv header: %w", err)
+	}
+
+	for i := range results {
+		r := &results[i]
+		row := []string{strconv.Itoa(r.TargetRPS), fmt.Sprintf("%.1f", r.AchievedRPS)}
+		for _, n := range names {
+			p := pctFor(r, n)
+			row = append(row,
+				strconv.FormatInt(p.P95.Milliseconds(), 10),
+				strconv.FormatInt(p.P99.Milliseconds(), 10))
+		}
+		row = append(row,
+			strconv.FormatFloat(r.ErrorRate, 'f', 6, 64),
+			strconv.Itoa(r.AttemptedOps), strconv.Itoa(r.FailedOps), strconv.Itoa(r.Saturation),
+			r.WorstDurable, strconv.FormatInt(r.WorstDelta, 10),
+			r.Kind.String(), strings.Join(r.Reasons, "; "))
+		if err := cw.Write(row); err != nil {
+			return fmt.Errorf("write csv row: %w", err)
+		}
+	}
+	cw.Flush()
+	if err := cw.Error(); err != nil {
+		return fmt.Errorf("flush csv: %w", err)
+	}
+	return nil
+}
+```
+
+- [ ] **Step 4: Run the tests to verify they pass**
+
+Run: `cd tools/loadgen && go test -run 'TestRenderRPSReport|TestLastPassRPS|TestWriteRPSCSV' . 2>&1 | tail -5`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/maxrps_report.go tools/loadgen/maxrps_report_test.go
+git commit -m "feat(loadgen): add max-rps report renderer and CSV writer"
+```
+
+---
+
+## Task 4: Add `Collector.Reset()`
+
+**Files:**
+- Modify: `tools/loadgen/collector.go`
+- Test: `tools/loadgen/collector_test.go` (add one test)
+
+- [ ] **Step 1: Write the failing test**
+
+Add to `tools/loadgen/collector_test.go`:
+
+```go
+func TestCollector_Reset(t *testing.T) {
+	c := NewCollector(NewMetrics(), "test")
+	now := time.Now()
+	c.RecordPublish("req-1", "msg-1", now)
+	c.RecordReply("req-1", now.Add(10*time.Millisecond))
+	c.RecordBroadcast("msg-1", now.Add(20*time.Millisecond))
+	require.Equal(t, 1, c.E1Count())
+	require.Equal(t, 1, c.E2Count())
+
+	c.Reset()
+
+	assert.Equal(t, 0, c.E1Count())
+	assert.Equal(t, 0, c.E2Count())
+	mr, mb := c.Finalize()
+	assert.Equal(t, 0, mr)
+	assert.Equal(t, 0, mb)
+	// After reset, a fresh publish+reply correlates normally.
+	c.RecordPublish("req-2", "msg-2", now)
+	c.RecordReply("req-2", now.Add(5*time.Millisecond))
+	assert.Equal(t, 1, c.E1Count())
+}
+```
+
+Ensure `collector_test.go` imports `time`, `testing`, and testify `assert`/`require` (add any missing imports).
+
+- [ ] **Step 2: Run the test to verify it fails**
+
+Run: `cd tools/loadgen && go test -run 'TestCollector_Reset' . 2>&1 | head -20`
+Expected: FAIL — `c.Reset undefined`.
+
+- [ ] **Step 3: Write the implementation**
+
+Add to `tools/loadgen/collector.go` (after `NewCollector`):
+
+```go
+// Reset clears all correlation state and accumulated samples. Used by the
+// max-rps ramp to start each step's hold window from a clean slate while the
+// E1/E2 subscriptions (which hold this *Collector pointer) stay alive.
+func (c *Collector) Reset() {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	c.byReqID = make(map[string]publishEntry)
+	c.byMsgID = make(map[string]publishEntry)
+	c.e1 = nil
+	c.e2 = nil
+}
+```
+
+- [ ] **Step 4: Run the test to verify it passes**
+
+Run: `cd tools/loadgen && go test -run 'TestCollector_Reset' . 2>&1 | tail -5`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/collector.go tools/loadgen/collector_test.go
+git commit -m "feat(loadgen): add Collector.Reset for per-step ramp windows"
+```
+
+---
+
+## Task 5: Messages workload adapter
+
+**Files:**
+- Create: `tools/loadgen/maxrps_messages.go`
+- Test: `tools/loadgen/maxrps_messages_test.go`
+
+This adapter reuses `Generator`, `Collector`, `newE2Handler`, `newNatsCorePublisher`, `gatheredCounterValue`, `stream.MessagesCanonical`, and `subject.*Wildcard` (all already in `package main`). The pure-logic helpers (`buildMessagesInputs`, `diffCounters`) are unit-tested; the NATS-touching constructor and `RunStep` are covered by the integration test in Task 8.
+
+> **Note:** `natsutil.Connect` returns `*otelnats.Conn`. The adapter does NOT store it on the struct — the constructor captures the connection (and the metrics `*http.Server`) in the cleanup closure, so no `otelnats` import is needed in the adapter.
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `tools/loadgen/maxrps_messages_test.go`:
+
+```go
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestDiffCounters(t *testing.T) {
+	start := msgCounters{published: 100, err: map[string]float64{"publish": 1, "saturated": 5}}
+	end := msgCounters{published: 1100, err: map[string]float64{"publish": 3, "saturated": 9}}
+	d := diffCounters(start, end)
+	assert.Equal(t, float64(1000), d.published)
+	assert.Equal(t, float64(2), d.err["publish"])
+	assert.Equal(t, float64(4), d.err["saturated"])
+}
+
+func TestBuildMessagesInputs(t *testing.T) {
+	delta := msgCounters{
+		published: 980,
+		err:       map[string]float64{"publish": 10, "marshal": 0, "gatekeeper": 5, "bad_reply": 0, "saturated": 7},
+	}
+	e1 := nLatencies(50, ms(15))
+	e2 := nLatencies(50, ms(30))
+	pending := map[string]uint64{"message-worker": 12, "broadcast-worker": 40}
+	startPending := map[string]uint64{"message-worker": 2, "broadcast-worker": 5}
+	durables := []string{"message-worker", "broadcast-worker"}
+
+	in := buildMessagesInputs(1000, 10*time.Second, delta, e1, e2, startPending, pending, durables, true)
+
+	// attempted = published(980) + publish(10) + marshal(0)
+	assert.Equal(t, 990, in.AttemptedOps)
+	// failed = publish(10) + marshal(0) + gatekeeper(5) + bad_reply(0)
+	assert.Equal(t, 15, in.FailedOps)
+	assert.Equal(t, 7, in.Saturation)
+	assert.Len(t, in.Latencies, 2)
+	assert.Equal(t, "E1", in.Latencies[0].Name)
+	assert.Equal(t, "E2", in.Latencies[1].Name)
+	assert.Len(t, in.Pending, 2)
+	assert.Equal(t, uint64(2), in.Pending[0].Start)
+	assert.Equal(t, uint64(12), in.Pending[0].End)
+	assert.False(t, in.Inconclusive)
+}
+
+func TestBuildMessagesInputs_PendingUnavailableIsInconclusive(t *testing.T) {
+	delta := msgCounters{published: 1000, err: map[string]float64{}}
+	in := buildMessagesInputs(1000, time.Second, delta, nil, nil, nil, nil, []string{"message-worker"}, false)
+	assert.True(t, in.Inconclusive)
+	assert.Contains(t, in.InconclusiveReason, "pending")
+	assert.Empty(t, in.Pending)
+}
+```
+
+- [ ] **Step 2: Run the tests to verify they fail**
+
+Run: `cd tools/loadgen && go test -run 'TestDiffCounters|TestBuildMessagesInputs' . 2>&1 | head -20`
+Expected: FAIL — `undefined: diffCounters`, `undefined: buildMessagesInputs`, `undefined: msgCounters`.
+
+- [ ] **Step 3: Write the implementation**
+
+Create `tools/loadgen/maxrps_messages.go`:
+
+```go
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"sync"
+	"time"
+
+	"github.com/nats-io/nats.go"
+	"github.com/nats-io/nats.go/jetstream"
+
+	"github.com/hmchangw/chat/pkg/natsutil"
+	"github.com/hmchangw/chat/pkg/stream"
+	"github.com/hmchangw/chat/pkg/subject"
+)
+
+// msgCounters is a point-in-time snapshot of the loadgen publish counters.
+type msgCounters struct {
+	published float64
+	err       map[string]float64 // keyed by reason
+}
+
+var msgErrorReasons = []string{"publish", "marshal", "gatekeeper", "bad_reply", "saturated"}
+
+// diffCounters returns end-start for published and each tracked reason.
+func diffCounters(start, end msgCounters) msgCounters {
+	d := msgCounters{published: end.published - start.published, err: map[string]float64{}}
+	for _, r := range msgErrorReasons {
+		d.err[r] = end.err[r] - start.err[r]
+	}
+	return d
+}
+
+// buildMessagesInputs assembles the normalized step inputs from a counter delta,
+// the hold-window latency tapes, and the pending snapshots.
+//
+// Error accounting (see spec §5): FailedOps counts hard publish/gatekeeper errors
+// only; missing replies/broadcasts are NOT counted (late stragglers would create
+// false trips) — slow/dropped delivery is caught by latency and pending-growth.
+func buildMessagesInputs(
+	targetRPS int, hold time.Duration, delta msgCounters,
+	e1, e2 []time.Duration,
+	startPending, endPending map[string]uint64,
+	durables []string, pendingOK bool,
+) rpsStepInputs {
+	attempted := int(delta.published + delta.err["publish"] + delta.err["marshal"])
+	failed := int(delta.err["publish"] + delta.err["marshal"] + delta.err["gatekeeper"] + delta.err["bad_reply"])
+	in := rpsStepInputs{
+		TargetRPS:    targetRPS,
+		Hold:         hold,
+		AttemptedOps: attempted,
+		FailedOps:    failed,
+		Saturation:   int(delta.err["saturated"]),
+		Latencies: []seriesSamples{
+			{Name: "E1", Samples: e1},
+			{Name: "E2", Samples: e2},
+		},
+	}
+	if !pendingOK {
+		in.Inconclusive = true
+		in.InconclusiveReason = "consumer pending snapshot failed — backlog signal unavailable"
+		return in
+	}
+	for _, d := range durables {
+		in.Pending = append(in.Pending, consumerPendingDelta{Durable: d, Start: startPending[d], End: endPending[d]})
+	}
+	return in
+}
+
+// messagesWorkload drives the messaging pipeline at a given RPS.
+// The natsutil connection and metrics server are not stored on the struct
+// (natsutil.Connect returns *otelnats.Conn); they are captured by the cleanup
+// closure instead, so the adapter only keeps what RunStep needs.
+type messagesWorkload struct {
+	cfg       *config
+	preset    *Preset
+	fixtures  Fixtures
+	inject    InjectMode
+	seed      int64
+	js        jetstream.JetStream
+	metrics   *Metrics
+	collector *Collector
+	publisher Publisher
+	canonical string
+	durables  []string
+}
+
+func (w *messagesWorkload) Label() string { return "messages" }
+
+// newMessagesWorkload wires NATS, the metrics server, the E1/E2 subscriptions,
+// and the publisher. The returned cleanup unsubscribes, shuts the metrics server
+// and drains NATS.
+func newMessagesWorkload(ctx context.Context, cfg *config, preset *Preset, inject InjectMode, seed int64) (*messagesWorkload, func(), error) {
+	nc, err := natsutil.Connect(cfg.NatsURL, cfg.NatsCredsFile)
+	if err != nil {
+		return nil, nil, fmt.Errorf("nats connect: %w", err)
+	}
+	js, err := jetstream.New(nc.NatsConn())
+	if err != nil {
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("jetstream init: %w", err)
+	}
+	metrics := NewMetrics()
+	srv := &http.Server{Addr: cfg.MetricsAddr, Handler: metrics.Handler(), ReadHeaderTimeout: 5 * time.Second}
+	go func() {
+		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			slog.Warn("metrics server stopped", "error", err)
+		}
+	}()
+
+	collector := NewCollector(metrics, preset.Name)
+
+	e1Sub, err := nc.NatsConn().Subscribe(subject.UserResponseWildcard(), func(msg *nats.Msg) {
+		reqID := lastToken(msg.Subject)
+		var payload struct {
+			Error string `json:"error"`
+		}
+		if err := json.Unmarshal(msg.Data, &payload); err != nil {
+			metrics.PublishErrors.WithLabelValues(preset.Name, "bad_reply").Inc()
+			return
+		}
+		if payload.Error != "" {
+			metrics.PublishErrors.WithLabelValues(preset.Name, "gatekeeper").Inc()
+		}
+		collector.RecordReply(reqID, time.Now())
+	})
+	if err != nil {
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("subscribe e1: %w", err)
+	}
+	e2Handler := newE2Handler(collector)
+	e2Sub, err := nc.NatsConn().Subscribe(subject.RoomEventWildcard(), e2Handler)
+	if err != nil {
+		_ = e1Sub.Unsubscribe()
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("subscribe e2: %w", err)
+	}
+	e2DMSub, err := nc.NatsConn().Subscribe(subject.UserRoomEventWildcard(), e2Handler)
+	if err != nil {
+		_ = e1Sub.Unsubscribe()
+		_ = e2Sub.Unsubscribe()
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("subscribe e2 dm: %w", err)
+	}
+
+	w := &messagesWorkload{
+		cfg: cfg, preset: preset, fixtures: BuildFixtures(preset, seed, cfg.SiteID),
+		inject: inject, seed: seed, js: js, metrics: metrics, collector: collector,
+		publisher: newNatsCorePublisher(nc.NatsConn(), inject, js),
+		canonical: stream.MessagesCanonical(cfg.SiteID).Name,
+		durables:  []string{"message-worker", "broadcast-worker"},
+	}
+	cleanup := func() {
+		_ = e1Sub.Unsubscribe()
+		_ = e2Sub.Unsubscribe()
+		_ = e2DMSub.Unsubscribe()
+		shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		_ = srv.Shutdown(shutCtx)
+		cancel()
+		_ = nc.Drain()
+	}
+	return w, cleanup, nil
+}
+
+func (w *messagesWorkload) snapshotCounters() msgCounters {
+	mfs, _ := w.metrics.Registry.Gather()
+	c := msgCounters{
+		published: gatheredCounterValue(mfs, "loadgen_published_total", "", ""),
+		err:       map[string]float64{},
+	}
+	for _, reason := range msgErrorReasons {
+		c.err[reason] = gatheredCounterValue(mfs, "loadgen_publish_errors_total", "reason", reason)
+	}
+	return c
+}
+
+func (w *messagesWorkload) snapshotPending(ctx context.Context) (map[string]uint64, error) {
+	out := map[string]uint64{}
+	for _, d := range w.durables {
+		cons, err := w.js.Consumer(ctx, w.canonical, d)
+		if err != nil {
+			return nil, fmt.Errorf("consumer %s: %w", d, err)
+		}
+		info, err := cons.Info(ctx)
+		if err != nil {
+			return nil, fmt.Errorf("consumer info %s: %w", d, err)
+		}
+		out[d] = info.NumPending
+	}
+	return out, nil
+}
+
+// RunStep runs a fresh generator at targetRPS for warmup+hold, resetting the
+// collector at the hold boundary so only the hold window is measured.
+func (w *messagesWorkload) RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error) {
+	gen := NewGenerator(&GeneratorConfig{
+		Preset: w.preset, Fixtures: w.fixtures, SiteID: w.cfg.SiteID,
+		Rate: targetRPS, Inject: w.inject, Publisher: w.publisher,
+		Metrics: w.metrics, Collector: w.collector,
+		WarmupDeadline: time.Now().Add(warmup), MaxInFlight: w.cfg.MaxInFlight,
+	}, w.seed)
+
+	genCtx, cancel := context.WithCancel(ctx)
+	var wg sync.WaitGroup
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		_ = gen.Run(genCtx)
+	}()
+
+	if err := waitOrCancel(ctx, warmup); err != nil {
+		cancel()
+		wg.Wait()
+		return rpsStepInputs{}, err
+	}
+
+	holdStart := time.Now()
+	w.collector.Reset()
+	startCounts := w.snapshotCounters()
+	startPending, perr1 := w.snapshotPending(ctx)
+
+	holdErr := waitOrCancel(ctx, hold)
+
+	endCounts := w.snapshotCounters()
+	endPending, perr2 := w.snapshotPending(ctx)
+	cancel()
+	wg.Wait()
+	time.Sleep(2 * time.Second) // drain trailing replies/broadcasts
+	w.collector.DiscardBefore(holdStart)
+
+	if holdErr != nil {
+		return rpsStepInputs{}, holdErr
+	}
+
+	delta := diffCounters(startCounts, endCounts)
+	pendingOK := perr1 == nil && perr2 == nil
+	if !pendingOK {
+		slog.Warn("pending snapshot failed", "start_err", perr1, "end_err", perr2)
+	}
+	return buildMessagesInputs(targetRPS, hold, delta,
+		w.collector.E1Samples(), w.collector.E2Samples(),
+		startPending, endPending, w.durables, pendingOK), nil
+}
+```
+
+- [ ] **Step 4: Run the tests to verify they pass**
+
+Run: `cd tools/loadgen && go test -run 'TestDiffCounters|TestBuildMessagesInputs' . 2>&1 | tail -5`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/maxrps_messages.go tools/loadgen/maxrps_messages_test.go
+git commit -m "feat(loadgen): add messages workload adapter for max-rps"
+```
+
+---
+
+## Task 6: History workload adapter
+
+**Files:**
+- Create: `tools/loadgen/maxrps_history.go`
+- Test: `tools/loadgen/maxrps_history_test.go`
+
+The history adapter runs warmup and hold as two sequential generator runs, each with its own fresh `HistoryCollector`, so the hold collector holds only hold-window samples and error tallies (no time filtering needed). It reuses `NewHistoryGenerator`, `newNATSHistoryRequester`, and `BuildHistoryFixtures`.
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `tools/loadgen/maxrps_history_test.go`:
+
+```go
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestBuildHistoryInputs(t *testing.T) {
+	c := NewHistoryCollector()
+	now := time.Now()
+	for i := 0; i < 40; i++ {
+		c.RecordSample(HistorySample{Endpoint: HistoryEndpointHistory, Latency: ms(15), At: now})
+	}
+	for i := 0; i < 10; i++ {
+		c.RecordSample(HistorySample{Endpoint: HistoryEndpointThread, Latency: ms(25), At: now})
+	}
+	c.RecordError(HistoryEndpointHistory, errClassTimeout, 0)
+	c.RecordError(HistoryEndpointThread, errClassReply, 0)
+	c.RecordSaturation()
+	c.RecordSaturation()
+
+	in := buildHistoryInputs(2000, 30*time.Second, c)
+
+	// attempted = 40 + 10 history/thread samples + 2 errors (timeout+reply)
+	assert.Equal(t, 52, in.AttemptedOps)
+	assert.Equal(t, 2, in.FailedOps)
+	assert.Equal(t, 2, in.Saturation)
+	assert.Len(t, in.Latencies, 2)
+	assert.Equal(t, "history", in.Latencies[0].Name)
+	assert.Equal(t, "thread", in.Latencies[1].Name)
+	assert.Len(t, in.Latencies[0].Samples, 40)
+	assert.Len(t, in.Latencies[1].Samples, 10)
+	assert.Empty(t, in.Pending) // history has no consumer queue
+}
+```
+
+- [ ] **Step 2: Run the tests to verify they fail**
+
+Run: `cd tools/loadgen && go test -run 'TestBuildHistoryInputs' . 2>&1 | head -20`
+Expected: FAIL — `undefined: buildHistoryInputs`.
+
+- [ ] **Step 3: Write the implementation**
+
+Create `tools/loadgen/maxrps_history.go`:
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"sync"
+	"time"
+
+	"github.com/hmchangw/chat/pkg/natsutil"
+)
+
+// latenciesOf extracts the latency tape from a sample slice.
+func latenciesOf(samples []HistorySample) []time.Duration {
+	out := make([]time.Duration, len(samples))
+	for i := range samples {
+		out[i] = samples[i].Latency
+	}
+	return out
+}
+
+// buildHistoryInputs assembles normalized step inputs from a (hold-only) history
+// collector. Per-endpoint latency series gate independently; no consumer queue
+// exists for synchronous reads so Pending is empty.
+func buildHistoryInputs(targetRPS int, hold time.Duration, c *HistoryCollector) rpsStepInputs {
+	hist := c.HistorySamples()
+	thread := c.ThreadSamples()
+	failed := c.TimeoutErrors() + c.ReplyErrors() + c.BadReplyCount()
+	attempted := len(hist) + len(thread) + failed
+	return rpsStepInputs{
+		TargetRPS:    targetRPS,
+		Hold:         hold,
+		AttemptedOps: attempted,
+		FailedOps:    failed,
+		Saturation:   c.SaturationCount(),
+		Latencies: []seriesSamples{
+			{Name: "history", Samples: latenciesOf(hist)},
+			{Name: "thread", Samples: latenciesOf(thread)},
+		},
+	}
+}
+
+// historyWorkload drives history-service read requests at a given RPS.
+// As with messagesWorkload, the natsutil connection (*otelnats.Conn) and metrics
+// server are captured by the cleanup closure, not stored on the struct.
+type historyWorkload struct {
+	cfg             *config
+	preset          *HistoryPreset
+	fixtures        HistoryFixtures
+	seed            int64
+	mix             EndpointMix
+	beforeMode      BeforeMode
+	scrollbackPages int
+	pageLimit       int
+	requestTimeout  time.Duration
+	metrics         *Metrics
+	requester       HistoryRequester
+}
+
+func (w *historyWorkload) Label() string { return "history" }
+
+// historyWorkloadParams bundles the history-specific tunables.
+type historyWorkloadParams struct {
+	Mix             EndpointMix
+	BeforeMode      BeforeMode
+	ScrollbackPages int
+	PageLimit       int
+	RequestTimeout  time.Duration
+}
+
+func newHistoryWorkload(ctx context.Context, cfg *config, preset *HistoryPreset, seed int64, p historyWorkloadParams) (*historyWorkload, func(), error) {
+	if cfg.CassandraHosts == "" {
+		return nil, nil, fmt.Errorf("history workload requires CASSANDRA_HOSTS")
+	}
+	nc, err := natsutil.Connect(cfg.NatsURL, cfg.NatsCredsFile)
+	if err != nil {
+		return nil, nil, fmt.Errorf("nats connect: %w", err)
+	}
+	metrics := NewMetrics()
+	srv := &http.Server{Addr: cfg.MetricsAddr, Handler: metrics.Handler(), ReadHeaderTimeout: 5 * time.Second}
+	go func() {
+		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			slog.Warn("metrics server stopped", "error", err)
+		}
+	}()
+	w := &historyWorkload{
+		cfg: cfg, preset: preset, fixtures: BuildHistoryFixtures(preset, seed, cfg.SiteID, time.Now().UTC()),
+		seed: seed, mix: p.Mix, beforeMode: p.BeforeMode, scrollbackPages: p.ScrollbackPages,
+		pageLimit: p.PageLimit, requestTimeout: p.RequestTimeout,
+		metrics: metrics, requester: newNATSHistoryRequester(nc.NatsConn()),
+	}
+	cleanup := func() {
+		shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		_ = srv.Shutdown(shutCtx)
+		cancel()
+		_ = nc.Drain()
+	}
+	return w, cleanup, nil
+}
+
+func (w *historyWorkload) newGenerator(collector *HistoryCollector, targetRPS int) *HistoryGenerator {
+	return NewHistoryGenerator(&HistoryGeneratorConfig{
+		Preset: w.preset, Fixtures: &w.fixtures, SiteID: w.cfg.SiteID, Rate: targetRPS,
+		Mix: w.mix, BeforeMode: w.beforeMode, ScrollbackPages: w.scrollbackPages,
+		PageLimit: w.pageLimit, RequestTimeout: w.requestTimeout,
+		Requester: w.requester, Collector: collector, MaxInFlight: w.cfg.MaxInFlight,
+	}, w.seed)
+}
+
+// runFor runs gen.Run in a goroutine for d (or until ctx cancels), then stops it.
+func runFor(ctx context.Context, gen *HistoryGenerator, d time.Duration) error {
+	genCtx, cancel := context.WithCancel(ctx)
+	var wg sync.WaitGroup
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		_ = gen.Run(genCtx)
+	}()
+	err := waitOrCancel(ctx, d)
+	cancel()
+	wg.Wait()
+	return err
+}
+
+// RunStep runs warmup (discarded) then hold (measured) as two sequential
+// generator runs so the hold collector contains only hold-window data.
+func (w *historyWorkload) RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error) {
+	if warmup > 0 {
+		warmCollector := NewHistoryCollector()
+		if err := runFor(ctx, w.newGenerator(warmCollector, targetRPS), warmup); err != nil {
+			return rpsStepInputs{}, err
+		}
+	}
+	collector := NewHistoryCollector()
+	if err := runFor(ctx, w.newGenerator(collector, targetRPS), hold); err != nil {
+		return rpsStepInputs{}, err
+	}
+	time.Sleep(2 * time.Second) // drain trailing in-flight replies into the collector
+	return buildHistoryInputs(targetRPS, hold, collector), nil
+}
+```
+
+- [ ] **Step 4: Run the tests to verify they pass**
+
+Run: `cd tools/loadgen && go test -run 'TestBuildHistoryInputs' . 2>&1 | tail -5`
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/maxrps_history.go tools/loadgen/maxrps_history_test.go
+git commit -m "feat(loadgen): add history workload adapter for max-rps"
+```
+
+---
+
+## Task 7: CLI wiring — `runMaxRPS` and `dispatch`
+
+**Files:**
+- Create: `tools/loadgen/maxrps.go`
+- Modify: `tools/loadgen/main.go` (`dispatch`, `main.go:82-100`)
+- Test: `tools/loadgen/maxrps_test.go`
+
+- [ ] **Step 1: Write the failing tests**
+
+Create `tools/loadgen/maxrps_test.go`:
+
+```go
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestDefaultSteps(t *testing.T) {
+	msgs, err := parseRPSSteps(defaultSteps("messages"))
+	require.NoError(t, err)
+	assert.Equal(t, []int{500, 1000, 2000, 5000, 10000}, msgs)
+
+	hist, err := parseRPSSteps(defaultSteps("history"))
+	require.NoError(t, err)
+	assert.Equal(t, []int{200, 500, 1000, 2000, 5000}, hist)
+}
+
+func TestBuildThresholds(t *testing.T) {
+	th := buildThresholds(100*time.Millisecond, 250*time.Millisecond, 0.001, 1000, 0.05)
+	assert.Equal(t, 100*time.Millisecond, th.P95)
+	assert.Equal(t, 250*time.Millisecond, th.P99)
+	assert.Equal(t, 0.001, th.ErrorRate)
+	assert.Equal(t, uint64(1000), th.PendingGrowth)
+	assert.Equal(t, 0.05, th.RateTolerance)
+}
+```
+
+- [ ] **Step 2: Run the tests to verify they fail**
+
+Run: `cd tools/loadgen && go test -run 'TestDefaultSteps|TestBuildThresholds' . 2>&1 | head -20`
+Expected: FAIL — `undefined: defaultSteps`, `undefined: buildThresholds`.
+
+- [ ] **Step 3: Write the implementation**
+
+Create `tools/loadgen/maxrps.go`:
+
+```go
+package main
+
+import (
+	"context"
+	"flag"
+	"fmt"
+	"log/slog"
+	"os"
+	"time"
+)
+
+func defaultSteps(workload string) string {
+	if workload == "history" {
+		return "200,500,1000,2000,5000"
+	}
+	return "500,1000,2000,5000,10000"
+}
+
+func buildThresholds(p95, p99 time.Duration, errRate float64, pendingGrowth uint64, rateTol float64) rpsThresholds {
+	return rpsThresholds{P95: p95, P99: p99, ErrorRate: errRate, PendingGrowth: pendingGrowth, RateTolerance: rateTol}
+}
+
+// runMaxRPS parses flags, builds the workload adapter, runs the ramp and prints
+// the report. Returns the process exit code.
+func runMaxRPS(ctx context.Context, cfg *config, args []string) int {
+	fs := flag.NewFlagSet("max-rps", flag.ExitOnError)
+	workload := fs.String("workload", "messages", "messages|history")
+	preset := fs.String("preset", "", "preset name")
+	seed := fs.Int64("seed", 42, "RNG seed")
+	stepsFlag := fs.String("steps", "", "ascending RPS list, e.g. 500,1k,2k,5k,10k (default depends on workload)")
+	warmup := fs.Duration("warmup", 10*time.Second, "per-step warmup (samples discarded)")
+	hold := fs.Duration("hold", 30*time.Second, "per-step measurement window")
+	cooldown := fs.Duration("cooldown", 5*time.Second, "per-step settle gap")
+	sloP95 := fs.Duration("slo-p95", 100*time.Millisecond, "p95 latency SLO (all gated series)")
+	sloP99 := fs.Duration("slo-p99", 250*time.Millisecond, "p99 latency SLO (all gated series)")
+	sloErr := fs.Float64("slo-error-rate", 0.001, "max error rate (failed/attempted)")
+	sloPending := fs.Uint64("slo-pending-growth", 1000, "max per-durable pending growth (messages only)")
+	rateTol := fs.Float64("rate-tolerance", 0.05, "achieved-vs-target shortfall band for INCONCLUSIVE")
+	stopOnTrip := fs.Bool("stop-on-trip", true, "stop the ramp at the first TRIP")
+	inject := fs.String("inject", "frontdoor", "messages only: frontdoor|canonical")
+	// history-only tunables (ignored for messages):
+	mixFlag := fs.String("mix", "history:80,thread:20", "history only: endpoint mix")
+	beforeModeFlag := fs.String("before-mode", "open:70,scrollback:30", "history only: before-cursor mix")
+	scrollbackPages := fs.Int("scrollback-pages", 5, "history only: pages per scrollback chain")
+	pageLimit := fs.Int("page-limit", 20, "history only: page limit")
+	requestTimeout := fs.Duration("request-timeout", 5*time.Second, "history only: per-request timeout")
+	csvPath := fs.String("csv", "", "optional CSV output path")
+	_ = fs.Parse(args)
+
+	if *preset == "" {
+		fmt.Fprintln(os.Stderr, "--preset required")
+		return 2
+	}
+	stepsStr := *stepsFlag
+	if stepsStr == "" {
+		stepsStr = defaultSteps(*workload)
+	}
+	steps, err := parseRPSSteps(stepsStr)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "bad --steps: %v\n", err)
+		return 2
+	}
+	thresholds := buildThresholds(*sloP95, *sloP99, *sloErr, *sloPending, *rateTol)
+
+	var (
+		w        rpsWorkload
+		cleanup  func()
+		presetID string
+	)
+	switch *workload {
+	case "messages":
+		p, ok := BuiltinPreset(*preset)
+		if !ok {
+			fmt.Fprintf(os.Stderr, "unknown preset: %s\n", *preset)
+			return 2
+		}
+		injectMode, err := ParseInjectMode(*inject)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, err.Error())
+			return 2
+		}
+		mw, clean, err := newMessagesWorkload(ctx, cfg, &p, injectMode, *seed)
+		if err != nil {
+			slog.Error("init messages workload", "error", err)
+			return 1
+		}
+		w, cleanup, presetID = mw, clean, p.Name
+	case "history":
+		p, ok := BuiltinHistoryPreset(*preset)
+		if !ok {
+			fmt.Fprintf(os.Stderr, "unknown history preset: %s\n", *preset)
+			return 2
+		}
+		mix, err := ParseEndpointMix(*mixFlag)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, err.Error())
+			return 2
+		}
+		beforeMode, err := ParseBeforeMode(*beforeModeFlag)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, err.Error())
+			return 2
+		}
+		if *scrollbackPages <= 0 {
+			fmt.Fprintln(os.Stderr, "--scrollback-pages must be > 0")
+			return 2
+		}
+		hw, clean, err := newHistoryWorkload(ctx, cfg, &p, *seed, historyWorkloadParams{
+			Mix: mix, BeforeMode: beforeMode, ScrollbackPages: *scrollbackPages,
+			PageLimit: *pageLimit, RequestTimeout: *requestTimeout,
+		})
+		if err != nil {
+			slog.Error("init history workload", "error", err)
+			return 1
+		}
+		w, cleanup, presetID = hw, clean, p.Name
+	default:
+		fmt.Fprintf(os.Stderr, "unknown workload: %s\n", *workload)
+		return 2
+	}
+	defer cleanup()
+
+	results := runRamp(ctx, w, rampConfig{
+		Steps: steps, Warmup: *warmup, Hold: *hold, Cooldown: *cooldown,
+		Thresholds: thresholds, StopOnTrip: *stopOnTrip,
+	})
+
+	if err := renderRPSReport(os.Stdout, results, w.Label(), presetID); err != nil {
+		slog.Warn("render report", "error", err)
+	}
+	if *csvPath != "" {
+		f, err := os.Create(*csvPath)
+		if err != nil {
+			slog.Error("create csv", "error", err)
+		} else {
+			if err := writeRPSCSV(f, results); err != nil {
+				slog.Error("write csv", "error", err)
+			}
+			_ = f.Close()
+		}
+	}
+	return maxRPSExitCode(results)
+}
+```
+
+- [ ] **Step 4: Wire into `dispatch`**
+
+In `tools/loadgen/main.go`, add a case to the `dispatch` switch (after the `history-sustained` case, before `default`):
+
+```go
+	case "max-rps":
+		return runMaxRPS(ctx, cfg, os.Args[2:])
+```
+
+Also update the usage line in `main` (`main.go:59`) to:
+
+```go
+		fmt.Fprintln(os.Stderr, "usage: loadgen <seed|run|teardown|members-sustained|members-capacity|history-sustained|max-rps> [flags]")
+```
+
+- [ ] **Step 5: Run the tests + build to verify**
+
+Run: `cd tools/loadgen && go test -run 'TestDefaultSteps|TestBuildThresholds' . 2>&1 | tail -5 && go build ./... 2>&1 | tail -5`
+Expected: tests PASS; build clean (no output).
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add tools/loadgen/maxrps.go tools/loadgen/maxrps_test.go tools/loadgen/main.go
+git commit -m "feat(loadgen): wire max-rps subcommand into dispatch"
+```
+
+---
+
+## Task 8: Integration test — end-to-end 2-step ramp
+
+**Files:**
+- Modify: `tools/loadgen/integration_test.go`
+
+Follow the existing `TestLoadgenSmallPreset_EndToEnd` in `integration_test.go` (build tag `//go:build integration`, `TestMain` via `testutil.RunTests` already present). That test creates the canonical stream, two ack-only durables (`message-worker`, `broadcast-worker`), a fake gatekeeper that forwards frontdoor sends to the canonical subject, and a fake broadcast-worker. The new test reuses the same scaffolding but drives the load through `newMessagesWorkload` + `runRamp` instead of a manual `Generator`.
+
+Key facts (verified against the existing test):
+- Connection in-test is `nats.Connect(testutil.NATS(t))` (the adapter opens its own connection internally via `natsutil.Connect`).
+- The canonical stream + the two durables MUST exist before the ramp runs, because `messagesWorkload.snapshotPending` calls `js.Consumer(canonical, "message-worker"/"broadcast-worker").Info`.
+- No Mongo seeding is needed: the fake gatekeeper does not validate against Mongo, and the generator picks subjects from the adapter's in-memory fixtures.
+- The fake gatekeeper does NOT reply, so E1 stays empty; `AttemptedOps` comes from the `loadgen_published_total` counter delta, which is the assertion target.
+
+- [ ] **Step 1: Re-read the existing test to confirm the scaffolding is unchanged**
+
+Run: `cd tools/loadgen && sed -n '20,120p' integration_test.go`
+Confirm the stream/durable/gatekeeper setup matches what is reused below.
+
+- [ ] **Step 2: Write the integration test**
+
+Add this function to `tools/loadgen/integration_test.go` (uses only `require`, which the file already imports — no new imports needed):
+
+```go
+func TestMaxRPS_Messages_TwoStepRamp(t *testing.T) {
+	ctx := context.Background()
+	siteID := "site-maxrps"
+
+	nc, err := nats.Connect(testutil.NATS(t))
+	require.NoError(t, err)
+	defer nc.Drain()
+	js, err := jetstream.New(nc)
+	require.NoError(t, err)
+
+	canonical := stream.MessagesCanonical(siteID)
+	_, err = js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
+		Name:     canonical.Name,
+		Subjects: canonical.Subjects,
+	})
+	require.NoError(t, err)
+
+	// Ack-only durables so the canonical stream drains to zero (pending stays low).
+	for _, durable := range []string{"message-worker", "broadcast-worker"} {
+		cons, err := js.CreateOrUpdateConsumer(ctx, canonical.Name, jetstream.ConsumerConfig{
+			Durable:   durable,
+			AckPolicy: jetstream.AckExplicitPolicy,
+		})
+		require.NoError(t, err)
+		cc, err := cons.Consume(func(msg jetstream.Msg) { _ = msg.Ack() })
+		require.NoError(t, err)
+		defer cc.Stop()
+	}
+
+	// Fake gatekeeper: frontdoor send -> canonical event.
+	gkSub, err := nc.Subscribe(subject.MsgSendWildcard(siteID), func(m *nats.Msg) {
+		var req model.SendMessageRequest
+		if err := json.Unmarshal(m.Data, &req); err != nil {
+			return
+		}
+		evt := model.MessageEvent{
+			Message:   model.Message{ID: req.ID, Content: req.Content, CreatedAt: time.Now().UTC()},
+			SiteID:    siteID,
+			Timestamp: time.Now().UnixMilli(),
+		}
+		data, _ := json.Marshal(evt)
+		_, _ = js.Publish(ctx, subject.MsgCanonicalCreated(siteID), data)
+	})
+	require.NoError(t, err)
+	defer gkSub.Unsubscribe()
+
+	cfg := &config{NatsURL: testutil.NATS(t), SiteID: siteID, MetricsAddr: ":0", MaxInFlight: 100}
+	preset, _ := BuiltinPreset("small")
+
+	w, cleanup, err := newMessagesWorkload(ctx, cfg, &preset, InjectFrontdoor, 42)
+	require.NoError(t, err)
+	defer cleanup()
+
+	results := runRamp(ctx, w, rampConfig{
+		Steps: []int{50, 100}, Warmup: time.Second, Hold: 2 * time.Second, Cooldown: 0,
+		Thresholds: rpsThresholds{
+			P95: time.Second, P99: 2 * time.Second, ErrorRate: 0.9,
+			PendingGrowth: 1_000_000, RateTolerance: 0.9,
+		},
+		StopOnTrip: true,
+	})
+
+	require.Len(t, results, 2)
+	for _, r := range results {
+		require.NotEqual(t, verdictTrip, r.Kind, "reasons=%v", r.Reasons)
+		require.Greater(t, r.AttemptedOps, 0)
+		require.Greater(t, r.AchievedRPS, 0.0)
+	}
+}
+```
+
+- [ ] **Step 3: Run the integration test**
+
+Run: `make test-integration SERVICE=tools/loadgen 2>&1 | tail -30`
+Expected: PASS (Docker required). The whole-package integration build must compile; if the existing file already declares `siteID`/`canonical` at function scope elsewhere there is no conflict (each test function has its own scope).
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add tools/loadgen/integration_test.go
+git commit -m "test(loadgen): integration coverage for max-rps messages ramp"
+```
+
+---
+
+## Task 9: README and deploy Makefile target
+
+**Files:**
+- Modify: `tools/loadgen/README.md`
+- Modify: `tools/loadgen/deploy/Makefile`
+
+- [ ] **Step 1: Read both files**
+
+Run: `cd tools/loadgen && sed -n '1,40p' README.md && echo '--- MAKEFILE ---' && sed -n '1,60p' deploy/Makefile`
+Note the existing section style and how `run` / `run-history` (or `history-sustained`) targets pass env (`PRESET`, `RATE`, etc.).
+
+- [ ] **Step 2: Add a README section**
+
+Append a `## max-rps — auto-find Max RPS under SLO` section to `tools/loadgen/README.md` documenting:
+- the subcommand and `--workload=messages|history`,
+- the flag table (copy from the spec §2),
+- how to read the `ANSWER: max RPS = N` line and INCONCLUSIVE rows,
+- two quick-start command examples (one per workload), e.g.:
+
+```bash
+# messages: ramp 500..10k rps, stop at first SLO breach
+loadgen max-rps --workload=messages --preset=medium --steps=500,1k,2k,5k,10k
+
+# history: per-endpoint SLO, custom p95
+loadgen max-rps --workload=history --preset=history-medium --steps=200,500,1k,2k --slo-p95=80ms
+```
+
+- [ ] **Step 3: Add a Makefile target**
+
+Add to `tools/loadgen/deploy/Makefile` (matching the existing target style, with `WORKLOAD`/`PRESET`/`STEPS` overridable):
+
+```makefile
+WORKLOAD ?= messages
+STEPS ?=
+
+.PHONY: run-max-rps
+run-max-rps: ## Ramp RPS to find the max under SLO (WORKLOAD=messages|history PRESET=.. STEPS=..)
+	$(COMPOSE) run --rm loadgen max-rps --workload=$(WORKLOAD) --preset=$(PRESET) $(if $(STEPS),--steps=$(STEPS),)
+```
+
+> Match `$(COMPOSE)`, the service name (`loadgen`), and the `##` help convention to whatever the existing `run` target uses; adjust if the file differs.
+
+- [ ] **Step 4: Verify the Makefile parses**
+
+Run: `make -C tools/loadgen/deploy -n run-max-rps PRESET=medium 2>&1 | tail -5`
+Expected: prints the docker compose command without executing it (no make syntax error).
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tools/loadgen/README.md tools/loadgen/deploy/Makefile
+git commit -m "docs(loadgen): document max-rps subcommand and add run-max-rps target"
+```
+
+---
+
+## Task 10: Final verification
+
+**Files:** none (verification only)
+
+- [ ] **Step 1: Format**
+
+Run: `make fmt 2>&1 | tail -5`
+Expected: no diff complaints; reformats if needed.
+
+- [ ] **Step 2: Lint**
+
+Run: `make lint 2>&1 | tail -20`
+Expected: 0 issues. Fix any reported (common: unused var, `gofmt`, error-wrap style per CLAUDE.md). If `unparam`/`revive` flags the `ctx context.Context` parameter on `newMessagesWorkload` / `newHistoryWorkload` as unused, drop the parameter and update both call sites in `maxrps.go` and the integration test, or `_ = ctx`.
+
+- [ ] **Step 3: Unit tests with race detector**
+
+Run: `make test SERVICE=tools/loadgen 2>&1 | tail -20`
+Expected: PASS under `-race`.
+
+- [ ] **Step 4: SAST**
+
+Run: `make sast 2>&1 | tail -20`
+Expected: no medium+ findings. (No `InsecureSkipVerify` or unsafe conversions introduced; the `int64(uint64)` in `consumerPendingDelta.Delta` is on small bounded values — if gosec flags G115, add `// #nosec G115 -- NumPending is a small bounded backlog count` directly above the conversion.)
+
+- [ ] **Step 5: Integration tests**
+
+Run: `make test-integration SERVICE=tools/loadgen 2>&1 | tail -20`
+Expected: PASS (Docker required).
+
+- [ ] **Step 6: Commit any fixes, then push**
+
+```bash
+git add -A
+git commit -m "chore(loadgen): lint/sast fixes for max-rps" # only if there were fixes
+git push -u origin claude/max-rps-slo-loadgen-ajKRi
+```
+
+---
+
+## Notes for the implementer
+
+- **No `docs/client-api.md` change** — this is tooling, not a client-facing handler.
+- **Determinism:** the messages adapter reconstructs `Generator` per step with the same seed; this replays the same RNG sequence each step, which is fine (the workload shape per step is deterministic). The history adapter uses fresh collectors per warmup/hold run.
+- **Error-rate definition (messages):** `FailedOps` counts hard publish/gatekeeper/marshal/bad_reply errors only. Missing replies/broadcasts are deliberately NOT counted as failures (late stragglers would create false trips); slow or dropped delivery is caught by the E2 latency and pending-growth signals instead. This is intentional — do not "fix" it by adding missing-reply counts to `FailedOps`.
+- **INCONCLUSIVE precedence:** TRIP is checked before the rate-shortfall guard. Do not reorder — a server saturating backpressures the open-loop generator (achieved < target), and that must read as TRIP, not INCONCLUSIVE. See spec §5.
+- **Convergence with PR #234:** all identifiers are `rps`-prefixed (`rpsThresholds`, `evaluateRPSStep`, `parseRPSSteps`, `renderRPSReport`, …) so there is no `package main` symbol collision with #234's `daily_*` code regardless of merge order.
+```

From 9d5de5a36857ba78dc30dc4078d6fed4cfbfbf8d Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 18:05:39 +0000
Subject: [PATCH 03/16] feat(loadgen): add rps step verdict types and
 evaluateRPSStep

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/verdict.go      | 166 +++++++++++++++++++++++++++++++++
 tools/loadgen/verdict_test.go | 168 ++++++++++++++++++++++++++++++++++
 2 files changed, 334 insertions(+)
 create mode 100644 tools/loadgen/verdict.go
 create mode 100644 tools/loadgen/verdict_test.go

diff --git a/tools/loadgen/verdict.go b/tools/loadgen/verdict.go
new file mode 100644
index 000000000..36fe017ad
--- /dev/null
+++ b/tools/loadgen/verdict.go
@@ -0,0 +1,166 @@
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// rpsThresholds holds the SLO limits a step is judged against. Every gated
+// latency series shares the same P95/P99 limits.
+type rpsThresholds struct {
+	P95, P99      time.Duration
+	ErrorRate     float64
+	PendingGrowth uint64 // messages only; per-durable end-start NumPending delta
+	RateTolerance float64
+}
+
+// seriesSamples is one named latency tape (e.g. "E1","E2" or "history","thread").
+type seriesSamples struct {
+	Name    string
+	Samples []time.Duration
+}
+
+// consumerPendingDelta is one durable's NumPending at the hold boundaries.
+type consumerPendingDelta struct {
+	Durable    string
+	Start, End uint64
+}
+
+// Delta returns End-Start as a signed value (it can be negative if the backlog drained).
+func (d consumerPendingDelta) Delta() int64 { return int64(d.End) - int64(d.Start) }
+
+// rpsStepInputs is the normalized, workload-agnostic measurement of one step.
+type rpsStepInputs struct {
+	TargetRPS    int
+	Hold         time.Duration
+	AttemptedOps int
+	FailedOps    int
+	Saturation   int // open-loop self-saturation tally (corroborates shortfall)
+	Latencies    []seriesSamples
+	Pending      []consumerPendingDelta // empty for history
+	// Inconclusive is set by the adapter when measurement itself failed (e.g. a
+	// pending snapshot errored), independent of the system under test.
+	Inconclusive       bool
+	InconclusiveReason string
+}
+
+type verdictKind int
+
+const (
+	verdictPass verdictKind = iota
+	verdictTrip
+	verdictInconclusive
+)
+
+func (k verdictKind) String() string {
+	switch k {
+	case verdictTrip:
+		return "TRIP"
+	case verdictInconclusive:
+		return "INCONCLUSIVE"
+	default:
+		return "PASS"
+	}
+}
+
+// seriesPercentile is a named series' computed percentiles, for reporting.
+type seriesPercentile struct {
+	Name string
+	Pct  Percentiles
+}
+
+// rpsStepResult is the verdict for one step.
+type rpsStepResult struct {
+	TargetRPS    int
+	AchievedRPS  float64
+	AttemptedOps int
+	FailedOps    int
+	Saturation   int
+	ErrorRate    float64
+	Latencies    []seriesPercentile
+	WorstDurable string
+	WorstDelta   int64
+	Kind         verdictKind
+	Reasons      []string
+}
+
+// evaluateRPSStep classifies a step PASS / TRIP / INCONCLUSIVE.
+//
+// Precedence (deliberately differs from PR #234):
+//  1. explicit measurement-failure inconclusive (harness/measurement issue),
+//  2. TRIP if any SLO signal is over threshold — server-induced backpressure
+//     must NOT be misread as a harness limit,
+//  3. INCONCLUSIVE if the harness could not push the target rate while the
+//     system looked healthy (the load box is the limit, not the service),
+//  4. PASS.
+func evaluateRPSStep(in rpsStepInputs, th rpsThresholds) rpsStepResult {
+	res := rpsStepResult{
+		TargetRPS:    in.TargetRPS,
+		AttemptedOps: in.AttemptedOps,
+		FailedOps:    in.FailedOps,
+		Saturation:   in.Saturation,
+	}
+	if in.Hold > 0 {
+		res.AchievedRPS = float64(in.AttemptedOps) / in.Hold.Seconds()
+	}
+	if in.AttemptedOps > 0 {
+		res.ErrorRate = float64(in.FailedOps) / float64(in.AttemptedOps)
+	}
+
+	// Compute percentiles for every series (always, so the report has data).
+	for _, s := range in.Latencies {
+		res.Latencies = append(res.Latencies, seriesPercentile{Name: s.Name, Pct: ComputePercentiles(s.Samples)})
+	}
+	// Worst pending delta (always, for the report column).
+	res.WorstDelta = 0
+	for _, p := range in.Pending {
+		if d := p.Delta(); d > res.WorstDelta || res.WorstDurable == "" {
+			res.WorstDelta = d
+			res.WorstDurable = p.Durable
+		}
+	}
+
+	// (1) Measurement failure short-circuits.
+	if in.Inconclusive {
+		res.Kind = verdictInconclusive
+		res.Reasons = []string{in.InconclusiveReason}
+		return res
+	}
+
+	// (2) TRIP conditions (accumulate human-readable reasons).
+	var reasons []string
+	for _, sp := range res.Latencies {
+		if sp.Pct.P95 > th.P95 {
+			reasons = append(reasons, fmt.Sprintf("%s p95=%s > %s", sp.Name, sp.Pct.P95, th.P95))
+		}
+		if sp.Pct.P99 > th.P99 {
+			reasons = append(reasons, fmt.Sprintf("%s p99=%s > %s", sp.Name, sp.Pct.P99, th.P99))
+		}
+	}
+	if res.ErrorRate > th.ErrorRate {
+		reasons = append(reasons, fmt.Sprintf("error rate %.3f%% > %.3f%%", res.ErrorRate*100, th.ErrorRate*100))
+	}
+	for _, p := range in.Pending {
+		if d := p.Delta(); d > int64(th.PendingGrowth) {
+			reasons = append(reasons, fmt.Sprintf("%s pending +%d > +%d", p.Durable, d, th.PendingGrowth))
+		}
+	}
+	if len(reasons) > 0 {
+		res.Kind = verdictTrip
+		res.Reasons = reasons
+		return res
+	}
+
+	// (3) Healthy-but-cannot-push -> INCONCLUSIVE.
+	if th.RateTolerance > 0 && res.AchievedRPS < float64(in.TargetRPS)*(1-th.RateTolerance) {
+		res.Kind = verdictInconclusive
+		res.Reasons = []string{fmt.Sprintf(
+			"achieved %.0f rps < %.0f%% of target %d rps (saturation=%d) — load box limited",
+			res.AchievedRPS, (1-th.RateTolerance)*100, in.TargetRPS, in.Saturation)}
+		return res
+	}
+
+	// (4) PASS.
+	res.Kind = verdictPass
+	return res
+}
diff --git a/tools/loadgen/verdict_test.go b/tools/loadgen/verdict_test.go
new file mode 100644
index 000000000..bb69a37a6
--- /dev/null
+++ b/tools/loadgen/verdict_test.go
@@ -0,0 +1,168 @@
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }
+
+// nLatencies returns a slice of n identical-latency samples.
+func nLatencies(n int, d time.Duration) []time.Duration {
+	out := make([]time.Duration, n)
+	for i := range out {
+		out[i] = d
+	}
+	return out
+}
+
+func defaultRPSThresholds() rpsThresholds {
+	return rpsThresholds{
+		P95:           ms(100),
+		P99:           ms(250),
+		ErrorRate:     0.001,
+		PendingGrowth: 1000,
+		RateTolerance: 0.05,
+	}
+}
+
+func TestEvaluateRPSStep(t *testing.T) {
+	th := defaultRPSThresholds()
+	tests := []struct {
+		name     string
+		in       rpsStepInputs
+		wantKind verdictKind
+	}{
+		{
+			name: "all healthy passes",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000, FailedOps: 0,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+				Pending:   []consumerPendingDelta{{Durable: "message-worker", Start: 0, End: 10}},
+			},
+			wantKind: verdictPass,
+		},
+		{
+			name: "p95 over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(150))}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "p99 over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				// 98 samples at 20ms, 2 at 300ms -> p95=20ms (ok), p99=300ms (>250ms).
+				// pick(0.99) = int(99*0.99) = index 98 = first 300ms value.
+				Latencies: []seriesSamples{{Name: "E1", Samples: append(nLatencies(98, ms(20)), ms(300), ms(300))}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "error rate over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000, FailedOps: 5, // 0.5% > 0.1%
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "pending growth over threshold trips",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+				Pending:   []consumerPendingDelta{{Durable: "broadcast-worker", Start: 0, End: 1500}},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "per-endpoint: slow thread trips even if history fast",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{
+					{Name: "history", Samples: nLatencies(100, ms(20))},
+					{Name: "thread", Samples: nLatencies(100, ms(180))},
+				},
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "healthy but rate shortfall is inconclusive",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 800, // 80% < 95%
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(80, ms(20))}},
+			},
+			wantKind: verdictInconclusive,
+		},
+		{
+			name: "trip beats shortfall: high latency AND low achieved is a TRIP",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 800, // shortfall...
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(80, ms(400))}}, // ...but slow
+			},
+			wantKind: verdictTrip,
+		},
+		{
+			name: "explicit inconclusive flag short-circuits",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Inconclusive: true, InconclusiveReason: "pending snapshot failed",
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(20))}},
+			},
+			wantKind: verdictInconclusive,
+		},
+		{
+			name: "p95 exactly at threshold passes (boundary)",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(100))}},
+			},
+			wantKind: verdictPass,
+		},
+		{
+			name: "empty samples does not panic and passes on other signals",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Latencies: []seriesSamples{{Name: "E1", Samples: nil}},
+			},
+			wantKind: verdictPass,
+		},
+	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := evaluateRPSStep(tt.in, th)
+			assert.Equal(t, tt.wantKind, got.Kind, "reasons=%v", got.Reasons)
+		})
+	}
+}
+
+func TestEvaluateRPSStep_AchievedAndErrorRate(t *testing.T) {
+	th := defaultRPSThresholds()
+	in := rpsStepInputs{
+		TargetRPS: 1000, Hold: 2 * time.Second, AttemptedOps: 1000, FailedOps: 100,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}},
+	}
+	got := evaluateRPSStep(in, th)
+	assert.InDelta(t, 500.0, got.AchievedRPS, 0.01) // 1000 ops / 2s
+	assert.InDelta(t, 0.1, got.ErrorRate, 0.0001)   // 100/1000
+}
+
+func TestEvaluateRPSStep_WorstPendingReported(t *testing.T) {
+	th := defaultRPSThresholds()
+	in := rpsStepInputs{
+		TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}},
+		Pending: []consumerPendingDelta{
+			{Durable: "message-worker", Start: 0, End: 50},
+			{Durable: "broadcast-worker", Start: 100, End: 700}, // delta 600, the worst
+		},
+	}
+	got := evaluateRPSStep(in, th)
+	assert.Equal(t, "broadcast-worker", got.WorstDurable)
+	assert.Equal(t, int64(600), got.WorstDelta)
+	assert.Equal(t, verdictPass, got.Kind) // 600 < 1000
+}

From 59680c7f02f41fdfa7b7dfa82859013ebabbe8e8 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 18:23:05 +0000
Subject: [PATCH 04/16] =?UTF-8?q?fix(loadgen):=20address=20Task=201=20revi?=
 =?UTF-8?q?ew=20=E2=80=94=20pointer=20verdict=20param,=20String()=20and=20?=
 =?UTF-8?q?precedence=20tests?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/verdict.go      |  3 +--
 tools/loadgen/verdict_test.go | 35 ++++++++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/tools/loadgen/verdict.go b/tools/loadgen/verdict.go
index 36fe017ad..024c5da22 100644
--- a/tools/loadgen/verdict.go
+++ b/tools/loadgen/verdict.go
@@ -93,7 +93,7 @@ type rpsStepResult struct {
 //  3. INCONCLUSIVE if the harness could not push the target rate while the
 //     system looked healthy (the load box is the limit, not the service),
 //  4. PASS.
-func evaluateRPSStep(in rpsStepInputs, th rpsThresholds) rpsStepResult {
+func evaluateRPSStep(in *rpsStepInputs, th rpsThresholds) rpsStepResult {
 	res := rpsStepResult{
 		TargetRPS:    in.TargetRPS,
 		AttemptedOps: in.AttemptedOps,
@@ -112,7 +112,6 @@ func evaluateRPSStep(in rpsStepInputs, th rpsThresholds) rpsStepResult {
 		res.Latencies = append(res.Latencies, seriesPercentile{Name: s.Name, Pct: ComputePercentiles(s.Samples)})
 	}
 	// Worst pending delta (always, for the report column).
-	res.WorstDelta = 0
 	for _, p := range in.Pending {
 		if d := p.Delta(); d > res.WorstDelta || res.WorstDurable == "" {
 			res.WorstDelta = d
diff --git a/tools/loadgen/verdict_test.go b/tools/loadgen/verdict_test.go
index bb69a37a6..0ace99e8a 100644
--- a/tools/loadgen/verdict_test.go
+++ b/tools/loadgen/verdict_test.go
@@ -131,11 +131,23 @@ func TestEvaluateRPSStep(t *testing.T) {
 			},
 			wantKind: verdictPass,
 		},
+		{
+			name: "explicit inconclusive beats TRIP signals",
+			in: rpsStepInputs{
+				TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+				Inconclusive: true, InconclusiveReason: "snapshot failed",
+				Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(400))}},
+			},
+			wantKind: verdictInconclusive,
+		},
 	}
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
-			got := evaluateRPSStep(tt.in, th)
+			got := evaluateRPSStep(&tt.in, th)
 			assert.Equal(t, tt.wantKind, got.Kind, "reasons=%v", got.Reasons)
+			if tt.wantKind == verdictTrip {
+				assert.NotEmpty(t, got.Reasons, "TRIP verdict must have at least one reason")
+			}
 		})
 	}
 }
@@ -146,7 +158,7 @@ func TestEvaluateRPSStep_AchievedAndErrorRate(t *testing.T) {
 		TargetRPS: 1000, Hold: 2 * time.Second, AttemptedOps: 1000, FailedOps: 100,
 		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}},
 	}
-	got := evaluateRPSStep(in, th)
+	got := evaluateRPSStep(&in, th)
 	assert.InDelta(t, 500.0, got.AchievedRPS, 0.01) // 1000 ops / 2s
 	assert.InDelta(t, 0.1, got.ErrorRate, 0.0001)   // 100/1000
 }
@@ -161,8 +173,25 @@ func TestEvaluateRPSStep_WorstPendingReported(t *testing.T) {
 			{Durable: "broadcast-worker", Start: 100, End: 700}, // delta 600, the worst
 		},
 	}
-	got := evaluateRPSStep(in, th)
+	got := evaluateRPSStep(&in, th)
 	assert.Equal(t, "broadcast-worker", got.WorstDurable)
 	assert.Equal(t, int64(600), got.WorstDelta)
 	assert.Equal(t, verdictPass, got.Kind) // 600 < 1000
+
+	// Over-threshold p95 trip: verify the reason string contains "p95=" and "> ".
+	tripIn := rpsStepInputs{
+		TargetRPS: 1000, Hold: time.Second, AttemptedOps: 1000,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(100, ms(400))}},
+	}
+	tripGot := evaluateRPSStep(&tripIn, th)
+	assert.Equal(t, verdictTrip, tripGot.Kind)
+	assert.NotEmpty(t, tripGot.Reasons)
+	assert.Contains(t, tripGot.Reasons[0], "p95=")
+	assert.Contains(t, tripGot.Reasons[0], "> ")
+}
+
+func TestVerdictKind_String(t *testing.T) {
+	assert.Equal(t, "PASS", verdictPass.String())
+	assert.Equal(t, "TRIP", verdictTrip.String())
+	assert.Equal(t, "INCONCLUSIVE", verdictInconclusive.String())
 }

From a7f35a23110401ad173a6b069f43cb18a1437530 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 18:31:07 +0000
Subject: [PATCH 05/16] feat(loadgen): add rps ramp engine (parseRPSSteps,
 runRamp)

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/ramp.go      | 122 +++++++++++++++++++++++++++++++++++++
 tools/loadgen/ramp_test.go | 111 +++++++++++++++++++++++++++++++++
 2 files changed, 233 insertions(+)
 create mode 100644 tools/loadgen/ramp.go
 create mode 100644 tools/loadgen/ramp_test.go

diff --git a/tools/loadgen/ramp.go b/tools/loadgen/ramp.go
new file mode 100644
index 000000000..5e1118124
--- /dev/null
+++ b/tools/loadgen/ramp.go
@@ -0,0 +1,122 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strconv"
+	"strings"
+	"time"
+)
+
+// rpsWorkload is the engine<->adapter seam. RunStep drives open-loop load at
+// targetRPS, owning its own warmup/hold measurement boundaries, and returns the
+// normalized inputs for the hold window. The engine owns cooldown, stop-on-trip
+// and last-pass tracking.
+type rpsWorkload interface {
+	RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error)
+	Label() string
+}
+
+// rampConfig parameterizes a ramp.
+type rampConfig struct {
+	Steps                  []int
+	Warmup, Hold, Cooldown time.Duration
+	Thresholds             rpsThresholds
+	StopOnTrip             bool
+}
+
+// parseRPSSteps parses a comma-separated, strictly-ascending list of positive
+// RPS values. A trailing "k" multiplies by 1000 (e.g. "5k" -> 5000).
+func parseRPSSteps(s string) ([]int, error) {
+	parts := strings.Split(s, ",")
+	out := make([]int, 0, len(parts))
+	prev := 0
+	for _, raw := range parts {
+		tok := strings.TrimSpace(raw)
+		if tok == "" {
+			return nil, fmt.Errorf("empty step in %q", s)
+		}
+		mult := 1
+		if strings.HasSuffix(tok, "k") || strings.HasSuffix(tok, "K") {
+			mult = 1000
+			tok = tok[:len(tok)-1]
+		}
+		n, err := strconv.Atoi(strings.TrimSpace(tok))
+		if err != nil {
+			return nil, fmt.Errorf("bad step %q: %w", raw, err)
+		}
+		n *= mult
+		if n <= 0 {
+			return nil, fmt.Errorf("step must be > 0, got %d", n)
+		}
+		if n <= prev {
+			return nil, fmt.Errorf("steps must be strictly ascending, got %d after %d", n, prev)
+		}
+		prev = n
+		out = append(out, n)
+	}
+	if len(out) == 0 {
+		return nil, fmt.Errorf("no steps parsed from %q", s)
+	}
+	return out, nil
+}
+
+// waitOrCancel sleeps for d or returns early with ctx.Err() if ctx is cancelled.
+func waitOrCancel(ctx context.Context, d time.Duration) error {
+	if d <= 0 {
+		return ctx.Err()
+	}
+	t := time.NewTimer(d)
+	defer t.Stop()
+	select {
+	case <-ctx.Done():
+		return ctx.Err()
+	case <-t.C:
+		return nil
+	}
+}
+
+// runRamp executes each step in order. It stops early on the first TRIP when
+// StopOnTrip is set (an INCONCLUSIVE step never stops the ramp), and on ctx
+// cancellation, returning whatever results were gathered.
+func runRamp(ctx context.Context, w rpsWorkload, cfg *rampConfig) []rpsStepResult {
+	var results []rpsStepResult
+	for i, n := range cfg.Steps {
+		if ctx.Err() != nil {
+			break
+		}
+		in, err := w.RunStep(ctx, n, cfg.Warmup, cfg.Hold)
+		if err != nil {
+			if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+				break
+			}
+			slog.Warn("step run failed", "rps", n, "error", err)
+			break
+		}
+		res := evaluateRPSStep(&in, cfg.Thresholds)
+		results = append(results, res)
+		slog.Info("step complete", "rps", n, "verdict", res.Kind.String(),
+			"achieved", res.AchievedRPS, "reasons", res.Reasons)
+		if cfg.StopOnTrip && res.Kind == verdictTrip {
+			break
+		}
+		if i < len(cfg.Steps)-1 {
+			if err := waitOrCancel(ctx, cfg.Cooldown); err != nil {
+				break
+			}
+		}
+	}
+	return results
+}
+
+// maxRPSExitCode returns 0 if any step PASSed, else 1.
+func maxRPSExitCode(results []rpsStepResult) int {
+	for i := range results {
+		if results[i].Kind == verdictPass {
+			return 0
+		}
+	}
+	return 1
+}
diff --git a/tools/loadgen/ramp_test.go b/tools/loadgen/ramp_test.go
new file mode 100644
index 000000000..1bdbb5e1f
--- /dev/null
+++ b/tools/loadgen/ramp_test.go
@@ -0,0 +1,111 @@
+package main
+
+import (
+	"context"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestParseRPSSteps(t *testing.T) {
+	tests := []struct {
+		in      string
+		want    []int
+		wantErr bool
+	}{
+		{in: "500,1000,2000", want: []int{500, 1000, 2000}},
+		{in: "1k,2k,5k", want: []int{1000, 2000, 5000}},
+		{in: " 500 , 1k ", want: []int{500, 1000}},
+		{in: "1000", want: []int{1000}},
+		{in: "", wantErr: true},
+		{in: "abc", wantErr: true},
+		{in: "1000,500", wantErr: true},  // not ascending
+		{in: "0,1000", wantErr: true},    // not positive
+		{in: "1000,1000", wantErr: true}, // not strictly ascending
+	}
+	for _, tt := range tests {
+		t.Run(tt.in, func(t *testing.T) {
+			got, err := parseRPSSteps(tt.in)
+			if tt.wantErr {
+				assert.Error(t, err)
+				return
+			}
+			require.NoError(t, err)
+			assert.Equal(t, tt.want, got)
+		})
+	}
+}
+
+// fakeWorkload returns canned inputs, one per step, in order.
+type fakeWorkload struct {
+	inputs []rpsStepInputs
+	calls  int
+}
+
+func (f *fakeWorkload) Label() string { return "fake" }
+func (f *fakeWorkload) RunStep(_ context.Context, _ int, _, _ time.Duration) (rpsStepInputs, error) {
+	in := f.inputs[f.calls]
+	f.calls++
+	return in, nil
+}
+
+func passInputs(target int) rpsStepInputs {
+	return rpsStepInputs{TargetRPS: target, Hold: time.Second, AttemptedOps: target,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}}}
+}
+
+func tripInputs(target int) rpsStepInputs {
+	return rpsStepInputs{TargetRPS: target, Hold: time.Second, AttemptedOps: target,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(400))}}}
+}
+
+func inconclusiveInputs(target int) rpsStepInputs {
+	return rpsStepInputs{TargetRPS: target, Hold: time.Second, AttemptedOps: target / 2,
+		Latencies: []seriesSamples{{Name: "E1", Samples: nLatencies(10, ms(20))}}}
+}
+
+func TestRunRamp_StopsOnTrip(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), tripInputs(1000), passInputs(2000)}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second,
+		Thresholds: defaultRPSThresholds(), StopOnTrip: true}
+	results := runRamp(context.Background(), w, &cfg)
+	require.Len(t, results, 2) // stopped after the trip at 1000
+	assert.Equal(t, verdictPass, results[0].Kind)
+	assert.Equal(t, verdictTrip, results[1].Kind)
+	assert.Equal(t, 2, w.calls)
+}
+
+func TestRunRamp_DoesNotStopOnInconclusive(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), inconclusiveInputs(1000), passInputs(2000)}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second,
+		Thresholds: defaultRPSThresholds(), StopOnTrip: true}
+	results := runRamp(context.Background(), w, &cfg)
+	require.Len(t, results, 3)
+	assert.Equal(t, verdictInconclusive, results[1].Kind)
+	assert.Equal(t, verdictPass, results[2].Kind)
+}
+
+func TestRunRamp_NoStopOnTripRunsAll(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), tripInputs(1000), tripInputs(2000)}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second,
+		Thresholds: defaultRPSThresholds(), StopOnTrip: false}
+	results := runRamp(context.Background(), w, &cfg)
+	require.Len(t, results, 3)
+}
+
+func TestMaxRPSExitCode(t *testing.T) {
+	pass := []rpsStepResult{{Kind: verdictPass}, {Kind: verdictTrip}}
+	none := []rpsStepResult{{Kind: verdictInconclusive}, {Kind: verdictTrip}}
+	assert.Equal(t, 0, maxRPSExitCode(pass))
+	assert.Equal(t, 1, maxRPSExitCode(none))
+	assert.Equal(t, 1, maxRPSExitCode(nil))
+}
+
+func TestWaitOrCancel(t *testing.T) {
+	require.NoError(t, waitOrCancel(context.Background(), time.Millisecond))
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel()
+	assert.Error(t, waitOrCancel(ctx, time.Hour))
+}

From fb5c768bcf17e18c4e3fa0832ea89ce49db3e815 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 18:52:24 +0000
Subject: [PATCH 06/16] =?UTF-8?q?fix(loadgen):=20address=20Task=202=20revi?=
 =?UTF-8?q?ew=20=E2=80=94=20cover=20runRamp=20branches,=20overflow=20guard?=
 =?UTF-8?q?,=20cleanups?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/ramp.go      | 10 +++---
 tools/loadgen/ramp_test.go | 62 +++++++++++++++++++++++++++++++++-----
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/tools/loadgen/ramp.go b/tools/loadgen/ramp.go
index 5e1118124..dc223cf64 100644
--- a/tools/loadgen/ramp.go
+++ b/tools/loadgen/ramp.go
@@ -5,6 +5,7 @@ import (
 	"errors"
 	"fmt"
 	"log/slog"
+	"math"
 	"strconv"
 	"strings"
 	"time"
@@ -12,8 +13,7 @@ import (
 
 // rpsWorkload is the engine<->adapter seam. RunStep drives open-loop load at
 // targetRPS, owning its own warmup/hold measurement boundaries, and returns the
-// normalized inputs for the hold window. The engine owns cooldown, stop-on-trip
-// and last-pass tracking.
+// normalized inputs for the hold window. The engine owns cooldown and stop-on-trip.
 type rpsWorkload interface {
 	RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error)
 	Label() string
@@ -47,6 +47,9 @@ func parseRPSSteps(s string) ([]int, error) {
 		if err != nil {
 			return nil, fmt.Errorf("bad step %q: %w", raw, err)
 		}
+		if mult > 1 && n > math.MaxInt/mult {
+			return nil, fmt.Errorf("step %q overflows int", raw)
+		}
 		n *= mult
 		if n <= 0 {
 			return nil, fmt.Errorf("step must be > 0, got %d", n)
@@ -57,9 +60,6 @@ func parseRPSSteps(s string) ([]int, error) {
 		prev = n
 		out = append(out, n)
 	}
-	if len(out) == 0 {
-		return nil, fmt.Errorf("no steps parsed from %q", s)
-	}
 	return out, nil
 }
 
diff --git a/tools/loadgen/ramp_test.go b/tools/loadgen/ramp_test.go
index 1bdbb5e1f..2d6fb6fe8 100644
--- a/tools/loadgen/ramp_test.go
+++ b/tools/loadgen/ramp_test.go
@@ -2,6 +2,7 @@ package main
 
 import (
 	"context"
+	"errors"
 	"testing"
 	"time"
 
@@ -21,9 +22,10 @@ func TestParseRPSSteps(t *testing.T) {
 		{in: "1000", want: []int{1000}},
 		{in: "", wantErr: true},
 		{in: "abc", wantErr: true},
-		{in: "1000,500", wantErr: true},  // not ascending
-		{in: "0,1000", wantErr: true},    // not positive
-		{in: "1000,1000", wantErr: true}, // not strictly ascending
+		{in: "1000,500", wantErr: true},             // not ascending
+		{in: "0,1000", wantErr: true},               // not positive
+		{in: "1000,1000", wantErr: true},            // not strictly ascending
+		{in: "9223372036854775807k", wantErr: true}, // overflows int
 	}
 	for _, tt := range tests {
 		t.Run(tt.in, func(t *testing.T) {
@@ -40,15 +42,26 @@ func TestParseRPSSteps(t *testing.T) {
 
 // fakeWorkload returns canned inputs, one per step, in order.
 type fakeWorkload struct {
-	inputs []rpsStepInputs
-	calls  int
+	inputs   []rpsStepInputs
+	errs     map[int]error // step index -> error to return
+	calls    int
+	cancel   context.CancelFunc // if non-nil, called at the start of RunStep #cancelOn
+	cancelOn int
 }
 
 func (f *fakeWorkload) Label() string { return "fake" }
 func (f *fakeWorkload) RunStep(_ context.Context, _ int, _, _ time.Duration) (rpsStepInputs, error) {
-	in := f.inputs[f.calls]
+	i := f.calls
 	f.calls++
-	return in, nil
+	if f.cancel != nil && i == f.cancelOn {
+		f.cancel()
+	}
+	if f.errs != nil {
+		if err := f.errs[i]; err != nil {
+			return rpsStepInputs{}, err
+		}
+	}
+	return f.inputs[i], nil
 }
 
 func passInputs(target int) rpsStepInputs {
@@ -95,6 +108,37 @@ func TestRunRamp_NoStopOnTripRunsAll(t *testing.T) {
 	require.Len(t, results, 3)
 }
 
+func TestRunRamp_StopsOnRunStepError(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), {}, passInputs(2000)}, errs: map[int]error{1: errors.New("boom")}}
+	cfg := rampConfig{Steps: []int{500, 1000, 2000}, Hold: time.Second, Thresholds: defaultRPSThresholds(), StopOnTrip: true}
+	results := runRamp(context.Background(), w, &cfg)
+	require.Len(t, results, 1)
+	assert.Equal(t, verdictPass, results[0].Kind)
+}
+
+func TestRunRamp_StopsOnContextCanceledError(t *testing.T) {
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), {}}, errs: map[int]error{1: context.Canceled}}
+	cfg := rampConfig{Steps: []int{500, 1000}, Hold: time.Second, Thresholds: defaultRPSThresholds()}
+	results := runRamp(context.Background(), w, &cfg)
+	require.Len(t, results, 1)
+}
+
+func TestRunRamp_EmptyWhenContextCancelledBeforeStart(t *testing.T) {
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel()
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500)}}
+	results := runRamp(ctx, w, &rampConfig{Steps: []int{500}, Hold: time.Second, Thresholds: defaultRPSThresholds()})
+	assert.Empty(t, results)
+}
+
+func TestRunRamp_StopsWhenCooldownCancelled(t *testing.T) {
+	ctx, cancel := context.WithCancel(context.Background())
+	w := &fakeWorkload{inputs: []rpsStepInputs{passInputs(500), passInputs(1000)}, cancel: cancel, cancelOn: 0}
+	cfg := rampConfig{Steps: []int{500, 1000}, Hold: time.Second, Cooldown: time.Hour, Thresholds: defaultRPSThresholds()}
+	results := runRamp(ctx, w, &cfg)
+	require.Len(t, results, 1)
+}
+
 func TestMaxRPSExitCode(t *testing.T) {
 	pass := []rpsStepResult{{Kind: verdictPass}, {Kind: verdictTrip}}
 	none := []rpsStepResult{{Kind: verdictInconclusive}, {Kind: verdictTrip}}
@@ -108,4 +152,8 @@ func TestWaitOrCancel(t *testing.T) {
 	ctx, cancel := context.WithCancel(context.Background())
 	cancel()
 	assert.Error(t, waitOrCancel(ctx, time.Hour))
+	require.NoError(t, waitOrCancel(context.Background(), 0))
+	ctx2, cancel2 := context.WithCancel(context.Background())
+	cancel2()
+	assert.Error(t, waitOrCancel(ctx2, 0))
 }

From 1b1408c77d52e3a1e3753ac18eb9f830e189da23 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 19:00:11 +0000
Subject: [PATCH 07/16] feat(loadgen): add max-rps report renderer and CSV
 writer

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/maxrps_report.go      | 141 ++++++++++++++++++++++++++++
 tools/loadgen/maxrps_report_test.go |  87 +++++++++++++++++
 2 files changed, 228 insertions(+)
 create mode 100644 tools/loadgen/maxrps_report.go
 create mode 100644 tools/loadgen/maxrps_report_test.go

diff --git a/tools/loadgen/maxrps_report.go b/tools/loadgen/maxrps_report.go
new file mode 100644
index 000000000..fdacb503a
--- /dev/null
+++ b/tools/loadgen/maxrps_report.go
@@ -0,0 +1,141 @@
+package main
+
+import (
+	"encoding/csv"
+	"fmt"
+	"io"
+	"strconv"
+	"strings"
+	"text/tabwriter"
+)
+
+// lastPassRPS returns the largest TargetRPS whose step PASSed, or 0 if none.
+// Assumes results are in ascending step order.
+func lastPassRPS(results []rpsStepResult) int {
+	last := 0
+	for i := range results {
+		if results[i].Kind == verdictPass {
+			last = results[i].TargetRPS
+		}
+	}
+	return last
+}
+
+// firstTrip returns the first tripped step, or nil if none tripped.
+func firstTrip(results []rpsStepResult) *rpsStepResult {
+	for i := range results {
+		if results[i].Kind == verdictTrip {
+			return &results[i]
+		}
+	}
+	return nil
+}
+
+// seriesNames returns the ordered union of latency-series names across results.
+func seriesNames(results []rpsStepResult) []string {
+	var names []string
+	seen := map[string]bool{}
+	for i := range results {
+		for _, sp := range results[i].Latencies {
+			if !seen[sp.Name] {
+				seen[sp.Name] = true
+				names = append(names, sp.Name)
+			}
+		}
+	}
+	return names
+}
+
+// pctFor returns the percentiles for a named series in a result (zero if absent).
+func pctFor(r *rpsStepResult, name string) Percentiles {
+	for _, sp := range r.Latencies {
+		if sp.Name == name {
+			return sp.Pct
+		}
+	}
+	return Percentiles{}
+}
+
+// renderRPSReport writes the per-step table and the ANSWER line.
+func renderRPSReport(w io.Writer, results []rpsStepResult, workload, preset string) error {
+	fmt.Fprintf(w, "=== loadgen max-rps complete (workload=%s, preset=%s) ===\n\n", workload, preset)
+	names := seriesNames(results)
+
+	tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
+	header := []string{"target_rps", "achieved_rps"}
+	for _, n := range names {
+		header = append(header, n+" p95", n+" p99")
+	}
+	header = append(header, "err%", "worst_pending", "verdict")
+	fmt.Fprintln(tw, strings.Join(header, "\t"))
+
+	for i := range results {
+		r := &results[i]
+		row := []string{strconv.Itoa(r.TargetRPS), fmt.Sprintf("%.0f", r.AchievedRPS)}
+		for _, n := range names {
+			p := pctFor(r, n)
+			row = append(row, p.P95.String(), p.P99.String())
+		}
+		pending := "-"
+		if r.WorstDurable != "" {
+			pending = fmt.Sprintf("%s +%d", r.WorstDurable, r.WorstDelta)
+		}
+		row = append(row, fmt.Sprintf("%.3f", r.ErrorRate*100), pending, r.Kind.String())
+		fmt.Fprintln(tw, strings.Join(row, "\t"))
+	}
+	if err := tw.Flush(); err != nil {
+		return fmt.Errorf("flush table: %w", err)
+	}
+
+	fmt.Fprintln(w)
+	pass := lastPassRPS(results)
+	if pass == 0 {
+		fmt.Fprintf(w, "ANSWER: no step passed (workload=%s, preset=%s)\n", workload, preset)
+		return nil
+	}
+	fmt.Fprintf(w, "ANSWER: max RPS = %d (workload=%s, preset=%s)\n", pass, workload, preset)
+	if trip := firstTrip(results); trip != nil {
+		fmt.Fprintf(w, "        Next limit: %s\n", strings.Join(trip.Reasons, "; "))
+	}
+	return nil
+}
+
+// writeRPSCSV writes one row per step. Series percentile columns are emitted in
+// the union order of series names across all steps.
+func writeRPSCSV(w io.Writer, results []rpsStepResult) error {
+	cw := csv.NewWriter(w)
+	names := seriesNames(results)
+
+	header := []string{"target_rps", "achieved_rps"}
+	for _, n := range names {
+		header = append(header, n+"_p95_ms", n+"_p99_ms")
+	}
+	header = append(header, "error_rate", "attempted", "failed", "saturation", "worst_durable", "worst_pending_delta", "verdict", "reasons")
+	if err := cw.Write(header); err != nil {
+		return fmt.Errorf("write csv header: %w", err)
+	}
+
+	for i := range results {
+		r := &results[i]
+		row := []string{strconv.Itoa(r.TargetRPS), fmt.Sprintf("%.1f", r.AchievedRPS)}
+		for _, n := range names {
+			p := pctFor(r, n)
+			row = append(row,
+				strconv.FormatInt(p.P95.Milliseconds(), 10),
+				strconv.FormatInt(p.P99.Milliseconds(), 10))
+		}
+		row = append(row,
+			strconv.FormatFloat(r.ErrorRate, 'f', 6, 64),
+			strconv.Itoa(r.AttemptedOps), strconv.Itoa(r.FailedOps), strconv.Itoa(r.Saturation),
+			r.WorstDurable, strconv.FormatInt(r.WorstDelta, 10),
+			r.Kind.String(), strings.Join(r.Reasons, "; "))
+		if err := cw.Write(row); err != nil {
+			return fmt.Errorf("write csv row: %w", err)
+		}
+	}
+	cw.Flush()
+	if err := cw.Error(); err != nil {
+		return fmt.Errorf("flush csv: %w", err)
+	}
+	return nil
+}
diff --git a/tools/loadgen/maxrps_report_test.go b/tools/loadgen/maxrps_report_test.go
new file mode 100644
index 000000000..2a0e775df
--- /dev/null
+++ b/tools/loadgen/maxrps_report_test.go
@@ -0,0 +1,87 @@
+package main
+
+import (
+	"bytes"
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func sampleResults() []rpsStepResult {
+	return []rpsStepResult{
+		{TargetRPS: 500, AchievedRPS: 499, ErrorRate: 0, Kind: verdictPass,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(20), P99: ms(40)}}}},
+		{TargetRPS: 1000, AchievedRPS: 998, ErrorRate: 0, Kind: verdictPass,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(60), P99: ms(90)}}}},
+		{TargetRPS: 2000, AchievedRPS: 1900, ErrorRate: 0.02, Kind: verdictTrip,
+			WorstDurable: "broadcast-worker", WorstDelta: 1500,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(160), P99: ms(300)}}},
+			Reasons:   []string{"E1 p95=160ms > 100ms", "broadcast-worker pending +1500 > +1000"}},
+	}
+}
+
+func TestRenderRPSReport_ReportsLastPass(t *testing.T) {
+	var buf bytes.Buffer
+	require.NoError(t, renderRPSReport(&buf, sampleResults(), "messages", "medium"))
+	out := buf.String()
+	assert.Contains(t, out, "ANSWER: max RPS = 1000")
+	assert.Contains(t, out, "workload=messages")
+	assert.Contains(t, out, "preset=medium")
+	assert.Contains(t, out, "Next limit:")
+	assert.Contains(t, out, "broadcast-worker pending +1500 > +1000")
+	assert.Contains(t, out, "E1 p95") // dynamic series column header
+}
+
+func TestRenderRPSReport_NoStepPassed(t *testing.T) {
+	results := []rpsStepResult{{TargetRPS: 500, Kind: verdictTrip, Reasons: []string{"E1 p95=400ms > 100ms"}}}
+	var buf bytes.Buffer
+	require.NoError(t, renderRPSReport(&buf, results, "history", "history-medium"))
+	assert.Contains(t, buf.String(), "ANSWER: no step passed")
+}
+
+func TestLastPassRPS(t *testing.T) {
+	assert.Equal(t, 1000, lastPassRPS(sampleResults()))
+	assert.Equal(t, 0, lastPassRPS([]rpsStepResult{{Kind: verdictTrip}}))
+}
+
+func TestWriteRPSCSV(t *testing.T) {
+	var buf bytes.Buffer
+	require.NoError(t, writeRPSCSV(&buf, sampleResults()))
+	lines := strings.Split(strings.TrimSpace(buf.String()), "\n")
+	require.Len(t, lines, 4) // header + 3 rows
+	assert.Contains(t, lines[0], "target_rps")
+	assert.Contains(t, lines[0], "achieved_rps")
+	assert.Contains(t, lines[0], "E1_p95_ms")
+	assert.Contains(t, lines[0], "verdict")
+	assert.Contains(t, lines[3], "2000")
+	assert.Contains(t, lines[3], "TRIP")
+}
+
+func TestFirstTrip_NoneTripped(t *testing.T) {
+	results := []rpsStepResult{
+		{Kind: verdictPass},
+		{Kind: verdictPass},
+	}
+	assert.Nil(t, firstTrip(results))
+}
+
+func TestPctFor_AbsentSeries(t *testing.T) {
+	r := &rpsStepResult{
+		Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(50)}}},
+	}
+	assert.Equal(t, Percentiles{}, pctFor(r, "E2"))
+}
+
+func TestRenderRPSReport_AllPassNoTrip(t *testing.T) {
+	results := []rpsStepResult{
+		{TargetRPS: 500, AchievedRPS: 499, Kind: verdictPass,
+			Latencies: []seriesPercentile{{Name: "E1", Pct: Percentiles{P95: ms(20), P99: ms(40)}}}},
+	}
+	var buf bytes.Buffer
+	require.NoError(t, renderRPSReport(&buf, results, "messages", "medium"))
+	out := buf.String()
+	assert.Contains(t, out, "ANSWER: max RPS = 500")
+	assert.NotContains(t, out, "Next limit:")
+}

From 717e3fba02a4c95db1b46a1e854f4791a65d9b1e Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 19:14:28 +0000
Subject: [PATCH 08/16] test(loadgen): cover no-Next-limit path and
 multi-series CSV alignment

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/maxrps_report_test.go | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/loadgen/maxrps_report_test.go b/tools/loadgen/maxrps_report_test.go
index 2a0e775df..19a5e7b32 100644
--- a/tools/loadgen/maxrps_report_test.go
+++ b/tools/loadgen/maxrps_report_test.go
@@ -39,6 +39,7 @@ func TestRenderRPSReport_NoStepPassed(t *testing.T) {
 	var buf bytes.Buffer
 	require.NoError(t, renderRPSReport(&buf, results, "history", "history-medium"))
 	assert.Contains(t, buf.String(), "ANSWER: no step passed")
+	assert.NotContains(t, buf.String(), "Next limit:")
 }
 
 func TestLastPassRPS(t *testing.T) {
@@ -85,3 +86,30 @@ func TestRenderRPSReport_AllPassNoTrip(t *testing.T) {
 	assert.Contains(t, out, "ANSWER: max RPS = 500")
 	assert.NotContains(t, out, "Next limit:")
 }
+
+func TestRenderRPSReport_MultiSeriesAlignment(t *testing.T) {
+	results := []rpsStepResult{
+		{TargetRPS: 500, AchievedRPS: 500, Kind: verdictPass, Latencies: []seriesPercentile{
+			{Name: "E1", Pct: Percentiles{P95: ms(10), P99: ms(20)}},
+			{Name: "E2", Pct: Percentiles{P95: ms(30), P99: ms(40)}},
+		}},
+		{TargetRPS: 1000, AchievedRPS: 1000, Kind: verdictPass, Latencies: []seriesPercentile{
+			{Name: "E1", Pct: Percentiles{P95: ms(15), P99: ms(25)}},
+		}},
+	}
+	var buf bytes.Buffer
+	require.NoError(t, renderRPSReport(&buf, results, "messages", "medium"))
+	out := buf.String()
+	assert.Contains(t, out, "E1 p95")
+	assert.Contains(t, out, "E2 p95")
+
+	var csvBuf bytes.Buffer
+	require.NoError(t, writeRPSCSV(&csvBuf, results))
+	lines := strings.Split(strings.TrimSpace(csvBuf.String()), "\n")
+	require.Len(t, lines, 3) // header + 2 rows
+	assert.Contains(t, lines[0], "E1_p95_ms,E1_p99_ms,E2_p95_ms,E2_p99_ms")
+	cols := strings.Split(lines[2], ",") // step 1 row; E2 columns must be zero-filled
+	require.GreaterOrEqual(t, len(cols), 6)
+	assert.Equal(t, "0", cols[4]) // E2_p95_ms
+	assert.Equal(t, "0", cols[5]) // E2_p99_ms
+}

From fa3d70eff3838e1d76ab84fd9310a46b1a64ef88 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 19:19:19 +0000
Subject: [PATCH 09/16] feat(loadgen): add Collector.Reset for per-step ramp
 windows

---
 tools/loadgen/collector.go      | 12 ++++++++++++
 tools/loadgen/collector_test.go | 22 ++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/tools/loadgen/collector.go b/tools/loadgen/collector.go
index 98a3a88bc..d2d44776d 100644
--- a/tools/loadgen/collector.go
+++ b/tools/loadgen/collector.go
@@ -35,6 +35,18 @@ func NewCollector(m *Metrics, preset string) *Collector {
 	}
 }
 
+// Reset clears all correlation state and accumulated samples. Used by the
+// max-rps ramp to start each step's hold window from a clean slate while the
+// E1/E2 subscriptions (which hold this *Collector pointer) stay alive.
+func (c *Collector) Reset() {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	c.byReqID = make(map[string]publishEntry)
+	c.byMsgID = make(map[string]publishEntry)
+	c.e1 = nil
+	c.e2 = nil
+}
+
 // RecordPublish stores the publish time under both correlation keys.
 func (c *Collector) RecordPublish(requestID, messageID string, t time.Time) {
 	c.mu.Lock()
diff --git a/tools/loadgen/collector_test.go b/tools/loadgen/collector_test.go
index 86ae5301e..1c2f98ec3 100644
--- a/tools/loadgen/collector_test.go
+++ b/tools/loadgen/collector_test.go
@@ -168,3 +168,25 @@ func TestCollector_RecordPublishBroadcastOnly_FinalizeNoMissingReplies(t *testin
 	assert.Equal(t, 0, missingReplies, "canonical mode should never produce missing replies")
 	assert.Equal(t, 1, missingBroadcasts)
 }
+
+func TestCollector_Reset(t *testing.T) {
+	c := NewCollector(NewMetrics(), "test")
+	now := time.Now()
+	c.RecordPublish("req-1", "msg-1", now)
+	c.RecordReply("req-1", now.Add(10*time.Millisecond))
+	c.RecordBroadcast("msg-1", now.Add(20*time.Millisecond))
+	require.Equal(t, 1, c.E1Count())
+	require.Equal(t, 1, c.E2Count())
+
+	c.Reset()
+
+	assert.Equal(t, 0, c.E1Count())
+	assert.Equal(t, 0, c.E2Count())
+	mr, mb := c.Finalize()
+	assert.Equal(t, 0, mr)
+	assert.Equal(t, 0, mb)
+	// After reset, a fresh publish+reply correlates normally.
+	c.RecordPublish("req-2", "msg-2", now)
+	c.RecordReply("req-2", now.Add(5*time.Millisecond))
+	assert.Equal(t, 1, c.E1Count())
+}

From a9da8d2512bcdc8ff9098bed40ff5a57554e1afd Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 19:33:02 +0000
Subject: [PATCH 10/16] feat(loadgen): add messages workload adapter for
 max-rps

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/maxrps_messages.go      | 247 ++++++++++++++++++++++++++
 tools/loadgen/maxrps_messages_test.go |  56 ++++++
 2 files changed, 303 insertions(+)
 create mode 100644 tools/loadgen/maxrps_messages.go
 create mode 100644 tools/loadgen/maxrps_messages_test.go

diff --git a/tools/loadgen/maxrps_messages.go b/tools/loadgen/maxrps_messages.go
new file mode 100644
index 000000000..81afc31f5
--- /dev/null
+++ b/tools/loadgen/maxrps_messages.go
@@ -0,0 +1,247 @@
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"sync"
+	"time"
+
+	"github.com/nats-io/nats.go"
+	"github.com/nats-io/nats.go/jetstream"
+
+	"github.com/hmchangw/chat/pkg/natsutil"
+	"github.com/hmchangw/chat/pkg/stream"
+	"github.com/hmchangw/chat/pkg/subject"
+)
+
+// msgCounters is a point-in-time snapshot of the loadgen publish counters.
+type msgCounters struct {
+	published float64
+	err       map[string]float64 // keyed by reason
+}
+
+var msgErrorReasons = []string{"publish", "marshal", "gatekeeper", "bad_reply", "saturated"}
+
+// diffCounters returns end-start for published and each tracked reason.
+func diffCounters(start, end msgCounters) msgCounters {
+	d := msgCounters{published: end.published - start.published, err: map[string]float64{}}
+	for _, r := range msgErrorReasons {
+		d.err[r] = end.err[r] - start.err[r]
+	}
+	return d
+}
+
+// buildMessagesInputs assembles the normalized step inputs from a counter delta,
+// the hold-window latency tapes, and the pending snapshots.
+//
+// Error accounting (see spec §5): FailedOps counts hard publish/gatekeeper errors
+// only; missing replies/broadcasts are NOT counted (late stragglers would create
+// false trips) — slow/dropped delivery is caught by latency and pending-growth.
+func buildMessagesInputs(
+	targetRPS int, hold time.Duration, delta msgCounters,
+	e1, e2 []time.Duration,
+	startPending, endPending map[string]uint64,
+	durables []string, pendingOK bool,
+) rpsStepInputs {
+	attempted := int(delta.published + delta.err["publish"] + delta.err["marshal"])
+	failed := int(delta.err["publish"] + delta.err["marshal"] + delta.err["gatekeeper"] + delta.err["bad_reply"])
+	in := rpsStepInputs{
+		TargetRPS:    targetRPS,
+		Hold:         hold,
+		AttemptedOps: attempted,
+		FailedOps:    failed,
+		Saturation:   int(delta.err["saturated"]),
+		Latencies: []seriesSamples{
+			{Name: "E1", Samples: e1},
+			{Name: "E2", Samples: e2},
+		},
+	}
+	if !pendingOK {
+		in.Inconclusive = true
+		in.InconclusiveReason = "consumer pending snapshot failed — backlog signal unavailable"
+		return in
+	}
+	for _, d := range durables {
+		in.Pending = append(in.Pending, consumerPendingDelta{Durable: d, Start: startPending[d], End: endPending[d]})
+	}
+	return in
+}
+
+// messagesWorkload drives the messaging pipeline at a given RPS.
+// The natsutil connection and metrics server are not stored on the struct
+// (natsutil.Connect returns *otelnats.Conn); they are captured by the cleanup
+// closure instead, so the adapter only keeps what RunStep needs.
+type messagesWorkload struct {
+	cfg       *config
+	preset    *Preset
+	fixtures  Fixtures
+	inject    InjectMode
+	seed      int64
+	js        jetstream.JetStream
+	metrics   *Metrics
+	collector *Collector
+	publisher Publisher
+	canonical string
+	durables  []string
+}
+
+func (w *messagesWorkload) Label() string { return "messages" }
+
+// newMessagesWorkload wires NATS, the metrics server, the E1/E2 subscriptions,
+// and the publisher. The returned cleanup unsubscribes, shuts the metrics server
+// and drains NATS.
+func newMessagesWorkload(ctx context.Context, cfg *config, preset *Preset, inject InjectMode, seed int64) (*messagesWorkload, func(), error) {
+	nc, err := natsutil.Connect(cfg.NatsURL, cfg.NatsCredsFile)
+	if err != nil {
+		return nil, nil, fmt.Errorf("nats connect: %w", err)
+	}
+	js, err := jetstream.New(nc.NatsConn())
+	if err != nil {
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("jetstream init: %w", err)
+	}
+	metrics := NewMetrics()
+	srv := &http.Server{Addr: cfg.MetricsAddr, Handler: metrics.Handler(), ReadHeaderTimeout: 5 * time.Second}
+	go func() {
+		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			slog.Warn("metrics server stopped", "error", err)
+		}
+	}()
+
+	collector := NewCollector(metrics, preset.Name)
+
+	e1Sub, err := nc.NatsConn().Subscribe(subject.UserResponseWildcard(), func(msg *nats.Msg) {
+		reqID := lastToken(msg.Subject)
+		var payload struct {
+			Error string `json:"error"`
+		}
+		if err := json.Unmarshal(msg.Data, &payload); err != nil {
+			metrics.PublishErrors.WithLabelValues(preset.Name, "bad_reply").Inc()
+			return
+		}
+		if payload.Error != "" {
+			metrics.PublishErrors.WithLabelValues(preset.Name, "gatekeeper").Inc()
+		}
+		collector.RecordReply(reqID, time.Now())
+	})
+	if err != nil {
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("subscribe e1: %w", err)
+	}
+	e2Handler := newE2Handler(collector)
+	e2Sub, err := nc.NatsConn().Subscribe(subject.RoomEventWildcard(), e2Handler)
+	if err != nil {
+		_ = e1Sub.Unsubscribe()
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("subscribe e2: %w", err)
+	}
+	e2DMSub, err := nc.NatsConn().Subscribe(subject.UserRoomEventWildcard(), e2Handler)
+	if err != nil {
+		_ = e1Sub.Unsubscribe()
+		_ = e2Sub.Unsubscribe()
+		_ = nc.Drain()
+		return nil, nil, fmt.Errorf("subscribe e2 dm: %w", err)
+	}
+
+	w := &messagesWorkload{
+		cfg: cfg, preset: preset, fixtures: BuildFixtures(preset, seed, cfg.SiteID),
+		inject: inject, seed: seed, js: js, metrics: metrics, collector: collector,
+		publisher: newNatsCorePublisher(nc.NatsConn(), inject, js),
+		canonical: stream.MessagesCanonical(cfg.SiteID).Name,
+		durables:  []string{"message-worker", "broadcast-worker"},
+	}
+	cleanup := func() {
+		_ = e1Sub.Unsubscribe()
+		_ = e2Sub.Unsubscribe()
+		_ = e2DMSub.Unsubscribe()
+		shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		_ = srv.Shutdown(shutCtx)
+		cancel()
+		_ = nc.Drain()
+	}
+	return w, cleanup, nil
+}
+
+func (w *messagesWorkload) snapshotCounters() msgCounters {
+	mfs, _ := w.metrics.Registry.Gather()
+	c := msgCounters{
+		published: gatheredCounterValue(mfs, "loadgen_published_total", "", ""),
+		err:       map[string]float64{},
+	}
+	for _, reason := range msgErrorReasons {
+		c.err[reason] = gatheredCounterValue(mfs, "loadgen_publish_errors_total", "reason", reason)
+	}
+	return c
+}
+
+func (w *messagesWorkload) snapshotPending(ctx context.Context) (map[string]uint64, error) {
+	out := map[string]uint64{}
+	for _, d := range w.durables {
+		cons, err := w.js.Consumer(ctx, w.canonical, d)
+		if err != nil {
+			return nil, fmt.Errorf("consumer %s: %w", d, err)
+		}
+		info, err := cons.Info(ctx)
+		if err != nil {
+			return nil, fmt.Errorf("consumer info %s: %w", d, err)
+		}
+		out[d] = info.NumPending
+	}
+	return out, nil
+}
+
+// RunStep runs a fresh generator at targetRPS for warmup+hold, resetting the
+// collector at the hold boundary so only the hold window is measured.
+func (w *messagesWorkload) RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error) {
+	gen := NewGenerator(&GeneratorConfig{
+		Preset: w.preset, Fixtures: w.fixtures, SiteID: w.cfg.SiteID,
+		Rate: targetRPS, Inject: w.inject, Publisher: w.publisher,
+		Metrics: w.metrics, Collector: w.collector,
+		WarmupDeadline: time.Now().Add(warmup), MaxInFlight: w.cfg.MaxInFlight,
+	}, w.seed)
+
+	genCtx, cancel := context.WithCancel(ctx)
+	var wg sync.WaitGroup
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		_ = gen.Run(genCtx)
+	}()
+
+	if err := waitOrCancel(ctx, warmup); err != nil {
+		cancel()
+		wg.Wait()
+		return rpsStepInputs{}, err
+	}
+
+	holdStart := time.Now()
+	w.collector.Reset()
+	startCounts := w.snapshotCounters()
+	startPending, perr1 := w.snapshotPending(ctx)
+
+	holdErr := waitOrCancel(ctx, hold)
+
+	endCounts := w.snapshotCounters()
+	endPending, perr2 := w.snapshotPending(ctx)
+	cancel()
+	wg.Wait()
+	time.Sleep(2 * time.Second) // drain trailing replies/broadcasts
+	w.collector.DiscardBefore(holdStart)
+
+	if holdErr != nil {
+		return rpsStepInputs{}, holdErr
+	}
+
+	delta := diffCounters(startCounts, endCounts)
+	pendingOK := perr1 == nil && perr2 == nil
+	if !pendingOK {
+		slog.Warn("pending snapshot failed", "start_err", perr1, "end_err", perr2)
+	}
+	return buildMessagesInputs(targetRPS, hold, delta,
+		w.collector.E1Samples(), w.collector.E2Samples(),
+		startPending, endPending, w.durables, pendingOK), nil
+}
diff --git a/tools/loadgen/maxrps_messages_test.go b/tools/loadgen/maxrps_messages_test.go
new file mode 100644
index 000000000..a7cdb0d4a
--- /dev/null
+++ b/tools/loadgen/maxrps_messages_test.go
@@ -0,0 +1,56 @@
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+)
+
+// Compile-time checks: messagesWorkload satisfies rpsWorkload; constructor exists.
+var _ rpsWorkload = (*messagesWorkload)(nil)
+var _ = newMessagesWorkload
+
+func TestDiffCounters(t *testing.T) {
+	start := msgCounters{published: 100, err: map[string]float64{"publish": 1, "saturated": 5}}
+	end := msgCounters{published: 1100, err: map[string]float64{"publish": 3, "saturated": 9}}
+	d := diffCounters(start, end)
+	assert.Equal(t, float64(1000), d.published)
+	assert.Equal(t, float64(2), d.err["publish"])
+	assert.Equal(t, float64(4), d.err["saturated"])
+}
+
+func TestBuildMessagesInputs(t *testing.T) {
+	delta := msgCounters{
+		published: 980,
+		err:       map[string]float64{"publish": 10, "marshal": 0, "gatekeeper": 5, "bad_reply": 0, "saturated": 7},
+	}
+	e1 := nLatencies(50, ms(15))
+	e2 := nLatencies(50, ms(30))
+	pending := map[string]uint64{"message-worker": 12, "broadcast-worker": 40}
+	startPending := map[string]uint64{"message-worker": 2, "broadcast-worker": 5}
+	durables := []string{"message-worker", "broadcast-worker"}
+
+	in := buildMessagesInputs(1000, 10*time.Second, delta, e1, e2, startPending, pending, durables, true)
+
+	// AttemptedOps = published 980 + publish_err 10 + marshal_err 0 = 990
+	assert.Equal(t, 990, in.AttemptedOps)
+	// FailedOps = publish_err 10 + marshal_err 0 + gatekeeper 5 + bad_reply 0 = 15
+	assert.Equal(t, 15, in.FailedOps)
+	assert.Equal(t, 7, in.Saturation)
+	assert.Len(t, in.Latencies, 2)
+	assert.Equal(t, "E1", in.Latencies[0].Name)
+	assert.Equal(t, "E2", in.Latencies[1].Name)
+	assert.Len(t, in.Pending, 2)
+	assert.Equal(t, uint64(2), in.Pending[0].Start)
+	assert.Equal(t, uint64(12), in.Pending[0].End)
+	assert.False(t, in.Inconclusive)
+}
+
+func TestBuildMessagesInputs_PendingUnavailableIsInconclusive(t *testing.T) {
+	delta := msgCounters{published: 1000, err: map[string]float64{}}
+	in := buildMessagesInputs(1000, time.Second, delta, nil, nil, nil, nil, []string{"message-worker"}, false)
+	assert.True(t, in.Inconclusive)
+	assert.Contains(t, in.InconclusiveReason, "pending")
+	assert.Empty(t, in.Pending)
+}

From 893a6275a47ff5fd2acf1804ea88e72f89e482b5 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 19:51:17 +0000
Subject: [PATCH 11/16] =?UTF-8?q?fix(loadgen):=20address=20Task=205=20revi?=
 =?UTF-8?q?ew=20=E2=80=94=20shut=20metrics=20server=20on=20constructor=20e?=
 =?UTF-8?q?rror,=20log=20gather=20error?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add shutdownSrv helper and call it on all three subscription-error return
  paths so the metrics HTTP server is not leaked when construction fails.
- Log the Gather() error in snapshotCounters instead of silently discarding it.
- Strengthen TestDiffCounters to assert zero-delta for marshal/gatekeeper/bad_reply.

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/maxrps_messages.go      | 13 ++++++++++++-
 tools/loadgen/maxrps_messages_test.go |  3 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/loadgen/maxrps_messages.go b/tools/loadgen/maxrps_messages.go
index 81afc31f5..b5f8c24e0 100644
--- a/tools/loadgen/maxrps_messages.go
+++ b/tools/loadgen/maxrps_messages.go
@@ -111,6 +111,11 @@ func newMessagesWorkload(ctx context.Context, cfg *config, preset *Preset, injec
 			slog.Warn("metrics server stopped", "error", err)
 		}
 	}()
+	shutdownSrv := func() {
+		shutCtx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+		defer cancel()
+		_ = srv.Shutdown(shutCtx)
+	}
 
 	collector := NewCollector(metrics, preset.Name)
 
@@ -129,18 +134,21 @@ func newMessagesWorkload(ctx context.Context, cfg *config, preset *Preset, injec
 		collector.RecordReply(reqID, time.Now())
 	})
 	if err != nil {
+		shutdownSrv()
 		_ = nc.Drain()
 		return nil, nil, fmt.Errorf("subscribe e1: %w", err)
 	}
 	e2Handler := newE2Handler(collector)
 	e2Sub, err := nc.NatsConn().Subscribe(subject.RoomEventWildcard(), e2Handler)
 	if err != nil {
+		shutdownSrv()
 		_ = e1Sub.Unsubscribe()
 		_ = nc.Drain()
 		return nil, nil, fmt.Errorf("subscribe e2: %w", err)
 	}
 	e2DMSub, err := nc.NatsConn().Subscribe(subject.UserRoomEventWildcard(), e2Handler)
 	if err != nil {
+		shutdownSrv()
 		_ = e1Sub.Unsubscribe()
 		_ = e2Sub.Unsubscribe()
 		_ = nc.Drain()
@@ -167,7 +175,10 @@ func newMessagesWorkload(ctx context.Context, cfg *config, preset *Preset, injec
 }
 
 func (w *messagesWorkload) snapshotCounters() msgCounters {
-	mfs, _ := w.metrics.Registry.Gather()
+	mfs, err := w.metrics.Registry.Gather()
+	if err != nil {
+		slog.Warn("metrics gather", "error", err)
+	}
 	c := msgCounters{
 		published: gatheredCounterValue(mfs, "loadgen_published_total", "", ""),
 		err:       map[string]float64{},
diff --git a/tools/loadgen/maxrps_messages_test.go b/tools/loadgen/maxrps_messages_test.go
index a7cdb0d4a..882a938a4 100644
--- a/tools/loadgen/maxrps_messages_test.go
+++ b/tools/loadgen/maxrps_messages_test.go
@@ -18,6 +18,9 @@ func TestDiffCounters(t *testing.T) {
 	assert.Equal(t, float64(1000), d.published)
 	assert.Equal(t, float64(2), d.err["publish"])
 	assert.Equal(t, float64(4), d.err["saturated"])
+	assert.Equal(t, float64(0), d.err["marshal"])
+	assert.Equal(t, float64(0), d.err["gatekeeper"])
+	assert.Equal(t, float64(0), d.err["bad_reply"])
 }
 
 func TestBuildMessagesInputs(t *testing.T) {

From 6ff0fc330a979ebd4e997980ea025e77b2e16a9f Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 19:56:45 +0000
Subject: [PATCH 12/16] feat(loadgen): add history workload adapter for max-rps

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/maxrps_history.go      | 142 +++++++++++++++++++++++++++
 tools/loadgen/maxrps_history_test.go |  40 ++++++++
 2 files changed, 182 insertions(+)
 create mode 100644 tools/loadgen/maxrps_history.go
 create mode 100644 tools/loadgen/maxrps_history_test.go

diff --git a/tools/loadgen/maxrps_history.go b/tools/loadgen/maxrps_history.go
new file mode 100644
index 000000000..caed82d63
--- /dev/null
+++ b/tools/loadgen/maxrps_history.go
@@ -0,0 +1,142 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"sync"
+	"time"
+
+	"github.com/hmchangw/chat/pkg/natsutil"
+)
+
+// latenciesOf extracts the latency tape from a sample slice.
+func latenciesOf(samples []HistorySample) []time.Duration {
+	out := make([]time.Duration, len(samples))
+	for i := range samples {
+		out[i] = samples[i].Latency
+	}
+	return out
+}
+
+// buildHistoryInputs assembles normalized step inputs from a (hold-only) history
+// collector. Per-endpoint latency series gate independently; no consumer queue
+// exists for synchronous reads so Pending is empty.
+func buildHistoryInputs(targetRPS int, hold time.Duration, c *HistoryCollector) rpsStepInputs {
+	hist := c.HistorySamples()
+	thread := c.ThreadSamples()
+	failed := c.TimeoutErrors() + c.ReplyErrors() + c.BadReplyCount()
+	attempted := len(hist) + len(thread) + failed
+	return rpsStepInputs{
+		TargetRPS:    targetRPS,
+		Hold:         hold,
+		AttemptedOps: attempted,
+		FailedOps:    failed,
+		Saturation:   c.SaturationCount(),
+		Latencies: []seriesSamples{
+			{Name: "history", Samples: latenciesOf(hist)},
+			{Name: "thread", Samples: latenciesOf(thread)},
+		},
+	}
+}
+
+// historyWorkload drives history-service read requests at a given RPS.
+// As with messagesWorkload, the natsutil connection (*otelnats.Conn) and metrics
+// server are captured by the cleanup closure, not stored on the struct.
+type historyWorkload struct {
+	cfg             *config
+	preset          *HistoryPreset
+	fixtures        HistoryFixtures
+	seed            int64
+	mix             EndpointMix
+	beforeMode      BeforeMode
+	scrollbackPages int
+	pageLimit       int
+	requestTimeout  time.Duration
+	metrics         *Metrics
+	requester       HistoryRequester
+}
+
+func (w *historyWorkload) Label() string { return "history" }
+
+// historyWorkloadParams bundles the history-specific tunables.
+type historyWorkloadParams struct {
+	Mix             EndpointMix
+	BeforeMode      BeforeMode
+	ScrollbackPages int
+	PageLimit       int
+	RequestTimeout  time.Duration
+}
+
+func newHistoryWorkload(ctx context.Context, cfg *config, preset *HistoryPreset, seed int64, p historyWorkloadParams) (*historyWorkload, func(), error) {
+	if cfg.CassandraHosts == "" {
+		return nil, nil, fmt.Errorf("history workload requires CASSANDRA_HOSTS")
+	}
+	nc, err := natsutil.Connect(cfg.NatsURL, cfg.NatsCredsFile)
+	if err != nil {
+		return nil, nil, fmt.Errorf("nats connect: %w", err)
+	}
+	metrics := NewMetrics()
+	srv := &http.Server{Addr: cfg.MetricsAddr, Handler: metrics.Handler(), ReadHeaderTimeout: 5 * time.Second}
+	go func() {
+		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
+			slog.Warn("metrics server stopped", "error", err)
+		}
+	}()
+	w := &historyWorkload{
+		cfg: cfg, preset: preset, fixtures: BuildHistoryFixtures(preset, seed, cfg.SiteID, time.Now().UTC()),
+		seed: seed, mix: p.Mix, beforeMode: p.BeforeMode, scrollbackPages: p.ScrollbackPages,
+		pageLimit: p.PageLimit, requestTimeout: p.RequestTimeout,
+		metrics: metrics, requester: newNATSHistoryRequester(nc.NatsConn()),
+	}
+	cleanup := func() {
+		shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		_ = srv.Shutdown(shutCtx)
+		cancel()
+		_ = nc.Drain()
+	}
+	return w, cleanup, nil
+}
+
+func (w *historyWorkload) newGenerator(collector *HistoryCollector, targetRPS int) *HistoryGenerator {
+	return NewHistoryGenerator(&HistoryGeneratorConfig{
+		Preset: w.preset, Fixtures: &w.fixtures, SiteID: w.cfg.SiteID, Rate: targetRPS,
+		Mix: w.mix, BeforeMode: w.beforeMode, ScrollbackPages: w.scrollbackPages,
+		PageLimit: w.pageLimit, RequestTimeout: w.requestTimeout,
+		Requester: w.requester, Collector: collector, MaxInFlight: w.cfg.MaxInFlight,
+	}, w.seed)
+}
+
+// runFor runs gen.Run in a goroutine for d (or until ctx cancels), then stops it.
+func runFor(ctx context.Context, gen *HistoryGenerator, d time.Duration) error {
+	genCtx, cancel := context.WithCancel(ctx)
+	var wg sync.WaitGroup
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		_ = gen.Run(genCtx)
+	}()
+	err := waitOrCancel(ctx, d)
+	cancel()
+	wg.Wait()
+	return err
+}
+
+// RunStep runs warmup (discarded) then hold (measured) as two sequential
+// generator runs so the hold collector contains only hold-window data.
+func (w *historyWorkload) RunStep(ctx context.Context, targetRPS int, warmup, hold time.Duration) (rpsStepInputs, error) {
+	if warmup > 0 {
+		warmCollector := NewHistoryCollector()
+		if err := runFor(ctx, w.newGenerator(warmCollector, targetRPS), warmup); err != nil {
+			return rpsStepInputs{}, err
+		}
+	}
+	collector := NewHistoryCollector()
+	if err := runFor(ctx, w.newGenerator(collector, targetRPS), hold); err != nil {
+		return rpsStepInputs{}, err
+	}
+	time.Sleep(2 * time.Second) // drain trailing in-flight replies into the collector
+	return buildHistoryInputs(targetRPS, hold, collector), nil
+}
diff --git a/tools/loadgen/maxrps_history_test.go b/tools/loadgen/maxrps_history_test.go
new file mode 100644
index 000000000..f3bc1697b
--- /dev/null
+++ b/tools/loadgen/maxrps_history_test.go
@@ -0,0 +1,40 @@
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+)
+
+// Compile-time checks: historyWorkload satisfies rpsWorkload; constructor exists.
+var _ rpsWorkload = (*historyWorkload)(nil)
+var _ = newHistoryWorkload
+
+func TestBuildHistoryInputs(t *testing.T) {
+	c := NewHistoryCollector()
+	now := time.Now()
+	for i := 0; i < 40; i++ {
+		c.RecordSample(HistorySample{Endpoint: HistoryEndpointHistory, Latency: ms(15), At: now})
+	}
+	for i := 0; i < 10; i++ {
+		c.RecordSample(HistorySample{Endpoint: HistoryEndpointThread, Latency: ms(25), At: now})
+	}
+	c.RecordError(HistoryEndpointHistory, errClassTimeout, 0)
+	c.RecordError(HistoryEndpointThread, errClassReply, 0)
+	c.RecordSaturation()
+	c.RecordSaturation()
+
+	in := buildHistoryInputs(2000, 30*time.Second, c)
+
+	// attempted = 40 + 10 history/thread samples + 2 errors (timeout+reply)
+	assert.Equal(t, 52, in.AttemptedOps)
+	assert.Equal(t, 2, in.FailedOps)
+	assert.Equal(t, 2, in.Saturation)
+	assert.Len(t, in.Latencies, 2)
+	assert.Equal(t, "history", in.Latencies[0].Name)
+	assert.Equal(t, "thread", in.Latencies[1].Name)
+	assert.Len(t, in.Latencies[0].Samples, 40)
+	assert.Len(t, in.Latencies[1].Samples, 10)
+	assert.Empty(t, in.Pending) // history has no consumer queue
+}

From 38bd876207e7041b89e8b2c11263589df3182962 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 20:09:33 +0000
Subject: [PATCH 13/16] feat(loadgen): wire max-rps subcommand into dispatch

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/main.go                 |   4 +-
 tools/loadgen/maxrps.go               | 143 ++++++++++++++++++++++++++
 tools/loadgen/maxrps_history_test.go  |   3 +-
 tools/loadgen/maxrps_messages_test.go |   3 +-
 tools/loadgen/maxrps_test.go          |  28 +++++
 5 files changed, 176 insertions(+), 5 deletions(-)
 create mode 100644 tools/loadgen/maxrps.go
 create mode 100644 tools/loadgen/maxrps_test.go

diff --git a/tools/loadgen/main.go b/tools/loadgen/main.go
index e84ca2c86..b9a639185 100644
--- a/tools/loadgen/main.go
+++ b/tools/loadgen/main.go
@@ -56,7 +56,7 @@ type config struct {
 func main() {
 	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, nil)))
 	if len(os.Args) < 2 {
-		fmt.Fprintln(os.Stderr, "usage: loadgen <seed|run|teardown> [flags]")
+		fmt.Fprintln(os.Stderr, "usage: loadgen <seed|run|teardown|members-sustained|members-capacity|history-sustained|max-rps> [flags]")
 		os.Exit(2)
 	}
 	cfg, err := env.ParseAs[config]()
@@ -93,6 +93,8 @@ func dispatch(ctx context.Context, cfg *config) int {
 		return runMembersCapacity(ctx, cfg, os.Args[2:])
 	case "history-sustained":
 		return runHistorySustained(ctx, cfg, os.Args[2:])
+	case "max-rps":
+		return runMaxRPS(ctx, cfg, os.Args[2:])
 	default:
 		fmt.Fprintf(os.Stderr, "unknown subcommand: %s\n", os.Args[1])
 		return 2
diff --git a/tools/loadgen/maxrps.go b/tools/loadgen/maxrps.go
new file mode 100644
index 000000000..7bc4539eb
--- /dev/null
+++ b/tools/loadgen/maxrps.go
@@ -0,0 +1,143 @@
+package main
+
+import (
+	"context"
+	"flag"
+	"fmt"
+	"log/slog"
+	"os"
+	"time"
+)
+
+func defaultSteps(workload string) string {
+	if workload == "history" {
+		return "200,500,1000,2000,5000"
+	}
+	return "500,1000,2000,5000,10000"
+}
+
+func buildThresholds(p95, p99 time.Duration, errRate float64, pendingGrowth uint64, rateTol float64) rpsThresholds {
+	return rpsThresholds{P95: p95, P99: p99, ErrorRate: errRate, PendingGrowth: pendingGrowth, RateTolerance: rateTol}
+}
+
+// runMaxRPS parses flags, builds the workload adapter, runs the ramp and prints
+// the report. Returns the process exit code.
+func runMaxRPS(ctx context.Context, cfg *config, args []string) int {
+	fs := flag.NewFlagSet("max-rps", flag.ExitOnError)
+	workload := fs.String("workload", "messages", "messages|history")
+	preset := fs.String("preset", "", "preset name")
+	seed := fs.Int64("seed", 42, "RNG seed")
+	stepsFlag := fs.String("steps", "", "ascending RPS list, e.g. 500,1k,2k,5k,10k (default depends on workload)")
+	warmup := fs.Duration("warmup", 10*time.Second, "per-step warmup (samples discarded)")
+	hold := fs.Duration("hold", 30*time.Second, "per-step measurement window")
+	cooldown := fs.Duration("cooldown", 5*time.Second, "per-step settle gap")
+	sloP95 := fs.Duration("slo-p95", 100*time.Millisecond, "p95 latency SLO (all gated series)")
+	sloP99 := fs.Duration("slo-p99", 250*time.Millisecond, "p99 latency SLO (all gated series)")
+	sloErr := fs.Float64("slo-error-rate", 0.001, "max error rate (failed/attempted)")
+	sloPending := fs.Uint64("slo-pending-growth", 1000, "max per-durable pending growth (messages only)")
+	rateTol := fs.Float64("rate-tolerance", 0.05, "achieved-vs-target shortfall band for INCONCLUSIVE")
+	stopOnTrip := fs.Bool("stop-on-trip", true, "stop the ramp at the first TRIP")
+	inject := fs.String("inject", "frontdoor", "messages only: frontdoor|canonical")
+	// history-only tunables (ignored for messages):
+	mixFlag := fs.String("mix", "history:80,thread:20", "history only: endpoint mix")
+	beforeModeFlag := fs.String("before-mode", "open:70,scrollback:30", "history only: before-cursor mix")
+	scrollbackPages := fs.Int("scrollback-pages", 5, "history only: pages per scrollback chain")
+	pageLimit := fs.Int("page-limit", 20, "history only: page limit")
+	requestTimeout := fs.Duration("request-timeout", 5*time.Second, "history only: per-request timeout")
+	csvPath := fs.String("csv", "", "optional CSV output path")
+	_ = fs.Parse(args)
+
+	if *preset == "" {
+		fmt.Fprintln(os.Stderr, "--preset required")
+		return 2
+	}
+	stepsStr := *stepsFlag
+	if stepsStr == "" {
+		stepsStr = defaultSteps(*workload)
+	}
+	steps, err := parseRPSSteps(stepsStr)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "bad --steps: %v\n", err)
+		return 2
+	}
+	thresholds := buildThresholds(*sloP95, *sloP99, *sloErr, *sloPending, *rateTol)
+
+	var (
+		w        rpsWorkload
+		cleanup  func()
+		presetID string
+	)
+	switch *workload {
+	case "messages":
+		p, ok := BuiltinPreset(*preset)
+		if !ok {
+			fmt.Fprintf(os.Stderr, "unknown preset: %s\n", *preset)
+			return 2
+		}
+		injectMode, err := ParseInjectMode(*inject)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, err.Error())
+			return 2
+		}
+		mw, clean, err := newMessagesWorkload(ctx, cfg, &p, injectMode, *seed)
+		if err != nil {
+			slog.Error("init messages workload", "error", err)
+			return 1
+		}
+		w, cleanup, presetID = mw, clean, p.Name
+	case "history":
+		p, ok := BuiltinHistoryPreset(*preset)
+		if !ok {
+			fmt.Fprintf(os.Stderr, "unknown history preset: %s\n", *preset)
+			return 2
+		}
+		mix, err := ParseEndpointMix(*mixFlag)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, err.Error())
+			return 2
+		}
+		beforeMode, err := ParseBeforeMode(*beforeModeFlag)
+		if err != nil {
+			fmt.Fprintln(os.Stderr, err.Error())
+			return 2
+		}
+		if *scrollbackPages <= 0 {
+			fmt.Fprintln(os.Stderr, "--scrollback-pages must be > 0")
+			return 2
+		}
+		hw, clean, err := newHistoryWorkload(ctx, cfg, &p, *seed, historyWorkloadParams{
+			Mix: mix, BeforeMode: beforeMode, ScrollbackPages: *scrollbackPages,
+			PageLimit: *pageLimit, RequestTimeout: *requestTimeout,
+		})
+		if err != nil {
+			slog.Error("init history workload", "error", err)
+			return 1
+		}
+		w, cleanup, presetID = hw, clean, p.Name
+	default:
+		fmt.Fprintf(os.Stderr, "unknown workload: %s\n", *workload)
+		return 2
+	}
+	defer cleanup()
+
+	results := runRamp(ctx, w, &rampConfig{
+		Steps: steps, Warmup: *warmup, Hold: *hold, Cooldown: *cooldown,
+		Thresholds: thresholds, StopOnTrip: *stopOnTrip,
+	})
+
+	if err := renderRPSReport(os.Stdout, results, w.Label(), presetID); err != nil {
+		slog.Warn("render report", "error", err)
+	}
+	if *csvPath != "" {
+		f, err := os.Create(*csvPath)
+		if err != nil {
+			slog.Error("create csv", "error", err)
+		} else {
+			if err := writeRPSCSV(f, results); err != nil {
+				slog.Error("write csv", "error", err)
+			}
+			_ = f.Close()
+		}
+	}
+	return maxRPSExitCode(results)
+}
diff --git a/tools/loadgen/maxrps_history_test.go b/tools/loadgen/maxrps_history_test.go
index f3bc1697b..88c46df46 100644
--- a/tools/loadgen/maxrps_history_test.go
+++ b/tools/loadgen/maxrps_history_test.go
@@ -7,9 +7,8 @@ import (
 	"github.com/stretchr/testify/assert"
 )
 
-// Compile-time checks: historyWorkload satisfies rpsWorkload; constructor exists.
+// Compile-time check: historyWorkload satisfies rpsWorkload.
 var _ rpsWorkload = (*historyWorkload)(nil)
-var _ = newHistoryWorkload
 
 func TestBuildHistoryInputs(t *testing.T) {
 	c := NewHistoryCollector()
diff --git a/tools/loadgen/maxrps_messages_test.go b/tools/loadgen/maxrps_messages_test.go
index 882a938a4..af670cba5 100644
--- a/tools/loadgen/maxrps_messages_test.go
+++ b/tools/loadgen/maxrps_messages_test.go
@@ -7,9 +7,8 @@ import (
 	"github.com/stretchr/testify/assert"
 )
 
-// Compile-time checks: messagesWorkload satisfies rpsWorkload; constructor exists.
+// Compile-time check: messagesWorkload satisfies rpsWorkload.
 var _ rpsWorkload = (*messagesWorkload)(nil)
-var _ = newMessagesWorkload
 
 func TestDiffCounters(t *testing.T) {
 	start := msgCounters{published: 100, err: map[string]float64{"publish": 1, "saturated": 5}}
diff --git a/tools/loadgen/maxrps_test.go b/tools/loadgen/maxrps_test.go
new file mode 100644
index 000000000..b165d2069
--- /dev/null
+++ b/tools/loadgen/maxrps_test.go
@@ -0,0 +1,28 @@
+package main
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestDefaultSteps(t *testing.T) {
+	msgs, err := parseRPSSteps(defaultSteps("messages"))
+	require.NoError(t, err)
+	assert.Equal(t, []int{500, 1000, 2000, 5000, 10000}, msgs)
+
+	hist, err := parseRPSSteps(defaultSteps("history"))
+	require.NoError(t, err)
+	assert.Equal(t, []int{200, 500, 1000, 2000, 5000}, hist)
+}
+
+func TestBuildThresholds(t *testing.T) {
+	th := buildThresholds(100*time.Millisecond, 250*time.Millisecond, 0.001, 1000, 0.05)
+	assert.Equal(t, 100*time.Millisecond, th.P95)
+	assert.Equal(t, 250*time.Millisecond, th.P99)
+	assert.Equal(t, 0.001, th.ErrorRate)
+	assert.Equal(t, uint64(1000), th.PendingGrowth)
+	assert.Equal(t, 0.05, th.RateTolerance)
+}

From 867ff1ee41014621db8df929f4c2114d524d0c2a Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 20:17:28 +0000
Subject: [PATCH 14/16] test(loadgen): integration coverage for max-rps
 messages ramp

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/integration_test.go | 70 +++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/tools/loadgen/integration_test.go b/tools/loadgen/integration_test.go
index 2fdf23343..21f1404a1 100644
--- a/tools/loadgen/integration_test.go
+++ b/tools/loadgen/integration_test.go
@@ -138,4 +138,74 @@ func TestLoadgenSmallPreset_EndToEnd(t *testing.T) {
 	require.Equal(t, fixtures.Rooms[0].ID, room.ID)
 }
 
+func TestMaxRPS_Messages_TwoStepRamp(t *testing.T) {
+	ctx := context.Background()
+	siteID := "site-maxrps"
+
+	nc, err := nats.Connect(testutil.NATS(t))
+	require.NoError(t, err)
+	defer nc.Drain()
+	js, err := jetstream.New(nc)
+	require.NoError(t, err)
+
+	canonical := stream.MessagesCanonical(siteID)
+	_, err = js.CreateOrUpdateStream(ctx, jetstream.StreamConfig{
+		Name:     canonical.Name,
+		Subjects: canonical.Subjects,
+	})
+	require.NoError(t, err)
+
+	// Ack-only durables so the canonical stream drains to zero (pending stays low).
+	for _, durable := range []string{"message-worker", "broadcast-worker"} {
+		cons, err := js.CreateOrUpdateConsumer(ctx, canonical.Name, jetstream.ConsumerConfig{
+			Durable:   durable,
+			AckPolicy: jetstream.AckExplicitPolicy,
+		})
+		require.NoError(t, err)
+		cc, err := cons.Consume(func(msg jetstream.Msg) { _ = msg.Ack() })
+		require.NoError(t, err)
+		defer cc.Stop()
+	}
+
+	// Fake gatekeeper: frontdoor send -> canonical event.
+	gkSub, err := nc.Subscribe(subject.MsgSendWildcard(siteID), func(m *nats.Msg) {
+		var req model.SendMessageRequest
+		if err := json.Unmarshal(m.Data, &req); err != nil {
+			return
+		}
+		evt := model.MessageEvent{
+			Message:   model.Message{ID: req.ID, Content: req.Content, CreatedAt: time.Now().UTC()},
+			SiteID:    siteID,
+			Timestamp: time.Now().UnixMilli(),
+		}
+		data, _ := json.Marshal(evt)
+		_, _ = js.Publish(ctx, subject.MsgCanonicalCreated(siteID), data)
+	})
+	require.NoError(t, err)
+	defer gkSub.Unsubscribe()
+
+	cfg := &config{NatsURL: testutil.NATS(t), SiteID: siteID, MetricsAddr: ":0", MaxInFlight: 100}
+	preset, _ := BuiltinPreset("small")
+
+	w, cleanup, err := newMessagesWorkload(ctx, cfg, &preset, InjectFrontdoor, 42)
+	require.NoError(t, err)
+	defer cleanup()
+
+	results := runRamp(ctx, w, &rampConfig{
+		Steps: []int{50, 100}, Warmup: time.Second, Hold: 2 * time.Second, Cooldown: 0,
+		Thresholds: rpsThresholds{
+			P95: time.Second, P99: 2 * time.Second, ErrorRate: 0.9,
+			PendingGrowth: 1_000_000, RateTolerance: 0.9,
+		},
+		StopOnTrip: true,
+	})
+
+	require.Len(t, results, 2)
+	for _, r := range results {
+		require.NotEqual(t, verdictTrip, r.Kind, "reasons=%v", r.Reasons)
+		require.Greater(t, r.AttemptedOps, 0)
+		require.Greater(t, r.AchievedRPS, 0.0)
+	}
+}
+
 func TestMain(m *testing.M) { testutil.RunTests(m) }

From 830cacf210dacdcd64946c36145a562228ed9ba1 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 20:19:25 +0000
Subject: [PATCH 15/16] docs(loadgen): document max-rps subcommand and add
 run-max-rps target

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/README.md       | 65 +++++++++++++++++++++++++++++++++++
 tools/loadgen/deploy/Makefile | 11 +++++-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/tools/loadgen/README.md b/tools/loadgen/README.md
index e471d3e17..885ee131d 100644
--- a/tools/loadgen/README.md
+++ b/tools/loadgen/README.md
@@ -212,3 +212,68 @@ most reads.
 - Errors broken out by class (`timeout`, `reply`, `bad`); the
   `no-thread-parents` counter is informational (thread requests that
   landed on a room with no seeded parents and fell back to history).
+
+## max-rps — auto-find Max RPS under SLO
+
+Automatically finds the maximum RPS each workload can sustain while all
+SLO signals hold. The subcommand ramps the target rate through an ordered
+list of steps, holds at each step for a measurement window, evaluates SLO
+signals, and reports the largest step at which every signal passed.
+
+```bash
+loadgen max-rps --workload=messages|history --preset=<name> [flags]
+```
+
+### Quick start
+
+```bash
+# messages: ramp 500..10k rps, stop at first SLO breach
+loadgen max-rps --workload=messages --preset=medium --steps=500,1k,2k,5k,10k
+
+# history: per-endpoint SLO, custom p95
+loadgen max-rps --workload=history --preset=history-medium --steps=200,500,1k,2k --slo-p95=80ms
+```
+
+Via the deploy Makefile:
+
+```bash
+make -C tools/loadgen/deploy run-max-rps PRESET=medium
+make -C tools/loadgen/deploy run-max-rps WORKLOAD=history PRESET=history-medium STEPS=200,500,1k,2k
+```
+
+### Flags
+
+| Flag | Default | Notes |
+|------|---------|-------|
+| `--workload` | `messages` | `messages` or `history` |
+| `--preset` | (required) | an existing preset for the chosen workload |
+| `--steps` | messages `500,1k,2k,5k,10k` / history `200,500,1k,2k,5k` | explicit ordered RPS list; `k` suffix = ×1000 |
+| `--warmup` | `10s` | per-step warmup (samples discarded) |
+| `--hold` | `30s` | per-step measurement window |
+| `--cooldown` | `5s` | per-step settle gap before next step |
+| `--slo-p95` | `100ms` | applied to **every** gated latency series |
+| `--slo-p99` | `250ms` | applied to **every** gated latency series |
+| `--slo-error-rate` | `0.001` | `failed / attempted` (0.1%) |
+| `--slo-pending-growth` | `1000` | **messages only**: per-durable end−start `NumPending` delta |
+| `--rate-tolerance` | `0.05` | achieved-vs-target shortfall band for the INCONCLUSIVE guard |
+| `--stop-on-trip` | `true` | stop the ramp at the first TRIP (does **not** stop on INCONCLUSIVE) |
+| `--seed` | `42` | RNG seed (parity with existing subcommands) |
+| `--csv` | `""` | optional CSV output path |
+
+### Reading the output
+
+At the end of the run the tool prints a per-step table and a final
+verdict line:
+
+```
+ANSWER: max RPS = 2000
+```
+
+This is the largest step at which **all** SLO signals passed. If no step
+passed, the output is `ANSWER: max RPS = 0 (none passed)`.
+
+**INCONCLUSIVE rows** appear when the achieved throughput fell more than
+`--rate-tolerance` below the target (the pipeline was already saturated
+before the SLO gate ran). An INCONCLUSIVE step is treated as a soft TRIP:
+`--stop-on-trip` halts at the first INCONCLUSIVE just as it would at a
+hard TRIP, and it does not count as a passing step.
diff --git a/tools/loadgen/deploy/Makefile b/tools/loadgen/deploy/Makefile
index 1a1ec14a2..6d5ebd88e 100644
--- a/tools/loadgen/deploy/Makefile
+++ b/tools/loadgen/deploy/Makefile
@@ -3,13 +3,15 @@ DEPS_COMPOSE     := $(ROOT)/docker-local/compose.deps.yaml
 SERVICES_COMPOSE := $(ROOT)/docker-local/compose.services.yaml
 OVERLAY          := docker-compose.yml
 COMPOSE          := docker compose -f $(OVERLAY)
+WORKLOAD         ?= messages
+STEPS            ?=
 
 # Encryption is the default for broadcast-worker; the broadcast-worker compose
 # reads ENCRYPTION_ENABLED via env interpolation (defaulting to true). Flip to
 # `ENCRYPTION_ENABLED=false make up` for a plaintext comparison run.
 export ENCRYPTION_ENABLED ?= true
 
-.PHONY: up stack-up overlay-up seed teardown run run-dashboards down logs seed-members teardown-members reset-members run-sustained run-capacity
+.PHONY: up stack-up overlay-up seed teardown run run-dashboards run-max-rps down logs seed-members teardown-members reset-members run-sustained run-capacity
 
 up: stack-up overlay-up
 
@@ -73,6 +75,13 @@ run-dashboards:
 	$(COMPOSE) --profile dashboards up -d
 	$(MAKE) run PRESET=$(PRESET) RATE=$(RATE) DURATION=$(DURATION)
 
+run-max-rps: ## Ramp RPS to find the max under SLO (WORKLOAD=messages|history PRESET=.. STEPS=..)
+	@test -n "$(PRESET)" || (echo "PRESET=<name> required" && exit 1)
+	$(COMPOSE) exec -T loadgen /loadgen max-rps \
+	    --workload=$(WORKLOAD) \
+	    --preset=$(PRESET) \
+	    $(if $(STEPS),--steps=$(STEPS),)
+
 down:
 	$(COMPOSE) --profile dashboards down -v
 	docker compose -f $(SERVICES_COMPOSE) down

From bd2a863e4b17172dd30fd50f5e505fc90afe7e6e Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Thu, 28 May 2026 20:35:36 +0000
Subject: [PATCH 16/16] docs(loadgen): fix max-rps README + validate history
 flags

Correct the README "Reading the output" section (INCONCLUSIVE does not
stop the ramp and is not a pass; fix the no-pass ANSWER string), validate
--page-limit/--request-timeout > 0, and document the deliberate
straggler-exclusion in the messages hold-window counter snapshot.

https://claude.ai/code/session_01EdwhSB725x7E4SMLPg4Dha
---
 tools/loadgen/README.md          | 17 ++++++++++-------
 tools/loadgen/maxrps.go          |  8 ++++++++
 tools/loadgen/maxrps_messages.go |  4 ++++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/tools/loadgen/README.md b/tools/loadgen/README.md
index 885ee131d..cbab24821 100644
--- a/tools/loadgen/README.md
+++ b/tools/loadgen/README.md
@@ -266,14 +266,17 @@ At the end of the run the tool prints a per-step table and a final
 verdict line:
 
 ```
-ANSWER: max RPS = 2000
+ANSWER: max RPS = 2000 (workload=messages, preset=medium)
+        Next limit: E2 p95=143ms > 100ms
 ```
 
-This is the largest step at which **all** SLO signals passed. If no step
-passed, the output is `ANSWER: max RPS = 0 (none passed)`.
+This is the largest step at which **all** SLO signals passed; the
+`Next limit:` line names why the first failing step tripped. If no step
+passed, the output is `ANSWER: no step passed (workload=…, preset=…)`.
 
 **INCONCLUSIVE rows** appear when the achieved throughput fell more than
-`--rate-tolerance` below the target (the pipeline was already saturated
-before the SLO gate ran). An INCONCLUSIVE step is treated as a soft TRIP:
-`--stop-on-trip` halts at the first INCONCLUSIVE just as it would at a
-hard TRIP, and it does not count as a passing step.
+`--rate-tolerance` below the target while the SLO signals still looked
+healthy — i.e. the load generator itself, not the service under test, was
+the limiting factor, so the step's result can't be trusted. An
+INCONCLUSIVE step does **not** count as a pass and does **not** stop the
+ramp, even with `--stop-on-trip`; only a hard TRIP stops the ramp.
diff --git a/tools/loadgen/maxrps.go b/tools/loadgen/maxrps.go
index 7bc4539eb..f60ff7a3f 100644
--- a/tools/loadgen/maxrps.go
+++ b/tools/loadgen/maxrps.go
@@ -105,6 +105,14 @@ func runMaxRPS(ctx context.Context, cfg *config, args []string) int {
 			fmt.Fprintln(os.Stderr, "--scrollback-pages must be > 0")
 			return 2
 		}
+		if *pageLimit <= 0 {
+			fmt.Fprintln(os.Stderr, "--page-limit must be > 0")
+			return 2
+		}
+		if *requestTimeout <= 0 {
+			fmt.Fprintln(os.Stderr, "--request-timeout must be > 0")
+			return 2
+		}
 		hw, clean, err := newHistoryWorkload(ctx, cfg, &p, *seed, historyWorkloadParams{
 			Mix: mix, BeforeMode: beforeMode, ScrollbackPages: *scrollbackPages,
 			PageLimit: *pageLimit, RequestTimeout: *requestTimeout,
diff --git a/tools/loadgen/maxrps_messages.go b/tools/loadgen/maxrps_messages.go
index b5f8c24e0..71e65bc1d 100644
--- a/tools/loadgen/maxrps_messages.go
+++ b/tools/loadgen/maxrps_messages.go
@@ -236,6 +236,10 @@ func (w *messagesWorkload) RunStep(ctx context.Context, targetRPS int, warmup, h
 
 	holdErr := waitOrCancel(ctx, hold)
 
+	// Counters are snapshotted at hold-end, before the drain: gatekeeper/bad_reply
+	// errors whose reply lands during the drain are deliberately excluded (see the
+	// straggler-exclusion rationale on buildMessagesInputs). The drain only lets
+	// trailing E1/E2 latency samples settle for the percentile signals.
 	endCounts := w.snapshotCounters()
 	endPending, perr2 := w.snapshotPending(ctx)
 	cancel()