test: share testcontainers per service via pkg/testutil#208
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (30)
📝 WalkthroughWalkthroughThis PR establishes a centralized integration test infrastructure in ChangesShared Integration Test Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@search-service/setup_shared_test.go`:
- Around line 224-226: The FLUSHDB error is only logged so test suite can
proceed with leftover keys; change the cleanup to fail the test on error by
replacing the t.Logf call with a test-failing assertion (e.g., call
t.Fatalf("flush valkey at %s: %v", addr, err) or use require.NoError(t, err)) so
that the rc.FlushDB(ctx).Err() failure in the setup (the statement calling
rc.FlushDB) causes the test to stop and surface the isolation breach
immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7bc2262b-cf97-40d2-b014-49aa3bae44da
📒 Files selected for processing (2)
search-service/integration_test.gosearch-service/setup_shared_test.go
…rrors A silent log on FLUSHDB failure could let state leak into the next sibling test, surfacing far from the real root cause. t.Errorf marks the offending test failed while still allowing the rest of its cleanup chain (valkeyutil.Disconnect, etc.) to run. Per CodeRabbit review on PR #208.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@search-service/integration_ccs_test.go`:
- Around line 132-144: The test fixture that creates the router via
natsrouter.New("search-service-test") registers middleware and handlers but
never calls router.Shutdown, causing NATS subscriptions to leak; update the
fixture to register a t.Cleanup (or defer) that calls router.Shutdown() (or
router.Shutdown(ctx) if it requires a context) after handler.Register and Flush,
ensuring the router created by natsrouter.New is properly shut down when the
test ends.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 81248fd1-e0d2-4a24-9b4c-b28ac23851a1
📒 Files selected for processing (3)
search-service/integration_ccs_test.gosearch-service/integration_test.gosearch-service/setup_shared_test.go
- Add t.Cleanup(router.Shutdown) to setupCCSFixture to match every other fixture in the package (per CodeRabbit review on PR #208). - Trim narration / what-comments across the integration_*_test.go files and setup_shared_test.go. Preserve genuine WHY comments: X-Elastic-Product product-check, mallory access boundary, FNV hash rationale, FLUSHDB isolation note, Flush before subscribe-race.
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (8)
pkg/testutil/minio.go (1)
72-74: ⚡ Quick winReplace stderr printf logging with structured slog.
Line 73 should use
log/slogJSON logging with key-value fields rather thanfmt.Fprintf.As per coding guidelines: "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/minio.go` around lines 72 - 74, Replace the stderr printf in the minio container termination block with structured slog JSON logging: when minioContainer.Terminate(ctx) returns an error, call the slog logger (e.g., slog.Error or slog.Log with Level "error") and include key-value fields such as "msg" ("terminate shared minio failed"), "err" (the error), and any context like "component":"minio" or "action":"terminate" instead of using fmt.Fprintf(os.Stderr,...); update the import to use log/slog if not already present.pkg/testutil/valkey.go (1)
44-51: ⚡ Quick winAlign cleanup paths with logging/error-handling guidelines.
Lines 44/50 silently drop cleanup errors, and Line 85 logs with
fmt.Fprintf. Please switch these to structuredsloglogging with key-value fields.As per coding guidelines: "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers" and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 84-86
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/valkey.go` around lines 44 - 51, The cleanup paths in pkg/testutil/valkey.go currently ignore errors from container.Terminate(ctx) and use fmt.Fprintf for logging; change these to capture and handle termination errors instead of discarding them (do not use blank identifier), assign or log the terminate error if termination fails, and replace any fmt.Fprintf calls with structured slog calls (slog.Error/slog.Info) using key-value fields; update the branches around container.Terminate(ctx), valkeyInitErr assignments, and the fmt.Fprintf usage to call slog.* with context like ("msg", "failed to terminate container", "err", err) or include both errors when setting valkeyInitErr so cleanup failures are visible.pkg/testutil/mongo.go (2)
61-71: ⚡ Quick winMake
TerminateMongofully idempotent by clearing shared references.After teardown,
mongoClientandmongoContainerstay non-nil, so repeated calls can re-run disconnect/terminate on stale state. Reset both references after attempting teardown.Proposed diff
func TerminateMongo() { ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) defer cancel() if mongoClient != nil { if err := mongoClient.Disconnect(ctx); err != nil { fmt.Fprintf(os.Stderr, "disconnect shared mongo client: %v\n", err) } + mongoClient = nil } if mongoContainer != nil { if err := mongoContainer.Terminate(ctx); err != nil { fmt.Fprintf(os.Stderr, "terminate shared mongo: %v\n", err) } + mongoContainer = nil } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/mongo.go` around lines 61 - 71, The TerminateMongo function currently disconnects the shared mongoClient and terminates mongoContainer but leaves the package-level variables set; update TerminateMongo so that after attempting mongoClient.Disconnect(ctx) and mongoContainer.Terminate(ctx) it sets mongoClient = nil and mongoContainer = nil (clearing the shared references) to make teardown idempotent; ensure you still log errors from Disconnect and Terminate (using the existing fmt.Fprintf calls) before nil-ing the variables.
39-46: ⚡ Quick winFix Mongo cleanup logging and silent error-discard paths.
Lines 39/45 silently discard terminate errors, and Lines 63/68 use
fmt.Fprintflogging. Please move these to structuredsloglogging and keep cleanup failures explicit.As per coding guidelines: "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers" and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 62-69
pkg/testutil/cassandra.go (1)
48-55: ⚡ Quick winUse structured logging and explicit handling in Cassandra cleanup paths.
Lines 48/54/69 discard cleanup errors silently, and Line 93 uses
fmt.Fprintf. Please switch to structuredsloglogs and avoid silent cleanup-error drops.As per coding guidelines: "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers" and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 69-70, 92-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/cassandra.go` around lines 48 - 55, The cleanup paths in pkg/testutil/cassandra.go currently discard errors (e.g., "_ = container.Terminate(ctx)") and use fmt.Fprintf for output; update the error handling in the Cassandra initialization/cleanup blocks (around the container.Terminate calls that follow get cassandra host/port errors and the final cleanup) to capture and handle termination errors instead of ignoring them, set/return cassInitErr as appropriate, and emit structured JSON logs using log/slog (not fmt) with clear context (e.g., include the container ID, operation like "terminate container", and both the original and termination errors). Replace any fmt.Fprintf usage with slog calls, and if a termination error is intentionally ignored, add a brief comment explaining why; reference cassInitErr, container.MappedPort, and the container.Terminate calls to locate the changes.pkg/testutil/elasticsearch.go (1)
54-61: ⚡ Quick winUse structured slog logging and avoid silent cleanup-error drops.
Lines 54/60 silently discard
Terminateerrors, and Line 93 uses unstructuredfmt.Fprintflogging. Please log cleanup failures with structuredslogfields instead of stderr formatting.As per coding guidelines: "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers" and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 92-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/elasticsearch.go` around lines 54 - 61, The code silently ignores errors from container.Terminate and uses unstructured fmt.Fprintf logging; update the cleanup and logging to use log/slog with structured fields: capture and check the error returned by container.Terminate where currently discarded and log it via slog.Error (include context fields like "action":"terminate", "container":container.Name or container.ID and the error), and replace the fmt.Fprintf usages with equivalent slog calls (slog.Error or slog.Info) including structured keys; if any Terminate error is intentionally ignored, add an explicit comment explaining why. Ensure esInitErr assignment logic (esInitErr = fmt.Errorf(...)) remains but uses slog for diagnostic output instead of fmt.pkg/testutil/nats.go (1)
44-45: ⚡ Quick winStandardize NATS teardown/error paths to structured logging.
Line 44 drops a cleanup error silently, and Line 77 uses
fmt.Fprintffor logging. UseslogJSON logs with fields for both init-cleanup and terminate failures.As per coding guidelines: "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers" and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 76-78
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/nats.go` around lines 44 - 45, The teardown and init error handling in pkg/testutil/nats.go should use log/slog structured JSON logs and must not silently discard errors: replace the ignored assignment "_ = c.Terminate(ctx)" with proper error capture (e.g., errTerm := c.Terminate(ctx)) and when terminating on failure set or augment natsInitErr and log both init and terminate failures via slog (use slog.Error with key/value fields referencing the operation, error, and any context such as nats container id); also replace the fmt.Fprintf usage around lines 76-78 with slog.Error/slog.Info calls that include fields for the original init error and the terminate error instead of printing plain text. Ensure you update references to natsInitErr and c.Terminate to propagate or wrap errors consistently rather than discarding them.pkg/testutil/init.go (1)
17-18: ⚡ Quick winHandle the discarded
os.Setenverror explicitly.Line 18 ignores the error silently. Please either handle it or add an inline note explaining why discarding it is intentional here.
As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/init.go` around lines 17 - 18, The call to os.Setenv("TESTCONTAINERS_RYUK_DISABLED", "true") currently discards its error; update the init logic to handle the returned error from os.Setenv rather than ignoring it — either check the error and call log.Fatalf or t.Fatalf (or return the error) so failures are surfaced, or if you truly intend to ignore it add an explicit comment explaining why and suppress only after documenting; reference the env var name TESTCONTAINERS_RYUK_DISABLED and the os.Setenv call when making the change so the handling is applied in the same initialization block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@search-service/integration_apps_test.go`:
- Line 40: Teardown calls currently discard errors (serverNATS.Drain() and
router.Shutdown(...)); change them to capture the returned error and report it
to the test instead of ignoring it — e.g., replace the discarded calls in the
t.Cleanup and shutdown block with error checks like "if err :=
serverNATS.Drain(); err != nil { t.Errorf("serverNATS.Drain failed: %v", err) }"
and similarly check the error returned by router.Shutdown(ctx) and report it via
t.Errorf or t.Fatalf so teardown failures are visible.
In `@search-service/integration_ccs_test.go`:
- Line 138: The cleanup currently discards router.Shutdown errors; update the
t.Cleanup closure to capture the returned error from
router.Shutdown(context.Background()) and surface it (e.g., call t.Fatalf or
t.Errorf with the error) instead of using `_ =`, so failing shutdowns fail the
test and prevent leaked subscriptions. Refer to the t.Cleanup closure and the
router.Shutdown call to locate and modify the code.
In `@search-service/integration_messages_test.go`:
- Line 38: The test fixture currently discards errors from io.Copy (reading
r.Body) and other calls (w.Write, Drain, Shutdown); update the test in
integration_messages_test.go to either check and assert these errors (e.g.,
require.NoError/if err != nil t.Fatalf) or explicitly comment why an error may
be safely ignored. Locate the uses of io.Copy(r.Body, io.Discard) and subsequent
w.Write, Drain, and Shutdown calls and replace the blank-error assignments with
proper error handling/assertions referencing those symbols so failures surface
during test runs.
In `@search-service/integration_rooms_test.go`:
- Line 47: Replace the silent error discards with proper error handling: in the
t.Cleanup calls that currently do `_ = serverNC.Drain()` (and any other `_ =
...` teardown calls) check the returned error and call t.Fatalf/t.Errorf (or
require.NoError) so teardown failures are reported instead of ignored, and in
the index setup where json.Marshal and ReadAll (or similar marshal/read calls)
currently drop errors, capture the error and fail the test or return it so the
test harness sees the failure; if any discard is intentional, add an explicit
comment explaining why and keep the explicit check/comment next to
serverNC.Drain(), json.Marshal, ReadAll (and any os.Remove/cleanup calls) to
satisfy the "never ignore errors silently" guideline.
In `@search-service/integration_users_test.go`:
- Line 41: Tests currently discard errors returned from cleanup/write calls
(e.g., the call to serverNC.Drain()), which hides fixture failures; update each
site (the serverNC.Drain() call and the other two cleanup/write calls
referenced) to capture the returned error and fail the test if non-nil (for
example: err := serverNC.Drain(); if err != nil { t.Fatalf("serverNC.Drain
failed: %v", err) }) so errors are surfaced with clear context.
In `@search-service/setup_shared_test.go`:
- Line 47: In seedDoc, don’t discard the error return from io.ReadAll; update
the call inside function seedDoc to capture both variables (e.g., body, err :=
io.ReadAll(resp.Body)), check err and fail the test or return a descriptive
error (using t.Fatalf/t.Errorf or returning the error) so response-body read
failures are surfaced in test output, and include the response status and any
relevant context in the error message for easier debugging.
- Around line 64-67: TestMain is currently discarding errors returned by the
Ensure* bootstrap functions inside the goroutine (the anonymous func wrapping f
func() error) so failures are hidden; change the bootstrapping to capture and
fail-fast on any non-nil error returned by the Ensure* calls (e.g., collect
errors via a buffered errCh or a synced slice while still calling wg.Done, then
after wg.Wait check for any error and call processLogger.Fatal/os.Exit(1) or
t.Fatalf) before calling m.Run; locate the anonymous goroutine and the TestMain
orchestration (wg, m.Run, the func(f func() error) { ... }(fn)) and modify it to
propagate errors instead of discarding them.
---
Nitpick comments:
In `@pkg/testutil/cassandra.go`:
- Around line 48-55: The cleanup paths in pkg/testutil/cassandra.go currently
discard errors (e.g., "_ = container.Terminate(ctx)") and use fmt.Fprintf for
output; update the error handling in the Cassandra initialization/cleanup blocks
(around the container.Terminate calls that follow get cassandra host/port errors
and the final cleanup) to capture and handle termination errors instead of
ignoring them, set/return cassInitErr as appropriate, and emit structured JSON
logs using log/slog (not fmt) with clear context (e.g., include the container
ID, operation like "terminate container", and both the original and termination
errors). Replace any fmt.Fprintf usage with slog calls, and if a termination
error is intentionally ignored, add a brief comment explaining why; reference
cassInitErr, container.MappedPort, and the container.Terminate calls to locate
the changes.
In `@pkg/testutil/elasticsearch.go`:
- Around line 54-61: The code silently ignores errors from container.Terminate
and uses unstructured fmt.Fprintf logging; update the cleanup and logging to use
log/slog with structured fields: capture and check the error returned by
container.Terminate where currently discarded and log it via slog.Error (include
context fields like "action":"terminate", "container":container.Name or
container.ID and the error), and replace the fmt.Fprintf usages with equivalent
slog calls (slog.Error or slog.Info) including structured keys; if any Terminate
error is intentionally ignored, add an explicit comment explaining why. Ensure
esInitErr assignment logic (esInitErr = fmt.Errorf(...)) remains but uses slog
for diagnostic output instead of fmt.
In `@pkg/testutil/init.go`:
- Around line 17-18: The call to os.Setenv("TESTCONTAINERS_RYUK_DISABLED",
"true") currently discards its error; update the init logic to handle the
returned error from os.Setenv rather than ignoring it — either check the error
and call log.Fatalf or t.Fatalf (or return the error) so failures are surfaced,
or if you truly intend to ignore it add an explicit comment explaining why and
suppress only after documenting; reference the env var name
TESTCONTAINERS_RYUK_DISABLED and the os.Setenv call when making the change so
the handling is applied in the same initialization block.
In `@pkg/testutil/minio.go`:
- Around line 72-74: Replace the stderr printf in the minio container
termination block with structured slog JSON logging: when
minioContainer.Terminate(ctx) returns an error, call the slog logger (e.g.,
slog.Error or slog.Log with Level "error") and include key-value fields such as
"msg" ("terminate shared minio failed"), "err" (the error), and any context like
"component":"minio" or "action":"terminate" instead of using
fmt.Fprintf(os.Stderr,...); update the import to use log/slog if not already
present.
In `@pkg/testutil/mongo.go`:
- Around line 61-71: The TerminateMongo function currently disconnects the
shared mongoClient and terminates mongoContainer but leaves the package-level
variables set; update TerminateMongo so that after attempting
mongoClient.Disconnect(ctx) and mongoContainer.Terminate(ctx) it sets
mongoClient = nil and mongoContainer = nil (clearing the shared references) to
make teardown idempotent; ensure you still log errors from Disconnect and
Terminate (using the existing fmt.Fprintf calls) before nil-ing the variables.
In `@pkg/testutil/nats.go`:
- Around line 44-45: The teardown and init error handling in
pkg/testutil/nats.go should use log/slog structured JSON logs and must not
silently discard errors: replace the ignored assignment "_ = c.Terminate(ctx)"
with proper error capture (e.g., errTerm := c.Terminate(ctx)) and when
terminating on failure set or augment natsInitErr and log both init and
terminate failures via slog (use slog.Error with key/value fields referencing
the operation, error, and any context such as nats container id); also replace
the fmt.Fprintf usage around lines 76-78 with slog.Error/slog.Info calls that
include fields for the original init error and the terminate error instead of
printing plain text. Ensure you update references to natsInitErr and c.Terminate
to propagate or wrap errors consistently rather than discarding them.
In `@pkg/testutil/valkey.go`:
- Around line 44-51: The cleanup paths in pkg/testutil/valkey.go currently
ignore errors from container.Terminate(ctx) and use fmt.Fprintf for logging;
change these to capture and handle termination errors instead of discarding them
(do not use blank identifier), assign or log the terminate error if termination
fails, and replace any fmt.Fprintf calls with structured slog calls
(slog.Error/slog.Info) using key-value fields; update the branches around
container.Terminate(ctx), valkeyInitErr assignments, and the fmt.Fprintf usage
to call slog.* with context like ("msg", "failed to terminate container", "err",
err) or include both errors when setting valkeyInitErr so cleanup failures are
visible.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e8a5a40-0a62-4f6d-80d1-c9c4b049b15b
📒 Files selected for processing (16)
.github/workflows/ci.ymlpkg/testutil/cassandra.gopkg/testutil/elasticsearch.gopkg/testutil/init.gopkg/testutil/minio.gopkg/testutil/mongo.gopkg/testutil/nats.gopkg/testutil/terminate.gopkg/testutil/valkey.gosearch-service/integration_apps_test.gosearch-service/integration_ccs_test.gosearch-service/integration_messages_test.gosearch-service/integration_rooms_test.gosearch-service/integration_test.gosearch-service/integration_users_test.gosearch-service/setup_shared_test.go
💤 Files with no reviewable changes (1)
- search-service/integration_test.go
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/ci.yml
…gration tests
The integration_test.go fixtures used to start fresh containers per test —
~14 fixture invocations across the file, with ES startup (~30-60s) being
the worst offender. The four search.rooms tests each waited on a brand-new
ES container.
This refactor moves shared container startup into setup_shared_test.go,
using the same sync.Once pattern already used by pkg/testutil/mongo.go.
Each fixture isolates per-test state instead of using a fresh container:
- Elasticsearch: unique index name per test via t.Name() hash; the
index is DELETEd on cleanup.
- Valkey: keyspace is FLUSHDB'd on cleanup so the next test starts
clean. Uses a raw go-redis client to avoid exposing FLUSHDB on the
production valkeyutil.Client interface.
- NATS: each test creates its own *nats.Conn pair; subscriptions are
removed via router.Shutdown + nc.Drain before the next test runs.
CCS tests are the one exception: they need two networked ES nodes with
shared docker-network aliases, which doesn't fit a process-shared pool.
They keep their per-test ES pair but now piggyback on the shared Valkey
and NATS.
Expected impact on a clean run: ES container starts drop from ~6 (4 Rooms +
2 CCS) to 3 (1 shared + 2 CCS); NATS starts drop from ~11 to 1; Valkey
starts drop from ~5 to 1.
…rrors A silent log on FLUSHDB failure could let state leak into the next sibling test, surfacing far from the real root cause. t.Errorf marks the offending test failed while still allowing the rest of its cleanup chain (valkeyutil.Disconnect, etc.) to run. Per CodeRabbit review on PR #208.
CCS is the one fixture in the file that owns special infrastructure — a pair of ES nodes on a shared docker network with transport-port aliases — so it doesn't fit the process-shared container pattern the other fixtures now use. Moving it (plus its CCS-only helpers: putClusterSetting, waitForRemoteConnected, install/build/messageTestTemplate, userRoomTestTemplate) into integration_ccs_test.go leaves integration_test.go homogeneous: all remaining tests follow the same "NATS request → assert response" pattern. testHTTPClient and seedDoc stay in integration_test.go because they're shared with the non-CCS path (uniqueESIndex cleanup, putTestSpotlightIndex, the Rooms tests' seeds).
One file per search endpoint, matching the CCS split pattern: integration_apps_test.go — search.apps (Mongo + NATS) integration_users_test.go — search.users (NATS + httptest stub) integration_rooms_test.go — search.rooms (shared ES + Valkey + NATS) integration_messages_test.go — search.messages v2 (ES httptest stub + NATS) testUserRoomIndex, testHTTPClient, and seedDoc move into setup_shared_test.go alongside the shared-container helpers — they're test infrastructure used by multiple endpoints, so colocating them keeps the per-endpoint files focused. integration_test.go is removed; the package no longer has a catch-all integration file. CI auto-discovery and path filtering both still match because every new file lives under search-service/ and carries the //go:build integration tag.
- Add t.Cleanup(router.Shutdown) to setupCCSFixture to match every other fixture in the package (per CodeRabbit review on PR #208). - Trim narration / what-comments across the integration_*_test.go files and setup_shared_test.go. Preserve genuine WHY comments: X-Elastic-Product product-check, mallory access boundary, FNV hash rationale, FLUSHDB isolation note, Flush before subscribe-race.
…S heap - TestMain pre-warms shared ES, NATS, and Valkey concurrently via goroutines. First-test wall-clock for cold start drops to max(ES, NATS, Valkey) ≈ ES alone (~30-60s) instead of the sum. Refactored each sharedXxx(t) into a no-t ensureSharedXxx() so TestMain can drive them without a *testing.T. - Drop ES heap from -Xms512m -Xmx512m to -Xms256m -Xmx256m for both the shared single-node ES and the two CCS ES nodes. The test seed sizes (<10 docs per test) fit comfortably in 256m; saves ~5-10s startup per container and ~250MB RAM per node.
… in CI CI sets TESTCONTAINERS_RYUK_DISABLED=true on the integration job because Ryuk doesn't run reliably on this runner. Without Ryuk the shared containers (started via sync.Once with no t.Cleanup) would leak per test job. Wrap m.Run() so TestMain calls terminateShared() on clean exits and explicitly Terminates each shared container (ES, Valkey, NATS). Locally Ryuk stays enabled as a safety net for SIGKILL / Ctrl+C where m.Run never returns.
pkg/testutil/mongo.go now exports TerminateMongo, which disconnects the shared client and stops the shared container. search-service's terminateShared() calls it so the Mongo container is reaped on clean exits — same need as ES/Valkey/NATS now that CI runs with TESTCONTAINERS_RYUK_DISABLED=true. Other services that use testutil.MongoDB can opt in by calling TerminateMongo from their own TestMain when they need explicit cleanup.
pkg/testutil now mirrors the Mongo/Cassandra/MinIO pattern for ES, NATS, and Valkey: each exposes Xxx(t) for tests, EnsureXxx() for TestMain pre-warming, and TerminateXxx() for explicit cleanup. testutil.TerminateAll fans out to every Terminate (each is idempotent if the container was never started) so any service's TestMain becomes a one-liner. NATS starts with --jetstream so consumers that need streams just work against the shared container; core pub/sub consumers pay nothing extra. search-service drops its local sync.Once helpers and uses the testutil helpers throughout. setup_shared_test.go shrinks to just the test-wide testHTTPClient / seedDoc / uniqueESIndex / freshValkeyClient helpers that are specific to the search-* tests. CCS keeps its own networked ES pair (still local) but uses shared NATS/Valkey via testutil. Next: migrate other services that use shared containers (search-sync-worker, room-service, room-worker, etc.) to the testutil helpers + add TestMain calls.
Ryuk fails to start on our CI runner (can't pull / run the sidecar image), so set TESTCONTAINERS_RYUK_DISABLED=true once in pkg/testutil's init(). Every integration test that imports testutil inherits this automatically — no per-job env config needed. Removes the TESTCONTAINERS_RYUK_DISABLED env var from the CI workflow's integration step; the init() makes it redundant. Developers can still override locally with TESTCONTAINERS_RYUK_DISABLED=false to re-enable Ryuk for debugging container leaks. Cleanup contract: testutil.TerminateAll (called from each service's TestMain) is the only mechanism that reaps the shared containers. SIGKILL / Ctrl+C leaks containers — acceptable trade-off for the simplification, and on CI it doesn't matter since the runner VM is torn down anyway.
…ainers
Per consumer:
- search-sync-worker, room-service, room-worker, inbox-worker,
tools/loadgen, pkg/natsrouter, pkg/roomkeystore, pkg/roomsubcache:
drop inline testcontainers.GenericContainer / natsmod.Run starts and
use the testutil helpers (Elasticsearch, NATS, Valkey) instead.
Sibling-test isolation: Valkey gets FLUSHDB on cleanup via the new
testutil.FlushValkey; ES gets per-test unique indices where the suite
needs them; NATS gets unique stream / queue-group names where it
already did.
- pkg/roomkeysender, pkg/roomcrypto: keep their per-test containers
(NATS-with-WebSocket and Node have special configs); just add TestMain
so they import testutil.
- broadcast-worker, message-worker, notification-worker,
history-service/internal/{mongorepo,cassrepo,service}, pkg/userstore,
pkg/mongoutil, pkg/minioutil: add main_test.go with TestMain calling
testutil.TerminateAll. These only consume Mongo/Cassandra/MinIO
helpers from testutil — no inline container starts to remove.
search-service TestMain now fails fast on pre-warm errors (CodeRabbit).
TerminateMongo / TerminateCassandra nil their refs after cleanup so they
are idempotent (CodeRabbit).
Ryuk works on our CI runner after all, so init() and the CI env var are
removed — Ryuk stays enabled as the safety net for SIGKILL / Ctrl+C;
TerminateAll runs first on clean exits and the explicit cleanup is what
the tests actually rely on.
Merge from main pulled in tools/loadgen/members_integration_test.go which referenced the local setupNATS helper that this PR deleted. Swap it to testutil.NATS(t) like the sibling integration test.
293cb24 to
66cecfe
Compare
CI reliably needs TESTCONTAINERS_RYUK_DISABLED=true after all. Set it via pkg/testutil's init() so every test process inherits it without per-job env config. Developers can override locally with TESTCONTAINERS_RYUK_DISABLED=false to debug container leaks. Cleanup audit — every test container in this PR has: - A stored container reference (not just URL/addr). - Best-effort Terminate on init-failure paths inside ensureXxx helpers. - Final teardown driven by TestMain → testutil.TerminateAll OR by t.Cleanup for per-test local containers. This holds regardless of test pass/fail: t.Cleanup runs on failure + panic, and TestMain's `code := m.Run(); TerminateAll(); os.Exit(code)` wrap ensures cleanup runs before process exit.
…ants, naming
- Add testutil.RunTests(m) one-liner helper. Collapses 17 identical
per-service TestMains into `func TestMain(m *testing.M) { testutil.RunTests(m) }`.
- Drop search-service's local flushValkey + freshValkeyClient — the former
duplicates testutil.FlushValkey exactly; the latter is renamed to
valkeyClient (the underlying container is shared, "fresh" was misleading).
- Promote search-service NATS queue-group literals to testQueueGroup /
testQueueGroupSubs / testQueueGroupV2 constants.
- Add testutil.EnsureMongo to search-service's prewarm slice — apps tests
used to pay Mongo startup serially against the first call.
- Trim the over-long docstrings on init.go and terminate.go.
- Replace the cross-file "// Flush — see setupAppsFixture for the rationale"
comment with the inline one-liner everywhere; no more grep-to-find chain.
- Fix stale "Prewarm" reference in elasticsearch.go EnsureXxx comment.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (6)
search-service/setup_shared_test.go (1)
52-52:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHandle response-body read failures in
seedDoc.Line 52 drops the
io.ReadAllerror, which can hide useful diagnostics when indexing fails.Suggested patch
- body, _ := io.ReadAll(resp.Body) + body, err := io.ReadAll(resp.Body) + require.NoError(t, err)As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search-service/setup_shared_test.go` at line 52, In the seedDoc helper, don't ignore the io.ReadAll error; change the read to capture the error (e.g., body, err := io.ReadAll(resp.Body)) and handle it by failing the test or returning the error from seedDoc (use t.Fatalf or return fmt.Errorf) with a message that includes the error and useful context (response status and partial body) so failures are diagnosable; update any callers of seedDoc if needed to propagate the error.search-service/integration_rooms_test.go (1)
47-47:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake index fixture setup/teardown fail loudly on errors.
Line 47, Line 72, Line 101, and Line 108 currently suppress errors, which can hide index setup failures and cleanup leaks.
Suggested patch
- t.Cleanup(func() { _ = serverNC.Drain() }) + t.Cleanup(func() { assert.NoError(t, serverNC.Drain()) }) ... - t.Cleanup(func() { _ = router.Shutdown(context.Background()) }) + t.Cleanup(func() { assert.NoError(t, router.Shutdown(context.Background())) }) ... - data, _ := json.Marshal(body) + data, err := json.Marshal(body) + require.NoError(t, err) ... - b, _ := io.ReadAll(resp.Body) + b, err := io.ReadAll(resp.Body) + require.NoError(t, err)As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
Also applies to: 72-72, 101-101, 108-108
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search-service/integration_rooms_test.go` at line 47, The cleanup and teardown currently discard errors (e.g., in the t.Cleanup closures calling serverNC.Drain() and other index fixture setup/teardown calls), which can hide failures; change each discarded-error site to check the returned error and fail the test loudly (use t.Fatalf or testing/require's NoError/NoErrorf) so that any Drain()/Close()/teardown error surfaces immediately (refer to the exact calls to serverNC.Drain(), the index fixture setup/teardown closures, and any other places where errors are assigned to `_` in the test harness).search-service/integration_ccs_test.go (1)
74-74:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSurface CCS fixture cleanup/read failures explicitly.
Several paths still discard errors (network/container/router cleanup and HTTP body reads), which can mask CCS test instability and resource leaks.
Suggested patch
- t.Cleanup(func() { _ = nw.Remove(ctx) }) + t.Cleanup(func() { assert.NoError(t, nw.Remove(ctx)) }) ... - t.Cleanup(func() { _ = serverNC.Drain() }) + t.Cleanup(func() { assert.NoError(t, serverNC.Drain()) }) ... - t.Cleanup(func() { _ = router.Shutdown(context.Background()) }) + t.Cleanup(func() { assert.NoError(t, router.Shutdown(context.Background())) }) ... - t.Cleanup(func() { _ = container.Terminate(ctx) }) + t.Cleanup(func() { assert.NoError(t, container.Terminate(ctx)) }) ... - respBody, _ := io.ReadAll(resp.Body) + respBody, err := io.ReadAll(resp.Body) + require.NoError(t, err) ... - body, _ := io.ReadAll(resp.Body) + body, readErr := io.ReadAll(resp.Body) + require.NoError(t, readErr)As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
Also applies to: 114-114, 138-138, 190-190, 275-275, 292-293
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search-service/integration_ccs_test.go` at line 74, The test currently discards cleanup and read errors (e.g., the t.Cleanup call that calls nw.Remove(ctx) and other network/container/router cleanup and HTTP body reads), which hides failures and leaks; update each silent discard to capture the error and report it via the test (for cleanup use t.Cleanup(func(){ if err := nw.Remove(ctx); err != nil { t.Errorf("nw.Remove failed: %v", err) } }) or t.Logf/t.Fatalf as appropriate) and for HTTP responses ensure resp.Body is closed and check errors from io.ReadAll and resp.Body.Close (e.g., if b, err := io.ReadAll(resp.Body); err != nil { t.Fatalf("read body: %v", err) }). Locate and fix the occurrences around the symbols nw.Remove, container/router cleanup calls, and HTTP response body reads to explicitly handle and report errors rather than ignoring them.search-service/integration_messages_test.go (1)
38-38:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon’t silently drop fixture and teardown errors.
The handler and cleanup paths currently ignore multiple errors (Line 38, Line 45, Line 58, Line 83), which can hide integration flakiness.
Suggested patch
- _, _ = io.Copy(io.Discard, r.Body) + if _, err := io.Copy(io.Discard, r.Body); err != nil { + t.Errorf("drain ES stub request body: %v", err) + return + } ... - _, _ = w.Write([]byte(`{"hits":{"total":{"value":1},"hits":[{"_source":{` + + if _, err := w.Write([]byte(`{"hits":{"total":{"value":1},"hits":[{"_source":{` + `"messageId":"m1","roomId":"r1","siteId":"site-a","userId":"u1",` + - `"userAccount":"alice","content":"hello","createdAt":"2026-04-01T12:00:00Z"}}]}}`)) + `"userAccount":"alice","content":"hello","createdAt":"2026-04-01T12:00:00Z"}}]}}`)); err != nil { + t.Errorf("write ES stub response: %v", err) + } ... - t.Cleanup(func() { _ = serverNATS.Drain() }) + t.Cleanup(func() { assert.NoError(t, serverNATS.Drain()) }) ... - t.Cleanup(func() { _ = router.Shutdown(context.Background()) }) + t.Cleanup(func() { assert.NoError(t, router.Shutdown(context.Background())) })As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
Also applies to: 45-47, 58-58, 83-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search-service/integration_messages_test.go` at line 38, Fix the silent error drops in the test by checking and handling the return values instead of discarding them: replace calls like io.Copy(io.Discard, r.Body) and other ignored returns (e.g., r.Body.Close(), resp.Body.Close(), os.RemoveAll(...)) with error checks (e.g., if err := io.Copy(io.Discard, r.Body); err != nil { t.Fatalf("copying body: %v", err) } or use require.NoError(t, err) as appropriate), or if you truly intend to ignore an error, add an explicit comment explaining why; update all occurrences referenced (io.Copy with r.Body, any r.Body.Close()/resp.Body.Close(), and os.RemoveAll) so no error is silently dropped.search-service/integration_users_test.go (1)
41-41:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon’t suppress fixture cleanup and stub write errors.
Line 41, Line 61, and Line 72 ignore returned errors; this can hide teardown failures or incomplete stub responses.
Suggested patch
- t.Cleanup(func() { _ = serverNC.Drain() }) + t.Cleanup(func() { assert.NoError(t, serverNC.Drain()) }) ... - t.Cleanup(func() { _ = router.Shutdown(context.Background()) }) + t.Cleanup(func() { assert.NoError(t, router.Shutdown(context.Background())) }) ... - _, _ = w.Write([]byte(stubResp)) + _, err := w.Write([]byte(stubResp)) + assert.NoError(t, err)As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
Also applies to: 61-61, 72-72
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search-service/integration_users_test.go` at line 41, Several cleanup and stub calls (e.g., the t.Cleanup wrapper that currently does "_ = serverNC.Drain()" and the two other places that discard errors) silently ignore returned errors; update each site to check the error and fail or log it instead. Replace discarded-error patterns like "_ = serverNC.Drain()" (and other discarded calls such as stub.Write/Close/etc.) with an explicit check: if err := serverNC.Drain(); err != nil { t.Fatalf("serverNC.Drain failed: %v", err) } (or use require.NoError(t, err) if testify is used) so teardown and stub-write failures are surfaced rather than suppressed.search-service/integration_apps_test.go (1)
40-40:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSurface fixture teardown failures instead of discarding them.
Line 40 and Line 64 ignore cleanup errors; if teardown fails, this test can pass while leaving leaked subscriptions/connections.
Suggested patch
- t.Cleanup(func() { _ = serverNATS.Drain() }) + t.Cleanup(func() { assert.NoError(t, serverNATS.Drain()) }) ... - t.Cleanup(func() { - _ = router.Shutdown(context.Background()) - }) + t.Cleanup(func() { + assert.NoError(t, router.Shutdown(context.Background())) + })As per coding guidelines: "Never ignore errors silently — comment if intentionally discarded."
Also applies to: 64-65
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@search-service/integration_apps_test.go` at line 40, The test currently discards errors returned by serverNATS.Drain in the t.Cleanup closures; update those cleanup functions (the t.Cleanup closures that call serverNATS.Drain on lines around the two occurrences) to capture the returned error and fail the test if teardown fails (e.g., if err := serverNATS.Drain(); err != nil { t.Fatalf("serverNATS.Drain failed: %v", err) }) so teardown failures surface instead of being ignored.
🧹 Nitpick comments (7)
pkg/testutil/elasticsearch.go (1)
54-60: ⚡ Quick winUse structured
sloglogging for termination paths and avoid silent drops.Line 54/60 currently discard terminate failures, and Line 93 uses
fmt.Fprintf. Please log these errors with structuredslogfields.Proposed patch
import ( "context" "fmt" + "log/slog" "os" "sync" "testing" "time" @@ var ( esOnce sync.Once esContainer testcontainers.Container esURL string esInitErr error + esLogger = slog.New(slog.NewJSONHandler(os.Stderr, nil)) ) @@ host, err := container.Host(ctx) if err != nil { - _ = container.Terminate(ctx) + if termErr := container.Terminate(ctx); termErr != nil { + esLogger.Warn("cleanup failed after elasticsearch host lookup error", "error", termErr) + } esInitErr = fmt.Errorf("get es host: %w", err) return } port, err := container.MappedPort(ctx, "9200") if err != nil { - _ = container.Terminate(ctx) + if termErr := container.Terminate(ctx); termErr != nil { + esLogger.Warn("cleanup failed after elasticsearch port lookup error", "error", termErr) + } esInitErr = fmt.Errorf("get es port: %w", err) return } @@ if err := esContainer.Terminate(ctx); err != nil { - fmt.Fprintf(os.Stderr, "terminate shared elasticsearch: %v\n", err) + esLogger.Error("terminate shared elasticsearch", "error", err) }As per coding guidelines, "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers." and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 92-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/elasticsearch.go` around lines 54 - 60, The termination error from container.Terminate(ctx) is being ignored and fmt.Fprintf is used for error output; update the termination paths in the function that sets esInitErr to log termination failures with structured log/slog (e.g., call slog.Error with fields like "err" and contextual keys) instead of silently discarding the error, and replace the fmt.Fprintf usage with slog.Error/slog.Log JSON-formatted logs; ensure esInitErr still wraps the original error (esInitErr = fmt.Errorf("get es host: %w", err)) but also log the terminate error (container.Terminate(ctx)) with slog including intent if discarding is intentional.pkg/testutil/mongo.go (1)
61-63: ⚡ Quick winSwitch Mongo termination logs to
slogJSON.Line 62 and Line 68 should use structured
slogfields instead of plain stderr formatting.As per coding guidelines, "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers."Also applies to: 67-69
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/mongo.go` around lines 61 - 63, Replace the plain stderr prints around mongoClient.Disconnect and the other termination messages with structured slog calls: instead of fmt.Fprintf(os.Stderr, "...%v...", err) use slog.Error (or the package logger) with a clear message like "disconnect shared mongo client" and include the error as a field (err) and any contextual fields (e.g., "phase":"shutdown" or "client":"sharedMongo"). Update the other nearby fmt.Fprintf/Println termination lines the same way so all shutdown logs use log/slog JSON structured fields rather than text formatting, targeting the mongoClient.Disconnect call and the adjacent shutdown messages in the same function.pkg/testutil/nats.go (1)
44-45: ⚡ Quick winAlign NATS termination logging/error handling with repo logging rules.
Line 44 silently drops cleanup failure, and Line 77 uses plain stderr formatting. Switch to JSON
sloglogging with key-value fields for both paths.As per coding guidelines, "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers." and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 76-78
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/nats.go` around lines 44 - 45, Replace the silent discard of c.Terminate(ctx) and the plain-text stderr output with structured JSON slog logging: capture the error returned by c.Terminate(ctx) instead of assigning it to underscore and log it via slog with key-value fields (e.g., "action":"nats.terminate", "error":err) near where natsInitErr is set, and replace the fmt.Fprintf/os.Stderr usage around lines 76-78 with slog JSON logging (e.g., "action":"nats.init", "msg":..., "error":err) so both cleanup and init failures use log/slog structured JSON and no errors are ignored.pkg/testutil/valkey.go (1)
45-52: ⚡ Quick winUse
slogand stop silently swallowing cleanup errors in Valkey lifecycle.Please replace the silent
_ = container.Terminate(ctx)calls and stderr formatting with structuredsloglogging.As per coding guidelines, "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers." and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 102-104
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/valkey.go` around lines 45 - 52, Replace the silent error discards around container.Terminate(ctx) with structured slog logging and avoid swallowing the terminate error: when container.Terminate(ctx) returns an error, call slog.Error with a descriptive message and include the error and context (e.g., slog.String("step","terminate"), slog.Error(err), slog.String("component","valkey"), possibly the host/port variables) instead of assigning `_ = container.Terminate(ctx)`; keep the existing fmt.Errorf wrapping for valkeyInitErr (e.g., valkeyInitErr = fmt.Errorf("get valkey host: %w", err)) but log the cleanup failure immediately after the terminate call. Apply the same replacement for the other occurrences mentioned (the container.Terminate(ctx) at lines 102-104) so no Terminate error is silently ignored and all logs use log/slog structured fields.pkg/testutil/minio.go (1)
72-74: ⚡ Quick winUse structured
sloglogging for MinIO termination errors.Line 73 should use JSON
slogkey-value logging instead offmt.Fprintf.As per coding guidelines, "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/minio.go` around lines 72 - 74, Replace the plain-text error print when terminating MinIO with structured slog JSON logging: locate the call to minioContainer.Terminate(ctx) and the subsequent fmt.Fprintf(os.Stderr, "terminate shared minio: %v\n", err) and change it to use the package-level slog logger (e.g., slog.Error or slog.With(...).Error) to emit a key/value log containing a clear message ("terminate shared minio failed") and the error under a key like "err" (and include any context keys such as "operation":"terminate" if desired); remove the fmt and os.Stderr usage and ensure the import for log/slog is present.pkg/testutil/cassandra.go (1)
48-55: ⚡ Quick winUse structured logging and avoid silent cleanup-error drops in Cassandra helper.
Please replace silent terminate-error discards and stderr formatting with JSON
slogkey-value logs.As per coding guidelines, "Use
log/slogwith JSON format for all logging — never usefmt.Println,log.Println, or text-format loggers." and "Never ignore errors silently — comment if intentionally discarded."Also applies to: 69-70, 90-92
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/testutil/cassandra.go` around lines 48 - 55, Replace the silent discard of container.Terminate(ctx) and plain-text error handling with structured slog JSON logging: after any call to container.Terminate(ctx) capture its returned error (e.g., termErr := container.Terminate(ctx)); if termErr != nil call slog.Error or slog.Log with key/value pairs (e.g., slog.Error("terminate container failed", "terminate_error", termErr, "context_err", err, "step", "get cassandra host/port")) instead of discarding; keep setting cassInitErr (e.g., cassInitErr = fmt.Errorf("get cassandra host: %w", err)) but log both the original err and any terminate error via slog; apply the same pattern for the other occurrences around the cassInitErr paths and any places that used fmt.Println/ignored Terminate errors (references: cassInitErr variable and container.Terminate(ctx) and the container.MappedPort call).tools/loadgen/members_integration_test.go (1)
120-125: ⚡ Quick winReplace fixed post-run sleep with condition-based waiting.
time.Sleep(500ms)here is timing-sensitive; under load it can flake. Wait on the actual counters instead.Suggested fix
- // Allow trailing events to settle before checking counters. - time.Sleep(500 * time.Millisecond) - - // Both simulated stages must have seen traffic. - assert.Greater(t, rsCalls.Load(), int64(0), "room-service was never called") - assert.Greater(t, rwCalls.Load(), int64(0), "room-worker never consumed") + require.Eventually(t, func() bool { + return rsCalls.Load() > 0 && rwCalls.Load() > 0 + }, 5*time.Second, 50*time.Millisecond, "expected both simulated stages to see traffic")As per coding guidelines: "Never use
time.Sleep()for goroutine synchronization — use proper sync primitives."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/loadgen/members_integration_test.go` around lines 120 - 125, Replace the fixed time.Sleep(500 * time.Millisecond) in members_integration_test.go with a condition-based wait that polls the actual counters rsCalls.Load() and rwCalls.Load() until both are > 0 or a reasonable timeout elapses; implement a small loop using time.NewTicker and time.After (or testing's deadline) that checks rsCalls.Load() and rwCalls.Load() each tick and returns when both are positive, and call t.Fatalf or t.Fatal if the timeout is reached to fail the test, ensuring no goroutine synchronization relies on a fixed sleep.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@history-service/internal/service/main_test.go`:
- Line 3: All tests in this directory must share the same package so TestMain in
main_test.go controls the full test lifecycle; change the package declaration in
integration_test.go, main_test.go, messages_test.go, threads_test.go,
room_times_test.go, and utils_test.go from "package service_test" or "package
service" as needed so they all read "package service". Ensure TestMain (in
main_test.go) remains in the same package and that any references to unexported
types still compile after switching the four files that were "service_test" to
"service".
In `@pkg/natsrouter/main_test.go`:
- Line 3: The test file declares package natsrouter_test but must use the same
package as the code under test; change the package declaration from "package
natsrouter_test" to "package natsrouter" so the tests run in-package and can
access unexported types and methods used by this test file (update the
top-of-file package line accordingly).
In `@pkg/roomkeysender/main_test.go`:
- Line 3: The test file currently declares package "roomkeysender_test" which
violates the repo rule requiring test files to use the same package as the code;
change the package declaration in main_test.go from "package roomkeysender_test"
to "package roomkeysender" so the test runs in-package and can access unexported
types (no other changes required).
In `@search-sync-worker/integration_test.go`:
- Around line 44-50: The prewarm goroutines in TestMain are silently discarding
errors from testutil.EnsureElasticsearch and testutil.EnsureNATS; modify
TestMain so it collects and checks errors before calling m.Run (e.g., replace
the current fire-and-forget goroutine pattern using wg with an error channel or
an errgroup) and fail fast if any call to EnsureElasticsearch or EnsureNATS
returns a non-nil error; ensure the fix references the existing wg, the list of
functions (testutil.EnsureElasticsearch, testutil.EnsureNATS) and that TestMain
exits with a non-zero status (or calls t.Fatalf equivalent) when any prewarm
error is observed.
---
Duplicate comments:
In `@search-service/integration_apps_test.go`:
- Line 40: The test currently discards errors returned by serverNATS.Drain in
the t.Cleanup closures; update those cleanup functions (the t.Cleanup closures
that call serverNATS.Drain on lines around the two occurrences) to capture the
returned error and fail the test if teardown fails (e.g., if err :=
serverNATS.Drain(); err != nil { t.Fatalf("serverNATS.Drain failed: %v", err) })
so teardown failures surface instead of being ignored.
In `@search-service/integration_ccs_test.go`:
- Line 74: The test currently discards cleanup and read errors (e.g., the
t.Cleanup call that calls nw.Remove(ctx) and other network/container/router
cleanup and HTTP body reads), which hides failures and leaks; update each silent
discard to capture the error and report it via the test (for cleanup use
t.Cleanup(func(){ if err := nw.Remove(ctx); err != nil { t.Errorf("nw.Remove
failed: %v", err) } }) or t.Logf/t.Fatalf as appropriate) and for HTTP responses
ensure resp.Body is closed and check errors from io.ReadAll and resp.Body.Close
(e.g., if b, err := io.ReadAll(resp.Body); err != nil { t.Fatalf("read body:
%v", err) }). Locate and fix the occurrences around the symbols nw.Remove,
container/router cleanup calls, and HTTP response body reads to explicitly
handle and report errors rather than ignoring them.
In `@search-service/integration_messages_test.go`:
- Line 38: Fix the silent error drops in the test by checking and handling the
return values instead of discarding them: replace calls like io.Copy(io.Discard,
r.Body) and other ignored returns (e.g., r.Body.Close(), resp.Body.Close(),
os.RemoveAll(...)) with error checks (e.g., if err := io.Copy(io.Discard,
r.Body); err != nil { t.Fatalf("copying body: %v", err) } or use
require.NoError(t, err) as appropriate), or if you truly intend to ignore an
error, add an explicit comment explaining why; update all occurrences referenced
(io.Copy with r.Body, any r.Body.Close()/resp.Body.Close(), and os.RemoveAll) so
no error is silently dropped.
In `@search-service/integration_rooms_test.go`:
- Line 47: The cleanup and teardown currently discard errors (e.g., in the
t.Cleanup closures calling serverNC.Drain() and other index fixture
setup/teardown calls), which can hide failures; change each discarded-error site
to check the returned error and fail the test loudly (use t.Fatalf or
testing/require's NoError/NoErrorf) so that any Drain()/Close()/teardown error
surfaces immediately (refer to the exact calls to serverNC.Drain(), the index
fixture setup/teardown closures, and any other places where errors are assigned
to `_` in the test harness).
In `@search-service/integration_users_test.go`:
- Line 41: Several cleanup and stub calls (e.g., the t.Cleanup wrapper that
currently does "_ = serverNC.Drain()" and the two other places that discard
errors) silently ignore returned errors; update each site to check the error and
fail or log it instead. Replace discarded-error patterns like "_ =
serverNC.Drain()" (and other discarded calls such as stub.Write/Close/etc.) with
an explicit check: if err := serverNC.Drain(); err != nil {
t.Fatalf("serverNC.Drain failed: %v", err) } (or use require.NoError(t, err) if
testify is used) so teardown and stub-write failures are surfaced rather than
suppressed.
In `@search-service/setup_shared_test.go`:
- Line 52: In the seedDoc helper, don't ignore the io.ReadAll error; change the
read to capture the error (e.g., body, err := io.ReadAll(resp.Body)) and handle
it by failing the test or returning the error from seedDoc (use t.Fatalf or
return fmt.Errorf) with a message that includes the error and useful context
(response status and partial body) so failures are diagnosable; update any
callers of seedDoc if needed to propagate the error.
---
Nitpick comments:
In `@pkg/testutil/cassandra.go`:
- Around line 48-55: Replace the silent discard of container.Terminate(ctx) and
plain-text error handling with structured slog JSON logging: after any call to
container.Terminate(ctx) capture its returned error (e.g., termErr :=
container.Terminate(ctx)); if termErr != nil call slog.Error or slog.Log with
key/value pairs (e.g., slog.Error("terminate container failed",
"terminate_error", termErr, "context_err", err, "step", "get cassandra
host/port")) instead of discarding; keep setting cassInitErr (e.g., cassInitErr
= fmt.Errorf("get cassandra host: %w", err)) but log both the original err and
any terminate error via slog; apply the same pattern for the other occurrences
around the cassInitErr paths and any places that used fmt.Println/ignored
Terminate errors (references: cassInitErr variable and container.Terminate(ctx)
and the container.MappedPort call).
In `@pkg/testutil/elasticsearch.go`:
- Around line 54-60: The termination error from container.Terminate(ctx) is
being ignored and fmt.Fprintf is used for error output; update the termination
paths in the function that sets esInitErr to log termination failures with
structured log/slog (e.g., call slog.Error with fields like "err" and contextual
keys) instead of silently discarding the error, and replace the fmt.Fprintf
usage with slog.Error/slog.Log JSON-formatted logs; ensure esInitErr still wraps
the original error (esInitErr = fmt.Errorf("get es host: %w", err)) but also log
the terminate error (container.Terminate(ctx)) with slog including intent if
discarding is intentional.
In `@pkg/testutil/minio.go`:
- Around line 72-74: Replace the plain-text error print when terminating MinIO
with structured slog JSON logging: locate the call to
minioContainer.Terminate(ctx) and the subsequent fmt.Fprintf(os.Stderr,
"terminate shared minio: %v\n", err) and change it to use the package-level slog
logger (e.g., slog.Error or slog.With(...).Error) to emit a key/value log
containing a clear message ("terminate shared minio failed") and the error under
a key like "err" (and include any context keys such as "operation":"terminate"
if desired); remove the fmt and os.Stderr usage and ensure the import for
log/slog is present.
In `@pkg/testutil/mongo.go`:
- Around line 61-63: Replace the plain stderr prints around
mongoClient.Disconnect and the other termination messages with structured slog
calls: instead of fmt.Fprintf(os.Stderr, "...%v...", err) use slog.Error (or the
package logger) with a clear message like "disconnect shared mongo client" and
include the error as a field (err) and any contextual fields (e.g.,
"phase":"shutdown" or "client":"sharedMongo"). Update the other nearby
fmt.Fprintf/Println termination lines the same way so all shutdown logs use
log/slog JSON structured fields rather than text formatting, targeting the
mongoClient.Disconnect call and the adjacent shutdown messages in the same
function.
In `@pkg/testutil/nats.go`:
- Around line 44-45: Replace the silent discard of c.Terminate(ctx) and the
plain-text stderr output with structured JSON slog logging: capture the error
returned by c.Terminate(ctx) instead of assigning it to underscore and log it
via slog with key-value fields (e.g., "action":"nats.terminate", "error":err)
near where natsInitErr is set, and replace the fmt.Fprintf/os.Stderr usage
around lines 76-78 with slog JSON logging (e.g., "action":"nats.init",
"msg":..., "error":err) so both cleanup and init failures use log/slog
structured JSON and no errors are ignored.
In `@pkg/testutil/valkey.go`:
- Around line 45-52: Replace the silent error discards around
container.Terminate(ctx) with structured slog logging and avoid swallowing the
terminate error: when container.Terminate(ctx) returns an error, call slog.Error
with a descriptive message and include the error and context (e.g.,
slog.String("step","terminate"), slog.Error(err),
slog.String("component","valkey"), possibly the host/port variables) instead of
assigning `_ = container.Terminate(ctx)`; keep the existing fmt.Errorf wrapping
for valkeyInitErr (e.g., valkeyInitErr = fmt.Errorf("get valkey host: %w", err))
but log the cleanup failure immediately after the terminate call. Apply the same
replacement for the other occurrences mentioned (the container.Terminate(ctx) at
lines 102-104) so no Terminate error is silently ignored and all logs use
log/slog structured fields.
In `@tools/loadgen/members_integration_test.go`:
- Around line 120-125: Replace the fixed time.Sleep(500 * time.Millisecond) in
members_integration_test.go with a condition-based wait that polls the actual
counters rsCalls.Load() and rwCalls.Load() until both are > 0 or a reasonable
timeout elapses; implement a small loop using time.NewTicker and time.After (or
testing's deadline) that checks rsCalls.Load() and rwCalls.Load() each tick and
returns when both are positive, and call t.Fatalf or t.Fatal if the timeout is
reached to fail the test, ensuring no goroutine synchronization relies on a
fixed sleep.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9c86391a-3e00-4016-bb2f-28e445f06c8b
📒 Files selected for processing (40)
broadcast-worker/main_test.gohistory-service/internal/cassrepo/main_test.gohistory-service/internal/mongorepo/main_test.gohistory-service/internal/service/main_test.goinbox-worker/integration_test.goinbox-worker/main_test.gomessage-worker/main_test.gonotification-worker/main_test.gopkg/minioutil/main_test.gopkg/mongoutil/main_test.gopkg/natsrouter/integration_test.gopkg/natsrouter/main_test.gopkg/roomcrypto/main_test.gopkg/roomkeysender/main_test.gopkg/roomkeystore/integration_test.gopkg/roomsubcache/integration_test.gopkg/testutil/cassandra.gopkg/testutil/elasticsearch.gopkg/testutil/init.gopkg/testutil/minio.gopkg/testutil/mongo.gopkg/testutil/nats.gopkg/testutil/terminate.gopkg/testutil/testmain.gopkg/testutil/valkey.gopkg/userstore/main_test.goroom-service/integration_test.goroom-service/main_test.goroom-worker/integration_test.goroom-worker/main_test.gosearch-service/integration_apps_test.gosearch-service/integration_ccs_test.gosearch-service/integration_messages_test.gosearch-service/integration_rooms_test.gosearch-service/integration_test.gosearch-service/integration_users_test.gosearch-service/setup_shared_test.gosearch-sync-worker/integration_test.gotools/loadgen/integration_test.gotools/loadgen/members_integration_test.go
💤 Files with no reviewable changes (1)
- search-service/integration_test.go
✅ Files skipped from review due to trivial changes (1)
- notification-worker/main_test.go
Match the pattern already used by search-service's TestMain — collect prewarm errors in a channel and surface them before m.Run rather than letting individual tests fail with confusing 'couldn't connect' errors. Per CodeRabbit review on PR #208.
Repo convention (CLAUDE.md): test files live in the same package as the
code under test so they can reach unexported identifiers. Seven files
were using external _test packages — switch them all to internal:
pkg/natsrouter/{integration_test,main_test}.go
pkg/roomsubcache/integration_test.go (+ new main_test.go)
pkg/roomkeysender/{integration_test,main_test}.go
history-service/internal/service/{integration_test,main_test}.go
Mechanical changes: strip the self-import, de-prefix package-qualified
identifiers (natsrouter.New → New, service.New → New, etc.), change
the package declaration. No behaviour change.
Added pkg/roomsubcache/main_test.go that was missing — it uses
testutil.Valkey but had no TestMain to call testutil.TerminateAll.
Resolves CodeRabbit "package _test" review threads.
pkg/roomkeystore was using testutil.Valkey + testutil.FlushValkey but had no TestMain to call testutil.TerminateAll. Same pattern as pkg/roomsubcache (also caught and fixed in earlier commit). Consistency audit verified — every integration test package now has a TestMain that calls testutil.TerminateAll (via testutil.RunTests for the common case, or a custom prewarm wrapper for search-service and search-sync-worker). The only exception is auth-service, which uses no testcontainers (pure httptest).
…ablished The old Integration Tests section instructed contributors to "write setup<Dep>(t) helpers that start a container" — exactly the pattern this PR consolidated away. Rewrite the section to capture what we actually expect now: - Internal package, never external *_test - Containers come from pkg/testutil helpers, not inline starts - Every package has a TestMain calling testutil.TerminateAll - Ryuk is disabled repo-wide; explicit cleanup is the contract - Per-test isolation responsibility per container type - When inline GenericContainer is acceptable (CCS, roomcrypto, roomkeysender) - Where new shared container helpers belong (pkg/testutil with the Xxx + EnsureXxx + TerminateXxx triple)
Main introduced Valkey cluster-mode (PR #199 - 'feat: Valkey cluster support with room key ensure RPC'). Reconciliation: - Adopt main's per-test testutil.StartValkeyCluster instead of my shared Valkey helpers. valkey is now per-test (each test gets its own cluster-mode container), not process-shared. - Drop my testutil.Valkey / EnsureValkey / TerminateValkey / FlushValkey — replaced by StartValkeyCluster. - Remove TerminateValkey from TerminateAll (nothing process-shared to clean up). - Update search-service per-endpoint files (integration_ccs_test.go, integration_rooms_test.go) to use valkeyutil.WrapClusterClient(testutil.StartValkeyCluster(t)). - Drop search-service's local valkeyClient/flushValkey helpers. - Keep main's roomkeystore/roomsubcache integration tests but switch roomsubcache back to internal package (consistency). - Add main_test.go for pkg/valkeyutil (newly has integration tests on main). - Re-migrate room-service setupNATS to testutil.NATS (main reverted this when it touched the file). - Update tools/loadgen unit tests for the new ValkeyAddrs []string config field. - Update CLAUDE.md: Valkey is now per-test, not shared.
- pkg/testutil/terminate.go: drop TerminateValkey from TerminateAll (no process-shared Valkey to clean up; StartValkeyCluster is per-test). - search-service/setup_shared_test.go: drop local valkeyClient helper and EnsureValkey prewarm. - search-service/integration_ccs_test.go, search-service/integration_rooms_test.go: use valkeyutil.WrapClusterClient(testutil.StartValkeyCluster(t)). - room-service/integration_test.go: re-apply NATS migration to testutil.NATS (was reverted when merge took main's version). - pkg/roomsubcache/integration_test.go: switch back to internal package (roomsubcache, not roomsubcache_test) for consistency. - pkg/valkeyutil/main_test.go: add missing TestMain. - tools/loadgen/main_test.go: update unit tests for ValkeyAddrs []string config field rename. - CLAUDE.md: document Valkey as per-test, not shared.
testutil/valkey.go now exposes two variants:
- SharedValkeyCluster(t) — process-shared cluster via sync.Once. Fast
(cluster boots once per go test invocation). Callers register
t.Cleanup(func() { testutil.FlushValkey(t) }) for keyspace isolation.
- StartValkeyCluster(t) — per-test cluster (unchanged). Use only for
tests asserting on cluster-routing state (pkg/roomkeystore) or that
own a store wrapper that closes the underlying client.
Wired TerminateValkey back into TerminateAll so the shared cluster is
reaped at process exit.
Migrated safe callers to the shared variant:
- search-service/integration_ccs_test.go, integration_rooms_test.go
- pkg/roomsubcache/integration_test.go
- pkg/valkeyutil/integration_test.go
- search-service TestMain prewarm now includes EnsureValkey
Kept on per-test (deliberate):
- pkg/roomkeystore (CLUSTER KEYSLOT routing assertions)
- room-service, room-worker (use NewValkeyClusterStoreFromClient whose
store.Close() releases the underlying client — would kill the shared
cluster mid-run if a handler ever called Close)
Expected wins: ~15-25s per go test ./search-service/... and similar
per-package savings elsewhere on packages that hit Valkey heavily.
CLAUDE.md updated to document both variants + when to use each.
history-service/docker-local was a per-service dev compose stack (NATS + Mongo + Cassandra + the service itself). The canonical local dev stack lives in the repo-root docker-local/ — no need for the per-service duplicate.
Net -120 lines of comments / narration across the PR. Targeted only
WHAT-comments and progress narration; preserved load-bearing WHY:
- pkg/testutil/{init,terminate,testmain,valkey}.go: docstrings cut to
one or two lines each; deleted "intended for ..." sentences that
restate function names. Kept the newValkeyClusterClient
ClusterSlots-override explanation (load-bearing WHY).
- search-service/setup_shared_test.go: removed redundant "pre-warms ..."
multi-paragraph TestMain comment; same for uniqueESIndex.
- search-service/integration_ccs_test.go: deleted the 14 t.Logf progress
lines (failures from require.NoError already pinpoint the phase),
trimmed setupCCSFixture / installTemplates / startESForCCS /
buildTestTemplate docstrings, removed in-test narration like
"Local message in local room" / "Round-trips through the real
natsrouter ...". Kept the PROXY-mode rationale (real WHY) and the
Clause-A/B1/B2 inline scenario comments (load-bearing for restricted
query semantics).
- search-service/integration_apps_test.go: dropped the "Prototype
pipeline" follow-up note (belongs in a TODO, not a comment).
- search-service/integration_rooms_test.go: trimmed roomsFixture and
putTestSpotlightIndex docstrings.
- search-service/integration_users_test.go: removed inline
"controls the stub response" comment on a self-explanatory field.
- search-sync-worker/integration_test.go: TestMain comment cut to one
line, matching search-service.
- tools/loadgen/integration_test.go: removed "Allow trailing events to
flow" / "Assert canonical stream drained" / "Assert seed data
visible" — code is self-explanatory.
- pkg/roomcrypto/main_test.go, pkg/roomkeysender/main_test.go:
removed 2-line "Import testutil for the Ryuk-disable init() side
effect" preamble; the import alone now conveys it.
CodeRabbit suggestions to extract a setupRouter helper or a
PrewarmFailFast helper deferred — out of scope for a comment-trim pass.
…searchIndex helpers
Three new testutil helpers consolidate patterns the PR was already
duplicating across 5+ files:
- testutil.PrewarmFailFast(fns...): runs EnsureXxx funcs concurrently,
returns the first error.
- testutil.RunTestsWithPrewarm(m, fns...): the standard TestMain wrap
with pre-warm fail-fast. On prewarm failure, exits 1 after
TerminateAll. On success, hands off to RunTests.
- testutil.ElasticsearchIndex(t, prefix): per-test ES index name (fnv
hash of t.Name()) + DELETE on cleanup. Completes the per-test
isolation triad alongside MongoDB / CassandraKeyspace / MinIO.
Migrations:
- search-service/setup_shared_test.go: TestMain collapses from 30 lines
(manual wg+errCh+goroutines) to a single RunTestsWithPrewarm call.
Drops the local uniqueESIndex helper. Adds a setupRouter helper that
every per-endpoint fixture uses to wire NATS server+client conns +
router with the given queue group + Flush + cleanups.
- search-service/integration_{apps,users,rooms,messages,ccs}_test.go:
each fixture's setup function now ends with
`clientNATS := setupRouter(t, queueGroup, h.Register)`. Drops ~9
lines of plumbing per fixture (45 lines saved).
- search-sync-worker/integration_test.go: replaces the manual prewarm
block with testutil.PrewarmFailFast. Keeps its custom m.Run wrap
because it needs to close the lazy-init JetStream conn between
m.Run and TerminateAll.
CLAUDE.md updated to point at the new helpers.
Summary
Consolidates testcontainer management for every integration test in the repo.
pkg/testutilis now the single source of truth for Elasticsearch, NATS, Valkey, Mongo, Cassandra and MinIO startup; each service uses shared containers via the testutil helpers instead of rolling its own.Originally scoped to
search-service(per-test ES startup was the worst offender at ~30-60s × 4 Rooms tests + 2 CCS pairs). Expanded after audit to fix the same pattern across every integration package.Why merge? Tests used to leak containers — even after a green run, MongoDB / NATS / Cassandra / Valkey containers piled up on the dev server eating resources. That happened because the shared helpers never terminated their containers and we have to disable Ryuk on our CI runner. This PR adds explicit
TerminateAllvia TestMain so containers go away on every clean test exit.What changed
pkg/testutilMongoDB(t, prefix)sync.Once, no TerminateTerminateMongoCassandraKeyspace(t, prefix)sync.Once, no TerminateTerminateCassandraMinIO(t, prefix)sync.Once, no TerminateTerminateMinIOElasticsearch(t)TerminateElasticsearchNATS(t)TerminateNATSStartValkeyCluster(t)SharedValkeyCluster(t)FlushValkey+TerminateValkeyTerminateAll()RunTests(m)init()Each shared helper has a matching no-
tEnsureXxx()variant forTestMainpre-warming.Valkey: two variants
Reconciled with main's PR #199 (cluster-mode Valkey support). Two helpers, choose by use case:
SharedValkeyCluster(t)— default. Process-shared cluster (started viasync.Once, reaped viaTerminateAll). Callers pair witht.Cleanup(testutil.FlushValkey)for keyspace isolation. Used by search-service tests,pkg/roomsubcache,pkg/valkeyutil.StartValkeyCluster(t)— per-test cluster. Use only when the test asserts on cluster-routing state (pkg/roomkeystore'sCLUSTER KEYSLOTchecks) or owns a store wrapper thatClose()s the underlying client (room-service,room-worker).Per-service migrations
search-service(ES/NATS),search-sync-worker(ES/NATS),room-service(NATS),inbox-worker(NATS),tools/loadgen(NATS),pkg/natsrouter,pkg/roomkeystore,pkg/roomsubcache,pkg/valkeyutil.TestMainadded for explicit cleanup: 18 packages, each callingtestutil.TerminateAll()on exit viatestutil.RunTests(m).pkg/roomkeysender(NATS with WebSocket transport for a JS bridge),pkg/roomcrypto(Node container). Each still imports testutil so the Ryuk-disable init runs.Container reference / cleanup contract
Every testcontainer in this PR has:
Terminateon init-failure paths insideensureXxxhelpers.TestMain → TerminateAll(shared containers) ort.Cleanup(per-test local containers).This holds regardless of test pass / fail / panic:
t.Cleanupruns on failure + panic, andTestMain'scode := m.Run(); TerminateAll(); os.Exit(code)wrap ensures cleanup runs before process exit. AllTerminate*functions are idempotent and nil their refs after cleanup, so duplicate calls are safe.Isolation strategy
uniqueESIndex,t.Name()-hashed),DELETEd on cleanup.FLUSHALLacross every master on cleanup viatestutil.FlushValkey(t).*nats.Conn; subscriptions removed viarouter.Shutdown+nc.Drainbefore the next test.Other changes
package <pkg>/package main, never external*_test. Required by CLAUDE.md and gives access to unexported identifiers.history-service/docker-local/removed — redundant per-service dev compose stack; the canonical local stack lives at repo root.Performance impact (per
go testinvocation, in-process)TestMainpre-warms ES + NATS + Valkey + Mongo in parallel goroutines so cold-start wall-clock drops tomax(ES, NATS, Valkey)≈ ES alone instead of the sum. ES heap also dropped from 512m to 256m for tests (seeds are <10 docs/test).Note: container sharing is per-process, so it applies within a single
go test ./<service>/...invocation. Across services (separate test binaries), each gets its own pool — which is what the CI matrix already does anyway.Pattern for new integration tests
Verification
go vet(default and-tags=integration) clean across the whole repomake lint: 0 issuesmake test(unit) passesTest plan
test-integration (<service>)matrix jobs passhttps://claude.ai/code/session_01MzdTDAGknG36FYtqd5MBTw