fix(backend): local-dev unblock + dev-mode message rendering across services#148

Closed
Joey0538 wants to merge 6 commits into main from claude/backend-dev-fixes

Conversation

@Joey0538
Collaborator

@Joey0538 Joey0538 commented May 3, 2026

Summary

Six service-grouped commits, all backend, that unblock the local-dev stack and close enough gaps in the message-delivery pipeline that an end-to-end channel + DM flow works against make up from a fresh checkout. Companion to PR #146 (frontend); each PR is independently mergeable.

Commits (6, oldest → newest)

| Commit | Scope |
| --- | --- |
| chore(local-dev): unblock + speed up dev stack | NATS healthcheck, OTLP-skip env, .dockerignore, split up/up-rebuild, stop_grace_period: 2s on every service compose, make seed-users + make backfill-room-keys |
| fix(otelutil): skip OTLP tracer init when no endpoint env is set | Stops traces export: connection refused log spam in local dev when no collector runs |
| fix(room-service): mint room key, enroll owner+DM recipient, emit member_added | P-256 mint on create + DM two-sided sub + subscription.update + INBOX member_added for spotlight/user-room indexing |
| fix(broadcast-worker): no-key ack-skip + DEV_MODE plaintext | Stops the keyless-room nak-loop; new DEV_MODE env keeps plaintext alongside encrypted payload for no-crypto local frontends |
| fix(search-service): env-driven user-room + spotlight index names | Pin both indexes via env so the const default doesn't 404 against site-suffixed indexes |
| fix(room-worker): publish member_added on add + sysMsg sender on remove | Add-members slice of PR #145; populate sysMsg UserID on member-remove so chat history doesn't render "Unknown" |

What this unblocks

After make deps-up && make up && make seed-users, you can:

  • Log in as alice / bob (dev mode, siteId=site-local)
  • Create channel + DM rooms — appear in the left panel without refresh
  • Send messages — they render immediately in the room
  • Search across rooms + messages — both indexes resolve correctly
  • Add / remove members — system messages render with the actor's name; new members get indexed in spotlight/user-room

Notes

  • A few changes are dev-only and gated explicitly: DEV_MODE=true on broadcast-worker is wired in the local compose with a startup slog.Warn and a comment saying it MUST stay false in prod. The pinned index names target the site-suffixed concrete indexes; prod uses ops-owned aliases.
  • room-worker only closes the add-members slice of the spec in PR #145 (docs(spec): federated room origin-site MV fix design). The remove-individual / remove-org INBOX publishes from #145 stay TODO.
  • chore(local-dev) includes seed/backfill scripts under docker-local/ — purely dev fixtures.

Test plan

  • make lint && make test clean
  • make deps-up && make up && make seed-users; create + send + search + member ops as alice/bob
  • No traces export log spam without an OTLP collector
  • No broadcast-worker no current key nak-loop on a backfilled keyless room

Summary by CodeRabbit

  • New Features

    • DM rooms auto-enroll recipients and generate/store room encryption keys.
    • Search spotlight index is configurable via environment variables.
    • Same-site inbox member dispatch added for member additions.
  • Bug Fixes

    • Consistent short graceful shutdown period applied to many services.
    • Improved NATS healthcheck behavior for more reliable startup.
  • Chores

    • Dev helper scripts added: seed-users and backfill-room-keys; Makefile targets updated.
    • Updated repository ignore rules to exclude common build, IDE, and secret files.
  • Chores (Observability)

    • Tracing now no-ops when OTLP endpoints are unset.

Joey0538 added 2 commits May 3, 2026 17:16

chore(local-dev): unblock + speed up dev stack
- NATS healthcheck uses /healthz?js-server-only=true so a fresh JetStream
  volume doesn't 503; bump start_period for slower disks.
- Root .dockerignore so service builds don't tar the whole repo.
- Split `make up` (no rebuild) from `make up-rebuild`.
- stop_grace_period: 2s on every service compose (was 10s × 11 ≈ 110s
  on `make down`).
- `make seed-users` + docker-local/seed-users.sh: idempotent fixtures
  (alice, bob) so dev-auth users have a `users` row.
- `make backfill-room-keys` + docker-local/backfill-room-keys.sh: mint
  Valkey keys for rooms created before mint-on-create.

fix(otelutil): skip OTLP tracer init when no endpoint env is set
InitTracer now no-ops (returns the SDK noop provider) unless
OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is
set. Prevents local dev from flooding logs with "traces export:
connection refused" against 127.0.0.1:4317 when no collector is
running. Prod/staging configure the env via deployment.
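
The gating check described in the otelutil commit can be sketched roughly as below. This is a minimal illustration using only the standard library, not the actual InitTracer code: the helper name otlpEndpointConfigured and the injected getenv parameter are assumptions made for the example, and the real function would return a no-op tracer provider where this sketch only prints.

```go
package main

import (
	"fmt"
	"os"
)

// otlpEnvVars lists the two variables named in the commit message.
var otlpEnvVars = []string{
	"OTEL_EXPORTER_OTLP_ENDPOINT",
	"OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
}

// otlpEndpointConfigured reports whether any OTLP endpoint variable is
// non-empty. getenv is injected so the check can be exercised without
// touching the process environment. (Hypothetical helper, not from the PR.)
func otlpEndpointConfigured(getenv func(string) string) bool {
	for _, name := range otlpEnvVars {
		if getenv(name) != "" {
			return true
		}
	}
	return false
}

func main() {
	if !otlpEndpointConfigured(os.Getenv) {
		// The real InitTracer would return a no-op provider here.
		fmt.Println("no OTLP endpoint set: tracing would be a no-op")
		return
	}
	fmt.Println("OTLP endpoint set: exporter would be initialized")
}
```

Injecting getenv keeps the decision testable in isolation; whether the real code structures it this way is not stated in the PR.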
@coderabbitai

coderabbitai Bot commented May 3, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fae7bd7a-f828-4ac0-b557-81de216f9e3e

📥 Commits

Reviewing files that changed from the base of the PR and between 0db4e6a and cf2c31a.

📒 Files selected for processing (14)
  • broadcast-worker/deploy/docker-compose.yml
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/main.go
  • room-service/handler.go
  • room-service/handler_test.go
  • room-service/main.go
  • room-service/mock_store_test.go
  • room-service/store.go
  • room-worker/handler.go
  • search-service/deploy/docker-compose.yml
  • search-service/handler.go
  • search-service/handler_test.go
  • search-service/main.go
Walkthrough

Adds dev tools and scripts, introduces broadcast-worker DEV_MODE and keyless-room behavior, generates/persists room ECDH keys at room creation, enhances member add/remove messaging with identity and local INBOX dispatch, parameterizes search indices, tightens OTEL initialization, expands .dockerignore, and standardizes stop_grace_period across services.

Changes

Broadcast Worker Dev Mode & Keyless Room Handling

| Layer | File(s) | Summary |
| --- | --- | --- |
| Data Shape | broadcast-worker/handler.go | Handler gains devMode bool. |
| Core Logic | broadcast-worker/handler.go | If keystore lookup returns key == nil the handler logs a warning and returns nil (drop broadcast); plaintext evt.Message is cleared when devMode is false. |
| Config & Wiring | broadcast-worker/main.go, broadcast-worker/deploy/docker-compose.yml | config reads DEV_MODE; main() assigns handler.devMode and logs a warning when enabled. |
| Tests | broadcast-worker/handler_test.go | Missing-key test expectation changed from error to no-error; no-publish assertion retained. |
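
A rough sketch of the keyless-room and DEV_MODE behavior summarized above, under stated assumptions: the Event shape, the function-valued lookupKey and publish fields, and fakeEncrypt are all illustrative stand-ins, not the broadcast-worker's real types. Only the control flow (nil key means return nil without publishing; plaintext is cleared unless devMode is set) comes from the walkthrough.

```go
package main

import "fmt"

// Event is a stand-in for the broadcast payload; field names are illustrative.
type Event struct {
	RoomID    string
	Message   string // plaintext body
	Encrypted []byte // ciphertext produced with the room key
}

// handler carries the two knobs described above: a key lookup and devMode.
type handler struct {
	devMode   bool
	lookupKey func(roomID string) ([]byte, error) // nil key: room has no key
	publish   func(Event)
}

// fakeEncrypt is a placeholder so the sketch runs; it is NOT real cryptography.
func fakeEncrypt(key []byte, plaintext string) []byte {
	return []byte(fmt.Sprintf("enc(%d):%s", len(key), plaintext))
}

// handle returns nil for keyless rooms so the message is acknowledged and
// dropped rather than redelivered, and strips plaintext unless devMode is on.
func (h *handler) handle(evt Event) error {
	key, err := h.lookupKey(evt.RoomID)
	if err != nil {
		return fmt.Errorf("lookup room key: %w", err) // genuine failure: allow retry
	}
	if key == nil {
		fmt.Println("warn: no current key, skipping broadcast for room", evt.RoomID)
		return nil
	}
	evt.Encrypted = fakeEncrypt(key, evt.Message)
	if !h.devMode {
		evt.Message = "" // plaintext only travels in dev mode
	}
	h.publish(evt)
	return nil
}

func main() {
	keys := map[string][]byte{"keyed": []byte("k")}
	h := &handler{
		lookupKey: func(id string) ([]byte, error) { return keys[id], nil },
		publish: func(e Event) {
			fmt.Printf("published room=%s plaintext=%q\n", e.RoomID, e.Message)
		},
	}
	_ = h.handle(Event{RoomID: "keyless", Message: "hi"})
	_ = h.handle(Event{RoomID: "keyed", Message: "hi"})
}
```

Returning nil on a missing key trades delivery for liveness: the message is lost for that room, which is presumably why the PR pairs it with a backfill script.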

Room Service Encryption Key Generation & DM Member Management

| Layer | File(s) | Summary |
| --- | --- | --- |
| Interface | room-service/store.go | RoomKeyStore adds Set(ctx, roomID, pair) (int, error) for writing room keypairs. |
| Data Shape & Key Generation | room-service/handler.go | Handler adds publishEvent callback; DM creation validates recipient != creator, sets Room.UserCount = 2 for DMs, generates P-256 ECDH keypair and best-effort stores it via RoomKeyStore.Set. |
| Member Enrollment | room-service/handler.go | For DMs, creates second Subscription for recipient; builds subscription update event and best-effort publishes it. |
| Event Publishing | room-service/main.go | Handler is wired with .WithEventPublisher(...) using nc.PublishMsg for transient subscription events. |
| Mocks & Tests | room-service/mock_store_test.go, room-service/handler_test.go | MockRoomKeyStore.Set added; test adjusted to allow multiple subscription creations (AnyTimes()). |
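
For reference, minting a P-256 ECDH keypair with Go's standard crypto/ecdh package looks roughly like this. The RoomKeyPair struct here is an assumed shape for illustration; the real roomkeystore.RoomKeyPair fields are not shown in this PR page.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// RoomKeyPair is an illustrative shape; the real roomkeystore type may differ.
type RoomKeyPair struct {
	PublicKey  []byte
	PrivateKey []byte
}

// mintRoomKey generates a fresh P-256 ECDH keypair for a room.
func mintRoomKey() (RoomKeyPair, error) {
	priv, err := ecdh.P256().GenerateKey(rand.Reader)
	if err != nil {
		return RoomKeyPair{}, fmt.Errorf("generate room key: %w", err)
	}
	return RoomKeyPair{
		PublicKey:  priv.PublicKey().Bytes(), // uncompressed point, 65 bytes
		PrivateKey: priv.Bytes(),             // 32-byte scalar
	}, nil
}

func main() {
	pair, err := mintRoomKey()
	if err != nil {
		fmt.Println("mint failed:", err)
		return
	}
	fmt.Println("public key bytes:", len(pair.PublicKey))
	fmt.Println("private key bytes:", len(pair.PrivateKey))
}
```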

Room Worker Member Identity & Local Inbox Dispatch

| Layer | File(s) | Summary |
| --- | --- | --- |
| System Message Identity | room-worker/handler.go | processRemoveIndividual and processRemoveOrg now set UserID and UserAccount on system model.Message from req.Requester. |
| Local Member Addition Dispatch | room-worker/handler.go | processAddMembers computes same-site accounts, publishes InboxMemberEvent wrapped in OutboxEvent to local InboxMemberAdded subject with a :local-added dedup seed. |

Search Service Index Parameterization

| Layer | File(s) | Summary |
| --- | --- | --- |
| Config Structure | search-service/main.go | SearchConfig adds required SpotlightIndex (SPOTLIGHT_INDEX) and makes USER_ROOM_INDEX required. |
| Handler Config | search-service/handler.go | handlerConfig adds SpotlightIndex field; search uses h.cfg.SpotlightIndex instead of package constant. |
| Wiring & Tests | search-service/main.go, search-service/handler_test.go, search-service/deploy/docker-compose.yml | Handler initialized with SpotlightIndex; compose and test updated accordingly. |
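
The fail-fast behavior implied by "required" index names can be sketched as follows. This is a hand-rolled stand-in for whatever env loader the service actually uses: loadSearchConfig and the injected getenv are hypothetical, and only the variable names SPOTLIGHT_INDEX and USER_ROOM_INDEX come from the walkthrough.

```go
package main

import (
	"fmt"
	"os"
)

// SearchConfig holds the two index names that the PR makes env-driven.
type SearchConfig struct {
	SpotlightIndex string
	UserRoomIndex  string
}

// loadSearchConfig reads both index names and rejects empty values, so a
// missing variable stops startup instead of querying a nonexistent index.
// (Hypothetical loader; the service itself uses struct-tag based env parsing.)
func loadSearchConfig(getenv func(string) string) (SearchConfig, error) {
	cfg := SearchConfig{
		SpotlightIndex: getenv("SPOTLIGHT_INDEX"),
		UserRoomIndex:  getenv("USER_ROOM_INDEX"),
	}
	if cfg.SpotlightIndex == "" {
		return SearchConfig{}, fmt.Errorf("SPOTLIGHT_INDEX is required")
	}
	if cfg.UserRoomIndex == "" {
		return SearchConfig{}, fmt.Errorf("USER_ROOM_INDEX is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadSearchConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("spotlight index:", cfg.SpotlightIndex)
	fmt.Println("user-room index:", cfg.UserRoomIndex)
}
```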

Infrastructure, Development Utilities & Container Lifecycle

| Layer | File(s) | Summary |
| --- | --- | --- |
| Container Lifecycle | */deploy/docker-compose.yml (many services) | Added stop_grace_period: 2s across services (auth, broadcast-worker, history, inbox-worker, message-gatekeeper, message-worker, notification-worker, room-service, room-worker, search-service, search-sync-worker). |
| Build & Dev Targets | Makefile | .PHONY extended; up no longer uses --build; added up-rebuild, seed-users, backfill-room-keys targets. |
| Dev Scripts | docker-local/seed-users.sh, docker-local/backfill-room-keys.sh | New idempotent scripts: seed dev users (alice, bob) and backfill dev P-256 room keys into Valkey for rooms missing keys. |
| Container Configuration | docker-local/compose.deps.yaml | NATS healthcheck changed to /healthz?js-server-only=true, retries increased to 12, start_period to 15s. |
| Docker Exclude Patterns | .dockerignore | Expanded ignore list for VCS, IDE, macOS, frontend build outputs, local docker-local secrets/configs, bins, coverage/logs/tests/tmp, docs/tools. |
| Observability | pkg/otelutil/otel.go | InitTracer skips OTLP exporter when OTLP endpoint env vars are unset and returns a no-op shutdown; sets text map propagator accordingly. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RoomService as RoomService/Handler
    participant MongoDB
    participant Valkey
    participant NATS

    Client->>RoomService: CreateRoom (DM)
    RoomService->>MongoDB: Insert Room + Subscription (creator)
    RoomService->>RoomService: generate P-256 keypair (ephemeral)
    RoomService->>Valkey: RoomKeyStore.Set(roomID, keypair) [best-effort]
    RoomService->>MongoDB: Create Subscription (recipient)
    RoomService->>NATS: Publish SubscriptionUpdateEvent (transient)
    RoomService->>NATS: Publish Inbox/Outbox member_added event
    NATS-->>Client: publish ack (async)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • mliu33

"i sprouted keys beneath the moonlight,
dev-mode hums and keeps plain text in sight,
seeds and backfills danced all night,
services bowed with graceful flight,
a rabbit cheers for infra done right 🐇"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main changes: fixes for local development unblocking and dev-mode message rendering across backend services. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Joey0538 added a commit that referenced this pull request May 3, 2026
CI lint failed on PR #148 — goimports flagged broadcast-worker/main.go
after the DevMode field was added to the env-tagged config block. Run
`make fmt` to align.
@Joey0538 Joey0538 force-pushed the claude/backend-dev-fixes branch 2 times, most recently from b82086c to 83aa0dc on May 4, 2026 03:38

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
room-service/handler.go (2)

134-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject self-DMs before creating the room.

len(req.Members) == 1 still allows req.CreatedBy == req.Members[0]. That produces a one-user DM, sets UserCount to 2, and later attempts a duplicate subscription insert for the same principal.

Suggested fix
 	case model.RoomTypeDM:
 		if len(req.Members) != 1 {
 			return nil, fmt.Errorf("DM requires exactly one other member, got %d", len(req.Members))
 		}
+		if req.Members[0] == req.CreatedBy {
+			return nil, fmt.Errorf("DM requires exactly one other member")
+		}
 		roomID = idgen.BuildDMRoomID(req.CreatedBy, req.Members[0])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-service/handler.go` around lines 134 - 145, Reject attempts to create a
DM with the creator as the sole member by validating req.CreatedBy !=
req.Members[0] before proceeding; in the DM branch (where you check
len(req.Members) != 1 and call idgen.BuildDMRoomID(req.CreatedBy,
req.Members[0])), add a guard that returns an error if req.CreatedBy ==
req.Members[0] to prevent creating a one-user DM that later sets userCount = 2
and causes duplicate subscription inserts.

157-172: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail room creation when key provisioning fails.

CreateRoom commits first, and both key-generation and keyStore.Set failures are only logged. That leaves a persisted room with no usable key, which breaks encrypted delivery outside DEV_MODE until someone runs a backfill.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-service/handler.go` around lines 157 - 172, Room creation currently
commits via h.store.CreateRoom before key provisioning, and failures in
ecdh.P256().GenerateKey or h.keyStore.Set are only logged; change this so key
provisioning failures cause the overall CreateRoom to fail. Either generate the
ECDH key and call h.keyStore.Set(ctx, room.ID, roomkeystore.RoomKeyPair{...})
before calling h.store.CreateRoom, or if you must create the room first, ensure
you delete/rollback the persisted room (call the inverse store method) and
return an error when key generation or keyStore.Set fails rather than just
slogging a warning; update the CreateRoom call site and error handling around
ecdh.P256().GenerateKey and h.keyStore.Set to return fmt.Errorf("create room:
%w", err) on failure.
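
One of the two orderings proposed above (provision the key first, persist the room second) might look like the sketch below. The roomStore and keyStore interfaces, the in-memory fakes, and the createRoom signature are simplified assumptions for illustration; the real handler takes a context and richer types.

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for the room store and key store (assumed shapes).
type roomStore interface{ CreateRoom(id string) error }
type keyStore interface {
	Set(roomID string, key []byte) error
}

// createRoom provisions the key before persisting the room, so any key
// failure aborts creation and no keyless room is ever written.
func createRoom(rs roomStore, ks keyStore, mint func() ([]byte, error), roomID string) error {
	key, err := mint()
	if err != nil {
		return fmt.Errorf("create room: %w", err)
	}
	if err := ks.Set(roomID, key); err != nil {
		return fmt.Errorf("create room: %w", err)
	}
	if err := rs.CreateRoom(roomID); err != nil {
		return fmt.Errorf("create room: %w", err)
	}
	return nil
}

// In-memory fakes used to exercise both outcomes.
type memRooms struct{ ids []string }

func (m *memRooms) CreateRoom(id string) error { m.ids = append(m.ids, id); return nil }

type memKeys struct{ keys map[string][]byte }

func (m *memKeys) Set(id string, k []byte) error { m.keys[id] = k; return nil }

type failingKeys struct{}

func (failingKeys) Set(string, []byte) error { return errors.New("key store unavailable") }

func main() {
	mint := func() ([]byte, error) { return []byte("demo-key"), nil }
	rooms := &memRooms{}
	fmt.Println("with failing key store:", createRoom(rooms, failingKeys{}, mint, "r1"))
	fmt.Println("rooms persisted:", len(rooms.ids))
}
```

The trade-off of this ordering is that a later CreateRoom failure leaves an orphaned key behind; the rollback alternative mentioned in the comment avoids that at the cost of a compensating delete.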
🧹 Nitpick comments (2)
search-service/handler_test.go (1)

225-239: ⚡ Quick win

Add one non-default index test case to prove config-driven behavior.

Current assertion still matches the default constant path; a custom SpotlightIndex value would validate that searchRooms truly uses handler config rather than a hardcoded constant.

✅ Suggested test addition
+func TestHandler_SearchRooms_UsesConfiguredSpotlightIndex(t *testing.T) {
+	store := &fakeStore{
+		searchBody: json.RawMessage(`{"hits":{"total":{"value":0},"hits":[]}}`),
+	}
+	cache := newFakeCache()
+	h := newHandler(store, cache, handlerConfig{
+		DocCounts:               25,
+		MaxDocCounts:            100,
+		RestrictedRoomsCacheTTL: 5 * time.Minute,
+		RecentWindow:            365 * 24 * time.Hour,
+		SpotlightIndex:          "spotlight_site_custom",
+	})
+
+	_, err := h.searchRooms(ctxWithAccount("alice"), model.SearchRoomsRequest{SearchText: "general"})
+	require.NoError(t, err)
+	require.Len(t, store.searchCalls, 1)
+	assert.Equal(t, []string{"spotlight_site_custom"}, store.searchCalls[0].indices)
+}

As per coding guidelines "Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input — never write implementation code before its corresponding tests exist."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search-service/handler_test.go` around lines 225 - 239, Add a test variant
that configures the handler with a non-default SpotlightIndex and asserts
searchRooms uses that configured index: create a fake handler via newTestHandler
but pass a custom config where SpotlightIndex != default, invoke h.searchRooms
with the same request, then verify store.searchCalls[0].indices equals the
custom index (not the constant); update
TestHandler_SearchRooms_ScopeAllHappyPath or add a new test function to cover
this config-driven path and reference newTestHandler, searchRooms,
SpotlightIndex, and store.searchCalls in the assertions.
room-service/handler_test.go (1)

1901-1911: ⚡ Quick win

Assert the DM subscription behavior explicitly.

AnyTimes() plus “capture only the first call” means this test still passes if the recipient subscription stops being created or if create-room starts inserting extra subscriptions. Please assert the exact call count and recipient fields for the DM case instead of weakening the expectation.

As per coding guidelines, "Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input — never write implementation code before its corresponding tests exist."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-service/handler_test.go` around lines 1901 - 1911, The test currently
uses store.EXPECT().CreateSubscription(...).AnyTimes() and only captures the
first call (capturedSub), which masks missing or extra subscription creations;
change the mock to assert exact DM behavior by replacing AnyTimes() with an
explicit expectation for two CreateSubscription calls (e.g., Times(2) or two
ordered EXPECTs), capture both subscription arguments (e.g., capturedSubCreator
and capturedSubRecipient) in the DoAndReturn callback, and add assertions that
the recipient subscription has the expected recipient/user fields (and the
creator subscription still matches previous assertions) when exercising the DM
create-room path.
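
The stricter assertion requested above can be illustrated without gomock by a hand-rolled recording fake. Everything here (Subscription fields, recordingStore, createDM, checkDMSubscriptions) is a hypothetical simplification, not the room-service test code; it only shows the idea of pinning both the call count and the recipient.

```go
package main

import "fmt"

// Subscription keeps only the fields the assertion needs (illustrative shape).
type Subscription struct {
	Account string
	RoomID  string
}

// recordingStore is a hand-rolled fake that remembers every call.
type recordingStore struct{ subs []Subscription }

func (s *recordingStore) CreateSubscription(sub Subscription) error {
	s.subs = append(s.subs, sub)
	return nil
}

// createDM is a stand-in for the handler's DM path: one subscription each
// for the creator and the recipient, in that order.
func createDM(s *recordingStore, roomID, creator, recipient string) error {
	for _, account := range []string{creator, recipient} {
		if err := s.CreateSubscription(Subscription{Account: account, RoomID: roomID}); err != nil {
			return fmt.Errorf("create subscription for %s: %w", account, err)
		}
	}
	return nil
}

// checkDMSubscriptions fails on missing, extra, or misattributed subscriptions.
func checkDMSubscriptions(subs []Subscription, creator, recipient string) error {
	if len(subs) != 2 {
		return fmt.Errorf("want exactly 2 subscriptions, got %d", len(subs))
	}
	if subs[0].Account != creator {
		return fmt.Errorf("first subscription is for %q, want creator %q", subs[0].Account, creator)
	}
	if subs[1].Account != recipient {
		return fmt.Errorf("second subscription is for %q, want recipient %q", subs[1].Account, recipient)
	}
	return nil
}

func main() {
	store := &recordingStore{}
	if err := createDM(store, "dm-room", "alice", "bob"); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	fmt.Println("check result:", checkDMSubscriptions(store.subs, "alice", "bob"))
}
```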
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@room-service/handler.go`:
- Around line 190-205: The code incorrectly uses req.Members[0] (a user ID) as
an account ID when creating the DM subscription and event payload; instead, look
up the real account ID for that user and use it everywhere an account is
required. Modify the DM creation path (the block that constructs
model.Subscription/SubscriptionUser and the DM room-ID construction) to call the
user→account lookup helper (e.g., a store method like GetAccountIDByUserID or a
new helper on h) to obtain recipientAccountID, use recipientAccountID for
SubscriptionUser.Account and any Accounts event payloads, and fall back with
proper error handling/logging if the lookup fails; apply the same change to the
other occurrence around the 229-238 block to avoid reusing user IDs as account
IDs.
- Around line 233-256: The code is publishing a model.OutboxEvent payload to
subject.InboxMemberAdded but the consumer expects model.InboxMemberEvent
(causing wrong deserialization); change the publish so that when building
inboxEvt (model.InboxMemberEvent) you marshal and publish that inboxData
directly to publishToStream(ctx, subject.InboxMemberAdded(h.siteID), inboxData)
instead of wrapping it in model.OutboxEvent, or if wrapping is intended publish
to the outbox subject with model.OutboxEvent; update the branch around
InboxMemberEvent, OutboxEvent, publishToStream and subject.InboxMemberAdded to
use the correct payload/subject pair accordingly.

In `@room-worker/handler.go`:
- Around line 290-295: The remove flow is incorrectly setting Message.UserID to
the account string (req.Requester) instead of the real user ID; update the
remove-message creation (model.Message constructed with
idgen.MessageIDFromRequestID(seed, "rmindiv")) to use the request's real user ID
field (thread through and use RequesterID or equivalent) for UserID while
keeping UserAccount set to req.Requester, and apply the same change in the
org-removal branch and the other occurrence around lines 418-421 so consumers
that key off Message.UserID get the actual user ID.
- Around line 705-729: The local same-site publish currently logs failures but
swallows the error; update the block handling sameSiteAccounts (the
inboxEvt/outboxWrap/outboxData creation and payloadSeed/dedupID) so that if
h.publish(ctx, subject.InboxMemberAdded(room.SiteID), outboxData, dedupID)
returns an error you return that error from the enclosing handler (propagate the
error just like the cross-site branch) instead of only calling slog.Error,
ensuring the job will retry on NATS publish failures.

In `@search-service/deploy/docker-compose.yml`:
- Line 8: The docker-compose stop_grace_period for the search-service is too
short (2s) and will SIGKILL the process before the 25s shutdown sequence in
search-service/main.go (lines ~156-166) can drain NATS and close the metrics
listener; update stop_grace_period for this service (and the other identical 2s
entries added in this PR) to at least 30s (or 25s plus a safety buffer) so the
shutdown handler in main.go can complete gracefully.

In `@search-service/main.go`:
- Line 49: Update the SpotlightIndex config field so a missing
SPOTLIGHT_INDEX causes a startup failure: change the struct tag on
SpotlightIndex (the SpotlightIndex string field in the config struct in
search-service/main.go) to include the env tag required option (e.g.
env:"SPOTLIGHT_INDEX,required") so the env loader fails fast and returns a
non-zero exit instead of silently defaulting to an empty string.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0fc3a763-f6fa-46a8-a178-f6d2187589fb

📥 Commits

Reviewing files that changed from the base of the PR and between 68d0b0e and 83aa0dc.

📒 Files selected for processing (29)
  • .dockerignore
  • Makefile
  • auth-service/deploy/docker-compose.yml
  • broadcast-worker/deploy/docker-compose.yml
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/main.go
  • docker-local/backfill-room-keys.sh
  • docker-local/compose.deps.yaml
  • docker-local/seed-users.sh
  • history-service/deploy/docker-compose.yml
  • inbox-worker/deploy/docker-compose.yml
  • message-gatekeeper/deploy/docker-compose.yml
  • message-worker/deploy/docker-compose.yml
  • notification-worker/deploy/docker-compose.yml
  • pkg/otelutil/otel.go
  • room-service/deploy/docker-compose.yml
  • room-service/handler.go
  • room-service/handler_test.go
  • room-service/main.go
  • room-service/mock_store_test.go
  • room-service/store.go
  • room-worker/deploy/docker-compose.yml
  • room-worker/handler.go
  • search-service/deploy/docker-compose.yml
  • search-service/handler.go
  • search-service/handler_test.go
  • search-service/main.go
  • search-sync-worker/deploy/docker-compose.yml

Comment thread room-service/handler.go
Comment on lines +190 to +205
// Dev convention: account == user.ID. Prod will need a real account → ID lookup.
if req.Type == model.RoomTypeDM {
	recipientAccount := req.Members[0]
	recipSub := model.Subscription{
		ID:                 idgen.GenerateUUIDv7(),
		User:               model.SubscriptionUser{ID: recipientAccount, Account: recipientAccount},
		RoomID:             room.ID,
		RoomType:           req.Type,
		SiteID:             req.SiteID,
		Roles:              []model.Role{model.RoleMember},
		HistorySharedSince: &now,
		JoinedAt:           now,
	}
	if err := h.store.CreateSubscription(ctx, &recipSub); err != nil {
		slog.Warn("create recipient subscription failed", "error", err, "account", recipientAccount)
	}

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't reuse DM member IDs as account IDs.

This path treats req.Members[0] as a user ID when building the DM room ID, then reuses the same value as SubscriptionUser.Account and in the Accounts event payload. That only works for local-dev; in real envs where account != user ID, the recipient gets subscribed and indexed under the wrong account.

Also applies to: 229-238

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-service/handler.go` around lines 190 - 205, The code incorrectly uses
req.Members[0] (a user ID) as an account ID when creating the DM subscription
and event payload; instead, look up the real account ID for that user and use it
everywhere an account is required. Modify the DM creation path (the block that
constructs model.Subscription/SubscriptionUser and the DM room-ID construction)
to call the user→account lookup helper (e.g., a store method like
GetAccountIDByUserID or a new helper on h) to obtain recipientAccountID, use
recipientAccountID for SubscriptionUser.Account and any Accounts event payloads,
and fall back with proper error handling/logging if the lookup fails; apply the
same change to the other occurrence around the 229-238 block to avoid reusing
user IDs as account IDs.

Comment thread room-service/handler.go
Comment on lines +233 to +256
inboxEvt := model.InboxMemberEvent{
	RoomID:    room.ID,
	RoomName:  room.Name,
	RoomType:  room.Type,
	SiteID:    h.siteID,
	Accounts:  accounts,
	JoinedAt:  now.UnixMilli(),
	Timestamp: now.UnixMilli(),
}
inboxData, err := json.Marshal(inboxEvt)
if err != nil {
	slog.Warn("marshal inbox member event failed", "error", err, "roomID", room.ID)
} else {
	outboxEvt := model.OutboxEvent{
		Type:       model.OutboxMemberAdded,
		SiteID:     h.siteID,
		DestSiteID: h.siteID,
		Payload:    inboxData,
		Timestamp:  now.UnixMilli(),
	}
	if outboxData, err := json.Marshal(outboxEvt); err != nil {
		slog.Warn("marshal outbox event failed", "error", err, "roomID", room.ID)
	} else if err := h.publishToStream(ctx, subject.InboxMemberAdded(h.siteID), outboxData); err != nil {
		slog.Warn("publish owner member_added failed", "error", err, "roomID", room.ID)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Publish the inbox payload on the inbox subject.

The comment and subject both say this is a same-site INBOX member_added, but the payload is wrapped as model.OutboxEvent. If the consumer for subject.InboxMemberAdded(...) expects model.InboxMemberEvent, this will deserialize to the wrong shape and silently miss indexing.

Suggested fix if this is meant to be a direct inbox publish
-			outboxEvt := model.OutboxEvent{
-				Type:       model.OutboxMemberAdded,
-				SiteID:     h.siteID,
-				DestSiteID: h.siteID,
-				Payload:    inboxData,
-				Timestamp:  now.UnixMilli(),
-			}
-			if outboxData, err := json.Marshal(outboxEvt); err != nil {
-				slog.Warn("marshal outbox event failed", "error", err, "roomID", room.ID)
-			} else if err := h.publishToStream(ctx, subject.InboxMemberAdded(h.siteID), outboxData); err != nil {
+			if err := h.publishToStream(ctx, subject.InboxMemberAdded(h.siteID), inboxData); err != nil {
 				slog.Warn("publish owner member_added failed", "error", err, "roomID", room.ID)
-			}
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-service/handler.go` around lines 233 - 256, The code is publishing a
model.OutboxEvent payload to subject.InboxMemberAdded but the consumer expects
model.InboxMemberEvent (causing wrong deserialization); change the publish so
that when building inboxEvt (model.InboxMemberEvent) you marshal and publish
that inboxData directly to publishToStream(ctx,
subject.InboxMemberAdded(h.siteID), inboxData) instead of wrapping it in
model.OutboxEvent, or if wrapping is intended publish to the outbox subject with
model.OutboxEvent; update the branch around InboxMemberEvent, OutboxEvent,
publishToStream and subject.InboxMemberAdded to use the correct payload/subject
pair accordingly.
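
To make the failure mode in this comment concrete, here is a minimal sketch of what happens when an envelope is decoded straight into the inner event type. The struct fields and JSON tags below are assumptions for illustration; the real `model.InboxMemberEvent` and `model.OutboxEvent` definitions are not shown in this PR excerpt. Whether this applies depends on what the consumer of `subject.InboxMemberAdded(...)` actually decodes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InboxMemberEvent is a hypothetical stand-in for model.InboxMemberEvent.
type InboxMemberEvent struct {
	RoomID   string   `json:"roomId"`
	Accounts []string `json:"accounts"`
}

// OutboxEvent is a hypothetical stand-in for model.OutboxEvent.
type OutboxEvent struct {
	Type       string          `json:"type"`
	SiteID     string          `json:"siteId"`
	DestSiteID string          `json:"destSiteId"`
	Payload    json.RawMessage `json:"payload"`
	Timestamp  int64           `json:"timestamp"`
}

// wrap builds envelope bytes the way the handler under review does.
func wrap(inner InboxMemberEvent) ([]byte, error) {
	payload, err := json.Marshal(inner)
	if err != nil {
		return nil, err
	}
	return json.Marshal(OutboxEvent{
		Type:       "member_added",
		SiteID:     "site-local",
		DestSiteID: "site-local",
		Payload:    payload,
		Timestamp:  1,
	})
}

// decodeDirect models a consumer that unmarshals the body straight into
// the inner event type, with no envelope handling.
func decodeDirect(data []byte) (InboxMemberEvent, error) {
	var evt InboxMemberEvent
	err := json.Unmarshal(data, &evt)
	return evt, err
}

// decodeEnvelope models a consumer that unwraps the envelope first.
func decodeEnvelope(data []byte) (InboxMemberEvent, error) {
	var env OutboxEvent
	if err := json.Unmarshal(data, &env); err != nil {
		return InboxMemberEvent{}, err
	}
	var evt InboxMemberEvent
	err := json.Unmarshal(env.Payload, &evt)
	return evt, err
}

func main() {
	data, err := wrap(InboxMemberEvent{RoomID: "r1", Accounts: []string{"alice"}})
	if err != nil {
		panic(err)
	}
	direct, err := decodeDirect(data)
	fmt.Printf("direct:    %+v err=%v\n", direct, err)
	unwrapped, err := decodeEnvelope(data)
	fmt.Printf("unwrapped: %+v err=%v\n", unwrapped, err)
}
```

`encoding/json` ignores unknown keys by default, so the direct decode returns no error and an empty event. That is the "silently miss indexing" risk described above; a consumer that unwraps the envelope is unaffected.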

Comment thread room-worker/handler.go
Comment on lines +290 to +295
+	// UserID == UserAccount under dev convention; prod needs real account → ID lookup.
 	sysMsg := model.Message{
-		ID:         idgen.MessageIDFromRequestID(seed, "rmindiv"),
-		RoomID:     req.RoomID,
-		Type:       evtType,
-		SysMsgData: sysMsgData,
-		CreatedAt:  now,
+		ID:          idgen.MessageIDFromRequestID(seed, "rmindiv"),
+		RoomID:      req.RoomID,
+		UserID:      req.Requester,
+		UserAccount: req.Requester,

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Populate UserID with the requester’s real user ID, not the account.

req.Requester on the remove flow is the account, so these assignments now write the account into both Message.UserID and Message.UserAccount. That makes remove-system-message sender metadata inconsistent with the add-members path and incorrect anywhere consumers key off UserID.

Suggested fix
+ requester, err := h.store.GetUser(ctx, req.Requester)
+ if err != nil {
+ 	return fmt.Errorf("get requester user: %w", err)
+ }
+
  sysMsg := model.Message{
  	ID:          idgen.MessageIDFromRequestID(seed, "rmindiv"),
  	RoomID:      req.RoomID,
- 	UserID:      req.Requester,
- 	UserAccount: req.Requester,
+ 	UserID:      requester.ID,
+ 	UserAccount: requester.Account,
  	Type:        evtType,
  	SysMsgData:  sysMsgData,
  	CreatedAt:   now,
  }

Apply the same fix to the org-removal branch, or thread RequesterID through the remove request like the add-members path already does.

Also applies to: 418-421

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-worker/handler.go` around lines 290 - 295, The remove flow is
incorrectly setting Message.UserID to the account string (req.Requester) instead
of the real user ID; update the remove-message creation (model.Message
constructed with idgen.MessageIDFromRequestID(seed, "rmindiv")) to use the
request's real user ID field (thread through and use RequesterID or equivalent)
for UserID while keeping UserAccount set to req.Requester, and apply the same
change in the org-removal branch and the other occurrence around lines 418-421
so consumers that key off Message.UserID get the actual user ID.
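
As a rough illustration of the lookup this comment asks for, the sketch below resolves the requester's account to a user record before building the message. `UserStore`, `GetUserByAccount`, and the trimmed-down `User` and `Message` types are hypothetical; the real store interface in room-worker may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// User and Message are hypothetical, reduced versions of the model types.
type User struct {
	ID      string
	Account string
}

type Message struct {
	UserID      string
	UserAccount string
}

// UserStore is an assumed lookup interface, not the project's real one.
type UserStore interface {
	GetUserByAccount(account string) (User, error)
}

var errUserNotFound = errors.New("user not found")

// memStore is an in-memory UserStore keyed by account.
type memStore map[string]User

func (m memStore) GetUserByAccount(account string) (User, error) {
	u, ok := m[account]
	if !ok {
		return User{}, errUserNotFound
	}
	return u, nil
}

// buildRemoveSysMsg fills sender metadata from the resolved user, so UserID
// carries the real ID even when it differs from the account.
func buildRemoveSysMsg(store UserStore, requesterAccount string) (Message, error) {
	u, err := store.GetUserByAccount(requesterAccount)
	if err != nil {
		return Message{}, fmt.Errorf("get requester user: %w", err)
	}
	return Message{UserID: u.ID, UserAccount: u.Account}, nil
}

func main() {
	store := memStore{"alice": {ID: "u-123", Account: "alice"}}
	msg, err := buildRemoveSysMsg(store, "alice")
	fmt.Printf("%+v err=%v\n", msg, err)
}
```

Under the dev convention (account equals ID) this produces the same values as the current code, so the lookup only changes behavior once the two diverge.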

Comment thread room-worker/handler.go
    build:
      context: ../..
      dockerfile: search-service/deploy/Dockerfile
    stop_grace_period: 2s

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Give the service enough time to finish its 25s shutdown path.

search-service/main.go:156-166 intentionally reserves up to 25 seconds to drain NATS and close the metrics listener. With stop_grace_period: 2s, docker compose stop/up-rebuild will SIGKILL the process long before that cleanup can finish. The same concern applies to the other stop_grace_period: 2s additions in this PR.

Suggested fix
-    stop_grace_period: 2s
+    stop_grace_period: 30s
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search-service/deploy/docker-compose.yml` at line 8, The docker-compose
stop_grace_period for the search-service is too short (2s) and will SIGKILL the
process before the 25s shutdown sequence in search-service/main.go (lines
~156-166) can drain NATS and close the metrics listener; update
stop_grace_period for this service (and the other identical 2s entries added in
this PR) to at least 30s (or 25s plus a safety buffer) so the shutdown handler
in main.go can complete gracefully.

Comment thread search-service/main.go Outdated
@Joey0538 force-pushed the claude/backend-dev-fixes branch from 83aa0dc to 0db4e6a on May 4, 2026 04:44

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
room-service/handler.go (1)

164-175: 💤 Low value

Best-effort key minting may silently drop all subsequent messages.

Per the relevant code snippet (broadcast-worker/handler.go:109-123), if the room key is missing at broadcast time, messages are dropped permanently with only a warning log. Since key generation is best-effort here, a failure leaves the room in a state where all messages will be silently discarded.

For local dev this is likely acceptable (failures are rare and the warning surfaces the issue), but for production you may want to either:

  • Fail room creation if key storage fails, or
  • Elevate the log level from Warn to Error so it's more visible
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@room-service/handler.go` around lines 164 - 175, The current best-effort key
minting in the room creation path can leave a room without a key (see h.keyStore
check, ecdh.P256().GenerateKey, and h.keyStore.Set for room.ID), which causes
broadcast-worker to drop messages silently; update the handler to either return
an error from the room creation request when key generation or h.keyStore.Set
fails (make the function propagate the error) or change the warning logs to
errors so failures are surfaced (replace slog.Warn with slog.Error and include
the error and room.ID) — pick one approach and apply it consistently to the
GenerateKey and Set error branches.
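
A minimal sketch of the "fail room creation" option follows. It assumes a narrow `RoomKeyStore` interface with a `Set` method whose signature is invented here for illustration; only `ecdh.P256().GenerateKey` is taken from the review text.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"errors"
	"fmt"
)

// RoomKeyStore is a hypothetical version of the narrow key-store interface.
type RoomKeyStore interface {
	Set(roomID string, priv, pub []byte) error
}

// memKeyStore is an in-memory store that can be told to fail.
type memKeyStore struct {
	priv map[string][]byte
	fail bool
}

func (s *memKeyStore) Set(roomID string, priv, pub []byte) error {
	if s.fail {
		return errors.New("key store unavailable")
	}
	s.priv[roomID] = priv
	return nil
}

// mintRoomKey generates a P-256 keypair and propagates any failure instead
// of logging and continuing, so the caller can abort room creation.
func mintRoomKey(store RoomKeyStore, roomID string) error {
	key, err := ecdh.P256().GenerateKey(rand.Reader)
	if err != nil {
		return fmt.Errorf("generate room key: %w", err)
	}
	if err := store.Set(roomID, key.Bytes(), key.PublicKey().Bytes()); err != nil {
		return fmt.Errorf("store room key for %s: %w", roomID, err)
	}
	return nil
}

func main() {
	ok := &memKeyStore{priv: map[string][]byte{}}
	fmt.Println(mintRoomKey(ok, "room-1"), len(ok.priv["room-1"]))

	broken := &memKeyStore{priv: map[string][]byte{}, fail: true}
	fmt.Println(mintRoomKey(broken, "room-2"))
}
```

Propagating the error trades availability for consistency: room creation fails when the key store is down, but no room can exist in a state where every broadcast is skipped.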
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@room-service/handler.go`:
- Around line 211-225: The subscription update is only sent for the creator
(sub/ subEvt) so DM recipients miss the UI update; hoist or expose the recipient
subscription (recipSub) created inside the DM branch and publish a second
SubscriptionUpdateEvent for that recipient using the same pattern (marshal a
SubscriptionUpdateEvent with UserID set to recipSub.User.ID and call
h.publishEvent with subject.SubscriptionUpdate(recipSub.User.ID)), ensuring you
handle json.Marshal and h.publishEvent errors the same way as for subEvt.
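
The inline comment above amounts to publishing one update per subscriber instead of one for the creator only. A possible shape, with a hypothetical event struct, subject format, and publish callback (none of these are the project's real definitions):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SubscriptionUpdateEvent is a hypothetical, reduced version of the event.
type SubscriptionUpdateEvent struct {
	UserID string `json:"userId"`
	RoomID string `json:"roomId"`
	Action string `json:"action"`
}

// publishFunc stands in for h.publishEvent.
type publishFunc func(subject string, data []byte) error

// subscriptionUpdateSubject is an assumed subject layout, used only so the
// example has something to publish on.
func subscriptionUpdateSubject(userID string) string {
	return "chat.user." + userID + ".subscription.update"
}

// notifySubscribers publishes an "added" update to every listed user.
// It is best-effort: failures are collected and returned, not fatal.
func notifySubscribers(publish publishFunc, roomID string, userIDs ...string) []error {
	var errs []error
	for _, id := range userIDs {
		data, err := json.Marshal(SubscriptionUpdateEvent{UserID: id, RoomID: roomID, Action: "added"})
		if err != nil {
			errs = append(errs, fmt.Errorf("marshal update for %s: %w", id, err))
			continue
		}
		if err := publish(subscriptionUpdateSubject(id), data); err != nil {
			errs = append(errs, fmt.Errorf("publish update for %s: %w", id, err))
		}
	}
	return errs
}

func main() {
	var subjects []string
	record := func(subject string, _ []byte) error {
		subjects = append(subjects, subject)
		return nil
	}
	// For a DM, pass both the creator and the recipient.
	errs := notifySubscribers(record, "dm-1", "alice", "bob")
	fmt.Println(subjects, errs)
}
```

For channel rooms the same helper would be called with the creator alone, which keeps the creator-only and DM paths on one code path.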


📥 Commits

Reviewing files that changed from the base of the PR and between 83aa0dc and 0db4e6a.

📒 Files selected for processing (14)
  • broadcast-worker/deploy/docker-compose.yml
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/main.go
  • room-service/handler.go
  • room-service/handler_test.go
  • room-service/main.go
  • room-service/mock_store_test.go
  • room-service/store.go
  • room-worker/handler.go
  • search-service/deploy/docker-compose.yml
  • search-service/handler.go
  • search-service/handler_test.go
  • search-service/main.go
✅ Files skipped from review due to trivial changes (5)
  • broadcast-worker/deploy/docker-compose.yml
  • room-service/mock_store_test.go
  • search-service/handler.go
  • search-service/handler_test.go
  • search-service/deploy/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (6)
  • room-service/main.go
  • broadcast-worker/main.go
  • broadcast-worker/handler.go
  • room-worker/handler.go
  • room-service/handler_test.go
  • broadcast-worker/handler_test.go

Comment thread room-service/handler.go
Joey0538 added 4 commits May 4, 2026 04:50
…ber_added

Four changes to handleCreateRoom — none of which existed before — so
that newly-created rooms are immediately functional end-to-end:

- Mint a P-256 keypair in Valkey via h.keyStore.Set after CreateRoom.
  Without this, broadcast-worker fails the encrypt step ("no current
  key") and JetStream redelivers forever. Extends the narrow
  RoomKeyStore interface with Set; nil-tolerated for tests.
- DMs now persist a second Subscription for req.Members[0] and bump
  Room.UserCount to 2. Without this, the recipient logs in and every
  read path hits "not subscribed to room". Dev convention is account
  == user.ID, so req.Members[0] doubles for both fields; prod will
  need a real account → user.ID lookup.
- Best-effort core-NATS publish of SubscriptionUpdateEvent{Action:
  "added"} via a new WithEventPublisher hook so the creator's
  frontend sees the room appear without a refresh. Mirrors how
  room-worker emits the event for member-add / role-update.
- Best-effort INBOX same-site OutboxEvent{member_added} for the new
  subscription(s) so search-sync-worker's spotlight + user-room
  collections index the auto-enrolled accounts. Wire format matches
  PR #145's spec; HSS=nil keeps the bulk unrestricted.

Two related changes so channel events stop wedging the consumer and
render in local dev:

- On keyStore.Get returning nil for a room, log a warning and return
  nil so the caller acks. Old keyless rooms (created before
  room-service mint-on-create) previously errored, the consumer loop
  called Nak, JetStream redelivered, and the worker spammed logs
  forever. Cassandra still has the message via message-worker.
- New DEV_MODE config (env DEV_MODE, default false) keeps evt.Message
  populated alongside the encrypted payload on channel events so a
  frontend without client-side decryption can still render. MUST stay
  false in prod — bundles plaintext alongside the E2E payload.
  DEV_MODE=true wired in the deploy compose for local; startup
  slog.Warn on boot when on so it can't slip into prod silently.
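
The two behaviors described in this commit message can be sketched as a single function. Everything below is illustrative: `ChannelEvent`, `getKeyFunc`, `encryptFunc`, and `placeholderSeal` are hypothetical names, and the placeholder is deliberately not real encryption.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// ChannelEvent is a hypothetical stand-in for the broadcast event.
type ChannelEvent struct {
	RoomID    string
	Message   string // plaintext; populated only when devMode is true
	Encrypted []byte
}

// getKeyFunc and encryptFunc stand in for keyStore.Get and the encrypt step.
type getKeyFunc func(roomID string) ([]byte, error)
type encryptFunc func(key []byte, plaintext string) ([]byte, error)

// buildChannelEvent returns (nil, nil) for a keyless room so the caller acks
// the message instead of returning an error that leads to Nak and redelivery.
func buildChannelEvent(getKey getKeyFunc, encrypt encryptFunc, roomID, plaintext string, devMode bool) (*ChannelEvent, error) {
	key, err := getKey(roomID)
	if err != nil {
		return nil, fmt.Errorf("get room key: %w", err)
	}
	if key == nil {
		slog.Warn("no room key, skipping broadcast", "roomID", roomID)
		return nil, nil
	}
	sealed, err := encrypt(key, plaintext)
	if err != nil {
		return nil, fmt.Errorf("encrypt message: %w", err)
	}
	evt := &ChannelEvent{RoomID: roomID, Encrypted: sealed}
	if devMode {
		// Dev only: keep plaintext next to the encrypted payload.
		evt.Message = plaintext
	}
	return evt, nil
}

// placeholderSeal is NOT encryption; it only lets the example run.
func placeholderSeal(key []byte, plaintext string) ([]byte, error) {
	if len(key) == 0 {
		return nil, errors.New("empty key")
	}
	return []byte("sealed:" + plaintext), nil
}

func main() {
	keys := map[string][]byte{"keyed": {1, 2, 3}}
	getKey := func(roomID string) ([]byte, error) { return keys[roomID], nil }

	evt, err := buildChannelEvent(getKey, placeholderSeal, "keyless", "hi", false)
	fmt.Println(evt == nil, err)

	evt, err = buildChannelEvent(getKey, placeholderSeal, "keyed", "hi", true)
	fmt.Println(evt.Message, err)
}
```

Returning `nil, nil` is what distinguishes "nothing to broadcast, ack it" from a transient failure that should be retried.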

Both index names were hardcoded constants ("user-room", "spotlight")
that don't match what search-sync-worker writes
(user-room-{siteID}, spotlight-{siteID}-v1-chat). The httpAdapter's
ignore_unavailable=true masked the mismatch — every query silently
returned zero hits. End-user symptom: search returns nothing for any
account, any term.

Plumb USER_ROOM_INDEX (already partially wired) + new SPOTLIGHT_INDEX
through SearchConfig → handlerConfig → searchRooms. Pin both env vars
in the deploy compose; prod uses ops/IaC-owned aliases.
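
One way to picture the env-driven configuration is a small loader with an injectable lookup function. The env var names and the index name patterns come from the commit message; deriving the fallbacks from the site ID is an assumption made for this sketch, and `loadSearchConfig` is a hypothetical helper rather than the service's actual config code.

```go
package main

import (
	"fmt"
	"os"
)

// SearchConfig here holds only the two index names discussed above.
type SearchConfig struct {
	UserRoomIndex  string
	SpotlightIndex string
}

// lookupFunc matches the signature of os.LookupEnv so tests can inject a map.
type lookupFunc func(key string) (string, bool)

// valueOr returns the env value when set and non-empty, else the fallback.
func valueOr(lookup lookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// loadSearchConfig reads both index names from the environment. The
// site-suffixed fallbacks are an assumption, not the service's behavior.
func loadSearchConfig(lookup lookupFunc, siteID string) SearchConfig {
	return SearchConfig{
		UserRoomIndex:  valueOr(lookup, "USER_ROOM_INDEX", "user-room-"+siteID),
		SpotlightIndex: valueOr(lookup, "SPOTLIGHT_INDEX", "spotlight-"+siteID+"-v1-chat"),
	}
}

func main() {
	cfg := loadSearchConfig(os.LookupEnv, "site-local")
	fmt.Printf("%+v\n", cfg)
}
```

Because `ignore_unavailable=true` hides a wrong index name behind empty results, logging the resolved names at startup is a cheap way to make a mismatch visible.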

Two member-event fixes:

- processAddMembers now publishes a same-site
  OutboxEvent{member_added} on chat.inbox.{siteID}.member_added for
  the local subset of accounts (cross-site accounts keep going through
  OUTBOX unchanged). Implements the add-members slice of PR #145's
  spec; same wire format as room-service's room-create owner publish,
  so search-sync-worker's parseMemberEvent accepts both.

- processRemoveIndividual + processRemoveOrg system messages now
  populate UserID/UserAccount from req.Requester. Prior code left
  these blank, so message-worker logged "user not found for system
  message" on every member-remove and the chat history rendered
  the entry as "Unknown". Dev convention: account == _id. Prod
  needs a real account → user.ID lookup upstream.

Remove-individual / remove-org INBOX publishes from #145's spec are
still TODO; only the add-member slice is closed here.