fix(backend): local-dev unblock + dev-mode message rendering across services#148
fix(backend): local-dev unblock + dev-mode message rendering across services#148Joey0538 wants to merge 6 commits into
Conversation
- NATS healthcheck uses /healthz?js-server-only=true so a fresh JetStream volume doesn't 503; bump start_period for slower disks. - Root .dockerignore so service builds don't tar the whole repo. - Split `make up` (no rebuild) from `make up-rebuild`. - stop_grace_period: 2s on every service compose (was 10s × 11 ≈ 110s on `make down`). - `make seed-users` + docker-local/seed-users.sh: idempotent fixtures (alice, bob) so dev-auth users have a `users` row. - `make backfill-room-keys` + docker-local/backfill-room-keys.sh: mint Valkey keys for rooms created before mint-on-create.
InitTracer now no-ops (returns the SDK noop provider) unless OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is set. Prevents local dev from flooding logs with "traces export: connection refused" against 127.0.0.1:4317 when no collector is running. Prod/staging configure the env via deployment.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (14)
📝 WalkthroughWalkthroughAdds dev tools and scripts, introduces broadcast-worker DEV_MODE and keyless-room behavior, generates/persists room ECDH keys at room creation, enhances member add/remove messaging with identity and local INBOX dispatch, parameterizes search indices, tightens OTEL initialization, expands .dockerignore, and standardizes stop_grace_period across services. ChangesBroadcast Worker Dev Mode & Keyless Room Handling
Room Service Encryption Key Generation & DM Member Management
Room Worker Member Identity & Local Inbox Dispatch
Search Service Index Parameterization
Infrastructure, Development Utilities & Container Lifecycle
Sequence Diagram(s)sequenceDiagram
participant Client
participant RoomService as RoomService/Handler
participant MongoDB
participant Valkey
participant NATS
Client->>RoomService: CreateRoom (DM)
RoomService->>MongoDB: Insert Room + Subscription (creator)
RoomService->>RoomService: generate P-256 keypair (ephemeral)
RoomService->>Valkey: RoomKeyStore.Set(roomID, keypair) [best-effort]
RoomService->>MongoDB: Create Subscription (recipient)
RoomService->>NATS: Publish SubscriptionUpdateEvent (transient)
RoomService->>NATS: Publish Inbox/Outbox member_added event
NATS-->>Client: publish ack (async)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 53 minutes and 37 seconds.Comment |
CI lint failed on PR #148 — goimports flagged broadcast-worker/main.go after the DevMode field was added to the env-tagged config block. Run `make fmt` to align.
b82086c to
83aa0dc
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
room-service/handler.go (2)
134-145:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject self-DMs before creating the room.
len(req.Members) == 1still allowsreq.CreatedBy == req.Members[0]. That produces a one-user DM, setsUserCountto2, and later attempts a duplicate subscription insert for the same principal.Suggested fix
case model.RoomTypeDM: if len(req.Members) != 1 { return nil, fmt.Errorf("DM requires exactly one other member, got %d", len(req.Members)) } + if req.Members[0] == req.CreatedBy { + return nil, fmt.Errorf("DM requires exactly one other member") + } roomID = idgen.BuildDMRoomID(req.CreatedBy, req.Members[0])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@room-service/handler.go` around lines 134 - 145, Reject attempts to create a DM with the creator as the sole member by validating req.CreatedBy != req.Members[0] before proceeding; in the DM branch (where you check len(req.Members) != 1 and call idgen.BuildDMRoomID(req.CreatedBy, req.Members[0])), add a guard that returns an error if req.CreatedBy == req.Members[0] to prevent creating a one-user DM that later sets userCount = 2 and causes duplicate subscription inserts.
157-172:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail room creation when key provisioning fails.
CreateRoomcommits first, and both key-generation andkeyStore.Setfailures are only logged. That leaves a persisted room with no usable key, which breaks encrypted delivery outside DEV_MODE until someone runs a backfill.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@room-service/handler.go` around lines 157 - 172, Room creation currently commits via h.store.CreateRoom before key provisioning, and failures in ecdh.P256().GenerateKey or h.keyStore.Set are only logged; change this so key provisioning failures cause the overall CreateRoom to fail. Either generate the ECDH key and call h.keyStore.Set(ctx, room.ID, roomkeystore.RoomKeyPair{...}) before calling h.store.CreateRoom, or if you must create the room first, ensure you delete/rollback the persisted room (call the inverse store method) and return an error when key generation or keyStore.Set fails rather than just slogging a warning; update the CreateRoom call site and error handling around ecdh.P256().GenerateKey and h.keyStore.Set to return fmt.Errorf("create room: %w", err) on failure.
🧹 Nitpick comments (2)
search-service/handler_test.go (1)
225-239: ⚡ Quick winAdd one non-default index test case to prove config-driven behavior.
Current assertion still matches the default constant path; a custom
SpotlightIndexvalue would validate thatsearchRoomstruly uses handler config rather than a hardcoded constant.✅ Suggested test addition
+func TestHandler_SearchRooms_UsesConfiguredSpotlightIndex(t *testing.T) { + store := &fakeStore{ + searchBody: json.RawMessage(`{"hits":{"total":{"value":0},"hits":[]}}`), + } + cache := newFakeCache() + h := newHandler(store, cache, handlerConfig{ + DocCounts: 25, + MaxDocCounts: 100, + RestrictedRoomsCacheTTL: 5 * time.Minute, + RecentWindow: 365 * 24 * time.Hour, + SpotlightIndex: "spotlight_site_custom", + }) + + _, err := h.searchRooms(ctxWithAccount("alice"), model.SearchRoomsRequest{SearchText: "general"}) + require.NoError(t, err) + require.Len(t, store.searchCalls, 1) + assert.Equal(t, []string{"spotlight_site_custom"}, store.searchCalls[0].indices) +}As per coding guidelines "Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input — never write implementation code before its corresponding tests exist."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@search-service/handler_test.go` around lines 225 - 239, Add a test variant that configures the handler with a non-default SpotlightIndex and asserts searchRooms uses that configured index: create a fake handler via newTestHandler but pass a custom config where SpotlightIndex != default, invoke h.searchRooms with the same request, then verify store.searchCalls[0].indices equals the custom index (not the constant); update TestHandler_SearchRooms_ScopeAllHappyPath or add a new test function to cover this config-driven path and reference newTestHandler, searchRooms, SpotlightIndex, and store.searchCalls in the assertions.room-service/handler_test.go (1)
1901-1911: ⚡ Quick winAssert the DM subscription behavior explicitly.
AnyTimes()plus “capture only the first call” means this test still passes if the recipient subscription stops being created or if create-room starts inserting extra subscriptions. Please assert the exact call count and recipient fields for the DM case instead of weakening the expectation.As per coding guidelines, "Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input — never write implementation code before its corresponding tests exist."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@room-service/handler_test.go` around lines 1901 - 1911, The test currently uses store.EXPECT().CreateSubscription(...).AnyTimes() and only captures the first call (capturedSub), which masks missing or extra subscription creations; change the mock to assert exact DM behavior by replacing AnyTimes() with an explicit expectation for two CreateSubscription calls (e.g., Times(2) or two ordered EXPECTs), capture both subscription arguments (e.g., capturedSubCreator and capturedSubRecipient) in the DoAndReturn callback, and add assertions that the recipient subscription has the expected recipient/user fields (and the creator subscription still matches previous assertions) when exercising the DM create-room path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@room-service/handler.go`:
- Around line 190-205: The code incorrectly uses req.Members[0] (a user ID) as
an account ID when creating the DM subscription and event payload; instead, look
up the real account ID for that user and use it everywhere an account is
required. Modify the DM creation path (the block that constructs
model.Subscription/SubscriptionUser and the DM room-ID construction) to call the
user→account lookup helper (e.g., a store method like GetAccountIDByUserID or a
new helper on h) to obtain recipientAccountID, use recipientAccountID for
SubscriptionUser.Account and any Accounts event payloads, and fall back with
proper error handling/logging if the lookup fails; apply the same change to the
other occurrence around the 229-238 block to avoid reusing user IDs as account
IDs.
- Around line 233-256: The code is publishing a model.OutboxEvent payload to
subject.InboxMemberAdded but the consumer expects model.InboxMemberEvent
(causing wrong deserialization); change the publish so that when building
inboxEvt (model.InboxMemberEvent) you marshal and publish that inboxData
directly to publishToStream(ctx, subject.InboxMemberAdded(h.siteID), inboxData)
instead of wrapping it in model.OutboxEvent, or if wrapping is intended publish
to the outbox subject with model.OutboxEvent; update the branch around
InboxMemberEvent, OutboxEvent, publishToStream and subject.InboxMemberAdded to
use the correct payload/subject pair accordingly.
In `@room-worker/handler.go`:
- Around line 290-295: The remove flow is incorrectly setting Message.UserID to
the account string (req.Requester) instead of the real user ID; update the
remove-message creation (model.Message constructed with
idgen.MessageIDFromRequestID(seed, "rmindiv")) to use the request's real user ID
field (thread through and use RequesterID or equivalent) for UserID while
keeping UserAccount set to req.Requester, and apply the same change in the
org-removal branch and the other occurrence around lines 418-421 so consumers
that key off Message.UserID get the actual user ID.
- Around line 705-729: The local same-site publish currently logs failures but
swallows the error; update the block handling sameSiteAccounts (the
inboxEvt/outboxWrap/outboxData creation and payloadSeed/dedupID) so that if
h.publish(ctx, subject.InboxMemberAdded(room.SiteID), outboxData, dedupID)
returns an error you return that error from the enclosing handler (propagate the
error just like the cross-site branch) instead of only calling slog.Error,
ensuring the job will retry on NATS publish failures.
In `@search-service/deploy/docker-compose.yml`:
- Line 8: The docker-compose stop_grace_period for the search-service is too
short (2s) and will SIGKILL the process before the 25s shutdown sequence in
search-service/main.go (lines ~156-166) can drain NATS and close the metrics
listener; update stop_grace_period for this service (and the other identical 2s
entries added in this PR) to at least 30s (or 25s plus a safety buffer) so the
shutdown handler in main.go can complete gracefully.
In `@search-service/main.go`:
- Line 49: Update the SpotlightIndex config field so missing
SEARCH_SPOTLIGHT_INDEX causes a startup failure: change the struct tag on
SpotlightIndex (the SpotlightIndex string field in the config struct in
search-service/main.go) to include the env tag required option (e.g.
env:"SPOTLIGHT_INDEX,required") so the env loader fails fast and returns a
non-zero exit instead of silently defaulting to an empty string.
---
Outside diff comments:
In `@room-service/handler.go`:
- Around line 134-145: Reject attempts to create a DM with the creator as the
sole member by validating req.CreatedBy != req.Members[0] before proceeding; in
the DM branch (where you check len(req.Members) != 1 and call
idgen.BuildDMRoomID(req.CreatedBy, req.Members[0])), add a guard that returns an
error if req.CreatedBy == req.Members[0] to prevent creating a one-user DM that
later sets userCount = 2 and causes duplicate subscription inserts.
- Around line 157-172: Room creation currently commits via h.store.CreateRoom
before key provisioning, and failures in ecdh.P256().GenerateKey or
h.keyStore.Set are only logged; change this so key provisioning failures cause
the overall CreateRoom to fail. Either generate the ECDH key and call
h.keyStore.Set(ctx, room.ID, roomkeystore.RoomKeyPair{...}) before calling
h.store.CreateRoom, or if you must create the room first, ensure you
delete/rollback the persisted room (call the inverse store method) and return an
error when key generation or keyStore.Set fails rather than just slogging a
warning; update the CreateRoom call site and error handling around
ecdh.P256().GenerateKey and h.keyStore.Set to return fmt.Errorf("create room:
%w", err) on failure.
---
Nitpick comments:
In `@room-service/handler_test.go`:
- Around line 1901-1911: The test currently uses
store.EXPECT().CreateSubscription(...).AnyTimes() and only captures the first
call (capturedSub), which masks missing or extra subscription creations; change
the mock to assert exact DM behavior by replacing AnyTimes() with an explicit
expectation for two CreateSubscription calls (e.g., Times(2) or two ordered
EXPECTs), capture both subscription arguments (e.g., capturedSubCreator and
capturedSubRecipient) in the DoAndReturn callback, and add assertions that the
recipient subscription has the expected recipient/user fields (and the creator
subscription still matches previous assertions) when exercising the DM
create-room path.
In `@search-service/handler_test.go`:
- Around line 225-239: Add a test variant that configures the handler with a
non-default SpotlightIndex and asserts searchRooms uses that configured index:
create a fake handler via newTestHandler but pass a custom config where
SpotlightIndex != default, invoke h.searchRooms with the same request, then
verify store.searchCalls[0].indices equals the custom index (not the constant);
update TestHandler_SearchRooms_ScopeAllHappyPath or add a new test function to
cover this config-driven path and reference newTestHandler, searchRooms,
SpotlightIndex, and store.searchCalls in the assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0fc3a763-f6fa-46a8-a178-f6d2187589fb
📒 Files selected for processing (29)
.dockerignoreMakefileauth-service/deploy/docker-compose.ymlbroadcast-worker/deploy/docker-compose.ymlbroadcast-worker/handler.gobroadcast-worker/handler_test.gobroadcast-worker/main.godocker-local/backfill-room-keys.shdocker-local/compose.deps.yamldocker-local/seed-users.shhistory-service/deploy/docker-compose.ymlinbox-worker/deploy/docker-compose.ymlmessage-gatekeeper/deploy/docker-compose.ymlmessage-worker/deploy/docker-compose.ymlnotification-worker/deploy/docker-compose.ymlpkg/otelutil/otel.goroom-service/deploy/docker-compose.ymlroom-service/handler.goroom-service/handler_test.goroom-service/main.goroom-service/mock_store_test.goroom-service/store.goroom-worker/deploy/docker-compose.ymlroom-worker/handler.gosearch-service/deploy/docker-compose.ymlsearch-service/handler.gosearch-service/handler_test.gosearch-service/main.gosearch-sync-worker/deploy/docker-compose.yml
| // Dev convention: account == user.ID. Prod will need a real account → ID lookup. | ||
| if req.Type == model.RoomTypeDM { | ||
| recipientAccount := req.Members[0] | ||
| recipSub := model.Subscription{ | ||
| ID: idgen.GenerateUUIDv7(), | ||
| User: model.SubscriptionUser{ID: recipientAccount, Account: recipientAccount}, | ||
| RoomID: room.ID, | ||
| RoomType: req.Type, | ||
| SiteID: req.SiteID, | ||
| Roles: []model.Role{model.RoleMember}, | ||
| HistorySharedSince: &now, | ||
| JoinedAt: now, | ||
| } | ||
| if err := h.store.CreateSubscription(ctx, &recipSub); err != nil { | ||
| slog.Warn("create recipient subscription failed", "error", err, "account", recipientAccount) | ||
| } |
There was a problem hiding this comment.
Don't reuse DM member IDs as account IDs.
This path treats req.Members[0] as a user ID when building the DM room ID, then reuses the same value as SubscriptionUser.Account and in the Accounts event payload. That only works for local-dev; in real envs where account != user ID, the recipient gets subscribed and indexed under the wrong account.
Also applies to: 229-238
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@room-service/handler.go` around lines 190 - 205, The code incorrectly uses
req.Members[0] (a user ID) as an account ID when creating the DM subscription
and event payload; instead, look up the real account ID for that user and use it
everywhere an account is required. Modify the DM creation path (the block that
constructs model.Subscription/SubscriptionUser and the DM room-ID construction)
to call the user→account lookup helper (e.g., a store method like
GetAccountIDByUserID or a new helper on h) to obtain recipientAccountID, use
recipientAccountID for SubscriptionUser.Account and any Accounts event payloads,
and fall back with proper error handling/logging if the lookup fails; apply the
same change to the other occurrence around the 229-238 block to avoid reusing
user IDs as account IDs.
| inboxEvt := model.InboxMemberEvent{ | ||
| RoomID: room.ID, | ||
| RoomName: room.Name, | ||
| RoomType: room.Type, | ||
| SiteID: h.siteID, | ||
| Accounts: accounts, | ||
| JoinedAt: now.UnixMilli(), | ||
| Timestamp: now.UnixMilli(), | ||
| } | ||
| inboxData, err := json.Marshal(inboxEvt) | ||
| if err != nil { | ||
| slog.Warn("marshal inbox member event failed", "error", err, "roomID", room.ID) | ||
| } else { | ||
| outboxEvt := model.OutboxEvent{ | ||
| Type: model.OutboxMemberAdded, | ||
| SiteID: h.siteID, | ||
| DestSiteID: h.siteID, | ||
| Payload: inboxData, | ||
| Timestamp: now.UnixMilli(), | ||
| } | ||
| if outboxData, err := json.Marshal(outboxEvt); err != nil { | ||
| slog.Warn("marshal outbox event failed", "error", err, "roomID", room.ID) | ||
| } else if err := h.publishToStream(ctx, subject.InboxMemberAdded(h.siteID), outboxData); err != nil { | ||
| slog.Warn("publish owner member_added failed", "error", err, "roomID", room.ID) |
There was a problem hiding this comment.
Publish the inbox payload on the inbox subject.
The comment and subject both say this is a same-site INBOX member_added, but the payload is wrapped as model.OutboxEvent. If the consumer for subject.InboxMemberAdded(...) expects model.InboxMemberEvent, this will deserialize to the wrong shape and silently miss indexing.
Suggested fix if this is meant to be a direct inbox publish
- outboxEvt := model.OutboxEvent{
- Type: model.OutboxMemberAdded,
- SiteID: h.siteID,
- DestSiteID: h.siteID,
- Payload: inboxData,
- Timestamp: now.UnixMilli(),
- }
- if outboxData, err := json.Marshal(outboxEvt); err != nil {
- slog.Warn("marshal outbox event failed", "error", err, "roomID", room.ID)
- } else if err := h.publishToStream(ctx, subject.InboxMemberAdded(h.siteID), outboxData); err != nil {
+ if err := h.publishToStream(ctx, subject.InboxMemberAdded(h.siteID), inboxData); err != nil {
slog.Warn("publish owner member_added failed", "error", err, "roomID", room.ID)
- }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@room-service/handler.go` around lines 233 - 256, The code is publishing a
model.OutboxEvent payload to subject.InboxMemberAdded but the consumer expects
model.InboxMemberEvent (causing wrong deserialization); change the publish so
that when building inboxEvt (model.InboxMemberEvent) you marshal and publish
that inboxData directly to publishToStream(ctx,
subject.InboxMemberAdded(h.siteID), inboxData) instead of wrapping it in
model.OutboxEvent, or if wrapping is intended publish to the outbox subject with
model.OutboxEvent; update the branch around InboxMemberEvent, OutboxEvent,
publishToStream and subject.InboxMemberAdded to use the correct payload/subject
pair accordingly.
| // UserID == UserAccount under dev convention; prod needs real account → ID lookup. | ||
| sysMsg := model.Message{ | ||
| ID: idgen.MessageIDFromRequestID(seed, "rmindiv"), | ||
| RoomID: req.RoomID, | ||
| Type: evtType, | ||
| SysMsgData: sysMsgData, | ||
| CreatedAt: now, | ||
| ID: idgen.MessageIDFromRequestID(seed, "rmindiv"), | ||
| RoomID: req.RoomID, | ||
| UserID: req.Requester, | ||
| UserAccount: req.Requester, |
There was a problem hiding this comment.
Populate UserID with the requester’s real user ID, not the account.
req.Requester on the remove flow is the account, so these assignments now write the account into both Message.UserID and Message.UserAccount. That makes remove-system-message sender metadata inconsistent with the add-members path and incorrect anywhere consumers key off UserID.
Suggested fix
+ requester, err := h.store.GetUser(ctx, req.Requester)
+ if err != nil {
+ return fmt.Errorf("get requester user: %w", err)
+ }
+
sysMsg := model.Message{
ID: idgen.MessageIDFromRequestID(seed, "rmindiv"),
RoomID: req.RoomID,
- UserID: req.Requester,
- UserAccount: req.Requester,
+ UserID: requester.ID,
+ UserAccount: requester.Account,
Type: evtType,
SysMsgData: sysMsgData,
CreatedAt: now,
}Apply the same fix to the org-removal branch, or thread RequesterID through the remove request like the add-members path already does.
Also applies to: 418-421
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@room-worker/handler.go` around lines 290 - 295, The remove flow is
incorrectly setting Message.UserID to the account string (req.Requester) instead
of the real user ID; update the remove-message creation (model.Message
constructed with idgen.MessageIDFromRequestID(seed, "rmindiv")) to use the
request's real user ID field (thread through and use RequesterID or equivalent)
for UserID while keeping UserAccount set to req.Requester, and apply the same
change in the org-removal branch and the other occurrence around lines 418-421
so consumers that key off Message.UserID get the actual user ID.
| build: | ||
| context: ../.. | ||
| dockerfile: search-service/deploy/Dockerfile | ||
| stop_grace_period: 2s |
There was a problem hiding this comment.
Give the service enough time to finish its 25s shutdown path.
search-service/main.go:156-166 intentionally reserves up to 25 seconds to drain NATS and close the metrics listener. With stop_grace_period: 2s, docker compose stop/up-rebuild will SIGKILL the process long before that cleanup can finish. The same concern applies to the other stop_grace_period: 2s additions in this PR.
Suggested fix
- stop_grace_period: 2s
+ stop_grace_period: 30s📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| stop_grace_period: 2s | |
| stop_grace_period: 30s |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@search-service/deploy/docker-compose.yml` at line 8, The docker-compose
stop_grace_period for the search-service is too short (2s) and will SIGKILL the
process before the 25s shutdown sequence in search-service/main.go (lines
~156-166) can drain NATS and close the metrics listener; update
stop_grace_period for this service (and the other identical 2s entries added in
this PR) to at least 30s (or 25s plus a safety buffer) so the shutdown handler
in main.go can complete gracefully.
83aa0dc to
0db4e6a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
room-service/handler.go (1)
164-175: 💤 Low valueBest-effort key minting may silently drop all subsequent messages.
Per the relevant code snippet (broadcast-worker/handler.go:109-123), if the room key is missing at broadcast time, messages are dropped permanently with only a warning log. Since key generation is best-effort here, a failure leaves the room in a state where all messages will be silently discarded.
For local dev this is likely acceptable (failures are rare and the warning surfaces the issue), but for production you may want to either:
- Fail room creation if key storage fails, or
- Elevate the log level from
WarntoErrorso it's more visible🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@room-service/handler.go` around lines 164 - 175, The current best-effort key minting in the room creation path can leave a room without a key (see h.keyStore check, ecdh.P256().GenerateKey, and h.keyStore.Set for room.ID), which causes broadcast-worker to drop messages silently; update the handler to either return an error from the room creation request when key generation or h.keyStore.Set fails (make the function propagate the error) or change the warning logs to errors so failures are surfaced (replace slog.Warn with slog.Error and include the error and room.ID) — pick one approach and apply it consistently to the GenerateKey and Set error branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@room-service/handler.go`:
- Around line 211-225: The subscription update is only sent for the creator
(sub/ subEvt) so DM recipients miss the UI update; hoist or expose the recipient
subscription (recipSub) created inside the DM branch and publish a second
SubscriptionUpdateEvent for that recipient using the same pattern (marshal a
SubscriptionUpdateEvent with UserID set to recipSub.User.ID and call
h.publishEvent with subject.SubscriptionUpdate(recipSub.User.ID)), ensuring you
handle json.Marshal and h.publishEvent errors the same way as for subEvt.
---
Nitpick comments:
In `@room-service/handler.go`:
- Around line 164-175: The current best-effort key minting in the room creation
path can leave a room without a key (see h.keyStore check,
ecdh.P256().GenerateKey, and h.keyStore.Set for room.ID), which causes
broadcast-worker to drop messages silently; update the handler to either return
an error from the room creation request when key generation or h.keyStore.Set
fails (make the function propagate the error) or change the warning logs to
errors so failures are surfaced (replace slog.Warn with slog.Error and include
the error and room.ID) — pick one approach and apply it consistently to the
GenerateKey and Set error branches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 18b73bea-fce8-4149-89c2-29b958bba809
📒 Files selected for processing (14)
broadcast-worker/deploy/docker-compose.ymlbroadcast-worker/handler.gobroadcast-worker/handler_test.gobroadcast-worker/main.goroom-service/handler.goroom-service/handler_test.goroom-service/main.goroom-service/mock_store_test.goroom-service/store.goroom-worker/handler.gosearch-service/deploy/docker-compose.ymlsearch-service/handler.gosearch-service/handler_test.gosearch-service/main.go
✅ Files skipped from review due to trivial changes (5)
- broadcast-worker/deploy/docker-compose.yml
- room-service/mock_store_test.go
- search-service/handler.go
- search-service/handler_test.go
- search-service/deploy/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (6)
- room-service/main.go
- broadcast-worker/main.go
- broadcast-worker/handler.go
- room-worker/handler.go
- room-service/handler_test.go
- broadcast-worker/handler_test.go
…ber_added
Four changes to handleCreateRoom — none of which existed before — so
that newly-created rooms are immediately functional end-to-end:
- Mint a P-256 keypair in Valkey via h.keyStore.Set after CreateRoom.
Without this, broadcast-worker fails the encrypt step ("no current
key") and JetStream redelivers forever. Extends the narrow
RoomKeyStore interface with Set; nil-tolerated for tests.
- DMs now persist a second Subscription for req.Members[0] and bump
Room.UserCount to 2. Without this, the recipient logs in and every
read path hits "not subscribed to room". Dev convention is account
== user.ID, so req.Members[0] doubles for both fields; prod will
need a real account → user.ID lookup.
- Best-effort core-NATS publish of SubscriptionUpdateEvent{Action:
"added"} via a new WithEventPublisher hook so the creator's
frontend sees the room appear without a refresh. Mirrors how
room-worker emits the event for member-add / role-update.
- Best-effort INBOX same-site OutboxEvent{member_added} for the new
subscription(s) so search-sync-worker's spotlight + user-room
collections index the auto-enrolled accounts. Wire format matches
PR #145's spec; HSS=nil keeps the bulk unrestricted.
Two related changes so channel events stop wedging the consumer and render in local dev: - On keyStore.Get returning nil for a room, log a warning and return nil so the caller acks. Old keyless rooms (created before room-service mint-on-create) previously errored, the consumer loop called Nak, JetStream redelivered, and the worker spammed logs forever. Cassandra still has the message via message-worker. - New DEV_MODE config (env DEV_MODE, default false) keeps evt.Message populated alongside the encrypted payload on channel events so a frontend without client-side decryption can still render. MUST stay false in prod — bundles plaintext alongside the E2E payload. DEV_MODE=true wired in the deploy compose for local; startup slog.Warn on boot when on so it can't slip into prod silently.
Both index names were hardcoded constants ("user-room", "spotlight")
that don't match what search-sync-worker writes
(user-room-{siteID}, spotlight-{siteID}-v1-chat). The httpAdapter's
ignore_unavailable=true masked the mismatch — every query silently
returned zero hits. End-user symptom: search returns nothing for any
account, any term.
Plumb USER_ROOM_INDEX (already partially wired) + new SPOTLIGHT_INDEX
through SearchConfig → handlerConfig → searchRooms. Pin both env vars
in the deploy compose; prod uses ops/IaC-owned aliases.
Two member-event fixes:
- processAddMembers now publishes a same-site
OutboxEvent{member_added} on chat.inbox.{siteID}.member_added for
the local subset of accounts (cross-site keep going through OUTBOX
unchanged). Implements the add-members slice of PR #145's spec;
same wire format as room-service's room-create owner publish so
search-sync-worker's parseMemberEvent accepts both.
- processRemoveIndividual + processRemoveOrg system messages now
populate UserID/UserAccount from req.Requester. Prior code left
these blank, so message-worker logged "user not found for system
message" on every member-remove and the chat history rendered
the entry as "Unknown". Dev convention: account == _id. Prod
needs a real account → user.ID lookup upstream.
Remove-individual / remove-org INBOX publishes from #145's spec are
still TODO; only the add-member slice is closed here.
0db4e6a to
cf2c31a
Compare
Summary
Six service-grouped commits, all backend, that unblock the local-dev stack and close enough gaps in the message-delivery pipeline that an end-to-end channel + DM flow works against
make upfrom a fresh checkout. Companion to PR #146 (frontend); each PR is independently mergeable.Commits (6, oldest → newest)
chore(local-dev): unblock + speed up dev stack.dockerignore, splitup/up-rebuild,stop_grace_period: 2son every service compose,make seed-users+make backfill-room-keysfix(otelutil): skip OTLP tracer init when no endpoint env is settraces export: connection refusedlog spam in local dev when no collector runsfix(room-service): mint room key, enroll owner+DM recipient, emit member_addedsubscription.update+ INBOXmember_addedfor spotlight/user-room indexingfix(broadcast-worker): no-key ack-skip + DEV_MODE plaintextDEV_MODEenv keeps plaintext alongside encrypted payload for no-crypto local frontendsfix(search-service): env-driven user-room + spotlight index namesfix(room-worker): publish member_added on add + sysMsg sender on removeWhat this unblocks
After
make deps-up && make up && make seed-users, you can:alice/bob(dev mode,siteId=site-local)Notes
DEV_MODE=trueon broadcast-worker is wired in the local compose with a startupslog.Warnand a comment that says it MUST stay false in prod. Pin'd index names target the site-suffixed concrete indexes; prod uses ops-owned aliases.room-workeronly closes the add-members slice of PR docs(spec): federated room origin-site MV fix design #145's spec. Remove-individual / remove-org INBOX publishes from docs(spec): federated room origin-site MV fix design #145 stay TODO.chore(local-dev)includes seed/backfill scripts underdocker-local/— purely dev fixtures.Test plan
make lint && make testcleanmake deps-up && make up && make seed-users; create + send + search + member ops as alice/bobtraces exportlog spam without an OTLP collectorno current keynak-loop on a backfilled keyless roomSummary by CodeRabbit
New Features
Bug Fixes
Chores
Chores (Observability)