Skip to content

feat(room-worker): sync server-to-server DM creation endpoint + drop HistoryConfig.SharedSince#164

Merged
vjauhari-work merged 35 commits into
mainfrom
claude/sync-dm-endpoint
May 8, 2026
Merged

feat(room-worker): sync server-to-server DM creation endpoint + drop HistoryConfig.SharedSince#164
vjauhari-work merged 35 commits into
mainfrom
claude/sync-dm-endpoint

Conversation

@vjauhari-work
Copy link
Copy Markdown
Collaborator

@vjauhari-work vjauhari-work commented May 8, 2026

Summary

Adds a synchronous server-to-server NATS request/reply endpoint hosted by room-worker that creates a DM/botDM Room + 2 Subscriptions, emits the same events the async path emits, and replies inline with the requester's Subscription. Also removes the unused HistoryConfig.SharedSince override field.

Spec: docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md
Plans: docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part1.md, …-part2.md

What's new

  • NATS subject: chat.server.request.room.{siteID}.create.dm (queue group room-worker, server-cred-scoped) — pkg/subject.RoomCreateDMSync.
  • Wire types: model.SyncCreateDMRequest / model.SyncCreateDMReply.
  • Handler: handleSyncCreateDM in room-worker/handler.go (sync DM section near the bottom). Resolves users via GetUser, persists Room + 2 subs (idempotent under retry via Mongo dup-key), fans out subscription.update, emits cross-site room_created outbox via the existing outboxDedupID helper, replies with the canonical persisted requester sub.
  • Refactor: buildDMSubs / buildBotDMSubs extracted from processCreateRoomDM / processCreateRoomBotDM and shared with the new sync handler. No async behavior change.
  • Part 2: HistoryConfig.SharedSince field removed; historySharedSincePtr collapsed to a single fallback branch (Mode != HistoryModeNone → nil; otherwise req.Timestamp).

Implementation deviations from spec

Captured in the spec's "Implementation deviations from spec" section. Briefly:

Spec said Shipped
Call ReconcileMemberCounts; log on error Skip; set Room.UserCount/AppCount inline (DM=2,0; botDM=1,1). DMs have a fixed roster — no future membership change to self-heal.
Outbox publish: log and continue Fail the request. Sync RPC has no auto-retry; silent federation break otherwise.
FindDMSubscription only on dup-key fallback Always re-read after BulkCreateSubscriptions (the store swallows dup-key, in-memory sub may diverge from persisted).
errUserLookupFailed on any GetUser error Only on ErrUserNotFound; transient errors wrapped with %w err"internal error".
errRoomIDCollision surfaces verbatim Masked as "internal error"; mismatch details logged with roomID + requestID.
New file handler_sync_create_dm.go Code merged into room-worker/handler.go; tests in handler_test.go.

Out-of-spec fix in this PR

room-service/store_mongo.go partial-unique DM dedup index used roomType: {$in:…} in partialFilterExpression, which MongoDB rejects on most server versions. Split into two $eq-filtered partial indexes — one per room type — preserving the uniqueness contract.

Operator note: if any environment has a stale u_account_name_roomtype_dm_idx from a partial prior deploy, drop it manually before redeploying — the new code creates an index with that name but a different filter spec.

Test plan

  • make lint — clean
  • make test (all unit tests, -race) — every package PASS
  • go test -tags integration -race ./room-worker/... — PASS (4 sync DM integration tests + existing room-worker integration suite, on Mongo testcontainer)
  • Smoke test against staging once user-service caller PR lands
  • NATS callout policy: permit chat.server.request.room.> on room-worker (ops/IaC change, separate)

Out of scope

  • user-service caller (separate PR) — implements the validation contract: existence checks, EngName/ChineseName checks, bot Assistant.Enabled, self-DM rejection, dedup-existing.
  • NATS callout policy update for chat.server.request.room.> permission.

Generated by Claude Code

Summary by CodeRabbit

  • New Features

    • Added a synchronous server-to-server DM creation endpoint with request/reply behavior and idempotent deduplication.
  • Improvements

    • Removed the SharedSince history field; history sharing is now derived from subscription acceptance time.
    • More reliable DM creation: subscription.update fan-out and cross-site room-created outbox are produced consistently.
  • Documentation

    • Added detailed implementation and rollout plans.
  • Tests

    • Added unit and integration tests for sync DM flows, retries, idempotency, and outbox publishing.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

Warning

Rate limit exceeded

@vjauhari-work has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 35 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e41dad4a-038a-4c33-8f81-6cdfde6d3a0d

📥 Commits

Reviewing files that changed from the base of the PR and between 2189a5b and 480bab1.

📒 Files selected for processing (3)
  • docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md
  • room-worker/handler.go
  • room-worker/handler_test.go
📝 Walkthrough

Walkthrough

This PR adds a synchronous NATS request/reply DM/botDM creation endpoint in room-worker, removes HistoryConfig.SharedSince, adds SyncCreateDM request/reply types and a subject builder, extends the subscription store and Mongo indexes, refactors subscription construction and history handling, wires the NATS handler in main, and adds unit and integration tests.

Changes

Sync DM Endpoint & History Config Overhaul

Layer / File(s) Summary
Design & Planning
docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md, docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part*.md
Two-part plan: Part 1 sync DM request/reply contract and handler pipeline; Part 2 remove HistoryConfig.SharedSince and update handler/tests.
Data Models & Tests
pkg/model/member.go, pkg/model/model_test.go
Removed HistoryConfig.SharedSince; added SyncCreateDMRequest and SyncCreateDMReply; JSON round-trip tests added (plus a stray numeric literal in tests).
Subject Builder
pkg/subject/subject.go, pkg/subject/subject_test.go
Added RoomCreateDMSync(siteID) subject builder returning chat.server.request.room.{siteID}.create.dm with unit test.
Store Interface
room-worker/store.go
Added SubscriptionStore.FindDMSubscription(ctx, account, targetName) for canonical requester-sub lookup.
Mock Regeneration
room-worker/mock_store_test.go
GoMock mock updated with FindDMSubscription and recorder methods.
Store & Index Implementation
room-worker/store_mongo.go, room-service/store_mongo.go
Implemented MongoStore.FindDMSubscription; replaced a single $in partial-unique index with two per-room-type partial unique indexes (DM and BotDM).
DM Subscription Refactoring
room-worker/handler.go
Extracted buildDMSubs and buildBotDMSubs; changed historySharedSincePtr to ignore removed SharedSince and validate/log timestamps.
Sync DM Handler
room-worker/handler.go
Added sentinel errors, sanitizeSyncDMError, handleSyncCreateDM orchestration, and natsServerCreateDM wrapper enforcing X-Request-ID, idempotent create/reuse, subscription insertion + canonical re-read, subscription.update publishes, and cross-site outbox emission (failing on outbox publish).
NATS Wiring
room-worker/main.go
QueueSubscribe to subject.RoomCreateDMSync(cfg.SiteID) with handler.natsServerCreateDM; process exits on subscription failure.
Handler Unit Tests
room-worker/handler_test.go
Extensive tests for error sanitization, validation, user lookup, collision mismatch handling, persistence and canonical re-read (FindDMSubscription), publish fan-out, outbox behavior, transient errors, idempotency, and history fallback cases.
Handler Integration Tests
room-worker/integration_test.go
Integration tests verifying DM/BotDM persistence (room + 2 subs), per-user subscription.update publishes, cross-site outbox for BotDM, retry idempotency with fixed request ID, and outbox payload/msgID convergence.
MinIO Test Formatting
pkg/minioutil/minio_integration_test.go
Refactored local test payload struct declarations from single-line to multi-line; no behavior changes.
History-service mongorepo
history-service/internal/mongorepo/room.go, history-service/internal/service/service.go
Switched to mongoutil.Collection and updated projection helper import for compile-time assertion.
Test Images
pkg/testutil/testimages/testimages.go
Pin Mongo integration image to mongo:4.4.15.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hmchangw/chat#41: Related history/shared-history and handler/store changes affecting shared-history fields and related logic.

Suggested reviewers

  • mliu33
  • Joey0538

"🐰 I hopped through subjects and subs so neat,
Two subscriptions formed where DM paths meet.
SharedSince is gone, history finds its way,
Outbox hops cross-site to brighten the day. ✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the two main changes: adding a sync DM endpoint in room-worker and removing HistoryConfig.SharedSince, matching the substantial implementation across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/sync-dm-endpoint

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@vjauhari-work vjauhari-work force-pushed the claude/sync-dm-endpoint branch from 604af0d to 7c08d56 Compare May 8, 2026 04:40
@vjauhari-work
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
room-service/store_mongo.go (1)

83-115: Operator action required before deploying: drop the stale prior partial index.

The PR description notes that if the old $in-based partial index exists, it must be manually dropped. EnsureIndexes creates the two new $eq-split indexes (idempotent when names match), but MongoDB will keep any old index with a different name or spec silently, which means the prior uniqueness guarantee could remain in effect under the wrong key until the old index is removed.

// Drop the prior $in-based partial index (run on each shard/replica set primary before rolling deploy):
db.subscriptions.dropIndex("<old_index_name>")

Verify the stale index name against your current production index list before running.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@room-service/store_mongo.go` around lines 83 - 115, Before deploying,
manually drop the old $in-based partial index that would conflict with the new
split indexes created by the dmDedupIndexes logic (the CreateMany call on
s.subscriptions.Indexes()); verify the existing index name in your production
subscriptions index list and run a dropIndex for that stale index on each
shard/replica set primary so MongoDB does not silently keep the old uniqueness
rule—then proceed with the deploy that creates the new indexes named
"u_account_name_roomtype_dm_idx" and "u_account_name_roomtype_botdm_idx".
room-worker/handler.go (1)

1403-1417: ⚡ Quick win

Use natsutil.ReplyJSON for the success reply, not m.Msg.Respond

The coding guidelines mandate natsutil.ReplyJSON for all NATS success responses. Line 1414 bypasses this by calling m.Msg.Respond(reply) directly on pre-marshaled bytes. The idiomatic fix is to change handleSyncCreateDM's return type from ([]byte, error) to (*model.SyncCreateDMReply, error) and let natsServerCreateDM drive serialisation.

As per coding guidelines: "Use natsutil.ReplyJSON for NATS success responses and natsutil.ReplyError for errors".

♻️ Proposed refactor
-func (h *Handler) handleSyncCreateDM(ctx context.Context, data []byte) ([]byte, error) {
+func (h *Handler) handleSyncCreateDM(ctx context.Context, data []byte) (*model.SyncCreateDMReply, error) {
     ...
-    reply, err := json.Marshal(model.SyncCreateDMReply{
-        Success:      true,
-        Subscription: *requesterSub,
-    })
-    if err != nil {
-        return nil, fmt.Errorf("marshal reply: %w", err)
-    }
-    return reply, nil
+    return &model.SyncCreateDMReply{Success: true, Subscription: *requesterSub}, nil
 }

 func (h *Handler) natsServerCreateDM(m otelnats.Msg) {
     ctx := natsutil.ContextWithRequestIDFromHeaders(m.Context(), m.Msg.Header)
     reply, err := h.handleSyncCreateDM(ctx, m.Msg.Data)
     if err != nil {
         slog.Error("sync DM: handler failed",
             "error", err, "subject", m.Msg.Subject,
             "requestID", natsutil.RequestIDFromContext(ctx))
         natsutil.ReplyError(m.Msg, sanitizeSyncDMError(err))
         return
     }
-    if err := m.Msg.Respond(reply); err != nil {
+    if err := natsutil.ReplyJSON(m.Msg, reply); err != nil {
         slog.Error("sync DM: reply failed", "error", err, "subject", m.Msg.Subject)
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@room-worker/handler.go` around lines 1403 - 1417, The handler currently calls
m.Msg.Respond with pre-marshaled bytes; change handleSyncCreateDM to return
(*model.SyncCreateDMReply, error) instead of ([]byte, error), update its
callers, and let natsServerCreateDM perform JSON serialization by calling
natsutil.ReplyJSON(ctxOrMsg, replyStruct) on success (keep using
natsutil.ReplyError and sanitizeSyncDMError for errors); update imports/types as
needed and ensure error logging around natsutil.ReplyJSON mirrors the existing
slog.Error behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part2.md`:
- Around line 22-26: The markdown fenced code block containing the grep commands
(the block that searches for "HistoryConfig{" and "History.SharedSince" /
"\"sharedSince\"") lacks a language tag and triggers markdownlint MD040; fix it
by updating the opening fence to include a language identifier (e.g., change the
opening "```" to "```bash") so the block is marked as bash and the linter
warning is resolved.

In `@docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md`:
- Around line 47-52: The markdown fenced code blocks for the topology diagram
(the block containing "user-service ──validates──> NATS req/reply ──>
room-worker ...") and the handler pipeline pseudocode (the block starting with
"1. Read X-Request-ID from ctx ...") are missing language specifiers and causing
MD040; add a language specifier "text" to both opening backtick fences (e.g.,
replace ``` with ```text) so markdownlint stops flagging them, leaving the block
contents unchanged.

---

Nitpick comments:
In `@room-service/store_mongo.go`:
- Around line 83-115: Before deploying, manually drop the old $in-based partial
index that would conflict with the new split indexes created by the
dmDedupIndexes logic (the CreateMany call on s.subscriptions.Indexes()); verify
the existing index name in your production subscriptions index list and run a
dropIndex for that stale index on each shard/replica set primary so MongoDB does
not silently keep the old uniqueness rule—then proceed with the deploy that
creates the new indexes named "u_account_name_roomtype_dm_idx" and
"u_account_name_roomtype_botdm_idx".

In `@room-worker/handler.go`:
- Around line 1403-1417: The handler currently calls m.Msg.Respond with
pre-marshaled bytes; change handleSyncCreateDM to return
(*model.SyncCreateDMReply, error) instead of ([]byte, error), update its
callers, and let natsServerCreateDM perform JSON serialization by calling
natsutil.ReplyJSON(ctxOrMsg, replyStruct) on success (keep using
natsutil.ReplyError and sanitizeSyncDMError for errors); update imports/types as
needed and ensure error logging around natsutil.ReplyJSON mirrors the existing
slog.Error behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b5506ebc-60a1-4ddf-9146-0783d43f9d9a

📥 Commits

Reviewing files that changed from the base of the PR and between 2940eec and 604af0d.

📒 Files selected for processing (16)
  • docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part1.md
  • docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part2.md
  • docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md
  • pkg/minioutil/minio_integration_test.go
  • pkg/model/member.go
  • pkg/model/model_test.go
  • pkg/subject/subject.go
  • pkg/subject/subject_test.go
  • room-service/store_mongo.go
  • room-worker/handler.go
  • room-worker/handler_test.go
  • room-worker/integration_test.go
  • room-worker/main.go
  • room-worker/mock_store_test.go
  • room-worker/store.go
  • room-worker/store_mongo.go

Comment thread docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part2.md Outdated
Comment thread docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md Outdated
@vjauhari-work
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

vjauhari-work pushed a commit that referenced this pull request May 8, 2026
PR #161 (1807b75) introduced room.go and a usage of mongorepo.RoomRepo
in service.go but missed:
- Adding the pkg/mongoutil import in room.go (Collection, NewCollection,
  WithProjection are unqualified there, which compiles only with stricter
  go vet on newer golangci-lint).
- Adding the mongorepo import in service.go for the var _ assertion.

Lint v2.11.4 (CI version) catches both as typecheck failures; older lint
versions silently passed. Apply the import fixes so the rebased PR #164
passes CI.

Includes a rebase onto latest main.
@vjauhari-work vjauhari-work force-pushed the claude/sync-dm-endpoint branch from 948a946 to bd440c6 Compare May 8, 2026 05:50
Comment thread room-worker/handler.go
}

// buildBotDMSubs returns the two botDM subs (human IsSubscribed=true, bot IsSubscribed=false).
func buildBotDMSubs(requester, bot *model.User, room *model.Room, acceptedAt time.Time) []*model.Subscription {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future, we will probably need the other way around where requester is bot in the case that bot is sending msg directly to new user

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this part in next PR.

Comment thread room-worker/handler.go Outdated
claude added 19 commits May 8, 2026 07:14
…nfig.SharedSince removal

Two related-but-distinct cleanups in a single design document:

Part 1 — Sync server-to-server DM creation. New NATS request/reply
endpoint chat.server.request.room.{siteID}.create.dm hosted by
room-worker (server credentials, queue group room-worker). The endpoint
is intentionally persistence-only:

  - Caller-side validation contract: user-service performs all
    data-integrity checks (existence, EngName/ChineseName, bot
    Assistant.Enabled, self-DM, dedup-existing via FindDMSubscription)
    before issuing the request.
  - room-worker performs only request-shape sanity checks, resolves
    requester/other User records via GetUser for data (User.ID,
    User.SiteID), persists Room + 2 subs, emits subscription.update
    events and cross-site room_created outbox event, and replies with
    the requester's Subscription.
  - Idempotent against retries and concurrent races via duplicate-key
    fallback at insert time.

Request shape (3 fields):
  type SyncCreateDMRequest struct {
      RoomType         RoomType `json:"roomType"`
      RequesterAccount string   `json:"requesterAccount"`
      OtherAccount     string   `json:"otherAccount"`
  }

Reply shape:
  type SyncCreateDMReply struct {
      Success      bool               `json:"success"`
      Subscription model.Subscription `json:"subscription"`
  }

Errors via standard natsutil.ReplyError -> model.ErrorResponse. No new
shared package; small internal buildDMSubs / buildBotDMSubs extractions
inside room-worker. room-service is not touched on this path.

Part 2 — Remove HistoryConfig.SharedSince. HistoryConfig now carries
only Mode. Subscription.HistorySharedSince is derived exclusively from
acceptedAt (the request-acceptance time stamped by room-service) when
HistoryMode == "none". historySharedSincePtr signature reduces to
(mode HistoryMode, acceptedAt time.Time) *int64.

https://claude.ai/code/session_015HZFYkTm4gPbZmq2NYRUpb
Part 1 specifies the synchronous DM-create endpoint. Part 2 removes the
unused HistoryConfig.SharedSince override field to simplify the add-member
request schema before downstream code stabilizes around it.

https://claude.ai/code/session_015HZFYkTm4gPbZmq2NYRUpb
claude added 14 commits May 8, 2026 07:15
- B1: stop wrapping the errUserLookupFailed sentinel; wrap the real store
  error so transient mongo failures aren't masked as "user not found".
- B2: always re-read the canonical persisted sub via FindDMSubscription
  after BulkCreateSubscriptions. The store swallows duplicate-key races,
  so the in-memory sub may carry stale fields on retry.
- F1: bubble outbox publish errors to the caller. Federation can't recover
  silently — if the cross-site room_created emit fails, the request must
  fail so the caller can retry.
- F2: use the existing outboxDedupID helper for consistency with the
  async outbox call sites.
- Q4: mask errRoomIDCollision as "internal error" on the wire; log details
  server-side. Implementation detail no longer leaks to callers.
- Q5: add bson tags to SyncCreateDMRequest/SyncCreateDMReply per project rule.
- Q9: log natsServerCreateDM errors structurally with subject + requestID
  so operators can diagnose without scraping reply errors.
- Q6: rename test variable cap → capture (avoid shadowing builtin).
- Q7: extract newSyncDMTestHandler helper, mirroring newAddMembersTestHandler.

Q1+Q2 (share helpers/error types between sync and async) deferred —
async uses errPermanent/sanitizeAsyncJobError for JetStream Ack/Nak; sync
uses named sentinels for stable RPC error codes. Different contracts on
purpose; merging would worsen one path. Async path code untouched.

Q8 (drop dm-prefix on test types) deferred — collision with the type of
the same name in integration_test.go. Rename in unit-test file alone
isn't useful; renaming both would touch async-path tests.

Q3 (split handleSyncCreateDM) deferred — ~80 lines doing one logical op,
already covered by per-step tests; split would just shuffle parameters.
- Merge handler_sync_create_dm.go into handler.go per request to colocate
  the feature with the rest of the handler logic. Production file deleted.
- T3: per-sub field assertions in DM_PersistsSubs test (alice + bob subs
  named correctly, IsSubscribed false, RoomType set).
- T4: table-driven RoomCollisionMismatch covering Type/SiteID/Name/CreatedBy.
- T6: marshalReq helper replaces ignored json.Marshal errors.
- T7: federation-convergence integration test asserts outbox RoomID matches
  the deterministic BuildDMRoomID and that replay produces identical
  Nats-Msg-Id (stream dedup blocks duplicate cross-site emits).
- Q7: extract newSyncDMTestHandler to mirror newAddMembersTestHandler.

T5 (t.Parallel()) deferred — handler_test.go has 0 uses; adding only here
would diverge from existing convention. Q1+Q2 (share helpers/error types
with async path) deferred — different contracts; user instructed no async
refactor.
MongoDB's partialFilterExpression rejects $in on most server versions
(only $eq/$exists/$gt/$lt/$type/$and are universally supported).
The single combined index using {roomType: {$in: [dm, botDM]}} fails on
startup with: "unsupported expression in partial index: roomType $in".

Replace with two equivalent indexes — one filtered to {$eq: "dm"}, one
to {$eq: "botDM"} — both with the same (u.account, name, roomType)
unique key. Net effect on the create-room contract is unchanged.

Operators with the broken index from a prior deploy must drop
u_account_name_roomtype_dm_idx manually before redeploying — the new
code creates a fresh index with that name (filtered to $eq dm) and a
new u_account_name_roomtype_botdm_idx.
For sync DM creation, we know the membership shape at creation time:
- DM     → UserCount=2, AppCount=0
- botDM  → UserCount=1, AppCount=1

Set these directly on the inserted Room and drop the ReconcileMemberCounts
call on the sync path. Rationale: DMs/botDMs have a fixed roster — there
will never be a future add/remove to trigger a Reconcile pass — so the
spec's "Reconcile failure self-heals on the next membership change"
reasoning doesn't apply. A failed Reconcile would leave the Room with
counts (0, 0) forever; clients would see "0 members" on a 2-person DM.

Setting counts inline removes a Mongo round-trip per call, eliminates
the failure mode, and reduces handler complexity. Async path (channels)
is unchanged — Reconcile is still called via finishCreateRoom because
channels do see future membership changes.

Tests:
- DM unit test asserts inserted Room carries (2, 0).
- botDM unit test asserts (1, 1).
- DM integration test asserts the persisted Mongo Room has (2, 0)
  without ever calling Reconcile.
- Drop dead default branch in sync DM room-type switch (validateSyncCreateDMShape
  already gates this upstream).
- Trim two-line FindDMSubscription comment to one line.
- Drop redundant defer ctrl.Finish() (gomock v1.5+ auto-registers cleanup).
- Replace ad-hoc section-divider style with plain section comment to match the file.
- Add unit tests: BulkCreateSubscriptions transient (non-dup-key) error → "internal
  error"; idempotent re-create path asserts sub.JoinedAt = existing.CreatedAt.

Spec + Part 1 plan: add concise table of implementation deviations from the
original design (Reconcile-skip, outbox-fail, always-re-read, error-wrap,
collision-mask, file-merge) plus the room-service partial-index $in→$eq fix.

Items deferred (round 2): roomID-based outbox dedup (would diverge from async
convention which uses requestID); DM AddMember guard (would touch async path);
FindDMSubscription by roomId (async room-service version uses Name).
Mirrors the production-side merge of handler_sync_create_dm.go into handler.go
done previously. Tests for the sync DM endpoint now live with the rest of the
handler tests; handler_sync_create_dm_test.go deleted.
- room-service/store_mongo.go: auto-drop the legacy $in-filtered DM dedup
  index in EnsureIndexes if a prior deploy installed it. Eliminates the
  manual operator step CodeRabbit flagged: same name as the new dm index
  but with an incompatible partialFilterExpression would otherwise fail
  CreateMany with IndexOptionsConflict on startup. Idempotent — no-op when
  the index doesn't exist or already has the new $eq spec; logs at warn
  level when it actually drops.
- docs/superpowers/plans/.../part2.md: add `bash` language tag to grep
  block (markdownlint MD040).
- docs/superpowers/specs/...sync-dm-design.md: add `text` language tags
  to topology and handler-pipeline fenced blocks (markdownlint MD040).
Per CodeRabbit review: convention violation — every other NATS reply in
the repo goes through natsutil.ReplyJSON (handles marshal + Respond +
error logging). The sync DM handler was bypassing it with json.Marshal +
m.Msg.Respond directly.

Refactor:
- handleSyncCreateDM now returns (*model.SyncCreateDMReply, error)
  instead of ([]byte, error). Drops the manual marshal block.
- natsServerCreateDM calls natsutil.ReplyJSON(m.Msg, reply); the helper
  handles marshal failure (replies with sanitized error) and respond
  failure (logs).
- Tests no longer json.Unmarshal raw bytes — assert directly on the
  returned struct, simplifying ~10 unit/integration test assertions.

Net: -30 / +5 lines. Same wire format, same observable behavior.
The original $in-based partialFilterExpression was rejected by MongoDB at
index-creation time, so no environment ever ended up with a stale legacy
index to clean up — the auto-drop code was dead by construction. Revert
to leave EnsureIndexes minimal; the split-into-two-$eq fix is unchanged.
PR #161 (1807b75) introduced room.go and a usage of mongorepo.RoomRepo
in service.go but missed:
- Adding the pkg/mongoutil import in room.go (Collection, NewCollection,
  WithProjection are unqualified there, which compiles only with stricter
  go vet on newer golangci-lint).
- Adding the mongorepo import in service.go for the var _ assertion.

Lint v2.11.4 (CI version) catches both as typecheck failures; older lint
versions silently passed. Apply the import fixes so the rebased PR #164
passes CI.

Includes a rebase onto latest main.
Catches operator-allow-list regressions like $in in
partialFilterExpression that newer Mongo silently accepts but production
4.4.x rejects. Same shape as the bug fixed by 2734235 (split DM dedup
index into two $eq partial indexes).
MongoDB 4.4 (production) rejects two indexes with identical key specs
but different partialFilterExpression — IndexOptionsConflict at index
build time. The split-into-two-$eq-indexes fix from 2734235 worked on
Mongo 6+ where partialFilter is part of index identity, but not on 4.4.

The partial-unique constraint was redundant anyway:
- DM rooms use idgen.BuildDMRoomID(a, b), deterministic in user IDs.
  Concurrent same-pair DM creation hits Room._id duplicate-key.
- The existing (roomId, u.account) unique index at line 58 already
  catches duplicate subs per user per room.

Dropping the partial-unique (u.account, name, roomType) indexes restores
Mongo 4.4 compatibility without losing any dedup guarantee.
@vjauhari-work vjauhari-work force-pushed the claude/sync-dm-endpoint branch from 3f149e4 to 2189a5b Compare May 8, 2026 07:16
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
room-worker/handler_test.go (1)

2406-2481: ⚡ Quick win

Add coverage for the invalid UUID request-ID branch.

handleSyncCreateDM has a dedicated errInvalidRequestID path, but every helper here injects a valid UUID and none of the new tests hit the malformed-header case. Add one case with a bad X-Request-ID so this service-boundary validation stays locked down.

As per coding guidelines, **/*handler_test.go: Every handler method must have tests for: valid input, invalid/malformed input, store errors, and edge cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@room-worker/handler_test.go` around lines 2406 - 2481, Add a test that
exercises the invalid UUID request-ID branch in handleSyncCreateDM by calling
Handler.handleSyncCreateDM with request data (e.g. marshalReq of a valid
SyncCreateDMRequest) but a context that contains a malformed X-Request-ID header
(not a valid UUID) so the function returns errInvalidRequestID; locate where
other tests build the context (newRequestCtx) and either create a variant that
injects the bad X-Request-ID or manually set the header in the context before
calling handleSyncCreateDM to assert ErrorIs(err, errInvalidRequestID).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md`:
- Around line 331-368: The doc's snippet and prose claim historySharedSincePtr
has signature historySharedSincePtr(mode, acceptedAt) and that call sites pass
acceptedAt, but the shipped implementation in room-worker/handler.go still
defines historySharedSincePtr(history, timestamp, roomID) and derives timestamp
from req.Timestamp; update the spec to match the shipped code (or explicitly
note the shipped deviation): either change the doc snippet and surrounding text
to show the current signature historySharedSincePtr(history model.HistoryConfig,
timestamp int64, roomID string) and call sites using req.Timestamp (with the
existing normalization in processAddMembers), or add a clear note stating that
the implementation was shipped with the older signature and why, referencing the
function name historySharedSincePtr and the handler file room-worker/handler.go
so follow-up changes/tests use the correct form.

In `@room-worker/handler.go`:
- Around line 1350-1357: The current code re-reads only the requester
subscription (requesterSub) but still calls h.publishSubscriptionUpdates with
the original in-memory subs, which can leak generated IDs/JoinedAt from a raced
BulkCreateSubscriptions; instead, after calling h.store.FindDMSubscription(ctx,
requester.Account, other.Account) also re-read the counterpart subscription via
h.store.FindDMSubscription(ctx, other.Account, requester.Account) (or the
appropriate Find method for the opposite direction), handle errors, and pass the
persisted pair (the two re-read subscription objects) into
h.publishSubscriptionUpdates in place of subs along with acceptedAt and
requestID so events are published from canonical DB state.
- Around line 1381-1436: The event Timestamp fields are being set from the old
acceptedAt value, producing stale timestamps on retries; in
publishSubscriptionUpdates (model.SubscriptionUpdateEvent) set evt.Timestamp =
time.Now().UTC().UnixMilli() at publish time (keep acceptedAt on domain fields
like the embedded Subscription.JoinedAt), and in publishSyncDMOutbox keep
RoomCreatedOutbox.Timestamp = acceptedAt.UnixMilli() but set envelope.Timestamp
(model.OutboxEvent.Timestamp) = time.Now().UTC().UnixMilli() before
marshalling/publishing; use time.Now().UTC().UnixMilli() consistently at the
publish site for all NATS event Timestamp fields.

---

Nitpick comments:
In `@room-worker/handler_test.go`:
- Around line 2406-2481: Add a test that exercises the invalid UUID request-ID
branch in handleSyncCreateDM by calling Handler.handleSyncCreateDM with request
data (e.g. marshalReq of a valid SyncCreateDMRequest) but a context that
contains a malformed X-Request-ID header (not a valid UUID) so the function
returns errInvalidRequestID; locate where other tests build the context
(newRequestCtx) and either create a variant that injects the bad X-Request-ID or
manually set the header in the context before calling handleSyncCreateDM to
assert ErrorIs(err, errInvalidRequestID).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 684a5b67-f025-4958-ae12-e6824c2078c4

📥 Commits

Reviewing files that changed from the base of the PR and between 948a946 and 2189a5b.

📒 Files selected for processing (19)
  • docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part1.md
  • docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part2.md
  • docs/superpowers/specs/2026-05-05-sync-dm-and-historyconfig-design.md
  • history-service/internal/mongorepo/room.go
  • history-service/internal/service/service.go
  • pkg/minioutil/minio_integration_test.go
  • pkg/model/member.go
  • pkg/model/model_test.go
  • pkg/subject/subject.go
  • pkg/subject/subject_test.go
  • pkg/testutil/testimages/testimages.go
  • room-service/store_mongo.go
  • room-worker/handler.go
  • room-worker/handler_test.go
  • room-worker/integration_test.go
  • room-worker/main.go
  • room-worker/mock_store_test.go
  • room-worker/store.go
  • room-worker/store_mongo.go
💤 Files with no reviewable changes (1)
  • room-service/store_mongo.go
✅ Files skipped from review due to trivial changes (4)
  • pkg/testutil/testimages/testimages.go
  • history-service/internal/service/service.go
  • pkg/minioutil/minio_integration_test.go
  • docs/superpowers/plans/2026-05-07-sync-dm-endpoint-part1.md
🚧 Files skipped from review as they are similar to previous changes (8)
  • pkg/subject/subject_test.go
  • room-worker/main.go
  • room-worker/mock_store_test.go
  • room-worker/store_mongo.go
  • pkg/model/member.go
  • room-worker/store.go
  • room-worker/integration_test.go
  • pkg/subject/subject.go

Comment thread room-worker/handler.go Outdated
Comment thread room-worker/handler.go Outdated
handleSyncCreateDM did two sequential GetUser round-trips (requester then
other). Collapse to a single FindUsersByAccounts call (per @mliu33's review)
and look up each side from a local map. Cuts one DB round-trip per sync
DM creation.

Also drop publishedMsg.msgID — leftover from the SharedSince test deletion;
golangci-lint flagged it as unused, breaking CI.

https://claude.ai/code/session_01MprNzSB1QQJ7r2GHzRZonY
Copy link
Copy Markdown
Collaborator

@mliu33 mliu33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Thanks!

Three CodeRabbit review items on PR #164:

1. Publish canonical persisted subs (not in-memory placeholders).
   BulkCreateSubscriptions swallows dup-key races, so on retry the
   in-memory subs may carry IDs/JoinedAt that never made it to Mongo.
   Re-read both subs via FindDMSubscription before fan-out.

2. Stamp event-level Timestamp at publish time. Per project guideline,
   NATS event envelope timestamps must be time.Now().UTC().UnixMilli()
   at publish — not the (potentially stale) acceptedAt. Domain-level
   timestamps (JoinedAt, RoomCreatedOutbox.Timestamp) keep acceptedAt.

3. Spec doc drift: historySharedSincePtr signature in the spec showed
   (mode, acceptedAt) but shipped code kept (history, timestamp, roomID).
   Add a "Shipped deviation" note to the spec.

https://claude.ai/code/session_01MprNzSB1QQJ7r2GHzRZonY
@vjauhari-work vjauhari-work merged commit 19b46fe into main May 8, 2026
8 checks passed
@vjauhari-work vjauhari-work deleted the claude/sync-dm-endpoint branch May 8, 2026 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants