feat: real-time thread reply fan-out (broadcast-worker) + reply-count badge pipeline #245

Open: ngangwar962 wants to merge 14 commits into main from claude/gallant-galileo-ice0C

Conversation

ngangwar962 (Collaborator) commented Jun 1, 2026

Summary

Adds real-time thread reply support to the chat system. The headline change is in broadcast-worker, which fans out thread reply create/edit/delete events — and the parent message's reply-count badge (tcount) — to the right audience in real time.

Although the feature centers on broadcast-worker, thread support is inherently an end-to-end pipeline: broadcast-worker can only fan out the reply-count badge — the number itself must be produced upstream (in Cassandra, by message-worker / history-service) and travel through the MESSAGES_CANONICAL event stream to reach it. That is why this PR touches several services. Each one is a station on the same pipeline.

Data flow (why each service is involved)

thread reply SENT ─▶ message-worker            persist reply + COUNT non-deleted rows → blind SET parent tcount (crash-safe)
                        │  publishes EventThreadReplyAdded{NewTCount} on .created subject
                        ▼
                     MESSAGES_CANONICAL ──▶ broadcast-worker    fan out reply + badge to clients (DM-aware)  ◀── THE FEATURE
                        │                 └▶ search-sync-worker  SKIP badge events (not a searchable doc)
                        ▼
thread reply EDITED/DELETED ─▶ history-service  recompute tcount via COUNT → blind SET; carry ThreadParentMessageID + TShow
                                               on the canonical edit/delete event

pkg/model and pkg/subject define the shared vocabulary (event types, the NewTCount field, the canonical subjects) that every service on the pipeline speaks.
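To make the shared-vocabulary idea concrete, here is a minimal sketch of how a consumer might tell a badge event from a regular new-message event when both arrive on the same .created subject. The names EventThreadReplyAdded and NewTCount and the value thread_reply_added come from this PR; the struct layout, the JSON tags, the "created" value, and the classify helper are assumptions made for illustration, not the actual pkg/model definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventType is the discriminator carried inside the JSON payload.
type EventType string

const (
	EventCreated          EventType = "created"            // assumed value
	EventThreadReplyAdded EventType = "thread_reply_added" // value named in the review walkthrough
)

// MessageEvent is a simplified stand-in for the canonical event (assumed layout).
type MessageEvent struct {
	Event     EventType `json:"event"`
	MessageID string    `json:"messageId"`
	NewTCount *int      `json:"newTCount,omitempty"`
}

// classify decodes a payload received on the shared .created subject and
// reports which handler family should process it.
func classify(payload []byte) (string, error) {
	var evt MessageEvent
	if err := json.Unmarshal(payload, &evt); err != nil {
		return "", fmt.Errorf("decode canonical event: %w", err)
	}
	switch evt.Event {
	case EventThreadReplyAdded:
		if evt.NewTCount == nil {
			return "", fmt.Errorf("badge event for %s has no tcount", evt.MessageID)
		}
		return fmt.Sprintf("badge:%d", *evt.NewTCount), nil
	case EventCreated:
		return "message", nil
	default:
		return "", fmt.Errorf("unknown event type %q", evt.Event)
	}
}

func main() {
	out, err := classify([]byte(`{"event":"thread_reply_added","messageId":"p1","newTCount":3}`))
	fmt.Println(out, err)
}
```

A consumer that has no use for badge events, such as search-sync-worker, would return early from the EventThreadReplyAdded branch instead of building an index action.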

What each service does — and why it's required

| Service | Change | Why it's needed |
| --- | --- | --- |
| broadcast-worker | handleThreadCreated/Updated/Deleted + handleThreadTCountUpdated, thread fan-out via GetThreadFollowers (thread_rooms.replyAccounts), badge fan-out | The feature. Routes thread reply create/edit/delete + badge updates to thread followers (channels) or all members (DMs). |
| pkg/model | EventThreadReplyAdded, NewTCount, ThreadMetadataUpdatedEvent | Shared contract: broadcast-worker can't receive an event/field that doesn't exist. Imported by every pipeline service. |
| pkg/subject | Badge events routed over existing MsgCanonicalCreated (.created) with EventThreadReplyAdded discriminator in payload; no new subject namespace introduced. | Keeps all canonical events inside the documented chat.msg.canonical.{siteID}.* family. |
| message-worker | SaveThreadMessage returns `(*int, error)` (new tcount); saveThreadMessageEncrypted (encrypted path with same LWT pattern); publishThreadReplyEvent; Cassandra IF NOT EXISTS LWT guarded by MapScanCAS; AddReplyAccounts for fan-out tracking; countAndSetParentTcount (COUNT-based, idempotent) | Produces the badge at the point the reply is persisted. Without it, broadcast-worker has nothing to fan out. |
| history-service | edit/delete events carry ThreadParentMessageID+TShow; delete recomputes tcount via countAndSetParentTcount (COUNT → blind SET, idempotent on redelivery) | Produces the badge on edit/delete and lets broadcast-worker distinguish a thread-reply edit/delete from a normal one (so it routes to the right audience). |
| search-sync-worker | skip EventThreadReplyAdded in BuildAction | Badge events ride the same MESSAGES_CANONICAL stream search-sync consumes but carry no searchable document; they must be skipped or they'd error / corrupt the index. |
| room-service | UpdateSubscriptionThreadRead made atomic (returns resulting array + alert) | Read-side counterpart: marking a thread read clears its unread/badge state atomically. |
| notification-worker | doc-comment only (no functional change) | Thread reply notifications are out of scope here; the handler is annotated with what's not yet covered (tracked in docs/thread-reply-notifications.md). |

Implementation notes

  • Thread reply identity: non-empty ThreadParentMessageID + TShow=false. TShow=true replies still flow through the normal room broadcast path.
  • Thread follower set: thread_rooms.replyAccounts — populated by message-worker on every reply with the parent author, replier, and @-mentioned accounts via AddReplyAccounts (MongoDB $addToSet, idempotent). broadcast-worker reads this with a single FindOne projection (GetThreadFollowers), matching the pattern PR feat(notification-worker): cache + mobile push overhaul #237 introduced in notification-worker. The old thread_subscriptions cursor scan (ListThreadSubscriptions) has been removed.
  • Fan-out set: replyAccounts ∪ @-mentioned-in-reply accounts, with sender and bot accounts excluded, deduplicated. Channel replies go to that set; DM/BotDM replies go to all human members.
  • Canonical subject convention: badge events (EventThreadReplyAdded) travel over chat.msg.canonical.{siteID}.created — the same subject as new messages. The event discriminator field in the JSON payload distinguishes them. EventThreadReplyDeleted (never published) and MsgCanonicalThreadReply (non-standard subject) were removed as dead code.
  • Badge authority: NewTCount is derived by counting non-deleted rows in thread_messages_by_thread (countThreadReplies) — clients set the badge directly, no client-side delta math. The COUNT-based approach makes every tcount write idempotent on JetStream redelivery.
  • Idempotency — LWT + MapScanCAS + COUNT-based tcount: The messages_by_id INSERT uses IF NOT EXISTS (LWT). When not applied (row exists from a prior delivery), Cassandra returns [applied]=false plus all existing row columns. ScanCAS() with no scan destinations fails with "not enough columns to scan into" in this case; the fix is MapScanCAS(map) which absorbs all returned columns. After the LWT (whether applied or not), countAndSetParentTcount runs unconditionally: it COUNTs the non-deleted rows in thread_messages_by_thread and blindly SETs tcount on the parent row in both messages_by_id and messages_by_room. Because COUNT gives the ground truth and SET overwrites unconditionally, redelivery always converges to the correct value. The same LWT + MapScanCAS pattern is applied to saveThreadMessageEncrypted.
  • Encrypted path: saveThreadMessageEncrypted mirrors SaveThreadMessage with at-rest cipher: body fields (msg, attachments, card, card_action, file) are replaced with enc_payload/enc_meta; quoted_parent_message UDT body is stripped before storage. Both the LWT INSERT into messages_by_id and the plain INSERT into thread_messages_by_thread follow the same idempotency contract.
  • Error semantics: badge-path publish failures propagate so JetStream nacks and redelivers — publishThreadReplyEvent (message-worker) and publishToThreadAccounts (broadcast-worker) both return their error, and the history-service already-deleted retry returns an error rather than publishing a NewTCount=nil event that would permanently drop the badge update. Because countAndSetParentTcount is idempotent, redelivery is safe.
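The fan-out set rule in the notes above (replyAccounts ∪ reply mentions, minus the sender and bots, deduplicated) can be sketched as a small pure function. The function name threadFanOut, the isBot callback, and the sorted output are assumptions chosen to keep the example deterministic; the actual broadcast-worker helper may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// threadFanOut returns the union of replyAccounts and mentions with the
// sender and bot accounts removed, deduplicated and sorted.
func threadFanOut(replyAccounts, mentions []string, sender string, isBot func(string) bool) []string {
	seen := make(map[string]struct{}, len(replyAccounts)+len(mentions))
	out := make([]string, 0, len(replyAccounts)+len(mentions))
	add := func(accounts []string) {
		for _, a := range accounts {
			if a == "" || a == sender || isBot(a) {
				continue // never deliver to the sender or to bots
			}
			if _, dup := seen[a]; dup {
				continue // already queued via the other source
			}
			seen[a] = struct{}{}
			out = append(out, a)
		}
	}
	add(replyAccounts)
	add(mentions)
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical bot check; a real service would consult account metadata.
	isBot := func(a string) bool { return a == "helper.bot" }
	recipients := threadFanOut(
		[]string{"alice", "bob", "helper.bot"},
		[]string{"bob", "carol"},
		"alice",
		isBot,
	)
	fmt.Println(recipients) // [bob carol]
}
```

This covers the channel case only. For DM/BotDM replies the notes say the audience is all human members, so the follower lookup would be bypassed.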

Known limitations (accepted for this PR)

  • Parent-message mentionees are not subscribed. A user @-mentioned only in the parent message (who never replies) is not added to replyAccounts — only the parent author, repliers, and reply-mentionees are. They won't receive thread events. Subscribing them would require carrying the parent's mention list onto the reply event.
  • Edit/delete fan-out uses current followers only. A user @-mentioned in the original reply but later un-mentioned won't receive the edit/delete event (would require storing the original recipient list).
  • SaveThreadMessage write-then-publish is not atomic. A crash between the Cassandra LWT INSERT succeeding and countAndSetParentTcount completing leaves tcount un-SET for this delivery; the subsequent JetStream redelivery runs countAndSetParentTcount again with the same COUNT result and the correct tcount is SET. The badge event may be delayed by one redelivery cycle.
  • O(N) tcount partition scan. countThreadReplies reads the thread's rows in thread_messages_by_thread to count the non-deleted ones, which costs O(N) per add/delete event, where N is the number of replies in the thread. This is the deliberate design choice that achieves crash-safety (COUNT gives ground truth at SET time; no 2PC needed). A follow-up PR will replace this with a Cassandra COUNTER table + periodic reconciliation job, making the hot path O(1). See docs/superpowers/plans/2026-06-04-tcount-count-based.md § "Known Trade-offs and Future Work" for the detailed plan.
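Why COUNT followed by a blind SET converges under redelivery can be shown with an in-memory model. This is only a sketch: threadStore, saveReply, and softDelete are hypothetical stand-ins for the Cassandra tables and for SaveThreadMessage / SoftDeleteMessage, and the map models the IF NOT EXISTS insert.

```go
package main

import "fmt"

type reply struct {
	deleted bool
}

// threadStore is an in-memory stand-in for one thread's rows plus the
// parent message's tcount column.
type threadStore struct {
	replies      map[string]reply
	parentTCount int
}

func newThreadStore() *threadStore {
	return &threadStore{replies: make(map[string]reply)}
}

// saveReply models one delivery: insert only if absent, then COUNT and SET.
func (s *threadStore) saveReply(id string) int {
	if _, exists := s.replies[id]; !exists {
		s.replies[id] = reply{}
	}
	return s.countAndSet()
}

// softDelete marks a reply deleted (a no-op if it is unknown), then COUNT and SET.
func (s *threadStore) softDelete(id string) int {
	if r, ok := s.replies[id]; ok {
		r.deleted = true
		s.replies[id] = r
	}
	return s.countAndSet()
}

// countAndSet counts non-deleted replies (O(N) in the thread size) and
// overwrites the parent's tcount unconditionally.
func (s *threadStore) countAndSet() int {
	n := 0
	for _, r := range s.replies {
		if !r.deleted {
			n++
		}
	}
	s.parentTCount = n
	return n
}

func main() {
	s := newThreadStore()
	fmt.Println(s.saveReply("r1"))  // first delivery
	fmt.Println(s.saveReply("r1"))  // redelivery: count is unchanged
	fmt.Println(s.saveReply("r2"))  // second reply
	fmt.Println(s.softDelete("r1")) // delete
	fmt.Println(s.softDelete("r1")) // redelivered delete: unchanged
}
```

An increment-based counter would have moved on each redelivery in this sequence, which is the failure mode the COUNT-based design trades the O(N) scan to avoid.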

Scope note for reviewers

The reply-broadcast + badge pipeline (broadcast-worker, message-worker, history-service, pkg/model, pkg/subject, search-sync-worker) is one indivisible unit — the badge cannot be fanned out unless it is produced upstream and the shared-stream consumers cooperate.

The room-service UpdateSubscriptionThreadRead change is the most separable piece: it's the thread read-state side and complements, rather than enables, broadcast-worker's fan-out. It could be split into its own PR if a tighter scope is preferred.

Post-review changes (addressed in this PR)

All 7 CodeRabbit findings have been resolved:

| Finding | Resolution |
| --- | --- |
| Non-standard .thread.reply canonical subject | Removed MsgCanonicalThreadReply and EventThreadReplyDeleted; badge events now use .created + EventThreadReplyAdded discriminator |
| thread_subscriptions scan in broadcast-worker | Replaced with GetThreadFollowers reading thread_rooms.replyAccounts (single FindOne projection, same pattern as PR #237 notification-worker) |
| Missing tshow in thread_messages_by_room INSERT | Added tshow column and value to the Cassandra INSERT |
| Mirror-table sync failure failing a committed delete | Made messages_by_room mirror update best-effort in SoftDeleteMessage |
| Thread replies triggering room-wide notifications | Added filter in notification-worker (doc-comment updated; full fix tracked in docs/thread-reply-notifications.md) |
| Missing request_id in slog.Warn | Added to thread metadata publish failure log line |
| Weak `>= 0` assertion in integration test | Tightened to `== 0` (single seeded reply must decrement to exactly zero) |

Additional changes made after initial review:

| Change | Details |
| --- | --- |
| COUNT-based tcount (replaces CAS) | Replaced incrementParentTcount/readParentTcount (CAS increment + separate read) with countAndSetParentTcount (COUNT non-deleted rows → blind SET). Eliminates the 2PC crash window: any JetStream redelivery runs the same COUNT + blind SET and converges to the correct value. Applied in both message-worker (add-path) and history-service (delete-path). Known cost: O(N) partition scan (see Known Limitations above). |
| MapScanCAS fix | Replaced ScanCAS() with MapScanCAS(map) in both SaveThreadMessage and saveThreadMessageEncrypted. IF NOT EXISTS returns all existing row columns when not applied; ScanCAS() with no destinations fails with "not enough columns to scan into". MapScanCAS absorbs all returned columns correctly. |
| Encrypted path | Added saveThreadMessageEncrypted, the cipher-enabled counterpart to SaveThreadMessage, with the same LWT + MapScanCAS idempotency contract. Both messages_by_id and thread_messages_by_thread INSERTs use the encrypted form when at-rest cipher is configured. |
| SaveThreadMessage return type | Changed from error to `(*int, error)`; returns the new tcount from countAndSetParentTcount on every delivery (nil when parent tcount is unavailable, e.g. ThreadParentMessageCreatedAt not set). |
| AddReplyAccounts | New ThreadStore method using MongoDB `$addToSet`; appends the replier's account and the parent author's account to thread_rooms.replyAccounts idempotently. Called by handleFirstThreadReply, handleSubsequentThreadReply, and markThreadMentions. |
| UpdateThreadRoomLastMessage signature | replierAccount string → replyAccounts []string; passes both the replier and the parent author so broadcast-worker can fan out to both on the subsequent-reply path. |
| Idempotency integration tests | Added TestCassandraStore_SaveThreadMessage_IdempotentOnRedelivery and TestSaveThreadMessage_EncryptedPath_SkipsTcountOnRedelivery; both call SaveThreadMessage twice with the same message and assert tcount=1 in both messages_by_id and messages_by_room after the second call. |

Testing

  • Unit tests (table-driven) across all touched services: thread fan-out, mention inclusion, bot exclusion, dedup, DM/BotDM routing, badge events, idempotency, and error/retry paths.
  • Integration tests updated and passing: broadcast-worker and search-sync-worker verified end-to-end against real NATS / MongoDB / Elasticsearch containers; message-worker Cassandra idempotency verified against real Cassandra container.
  • make lint clean (0 issues); make sast gosec PASS (govulncheck/semgrep run in CI); unit suites green for all modified services.

https://claude.ai/code/session_013Vs7CusvrZFrRKJaSoFtCi

@coderabbitai
coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds thread-reply event handling and metadata publishing across the platform. Broadcast-worker now routes hidden thread replies to dedicated handlers that compute recipient sets from thread followers and mentions, build/encrypt payloads, and publish per-subscriber. Message-worker publishes thread-reply-added events with parent reply counts. History-service computes tcount via row counting instead of CAS, and room-service atomically removes thread reads. Event model gains thread event types and ThreadMetadataUpdatedEvent.
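The routing rule for hidden thread replies (non-empty ThreadParentMessageID with TShow=false) is small enough to sketch directly. The field names come from the PR; the message struct, the routeFor helper, and the returned labels are hypothetical and only illustrate the decision.

```go
package main

import "fmt"

// message holds only the two fields that decide thread routing
// (a simplified stand-in for the real model type).
type message struct {
	ThreadParentMessageID string
	TShow                 bool
}

// isHiddenThreadReply reports whether m is a thread reply that is not
// also shown in the room timeline.
func isHiddenThreadReply(m message) bool {
	return m.ThreadParentMessageID != "" && !m.TShow
}

// routeFor names the handler family for an event; the labels are illustrative.
func routeFor(action string, m message) string {
	if isHiddenThreadReply(m) {
		return "thread:" + action
	}
	return "room:" + action
}

func main() {
	fmt.Println(routeFor("created", message{ThreadParentMessageID: "p1"}))
	fmt.Println(routeFor("created", message{ThreadParentMessageID: "p1", TShow: true}))
	fmt.Println(routeFor("deleted", message{}))
}
```

This is also why history-service has to carry both fields on edit/delete events: without them the predicate cannot be evaluated downstream.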

Changes

Thread Reply Fan-Out and Metadata Events

| Layer | File(s) | Summary |
| --- | --- | --- |
| Event model and ThreadMetadataUpdatedEvent | pkg/model/event.go, pkg/model/model_test.go | Adds thread_reply_added and thread_reply_deleted event types, `NewTCount *int` field to MessageEvent, RoomEventThreadMetadataUpdated event type, ThreadAction constants, and ThreadMetadataUpdatedEvent payload struct. |
| Broadcast-worker thread followers store layer | broadcast-worker/store.go, broadcast-worker/store_mongo.go, broadcast-worker/main.go, broadcast-worker/mock_store_test.go | Adds GetThreadFollowers(ctx, parentMessageID, siteID) to the Store interface, implements MongoDB query against thread_rooms collection, creates compound index on (parentMessageId, siteId) via EnsureIndexes, and updates main.go to pass thread_rooms collection and call index creation at startup. |
| Broadcast-worker thread fan-out handlers | broadcast-worker/handler.go | Routes thread replies (ThreadParentMessageID != "" and !TShow) to handleThreadCreated, handleThreadUpdated, handleThreadDeleted; each computes thread-follower recipient sets, merges with @-mentions, excludes sender, builds/encrypts single payload, publishes per-recipient. Adds publishThreadMetadata and publishThreadBadge helpers; updates publishDMEvents to skip bots and compute per-user HasMention. |
| Broadcast-worker unit and integration tests | broadcast-worker/handler_test.go, broadcast-worker/integration_test.go | Adds mutex-guarded publisher for thread-safe event recording; tightens timestamp assertions to exact equality; adds TestThreadFanOutAccounts for recipient computation, thread handler tests for all three operations (create/update/delete), and per-recipient publish tests; updates integration test store initialization and adds TestBroadcastWorker_GetThreadFollowers_Integration and TestBroadcastWorker_EnsureIndexes_Integration. |
| Message-worker thread reply publication | message-worker/handler.go, message-worker/store.go, message-worker/mock_store_test.go, message-worker/handler_test.go | SaveThreadMessage now returns `(*int, error)` instead of only error; when tcount is non-nil, processMessage publishes EventThreadReplyAdded via new publishThreadReplyEvent helper with deterministic dedup ID; guards against re-processing own published EventThreadReplyAdded events. |
| Message-worker Cassandra thread save with COUNT-based tcount | message-worker/store_cassandra.go, message-worker/integration_test.go | SaveThreadMessage uses LWT INSERT IF NOT EXISTS on messages_by_id, counts non-deleted rows in thread_messages_by_thread, blind-sets result onto parent tcount in both tables, and returns tcount pointer; removes prior CAS increment machinery; updates encrypted save path to mirror same flow. |
| History-service soft-delete with COUNT-based tcount and event metadata | history-service/internal/cassrepo/write.go, history-service/internal/service/service.go, history-service/internal/service/messages.go, history-service/internal/publisher/publisher.go | SoftDeleteMessage returns `(*int, error)` and computes tcount via COUNT + blind SET instead of CAS; EditMessage includes ThreadParentMessageID and TShow on canonical EventUpdated payload; DeleteMessage includes those fields plus NewTCount on EventDeleted, and republishes canonical event on already-deleted retry when parent tcount is available. |
| History-service message and delete tests | history-service/internal/service/messages_test.go, history-service/internal/cassrepo/write_integration_test.go, history-service/internal/service/mocks/mock_repository.go, history-service/internal/service/integration_test.go | Updates SoftDeleteMessage mock returns to include `*int` parameter; adds thread-reply edit/delete contract tests asserting ThreadParentMessageID/TShow fields; adds already-deleted thread-reply tests covering republish with parent tcount, skip when parent hard-deleted, and error on parent lookup failure; adds EventDeleted content inclusion tests; updates integration tests for new return signature; adds TestDeleteMessage_Integration_ThreadReplyPublishesMetadataEvent. |
| Room-service atomic thread-read update | room-service/store.go, room-service/store_mongo.go, room-service/mock_store_test.go, room-service/handler.go, room-service/handler_test.go, room-service/integration_test.go | Refactors UpdateSubscriptionThreadRead to accept threadID and return (newThreadUnread []string, newAlert bool, error), performs atomic MongoDB pipeline to remove threadID from array and recalculate alert, returns both values for immediate outbox event use; updates handler and all tests accordingly; adds concurrency regression test. |
| Inbox-worker handler refactoring | inbox-worker/handler.go, inbox-worker/handler_test.go | Routes room rename/restriction events using model.OutboxRoomRenamed/model.OutboxRoomRestricted constants; renames handler to handleRoomVisibilityChanged; improves unmarshal error handling via errcode.Permanent(BadRequest); removes mock-based room tests. |
| Documentation plan, design, and API updates | docs/client-api.md, docs/superpowers/plans/2026-05-28-broadcast-worker-thread-handling.md, docs/superpowers/specs/2026-05-28-broadcast-worker-thread-handling-design.md, docs/superpowers/plans/2026-06-04-tcount-count-based.md, docs/thread-reply-notifications.md | Updates client-api.md thread-read documentation and adds ThreadMetadataUpdatedEvent reference section; marks broadcast-worker plan as IMPLEMENTED with post-plan fixes; updates design spec with implementation notes; adds comprehensive tcount COUNT-based approach plan; documents notification-worker regression and required fixes. |
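The room-service row describes removing a threadID from the unread array and recalculating the alert in one step. The computation itself might look like the sketch below. The alert rule used here (alert stays on only while unread threads remain) is an assumption, since the summary does not state the real rule, and markThreadRead is a hypothetical pure-Go stand-in for what the PR does inside a single MongoDB pipeline update.

```go
package main

import "fmt"

// markThreadRead removes threadID from threadUnread and recomputes the
// alert flag. Assumption: the alert is on only while unread threads remain.
func markThreadRead(threadUnread []string, threadID string) ([]string, bool) {
	remaining := make([]string, 0, len(threadUnread))
	for _, id := range threadUnread {
		if id != threadID {
			remaining = append(remaining, id)
		}
	}
	return remaining, len(remaining) > 0
}

func main() {
	unread, alert := markThreadRead([]string{"t1", "t2"}, "t1")
	fmt.Println(unread, alert)

	unread, alert = markThreadRead(unread, "t2")
	fmt.Println(unread, alert)
}
```

Returning both values from the same operation matters for the reason the summary gives: the outbox event can be built from the result without a second read that might observe a concurrent change.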

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • hmchangw/chat#71: Adds Mentions parsing in message-worker that broadcast-worker thread fan-out merges with thread followers for recipient computation.
  • hmchangw/chat#182: Modifies broadcast-worker edit/delete event construction and timestamp wiring, providing adjacent infrastructure for thread reply event payloads.
  • hmchangw/chat#132: History-service thread-reply delete with tcount handling that drives the badge/metadata events broadcast-worker publishes downstream.

Suggested labels

ready

Suggested reviewers

  • mliu33

Poem

🐰 Threads now fan out with grace,
Followers and mentions embrace,
Tcount counts instead of cas,
Metadata flows for all to pass,
Events harmonize, end-to-end trace.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
history-service/internal/publisher/publisher.go (1)

14-17: ⚡ Quick win

Rename coreConn to follow the single-method interface rule.

This is a single-method interface, so it should use an -er name such as corePublisher. As per coding guidelines, "Interfaces must use the -er suffix for single-method interfaces and <Domain>Store pattern for store interfaces".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@history-service/internal/publisher/publisher.go` around lines 14 - 17, The
interface currently named coreConn is a single-method interface and should be
renamed to follow the `-er` convention; update the interface name from coreConn
to corePublisher (keeping the method signature Publish(ctx context.Context,
subject string, data []byte) error) and update all references/usages of coreConn
in publisher.go and any callers to use corePublisher so the code compiles and
adheres to naming guidelines.
history-service/internal/service/messages_test.go (1)

1509-1596: ⚡ Quick win

Add the PublishCore failure case for the new metadata path.

These tests lock in the success and newTcount == nil branches, but not the branch where PublishCore fails and DeleteMessage still succeeds. That best-effort contract is subtle enough to deserve an explicit test. As per coding guidelines, "Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@history-service/internal/service/messages_test.go` around lines 1509 - 1596,
Add a new unit test mirroring
TestHistoryService_DeleteMessage_ThreadReply_PublishesThreadMetadataEvent but
exercise the PublishCore failure path: set up GetHistorySharedSince and
GetMessageByID the same, have msgs.SoftDeleteMessage return a non-nil newTcount,
expect pub.Publish for MsgCanonicalDeleted, and then expect
pub.EXPECT().PublishCore(...).Return(errors.New("publish failure")) (or similar)
to simulate a failing PublishCore; call svc.DeleteMessage(site,
models.DeleteMessageRequest{MessageID: "..."} ) and assert it returns no error
and the response is non-nil (i.e., DeleteMessage succeeds despite PublishCore
failing). Ensure the test references the same symbols (svc.DeleteMessage,
msgs.SoftDeleteMessage, pub.PublishCore) so the mock expectations align with the
existing tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/superpowers/specs/2026-05-28-broadcast-worker-thread-handling-design.md`:
- Around line 55-56: Update the doc line that describes the covering index to
match the implementation: change the index tuple mentioned for the
thread_subscriptions query from (parentMessageId, userAccount) to
(parentMessageId, siteId) so the spec aligns with the code in
broadcast-worker/store_mongo.go; reference the collection name
thread_subscriptions and the fields parentMessageId and siteId in the updated
sentence.

In `@history-service/internal/cassrepo/write.go`:
- Around line 242-262: The messages_by_room mirror update must not turn a
successful delete into an RPC failure: in the read and casDecrement error
branches inside the parent CAS block (references: r.session.Query(...) scanning
tcount, casDecrement(...), parentID, newTcountByID), stop returning an error and
instead treat the mirror as best-effort by returning &newTcountByID (optionally
logging the failure). Keep the existing behavior for gocql.ErrNotFound, but
change the other error returns (both the read error path and the casDecrement
error path) to return &newTcountByID so the authoritative messages_by_id result
remains the outcome.

In `@message-worker/handler.go`:
- Around line 104-105: The slog.Warn call logging the failed publish of thread
metadata (the call that currently includes "error", err, "roomID",
evt.Message.RoomID, "messageID", evt.Message.ID) must include the propagated
request/correlation ID; add the key "request_id" with value
natsutil.RequestIDFromContext(ctx) to that structured log entry so it matches
the project's logging pattern and other occurrences (e.g., store_cassandra.go
usage).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 35b17d15-de9e-407e-9fb5-fbaa3a4509fc

📥 Commits

Reviewing files that changed from the base of the PR and between 7769d41 and d804acd.

📒 Files selected for processing (24)
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/integration_test.go
  • broadcast-worker/main.go
  • broadcast-worker/mock_store_test.go
  • broadcast-worker/store.go
  • broadcast-worker/store_mongo.go
  • docs/superpowers/plans/2026-05-28-broadcast-worker-thread-handling.md
  • docs/superpowers/specs/2026-05-28-broadcast-worker-thread-handling-design.md
  • history-service/cmd/main.go
  • history-service/internal/cassrepo/write.go
  • history-service/internal/publisher/publisher.go
  • history-service/internal/service/messages.go
  • history-service/internal/service/messages_test.go
  • history-service/internal/service/mocks/mock_repository.go
  • history-service/internal/service/service.go
  • message-worker/handler.go
  • message-worker/handler_test.go
  • message-worker/mock_store_test.go
  • message-worker/store.go
  • message-worker/store_cassandra.go
  • message-worker/store_cassandra_test.go
  • pkg/model/event.go
  • pkg/model/model_test.go

Comment thread docs/superpowers/specs/2026-05-28-broadcast-worker-thread-handling-design.md Outdated
Comment thread history-service/internal/cassrepo/write.go Outdated
Comment thread message-worker/handler.go Outdated
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
history-service/internal/service/integration_test.go (1)

409-409: 💤 Low value

Consider asserting the exact NewTCount value.

The test seeds the parent with tcount = 1 and deletes the single reply, so NewTCount should be exactly 0 after the decrement. The current assertion GreaterOrEqual validates the non-negative constraint but is looser than necessary.

📊 More precise assertion
-	assert.GreaterOrEqual(t, evt.NewTCount, 0, "NewTCount must be non-negative after decrement")
+	assert.Equal(t, 0, evt.NewTCount, "NewTCount should be 0 after deleting the single reply")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@history-service/internal/service/integration_test.go` at line 409, The
assertion on evt.NewTCount is too loose; since the test seeds the parent with
tcount = 1 and deletes the only reply, assert that evt.NewTCount equals 0
exactly. Replace the call to assert.GreaterOrEqual(t, evt.NewTCount, 0, "...")
with assert.Equal(t, 0, evt.NewTCount, "NewTCount must be 0 after deleting the
single reply") so the test verifies the precise expected decrement of
evt.NewTCount.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89c19b5e-a792-4c11-aedd-e61baaf1a82e

📥 Commits

Reviewing files that changed from the base of the PR and between d804acd and afa8145.

📒 Files selected for processing (9)
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/integration_test.go
  • history-service/internal/cassrepo/write_integration_test.go
  • history-service/internal/service/integration_test.go
  • history-service/internal/service/messages_test.go
  • message-worker/handler_test.go
  • message-worker/integration_test.go
  • pkg/model/event.go
💤 Files with no reviewable changes (1)
  • broadcast-worker/handler_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/model/event.go
  • message-worker/handler_test.go
  • history-service/internal/service/messages_test.go
  • broadcast-worker/handler.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
broadcast-worker/handler.go (1)

191-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve mention metadata in thread-created fan-out.

This path parses mentions only to widen fanOut, but the emitted RoomEvent drops the mention signal entirely. Unlike the normal created-message paths, it never sets MentionAll, Mentions, or any per-recipient mention flag, so mentioned non-subscribers can receive the thread reply without the metadata clients use for mention badges/highlights.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@broadcast-worker/handler.go` around lines 191 - 199, The thread-created
fan-out path builds roomEvt via buildRoomEvent(meta, clientMsg) but never
populates mention metadata (MentionAll, Mentions, and per-recipient mention
flags) before marshaling, so clients lose mention badges; update the code after
building roomEvt (and before json.Marshal) to compute and set the same mention
fields used by the normal created-message path: derive MentionAll and Mentions
from clientMsg/parsed mention data (and mark per-recipient mention flags when
computing fanOut) and assign them onto roomEvt (respecting the existing
encJSON/message override logic) so the emitted RoomEvent contains the mention
metadata.
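
A minimal sketch of carrying the mention signal onto the emitted event, under stated assumptions: only the `MentionAll` and `Mentions` field names come from the review; the `RoomEvent` shape, `parsedMentions`, `applyMentions`, and `isMentioned` are hypothetical stand-ins, not the worker's real types.

```go
package main

import "fmt"

// RoomEvent is a hypothetical stand-in for the real event type. Only the
// MentionAll and Mentions field names are taken from the review comment.
type RoomEvent struct {
	MessageID  string
	MentionAll bool
	Mentions   []string
}

// parsedMentions is an assumed shape for the output of mention parsing.
type parsedMentions struct {
	All   bool
	Users []string
}

// applyMentions copies the parsed mention signal onto the event so it is
// emitted with the fan-out instead of being used only to widen the audience.
func applyMentions(evt *RoomEvent, m parsedMentions) {
	evt.MentionAll = m.All
	evt.Mentions = append([]string(nil), m.Users...)
}

// isMentioned reports whether a recipient should get a per-recipient flag.
func isMentioned(evt RoomEvent, userID string) bool {
	if evt.MentionAll {
		return true
	}
	for _, u := range evt.Mentions {
		if u == userID {
			return true
		}
	}
	return false
}

func main() {
	evt := RoomEvent{MessageID: "m1"}
	applyMentions(&evt, parsedMentions{Users: []string{"bob"}})
	for _, recipient := range []string{"alice", "bob"} {
		fmt.Printf("%s mentioned=%v\n", recipient, isMentioned(evt, recipient))
	}
}
```

In the real handler the same assignment would presumably sit between `buildRoomEvent` and `json.Marshal`, as the comment describes.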
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@history-service/internal/service/integration_test.go`:
- Around line 401-402: The test currently only checks canonicalEvt.NewTCount is
non-negative; tighten it to assert the exact expected value 0 given the fixture
starts with one reply: after require.NotNil(t, canonicalEvt.NewTCount) replace
the GreaterOrEqual assertion with an equality check (e.g., assert.Equal(t, 0,
*canonicalEvt.NewTCount, "NewTCount should be 0 after deleting the sole reply"))
so the test fails if the decrement did not occur.
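
To illustrate why the loose assertion hides a regression, here is a small sketch that avoids the assertion library. `canonicalEvent` and `checkTCount` are hypothetical; only the pointer-valued `NewTCount` field is taken from the comment.

```go
package main

import "fmt"

// canonicalEvent is a hypothetical stand-in; NewTCount is a pointer here
// because the review dereferences it after a NotNil check.
type canonicalEvent struct {
	NewTCount *int
}

// checkTCount fails unless NewTCount is present and exactly equal to want.
func checkTCount(evt canonicalEvent, want int) error {
	if evt.NewTCount == nil {
		return fmt.Errorf("NewTCount is nil, want %d", want)
	}
	if got := *evt.NewTCount; got != want {
		return fmt.Errorf("NewTCount = %d, want %d", got, want)
	}
	return nil
}

func main() {
	stale := 1 // the value the event would carry if the delete never decremented
	evt := canonicalEvent{NewTCount: &stale}

	// A ">= 0" check accepts the stale value; the exact check rejects it.
	fmt.Println("loose check passes:", *evt.NewTCount >= 0)
	fmt.Println("exact check:", checkTCount(evt, 0))
}
```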

In `@notification-worker/handler.go`:
- Around line 42-46: The handler currently treats all EventCreated events as
room-wide notifications, causing actual thread replies (which arrive as
EventCreated with a non-empty evt.Message.ThreadParentMessageID) to notify the
whole room; modify the early filter in the handler to also skip messages that
are thread replies by checking evt.Message.ThreadParentMessageID (and/or the
TShow flag if present) before calling h.members.ListSubscriptions so only
top-level messages use room-wide subscription logic while thread replies follow
the thread-specific path used in broadcast-worker/handler.go.
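
One possible shape for that early filter, as a sketch only. `EventCreated` and `ThreadParentMessageID` are named in the comment; the string value of the constant, the struct layouts, and the helper name `wantsRoomWideNotification` are assumptions.

```go
package main

import "fmt"

// EventCreated mirrors the event type named in the review; its string value
// here is an assumption.
const EventCreated = "created"

// Message and MessageEvent are hypothetical subsets of the real model types.
type Message struct {
	ID                    string
	ThreadParentMessageID string
}

type MessageEvent struct {
	Event   string
	Message Message
}

// wantsRoomWideNotification reports whether an event should take the
// room-wide subscription path. Thread replies are excluded.
func wantsRoomWideNotification(evt MessageEvent) bool {
	if evt.Event != EventCreated {
		return false
	}
	return evt.Message.ThreadParentMessageID == ""
}

func main() {
	topLevel := MessageEvent{Event: EventCreated, Message: Message{ID: "m1"}}
	reply := MessageEvent{Event: EventCreated, Message: Message{ID: "m2", ThreadParentMessageID: "m1"}}
	fmt.Println("top-level:", wantsRoomWideNotification(topLevel))
	fmt.Println("thread reply:", wantsRoomWideNotification(reply))
}
```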

In `@pkg/subject/subject.go`:
- Around line 164-166: The function MsgCanonicalThreadReply introduces a new
canonical subject shape; change it to use the documented MESSAGES_CANONICAL
family instead—return "chat.msg.canonical.%s.created" (or create a new
MsgCanonicalCreated(siteID) wrapper) and update callers to use the existing
canonical subject builder (e.g., the MESSAGES_CANONICAL constant or the
MsgCanonicalCreated helper) so canonical events stay under
chat.msg.canonical.{siteID}.created/.edited/.deleted; rename or remove
MsgCanonicalThreadReply and replace its usages with the corrected builder to
maintain hierarchical dot-delimited subjects.
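
A sketch of the wrapper the comment proposes. The name `MsgCanonicalCreated` and the subject pattern are quoted from the review; whether the package already exposes an equivalent builder is not visible here.

```go
package main

import "fmt"

// MsgCanonicalCreated builds the canonical ".created" subject for a site,
// keeping thread replies under the same dot-delimited family as other
// canonical events.
func MsgCanonicalCreated(siteID string) string {
	return fmt.Sprintf("chat.msg.canonical.%s.created", siteID)
}

func main() {
	fmt.Println(MsgCanonicalCreated("site-a"))
}
```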

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff6682db-93d6-4a34-8dc0-2744ab63c787

📥 Commits

Reviewing files that changed from the base of the PR and between afa8145 and ce2530b.

📒 Files selected for processing (14)
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/integration_test.go
  • history-service/internal/service/integration_test.go
  • history-service/internal/service/messages.go
  • history-service/internal/service/messages_test.go
  • history-service/internal/service/mocks/mock_repository.go
  • history-service/internal/service/service.go
  • message-worker/handler.go
  • message-worker/handler_test.go
  • notification-worker/handler.go
  • notification-worker/handler_test.go
  • pkg/model/event.go
  • pkg/subject/subject.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • broadcast-worker/integration_test.go

Comment thread history-service/internal/service/integration_test.go Outdated
Comment thread notification-worker/handler.go Outdated
Comment thread pkg/subject/subject.go Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
message-worker/store_cassandra.go (1)

110-140: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Don't treat any existing messages_by_id row as a completed thread write.

If the LWT insert succeeds and the later thread_messages_by_room insert or incrementParentTcount fails, JetStream will redeliver this message. On retry, Lines 123-125 return success immediately, so the missing by-room row / parent tcount is never repaired and the message can be acked in a permanently partial state.

A fix here needs a completion marker or a repair path on !applied that can safely finish the remaining side effects instead of unconditionally returning nil, nil.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@message-worker/store_cassandra.go` around lines 110 - 140, The LWT branch
treats any existing messages_by_id row as fully complete on !applied (variable
applied from MapScanCAS) which allows permanent partial writes if later inserts
fail; update the handler so that when applied is false you detect whether the
side-effects (thread_messages_by_room insert and incrementParentTcount) have
completed and, if not, perform the missing work or mark the original row as
completed; specifically, add a completion marker column to messages_by_id (or
read the row and inspect a completion flag) and, on !applied, query
messages_by_id for that flag and either perform the missing
thread_messages_by_room insert + call incrementParentTcount (idempotently) or
set the completion flag atomically so retries see a finished state; ensure you
use the same identifiers (messages_by_id, thread_messages_by_room,
incrementParentTcount, MapScanCAS/applied) and make all repair operations
idempotent.
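
The repair-path option can be sketched with an in-memory stand-in for the tables. This is an illustration under assumptions, not the store's implementation: `memStore`, its fields, and the simplified `SaveThreadMessage` signature are hypothetical, and the count is recomputed and overwritten (the COUNT plus blind SET approach the PR summary describes) rather than incremented.

```go
package main

import (
	"errors"
	"fmt"
)

// memStore is a hypothetical in-memory stand-in for the Cassandra tables.
type memStore struct {
	byID       map[string]string // messages_by_id: message ID -> parent ID
	byRoom     map[string]bool   // thread_messages_by_room: message ID -> present
	tcount     map[string]int    // parent message ID -> stored tcount
	failByRoom bool              // simulates one failed by-room write
}

func newMemStore() *memStore {
	return &memStore{byID: map[string]string{}, byRoom: map[string]bool{}, tcount: map[string]int{}}
}

// SaveThreadMessage is safe to call again after a partial failure: an
// existing messages_by_id row does not short-circuit the remaining steps.
func (s *memStore) SaveThreadMessage(parentID, msgID string) (int, error) {
	// Step 1: insert-if-absent, standing in for the LWT insert.
	if _, exists := s.byID[msgID]; !exists {
		s.byID[msgID] = parentID
	}
	// Step 2: upsert the by-room row on every delivery (idempotent).
	if s.failByRoom {
		s.failByRoom = false
		return 0, errors.New("thread_messages_by_room insert failed")
	}
	s.byRoom[msgID] = true
	// Step 3: recompute the count and overwrite it, so a retry cannot
	// double-count the way an increment would.
	n := 0
	for _, p := range s.byID {
		if p == parentID {
			n++
		}
	}
	s.tcount[parentID] = n
	return n, nil
}

func main() {
	s := newMemStore()
	s.failByRoom = true
	_, err := s.SaveThreadMessage("parent-1", "reply-1")
	fmt.Println("first delivery:", err)
	n, err := s.SaveThreadMessage("parent-1", "reply-1") // redelivery
	fmt.Println("redelivery:", n, err)
}
```

The design point is that every step after the insert is an upsert or an overwrite, so redelivery finishes the missing work without a separate completion marker.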
🧹 Nitpick comments (5)
room-service/integration_test.go (1)

1870-1874: ⚡ Quick win

Seed each subtest independently.

The `last element removed...` case only passes because the previous subtest already mutated `sub-1` from `["t1","t2"]` to `["t2"]`. That makes these assertions order-dependent and brittle. Reinsert or reset the subscription inside each subtest instead of sharing the mutated document.

As per coding guidelines, "Each test must be fully independent — no shared mutable state between tests; never rely on test execution order."

Also applies to: 1881-1885

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@room-service/integration_test.go` around lines 1870 - 1874, The failing tests
share mutable state: before calling store.UpdateSubscriptionThreadRead in the
"removes specified threadID and returns remaining" subtest (and the sibling
"last element removed..." subtest), reseed or recreate the subscription document
so each subtest starts from the same initial unread list (e.g., recreate
subscription "sub-1" with UnreadThreads ["t1","t2"] for user "alice" in the test
setup), or call the existing test helper that inserts the subscription before
invoking UpdateSubscriptionThreadRead; ensure each t.Run block independently
seeds the store (or uses a fresh store/session) so tests do not rely on prior
subtest mutations.
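
A sketch of per-case seeding, written as a plain table loop so it runs without a database. `subStore`, `seedStore`, and the simplified `UpdateSubscriptionThreadRead` signature are hypothetical; in the real test the seeding would presumably happen at the top of each `t.Run` block.

```go
package main

import "fmt"

// subStore is a hypothetical in-memory store: subscription ID -> unread thread IDs.
type subStore struct {
	unread map[string][]string
}

// seedStore returns a fresh store so no case observes another case's writes.
func seedStore(threads ...string) *subStore {
	return &subStore{unread: map[string][]string{"sub-1": append([]string(nil), threads...)}}
}

// UpdateSubscriptionThreadRead removes threadID and returns what remains.
// The signature is simplified for illustration.
func (s *subStore) UpdateSubscriptionThreadRead(subID, threadID string) []string {
	remaining := []string{}
	for _, t := range s.unread[subID] {
		if t != threadID {
			remaining = append(remaining, t)
		}
	}
	s.unread[subID] = remaining
	return remaining
}

func main() {
	cases := []struct {
		name   string
		seed   []string
		remove string
	}{
		{"removes specified threadID and returns remaining", []string{"t1", "t2"}, "t1"},
		{"last element removed...", []string{"t2"}, "t2"},
	}
	for _, tc := range cases {
		s := seedStore(tc.seed...) // each case seeds its own state
		fmt.Printf("%s: %v\n", tc.name, s.UpdateSubscriptionThreadRead("sub-1", tc.remove))
	}
}
```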
notification-worker/handler.go (1)

20-23: ⚡ Quick win

Rename ThreadSubscriberLookup to match the single-method interface rule.

This new interface is single-method, so it should use the repo's -er naming convention instead of *Lookup. Renaming it now avoids spreading the nonstandard name through the worker.

As per coding guidelines, "Interfaces must use the -er suffix for single-method interfaces and <Domain>Store pattern for store interfaces".

Also applies to: 32-38

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@notification-worker/handler.go` around lines 20 - 23, The interface
ThreadSubscriberLookup is a single-method interface and should be renamed to
follow the repo "-er" convention (e.g., ThreadSubscriptionLister or
ThreadSubscriberLister); update the interface name and keep the method signature
ListThreadSubscriptions(ctx context.Context, parentMessageID, siteID string)
([]model.ThreadSubscription, error). Replace all usages and references
(including the other occurrence that mirrors this interface) to the new name and
update any implementing types/functions so they satisfy the renamed interface
(adjust imports/tests/constructors where the old type name was used).
room-service/handler_test.go (1)

3668-3672: ⚡ Quick win

Don't allow write expectations on the GetUserSiteID failure path.

handleMessageThreadRead returns before the write phase when GetUserSiteID fails, so these .AnyTimes() expectations make the test pass even if a future refactor starts mutating state too early. Leaving both methods un-expected here gives you a stronger regression check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@room-service/handler_test.go` around lines 3668 - 3672, Remove the permissive
write expectations on the GetUserSiteID failure path so the test fails if writes
occur: in the test around handleMessageThreadRead remove the
f.store.EXPECT().UpdateSubscriptionThreadRead(...).Return(...).AnyTimes() and
f.store.EXPECT().UpdateThreadSubscriptionRead(...).Return(...).AnyTimes() lines
(or replace them with explicit .Times(0) if you prefer), leaving no positive
expectations for UpdateSubscriptionThreadRead and UpdateThreadSubscriptionRead
when GetUserSiteID is supposed to fail.
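
The same idea shown with a hand-rolled fake instead of the generated mock (a deliberate substitution so the sketch is self-contained). `fakeStore`, `errLookup`, and the simplified `handleThreadRead` are hypothetical; the point is only that the failure path is checked for zero writes rather than allowed any number of them.

```go
package main

import (
	"errors"
	"fmt"
)

// errLookup is a sentinel used to force the GetUserSiteID failure path.
var errLookup = errors.New("lookup failed")

// fakeStore is a hand-rolled fake used in place of the generated mock.
type fakeStore struct {
	siteErr error
	writes  int // number of write calls observed
}

func (f *fakeStore) GetUserSiteID(account string) (string, error) {
	if f.siteErr != nil {
		return "", f.siteErr
	}
	return "site-a", nil
}

func (f *fakeStore) UpdateSubscriptionThreadRead(account, threadID string) error {
	f.writes++
	return nil
}

// handleThreadRead is a simplified, hypothetical version of the handler:
// it resolves the site first and only then writes.
func handleThreadRead(s *fakeStore, account, threadID string) error {
	if _, err := s.GetUserSiteID(account); err != nil {
		return fmt.Errorf("get user site: %w", err)
	}
	return s.UpdateSubscriptionThreadRead(account, threadID)
}

func main() {
	s := &fakeStore{siteErr: errLookup}
	err := handleThreadRead(s, "alice", "t1")
	fmt.Println("error:", err)
	fmt.Println("writes on failure path:", s.writes) // must stay at zero
}
```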
notification-worker/main.go (1)

64-76: ⚡ Quick win

Wrap the Mongo lookup errors with query context.

Find and cursor.All currently return bare Mongo errors here, which makes failures hard to attribute once they hit worker logs. Please wrap them with the parent message/site context in this helper too.

Suggested fix
func (m *mongoThreadSubLookup) ListThreadSubscriptions(ctx context.Context, parentMessageID, siteID string) ([]model.ThreadSubscription, error) {
	filter := map[string]string{"parentMessageId": parentMessageID, "siteId": siteID}
	cursor, err := m.col.Find(ctx, filter)
	if err != nil {
-		return nil, err
+		return nil, fmt.Errorf("find thread subscriptions for parent %q site %q: %w", parentMessageID, siteID, err)
	}
	defer cursor.Close(ctx)

	var subs []model.ThreadSubscription
	if err := cursor.All(ctx, &subs); err != nil {
-		return nil, err
+		return nil, fmt.Errorf("decode thread subscriptions for parent %q site %q: %w", parentMessageID, siteID, err)
	}
	return subs, nil
}

As per coding guidelines, "Always wrap errors with context using fmt.Errorf("short description: %w", err) describing what the current function was doing".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@notification-worker/main.go` around lines 64 - 76, In
ListThreadSubscriptions, wrap errors returned by m.col.Find and cursor.All with
contextual messages including parentMessageID and siteID using fmt.Errorf("...:
%w", err) so logs show the query context; update the error returns for the Find
call to something like fmt.Errorf("find thread subscriptions parent=%s site=%s:
%w", parentMessageID, siteID, err) and similarly wrap the cursor.All error with
a message like fmt.Errorf("decode thread subscriptions parent=%s site=%s: %w",
parentMessageID, siteID, err).
notification-worker/handler_test.go (1)

57-58: ⚡ Quick win

Make the thread-subscriber stub assert siteID.

The production lookup filters on both parentMessageID and siteID, but this stub ignores the site completely. That means these tests still pass if HandleMessage sends the wrong site. Key the stub on both fields, or fail when siteID is unexpected.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@notification-worker/handler_test.go` around lines 57 - 58, The stub
ListThreadSubscriptions currently ignores siteID (in
stubThreadSubLookup.ListThreadSubscriptions) causing tests to miss incorrect
site values; update the stub to either key subscriptions by a composite key of
parentMsgID+siteID (e.g., map[string][]model.ThreadSubscription using
parentMsgID+"|"+siteID) or add an assertion that the passed siteID matches the
expected value stored on the stub and return an error or fail the test when it
doesn't; modify stubThreadSubLookup (its subs map and any setup in tests) so
Lookups require both parentMsgID and siteID to match.
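
A sketch of the composite-key variant. The stub name, method name, and the `parentMsgID+"|"+siteID` key are taken from the comment; `ThreadSubscription` here is a hypothetical stand-in for `model.ThreadSubscription`, and `subKey` is an illustrative helper.

```go
package main

import (
	"context"
	"fmt"
)

// ThreadSubscription is a hypothetical stand-in for model.ThreadSubscription.
type ThreadSubscription struct {
	UserID string
}

// stubThreadSubLookup returns subscriptions only when both the parent
// message ID and the site ID match a seeded entry.
type stubThreadSubLookup struct {
	subs map[string][]ThreadSubscription // keyed by parentMsgID + "|" + siteID
}

func subKey(parentMsgID, siteID string) string {
	return parentMsgID + "|" + siteID
}

func (s *stubThreadSubLookup) ListThreadSubscriptions(_ context.Context, parentMsgID, siteID string) ([]ThreadSubscription, error) {
	subs, ok := s.subs[subKey(parentMsgID, siteID)]
	if !ok {
		// An unseeded pair means the caller passed an unexpected site or parent.
		return nil, fmt.Errorf("unexpected lookup: parent=%q site=%q", parentMsgID, siteID)
	}
	return subs, nil
}

func main() {
	stub := &stubThreadSubLookup{subs: map[string][]ThreadSubscription{
		subKey("parent-1", "site-a"): {{UserID: "bob"}},
	}}
	ctx := context.Background()
	got, err := stub.ListThreadSubscriptions(ctx, "parent-1", "site-a")
	fmt.Println(got, err)
	_, err = stub.ListThreadSubscriptions(ctx, "parent-1", "site-b")
	fmt.Println(err)
}
```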
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@message-worker/store_cassandra.go`:
- Around line 128-136: The INSERT into thread_messages_by_room is missing the
tshow column/value, causing msg.TShow to be dropped; update the Query in
store_cassandra.go (the s.cassSession.Query call that inserts into
thread_messages_by_room) to include the tshow column in the column list and pass
msg.TShow as the corresponding value in the VALUES arguments (mirror what
messages_by_id write does) so readers of thread_messages_by_room see the correct
TShow.
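
A rough sketch of keeping the column list and bound values in step. Only the table name and the `tshow` column come from the comment; the other column names, the `threadMessage` struct, and both helpers are assumptions, and no driver call is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// threadMessage is a hypothetical subset of the message fields.
type threadMessage struct {
	RoomID, ThreadID, MessageID string
	TShow                       bool
}

// Column names other than tshow are assumptions for illustration.
const insertThreadMessageByRoom = `INSERT INTO thread_messages_by_room
	(room_id, thread_id, message_id, tshow) VALUES (?, ?, ?, ?)`

// threadMessageArgs returns the bound values in column order, including TShow.
func threadMessageArgs(m threadMessage) []interface{} {
	return []interface{}{m.RoomID, m.ThreadID, m.MessageID, m.TShow}
}

// placeholders counts bind markers so a mismatch with the args is easy to spot.
func placeholders(stmt string) int {
	return strings.Count(stmt, "?")
}

func main() {
	args := threadMessageArgs(threadMessage{RoomID: "r1", ThreadID: "p1", MessageID: "m1", TShow: true})
	fmt.Println("placeholders:", placeholders(insertThreadMessageByRoom), "args:", len(args))
}
```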

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b12dad2-7d47-4fe2-be22-61ea82e4cd3e

📥 Commits

Reviewing files that changed from the base of the PR and between 4e18de7 and 2f62a19.

📒 Files selected for processing (13)
  • history-service/internal/service/messages.go
  • history-service/internal/service/messages_test.go
  • message-worker/integration_test.go
  • message-worker/store_cassandra.go
  • notification-worker/handler.go
  • notification-worker/handler_test.go
  • notification-worker/main.go
  • room-service/handler.go
  • room-service/handler_test.go
  • room-service/integration_test.go
  • room-service/mock_store_test.go
  • room-service/store.go
  • room-service/store_mongo.go
✅ Files skipped from review due to trivial changes (1)
  • room-service/mock_store_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • history-service/internal/service/messages.go
  • history-service/internal/service/messages_test.go

Comment thread message-worker/store_cassandra.go
@ngangwar962 ngangwar962 changed the title from "feat(broadcast-worker): add thread message fan-out to subscribers" to "feat: real-time thread reply fan-out (broadcast-worker) + reply-count badge pipeline" Jun 2, 2026
ngangwar962 pushed a commit that referenced this pull request Jun 2, 2026
…of-scope work

All notification-worker changes are dropped from this PR — a separate engineer
owns that service. The handler is restored to its main-branch state.

Two artifacts document what still needs to be built there:
- notification-worker/handler.go: NOT COVERED comment on HandleMessage listing
  the three things required (EventCreated filter, thread-subscriber routing,
  @-mention fan-out via EventThreadReplyAdded)
- docs/thread-reply-notifications.md: full design notes with file references for
  whoever picks up the work

https://claude.ai/code/session_0198s2kLNZM8cUVw5JLW28BK
ngangwar962 pushed a commit that referenced this pull request Jun 2, 2026
Mark all plan tasks [x] done. Add "Implementation Notes" to the design
spec covering the six additions that went beyond the original scope
(DM thread handling, EventThreadReplyAdded badge pipeline, thread delete
badge + mention fan-out, message-worker tcount publish, history-service
tcount CAS + EventDeleted.Content). Documents notification-worker as
intentionally out of scope and points to docs/thread-reply-notifications.md.

https://claude.ai/code/session_0198s2kLNZM8cUVw5JLW28BK
ngangwar962 pushed a commit that referenced this pull request Jun 2, 2026
…t-review fixes

Spec: adds 8 missing broadcast-worker entries to the Implementation Notes block
covering the 3 post-review correctness fixes (evt.Timestamp propagation,
TShow=true badge on delete, history-service tcount best-effort), plus the
simplification and defensive changes from the /simplify and /code-review passes
(shouldUseThreadFanOut rename, buildEditRoomEvent/buildDeleteRoomEvent helpers,
publishThreadBadge, default-branch return nil removal, nil guard).

Plan: adds a note in the status block pointing to a new "Post-Plan Fixes and
Refactoring" section that documents all post-plan commits with their rationale,
so the plan is a complete record of everything that landed in PR #245.

https://claude.ai/code/session_0198s2kLNZM8cUVw5JLW28BK
@ngangwar962 ngangwar962 force-pushed the claude/gallant-galileo-ice0C branch from ca735d2 to 00a6434 Compare June 2, 2026 08:35
@mliu33 mliu33 requested review from Joey0538 and mliu33 June 2, 2026 09:08
@ngangwar962 ngangwar962 force-pushed the claude/gallant-galileo-ice0C branch 3 times, most recently from 2d51754 to 3f85333 Compare June 4, 2026 01:15
ngangwar962 pushed a commit that referenced this pull request Jun 4, 2026
- room-worker/handler.go: fix rangeValCopy lint (iterate by index)
- message-worker/integration_test.go: SaveThreadMessage returns (*int,
  error); assign both values instead of single-value require.NoError
- broadcast-worker/integration_test.go: MongoDB driver v2 decodes nested
  BSON documents as bson.D, not bson.M; fix EnsureIndexes index
  verification to use []bson.D iteration (matches room-service pattern)
- inbox-worker/handler.go: add UpdateSubscriptionNamesForRoom and
  ApplySubscriptionVisibility to InboxStore interface; implement
  handleRoomRenamed and handleRoomVisibilityChanged; wire room_renamed
  and room_restricted cases in HandleEvent switch
- inbox-worker/handler_test.go: add stub methods for new interface methods
- room-worker/integration_test.go: fix TestIntegration_ProcessRoomRename
  seed — carol's subscription must use SiteID=remoteSite so outbox
  fan-out to site-b fires correctly

https://claude.ai/code/session_013Vs7CusvrZFrRKJaSoFtCi
@ngangwar962 ngangwar962 force-pushed the claude/gallant-galileo-ice0C branch from e61905b to bc475d8 Compare June 4, 2026 05:53
Collaborator Author

@coderabbitai review


Generated by Claude Code

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

@ngangwar962 I'll kick off a full review of PR #245 now.


🧠 Learnings used
Learnt from: yenta
Repo: hmchangw/chat PR: 80
File: message-worker/handler.go:192-199
Timestamp: 2026-04-14T16:58:26.689Z
Learning: In `message-worker/handler.go` (Go), `handleFirstThreadReply` and `handleSubsequentThreadReply` intentionally return `nil` (after a `slog.Warn`) when `GetMessageSender` returns `errMessageNotFound` for the parent message. This skips all remaining steps — including replier subscription creation and `UpdateThreadRoomLastMessage` — by design: a missing parent is treated as an unresolvable thread context, and no partial Mongo state should be written. Do not flag this early-return pattern as a bug.

Learnt from: general-lex
Repo: hmchangw/chat PR: 247
File: room-service/handler.go:1506-1559
Timestamp: 2026-06-01T09:01:42.635Z
Learning: In hmchangw/chat, the pattern of committing a local Mongo write then publishing a cross-site OUTBOX event and returning an error to the client if the publish fails is uniform across ALL cross-site handlers (mute.toggle, message.read, member.role.update, member.remove, add.member, favorite.toggle, thread.read, etc.). This is an intentional architectural trade-off: the DB write is the source of truth and clients are expected to debounce retries. A proper durable-outbox solution (Mongo outbox collection, new store method, idempotency keys, background drainer, dead-letter handling) is tracked as a separate multi-PR service-wide initiative. Do NOT flag individual handlers for this pattern as it is consistent by design.

Learnt from: vjauhari-work
Repo: hmchangw/chat PR: 44
File: room-service/handler.go:392-400
Timestamp: 2026-04-01T10:43:13.466Z
Learning: In room-service/handler.go (Go), the handleAddMembers flow intentionally runs BulkCreateSubscriptions before writeRoomMembers without a transaction. The partial-commit inconsistency risk is mitigated by unique indexes on both `subscriptions` and `room_members` collections (added in PR `#44`), making retries idempotent (duplicate-key errors suppressed). A full MongoDB transactional approach is deferred to a follow-up PR; MongoDB transactions require a replica set.

Learnt from: general-lex
Repo: hmchangw/chat PR: 141
File: inbox-worker/main.go:115-120
Timestamp: 2026-04-30T05:55:03.689Z
Learning: In `hmchangw/chat`, the canonical MongoDB collection name for thread subscriptions is `"thread_subscriptions"` (snake_case), not `"threadSubscriptions"`. This is used consistently in `message-worker/store_mongo.go`, `message-worker/integration_test.go`, and `history-service/internal/mongorepo/threadroom.go`. The Go struct field holding the collection reference may be named `threadSubscriptions` (camelCase), but the string passed to `db.Collection(...)` must be `"thread_subscriptions"`. Inbox-worker must use the same name to avoid writing to a different collection.

Learnt from: yenta
Repo: hmchangw/chat PR: 80
File: message-worker/handler.go:203-215
Timestamp: 2026-04-14T17:02:08.750Z
Learning: In `message-worker/handler.go` (Go), `handleFirstThreadReply` and `handleSubsequentThreadReply` always initialize `ThreadSubscription.LastSeenAt` to `time.Time{}` (zero value) for the parent author — including when `msg.UserID == parentSender.ID` (self-reply). This is intentional: subscriptions are created with a zero read cursor; `LastSeenAt` is updated exclusively by the read-behaviour flow. The self-reply case (where the author's own subscription should start as "seen") is a known gap explicitly deferred to a follow-up PR. Do not re-flag the zero `lastSeenAt` init as a bug.
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Collaborator Author

@coderabbitai review


Generated by Claude Code

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

@ngangwar962 I'll kick off a full review of PR #245 right away.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/client-api.md`:
- Around line 616-628: The example JSON for the "Rename Room" error in the
Rename Room section uses only an "error" field but must follow the canonical
Error envelope; update the example to include the required "code" property (and
add "reason" when applicable) so it matches the documented envelope referenced
by "Error envelope" (see the error list for the `"rename is only allowed in
channel rooms"` case) and ensure the example keys/order match other examples in
the document.
- Around line 936-1033: The file contains verbatim duplicated API sections
("Toggle Favorite", "Pin Message", "Unpin Message", "List Pinned Messages")
which risk contract drift; remove the duplicate blocks and keep a single
canonical copy for each named section. Locate the duplicate headings by the
exact section titles and ensure you preserve the complete content from one copy
(subjects, request/response tables, error cases, triggered events, and
cross-site behaviour) while deleting the other identical copy, then run a quick
search to confirm no other verbatim duplicates remain and adjust the table of
contents or internal anchors if necessary so links still resolve to the single
retained section.
- Around line 3137-3147: Update the documentation to make the `code` cardinality
consistent: pick whether `code` is one of 7 or one of 8 categories, then change
the conflicting textual occurrences so both state the same number and ensure the
enumerated list of categories exactly matches that number; update the sentence
that currently reads `"code": "<one of 8 generic categories>"` and the later
sentence that reads `One of the 7 categories below.` to the chosen cardinality
and verify the `code` enum in the specification (the list under the `code`
field) contains the same count.
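
For the first item, a sketch of what an envelope-conforming example might serialize to. The field names `error`, `code`, and `reason` come from the comment; the Go struct, the `encode` helper, and the `"bad_request"` value are placeholders, since the actual category list lives in docs/client-api.md and is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorEnvelope is an assumed Go shape for the documented envelope; the
// field names error, code, and reason come from the review comment.
type ErrorEnvelope struct {
	Error  string `json:"error"`
	Code   string `json:"code"`
	Reason string `json:"reason,omitempty"`
}

// encode marshals the envelope to a JSON string.
func encode(e ErrorEnvelope) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", fmt.Errorf("marshal error envelope: %w", err)
	}
	return string(b), nil
}

func main() {
	out, err := encode(ErrorEnvelope{
		Error: "rename is only allowed in channel rooms",
		Code:  "bad_request", // placeholder category, not confirmed by the source
	})
	fmt.Println(out, err)
}
```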

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14e59bbc-2a5c-4a9c-86c1-c365739aad67

📥 Commits

Reviewing files that changed from the base of the PR and between 4e18de7 and ac6132c.

📒 Files selected for processing (34)
  • .gitignore
  • broadcast-worker/handler.go
  • broadcast-worker/handler_test.go
  • broadcast-worker/integration_test.go
  • broadcast-worker/main.go
  • broadcast-worker/mock_store_test.go
  • broadcast-worker/store.go
  • broadcast-worker/store_mongo.go
  • docs/client-api.md
  • docs/superpowers/plans/2026-05-28-broadcast-worker-thread-handling.md
  • docs/superpowers/plans/2026-06-04-tcount-count-based.md
  • docs/superpowers/specs/2026-05-28-broadcast-worker-thread-handling-design.md
  • docs/thread-reply-notifications.md
  • history-service/internal/cassrepo/write.go
  • history-service/internal/cassrepo/write_integration_test.go
  • history-service/internal/publisher/publisher.go
  • history-service/internal/service/integration_test.go
  • history-service/internal/service/messages.go
  • history-service/internal/service/messages_test.go
  • history-service/internal/service/mocks/mock_repository.go
  • history-service/internal/service/service.go
  • inbox-worker/handler.go
  • inbox-worker/handler_test.go
  • message-worker/handler.go
  • message-worker/handler_test.go
  • message-worker/integration_test.go
  • message-worker/mock_store_test.go
  • message-worker/store.go
  • message-worker/store_cassandra.go
  • message-worker/store_cassandra_test.go
  • pkg/model/event.go
  • pkg/model/model_test.go
  • room-service/handler.go
  • room-service/handler_test.go
💤 Files with no reviewable changes (24)
  • docs/thread-reply-notifications.md
  • docs/superpowers/plans/2026-06-04-tcount-count-based.md
  • history-service/internal/publisher/publisher.go
  • message-worker/store.go
  • message-worker/mock_store_test.go
  • pkg/model/model_test.go
  • message-worker/store_cassandra_test.go
  • history-service/internal/service/mocks/mock_repository.go
  • message-worker/handler.go
  • inbox-worker/handler.go
  • history-service/internal/service/messages.go
  • history-service/internal/service/service.go
  • history-service/internal/service/integration_test.go
  • pkg/model/event.go
  • docs/superpowers/specs/2026-05-28-broadcast-worker-thread-handling-design.md
  • room-service/handler.go
  • inbox-worker/handler_test.go
  • room-service/handler_test.go
  • history-service/internal/cassrepo/write_integration_test.go
  • message-worker/handler_test.go
  • message-worker/integration_test.go
  • message-worker/store_cassandra.go
  • history-service/internal/service/messages_test.go
  • history-service/internal/cassrepo/write.go
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
  • broadcast-worker/main.go

@coderabbitai coderabbitai Bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 3

🛑 Comments failed to post (3)
docs/client-api.md (3)

616-628: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align Rename Room error example with the canonical error envelope.

The example { "error": "rename is only allowed in channel rooms" } omits code, but this document states code is always present for client-facing errors. The example should include code (and reason if applicable).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/client-api.md` around lines 616 - 628, The example JSON for the "Rename
Room" error in the Rename Room section uses only an "error" field but must
follow the canonical Error envelope; update the example to include the required
"code" property (and add "reason" when applicable) so it matches the documented
envelope referenced by "Error envelope" (see the error list for the `"rename is
only allowed in channel rooms"` case) and ensure the example keys/order match
other examples in the document.

936-1033: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove duplicated API sections to avoid contract drift.

Toggle Favorite, Pin Message, Unpin Message, and List Pinned Messages are duplicated verbatim in this file. This creates two sources of truth and increases the chance of clients implementing stale behavior.

Also applies to: 2073-2357

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 986-986: Multiple headings with the same content

(MD024, no-duplicate-heading)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/client-api.md` around lines 936 - 1033, The file contains verbatim
duplicated API sections ("Toggle Favorite", "Pin Message", "Unpin Message",
"List Pinned Messages") which risk contract drift; remove the duplicate blocks
and keep a single canonical copy for each named section. Locate the duplicate
headings by the exact section titles and ensure you preserve the complete
content from one copy (subjects, request/response tables, error cases, triggered
events, and cross-site behaviour) while deleting the other identical copy, then
run a quick search to confirm no other verbatim duplicates remain and adjust the
table of contents or internal anchors if necessary so links still resolve to the
single retained section.

3137-3147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix contradictory error-code cardinality in the envelope spec.

Line 3137 says code is “one of 8 generic categories,” while Line 3146 says “one of the 7 categories below.” This should be consistent to keep the contract unambiguous.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/client-api.md` around lines 3137 - 3147, Update the documentation to
make the `code` cardinality consistent: pick whether `code` is one of 7 or one
of 8 categories, then change the conflicting textual occurrences so both state
the same number and ensure the enumerated list of categories exactly matches
that number; update the sentence that currently reads `"code": "<one of 8
generic categories>"` and the later sentence that reads `One of the 7 categories
below.` to the chosen cardinality and verify the `code` enum in the
specification (the list under the `code` field) contains the same count.

Collaborator Author

@coderabbitai review


Generated by Claude Code

@coderabbitai
coderabbitai Bot commented Jun 4, 2026

@ngangwar962 Kicking off a full review of PR #245 now.


🧠 Learnings used
Learnt from: yenta
Repo: hmchangw/chat PR: 80
File: message-worker/handler.go:192-199
Timestamp: 2026-04-14T16:58:26.689Z
Learning: In `message-worker/handler.go` (Go), `handleFirstThreadReply` and `handleSubsequentThreadReply` intentionally return `nil` (after a `slog.Warn`) when `GetMessageSender` returns `errMessageNotFound` for the parent message. This skips all remaining steps — including replier subscription creation and `UpdateThreadRoomLastMessage` — by design: a missing parent is treated as an unresolvable thread context, and no partial Mongo state should be written. Do not flag this early-return pattern as a bug.

Learnt from: CR
Repo: hmchangw/chat PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T05:10:44.301Z
Learning: Never merge code directly into `master` or `main` — always create a PR for review first. If your changes touch a client-facing handler (NATS subjects starting with `chat.user.{account}.request.…` or `chat.user.{account}.room.{roomID}.{siteID}.msg.send`, or any HTTP route in `auth-service`), update `docs/client-api.md` in the same PR with new request/response schema, error cases, and triggered events. Delete all files under `docs/reviews/` from the branch just before creating the PR.

Learnt from: yenta
Repo: hmchangw/chat PR: 80
File: message-worker/handler.go:203-215
Timestamp: 2026-04-14T17:02:08.750Z
Learning: In `message-worker/handler.go` (Go), `handleFirstThreadReply` and `handleSubsequentThreadReply` always initialize `ThreadSubscription.LastSeenAt` to `time.Time{}` (zero value) for the parent author — including when `msg.UserID == parentSender.ID` (self-reply). This is intentional: subscriptions are created with a zero read cursor; `LastSeenAt` is updated exclusively by the read-behaviour flow. The self-reply case (where the author's own subscription should start as "seen") is a known gap explicitly deferred to a follow-up PR. Do not re-flag the zero `lastSeenAt` init as a bug.

Learnt from: general-lex
Repo: hmchangw/chat PR: 247
File: room-service/handler.go:1506-1559
Timestamp: 2026-06-01T09:01:42.635Z
Learning: In hmchangw/chat, the pattern of committing a local Mongo write then publishing a cross-site OUTBOX event and returning an error to the client if the publish fails is uniform across ALL cross-site handlers (mute.toggle, message.read, member.role.update, member.remove, add.member, favorite.toggle, thread.read, etc.). This is an intentional architectural trade-off: the DB write is the source of truth and clients are expected to debounce retries. A proper durable-outbox solution (Mongo outbox collection, new store method, idempotency keys, background drainer, dead-letter handling) is tracked as a separate multi-PR service-wide initiative. Do NOT flag individual handlers for this pattern as it is consistent by design.

Learnt from: saurav-err-137
Repo: hmchangw/chat PR: 0
File: :0-0
Timestamp: 2026-04-29T03:25:14.693Z
Learning: In `hmchangw/chat` (Go), member-management operations (`member.add`, `member.remove`, `member.role-update`) are gated by room-service to **channel rooms only**. As a consequence, all subscription creation/update sites that handle these operations — `processAddMembers`, `processRemoveIndividual`, `processRemoveOrg` in `room-worker/handler.go`, and `handleMemberAdded` in `inbox-worker/handler.go` — hardcode `RoomType: model.RoomTypeChannel` rather than doing a runtime room fetch. The `lookupRoomType` helper (introduced in PR `#137` and removed in the same PR after review) was unnecessary for this reason. Do not flag the absence of a dynamic room-type lookup in these paths.

Learnt from: vjauhari-work
Repo: hmchangw/chat PR: 44
File: room-service/handler.go:392-400
Timestamp: 2026-04-01T10:43:13.466Z
Learning: In room-service/handler.go (Go), the handleAddMembers flow intentionally runs BulkCreateSubscriptions before writeRoomMembers without a transaction. The partial-commit inconsistency risk is mitigated by unique indexes on both `subscriptions` and `room_members` collections (added in PR `#44`), making retries idempotent (duplicate-key errors suppressed). A full MongoDB transactional approach is deferred to a follow-up PR; MongoDB transactions require a replica set.
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ngangwar962 ngangwar962 force-pushed the claude/gallant-galileo-ice0C branch from 5fe6146 to 155e097 Compare June 4, 2026 10:24
case model.RoomTypeDM, model.RoomTypeBotDM:
	// DM thread replies are visible to all members, so @-mention badges are correct.
	if len(resolved.Accounts) > 0 {
		if err := h.store.SetSubscriptionMentions(ctx, meta.ID, resolved.Accounts); err != nil {
Collaborator

Might need to double-check whether a thread message in a DM makes the room name in the frontend sidebar bold with a red-dot badge.

			return fmt.Errorf("set subscription mentions: %w", err)
		}
	}
	if err := h.store.UpdateRoomLastMessage(ctx, msg.RoomID, msg.ID, msg.CreatedAt, resolved.MentionAll); err != nil {
Collaborator

Also need to check whether this aligns with the current behavior

Comment thread broadcast-worker/handler.go Outdated
	return nil
}
if h.encrypt {
	if err := h.encryptEditedContent(ctx, room.ID, &edit); err != nil {
Collaborator

Thread message broadcasts are sent to each user's subject, so we don't need to encrypt the content.

Comment thread broadcast-worker/handler.go Outdated
if len(resolved.Participants) > 0 {
	roomEvt.Mentions = resolved.Participants
}
if err := h.encryptRoomEvent(ctx, meta.ID, clientMsg, &roomEvt); err != nil {
Collaborator

Thread messages are broadcast to each recipient's subject, so we don't need to encrypt the content.

Comment thread broadcast-worker/handler.go Outdated
 	RoomID: room.ID,
 	SiteID: room.SiteID,
-	Timestamp: time.Now().UTC().UnixMilli(),
+	Timestamp: evt.Timestamp,
Collaborator

This Timestamp is meant to track the time at which broadcast-worker actually publishes the event. For example, after a JetStream redelivery, the frontend can use it to find the actual publish time.

We could have another field for triggered message event time.

Comment thread room-worker/handler.go Outdated
 siteEvtData, _ := json.Marshal(siteEvt)
 outbox := model.OutboxEvent{
-	Type: model.OutboxMemberAdded, SiteID: room.SiteID, DestSiteID: destSiteID,
+	Type: "member_added", SiteID: room.SiteID, DestSiteID: destSiteID,
Collaborator

Should revert

Comment thread room-worker/handler.go Outdated
 payloadSeed := fmt.Sprintf("%s:%s:%d", req.RoomID, req.RequesterAccount, req.Timestamp)
 dedupID := natsutil.OutboxDedupID(ctx, destSiteID, payloadSeed)
-if err := h.publish(ctx, subject.Outbox(room.SiteID, destSiteID, model.OutboxMemberAdded), outboxData, dedupID); err != nil {
+if err := h.publish(ctx, subject.Outbox(room.SiteID, destSiteID, "member_added"), outboxData, dedupID); err != nil {
Collaborator

Should revert

Comment thread room-worker/handler.go Outdated
 	continue
 }
-h.publishSubscriptionUpdate(ctx, sub.User.Account, data)
+if err := h.publish(ctx, subject.SubscriptionUpdate(sub.User.Account), data, ""); err != nil {
Collaborator

Should revert

Comment thread room-worker/handler.go
})
if err != nil {
	return fmt.Errorf("marshal rename outbox payload: %w", err)
}
Collaborator

Should revert the change

// Thread-reply badge events carry only the parent's tcount, not a
// searchable document — skip them before the document-shape guards below,
// which would otherwise reject the sparse Message as a hard error.
if evt.Event == model.EventThreadReplyAdded {
Collaborator

As in the previous comment, we should publish tcount events to a separate subject instead of using the MESSAGES_CANONICAL JetStream stream.

@ngangwar962 ngangwar962 force-pushed the claude/gallant-galileo-ice0C branch from 155e097 to 0d9a731 Compare June 6, 2026 15:13
claude added 2 commits June 6, 2026 15:37
Adds end-to-end thread reply support across the chat pipeline:

- broadcast-worker: fan out thread reply create/edit/delete events to thread
  followers (replyAccounts) and @-mentions; fan out tcount badge via
  handleThreadTCountUpdated. DM/BotDM replies go to all human members.

- message-worker: persist thread replies with IF NOT EXISTS LWT + MapScanCAS
  idempotency; publish EventThreadReplyAdded with tcount from
  countAndSetParentTcount (COUNT-based, crash-safe, idempotent on redelivery).

- history-service: edit/delete events carry ThreadParentMessageID + TShow;
  delete path recomputes tcount via countAndSetParentTcount (COUNT → blind SET).

- search-sync-worker: skip EventThreadReplyAdded events (no searchable doc).

- room-service: UpdateSubscriptionThreadRead made atomic (returns new array +
  alert in a single MongoDB pipeline update).

- pkg/model: EventThreadReplyAdded, NewTCount, ThreadMetadataUpdatedEvent.
- pkg/subject: badge events travel on MsgCanonicalCreated (.created subject).
- docs: tcount COUNT-based implementation plan + client-api updates.

tcount approach: replaces CAS increment/decrement with a full partition scan
of thread_messages_by_thread (COUNT non-deleted rows → blind SET on parent
rows). Eliminates 2PC crash window — any JetStream redelivery re-COUNTs and
re-SETs, converging to the correct value. Known cost: O(N) scan per event.
Follow-up PR will replace with a Cassandra COUNTER table + reconciliation job
(see docs/superpowers/plans/2026-06-04-tcount-count-based.md §"Known Trade-offs
and Future Work").

https://claude.ai/code/session_013Vs7CusvrZFrRKJaSoFtCi
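
The COUNT → blind SET idea described in the commit message above can be sketched as follows. This is a minimal in-memory illustration, not the PR's Cassandra code: the `store` and `reply` types and `incrementParentTcount` are hypothetical, and only the name `countAndSetParentTcount` comes from the commit message. It assumes that a redelivered event simply runs the same function again.

```go
package main

import "fmt"

// reply is a simplified, hypothetical stand-in for one row of
// thread_messages_by_thread.
type reply struct {
	ID      string
	Deleted bool
}

// store is a hypothetical in-memory substitute for Cassandra.
type store struct {
	replies map[string][]reply // parent message ID -> thread replies
	tcount  map[string]int     // parent message ID -> stored badge value
}

// countAndSetParentTcount recounts the non-deleted replies and overwrites
// the stored value. Running it again for the same event gives the same result.
func (s *store) countAndSetParentTcount(parentID string) int {
	n := 0
	for _, r := range s.replies[parentID] {
		if !r.Deleted {
			n++
		}
	}
	s.tcount[parentID] = n // blind SET: no read-modify-write of the old value
	return n
}

// incrementParentTcount is the contrasting counter-style update, which
// drifts when the same event is applied twice.
func (s *store) incrementParentTcount(parentID string) int {
	s.tcount[parentID]++
	return s.tcount[parentID]
}

func main() {
	s := &store{
		replies: map[string][]reply{
			"p1": {{ID: "r1"}, {ID: "r2", Deleted: true}, {ID: "r3"}},
		},
		tcount: map[string]int{},
	}
	// Simulate the same event being delivered twice.
	fmt.Println(s.countAndSetParentTcount("p1"), s.countAndSetParentTcount("p1"))
	// Applying an increment twice for a single reply overshoots the true count.
	fmt.Println(s.incrementParentTcount("p1"), s.incrementParentTcount("p1"))
}
```

The recount converges at the price of scanning every reply on each event, which matches the O(N) cost the commit message acknowledges.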
The PR accidentally deleted the publishSubscriptionUpdate helper,
inlined its three call sites, replaced model.OutboxMemberAdded
constants with raw strings, deleted findRemoteSitesForAccounts, and
replaced the full processRoomRename implementation with a stripped-down
version that lost async-job-result publishing and request-ID
validation. None of these changes are part of the thread-reply
pipeline.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
claude added 3 commits June 6, 2026 15:37
…ollowers

parentMessageId is the unique key in thread_rooms — one document per parent
message — so the siteId filter was redundant. Removing it simplifies the query,
the store interface, and all callers.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
Thread replies are published per-user to chat.user.{account}.room.event
subjects — each message goes to exactly one user's mailbox, so there is
no shared stream to protect. Encrypting per-user subjects adds overhead
with no security benefit.

Encryption is retained for the shared room channel stream
(chat.room.{roomID}.event) where all members subscribe.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
…lishes fail

Previously errgroup returned the first error, causing JetStream to
redeliver to all accounts including those who already received the event,
producing duplicate thread replies on the frontend.

Now partial failure is tolerated: only return an error (triggering
redelivery) when every single publish failed. Individual failures are
logged but do not cause redelivery for successful recipients.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
@ngangwar962 ngangwar962 force-pushed the claude/gallant-galileo-ice0C branch from 4ff3363 to a950796 Compare June 6, 2026 15:37
claude added 9 commits June 6, 2026 15:40
- history-service/internal/service/integration_test.go:349 —
  TestDeleteMessage_Integration_ThreadReplyPublishesMetadataEvent called
  New() with 6 args; function now requires 8 (UserStore + CustomEmojiStore
  were added). Add nil, nil for the two store params.

- history-service/internal/cassrepo/reactions_integration_test.go —
  TestRepository_AddReaction_Pinned and TestRepository_RemoveReaction_Pinned
  inserted pinned_at into messages_by_room, which lacks that column per the
  Cassandra model doc. Remove the column from the INSERT statements.

- broadcast-worker/integration_test.go — TestBroadcastWorker_GetThreadFollowers
  called GetThreadFollowers with a siteID arg after commit 37b55b7 removed it
  from the Store interface. Update calls to 1-arg form and replace the
  siteId-isolation subtest (not applicable under the per-site deployment model)
  with a distinct-parentMessageId subtest.
…m test INSERTs/SELECTs

TestRepository_AddReaction_Pinned and TestRepository_RemoveReaction_Pinned were
inserting into pinned_messages_by_room using column name 'created_at' instead of
'pinned_at' (the actual clustering key), and the verification SELECTs had the
same wrong column in the WHERE clause.

https://claude.ai/code/session_013Vs7CusvrZFrRKJaSoFtCi
…parent lookup and re-publish

tcount is persisted durably by countAndSetParentTcount on the first delete.
Re-publishing EventDeleted on retry adds unnecessary Cassandra reads and
failure modes (parent-lookup error → retry loop) without any benefit.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
JetStream MsgID dedup prevents double-delivery at the consumer level, so
IF NOT EXISTS was adding 5-10x Paxos overhead for no benefit. tcount is
derived from a COUNT + blind SET via countAndSetParentTcount, which is
idempotent on redelivery without any CAS.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
…ThreadRead with app-side logic

Use $pull to atomically remove the threadID, then check in Go whether
threadUnread is now empty. If empty, a second UpdateOne clears alert and
unsets the field. Avoids aggregation pipeline CPU overhead on MongoDB.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
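
The two-step flow described above (pull the thread ID, then decide in Go whether to clear the alert) can be modelled in memory as follows. This is only a sketch: the `subscription` type, `pullThread`, and `markThreadRead` are hypothetical, and the real code issues MongoDB updates rather than mutating a struct.

```go
package main

import "fmt"

// subscription mirrors only the two fields the commit message mentions.
type subscription struct {
	ThreadUnread []string
	Alert        bool
}

// pullThread models the effect of MongoDB's $pull: it removes every
// occurrence of threadID and returns the remaining array.
func pullThread(sub *subscription, threadID string) []string {
	kept := sub.ThreadUnread[:0]
	for _, id := range sub.ThreadUnread {
		if id != threadID {
			kept = append(kept, id)
		}
	}
	sub.ThreadUnread = kept
	return kept
}

// markThreadRead performs the first update, then applies the second one
// only when no unread threads remain.
func markThreadRead(sub *subscription, threadID string) {
	remaining := pullThread(sub, threadID) // step 1: $pull
	if len(remaining) == 0 {
		// step 2: clear the alert and unset the now-empty field
		sub.Alert = false
		sub.ThreadUnread = nil
	}
}

func main() {
	sub := &subscription{ThreadUnread: []string{"t1", "t2"}, Alert: true}
	markThreadRead(sub, "t1")
	fmt.Println(sub.ThreadUnread, sub.Alert)
	markThreadRead(sub, "t2")
	fmt.Println(sub.ThreadUnread == nil, sub.Alert)
}
```

Because the two steps are separate writes, a real implementation would presumably need to consider a new unread thread arriving between them; whether and how the PR guards against that is not stated here.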
…amp for canonical event time

Timestamp on each room event struct now records when broadcast-worker
publishes the event, enabling clients to detect JetStream redeliveries.
EventTimestamp carries the original canonical event time from
message-worker for correlation. ReactRoomEvent already used publish time;
all other event types (RoomEvent, EditRoomEvent, DeleteRoomEvent,
PinRoomEvent, UnpinRoomEvent, ThreadMetadataUpdatedEvent) are updated.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
…rver-broadcast

Publishing EventThreadReplyAdded back to MESSAGES_CANONICAL polluted the
message CRUD event store with badge events and required a skip guard in
message-worker to avoid reprocessing its own publishes.

New flow: message-worker publishes via core NATS on
chat.server.broadcast.{siteID}.thread.tcount; broadcast-worker subscribes
to chat.server.broadcast.{siteID}.> with a queue group and handles the
event via HandleServerBroadcast. Badge events are best-effort
(fire-and-forget, no JetStream durability needed).

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
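
The subject layout in the commit message above relies on NATS wildcard semantics, where `>` matches one or more trailing tokens. The sketch below builds both subjects and checks the match with a simplified stand-alone matcher. The helper names `tcountSubject`, `broadcastWildcard`, and `matches` are hypothetical, and the matcher is an illustration rather than the NATS client library.

```go
package main

import (
	"fmt"
	"strings"
)

// tcountSubject builds the per-site badge subject named in the commit message.
func tcountSubject(siteID string) string {
	return "chat.server.broadcast." + siteID + ".thread.tcount"
}

// broadcastWildcard builds the subscription pattern broadcast-worker uses.
func broadcastWildcard(siteID string) string {
	return "chat.server.broadcast." + siteID + ".>"
}

// matches reports whether subject matches a NATS-style pattern:
// "*" matches exactly one token, ">" matches one or more trailing tokens.
func matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" is only valid as the last token and needs at least one token.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	pattern := broadcastWildcard("site-a")
	fmt.Println(matches(pattern, tcountSubject("site-a")))
	fmt.Println(matches(pattern, tcountSubject("site-b")))
	fmt.Println(matches(pattern, "chat.server.broadcast.site-a"))
}
```

Subscribing on the `>` pattern means later server-broadcast event types under the same site prefix would reach the same handler without a new subscription.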
…cast events

timestamp = when broadcast-worker published the event (allows clients to
detect JetStream redeliveries). eventTimestamp = when message-worker
published the canonical event (source-of-truth time for correlation).
Updated DeleteRoomEvent and ThreadMetadataUpdatedEvent tables.

https://claude.ai/code/session_01LjGg9QJU7QVZFU6pXXFbgb
New commits added EventTimestamp to DeleteRoomEvent, PinRoomEvent,
UnpinRoomEvent, and ReactRoomEvent but left older fields misaligned.
goimports now requires all fields in a struct to be column-aligned.

https://claude.ai/code/session_013Vs7CusvrZFrRKJaSoFtCi