Skip to content

perf(room-worker): count members off denormalized u.isBot instead of per-doc regex#241

Merged
mliu33 merged 1 commit into
mainfrom
claude/room-worker-reconcile-no-regex
May 30, 2026
Merged

perf(room-worker): count members off denormalized u.isBot instead of per-doc regex#241
mliu33 merged 1 commit into
mainfrom
claude/room-worker-reconcile-no-regex

Conversation

@hmchangw
Copy link
Copy Markdown
Owner

@hmchangw hmchangw commented May 29, 2026

Problem

ReconcileMemberCounts classified each subscription as bot vs user with a $regexMatch on $u.account, inside a $group over every subscription in the room, on every create / add / remove. $regexMatch is not index-usable and runs once per document, so the cost scales with room size on every membership change — pure waste, since the user/app split rarely changes.

Fix

model.Subscription already carries a u.isBot field that was never populated — the regex existed precisely because the stored field was unreliable. This wires it up and counts off it.

  • Single source of truth: add model.IsBotAccount (.bot suffix or p_ prefix; equivalent to the (\.bot$|^p_) regex used by pkg/pipelines and room-service).
  • Stamp at write time: newSub now sets u.isBot via the predicate. The add-member path was building subs inline — routed through newSub so there's a single construction site. determineRoomTypeFromPayload reuses the predicate too. room-worker's BulkCreateSubscriptions is the only inserter of subscription docs, so write-path coverage is complete.
  • Regex-free reconcile: two index-backed counts — total subs {roomId} and bot subs {roomId, u.isBot}; UserCount = total − bots. Deriving by subtraction keeps legacy docs missing the field counted as users. Still recompute-and-$set, preserving JetStream-redelivery idempotency.
  • Index: added {roomId, u.isBot} in room-service EnsureIndexes so both counts are index-only.

Deploy ordering ⚠️

Ship this (writers stamp u.isBot) and backfill existing docs before the new counts are trustworthy — otherwise pre-existing bots count as users until backfilled. Backfill (run once; this is the only place the regex survives — one pass over the collection, not per-read-per-room):

db.subscriptions.updateMany({}, [
  {$set: {"u.isBot": {$regexMatch: {input: "$u.account", regex: "(\\.bot$|^p_)"}}}}
])

Tests

  • New TestIsBotAccount (table-driven, incl. case-sensitivity and robot/alice.bot.com negatives) and TestNewSub_SetsIsBotFromAccount — both written test-first (red → green).
  • Updated the ReconcileMemberCounts bot-split integration test to seed u.isBot on the directly-inserted bot sub (the test bypasses the handler).
  • make test (whole repo, -race): 0 failures. make lint: 0 issues. gosec: 0 issues.

isBot is already documented in docs/client-api.md and was always serialized, so this is a correctness fix to an existing field — no client-API schema change.

Note: integration tests (Mongo-backed) were not run locally — no Docker in the dev environment — so the reconcile count behavior rests on CI.

🤖 This PR addresses finding #3 from the room-worker performance review; #1 and #2 are in a separate PR (branch claude/room-worker-performance-yMxIi).


Generated by Claude Code

Summary by CodeRabbit

  • New Features

    • Introduced standardized bot account classification system for consistent bot and pseudo-account identification across the platform.
  • Performance

    • Optimized member count reconciliation through targeted database indexing, enabling faster bot/non-bot member distinction.
  • Tests

    • Added comprehensive test coverage for bot classification and member count reconciliation logic.

Review Change Stack

…per-doc regex

ReconcileMemberCounts ran a $regexMatch on $u.account for every
subscription in the room, on every create/add/remove, to split user vs
app counts. The regex is not index-usable and runs once per document, so
the cost scaled with room size on every membership change.

model.Subscription already carries a u.isBot field that was never
populated. This wires it up:

- Add model.IsBotAccount as the single source of truth for the bot rule
  (".bot" suffix or "p_" prefix; equivalent to the (\.bot$|^p_) regex).
- Stamp u.isBot at sub-creation in newSub, and route the add-member
  inline build through newSub so there is one construction site. Also
  reuse the predicate in determineRoomTypeFromPayload.
- Rewrite ReconcileMemberCounts to two index-backed counts: total subs
  and bot subs ({roomId, u.isBot}); UserCount = total - bots. Deriving
  by subtraction keeps legacy docs missing the field counted as users.
  Recompute-and-$set preserves JetStream-redelivery idempotency.
- Add the {roomId, u.isBot} index in room-service EnsureIndexes.

Deploy ordering: ship this (writers stamp u.isBot) and backfill existing
docs before relying on the new counts, otherwise pre-existing bots count
as users until backfilled. Backfill (run once):

  db.subscriptions.updateMany({}, [
    {$set: {"u.isBot": {$regexMatch: {input: "$u.account", regex: "(\\.bot$|^p_)"}}}}
  ])

https://claude.ai/code/session_01KyEPakZnVkZKrPL5j9cjpw
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

This PR centralizes bot account classification via a new IsBotAccount helper in the model layer, applies it during subscription creation to populate an IsBot flag, and optimizes member count reconciliation to use indexed queries on that flag instead of regex-based aggregation.

Changes

Bot account classification and member count optimization

Layer / File(s) Summary
Bot account classification model
pkg/model/account.go, pkg/model/account_test.go
New IsBotAccount(account string) bool helper classifies accounts ending with .bot or starting with p_ as bots, with table-driven tests covering positive patterns (.bot suffix, p_ prefix), negative cases (substring mismatches, wrong position, case sensitivity), and boundary conditions.
Subscription creation with bot classification
room-worker/handler.go, room-worker/subscription_isbot_test.go
newSub helper now sets SubscriptionUser.IsBot via model.IsBotAccount(user.Account). processAddMembers refactored to use newSub for all subscriptions. determineRoomTypeFromPayload unified to classify bot DMs via the centralized helper instead of inline pattern checks. New test validates IsBot derivation for bot and human accounts.
Member count reconciliation via indexed bot flag
room-service/store_mongo.go, room-worker/store_mongo.go, room-worker/store.go, room-worker/integration_test.go
Compound index on (roomId, u.isBot) added to subscriptions collection. ReconcileMemberCounts reimplemented to count total and bot subscriptions via two CountDocuments queries, then write userCount (total minus bots) and appCount (bots) back atomically, replacing prior aggregation pipeline with regex classification. Documentation and integration test updated to describe index-backed approach and explicit IsBot flag handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hmchangw/chat#137: Both PRs modify room-worker/handler.go's processAddMembers and subscription creation logic—this PR updates newSub to set u.isBot via IsBotAccount, while that PR populates Subscription.RoomType on the same created subscriptions.

Suggested reviewers

  • mliu33
  • yenta

Poem

🐰 A bot now has a badge to wear,
No regex needed everywhere!
Index the flag, count without care,
Room counts reconcile fair and square. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main performance optimization: replacing per-document regex matching with a denormalized u.isBot field for member counting.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/room-worker-reconcile-no-regex

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/model/account_test.go (1)

1-1: ⚡ Quick win

Use the package under test here.

This repository’s test convention is same-package tests; package model_test makes this an external test and diverges from the rest of the Go test guidelines.

As per coding guidelines, "Test files live in the same package (package main) to access unexported types."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/model/account_test.go` at line 1, Change the test package from the
external package "model_test" to the same-package "model" so the test runs as an
internal test and can access unexported types; update the top-level package
declaration in pkg/model/account_test.go from "package model_test" to "package
model" and verify imports/usages in that file still compile (adjust any
references that assumed external visibility).
room-worker/integration_test.go (1)

473-477: ⚡ Quick win

Add a legacy-doc case for missing u.isBot.

This test now covers the backfilled shape, but the new reconciliation behavior also depends on subscriptions that predate the field being counted as users. A second fixture/assertion for a bot-looking account without IsBot would lock in that edge case.

As per coding guidelines, "Tests must cover: happy path, error paths, edge cases (empty collections, boundary conditions), and invalid input."

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/model/account_test.go`:
- Line 1: Change the test package from the external package "model_test" to the
same-package "model" so the test runs as an internal test and can access
unexported types; update the top-level package declaration in
pkg/model/account_test.go from "package model_test" to "package model" and
verify imports/usages in that file still compile (adjust any references that
assumed external visibility).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a6d1254c-0414-442f-97c5-d8837d93c91b

📥 Commits

Reviewing files that changed from the base of the PR and between ad10203 and b4e4e03.

📒 Files selected for processing (8)
  • pkg/model/account.go
  • pkg/model/account_test.go
  • room-service/store_mongo.go
  • room-worker/handler.go
  • room-worker/integration_test.go
  • room-worker/store.go
  • room-worker/store_mongo.go
  • room-worker/subscription_isbot_test.go

Copy link
Copy Markdown
Collaborator

@mliu33 mliu33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mliu33 mliu33 merged commit 7cd48d1 into main May 30, 2026
8 checks passed
vjauhari-work pushed a commit that referenced this pull request Jun 1, 2026
…rop duplicate index

Three CodeRabbit findings on commit 5c6d14c:

- authorizeRoomAppRead now verifies the room exists before allowing
  the admin bypass. Without this, a platform admin could query
  app.tabs/app.cmd-menu for a fabricated roomID and receive a
  plausible-looking response (global default-tab apps; empty
  cmd-menu). Order: member-or-admin first; if admin, then GetRoom
  gates. Non-admin non-members still deny in 2 round-trips.
- handleGetRoomAppTabs warning logs (nil ChannelTab, unparseable
  URL) now carry requestId for trace correlation.
- Drop my duplicate (roomId, u.isBot) subscriptions index in
  EnsureIndexes — PR #241 already added the same index upstream;
  rebase landed both and the second CreateOne is redundant.

Spec doc updated to reflect the room-existence gate. Handler tests
cover both new branches (admin + room-exists, admin + room-missing,
admin + room-error).
vjauhari-work pushed a commit that referenced this pull request Jun 2, 2026
…rop duplicate index

Three CodeRabbit findings on commit 5c6d14c:

- authorizeRoomAppRead now verifies the room exists before allowing
  the admin bypass. Without this, a platform admin could query
  app.tabs/app.cmd-menu for a fabricated roomID and receive a
  plausible-looking response (global default-tab apps; empty
  cmd-menu). Order: member-or-admin first; if admin, then GetRoom
  gates. Non-admin non-members still deny in 2 round-trips.
- handleGetRoomAppTabs warning logs (nil ChannelTab, unparseable
  URL) now carry requestId for trace correlation.
- Drop my duplicate (roomId, u.isBot) subscriptions index in
  EnsureIndexes — PR #241 already added the same index upstream;
  rebase landed both and the second CreateOne is redundant.

Spec doc updated to reflect the room-existence gate. Handler tests
cover both new branches (admin + room-exists, admin + room-missing,
admin + room-error).
vjauhari-work pushed a commit that referenced this pull request Jun 2, 2026
…rop duplicate index

Three CodeRabbit findings on commit 5c6d14c:

- authorizeRoomAppRead now verifies the room exists before allowing
  the admin bypass. Without this, a platform admin could query
  app.tabs/app.cmd-menu for a fabricated roomID and receive a
  plausible-looking response (global default-tab apps; empty
  cmd-menu). Order: member-or-admin first; if admin, then GetRoom
  gates. Non-admin non-members still deny in 2 round-trips.
- handleGetRoomAppTabs warning logs (nil ChannelTab, unparseable
  URL) now carry requestId for trace correlation.
- Drop my duplicate (roomId, u.isBot) subscriptions index in
  EnsureIndexes — PR #241 already added the same index upstream;
  rebase landed both and the second CreateOne is redundant.

Spec doc updated to reflect the room-existence gate. Handler tests
cover both new branches (admin + room-exists, admin + room-missing,
admin + room-error).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants