Skip to content

perf(search-sync-worker): stored scripts for user-room + batch/ack config guard#252

Merged
hmchangw merged 2 commits into
mainfrom
claude/search-sync-perf-kCrsp
Jun 4, 2026
Merged

perf(search-sync-worker): stored scripts for user-room + batch/ack config guard#252
hmchangw merged 2 commits into
mainfrom
claude/search-sync-perf-kCrsp

Conversation

@hmchangw
Copy link
Copy Markdown
Owner

@hmchangw hmchangw commented Jun 1, 2026

Summary

Two performance fixes to search-sync-worker, plus a benchmark backing an investigation into a third (larger) throughput change that is not included here.

1. Stored scripts for user-room updates (commit 1)

The user-room scripted _update actions previously inlined the full ~600-byte painless source in every bulk action. A member event fanning out to N accounts shipped N copies of the script body. They now reference ES stored scripts by id:

  • New searchengine.PutScript (PUT /_scripts/{id}) on the engine interface + httpAdapter, mirroring UpsertTemplate.
  • New Collection.StoredScripts(); userRoomCollection registers its add/remove scripts under versioned ids (search-sync-user-room-add/remove-v1); messages/spotlight return nil.
  • buildAddRoomUpdateBody/buildRemoveRoomUpdateBody now emit {"script":{"id":…}}; main.go registers scripts at startup before any consumer runs.

Impact: ~3–4× smaller bulk payloads on large fan-out member events; negligible on steady-state message indexing. The painless logic itself is unchanged — only its registration mechanism.

2. Batch / ack-pending config guard (commit 1)

Startup now warns when BULK_BATCH_SIZE > CONSUMER_MAX_ACK_PENDING. In that regime a 1:1 collection stalls at MaxAckPending unacked messages before ActionCount can reach the bulk cap, so the size-based flush never fires and every flush waits the full BulkFlushInterval. Warning (not fatal), since fan-out collections are unaffected.

3. Parse-cost benchmarks (commit 2) — investigation only, no behavior change

perf_bench_test.go measures BuildAction cost per collection across fan-out widths. Used to evaluate whether pipelining/double-buffering the ES flush is worth it.

Finding: parse is single-digit ms per 500-action batch vs a tens-to-100ms ES round-trip, so double-buffering would yield only ~5–17%. The real lever is a bounded concurrent flush-pool (safe here because all three collections are order-independent: external versioning for messages/spotlight, painless LWW guard for user-room). That change is not in this PR — it should be gated on confirming actual consumer backlog first.

Test plan

  • make test (full repo, race) ✅
  • make lint ✅ (0 issues)
  • go vet -tags integration ./search-sync-worker/
  • TDD throughout: PutScript, StoredScripts, checkBatchAckCoupling written test-first (RED → GREEN).
  • Coverage of new code: checkBatchAckCoupling/StoredScripts/storedScriptBody 100%, PutScript 78.6%, update-body builders 80–90%.
  • Integration tests updated to register stored scripts before exercising real-ES updates.

Notes for reviewer

  • Script ids are suffixed -v1; bump the suffix if a script body ever changes incompatibly, to avoid old/new pods sharing a mutated definition mid-rolling-deploy.
  • A flush-pool (item 3) would also require extending the ack-pending guard to K × BULK_BATCH_SIZE ≤ MAX_ACK_PENDING and raising MaxIdleConnsPerHost.

https://claude.ai/code/session_01Dpze4Fvf76yELwfGjoTTht


Generated by Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for registering and referencing Elasticsearch stored scripts, reducing payload size and improving indexing performance.
  • Tests

    • Added comprehensive test coverage for stored script registration and configuration validation.
    • Added performance benchmarks for action building across different collection types.
  • Chores

    • Enhanced startup validation to warn when bulk batch size exceeds consumer acknowledgment capacity, ensuring optimal flush behavior.

claude added 2 commits June 1, 2026 13:39
…nfig guard

Two performance improvements to search-sync-worker:

1. User-room scripted updates now reference ES stored scripts by id instead
   of inlining the full ~600-byte painless source in every bulk action. A
   bulk member event fanning out to N accounts previously shipped N copies
   of the script body; it now ships one id reference per action, shrinking
   large fan-out bulk payloads ~3-4x and avoiding repeated inline-script
   lookups on the ES side. Scripts are registered at startup via a new
   searchengine.PutScript (PUT /_scripts/{id}) and exposed per-collection
   through Collection.StoredScripts().

2. Startup now warns when BULK_BATCH_SIZE exceeds CONSUMER_MAX_ACK_PENDING.
   In that regime a 1:1 collection stalls at MaxAckPending unacked messages
   before ActionCount can reach the bulk cap, so the size-based flush never
   fires and every flush waits the full BulkFlushInterval with undersized
   batches. The guard surfaces the misconfiguration instead of silently
   degrading latency.

https://claude.ai/code/session_01Dpze4Fvf76yELwfGjoTTht
Benchmarks the per-event parse + document-build cost for the message,
spotlight, and user-room collections across fan-out widths. Captures the
CPU work a pipelined flush would overlap with the ES bulk round-trip, and
doubles as a regression guard on the user-room scripted-update build cost.

https://claude.ai/code/session_01Dpze4Fvf76yELwfGjoTTht
@hmchangw hmchangw added the not ready label Jun 1, 2026 — with Claude
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds Elasticsearch stored scripts support to the search worker. The SearchEngine interface and httpAdapter gain a PutScript method for registering scripts via HTTP PUT. The Collection interface extends with StoredScripts() for declaring per-collection script dependencies. Worker startup registers scripts before consumer creation, with configuration validation for batch/ack coupling. Most collections return nil; user-room collection implements actual painless scripts for add/remove room operations and switches bulk action payloads from inlined source to stored script ID references.

Changes

Elasticsearch Stored Scripts Infrastructure

Layer / File(s) Summary
SearchEngine PutScript contract and adapter
pkg/searchengine/searchengine.go, pkg/searchengine/adapter.go, pkg/searchengine/adapter_test.go
SearchEngine interface adds PutScript(ctx, id, body) method. httpAdapter implements it with HTTP PUT to /_scripts/{id}, JSON content-type, response validation, and error body inclusion. Tests assert correct endpoint, headers, and error behavior.
Collection interface StoredScripts extension
search-sync-worker/collection.go
Collection interface gains StoredScripts() returning map of script IDs to JSON bodies (or nil), enabling each collection to declare Elasticsearch stored script dependencies for startup registration.
Worker startup registration and config validation
search-sync-worker/main.go, search-sync-worker/consumer_config_test.go
Worker startup registers per-collection stored scripts before creating consumers, iterating StoredScripts() and calling engine.PutScript. Added checkBatchAckCoupling validation warns when BULK_BATCH_SIZE exceeds consumer MaxAckPending, preventing size-based flush triggers under 1:1 delivery. Tests verify coupling threshold behavior.
Collection no-script implementations
search-sync-worker/messages.go, search-sync-worker/messages_test.go, search-sync-worker/inbox_stream.go, search-sync-worker/handler_test.go, search-sync-worker/spotlight_test.go
messageCollection, inboxMemberCollection, and test stubs implement StoredScripts() returning nil. Unit tests assert empty/nil return values confirming no stored scripts are used.
User room stored scripts implementation
search-sync-worker/user_room.go, search-sync-worker/user_room_test.go
userRoomCollection implements StoredScripts() with add/remove painless scripts via storedScriptBody helper. Script ID constants (addRoomScriptID, removeRoomScriptID) introduced. buildAddRoomUpdateBody and buildRemoveRoomUpdateBody switch from inlined script.source to script.id references. Tests updated to assert script.id field and lack of script.source; StoredScripts test validates script bodies contain expected painless logic.
Integration testing and helpers
search-sync-worker/integration_test.go, search-sync-worker/inbox_integration_test.go
registerStoredScripts helper registers collection scripts into Elasticsearch. Three inbox integration tests call helper after template upsert and before index creation.
Performance benchmarks
search-sync-worker/perf_bench_test.go
New benchmark file measures per-action CPU cost via helpers benchMessageData and benchMemberData generating realistic payloads. BenchmarkBuildAction_Message, BenchmarkBuildAction_UserRoom (fan-out widths 1/10/100/1000), and BenchmarkBuildAction_Spotlight exercise BuildAction across collection types.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hmchangw/chat#109: Refactors INBOX-based user-room sync by switching from inlined Elasticsearch painless scripts to registered stored scripts using the new PutScript interface and StoredScripts() collection method.
  • hmchangw/chat#115: Modifies user-room Elasticsearch script logic to route restrictedRooms via HistorySharedSince, impacting the same user_room.go painless script bodies being migrated to stored scripts.
  • hmchangw/chat#64: Introduces the initial pkg/searchengine HTTP adapter and SearchEngine interface for Elasticsearch bulk/template operations that this PR extends with PutScript for stored script registration.

Suggested reviewers

  • mliu33

🐰 Scripts stored, fees deferred,
No more source inline,
Just IDs, cleanly preferred—
Elasticsearch shine!
✨🔍

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the two main changes: stored scripts for user-room updates and a batch/ack configuration guard, matching the core objectives of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/search-sync-perf-kCrsp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@search-sync-worker/consumer_config_test.go`:
- Around line 86-102: Add coverage for unlimited ack-pending to
TestCheckBatchAckCoupling: update the TestCheckBatchAckCoupling test (the
subtests around checkBatchAckCoupling) to include a subtest that asserts
checkBatchAckCoupling(2000, -1) returns empty (and similarly for 0) so that
non-positive maxAckPending values do not emit the warning; place these
assertions alongside the existing "warns when bulk size exceeds ack pending"
case to lock in the unlimited-consumer behavior for the checkBatchAckCoupling
function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bca3d06e-b976-47ff-aa77-95e472c07e95

📥 Commits

Reviewing files that changed from the base of the PR and between 3674c1d and f4bfc47.

📒 Files selected for processing (16)
  • pkg/searchengine/adapter.go
  • pkg/searchengine/adapter_test.go
  • pkg/searchengine/searchengine.go
  • search-sync-worker/collection.go
  • search-sync-worker/consumer_config_test.go
  • search-sync-worker/handler_test.go
  • search-sync-worker/inbox_integration_test.go
  • search-sync-worker/inbox_stream.go
  • search-sync-worker/integration_test.go
  • search-sync-worker/main.go
  • search-sync-worker/messages.go
  • search-sync-worker/messages_test.go
  • search-sync-worker/perf_bench_test.go
  • search-sync-worker/spotlight_test.go
  • search-sync-worker/user_room.go
  • search-sync-worker/user_room_test.go

Comment on lines +86 to +102
func TestCheckBatchAckCoupling(t *testing.T) {
t.Run("ok when bulk size below ack pending", func(t *testing.T) {
assert.Empty(t, checkBatchAckCoupling(500, 1000))
})

t.Run("ok when bulk size equals ack pending", func(t *testing.T) {
// At equality a 1:1 collection can still just reach the threshold
// (the size trigger fires at ActionCount >= bulkBatchSize), so this
// is not flagged.
assert.Empty(t, checkBatchAckCoupling(1000, 1000))
})

t.Run("warns when bulk size exceeds ack pending", func(t *testing.T) {
msg := checkBatchAckCoupling(2000, 1000)
assert.NotEmpty(t, msg, "bulk size above ack pending must produce a warning")
})
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a case for unlimited ack-pending.

If the non-positive maxAckPending guard is adopted (see main.go comment), add a subtest asserting checkBatchAckCoupling(2000, -1) (and 0) returns empty, to lock in that unlimited consumers don't trigger the warning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@search-sync-worker/consumer_config_test.go` around lines 86 - 102, Add
coverage for unlimited ack-pending to TestCheckBatchAckCoupling: update the
TestCheckBatchAckCoupling test (the subtests around checkBatchAckCoupling) to
include a subtest that asserts checkBatchAckCoupling(2000, -1) returns empty
(and similarly for 0) so that non-positive maxAckPending values do not emit the
warning; place these assertions alongside the existing "warns when bulk size
exceeds ack pending" case to lock in the unlimited-consumer behavior for the
checkBatchAckCoupling function.

Copy link
Copy Markdown
Collaborator

@Joey0538 Joey0538 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Learnt something new about stored scripts ES API! 🚀

@hmchangw hmchangw added ready and removed not ready labels Jun 4, 2026
@hmchangw hmchangw merged commit 4e37a8a into main Jun 4, 2026
6 checks passed
@hmchangw hmchangw deleted the claude/search-sync-perf-kCrsp branch June 4, 2026 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants