perf(search-sync-worker): stored scripts for user-room + batch/ack config guard #252
Two performance improvements to search-sync-worker:
1. User-room scripted updates now reference ES stored scripts by id instead
of inlining the full ~600-byte painless source in every bulk action. A
bulk member event fanning out to N accounts previously shipped N copies
of the script body; it now ships one id reference per action, shrinking
large fan-out bulk payloads ~3-4x and avoiding repeated inline-script
lookups on the ES side. Scripts are registered at startup via a new
searchengine.PutScript (PUT /_scripts/{id}) and exposed per-collection
through Collection.StoredScripts().
2. Startup now warns when BULK_BATCH_SIZE exceeds CONSUMER_MAX_ACK_PENDING.
In that regime a 1:1 collection stalls at MaxAckPending unacked messages
before ActionCount can reach the bulk cap, so the size-based flush never
fires and every flush waits the full BulkFlushInterval with undersized
batches. The guard surfaces the misconfiguration instead of silently
degrading latency.
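A minimal sketch of what such a startup guard could look like. Only the function name and the `(bulkBatchSize, maxAckPending)` call shape are taken from the test excerpt in this PR; the body and the message wording are assumptions, not the merged implementation.

```go
package main

import "fmt"

// checkBatchAckCoupling returns a warning when a 1:1 collection could never
// accumulate bulkBatchSize actions, or "" when the pair is fine.
// Hypothetical body: the real function in main.go is not shown in this thread.
func checkBatchAckCoupling(bulkBatchSize, maxAckPending int) string {
	if bulkBatchSize <= maxAckPending {
		// Equality is fine: the size trigger fires at ActionCount >= bulkBatchSize.
		return ""
	}
	return fmt.Sprintf(
		"BULK_BATCH_SIZE (%d) exceeds CONSUMER_MAX_ACK_PENDING (%d): "+
			"size-based flush cannot fire for 1:1 collections, flushes wait for the flush interval",
		bulkBatchSize, maxAckPending)
}

func main() {
	for _, c := range [][2]int{{500, 1000}, {1000, 1000}, {2000, 1000}} {
		if msg := checkBatchAckCoupling(c[0], c[1]); msg != "" {
			fmt.Println("WARN:", msg) // warn only; startup continues
		} else {
			fmt.Printf("ok: batch=%d ackPending=%d\n", c[0], c[1])
		}
	}
}
```

Returning a string rather than an error keeps the check non-fatal, which matches the stated intent that fan-out collections are unaffected.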
https://claude.ai/code/session_01Dpze4Fvf76yELwfGjoTTht
Benchmarks the per-event parse + document-build cost for the message, spotlight, and user-room collections across fan-out widths. Captures the CPU work a pipelined flush would overlap with the ES bulk round-trip, and doubles as a regression guard on the user-room scripted-update build cost. https://claude.ai/code/session_01Dpze4Fvf76yELwfGjoTTht
📝 Walkthrough
This PR adds Elasticsearch stored-script support to the search worker. The SearchEngine interface and httpAdapter gain a PutScript method for registering scripts via HTTP PUT. The Collection interface is extended with StoredScripts() for declaring per-collection script dependencies. Worker startup registers scripts before consumer creation and validates the batch/ack coupling in the configuration. Most collections return nil; the user-room collection provides the painless scripts for its add/remove room operations and switches its bulk action payloads from inlined source to stored script ID references.
Changes: Elasticsearch Stored Scripts Infrastructure
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@search-sync-worker/consumer_config_test.go`:
- Around line 86-102: Add coverage for unlimited ack-pending to
TestCheckBatchAckCoupling: update the TestCheckBatchAckCoupling test (the
subtests around checkBatchAckCoupling) to include a subtest that asserts
checkBatchAckCoupling(2000, -1) returns empty (and similarly for 0) so that
non-positive maxAckPending values do not emit the warning; place these
assertions alongside the existing "warns when bulk size exceeds ack pending"
case to lock in the unlimited-consumer behavior for the checkBatchAckCoupling
function.
📒 Files selected for processing (16)
- pkg/searchengine/adapter.go
- pkg/searchengine/adapter_test.go
- pkg/searchengine/searchengine.go
- search-sync-worker/collection.go
- search-sync-worker/consumer_config_test.go
- search-sync-worker/handler_test.go
- search-sync-worker/inbox_integration_test.go
- search-sync-worker/inbox_stream.go
- search-sync-worker/integration_test.go
- search-sync-worker/main.go
- search-sync-worker/messages.go
- search-sync-worker/messages_test.go
- search-sync-worker/perf_bench_test.go
- search-sync-worker/spotlight_test.go
- search-sync-worker/user_room.go
- search-sync-worker/user_room_test.go
func TestCheckBatchAckCoupling(t *testing.T) {
	t.Run("ok when bulk size below ack pending", func(t *testing.T) {
		assert.Empty(t, checkBatchAckCoupling(500, 1000))
	})

	t.Run("ok when bulk size equals ack pending", func(t *testing.T) {
		// At equality a 1:1 collection can still just reach the threshold
		// (the size trigger fires at ActionCount >= bulkBatchSize), so this
		// is not flagged.
		assert.Empty(t, checkBatchAckCoupling(1000, 1000))
	})

	t.Run("warns when bulk size exceeds ack pending", func(t *testing.T) {
		msg := checkBatchAckCoupling(2000, 1000)
		assert.NotEmpty(t, msg, "bulk size above ack pending must produce a warning")
	})
}
Add a case for unlimited ack-pending.
If the non-positive maxAckPending guard is adopted (see main.go comment), add a subtest asserting checkBatchAckCoupling(2000, -1) (and 0) returns empty, to lock in that unlimited consumers don't trigger the warning.
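To make the suggestion concrete, here is a stdlib-only sketch of the guard with the proposed non-positive short-circuit, plus the cases the reviewer asks for. The function body is an assumption (the actual `main.go` code is not quoted in this thread), and the real test would use `t.Run` with `assert.Empty` as in the excerpt above rather than a `main` loop.

```go
package main

import "fmt"

// Hypothetical variant of checkBatchAckCoupling with the reviewer's proposed
// guard: a non-positive maxAckPending is treated as "no limit" and never warns.
func checkBatchAckCoupling(bulkBatchSize, maxAckPending int) string {
	if maxAckPending <= 0 || bulkBatchSize <= maxAckPending {
		return ""
	}
	return fmt.Sprintf("bulk batch size %d exceeds max ack pending %d",
		bulkBatchSize, maxAckPending)
}

func main() {
	cases := []struct {
		name             string
		bulk, ackPending int
		wantWarn         bool
	}{
		{"unlimited (-1)", 2000, -1, false},
		{"zero (0)", 2000, 0, false},
		{"exceeds", 2000, 1000, true},
	}
	for _, c := range cases {
		got := checkBatchAckCoupling(c.bulk, c.ackPending) != ""
		fmt.Printf("%-16s warn=%v want=%v\n", c.name, got, c.wantWarn)
	}
}
```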
Summary
Two performance fixes to `search-sync-worker`, plus a benchmark backing an investigation into a third (larger) throughput change that is not included here.
1. Stored scripts for user-room updates (commit 1)
The user-room scripted `_update` actions previously inlined the full ~600-byte painless source in every bulk action. A member event fanning out to N accounts shipped N copies of the script body. They now reference ES stored scripts by id:
- `searchengine.PutScript` (`PUT /_scripts/{id}`) on the engine interface + `httpAdapter`, mirroring `UpsertTemplate`.
- `Collection.StoredScripts()`; `userRoomCollection` registers its add/remove scripts under versioned ids (`search-sync-user-room-add/remove-v1`); messages/spotlight return nil.
- `buildAddRoomUpdateBody` / `buildRemoveRoomUpdateBody` now emit `{"script":{"id":…}}`; `main.go` registers scripts at startup before any consumer runs.
Impact: ~3–4× smaller bulk payloads on large fan-out member events; negligible on steady-state message indexing. The painless logic itself is unchanged; only its registration mechanism differs.
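To illustrate where the payload saving comes from, the sketch below builds one update body with an inline source and one with a stored-script id and prints their sizes. Everything here is illustrative: `updateBody` and `scriptRef` are hypothetical helpers (not the PR's `buildAddRoomUpdateBody`), the 600-byte string is a placeholder for the real painless source, the params are invented, and the id assumes the `add/remove-v1` shorthand expands to `search-sync-user-room-add-v1`. The real bodies may carry additional fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Placeholder standing in for the ~600-byte painless source; the real
// script body is not shown in this PR description.
var inlineSource = strings.Repeat("x", 600)

// scriptRef is the "script" object of an _update action. Either ID (stored
// script) or Source+Lang (inline script) is set, not both.
type scriptRef struct {
	ID     string         `json:"id,omitempty"`
	Source string         `json:"source,omitempty"`
	Lang   string         `json:"lang,omitempty"`
	Params map[string]any `json:"params,omitempty"`
}

// updateBody wraps the script object as {"script":{...}}.
func updateBody(s scriptRef) []byte {
	b, err := json.Marshal(map[string]scriptRef{"script": s})
	if err != nil {
		panic(err) // not expected for these plain types
	}
	return b
}

func main() {
	params := map[string]any{"rid": "room-1", "ts": 1700000000000} // hypothetical params
	inline := updateBody(scriptRef{Source: inlineSource, Lang: "painless", Params: params})
	stored := updateBody(scriptRef{ID: "search-sync-user-room-add-v1", Params: params})
	fmt.Printf("inline=%dB stored=%dB\n", len(inline), len(stored))
	fmt.Println(string(stored))
}
```

The per-action saving is roughly the script source length, so the overall ratio depends on how large the params and the rest of the bulk action are.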
2. Batch / ack-pending config guard (commit 1)
Startup now warns when `BULK_BATCH_SIZE > CONSUMER_MAX_ACK_PENDING`. In that regime a 1:1 collection stalls at `MaxAckPending` unacked messages before `ActionCount` can reach the bulk cap, so the size-based flush never fires and every flush waits the full `BulkFlushInterval`. This is a warning (not fatal), since fan-out collections are unaffected.
3. Parse-cost benchmarks (commit 2): investigation only, no behavior change
`perf_bench_test.go` measures `BuildAction` cost per collection across fan-out widths. It was used to evaluate whether pipelining/double-buffering the ES flush is worth it.
Finding: parse is single-digit ms per 500-action batch vs a tens-to-100 ms ES round-trip, so double-buffering would yield only ~5–17%. The real lever is a bounded concurrent flush-pool (safe here because all three collections are order-independent: external versioning for messages/spotlight, painless LWW guard for user-room). That change is not in this PR; it should be gated on confirming actual consumer backlog first.
Test plan
- `make test` (full repo, race) ✅
- `make lint` ✅ (0 issues)
- `go vet -tags integration ./search-sync-worker/` ✅
- `PutScript`, `StoredScripts`, `checkBatchAckCoupling` written test-first (RED → GREEN).
- `checkBatchAckCoupling` / `StoredScripts` / `storedScriptBody` 100%, `PutScript` 78.6%, update-body builders 80–90%.
Notes for reviewer
`-v1`; bump the suffix if a script body ever changes incompatibly, to avoid old/new pods sharing a mutated definition mid-rolling-deploy.
`K × BULK_BATCH_SIZE ≤ MAX_ACK_PENDING` and raising `MaxIdleConnsPerHost`.
https://claude.ai/code/session_01Dpze4Fvf76yELwfGjoTTht
Generated by Claude Code