Add --parallel to serena project index for concurrent indexing #1528
Open
QuantFunc wants to merge 1 commit into
…indexing

serena project index indexed files one at a time. The bottleneck is language-server round-trip latency (not CPU), so concurrent requests pipeline well.

Add --parallel N (click.IntRange(min=1), default 1 = unchanged serial). N>1 uses a ThreadPoolExecutor; the main thread drains futures and does ALL accumulation (no worker-side race); the periodic cache save is skipped on the parallel path (a single save_all_caches() after join avoids "dict changed size during iteration").

Make SolidLanguageServer safely multi-thread-drivable: add a per-LS re-entrant _state_lock guarding open_file_buffers bookkeeping + document-symbol cache writes. The lock wraps ONLY in-process dict ops, never a language-server round-trip (didOpen/didClose I/O is done outside it), so it neither serializes concurrent requests nor risks a _state_lock<->_stdin_lock deadlock. open_file teardown moves fb.close() outside the lock and the ref-count decrement into try/finally (also fixes a pre-existing ref-count leak when the yielded body raised).

Measured (52-file C++ subtree): serial 2.25 it/s vs --parallel 8 9.45 it/s; cold-cache serial-vs-parallel symbol caches byte-identical (52/52), 0 failures, no deadlock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
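The worker / main-thread split described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `fake_request_document_symbols` and `index_files` are hypothetical stand-ins, and only the names `index_one` and `record` and the `(language, exception)` return shape are taken from the description.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Optional


def fake_request_document_symbols(path: str) -> str:
    """Hypothetical stand-in for one language-server round-trip."""
    if path.endswith(".bad"):
        raise ValueError(f"cannot parse {path}")
    return "cpp" if path.endswith(".cpp") else "python"


def index_one(path: str) -> tuple[Optional[str], Optional[Exception]]:
    # Worker: mutates no shared state, only returns (language, exception).
    try:
        return fake_request_document_symbols(path), None
    except Exception as exc:
        return None, exc


def index_files(paths: list[str], parallel: int = 1):
    per_language = Counter()
    failures = []

    def record(path, language, exc):
        # Called from the main thread only, so no locking is needed here.
        if exc is not None:
            failures.append((path, str(exc)))
        else:
            per_language[language] += 1

    if parallel <= 1:
        # Serial branch: one file after another.
        for path in paths:
            record(path, *index_one(path))
    else:
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            futures = {pool.submit(index_one, p): p for p in paths}
            # The main thread drains futures and does all accumulation.
            for fut in as_completed(futures):
                record(futures[fut], *fut.result())
    # A single cache save would go here, after the pool has joined.
    return per_language, sorted(failures)
```

Because workers only return values, completion order affects nothing except the order in which `record` sees results, which is why the failure list is sorted before being returned in this sketch.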
Summary
serena project index indexes files one at a time (one request_document_symbols per file). On large projects this is slow: the bottleneck is language-server round-trip latency, not local CPU, so concurrent requests pipeline well.

This PR adds --parallel N to serena project index (click.IntRange(min=1), default 1 = unchanged serial behaviour). For N > 1, a ThreadPoolExecutor issues concurrent document-symbol requests.

What changed
src/serena/cli.py
- Adds a --parallel N option on project index.
- Splits indexing into a worker (index_one → returns (language, exception)) and a main-thread accumulator (record). Workers do no shared-state mutation; the main thread does all counting / failure-list building, so there is no worker-side data race.
- Skips the periodic cache save on the parallel path: calling save_all_caches() (which iterates each LS's cache dict) while workers still write keys could raise "dict changed size during iteration". The single save_all_caches() after the pool joins covers it.

src/solidlsp/ls.py: makes SolidLanguageServer safe to drive from multiple threads
- Adds a per-LS re-entrant _state_lock guarding the open_file_buffers bookkeeping and the document-symbol cache writes.
- The lock wraps only in-process dict operations, never a language-server round-trip: open_file constructs a new buffer with open_in_ls=False under the lock and sends the didOpen via ensure_open_in_ls() after releasing it; teardown captures the buffer under the lock and calls fb.close() (didClose) outside it. So the lock neither serializes concurrent requests nor creates a _state_lock ↔ stdin-lock ordering hazard.
- open_file's ref-count decrement moves into a try/finally (also fixes a pre-existing ref-count leak when the yield body raised); the buffer delete is guarded by an identity check against concurrent re-creation.

Backwards compatibility
- --parallel defaults to 1 → the serial branch, which reproduces the original loop exactly.
- _index_project gains parallel: int = 1, so the existing create --index caller is unaffected.
- The open_file refactor preserves single-threaded semantics (didOpen timing unchanged via the idempotent ensure_open_in_ls).

Measured
52-file C++ subtree (clangd):

| Mode | Throughput |
| --- | --- |
| --parallel 1 (serial) | 2.25 it/s |
| --parallel 8 | 9.45 it/s |

Cold-cache --parallel 8 (exercises the new-buffer + didOpen path) and the serial run produce byte-identical symbol caches (52/52 entries): parallel indexing yields the same result, just faster.

Note
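The locking discipline listed under What changed can be sketched as follows. This is an illustration under stated assumptions, not the code in src/solidlsp/ls.py: `FakeServer` and `FakeBuffer` are hypothetical, notifications are appended to a list instead of being sent to a language server, and the per-buffer `_open_lock` is just one assumed way to make `ensure_open_in_ls` idempotent.

```python
import threading
from contextlib import contextmanager


class FakeBuffer:
    """Hypothetical buffer; logs notifications instead of sending them."""

    def __init__(self, uri, log):
        self.uri = uri
        self.log = log
        self.ref_count = 0
        self._opened = False
        self._open_lock = threading.Lock()  # assumption: per-buffer guard

    def ensure_open_in_ls(self):
        # Idempotent: only the first caller emits didOpen.
        with self._open_lock:
            if not self._opened:
                self._opened = True
                self.log.append(("didOpen", self.uri))

    def close(self):
        self.log.append(("didClose", self.uri))


class FakeServer:
    def __init__(self):
        self._state_lock = threading.RLock()  # re-entrant, one per server
        self.open_file_buffers = {}
        self.log = []

    @contextmanager
    def open_file(self, uri):
        with self._state_lock:  # dict bookkeeping only, no I/O
            fb = self.open_file_buffers.get(uri)
            if fb is None:
                fb = FakeBuffer(uri, self.log)
                self.open_file_buffers[uri] = fb
            fb.ref_count += 1
        fb.ensure_open_in_ls()  # the "round-trip" happens outside the lock
        try:
            yield fb
        finally:
            # Runs even if the body raised, so the ref count cannot leak.
            to_close = None
            with self._state_lock:
                fb.ref_count -= 1
                # Identity check: delete only if the entry is still this buffer.
                if fb.ref_count == 0 and self.open_file_buffers.get(uri) is fb:
                    del self.open_file_buffers[uri]
                    to_close = fb
            if to_close is not None:
                to_close.close()  # didClose, again outside the lock
```

In this sketch, an exception raised inside the `with srv.open_file(...)` body still decrements the ref count and closes the buffer, which mirrors the leak fix described for the try/finally change.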
Document-symbol cache writes are under _state_lock; the cache reads are not yet. Under CPython this is safe for the --parallel distinct-file case (each worker writes a distinct key), which the byte-identical result above supports. Locking the reads too (for arbitrary concurrent callers / free-threaded builds) is a small follow-up.

Usage