Add --parallel to `serena project index` for concurrent indexing by QuantFunc · Pull Request #1528 · oraios/serena

QuantFunc · 2026-05-30T06:03:42Z

Summary

serena project index indexes files one at a time (one request_document_symbols per file). On large projects this is slow — the bottleneck is language-server round-trip latency, not local CPU, so concurrent requests pipeline well.

This PR adds --parallel N to serena project index (click.IntRange(min=1), default 1 = unchanged serial behaviour). For N > 1, a ThreadPoolExecutor issues concurrent document-symbol requests.
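A minimal sketch of why latency-bound requests benefit from a thread pool, with `time.sleep` standing in for a language-server round trip. The names `fake_request` and `run`, the 0.05 s latency, and the file names are hypothetical; none of this is Serena's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed round-trip time; stands in for the language server


def fake_request(path: str) -> str:
    """Hypothetical stand-in for one document-symbol request."""
    time.sleep(LATENCY_S)  # a blocked thread releases the GIL, so waits overlap
    return path


def run(paths: list[str], parallel: int) -> float:
    """Return wall-clock seconds needed to 'index' all paths."""
    start = time.perf_counter()
    if parallel == 1:
        # Serial branch: one request at a time.
        for p in paths:
            fake_request(p)
    else:
        # Parallel branch: up to `parallel` requests in flight at once.
        with ThreadPoolExecutor(max_workers=parallel) as pool:
            list(pool.map(fake_request, paths))
    return time.perf_counter() - start


paths = [f"file_{i}.cpp" for i in range(16)]
serial_s = run(paths, parallel=1)
parallel_s = run(paths, parallel=8)
```

Because each simulated request spends its time waiting rather than computing, the parallel run finishes in roughly the time of two round trips instead of sixteen.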

What changed

src/serena/cli.py

New --parallel N option on project index.
The per-file loop is split into a worker (index_one → returns (language, exception)) and a main-thread accumulator (record). Workers do no shared-state mutation; the main thread does all counting / failure-list building, so there is no worker-side data race.
On the parallel path the periodic intermediate save is intentionally skipped — running save_all_caches() (which iterates each LS's cache dict) while workers still write keys could raise dict changed size during iteration. The single save_all_caches() after the pool joins covers it.
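The worker/accumulator split described above could look roughly like the sketch below. `index_one` and `record` are the names used in this PR, but their bodies here are illustrative assumptions: the language detection, the simulated failure, and the `index_all` wrapper are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Optional


def index_one(path: str) -> tuple[str, Optional[Exception]]:
    """Worker: touches no shared state, only reports (language, exception)."""
    language = "cpp" if path.endswith(".cpp") else "other"  # assumed detection
    try:
        if "broken" in path:  # hypothetical failure, for illustration only
            raise ValueError(f"cannot index {path}")
        return language, None
    except Exception as exc:
        return language, exc


def index_all(paths: list[str], parallel: int):
    counts: Counter = Counter()
    failures: list[tuple[str, Exception]] = []

    def record(path: str, language: str, exc: Optional[Exception]) -> None:
        # Runs on the main thread only, so no lock is needed here.
        if exc is None:
            counts[language] += 1
        else:
            failures.append((path, exc))

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(index_one, p): p for p in paths}
        for fut in as_completed(futures):  # main thread drains the futures
            language, exc = fut.result()
            record(futures[fut], language, exc)

    # The pool has joined here; a single cache save would go at this point,
    # when no worker can still be adding keys to the cache dicts.
    return counts, failures


counts, failures = index_all(["a.cpp", "b.cpp", "broken.cpp"], parallel=4)
```

Returning the exception instead of raising it keeps every failure visible to the accumulator without aborting the remaining futures.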

src/solidlsp/ls.py — makes SolidLanguageServer safe to drive from multiple threads:

Add a per-LS re-entrant _state_lock guarding the open_file_buffers bookkeeping and the document-symbol cache writes.
The lock wraps only in-process dict operations — never a language-server round-trip. open_file constructs a new buffer with open_in_ls=False under the lock and sends the didOpen via ensure_open_in_ls() after releasing it; teardown captures the buffer under the lock and calls fb.close() (didClose) outside it. So the lock neither serializes concurrent requests nor creates a _state_lock ↔ stdin-lock ordering hazard.
open_file's ref-count decrement moves into a try/finally (also fixes a pre-existing ref-count leak when the yield body raised); the buffer delete is guarded by an identity check against concurrent re-creation.
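The locking discipline in the list above can be sketched as follows. This is a simplified model under stated assumptions, not the code in `src/solidlsp/ls.py`: the `Buffer` and `Server` classes, the `sent` log that stands in for real notifications, and the per-buffer `_io_lock` that makes `ensure_open_in_ls` idempotent are all hypothetical.

```python
import threading
from contextlib import contextmanager


class Buffer:
    def __init__(self, uri: str, sent: list):
        self.uri = uri
        self.ref_count = 0
        self.open_in_ls = False       # created "not yet open" under the state lock
        self._sent = sent             # log standing in for LS notifications
        self._io_lock = threading.Lock()  # assumption: guards the open/close flag

    def ensure_open_in_ls(self) -> None:
        with self._io_lock:
            if not self.open_in_ls:   # idempotent: didOpen is sent at most once
                self._sent.append(("didOpen", self.uri))
                self.open_in_ls = True

    def close(self) -> None:
        with self._io_lock:
            if self.open_in_ls:
                self._sent.append(("didClose", self.uri))
                self.open_in_ls = False


class Server:
    def __init__(self):
        self._state_lock = threading.RLock()
        self.open_file_buffers: dict[str, Buffer] = {}
        self.sent: list = []

    @contextmanager
    def open_file(self, uri: str):
        with self._state_lock:        # dict bookkeeping only, no I/O
            fb = self.open_file_buffers.get(uri)
            if fb is None:
                fb = Buffer(uri, self.sent)
                self.open_file_buffers[uri] = fb
            fb.ref_count += 1
        try:
            fb.ensure_open_in_ls()    # simulated round trip, outside the lock
            yield fb
        finally:
            to_close = None
            with self._state_lock:
                fb.ref_count -= 1     # always runs, even if the body raised
                # Identity check: delete only if the entry is still this buffer.
                if fb.ref_count == 0 and self.open_file_buffers.get(uri) is fb:
                    del self.open_file_buffers[uri]
                    to_close = fb
            if to_close is not None:
                to_close.close()      # simulated didClose, outside the lock


srv = Server()
try:
    with srv.open_file("a.cpp"):
        with srv.open_file("a.cpp"):  # nested use shares one buffer
            raise RuntimeError("body failed")
except RuntimeError:
    pass
```

Even though the body raised, both reference counts were released and the buffer was closed exactly once, which is the leak the `try`/`finally` is meant to prevent.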

Backwards compatibility

--parallel defaults to 1 → the serial branch, which reproduces the original loop exactly. _index_project gains parallel: int = 1, so the existing create --index caller is unaffected. The open_file refactor preserves single-threaded semantics (didOpen timing unchanged via the idempotent ensure_open_in_ls).

Measured

52-file C++ subtree (clangd):

| mode | rate | result |
| --- | --- | --- |
| `--parallel 1` (serial) | ~2.25 it/s | cpp=52, 0 failures |
| `--parallel 8` | ~9.45 it/s (~4x) | cpp=52, 0 failures, no deadlock |

Cold-cache --parallel 8 (exercises the new-buffer + didOpen path) and the serial run produce byte-identical symbol caches (52/52 entries) — parallel indexing yields the same result, just faster.
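One way to run a byte-identity check like this is to hash every file under two cache directories and compare the results. The sketch assumes the caches are plain files on disk; the directory layout, the file name `entry.bin`, and the helper names are hypothetical and not taken from Serena.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def caches_identical(a: Path, b: Path) -> bool:
    """True when both trees hold the same files with the same bytes."""
    return digest_tree(a) == digest_tree(b)


# Small demonstration with two throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    serial_dir = Path(tmp, "serial")
    parallel_dir = Path(tmp, "parallel")
    serial_dir.mkdir()
    parallel_dir.mkdir()
    (serial_dir / "entry.bin").write_bytes(b"symbols")
    (parallel_dir / "entry.bin").write_bytes(b"symbols")
    same = caches_identical(serial_dir, parallel_dir)

    # Change one byte stream and the comparison should fail.
    (parallel_dir / "entry.bin").write_bytes(b"different")
    differs = not caches_identical(serial_dir, parallel_dir)
```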

Note

Document-symbol cache writes are under _state_lock; cache reads are not yet locked. Under CPython this is safe for the --parallel distinct-file case (each worker writes a distinct key), which is supported by the byte-identical result above. Locking the reads too (for arbitrary concurrent callers or free-threaded builds) is a small follow-up.
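For illustration, the follow-up of locking reads as well as writes might look like this sketch. The `SymbolCache` class and its method names are hypothetical; only the idea of a re-entrant `_state_lock` around in-process dict operations comes from this PR.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SymbolCache:
    """Hypothetical cache whose reads and writes share one re-entrant lock."""

    def __init__(self):
        self._state_lock = threading.RLock()
        self._entries: dict[str, list[str]] = {}

    def put(self, key: str, symbols: list[str]) -> None:
        with self._state_lock:
            self._entries[key] = symbols

    def get(self, key: str):
        with self._state_lock:  # the read takes the same lock as the write
            return self._entries.get(key)

    def snapshot(self) -> dict[str, list[str]]:
        # Copy under the lock so a later save can iterate the copy safely,
        # even if writers keep adding keys to the live dict.
        with self._state_lock:
            return dict(self._entries)


cache = SymbolCache()
with ThreadPoolExecutor(max_workers=8) as pool:
    # Each worker writes a distinct key, as in the --parallel distinct-file case.
    list(pool.map(lambda i: cache.put(f"file_{i}.cpp", [f"sym_{i}"]), range(52)))
saved = cache.snapshot()
```

Iterating a snapshot rather than the live dict is also one possible way to avoid the "dict changed size during iteration" error if an intermediate save were ever reintroduced on the parallel path.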

Usage

serena project index --parallel 8

…indexing

serena project index indexed files one at a time. The bottleneck is language-server round-trip latency (not CPU), so concurrent requests pipeline well.

Add --parallel N (click.IntRange(min=1), default 1 = unchanged serial). N>1 uses a ThreadPoolExecutor; the main thread drains futures and does ALL accumulation (no worker-side race); periodic cache save skipped on the parallel path (single save_all_caches() after join avoids "dict changed size during iteration").

Make SolidLanguageServer safely multi-thread-drivable: add a per-LS re-entrant _state_lock guarding open_file_buffers bookkeeping + document-symbol cache writes. The lock wraps ONLY in-process dict ops, never a language-server round-trip (didOpen/didClose I/O is done outside it), so it neither serializes concurrent requests nor risks a _state_lock<->_stdin_lock deadlock. open_file teardown moves fb.close() outside the lock and the ref-count decrement into try/finally (also fixes a pre-existing ref-count leak when the yielded body raised).

Measured (52-file C++ subtree): serial 2.25 it/s vs --parallel 8 9.45 it/s; cold-cache serial-vs-parallel symbol caches byte-identical (52/52), 0 failures, no deadlock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tphakala mentioned this pull request May 30, 2026

Fix gopls replace_symbol_body corrupting type/var/const declarations #1530

Open

QuantFunc wants to merge 1 commit into oraios:main from QuantFunc:feat/parallel-index
