fix: defer KG rebuild during batch document deletion by he-yufeng · Pull Request #2819 · HKUDS/LightRAG

he-yufeng · 2026-03-21T15:27:01Z

Summary

When deleting N documents via background_delete_documents(), each adelete_by_doc_id() call triggers a full rebuild_knowledge_from_chunks(). For entities shared across documents, this means the same summaries get recomputed N times. On an 85-document batch, the issue author measured ~75x redundant LLM calls.

This PR adds a skip_rebuild parameter to adelete_by_doc_id() (defaults to False for backwards compat). The batch deletion loop passes skip_rebuild=True, collects rebuild targets from each result, and does a single combined rebuild at the end.
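The aggregation described above can be sketched with toy stand-ins rather than the real LightRAG classes. `FakeRag`, `FakeResult`, and `batch_delete` are hypothetical names, and the merge policy (later documents overwrite earlier targets for the same entity) is an assumption for illustration, not something taken from the PR diff.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for DeletionResult with the new optional fields."""
    doc_id: str
    entities_to_rebuild: dict = field(default_factory=dict)
    relationships_to_rebuild: dict = field(default_factory=dict)


class FakeRag:
    """Toy model in which every document mentions the same shared entity."""

    def __init__(self):
        self.rebuild_calls = 0

    async def rebuild(self, entities, relationships):
        # Stands in for rebuild_knowledge_from_chunks(); only counts invocations.
        self.rebuild_calls += 1

    async def adelete_by_doc_id(self, doc_id, skip_rebuild=False):
        targets = {"SharedEntity": [f"chunk-of-{doc_id}"]}
        if not skip_rebuild:
            # Default path: rebuild immediately, once per document.
            await self.rebuild(targets, {})
            return FakeResult(doc_id)
        # Deferred path: hand the targets back to the caller instead.
        return FakeResult(doc_id, entities_to_rebuild=targets)


async def batch_delete(rag, doc_ids):
    entities, relationships = {}, {}
    for doc_id in doc_ids:
        result = await rag.adelete_by_doc_id(doc_id, skip_rebuild=True)
        # Assumed merge policy: the most recent entry for a key wins.
        entities.update(result.entities_to_rebuild)
        relationships.update(result.relationships_to_rebuild)
    if entities or relationships:
        # One combined rebuild for the whole batch.
        await rag.rebuild(entities, relationships)
    return entities, relationships
```

In this toy model, an 85-document batch triggers one rebuild through `batch_delete`, whereas 85 direct `adelete_by_doc_id()` calls with the default flag trigger 85.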

Changes:

lightrag/base.py — DeletionResult gains optional entities_to_rebuild / relationships_to_rebuild fields
lightrag/lightrag.py — adelete_by_doc_id() conditionally skips rebuild when skip_rebuild=True, populates rebuild targets on the result
lightrag/api/routers/document_routes.py — background_delete_documents() aggregates targets across loop, single rebuild_knowledge_from_chunks() call at the end
tests/test_batch_delete_deferred_rebuild.py — 4 tests covering both modes

Single-doc deletion (direct adelete_by_doc_id() calls) is completely unaffected — same code path as before.

Closes #2795

danielaskdd · 2026-03-23T10:29:08Z

Deferring the refactoring of entity relationships is a sound approach; however, there are several issues in the current PR that need to be addressed:

1. Redundant tasks in overlapping pools: If multiple documents scheduled for deletion share common entities, those entities may be tagged for both the "Refactor" and "Delete" pools at the same time. In such cases, the refactoring task is redundant and should be invalidated or skipped.
2. Lack of robust error handling: The LLM-driven KG rebuild stage is prone to failure. If an error occurs during this phase, the pipeline should abort without removing the file from doc_status. This preserves the state the system needs so that the user can re-trigger the deletion later; the deletion operation should also be idempotent.

he-yufeng · 2026-03-23T11:12:42Z

Addressed the feedback — latest push includes:

Pruned stale entities/relations from the rebuild pool (checks graph existence before queueing)
Added error handling around the deferred rebuild so a failure there doesn't take down the whole delete operation
Cleaned up the pool merging logic to avoid duplicates across overlapping batches

danielaskdd · 2026-03-23T14:03:33Z

This PR conflicts with PR #2826 ("make document deletion retry-safe"). Please resolve the conflicts before we proceed with the review.

he-yufeng · 2026-03-24T02:56:32Z

Rebased onto main, conflicts with #2826 resolved.

danielaskdd · 2026-03-24T04:09:28Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c7f84b884

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T04:13:52Z

+                if not skip_rebuild:
+                    await self.doc_status.delete([doc_id])
                await self.full_docs.delete([doc_id])


Preserve rebuild metadata when deferring cleanup

When skip_rebuild=True, this path skips doc_status deletion but still proceeds after stage 10, where full_entities/full_relations are removed before returning success. If the later deferred rebuild fails in background_delete_documents, a retry cannot reconstruct entities_to_rebuild/relationships_to_rebuild because the per-doc graph metadata is already gone, so the promised “re-trigger deletion to retry” flow cannot actually repair the KG and can leave stale summaries indefinitely.


chatgpt-codex-connector · 2026-03-24T04:13:52Z

+        elif not rebuild_ok:
+            keep_msg = (
+                f"Keeping doc_status for {len(successful_deletions)} docs "
+                f"because rebuild failed — re-trigger deletion to retry"
+            )


Mark deferred rebuild failures as failed deletions

In the rebuild-failure branch, the code logs a warning but never reclassifies the previously successful_deletions as failed. The finally block then emits a completion summary from these counters, so users can see a fully successful batch even though doc cleanup was intentionally withheld due to rebuild failure; this makes operational status inaccurate and can hide that manual retry is still required.
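One possible way to act on this suggestion is sketched below. `summarize_batch` is a hypothetical helper, not code from the PR, and the policy it encodes (a document counts as deleted only once the deferred rebuild has succeeded) is an assumption.

```python
def summarize_batch(successful, failed, rebuild_ok):
    """Return (successful, failed, summary) after the deferred rebuild attempt."""
    if not rebuild_ok:
        # Assumed policy: reclassify every pending success as a failure so the
        # completion summary reflects that a retry is still required.
        failed = failed + successful
        successful = []
    summary = f"{len(successful)} succeeded, {len(failed)} failed"
    return successful, failed, summary
```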


he-yufeng · 2026-03-24T13:33:24Z

@danielaskdd Thanks for the review — both points were solid catches. Here's what I did:

1. Redundant tasks in overlapping pools:

Before the deferred rebuild runs, I now check whether each entity/relation still exists in the graph. If doc A tagged an entity for "rebuild" but doc B's deletion later removed it entirely, the rebuild for that node gets skipped. The pruning loop sits right before rebuild_knowledge_from_chunks in document_routes.py — it calls has_node() / has_edge() and drops anything that's already gone.
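A minimal sketch of that pruning step, assuming the graph storage exposes async `has_node()` / `has_edge()` checks. `FakeGraph` and `prune_rebuild_targets` are hypothetical names for illustration, and the shape of the target dictionaries (entity name or `(source, target)` pair mapped to remaining chunk IDs) is also an assumption.

```python
import asyncio


class FakeGraph:
    """Hypothetical in-memory graph exposing async has_node / has_edge."""

    def __init__(self, nodes, edges):
        self._nodes = set(nodes)
        # Edges are treated as undirected in this toy model.
        self._edges = {frozenset(edge) for edge in edges}

    async def has_node(self, node_id):
        return node_id in self._nodes

    async def has_edge(self, src, tgt):
        return frozenset((src, tgt)) in self._edges


async def prune_rebuild_targets(graph, entities, relationships):
    """Drop rebuild targets that a later deletion already removed from the graph."""
    live_entities = {}
    for name, chunk_ids in entities.items():
        if await graph.has_node(name):
            live_entities[name] = chunk_ids

    live_relationships = {}
    for (src, tgt), chunk_ids in relationships.items():
        if await graph.has_edge(src, tgt):
            live_relationships[(src, tgt)] = chunk_ids

    return live_entities, live_relationships
```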

2. Error handling / idempotency:

When skip_rebuild=True, adelete_by_doc_id now keeps doc_status alive intentionally. The batch handler in document_routes.py only removes doc_status records after the deferred rebuild succeeds. If the rebuild blows up, doc_status stays put so the user can re-trigger deletion later — the retry-safe machinery from #2826 handles the rest.
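The ordering described here can be sketched as follows. `finish_batch` is a hypothetical helper, and a plain dict stands in for the doc_status storage; the real batch handler's structure may differ.

```python
import asyncio


async def finish_batch(doc_status, deleted_ids, rebuild):
    """Remove doc_status records only if the deferred rebuild succeeds."""
    try:
        await rebuild()
    except Exception as exc:
        # Keep doc_status untouched so the same IDs can be submitted again.
        return False, f"rebuild failed: {exc}"
    for doc_id in deleted_ids:
        doc_status.pop(doc_id, None)
    return True, "ok"
```

Because `pop(doc_id, None)` tolerates missing keys, running the cleanup a second time for the same IDs is harmless in this sketch.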

3. Conflict with #2826:

Rebased on latest main (which includes #2826). Resolved cleanly, no merge conflicts. The in_final_delete_stage guard and _update_delete_retry_state from #2826 work as-is with the new skip_rebuild path.

danielaskdd · 2026-03-24T14:43:35Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 415e12970c


chatgpt-codex-connector · 2026-03-24T14:48:34Z

+                if not skip_rebuild:
+                    await self.doc_status.delete([doc_id])
                await self.full_docs.delete([doc_id])


Preserve graph metadata until deferred rebuild succeeds

When skip_rebuild=True, this branch keeps doc_status for retry, but the same call has already deleted full_entities, full_relations, and full_docs earlier in adelete_by_doc_id. If the later batch-level rebuild_knowledge_from_chunks fails, re-running deletion for those doc IDs cannot reconstruct entities_to_rebuild / relationships_to_rebuild because the per-document graph metadata is gone, so the promised retry path can leave stale KG summaries indefinitely.


chatgpt-codex-connector · 2026-03-24T14:48:34Z

+            keep_msg = (
+                f"Keeping doc_status for {len(successful_deletions)} docs "
+                f"because rebuild failed — re-trigger deletion to retry"
+            )


Mark batch as failed when deferred rebuild errors

If the deferred rebuild raises, rebuild_ok is set to False but no documents are moved from successful_deletions to failed_deletions; the function only logs a "keep doc_status" message. The final completion summary still reports those documents as successful, which can mislead operators/automation into treating a partially failed batch as complete even though cleanup was intentionally deferred for retry.


The finally block of `adelete_by_doc_id` calls `_insert_done()` whenever `deletion_operations_started=True`, which the no-chunks early-return path sets even though it only mutates `doc_status` and `full_docs`. `_insert_done()` then triggers `index_done_callback()` on every storage, including the ~62 MB GraphML serialization and the ~500 MB-each NanoVectorDB JSON dumps, for a graph that wasn't touched at all. For the canonical use case of bulk-deleting FAILED ingest documents (WebUI multi-select cleanup), this turns a 200-doc batch into ~300 GB of pure no-op disk writes.

This patch branches the finally on `chunk_ids`:

* `chunk_ids` non-empty (chunks were removed) → keep `_insert_done()`. The graph + VDBs were genuinely mutated and must be persisted.
* `chunk_ids` empty (no-chunks path) → persist only the storages this path actually touched: `doc_status`, `full_docs`, and (when `delete_llm_cache=True` with persisted cache IDs) `llm_response_cache`.

Composes cleanly with the existing W2 patch (PR HKUDS#2819 deferred KG rebuild). Both reduce the per-document cost of batch deletion; W5 helps the no-chunks subset specifically while W2 helps the chunk-bearing subset by deferring the LLM-driven graph re-extraction.

Tests: tests/test_no_chunks_skip_persist.py exercises both the basic no-chunks path and the no-chunks + delete_llm_cache path, asserting exactly which storages get awaited. All four existing W2 tests still pass; the chunk-bearing path is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
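The branching on `chunk_ids` can be illustrated with a small sketch. `FakeStorage` and `persist_after_delete` are hypothetical names, the storage keys `graph` and `chunks_vdb` in the usage below are placeholders, and the `cache_deleted` flag is a simplification of the "`delete_llm_cache=True` with persisted cache IDs" condition.

```python
import asyncio


class FakeStorage:
    """Hypothetical storage that counts how often it is asked to persist."""

    def __init__(self, name):
        self.name = name
        self.persisted = 0

    async def index_done_callback(self):
        self.persisted += 1


async def persist_after_delete(storages, chunk_ids, cache_deleted=False):
    """Persist every storage only when chunks were actually removed."""
    if chunk_ids:
        # Chunk-bearing path: graph and vector stores changed, flush everything.
        targets = list(storages.values())
    else:
        # No-chunks path: flush only the storages this path touched.
        names = ["doc_status", "full_docs"]
        if cache_deleted:
            names.append("llm_response_cache")
        targets = [storages[name] for name in names]
    await asyncio.gather(*(s.index_done_callback() for s in targets))
    return [s.name for s in targets]
```

For example, with storages named `doc_status`, `full_docs`, `llm_response_cache`, `graph`, and `chunks_vdb`, calling `persist_after_delete(storages, [])` leaves the `graph` and `chunks_vdb` counters at zero.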

he-yufeng · 2026-05-13T23:14:38Z

Hi, checking in on this PR. I addressed the earlier feedback around overlapping rebuild/delete pools and deferred rebuild error handling, and GitHub still shows the branch clean with lint/offline tests green. Happy to adjust if there is anything else blocking review.

When deleting N documents via background_delete_documents(), each call to adelete_by_doc_id() was triggering a full rebuild_knowledge_from_chunks(). For shared entities spanning multiple docs, this means the same summaries get recomputed N times: 75x wasted LLM calls on an 85-doc batch.

Added skip_rebuild parameter to adelete_by_doc_id(). When set, the per-doc rebuild is skipped and the affected entities/relationships are returned in DeletionResult so the caller can aggregate them. The batch delete loop now collects all targets and does a single rebuild pass at the end.

Backwards compatible: skip_rebuild defaults to False, so single-doc deletion and direct API callers behave exactly as before.

Closes HKUDS#2795

…failure

Address review feedback on two issues:

1. When multiple documents share entities, a later deletion may fully remove an entity that an earlier deletion tagged for rebuild. Before the deferred rebuild, verify each entity/relation still exists in the graph and drop any that were already deleted. This avoids wasted LLM calls and potential errors.
2. adelete_by_doc_id(skip_rebuild=True) now keeps doc_status alive. The batch caller only removes doc_status after a successful rebuild. If rebuild fails, docs remain visible so the user can re-trigger deletion (idempotent).

he-yufeng · 2026-06-07T00:44:16Z

Rebased this onto current upstream/main and resolved the conflict in adelete_by_doc_id around the rebuild step.

Kept the PR's skip_rebuild=True deferred rebuild behavior, while preserving upstream's current _build_global_config() call for the normal per-doc rebuild path.

Validated locally on Windows:

python -m pytest tests/test_batch_delete_deferred_rebuild.py -q (4 passed)
python -m ruff check lightrag/lightrag.py lightrag/api/routers/document_routes.py lightrag/base.py tests/test_batch_delete_deferred_rebuild.py
python -m ruff format --check lightrag/lightrag.py lightrag/api/routers/document_routes.py lightrag/base.py tests/test_batch_delete_deferred_rebuild.py
python -m py_compile lightrag/lightrag.py lightrag/api/routers/document_routes.py lightrag/base.py tests/test_batch_delete_deferred_rebuild.py
git diff --check upstream/main..HEAD

he-yufeng · 2026-06-07T00:46:59Z

Follow-up after the lint job: CI's pinned pre-commit ruff-format wanted one additional assert wrapping change in tests/test_batch_delete_deferred_rebuild.py that my local direct ruff format did not catch.

Revalidated locally on Windows:

python -m pytest tests/test_batch_delete_deferred_rebuild.py -q (4 passed)
python -m pre_commit run ruff-format --files tests/test_batch_delete_deferred_rebuild.py lightrag/api/routers/document_routes.py lightrag/lightrag.py lightrag/base.py
python -m pre_commit run ruff --files tests/test_batch_delete_deferred_rebuild.py lightrag/api/routers/document_routes.py lightrag/lightrag.py lightrag/base.py
python -m py_compile lightrag/lightrag.py lightrag/api/routers/document_routes.py lightrag/base.py tests/test_batch_delete_deferred_rebuild.py
git diff --check upstream/main..HEAD

he-yufeng · 2026-06-07T00:55:02Z

Fixed the failing offline test after the rebase follow-up. The failure was from the document-route test double still using the old adelete_by_doc_id signature; it now records the skip_rebuild flag and asserts the batch/background path passes skip_rebuild=True. Validation: pytest tests/api/routes/test_document_routes_docx_archive.py::test_background_delete_removes_parser_hint_file_variants tests/test_batch_delete_deferred_rebuild.py -q passed (5 passed); pre-commit ruff passed on touched files; pre-commit ruff-format passed on touched files; py_compile on touched files passed; git diff --check passed.

he-yufeng force-pushed the fix/batch-delete-deferred-rebuild branch from 7e74326 to 7c7f84b Compare March 24, 2026 02:55

chatgpt-codex-connector Bot reviewed Mar 24, 2026


he-yufeng force-pushed the fix/batch-delete-deferred-rebuild branch from 7c7f84b to 18eaa8a Compare March 24, 2026 13:32

danielaskdd added the enhancement New feature or request label Mar 24, 2026

chatgpt-codex-connector Bot reviewed Mar 24, 2026


he-yufeng added 3 commits June 7, 2026 08:42

chore: clarify step 11 comment for normal vs batch mode

76360be

he-yufeng force-pushed the fix/batch-delete-deferred-rebuild branch from 415e129 to 06b708f Compare June 7, 2026 00:44

he-yufeng force-pushed the fix/batch-delete-deferred-rebuild branch from 06b708f to bb1cfd2 Compare June 7, 2026 00:46

style: apply ruff-format fixes

aa81afb

he-yufeng force-pushed the fix/batch-delete-deferred-rebuild branch from bb1cfd2 to aa81afb Compare June 7, 2026 00:54


2 participants