fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) by ShenAC-SAC · Pull Request #3342 · bytedance/deer-flow

ShenAC-SAC · 2026-06-01T14:25:01Z

Why

With tool_search.enabled: true, deferred MCP tool schemas were still bound into the model every turn and kept consuming the context window — the opposite of what the flag promises (#3272).

Root cause: deferred-tool state lived in a module-level ContextVar (DeferredToolRegistry), populated when the graph was built. LangGraph runs the model call in a different execution context (contexts are copied at task-creation; ContextVar.set() only affects the current and later-derived contexts). At execute time the registry the filter middleware read was empty, so nothing was hidden and full schemas were bound.
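The context-copy behaviour behind this root cause can be reproduced with the standard library alone. The following is a minimal sketch, not DeerFlow code: `registry_var`, `build_graph`, and `run_model_call` are hypothetical stand-ins for the old registry variable, the graph build step, and the model call.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the old module-level registry variable.
registry_var: contextvars.ContextVar = contextvars.ContextVar("registry", default=None)


async def build_graph():
    # set() only changes the context of the task running this coroutine.
    registry_var.set({"mcp_tool"})
    return registry_var.get()


async def run_model_call():
    # Runs in a sibling task whose context was copied from main(),
    # not from build_graph(), so the value set above is not visible here.
    return registry_var.get()


async def main():
    built = await asyncio.create_task(build_graph())
    seen = await asyncio.create_task(run_model_call())
    return built, seen


built, seen = asyncio.run(main())
print(built, seen)
```

The build task sees the populated set, while the sibling task falls back to the default. That is the shape of the failure described above: the filter reads an empty registry and hides nothing.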

That is shared, process-global, repeatedly-mutated state. Beyond #3272 it is fragile under concurrency/re-entrancy — e.g. the already-closed #2884 (the registry was unconditionally reset on every get_available_tools call, breaking promotion). Guarding symptoms on top of shared mutable state invites more of the same, so this change removes the shared state instead of guarding it, which structurally eliminates that whole class of bug (the band-aid #2884 needed is no longer reachable).

What Changed

Replace the ContextVar + mutable DeferredToolRegistry with two context-independent pieces:

Build-time immutable closure. build_deferred_tool_setup(filtered_tools, enabled=...) builds an immutable DeferredToolCatalog (stable content hash) and a tool_search tool that closes over that catalog. The deferred-name set and catalog hash are plain values handed to the middleware at construction — no global state.
Per-thread graph state. Promotions persist in ThreadState.promoted ({catalog_hash, names}) via a hash-scoped merge_promoted reducer. The tool_search tool returns Command(update={"promoted": ...}), so promotion flows through LangGraph state and works regardless of which context built or ran the graph. DeferredToolFilterMiddleware reads state.promoted (only when the catalog hash matches) to decide what to unhide.
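As a rough illustration of how the two pieces combine at filter time, here is a minimal sketch. It assumes `promoted` has the `{catalog_hash, names}` shape described above; `hidden_names` and `filter_tool_names` are hypothetical helpers, not the middleware's actual methods, and tools are represented by name only.

```python
from typing import Optional


def hidden_names(deferred_names: frozenset, catalog_hash: str, promoted: Optional[dict]) -> frozenset:
    """Return the deferred tool names that should stay hidden this turn."""
    # Promotions recorded against a different catalog are ignored (hash scoping).
    if not promoted or promoted.get("catalog_hash") != catalog_hash:
        return deferred_names
    return deferred_names - frozenset(promoted.get("names", ()))


def filter_tool_names(bound: list, deferred_names: frozenset, catalog_hash: str, state: dict) -> list:
    """Drop still-hidden deferred tools from the list bound to the model."""
    hide = hidden_names(deferred_names, catalog_hash, state.get("promoted"))
    return [name for name in bound if name not in hide]


deferred = frozenset({"mcp_a", "mcp_b"})
tools = ["read_file", "mcp_a", "mcp_b", "tool_search"]

turn1 = filter_tool_names(tools, deferred, "h1", {})
turn2 = filter_tool_names(tools, deferred, "h1", {"promoted": {"catalog_hash": "h1", "names": ["mcp_a"]}})
stale = filter_tool_names(tools, deferred, "h1", {"promoted": {"catalog_hash": "old", "names": ["mcp_a"]}})
```

Because both inputs are plain values (a set handed over at construction and a dict read from state), nothing here depends on which execution context built or ran the graph.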

Supporting changes:

MCP tools tagged with metadata["deerflow_mcp"] (stable provenance); the deferred set is derived after skill/policy filtering, so a policy-removed tool is never searchable.
Fail-closed: if tool_search.enabled and MCP tools survive filtering but the deferred set didn't reach the middleware, construction raises instead of silently binding full schemas.
Deleted: DeferredToolRegistry, the ContextVar, its get/set/reset helpers, the ~40-line #2884 "re-entrant reuse" special-case, and the reset_deferred_registry() test fixture.

Net: 17 files, +711 / −1263.

Testing

make test → 3833 passed, 16 skipped, 0 failed; make lint + ruff format --check clean.
New regressions: cross-context (a graph built in one asyncio task and run in a sibling task that didn't inherit its context still hides deferred tools; this is the #3272 lock that the old ContextVar design fails); end-to-end promotion through a real create_agent graph (turn 1 hides → tool_search promotes one → turn 2 binds only that one); policy-leak, fail-closed, hash-scoped promotion, and #2884-style subagent re-entry isolation.

Known limitation / follow-up

Subagents (SubagentExecutor) still bind full MCP schemas when tool_search is enabled. Pre-existing — upstream only ever attached DeferredToolFilterMiddleware to the lead agent — and this PR keeps that scope (no new leak; subagents neither gained nor lost deferral). Extending deferred loading to subagents: #3341.

Risk / Rollback

Risk: low. Gated by tool_search.enabled (default off); with it off the path is unchanged. The fail-closed guard surfaces wiring regressions loudly.
Rollback: revert the PR; no migration — the promoted state field is additive and ignored when absent.

Fixes #3272.

… + graph state (bytedance#3272)

Build the deferred catalog + tool_search tool per agent from the policy-filtered tool list (after skill allowed-tools), pass deferred_names + catalog_hash explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions in ThreadState.promoted (scoped by catalog_hash) via a Command-returning tool_search.

Removes DeferredToolRegistry and the _registry_var ContextVar so deferral no longer depends on build/execute sharing an async context. MCP tools are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way. Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via tool_search) and assembly is fail-closed.

Migrate tests off the deleted registry APIs; delete the obsolete ContextVar-based bytedance#2884 regression (re-covered by state-based tests in a follow-up).

…og test

From independent code review:

- merge_promoted: use existing.get("catalog_hash") so a forward-incompatible or externally-injected persisted promoted dict triggers a replace instead of a KeyError crash; add regression test for the malformed-existing case.
- test_deferred_catalog: replace the `== [] or True` tautology (a test that could never fail) with a deterministic invalid-regex->literal-fallback check (positive match on calc + negative empty match).
- DeferredToolCatalog: comment why frozen-without-slots is required for the cached_property hash/names fields (adding slots=True would break them).
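The frozen-without-slots point can be sketched as follows. This is an illustrative stand-in under stated assumptions, not the PR's DeferredToolCatalog: the class name `CatalogSketch`, the `(name, description)` entry shape, and the SHA-256-over-sorted-entries hash are all hypothetical choices.

```python
import hashlib
from dataclasses import dataclass
from functools import cached_property


# frozen=True blocks attribute assignment, but cached_property stores its
# result directly in the instance __dict__, so the two can coexist.
# Adding slots=True would remove __dict__ and break the cached properties.
@dataclass(frozen=True)
class CatalogSketch:
    entries: tuple  # tuple of (name, description) pairs

    @cached_property
    def names(self) -> frozenset:
        return frozenset(name for name, _ in self.entries)

    @cached_property
    def catalog_hash(self) -> str:
        # Sorting first makes the hash depend on content, not on input order.
        payload = "\n".join(f"{n}\t{d}" for n, d in sorted(self.entries))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


a = CatalogSketch((("calc", "evaluate math"), ("fetch", "download a URL")))
b = CatalogSketch((("fetch", "download a URL"), ("calc", "evaluate math")))
c = CatalogSketch((("calc", "evaluate math"),))
```

In this sketch `a` and `b` hash identically because they hold the same entries, while `c` hashes differently, which is the property a hash-scoped promotion record relies on.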

…lient

DeerFlowClient._ensure_agent called get_app_config() directly to read tool_search.enabled, but the client already resolves and stores its config as self._app_config at construction (and uses it everywhere else). The bare call re-resolves config from disk at agent-build time, which raises FileNotFoundError in environments without a config.yaml (CI): test_client.py's fixture only patches get_app_config during __init__, so the later call hit the real loader. Use self._app_config, matching the rest of the client.

crystal-wk · 2026-06-02T02:03:35Z

When will this be merged into the main branch? As things stand, the context is too long.

Copilot

Pull request overview

This PR fixes deferred MCP tool loading when tool_search.enabled: true by removing the module-level ContextVar registry and replacing it with (1) a build-time immutable deferred-tool catalog closed over by the tool_search tool, and (2) per-thread promotion state persisted in ThreadState.promoted (hash-scoped).

Changes:

Replaced the mutable deferred-tool ContextVar registry with DeferredToolCatalog + build_deferred_tool_setup(...), and made tool_search write promotions into LangGraph state via Command(update={...}).
Added ThreadState.promoted plus a merge_promoted reducer to persist (and hash-scope) promotions across turns.
Updated lead-agent/client wiring to assemble deferred setup after tool-policy filtering, inject the deferred tools prompt section, and pass deferred info into DeferredToolFilterMiddleware; refreshed/expanded regression tests and docs.
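A hash-scoped reducer of the kind summarized above could look roughly like this. It is a sketch assuming the `{catalog_hash, names}` dict shape from the PR description; `merge_promoted_sketch` is a hypothetical name and the real merge_promoted may differ in detail.

```python
from typing import Optional


def merge_promoted_sketch(existing: Optional[dict], update: Optional[dict]) -> Optional[dict]:
    """Merge a promotion update into existing state, scoped by catalog hash."""
    if update is None:
        return existing
    new_names = set(update.get("names", ()))
    # .get() keeps a malformed or foreign persisted dict from raising KeyError;
    # any hash mismatch (including a missing key) replaces the old record.
    if not existing or existing.get("catalog_hash") != update["catalog_hash"]:
        return {"catalog_hash": update["catalog_hash"], "names": sorted(new_names)}
    merged = set(existing.get("names", ())) | new_names
    return {"catalog_hash": update["catalog_hash"], "names": sorted(merged)}


first = merge_promoted_sketch(None, {"catalog_hash": "h1", "names": ["a"]})
second = merge_promoted_sketch(first, {"catalog_hash": "h1", "names": ["b"]})
replaced = merge_promoted_sketch(second, {"catalog_hash": "h2", "names": ["c"]})
malformed = merge_promoted_sketch({"names": ["x"]}, {"catalog_hash": "h1", "names": ["a"]})
```

Same-hash updates accumulate across turns; a different hash, or an existing record without a hash, starts over with only the new names.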

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

File	Description
backend/tests/test_tool_search.py	Narrows to config + prompt-section tests after registry removal.
backend/tests/test_thread_state_promoted.py	Adds reducer tests for hash-scoped promotion merging.
backend/tests/test_deferred_tool_registry_promotion.py	Removes obsolete ContextVar-registry regression tests (#2884 old architecture).
backend/tests/test_deferred_tool_promotion_real_llm.py	Updates real-LLM promotion test to use new deferred setup + middleware ctor args.
backend/tests/test_deferred_tool_crosscontext.py	Adds regressions for cross-context execution, policy leak prevention, and fail-closed wiring.
backend/tests/test_deferred_setup.py	Unit tests for catalog tagging, setup assembly, and `tool_search` → `Command` promotion updates.
backend/tests/test_deferred_promotion_integration.py	End-to-end graph test: promote on turn 1, bind promoted tool on turn 2.
backend/tests/test_deferred_filter_middleware.py	Unit tests for new middleware behavior (hide until promoted, hash scoping, block messages).
backend/tests/test_deferred_catalog.py	Unit tests for catalog search behavior and stable hashing.
backend/packages/harness/deerflow/tools/tools.py	Tags MCP tools with `metadata["deerflow_mcp"]=True` and stops building any registry in tool loading.
backend/packages/harness/deerflow/tools/builtins/tool_search.py	Implements `DeferredToolCatalog`, `build_tool_search_tool`, and `build_deferred_tool_setup` (no ContextVar).
backend/packages/harness/deerflow/client.py	Wires deferred assembly, prompt injection, and middleware creation for the client-created agent.
backend/packages/harness/deerflow/agents/thread_state.py	Adds `promoted` state and `merge_promoted` reducer to persist promotions.
backend/packages/harness/deerflow/agents/middlewares/deferred_tool_filter_middleware.py	Switches middleware to closure-provided deferred set + state-driven promotions; blocks unpromoted calls.
backend/packages/harness/deerflow/agents/lead_agent/prompt.py	Makes deferred-tools prompt section purely data-driven (`deferred_names`), no config/registry lookup.
backend/packages/harness/deerflow/agents/lead_agent/agent.py	Adds `_assemble_deferred`, passes deferred setup into middleware + prompt, and fail-closes on wiring regressions.
backend/CLAUDE.md	Updates middleware documentation to reflect closure + graph-state deferred design.

ShenAC-SAC · 2026-06-02T02:25:05Z

+    def _filter_tools(self, request: ModelRequest) -> ModelRequest:
+        if not self._deferred:
+            return request
+        hide = self._hidden(request.state)


Verified against the pinned langchain==1.2.15: state is a declared dataclass field on ModelRequest (model, messages, system_message, tool_choice, tools, response_format, state, runtime, model_settings), so request.state is the supported access path here, not an attribute that can be missing.

The suggested fallback would actually break this path. ModelRequest.runtime is a Runtime, which has no state attribute (its fields are context, store, stream_writer, previous, execution_info, server_info) — so request.runtime.state is what would raise AttributeError. There are also no request.runtime.state usages anywhere in this repo, so the premise that other handlers read state that way doesn't hold here. Keeping request.state.

ShenAC-SAC · 2026-06-02T02:25:07Z

-
-        if not registry.contains(tool_name):
+        name = str(request.tool_call.get("name") or "")
+        if not name or name not in self._hidden(request.state):


Same as the _filter_tools thread: on the pinned langgraph==1.1.9, state is a declared dataclass field on ToolCallRequest (tool_call, tool, state, runtime), so request.state is correct. The block path is exercised by test_blocked_message_for_unpromoted_deferred_call plus the end-to-end promotion test, both green in CI.

For completeness: here request.runtime is a ToolRuntime, which does expose .state, so the suggested fallback wouldn't crash on this path — but it's unnecessary, and a getattr(request, "state", None) fallback would silently swallow a real wiring regression, which runs counter to this PR's fail-closed design. Keeping request.state.

ShenAC-SAC · 2026-06-02T02:25:09Z

+    setup = build_deferred_tool_setup(filtered_tools, enabled=enabled)
+    if enabled and not setup.deferred_names and any(_is_mcp_tool(t) for t in filtered_tools):
+        raise RuntimeError("tool_search enabled and MCP tools survived policy filtering, but no deferred set was recovered — refusing to bind MCP schemas (fail-closed).")
+    final_tools = list(filtered_tools) + ([setup.tool_search_tool] if setup.tool_search_tool else [])
+    return final_tools, setup


Intentional — and a good thing to lock down. tool_search is appended after policy filtering, but it's only ever produced when MCP tools survive that filter: build_deferred_tool_setup returns a None tool when no deferred tool remains, and its catalog is derived from the already policy-filtered list, so it can never expose a tool the allowlist denied. Coupling tool_search's presence to surviving deferred tools also removes an upstream footgun where an allowlist that permitted MCP tools but omitted tool_search would strip the search tool and leave those deferred tools permanently unreachable.

Locked the contract with two regression tests in tests/test_deferred_tool_crosscontext.py — test_policy_denied_mcp_yields_no_tool_search_end_to_end and test_tool_search_appended_after_policy_but_never_exposes_denied_tool — in 92d463e.

tool_search is appended after skill-allowlist filtering, so the allowlist can no longer deny it by name. Lock the intended contract: it only appears when allowed MCP tools survive the filter, and its catalog (derived from the already policy-filtered list) can never expose a denied tool. Addresses the ordering observation from the Copilot review on bytedance#3342.
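The ordering contract can be illustrated with a small sketch. It assumes tools carry a `metadata` dict with the `deerflow_mcp` tag mentioned in the PR; `ToolSketch`, `assemble_sketch`, and the allowlist-as-set policy are hypothetical simplifications, and the fail-closed guard is omitted here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolSketch:
    name: str
    metadata: dict = field(default_factory=dict)


def is_mcp(tool: ToolSketch) -> bool:
    return bool(tool.metadata.get("deerflow_mcp"))


def assemble_sketch(all_tools: list, allowlist: Optional[set], enabled: bool):
    # 1. Policy filtering runs first, so denied tools never reach the catalog.
    filtered = [t for t in all_tools if allowlist is None or t.name in allowlist]
    # 2. The deferred set is derived only from tools that survived the filter.
    deferred = frozenset(t.name for t in filtered if is_mcp(t)) if enabled else frozenset()
    # 3. tool_search is appended only when there is something left to search.
    final = list(filtered)
    if deferred:
        final.append(ToolSketch("tool_search"))
    return [t.name for t in final], deferred


tools = [
    ToolSketch("read_file"),
    ToolSketch("mcp_allowed", {"deerflow_mcp": True}),
    ToolSketch("mcp_denied", {"deerflow_mcp": True}),
]

names_some, deferred_some = assemble_sketch(tools, {"read_file", "mcp_allowed"}, enabled=True)
names_none, deferred_none = assemble_sketch(tools, {"read_file"}, enabled=True)
```

With an allowlist that omits `tool_search` by name, the search tool still appears when an allowed MCP tool survives, and it disappears entirely when none does.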

ShenAC-SAC · 2026-06-02T02:30:27Z

@crystal-wk We're aiming to merge it today; please give it another try once it lands.

…ed-tool setup (#3370)

Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus hardening of malformed/empty tool_search queries; no change to the deferral mechanism or search ranking.

- Add deerflow/tools/mcp_metadata.py as the single source of truth for the "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public is_mcp_tool). Removes the duplicated magic string and the private, cross-module _is_mcp_tool import.
- tool_search.search: never raise on model-generated input. Extract _compile_catalog_regex (shared compile-with-literal-fallback); return empty for empty/whitespace queries and a bare "+" instead of matching everything or raising IndexError.
- DeferredToolSetup: document the empty-vs-populated invariant.
- build_deferred_tool_setup: comment the two distinct empty-return branches.
- _assemble_deferred: add return type, rename local to deferred_setup, build the final list with an explicit append.
- Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and bare-"+" queries.
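A compile-with-literal-fallback helper in the spirit of the hardening above might look like the sketch below. It is an assumption-laden illustration: `compile_query_sketch` and `search_sketch` are hypothetical names, matching is done on tool names only, and the real query syntax (including how a leading "+" is interpreted) is not reproduced here.

```python
import re
from typing import Optional


def compile_query_sketch(query: str) -> Optional[re.Pattern]:
    """Compile a model-supplied query, falling back to a literal match."""
    q = query.strip()
    if not q:
        return None  # empty or whitespace-only: nothing to search for
    try:
        return re.compile(q, re.IGNORECASE)
    except re.error:
        # Invalid regex (e.g. "calc[" or "+") is treated as plain text.
        return re.compile(re.escape(q), re.IGNORECASE)


def search_sketch(names: list, query: str) -> list:
    pattern = compile_query_sketch(query)
    if pattern is None:
        return []
    return [n for n in names if pattern.search(n)]


catalog = ["calc[v2]", "web_fetch"]
literal_hit = search_sketch(catalog, "calc[")
blank = search_sketch(catalog, "   ")
bare_plus = search_sketch(catalog, "+")
```

Here the search function never raises on malformed input: an unparseable pattern degrades to a literal substring match, and an empty query returns no results rather than everything.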

ShenAC-SAC added 10 commits June 1, 2026 22:21

feat(tool-search): add hash-scoped promoted state to ThreadState

1a48fb7

feat(tool-search): add immutable DeferredToolCatalog with stable hash

6f79ee3

feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search

2c6a210

test(tool-search): lock tool_search promotion into next model turn via graph state

3379ee5

test(tool-search): cross-context, policy-leak, fail-closed, bytedance#2884 isolation regressions

7d3f14b

test(tool-search): align real-LLM e2e with closure-based deferred setup

c0a94aa

docs: update DeferredToolFilterMiddleware description for closure+state design

4cf13dd

style(tests): drop unused import in test_deferred_setup (ruff)

4052efe

ShenAC-SAC marked this pull request as ready for review June 1, 2026 14:28

WillemJiang requested a review from Copilot June 2, 2026 02:04

Copilot started reviewing on behalf of WillemJiang June 2, 2026 02:04

Copilot AI reviewed Jun 2, 2026

ShenAC-SAC added the reviewing A maintainer is reviewing this PR label Jun 2, 2026

WillemJiang approved these changes Jun 2, 2026

WillemJiang merged commit d9f4724 into bytedance:main Jun 2, 2026
6 checks passed

ShenAC-SAC mentioned this pull request Jun 3, 2026

refactor(tool-search): consolidate MCP metadata tag and harden deferred-tool setup #3370

Merged

WillemJiang merged 12 commits into bytedance:main from ShenAC-SAC:fix/3272-deferred-tool-pr

4 participants
