Skip to content

fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state)#3342

Merged
WillemJiang merged 12 commits into
bytedance:mainfrom
ShenAC-SAC:fix/3272-deferred-tool-pr
Jun 2, 2026
Merged

fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state)#3342
WillemJiang merged 12 commits into
bytedance:mainfrom
ShenAC-SAC:fix/3272-deferred-tool-pr

Conversation

@ShenAC-SAC
Copy link
Copy Markdown
Collaborator

Why

With tool_search.enabled: true, deferred MCP tool schemas were still bound into the model every turn and kept consuming the context window — the opposite of what the flag promises (#3272).

Root cause: deferred-tool state lived in a module-level ContextVar (DeferredToolRegistry), populated when the graph was built. LangGraph runs the model call in a different execution context (contexts are copied at task-creation; ContextVar.set() only affects the current and later-derived contexts). At execute time the registry the filter middleware read was empty, so nothing was hidden and full schemas were bound.

That is shared, process-global, repeatedly-mutated state. Beyond #3272 it is fragile under concurrency/re-entrancy — e.g. the already-closed #2884 (the registry was unconditionally reset on every get_available_tools call, breaking promotion). Guarding symptoms on top of shared mutable state invites more of the same, so this change removes the shared state instead of guarding it, which structurally eliminates that whole class of bug (the band-aid #2884 needed is no longer reachable).

What Changed

Replace the ContextVar + mutable DeferredToolRegistry with two context-independent pieces:

  • Build-time immutable closure. build_deferred_tool_setup(filtered_tools, enabled=...) builds an immutable DeferredToolCatalog (stable content hash) and a tool_search tool that closes over that catalog. The deferred-name set and catalog hash are plain values handed to the middleware at construction — no global state.
  • Per-thread graph state. Promotions persist in ThreadState.promoted ({catalog_hash, names}) via a hash-scoped merge_promoted reducer. The tool_search tool returns Command(update={"promoted": ...}), so promotion flows through LangGraph state and works regardless of which context built or ran the graph. DeferredToolFilterMiddleware reads state.promoted (only when the catalog hash matches) to decide what to unhide.

Supporting changes:

  • MCP tools tagged with metadata["deerflow_mcp"] (stable provenance); the deferred set is derived after skill/policy filtering, so a policy-removed tool is never searchable.
  • Fail-closed: if tool_search.enabled and MCP tools survive filtering but the deferred set didn't reach the middleware, construction raises instead of silently binding full schemas.
  • Deleted: DeferredToolRegistry, the ContextVar, its get/set/reset helpers, the ~40-line [BUG] DeferredToolRegistry is unconditionally reset on every get_available_tools call, breaking tool promotion #2884 "re-entrant reuse" special-case, and the reset_deferred_registry() test fixture.

Net: 17 files, +711 / −1263.

Testing

Known limitation / follow-up

Subagents (SubagentExecutor) still bind full MCP schemas when tool_search is enabled. Pre-existing — upstream only ever attached DeferredToolFilterMiddleware to the lead agent — and this PR keeps that scope (no new leak; subagents neither gained nor lost deferral). Extending deferred loading to subagents: #3341.

Risk / Rollback

  • Risk: low. Gated by tool_search.enabled (default off); with it off the path is unchanged. The fail-closed guard surfaces wiring regressions loudly.
  • Rollback: revert the PR; no migration — the promoted state field is additive and ignored when absent.

Fixes #3272.

ShenAC-SAC added 10 commits June 1, 2026 22:21
… + graph state (bytedance#3272)

Build the deferred catalog + tool_search tool per agent from the policy-filtered
tool list (after skill allowed-tools), pass deferred_names + catalog_hash
explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions
in ThreadState.promoted (scoped by catalog_hash) via a Command-returning
tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so
deferral no longer depends on build/execute sharing an async context. MCP tools
are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way.

Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via
tool_search) and assembly is fail-closed. Migrate tests off the deleted registry
APIs; delete the obsolete ContextVar-based bytedance#2884 regression (re-covered by
state-based tests in a follow-up).
…og test

From independent code review:
- merge_promoted: use existing.get("catalog_hash") so a forward-incompatible
  or externally-injected persisted promoted dict triggers a replace instead of
  a KeyError crash; add regression test for the malformed-existing case.
- test_deferred_catalog: replace the `== [] or True` tautology (a test that
  could never fail) with a deterministic invalid-regex->literal-fallback check
  (positive match on calc + negative empty match).
- DeferredToolCatalog: comment why frozen-without-slots is required for the
  cached_property hash/names fields (adding slots=True would break them).
@ShenAC-SAC ShenAC-SAC marked this pull request as ready for review June 1, 2026 14:28
…lient

DeerFlowClient._ensure_agent called get_app_config() directly to read
tool_search.enabled, but the client already resolves and stores its config as
self._app_config at construction (and uses it everywhere else). The bare call
re-resolves config from disk at agent-build time, which raises FileNotFoundError
in environments without a config.yaml (CI) — test_client.py's fixture only
patches get_app_config during __init__, so the later call hit the real loader.
Use self._app_config, matching the rest of the client.
@crystal-wk
Copy link
Copy Markdown

什么时候合入main分支,目前来看,上下文太长了

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes deferred MCP tool loading when tool_search.enabled: true by removing the module-level ContextVar registry and replacing it with (1) a build-time immutable deferred-tool catalog closed over by the tool_search tool, and (2) per-thread promotion state persisted in ThreadState.promoted (hash-scoped).

Changes:

  • Replaced the mutable deferred-tool ContextVar registry with DeferredToolCatalog + build_deferred_tool_setup(...), and made tool_search write promotions into LangGraph state via Command(update={...}).
  • Added ThreadState.promoted plus a merge_promoted reducer to persist (and hash-scope) promotions across turns.
  • Updated lead-agent/client wiring to assemble deferred setup after tool-policy filtering, inject the deferred tools prompt section, and pass deferred info into DeferredToolFilterMiddleware; refreshed/expanded regression tests and docs.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
backend/tests/test_tool_search.py Narrows to config + prompt-section tests after registry removal.
backend/tests/test_thread_state_promoted.py Adds reducer tests for hash-scoped promotion merging.
backend/tests/test_deferred_tool_registry_promotion.py Removes obsolete ContextVar-registry regression tests (#2884 old architecture).
backend/tests/test_deferred_tool_promotion_real_llm.py Updates real-LLM promotion test to use new deferred setup + middleware ctor args.
backend/tests/test_deferred_tool_crosscontext.py Adds regressions for cross-context execution, policy leak prevention, and fail-closed wiring.
backend/tests/test_deferred_setup.py Unit tests for catalog tagging, setup assembly, and tool_searchCommand promotion updates.
backend/tests/test_deferred_promotion_integration.py End-to-end graph test: promote on turn 1, bind promoted tool on turn 2.
backend/tests/test_deferred_filter_middleware.py Unit tests for new middleware behavior (hide until promoted, hash scoping, block messages).
backend/tests/test_deferred_catalog.py Unit tests for catalog search behavior and stable hashing.
backend/packages/harness/deerflow/tools/tools.py Tags MCP tools with metadata["deerflow_mcp"]=True and stops building any registry in tool loading.
backend/packages/harness/deerflow/tools/builtins/tool_search.py Implements DeferredToolCatalog, build_tool_search_tool, and build_deferred_tool_setup (no ContextVar).
backend/packages/harness/deerflow/client.py Wires deferred assembly, prompt injection, and middleware creation for the client-created agent.
backend/packages/harness/deerflow/agents/thread_state.py Adds promoted state and merge_promoted reducer to persist promotions.
backend/packages/harness/deerflow/agents/middlewares/deferred_tool_filter_middleware.py Switches middleware to closure-provided deferred set + state-driven promotions; blocks unpromoted calls.
backend/packages/harness/deerflow/agents/lead_agent/prompt.py Makes deferred-tools prompt section purely data-driven (deferred_names), no config/registry lookup.
backend/packages/harness/deerflow/agents/lead_agent/agent.py Adds _assemble_deferred, passes deferred setup into middleware + prompt, and fail-closes on wiring regressions.
backend/CLAUDE.md Updates middleware documentation to reflect closure + graph-state deferred design.

def _filter_tools(self, request: ModelRequest) -> ModelRequest:
if not self._deferred:
return request
hide = self._hidden(request.state)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified against the pinned langchain==1.2.15: state is a declared dataclass field on ModelRequest (model, messages, system_message, tool_choice, tools, response_format, state, runtime, model_settings), so request.state is the supported access path here, not an attribute that can be missing.

The suggested fallback would actually break this path. ModelRequest.runtime is a Runtime, which has no state attribute (its fields are context, store, stream_writer, previous, execution_info, server_info) — so request.runtime.state is what would raise AttributeError. There are also no request.runtime.state usages anywhere in this repo, so the premise that other handlers read state that way doesn't hold here. Keeping request.state.


if not registry.contains(tool_name):
name = str(request.tool_call.get("name") or "")
if not name or name not in self._hidden(request.state):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the _filter_tools thread: on the pinned langgraph==1.1.9, state is a declared dataclass field on ToolCallRequest (tool_call, tool, state, runtime), so request.state is correct. The block path is exercised by test_blocked_message_for_unpromoted_deferred_call plus the end-to-end promotion test, both green in CI.

For completeness: here request.runtime is a ToolRuntime, which does expose .state, so the suggested fallback wouldn't crash on this path — but it's unnecessary, and a getattr(request, "state", None) fallback would silently swallow a real wiring regression, which runs counter to this PR's fail-closed design. Keeping request.state.

Comment on lines +369 to +373
setup = build_deferred_tool_setup(filtered_tools, enabled=enabled)
if enabled and not setup.deferred_names and any(_is_mcp_tool(t) for t in filtered_tools):
raise RuntimeError("tool_search enabled and MCP tools survived policy filtering, but no deferred set was recovered — refusing to bind MCP schemas (fail-closed).")
final_tools = list(filtered_tools) + ([setup.tool_search_tool] if setup.tool_search_tool else [])
return final_tools, setup
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional — and a good thing to lock down. tool_search is appended after policy filtering, but it's only ever produced when MCP tools survive that filter: build_deferred_tool_setup returns a None tool when no deferred tool remains, and its catalog is derived from the already policy-filtered list, so it can never expose a tool the allowlist denied. Coupling tool_search's presence to surviving deferred tools also removes an upstream footgun where an allowlist that permitted MCP tools but omitted tool_search would strip the search tool and leave those deferred tools permanently unreachable.

Locked the contract with two regression tests in tests/test_deferred_tool_crosscontext.pytest_policy_denied_mcp_yields_no_tool_search_end_to_end and test_tool_search_appended_after_policy_but_never_exposes_denied_tool — in 92d463e.

tool_search is appended after skill-allowlist filtering, so the allowlist
can no longer deny it by name. Lock the intended contract: it only appears
when allowed MCP tools survive the filter, and its catalog (derived from the
already policy-filtered list) can never expose a denied tool. Addresses the
ordering observation from the Copilot review on bytedance#3342.
@ShenAC-SAC
Copy link
Copy Markdown
Collaborator Author

@crystal-wk 争取今天合入,麻烦到时候再试下

@ShenAC-SAC ShenAC-SAC added the reviewing A maintainer is reviewing this PR label Jun 2, 2026
@WillemJiang WillemJiang merged commit d9f4724 into bytedance:main Jun 2, 2026
6 checks passed
WillemJiang pushed a commit that referenced this pull request Jun 5, 2026
…ed-tool setup (#3370)

Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus
hardening of malformed/empty tool_search queries; no change to the deferral
mechanism or search ranking.

- Add deerflow/tools/mcp_metadata.py as the single source of truth for the
  "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public
  is_mcp_tool). Removes the duplicated magic string and the private,
  cross-module _is_mcp_tool import.
- tool_search.search: never raise on model-generated input. Extract
  _compile_catalog_regex (shared compile-with-literal-fallback); return empty
  for empty/whitespace queries and a bare "+" instead of matching everything
  or raising IndexError.
- DeferredToolSetup: document the empty-vs-populated invariant.
- build_deferred_tool_setup: comment the two distinct empty-return branches.
- _assemble_deferred: add return type, rename local to deferred_setup, build
  the final list with an explicit append.
- Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and
  bare-"+" queries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

reviewing A maintainer is reviewing this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MCP 工具描述占用模型上下文 [BUG] DeferredToolRegistry is unconditionally reset on every get_available_tools call, breaking tool promotion

4 participants