fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state)#3342
Conversation
… + graph state (bytedance#3272) Build the deferred catalog + tool_search tool per agent from the policy-filtered tool list (after skill allowed-tools), pass deferred_names + catalog_hash explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions in ThreadState.promoted (scoped by catalog_hash) via a Command-returning tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so deferral no longer depends on build/execute sharing an async context. MCP tools are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way. Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via tool_search) and assembly is fail-closed. Migrate tests off the deleted registry APIs; delete the obsolete ContextVar-based bytedance#2884 regression (re-covered by state-based tests in a follow-up).
…#2884 isolation regressions
…og test
From independent code review:
- merge_promoted: use existing.get("catalog_hash") so a forward-incompatible
or externally-injected persisted promoted dict triggers a replace instead of
a KeyError crash; add regression test for the malformed-existing case.
- test_deferred_catalog: replace the `== [] or True` tautology (a test that
could never fail) with a deterministic invalid-regex->literal-fallback check
(positive match on calc + negative empty match).
- DeferredToolCatalog: comment why frozen-without-slots is required for the
cached_property hash/names fields (adding slots=True would break them).
…lient DeerFlowClient._ensure_agent called get_app_config() directly to read tool_search.enabled, but the client already resolves and stores its config as self._app_config at construction (and uses it everywhere else). The bare call re-resolves config from disk at agent-build time, which raises FileNotFoundError in environments without a config.yaml (CI) — test_client.py's fixture only patches get_app_config during __init__, so the later call hit the real loader. Use self._app_config, matching the rest of the client.
|
什么时候合入main分支,目前来看,上下文太长了 |
There was a problem hiding this comment.
Pull request overview
This PR fixes deferred MCP tool loading when tool_search.enabled: true by removing the module-level ContextVar registry and replacing it with (1) a build-time immutable deferred-tool catalog closed over by the tool_search tool, and (2) per-thread promotion state persisted in ThreadState.promoted (hash-scoped).
Changes:
- Replaced the mutable deferred-tool
ContextVarregistry withDeferredToolCatalog+build_deferred_tool_setup(...), and madetool_searchwrite promotions into LangGraph state viaCommand(update={...}). - Added
ThreadState.promotedplus amerge_promotedreducer to persist (and hash-scope) promotions across turns. - Updated lead-agent/client wiring to assemble deferred setup after tool-policy filtering, inject the deferred tools prompt section, and pass deferred info into
DeferredToolFilterMiddleware; refreshed/expanded regression tests and docs.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/tests/test_tool_search.py | Narrows to config + prompt-section tests after registry removal. |
| backend/tests/test_thread_state_promoted.py | Adds reducer tests for hash-scoped promotion merging. |
| backend/tests/test_deferred_tool_registry_promotion.py | Removes obsolete ContextVar-registry regression tests (#2884 old architecture). |
| backend/tests/test_deferred_tool_promotion_real_llm.py | Updates real-LLM promotion test to use new deferred setup + middleware ctor args. |
| backend/tests/test_deferred_tool_crosscontext.py | Adds regressions for cross-context execution, policy leak prevention, and fail-closed wiring. |
| backend/tests/test_deferred_setup.py | Unit tests for catalog tagging, setup assembly, and tool_search → Command promotion updates. |
| backend/tests/test_deferred_promotion_integration.py | End-to-end graph test: promote on turn 1, bind promoted tool on turn 2. |
| backend/tests/test_deferred_filter_middleware.py | Unit tests for new middleware behavior (hide until promoted, hash scoping, block messages). |
| backend/tests/test_deferred_catalog.py | Unit tests for catalog search behavior and stable hashing. |
| backend/packages/harness/deerflow/tools/tools.py | Tags MCP tools with metadata["deerflow_mcp"]=True and stops building any registry in tool loading. |
| backend/packages/harness/deerflow/tools/builtins/tool_search.py | Implements DeferredToolCatalog, build_tool_search_tool, and build_deferred_tool_setup (no ContextVar). |
| backend/packages/harness/deerflow/client.py | Wires deferred assembly, prompt injection, and middleware creation for the client-created agent. |
| backend/packages/harness/deerflow/agents/thread_state.py | Adds promoted state and merge_promoted reducer to persist promotions. |
| backend/packages/harness/deerflow/agents/middlewares/deferred_tool_filter_middleware.py | Switches middleware to closure-provided deferred set + state-driven promotions; blocks unpromoted calls. |
| backend/packages/harness/deerflow/agents/lead_agent/prompt.py | Makes deferred-tools prompt section purely data-driven (deferred_names), no config/registry lookup. |
| backend/packages/harness/deerflow/agents/lead_agent/agent.py | Adds _assemble_deferred, passes deferred setup into middleware + prompt, and fail-closes on wiring regressions. |
| backend/CLAUDE.md | Updates middleware documentation to reflect closure + graph-state deferred design. |
| def _filter_tools(self, request: ModelRequest) -> ModelRequest: | ||
| if not self._deferred: | ||
| return request | ||
| hide = self._hidden(request.state) |
There was a problem hiding this comment.
Verified against the pinned langchain==1.2.15: state is a declared dataclass field on ModelRequest (model, messages, system_message, tool_choice, tools, response_format, state, runtime, model_settings), so request.state is the supported access path here, not an attribute that can be missing.
The suggested fallback would actually break this path. ModelRequest.runtime is a Runtime, which has no state attribute (its fields are context, store, stream_writer, previous, execution_info, server_info) — so request.runtime.state is what would raise AttributeError. There are also no request.runtime.state usages anywhere in this repo, so the premise that other handlers read state that way doesn't hold here. Keeping request.state.
|
|
||
| if not registry.contains(tool_name): | ||
| name = str(request.tool_call.get("name") or "") | ||
| if not name or name not in self._hidden(request.state): |
There was a problem hiding this comment.
Same as the _filter_tools thread: on the pinned langgraph==1.1.9, state is a declared dataclass field on ToolCallRequest (tool_call, tool, state, runtime), so request.state is correct. The block path is exercised by test_blocked_message_for_unpromoted_deferred_call plus the end-to-end promotion test, both green in CI.
For completeness: here request.runtime is a ToolRuntime, which does expose .state, so the suggested fallback wouldn't crash on this path — but it's unnecessary, and a getattr(request, "state", None) fallback would silently swallow a real wiring regression, which runs counter to this PR's fail-closed design. Keeping request.state.
| setup = build_deferred_tool_setup(filtered_tools, enabled=enabled) | ||
| if enabled and not setup.deferred_names and any(_is_mcp_tool(t) for t in filtered_tools): | ||
| raise RuntimeError("tool_search enabled and MCP tools survived policy filtering, but no deferred set was recovered — refusing to bind MCP schemas (fail-closed).") | ||
| final_tools = list(filtered_tools) + ([setup.tool_search_tool] if setup.tool_search_tool else []) | ||
| return final_tools, setup |
There was a problem hiding this comment.
Intentional — and a good thing to lock down. tool_search is appended after policy filtering, but it's only ever produced when MCP tools survive that filter: build_deferred_tool_setup returns a None tool when no deferred tool remains, and its catalog is derived from the already policy-filtered list, so it can never expose a tool the allowlist denied. Coupling tool_search's presence to surviving deferred tools also removes an upstream footgun where an allowlist that permitted MCP tools but omitted tool_search would strip the search tool and leave those deferred tools permanently unreachable.
Locked the contract with two regression tests in tests/test_deferred_tool_crosscontext.py — test_policy_denied_mcp_yields_no_tool_search_end_to_end and test_tool_search_appended_after_policy_but_never_exposes_denied_tool — in 92d463e.
tool_search is appended after skill-allowlist filtering, so the allowlist can no longer deny it by name. Lock the intended contract: it only appears when allowed MCP tools survive the filter, and its catalog (derived from the already policy-filtered list) can never expose a denied tool. Addresses the ordering observation from the Copilot review on bytedance#3342.
|
@crystal-wk 争取今天合入,麻烦到时候再试下 |
…ed-tool setup (#3370) Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus hardening of malformed/empty tool_search queries; no change to the deferral mechanism or search ranking. - Add deerflow/tools/mcp_metadata.py as the single source of truth for the "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public is_mcp_tool). Removes the duplicated magic string and the private, cross-module _is_mcp_tool import. - tool_search.search: never raise on model-generated input. Extract _compile_catalog_regex (shared compile-with-literal-fallback); return empty for empty/whitespace queries and a bare "+" instead of matching everything or raising IndexError. - DeferredToolSetup: document the empty-vs-populated invariant. - build_deferred_tool_setup: comment the two distinct empty-return branches. - _assemble_deferred: add return type, rename local to deferred_setup, build the final list with an explicit append. - Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and bare-"+" queries.
Why
With
tool_search.enabled: true, deferred MCP tool schemas were still bound into the model every turn and kept consuming the context window — the opposite of what the flag promises (#3272).Root cause: deferred-tool state lived in a module-level
ContextVar(DeferredToolRegistry), populated when the graph was built. LangGraph runs the model call in a different execution context (contexts are copied at task-creation;ContextVar.set()only affects the current and later-derived contexts). At execute time the registry the filter middleware read was empty, so nothing was hidden and full schemas were bound.That is shared, process-global, repeatedly-mutated state. Beyond #3272 it is fragile under concurrency/re-entrancy — e.g. the already-closed #2884 (the registry was unconditionally reset on every
get_available_toolscall, breaking promotion). Guarding symptoms on top of shared mutable state invites more of the same, so this change removes the shared state instead of guarding it, which structurally eliminates that whole class of bug (the band-aid #2884 needed is no longer reachable).What Changed
Replace the
ContextVar+ mutableDeferredToolRegistrywith two context-independent pieces:build_deferred_tool_setup(filtered_tools, enabled=...)builds an immutableDeferredToolCatalog(stable content hash) and atool_searchtool that closes over that catalog. The deferred-name set and catalog hash are plain values handed to the middleware at construction — no global state.ThreadState.promoted({catalog_hash, names}) via a hash-scopedmerge_promotedreducer. Thetool_searchtool returnsCommand(update={"promoted": ...}), so promotion flows through LangGraph state and works regardless of which context built or ran the graph.DeferredToolFilterMiddlewarereadsstate.promoted(only when the catalog hash matches) to decide what to unhide.Supporting changes:
metadata["deerflow_mcp"](stable provenance); the deferred set is derived after skill/policy filtering, so a policy-removed tool is never searchable.tool_search.enabledand MCP tools survive filtering but the deferred set didn't reach the middleware, construction raises instead of silently binding full schemas.DeferredToolRegistry, theContextVar, its get/set/reset helpers, the ~40-line [BUG] DeferredToolRegistry is unconditionally reset on every get_available_tools call, breaking tool promotion #2884 "re-entrant reuse" special-case, and thereset_deferred_registry()test fixture.Net: 17 files, +711 / −1263.
Testing
make test→ 3833 passed, 16 skipped, 0 failed;make lint+ruff format --checkclean.create_agentgraph (turn 1 hides →tool_searchpromotes one → turn 2 binds only that one); policy-leak, fail-closed, hash-scoped promotion, and [BUG] DeferredToolRegistry is unconditionally reset on every get_available_tools call, breaking tool promotion #2884-style subagent re-entry isolation.Known limitation / follow-up
Subagents (
SubagentExecutor) still bind full MCP schemas whentool_searchis enabled. Pre-existing — upstream only ever attachedDeferredToolFilterMiddlewareto the lead agent — and this PR keeps that scope (no new leak; subagents neither gained nor lost deferral). Extending deferred loading to subagents: #3341.Risk / Rollback
tool_search.enabled(default off); with it off the path is unchanged. The fail-closed guard surfaces wiring regressions loudly.promotedstate field is additive and ignored when absent.Fixes #3272.