fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342)

* feat(tool-search): add hash-scoped promoted state to ThreadState * feat(tool-search): add immutable DeferredToolCatalog with stable hash * feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search * refactor(tool-search): replace deferred-tool ContextVar with closures + graph state (#3272) Build the deferred catalog + tool_search tool per agent from the policy-filtered tool list (after skill allowed-tools), pass deferred_names + catalog_hash explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions in ThreadState.promoted (scoped by catalog_hash) via a Command-returning tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so deferral no longer depends on build/execute sharing an async context. MCP tools are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way. Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via tool_search) and assembly is fail-closed. Migrate tests off the deleted registry APIs; delete the obsolete ContextVar-based #2884 regression (re-covered by state-based tests in a follow-up). * test(tool-search): lock tool_search promotion into next model turn via graph state * test(tool-search): cross-context, policy-leak, fail-closed, #2884 isolation regressions * test(tool-search): align real-LLM e2e with closure-based deferred setup * docs: update DeferredToolFilterMiddleware description for closure+state design * style(tests): drop unused import in test_deferred_setup (ruff) * test(tool-search): harden merge_promoted + replace tautological catalog test From independent code review: - merge_promoted: use existing.get("catalog_hash") so a forward-incompatible or externally-injected persisted promoted dict triggers a replace instead of a KeyError crash; add regression test for the malformed-existing case. - test_deferred_catalog: replace the `== [] or True` tautology (a test that could never fail) with a deterministic invalid-regex->literal-fallback check (positive match on calc + negative empty match). - DeferredToolCatalog: comment why frozen-without-slots is required for the cached_property hash/names fields (adding slots=True would break them). * fix(tool-search): read tool_search.enabled from self._app_config in client DeerFlowClient._ensure_agent called get_app_config() directly to read tool_search.enabled, but the client already resolves and stores its config as self._app_config at construction (and uses it everywhere else). The bare call re-resolves config from disk at agent-build time, which raises FileNotFoundError in environments without a config.yaml (CI) — test_client.py's fixture only patches get_app_config during __init__, so the later call hit the real loader. Use self._app_config, matching the rest of the client. * test(tool-search): lock tool_search post-policy append ordering tool_search is appended after skill-allowlist filtering, so the allowlist can no longer deny it by name. Lock the intended contract: it only appears when allowed MCP tools survive the filter, and its catalog (derived from the already policy-filtered list) can never expose a denied tool. Addresses the ordering observation from the Copilot review on #3342.
2026-06-10 09:25:57 +00:00 · 2026-06-02 22:43:22 +08:00
parent 74e3e80cf6
commit d9f4724950
17 changed files with 768 additions and 1267 deletions
@@ -270,6 +270,7 @@ def _build_middlewares(
    custom_middlewares: list[AgentMiddleware] | None = None,
    *,
    app_config: AppConfig | None = None,
+    deferred_setup=None,
 ):
    """Build middleware chain based on runtime configuration.

@@ -318,11 +319,13 @@ def _build_middlewares(
    if model_config is not None and model_config.supports_vision:
        middlewares.append(ViewImageMiddleware())

-    # Add DeferredToolFilterMiddleware to hide deferred tool schemas from model binding
-    if resolved_app_config.tool_search.enabled:
+    # Hide deferred tool schemas from model binding until tool_search promotes them.
+    # The deferred set + catalog hash come from the build-time setup (assembled
+    # after tool-policy filtering); promotion is read from graph state.
+    if deferred_setup is not None and deferred_setup.deferred_names:
        from deerflow.agents.middlewares.deferred_tool_filter_middleware import DeferredToolFilterMiddleware

-        middlewares.append(DeferredToolFilterMiddleware())
+        middlewares.append(DeferredToolFilterMiddleware(deferred_setup.deferred_names, deferred_setup.catalog_hash))

    # Add SubagentLimitMiddleware to truncate excess parallel task calls
    subagent_enabled = cfg.get("subagent_enabled", False)
@@ -353,6 +356,23 @@ def _build_middlewares(
    return middlewares


+def _assemble_deferred(filtered_tools, *, enabled: bool):
+    """Build the final tool list + deferred setup from a policy-filtered list.
+
+    Call AFTER tool-policy filtering so the deferred catalog never exposes a
+    tool the agent is not allowed to use. Fail-closed: if tool_search is enabled
+    and MCP tools survived filtering but no deferred set was recovered, raise
+    rather than silently binding their full schemas to the model.
+    """
+    from deerflow.tools.builtins.tool_search import _is_mcp_tool, build_deferred_tool_setup
+
+    setup = build_deferred_tool_setup(filtered_tools, enabled=enabled)
+    if enabled and not setup.deferred_names and any(_is_mcp_tool(t) for t in filtered_tools):
+        raise RuntimeError("tool_search enabled and MCP tools survived policy filtering, but no deferred set was recovered — refusing to bind MCP schemas (fail-closed).")
+    final_tools = list(filtered_tools) + ([setup.tool_search_tool] if setup.tool_search_tool else [])
+    return final_tools, setup
+
+
 def _available_skill_names(agent_config, is_bootstrap: bool) -> set[str] | None:
    if is_bootstrap:
        return {"bootstrap"}
@@ -460,16 +480,19 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):

    if is_bootstrap:
        # Special bootstrap agent with minimal prompt for initial custom agent creation flow
-        tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
+        raw_tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
+        filtered = filter_tools_by_skill_allowed_tools(raw_tools, skills_for_tool_policy)
+        final_tools, setup = _assemble_deferred(filtered, enabled=resolved_app_config.tool_search.enabled)
        return create_agent(
            model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, app_config=resolved_app_config, attach_tracing=False),
-            tools=filter_tools_by_skill_allowed_tools(tools, skills_for_tool_policy),
-            middleware=_build_middlewares(config, model_name=model_name, app_config=resolved_app_config),
+            tools=final_tools,
+            middleware=_build_middlewares(config, model_name=model_name, app_config=resolved_app_config, deferred_setup=setup),
            system_prompt=apply_prompt_template(
                subagent_enabled=subagent_enabled,
                max_concurrent_subagents=max_concurrent_subagents,
                available_skills=set(["bootstrap"]),
                app_config=resolved_app_config,
+                deferred_names=setup.deferred_names,
            ),
            state_schema=ThreadState,
        )
@@ -478,17 +501,20 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
    # The default agent (no agent_name) does not see this tool.
    extra_tools = [update_agent] if agent_name else []
    # Default lead agent (unchanged behavior)
-    tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
+    raw_tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
+    filtered = filter_tools_by_skill_allowed_tools(raw_tools + extra_tools, skills_for_tool_policy)
+    final_tools, setup = _assemble_deferred(filtered, enabled=resolved_app_config.tool_search.enabled)
    return create_agent(
        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort, app_config=resolved_app_config, attach_tracing=False),
-        tools=filter_tools_by_skill_allowed_tools(tools + extra_tools, skills_for_tool_policy),
-        middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config),
+        tools=final_tools,
+        middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config, deferred_setup=setup),
        system_prompt=apply_prompt_template(
            subagent_enabled=subagent_enabled,
            max_concurrent_subagents=max_concurrent_subagents,
            agent_name=agent_name,
            available_skills=set(agent_config.skills) if agent_config and agent_config.skills is not None else None,
            app_config=resolved_app_config,
+            deferred_names=setup.deferred_names,
        ),
        state_schema=ThreadState,
    )