fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342)

* feat(tool-search): add hash-scoped promoted state to ThreadState

* feat(tool-search): add immutable DeferredToolCatalog with stable hash

* feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search

* refactor(tool-search): replace deferred-tool ContextVar with closures + graph state (#3272)

Build the deferred catalog + tool_search tool per agent from the policy-filtered
tool list (after skill allowed-tools), pass deferred_names + catalog_hash
explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions
in ThreadState.promoted (scoped by catalog_hash) via a Command-returning
tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so
deferral no longer depends on build/execute sharing an async context. MCP tools
are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way.

Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via
tool_search) and assembly is fail-closed. Migrate tests off the deleted registry
APIs; delete the obsolete ContextVar-based #2884 regression (re-covered by
state-based tests in a follow-up).
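
The fail-closed rule can be sketched as follows. This is an illustration under assumptions: `FakeTool`, `assemble`, and the `recover_deferred` callback are hypothetical stand-ins for the real tool objects and deferred-setup builder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTool:
    """Hypothetical stand-in for a tool object."""
    name: str
    is_mcp: bool = False


def assemble(filtered_tools, *, enabled, recover_deferred):
    """Return (final_tools, deferred_names), or raise instead of leaking schemas."""
    deferred = frozenset(recover_deferred(filtered_tools)) if enabled else frozenset()
    # Fail closed: MCP tools are present but nothing was deferred, so binding
    # the list as-is would send their full schemas to the model.
    if enabled and not deferred and any(t.is_mcp for t in filtered_tools):
        raise RuntimeError("MCP tools survived filtering but none were deferred")
    final = list(filtered_tools)
    if deferred:
        final.append(FakeTool("tool_search"))
    return final, deferred


def recover_mcp_names(tools):
    """Healthy recovery: every MCP tool is deferred."""
    return [t.name for t in tools if t.is_mcp]


def recover_nothing(tools):
    """Broken recovery: simulates losing the deferred set."""
    return []


tools = [FakeTool("bash"), FakeTool("mcp_calc", is_mcp=True)]
final, deferred = assemble(tools, enabled=True, recover_deferred=recover_mcp_names)

try:
    assemble(tools, enabled=True, recover_deferred=recover_nothing)
    failed_closed = False
except RuntimeError:
    failed_closed = True
```

Raising at build time trades availability for safety: an agent that cannot defer its MCP tools does not start, rather than starting with every schema bound.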

* test(tool-search): lock tool_search promotion into next model turn via graph state

* test(tool-search): cross-context, policy-leak, fail-closed, #2884 isolation regressions

* test(tool-search): align real-LLM e2e with closure-based deferred setup

* docs: update DeferredToolFilterMiddleware description for closure+state design

* style(tests): drop unused import in test_deferred_setup (ruff)

* test(tool-search): harden merge_promoted + replace tautological catalog test

From independent code review:
- merge_promoted: use existing.get("catalog_hash") so a forward-incompatible
  or externally-injected persisted promoted dict triggers a replace instead of
  a KeyError crash; add regression test for the malformed-existing case.
- test_deferred_catalog: replace the `== [] or True` tautology (a test that
  could never fail) with a deterministic invalid-regex->literal-fallback check
  (positive match on calc + negative empty match).
- DeferredToolCatalog: comment why frozen-without-slots is required for the
  cached_property hash/names fields (adding slots=True would break them).
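
The merge_promoted hardening can be exercised in isolation. The reducer below mirrors the one added to ThreadState in this commit; the sample hashes (`h1`, `h2`) and tool names are made up for illustration:

```python
from typing import TypedDict


class PromotedTools(TypedDict):
    catalog_hash: str
    names: list[str]


def merge_promoted(existing, new):
    """Reducer for promotions, scoped by catalog hash."""
    if not new:
        return existing  # node did not touch promotions
    # .get() (not ["catalog_hash"]) so a malformed persisted dict is replaced
    # instead of raising KeyError.
    if existing is None or existing.get("catalog_hash") != new["catalog_hash"]:
        return {
            "catalog_hash": new["catalog_hash"],
            "names": list(dict.fromkeys(new["names"])),
        }
    return {
        "catalog_hash": existing["catalog_hash"],
        "names": list(dict.fromkeys(existing["names"] + new["names"])),
    }


old = {"catalog_hash": "h1", "names": ["a"]}

same_hash = merge_promoted(old, {"catalog_hash": "h1", "names": ["b", "a"]})
drifted = merge_promoted(old, {"catalog_hash": "h2", "names": ["c"]})
malformed = merge_promoted({"names": ["a"]}, {"catalog_hash": "h2", "names": ["c"]})
untouched = merge_promoted(old, None)
```

Here `same_hash` unions and dedupes the names in order, `drifted` and `malformed` both replace the stored value wholesale, and `untouched` returns the existing value unchanged.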

* fix(tool-search): read tool_search.enabled from self._app_config in client

DeerFlowClient._ensure_agent called get_app_config() directly to read
tool_search.enabled, but the client already resolves and stores its config as
self._app_config at construction (and uses it everywhere else). The bare call
re-resolves config from disk at agent-build time, which raises FileNotFoundError
in environments without a config.yaml (such as CI). test_client.py's fixture only
patches get_app_config during __init__, so the later call hit the real loader.
Use self._app_config, matching the rest of the client.
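
A rough sketch of the failure mode, assuming a loader that raises when no config file exists. `load_config_from_disk`, `Client`, and the dict-shaped config are hypothetical simplifications, not DeerFlowClient's real interface:

```python
def load_config_from_disk():
    """Hypothetical loader: fails when no config file exists (as in CI)."""
    raise FileNotFoundError("config.yaml")


class Client:
    def __init__(self, config=None):
        # Resolve once; callers and tests can inject a config and never touch disk.
        self._config = config if config is not None else load_config_from_disk()

    def tool_search_enabled_buggy(self):
        # Re-resolves at call time, bypassing the config injected at construction.
        return load_config_from_disk()["tool_search_enabled"]

    def tool_search_enabled(self):
        # Reads the config that was already resolved in __init__.
        return self._config["tool_search_enabled"]


client = Client({"tool_search_enabled": True})
fixed_result = client.tool_search_enabled()

try:
    client.tool_search_enabled_buggy()
    buggy_raised = False
except FileNotFoundError:
    buggy_raised = True
```

Reading the stored config keeps agent construction independent of whatever the loader would find on disk at that later moment.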

* test(tool-search): lock tool_search post-policy append ordering

tool_search is appended after skill-allowlist filtering, so the allowlist
can no longer deny it by name. Lock the intended contract: it only appears
when allowed MCP tools survive the filter, and its catalog (derived from the
already policy-filtered list) can never expose a denied tool. Addresses the
ordering observation from the Copilot review on #3342.
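
The ordering contract can be illustrated with plain strings standing in for tools. `filter_by_allowlist` and `assemble` are hypothetical simplifications of the real filtering and assembly helpers:

```python
def filter_by_allowlist(tools, allowed):
    """Hypothetical policy filter: keep only explicitly allowed tool names."""
    return [t for t in tools if t in allowed]


def assemble(tools, allowed, mcp_names):
    """Filter first, derive the catalog from the survivors, append tool_search last."""
    filtered = filter_by_allowlist(tools, allowed)
    catalog = sorted(t for t in filtered if t in mcp_names)  # built post-policy
    final = filtered + (["tool_search"] if catalog else [])  # appended after filtering
    return final, catalog


tools = ["bash", "mcp_calc", "mcp_secret"]
mcp = {"mcp_calc", "mcp_secret"}

# The allowlist never names tool_search, yet it appears because an MCP tool
# survived. The denied mcp_secret is absent from both the tools and the catalog.
final, catalog = assemble(tools, {"bash", "mcp_calc"}, mcp)

# No MCP tool survives the filter, so tool_search is not added at all.
final_no_mcp, catalog_no_mcp = assemble(tools, {"bash"}, mcp)
```

Because the catalog is computed from `filtered` rather than from `tools`, there is no path by which searching could surface a tool the policy removed.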
AochenShen99
2026-06-02 22:43:22 +08:00
committed by GitHub
parent 74e3e80cf6
commit d9f4724950
17 changed files with 768 additions and 1267 deletions
@@ -270,6 +270,7 @@ def _build_middlewares(
custom_middlewares: list[AgentMiddleware] | None = None,
*,
app_config: AppConfig | None = None,
deferred_setup=None,
):
"""Build middleware chain based on runtime configuration.
@@ -318,11 +319,13 @@ def _build_middlewares(
if model_config is not None and model_config.supports_vision:
middlewares.append(ViewImageMiddleware())
# Add DeferredToolFilterMiddleware to hide deferred tool schemas from model binding
if resolved_app_config.tool_search.enabled:
# Hide deferred tool schemas from model binding until tool_search promotes them.
# The deferred set + catalog hash come from the build-time setup (assembled
# after tool-policy filtering); promotion is read from graph state.
if deferred_setup is not None and deferred_setup.deferred_names:
from deerflow.agents.middlewares.deferred_tool_filter_middleware import DeferredToolFilterMiddleware
middlewares.append(DeferredToolFilterMiddleware())
middlewares.append(DeferredToolFilterMiddleware(deferred_setup.deferred_names, deferred_setup.catalog_hash))
# Add SubagentLimitMiddleware to truncate excess parallel task calls
subagent_enabled = cfg.get("subagent_enabled", False)
@@ -353,6 +356,23 @@ def _build_middlewares(
return middlewares
def _assemble_deferred(filtered_tools, *, enabled: bool):
"""Build the final tool list + deferred setup from a policy-filtered list.
Call AFTER tool-policy filtering so the deferred catalog never exposes a
tool the agent is not allowed to use. Fail-closed: if tool_search is enabled
and MCP tools survived filtering but no deferred set was recovered, raise
rather than silently binding their full schemas to the model.
"""
from deerflow.tools.builtins.tool_search import _is_mcp_tool, build_deferred_tool_setup
setup = build_deferred_tool_setup(filtered_tools, enabled=enabled)
if enabled and not setup.deferred_names and any(_is_mcp_tool(t) for t in filtered_tools):
raise RuntimeError("tool_search enabled and MCP tools survived policy filtering, but no deferred set was recovered — refusing to bind MCP schemas (fail-closed).")
final_tools = list(filtered_tools) + ([setup.tool_search_tool] if setup.tool_search_tool else [])
return final_tools, setup
def _available_skill_names(agent_config, is_bootstrap: bool) -> set[str] | None:
if is_bootstrap:
return {"bootstrap"}
@@ -460,16 +480,19 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
if is_bootstrap:
# Special bootstrap agent with minimal prompt for initial custom agent creation flow
tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
raw_tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
filtered = filter_tools_by_skill_allowed_tools(raw_tools, skills_for_tool_policy)
final_tools, setup = _assemble_deferred(filtered, enabled=resolved_app_config.tool_search.enabled)
return create_agent(
model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, app_config=resolved_app_config, attach_tracing=False),
tools=filter_tools_by_skill_allowed_tools(tools, skills_for_tool_policy),
middleware=_build_middlewares(config, model_name=model_name, app_config=resolved_app_config),
tools=final_tools,
middleware=_build_middlewares(config, model_name=model_name, app_config=resolved_app_config, deferred_setup=setup),
system_prompt=apply_prompt_template(
subagent_enabled=subagent_enabled,
max_concurrent_subagents=max_concurrent_subagents,
available_skills=set(["bootstrap"]),
app_config=resolved_app_config,
deferred_names=setup.deferred_names,
),
state_schema=ThreadState,
)
@@ -478,17 +501,20 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
# The default agent (no agent_name) does not see this tool.
extra_tools = [update_agent] if agent_name else []
# Default lead agent (unchanged behavior)
tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
raw_tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
filtered = filter_tools_by_skill_allowed_tools(raw_tools + extra_tools, skills_for_tool_policy)
final_tools, setup = _assemble_deferred(filtered, enabled=resolved_app_config.tool_search.enabled)
return create_agent(
model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort, app_config=resolved_app_config, attach_tracing=False),
tools=filter_tools_by_skill_allowed_tools(tools + extra_tools, skills_for_tool_policy),
middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config),
tools=final_tools,
middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config, deferred_setup=setup),
system_prompt=apply_prompt_template(
subagent_enabled=subagent_enabled,
max_concurrent_subagents=max_concurrent_subagents,
agent_name=agent_name,
available_skills=set(agent_config.skills) if agent_config and agent_config.skills is not None else None,
app_config=resolved_app_config,
deferred_names=setup.deferred_names,
),
state_schema=ThreadState,
)
@@ -684,33 +684,16 @@ Rules:
"""
def get_deferred_tools_prompt_section(*, app_config: AppConfig | None = None) -> str:
"""Generate <available-deferred-tools> block for the system prompt.
def get_deferred_tools_prompt_section(*, deferred_names: frozenset[str] = frozenset()) -> str:
"""Generate <available-deferred-tools> from an explicit deferred-name set.
Lists only deferred tool names so the agent knows what exists
and can use tool_search to load them.
Returns empty string when tool_search is disabled or no tools are deferred.
Lists only names so the agent knows what exists and can use tool_search to
load them. Returns empty string when there are no deferred tools. The set is
computed at agent build time (after tool-policy filtering) and passed in.
"""
from deerflow.tools.builtins.tool_search import get_deferred_registry
if app_config is None:
try:
from deerflow.config import get_app_config
config = get_app_config()
except Exception:
return ""
else:
config = app_config
if not config.tool_search.enabled:
if not deferred_names:
return ""
registry = get_deferred_registry()
if not registry:
return ""
names = "\n".join(e.name for e in registry.entries)
names = "\n".join(sorted(deferred_names))
return f"<available-deferred-tools>\n{names}\n</available-deferred-tools>"
@@ -772,6 +755,7 @@ def apply_prompt_template(
agent_name: str | None = None,
available_skills: set[str] | None = None,
app_config: AppConfig | None = None,
deferred_names: frozenset[str] = frozenset(),
) -> str:
# Include subagent section only if enabled (from runtime parameter)
n = max_concurrent_subagents
@@ -799,7 +783,7 @@ def apply_prompt_template(
skills_section = get_skills_prompt_section(available_skills, app_config=app_config)
# Get deferred tools section (tool_search)
deferred_tools_section = get_deferred_tools_prompt_section(app_config=app_config)
deferred_tools_section = get_deferred_tools_prompt_section(deferred_names=deferred_names)
# Build ACP agent section only if ACP agents are configured
acp_section = _build_acp_section(app_config=app_config)
@@ -1,12 +1,15 @@
"""Middleware to filter deferred tool schemas from model binding.
When tool_search is enabled, MCP tools are registered in the DeferredToolRegistry
and passed to ToolNode for execution, but their schemas should NOT be sent to the
LLM via bind_tools (that's the whole point of deferral — saving context tokens).
When tool_search is enabled, MCP tools are still passed to ToolNode for
execution, but their schemas must NOT be sent to the LLM via bind_tools until
the model has discovered them via tool_search. This middleware removes the
still-deferred tools from request.tools before model binding, and blocks tool
calls to tools that have not been promoted yet.
This middleware intercepts wrap_model_call and removes deferred tools from
request.tools so that model.bind_tools only receives active tool schemas.
The agent discovers deferred tools at runtime via the tool_search tool.
The deferred name set and the catalog hash are injected at construction time
(no ContextVar). Promotion state is read from graph state (``state["promoted"]``),
scoped by catalog hash so a stale persisted promotion cannot expose a renamed
or drifted tool.
"""
import logging
@@ -24,47 +27,49 @@ logger = logging.getLogger(__name__)
class DeferredToolFilterMiddleware(AgentMiddleware[AgentState]):
"""Remove deferred tools from request.tools before model binding.
"""Hide deferred tool schemas from the bound model until promoted.
ToolNode still holds all tools (including deferred) for execution routing,
but the LLM only sees active tool schemas — deferred tools are discoverable
via tool_search at runtime.
but the LLM only sees active tool schemas plus tools that have already been
promoted (recorded in ``state["promoted"]`` under the current catalog hash).
"""
def __init__(self, deferred_names: frozenset[str], catalog_hash: str | None):
super().__init__()
self._deferred = deferred_names
self._catalog_hash = catalog_hash
def _promoted(self, state) -> set[str]:
promoted = (state or {}).get("promoted")
if promoted and promoted.get("catalog_hash") == self._catalog_hash:
return set(promoted.get("names") or [])
return set()
def _hidden(self, state) -> set[str]:
return set(self._deferred) - self._promoted(state)
def _filter_tools(self, request: ModelRequest) -> ModelRequest:
from deerflow.tools.builtins.tool_search import get_deferred_registry
registry = get_deferred_registry()
if not registry:
if not self._deferred:
return request
deferred_names = registry.deferred_names
active_tools = [t for t in request.tools if getattr(t, "name", None) not in deferred_names]
if len(active_tools) < len(request.tools):
logger.debug(f"Filtered {len(request.tools) - len(active_tools)} deferred tool schema(s) from model binding")
return request.override(tools=active_tools)
hide = self._hidden(request.state)
if not hide:
return request
active = [t for t in request.tools if getattr(t, "name", None) not in hide]
if len(active) < len(request.tools):
logger.debug("Filtered %d deferred tool schema(s) from model binding", len(request.tools) - len(active))
return request.override(tools=active)
def _blocked_tool_message(self, request: ToolCallRequest) -> ToolMessage | None:
from deerflow.tools.builtins.tool_search import get_deferred_registry
registry = get_deferred_registry()
if not registry:
if not self._deferred:
return None
tool_name = str(request.tool_call.get("name") or "")
if not tool_name:
name = str(request.tool_call.get("name") or "")
if not name or name not in self._hidden(request.state):
return None
if not registry.contains(tool_name):
return None
tool_call_id = str(request.tool_call.get("id") or "missing_tool_call_id")
return ToolMessage(
content=(f"Error: Tool '{tool_name}' is deferred and has not been promoted yet. Call tool_search first to expose and promote this tool's schema, then retry."),
content=(f"Error: Tool '{name}' is deferred and has not been promoted yet. Call tool_search first to expose and promote this tool's schema, then retry."),
tool_call_id=tool_call_id,
name=tool_name,
name=name,
status="error",
)
@@ -58,6 +58,32 @@ def merge_todos(existing: list | None, new: list | None) -> list | None:
return new
class PromotedTools(TypedDict):
catalog_hash: str
names: list[str]
def merge_promoted(existing: PromotedTools | None, new: PromotedTools | None) -> PromotedTools | None:
"""Reducer for deferred-tool promotions, scoped by catalog hash.
- new None/empty -> preserve existing (node didn't touch promotions).
- catalog_hash changed -> replace wholesale, dropping stale names (prevents a
persisted bare name from exposing a different tool after catalog drift).
- same catalog_hash -> union names, dedupe, preserve order.
"""
if not new:
return existing
if existing is None or existing.get("catalog_hash") != new["catalog_hash"]:
return {
"catalog_hash": new["catalog_hash"],
"names": list(dict.fromkeys(new["names"])),
}
return {
"catalog_hash": existing["catalog_hash"],
"names": list(dict.fromkeys(existing["names"] + new["names"])),
}
class ThreadState(AgentState):
sandbox: NotRequired[SandboxState | None]
thread_data: NotRequired[ThreadDataState | None]
@@ -66,3 +92,4 @@ class ThreadState(AgentState):
todos: Annotated[list | None, merge_todos]
uploaded_files: NotRequired[list[dict] | None]
viewed_images: Annotated[dict[str, ViewedImageData], merge_viewed_images] # image_path -> {base64, mime_type}
promoted: Annotated[PromotedTools | None, merge_promoted]