mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-10 09:25:57 +00:00
fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342)
* feat(tool-search): add hash-scoped promoted state to ThreadState * feat(tool-search): add immutable DeferredToolCatalog with stable hash * feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search * refactor(tool-search): replace deferred-tool ContextVar with closures + graph state (#3272) Build the deferred catalog + tool_search tool per agent from the policy-filtered tool list (after skill allowed-tools), pass deferred_names + catalog_hash explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions in ThreadState.promoted (scoped by catalog_hash) via a Command-returning tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so deferral no longer depends on build/execute sharing an async context. MCP tools are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way. Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via tool_search) and assembly is fail-closed. Migrate tests off the deleted registry APIs; delete the obsolete ContextVar-based #2884 regression (re-covered by state-based tests in a follow-up). * test(tool-search): lock tool_search promotion into next model turn via graph state * test(tool-search): cross-context, policy-leak, fail-closed, #2884 isolation regressions * test(tool-search): align real-LLM e2e with closure-based deferred setup * docs: update DeferredToolFilterMiddleware description for closure+state design * style(tests): drop unused import in test_deferred_setup (ruff) * test(tool-search): harden merge_promoted + replace tautological catalog test From independent code review: - merge_promoted: use existing.get("catalog_hash") so a forward-incompatible or externally-injected persisted promoted dict triggers a replace instead of a KeyError crash; add regression test for the malformed-existing case. - test_deferred_catalog: replace the `== [] or True` tautology (a test that could never fail) with a deterministic invalid-regex->literal-fallback check (positive match on calc + negative empty match). - DeferredToolCatalog: comment why frozen-without-slots is required for the cached_property hash/names fields (adding slots=True would break them). * fix(tool-search): read tool_search.enabled from self._app_config in client DeerFlowClient._ensure_agent called get_app_config() directly to read tool_search.enabled, but the client already resolves and stores its config as self._app_config at construction (and uses it everywhere else). The bare call re-resolves config from disk at agent-build time, which raises FileNotFoundError in environments without a config.yaml (CI) — test_client.py's fixture only patches get_app_config during __init__, so the later call hit the real loader. Use self._app_config, matching the rest of the client. * test(tool-search): lock tool_search post-policy append ordering tool_search is appended after skill-allowlist filtering, so the allowlist can no longer deny it by name. Lock the intended contract: it only appears when allowed MCP tools survive the filter, and its catalog (derived from the already policy-filtered list) can never expose a denied tool. Addresses the ordering observation from the Copilot review on #3342.
This commit is contained in:
@@ -1,202 +1,151 @@
|
||||
"""Tool search — deferred tool discovery at runtime.
|
||||
|
||||
Contains:
|
||||
- DeferredToolRegistry: stores deferred tools and handles regex search
|
||||
- tool_search: the LangChain tool the agent calls to discover deferred tools
|
||||
- DeferredToolCatalog: immutable, searchable catalog of deferred tools.
|
||||
- build_tool_search_tool: builds the `tool_search` tool as a closure over a
|
||||
catalog; it records promotions into graph state via ``Command``.
|
||||
- build_deferred_tool_setup: assembles the catalog + tool from a
|
||||
policy-filtered tool list (call AFTER tool-policy filtering).
|
||||
|
||||
The agent sees deferred tool names in <available-deferred-tools> but cannot
|
||||
call them until it fetches their full schema via the tool_search tool.
|
||||
Source-agnostic: no mention of MCP or tool origin.
|
||||
call them until it fetches their full schema via the tool_search tool. The
|
||||
deferred set rides on a build-time closure and promotion lives in per-thread
|
||||
graph state — there is no ContextVar. Source-agnostic: a tool is "deferred"
|
||||
when it carries the ``deerflow_mcp`` metadata tag.
|
||||
"""
|
||||
|
||||
import contextvars
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
from functools import cached_property
|
||||
from typing import Annotated
|
||||
|
||||
from langchain.tools import BaseTool
|
||||
from langchain_core.tools import tool
|
||||
from langchain_core.messages import ToolMessage
|
||||
from langchain_core.tools import InjectedToolCallId, tool
|
||||
from langchain_core.utils.function_calling import convert_to_openai_function
|
||||
from langgraph.types import Command
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MAX_RESULTS = 5 # Max tools returned per search
|
||||
|
||||
|
||||
# ── Registry ──
|
||||
# ── Catalog ──
|
||||
|
||||
|
||||
@dataclass
|
||||
class DeferredToolEntry:
|
||||
"""Lightweight metadata for a deferred tool (no full schema in context)."""
|
||||
# NOTE: frozen=True without slots=True keeps __dict__, which is what lets the
|
||||
# @cached_property fields below cache (they write to instance.__dict__, bypassing
|
||||
# the frozen __setattr__). Do NOT add slots=True or hash/names break at runtime.
|
||||
@dataclass(frozen=True)
|
||||
class DeferredToolCatalog:
|
||||
"""Immutable catalog of deferred tools. Pure search, no mutation."""
|
||||
|
||||
name: str
|
||||
description: str
|
||||
tool: BaseTool # Full tool object, returned only on search match
|
||||
tools: tuple[BaseTool, ...]
|
||||
|
||||
@cached_property
|
||||
def names(self) -> frozenset[str]:
|
||||
return frozenset(t.name for t in self.tools)
|
||||
|
||||
class DeferredToolRegistry:
|
||||
"""Registry of deferred tools, searchable by regex pattern."""
|
||||
|
||||
def __init__(self):
|
||||
self._entries: list[DeferredToolEntry] = []
|
||||
|
||||
def register(self, tool: BaseTool) -> None:
|
||||
self._entries.append(
|
||||
DeferredToolEntry(
|
||||
name=tool.name,
|
||||
description=tool.description or "",
|
||||
tool=tool,
|
||||
)
|
||||
)
|
||||
|
||||
def promote(self, names: set[str]) -> None:
|
||||
"""Remove tools from the deferred registry so they pass through the filter.
|
||||
|
||||
Called after tool_search returns a tool's schema — the LLM now knows
|
||||
the full definition, so the DeferredToolFilterMiddleware should stop
|
||||
stripping it from bind_tools on subsequent calls.
|
||||
"""
|
||||
if not names:
|
||||
return
|
||||
before = len(self._entries)
|
||||
self._entries = [e for e in self._entries if e.name not in names]
|
||||
promoted = before - len(self._entries)
|
||||
if promoted:
|
||||
logger.debug(f"Promoted {promoted} tool(s) from deferred to active: {names}")
|
||||
@cached_property
|
||||
def hash(self) -> str:
|
||||
canon = [{"name": t.name, "schema": convert_to_openai_function(t)} for t in sorted(self.tools, key=lambda t: t.name)]
|
||||
blob = json.dumps(canon, sort_keys=True, ensure_ascii=False, default=str)
|
||||
return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
|
||||
|
||||
def search(self, query: str) -> list[BaseTool]:
|
||||
"""Search deferred tools by regex pattern against name + description.
|
||||
|
||||
Supports three query forms (aligned with Claude Code):
|
||||
- "select:name1,name2" — exact name match
|
||||
- "+keyword rest" — name must contain keyword, rank by rest
|
||||
- "keyword query" — regex match against name + description
|
||||
|
||||
Returns:
|
||||
List of matched BaseTool objects (up to MAX_RESULTS).
|
||||
"""
|
||||
if query.startswith("select:"):
|
||||
names = {n.strip() for n in query[7:].split(",")}
|
||||
return [e.tool for e in self._entries if e.name in names][:MAX_RESULTS]
|
||||
wanted = {n.strip() for n in query[7:].split(",")}
|
||||
return [t for t in self.tools if t.name in wanted][:MAX_RESULTS]
|
||||
|
||||
if query.startswith("+"):
|
||||
parts = query[1:].split(None, 1)
|
||||
required = parts[0].lower()
|
||||
candidates = [e for e in self._entries if required in e.name.lower()]
|
||||
candidates = [t for t in self.tools if required in t.name.lower()]
|
||||
if len(parts) > 1:
|
||||
candidates.sort(
|
||||
key=lambda e: _regex_score(parts[1], e),
|
||||
reverse=True,
|
||||
)
|
||||
return [e.tool for e in candidates][:MAX_RESULTS]
|
||||
candidates.sort(key=lambda t: _catalog_regex_score(parts[1], t), reverse=True)
|
||||
return candidates[:MAX_RESULTS]
|
||||
|
||||
# General regex search
|
||||
try:
|
||||
regex = re.compile(query, re.IGNORECASE)
|
||||
except re.error:
|
||||
regex = re.compile(re.escape(query), re.IGNORECASE)
|
||||
|
||||
scored = []
|
||||
for entry in self._entries:
|
||||
searchable = f"{entry.name} {entry.description}"
|
||||
scored: list[tuple[int, BaseTool]] = []
|
||||
for t in self.tools:
|
||||
searchable = f"{t.name} {t.description or ''}"
|
||||
if regex.search(searchable):
|
||||
score = 2 if regex.search(entry.name) else 1
|
||||
scored.append((score, entry))
|
||||
|
||||
scored.append((2 if regex.search(t.name) else 1, t))
|
||||
scored.sort(key=lambda x: x[0], reverse=True)
|
||||
return [entry.tool for _, entry in scored][:MAX_RESULTS]
|
||||
|
||||
@property
|
||||
def entries(self) -> list[DeferredToolEntry]:
|
||||
return list(self._entries)
|
||||
|
||||
@property
|
||||
def deferred_names(self) -> set[str]:
|
||||
"""Names of tools that are still hidden from model binding."""
|
||||
return {entry.name for entry in self._entries}
|
||||
|
||||
def contains(self, name: str) -> bool:
|
||||
"""Return whether *name* is still deferred."""
|
||||
return any(entry.name == name for entry in self._entries)
|
||||
|
||||
def __len__(self) -> int:
|
||||
return len(self._entries)
|
||||
return [t for _, t in scored][:MAX_RESULTS]
|
||||
|
||||
|
||||
def _regex_score(pattern: str, entry: DeferredToolEntry) -> int:
|
||||
def _catalog_regex_score(pattern: str, t: BaseTool) -> int:
|
||||
try:
|
||||
regex = re.compile(pattern, re.IGNORECASE)
|
||||
except re.error:
|
||||
regex = re.compile(re.escape(pattern), re.IGNORECASE)
|
||||
return len(regex.findall(f"{entry.name} {entry.description}"))
|
||||
return len(regex.findall(f"{t.name} {t.description or ''}"))
|
||||
|
||||
|
||||
# ── Per-request registry (ContextVar) ──
|
||||
#
|
||||
# Using a ContextVar instead of a module-level global prevents concurrent
|
||||
# requests from clobbering each other's registry. In asyncio-based LangGraph
|
||||
# each graph run executes in its own async context, so each request gets an
|
||||
# independent registry value. For synchronous tools run via
|
||||
# loop.run_in_executor, Python copies the current context to the worker thread,
|
||||
# so the ContextVar value is correctly inherited there too.
|
||||
|
||||
_registry_var: contextvars.ContextVar[DeferredToolRegistry | None] = contextvars.ContextVar("deferred_tool_registry", default=None)
|
||||
# ── Setup / tool ──
|
||||
|
||||
|
||||
def get_deferred_registry() -> DeferredToolRegistry | None:
|
||||
return _registry_var.get()
|
||||
@dataclass(frozen=True)
|
||||
class DeferredToolSetup:
|
||||
tool_search_tool: BaseTool | None
|
||||
deferred_names: frozenset[str]
|
||||
catalog_hash: str | None
|
||||
|
||||
|
||||
def set_deferred_registry(registry: DeferredToolRegistry) -> None:
|
||||
_registry_var.set(registry)
|
||||
def _is_mcp_tool(t: BaseTool) -> bool:
|
||||
return (getattr(t, "metadata", None) or {}).get("deerflow_mcp") is True
|
||||
|
||||
|
||||
def reset_deferred_registry() -> None:
|
||||
"""Reset the deferred registry for the current async context."""
|
||||
_registry_var.set(None)
|
||||
def build_tool_search_tool(catalog: DeferredToolCatalog) -> BaseTool:
|
||||
catalog_hash = catalog.hash
|
||||
|
||||
@tool
|
||||
def tool_search(query: str, tool_call_id: Annotated[str, InjectedToolCallId]) -> Command:
|
||||
"""Fetches full schema definitions for deferred tools so they can be called.
|
||||
|
||||
Deferred tools appear by name in <available-deferred-tools> in the system
|
||||
prompt. Until fetched, only the name is known. This tool matches a query
|
||||
against the deferred tools and returns the matched tools complete schemas;
|
||||
once returned, a tool becomes callable.
|
||||
|
||||
Query forms:
|
||||
- "select:Read,Edit" -- fetch these exact tools by name
|
||||
- "notebook jupyter" -- keyword search, up to max_results best matches
|
||||
- "+slack send" -- require "slack" in the name, rank by remaining terms
|
||||
"""
|
||||
matched = catalog.search(query)[:MAX_RESULTS]
|
||||
if not matched:
|
||||
content, names = f"No tools found matching: {query}", []
|
||||
else:
|
||||
content = json.dumps([convert_to_openai_function(t) for t in matched], indent=2, ensure_ascii=False)
|
||||
names = [t.name for t in matched]
|
||||
return Command(
|
||||
update={
|
||||
"promoted": {"catalog_hash": catalog_hash, "names": names},
|
||||
"messages": [ToolMessage(content=content, tool_call_id=tool_call_id, name="tool_search")],
|
||||
}
|
||||
)
|
||||
|
||||
return tool_search
|
||||
|
||||
|
||||
# ── Tool ──
|
||||
def build_deferred_tool_setup(filtered_tools: list[BaseTool], *, enabled: bool) -> DeferredToolSetup:
|
||||
"""Build the deferred-tool setup from a POLICY-FILTERED tool list.
|
||||
|
||||
|
||||
@tool
|
||||
def tool_search(query: str) -> str:
|
||||
"""Fetches full schema definitions for deferred tools so they can be called.
|
||||
|
||||
Deferred tools appear by name in <available-deferred-tools> in the system
|
||||
prompt. Until fetched, only the name is known — there is no parameter
|
||||
schema, so the tool cannot be invoked. This tool takes a query, matches
|
||||
it against the deferred tool list, and returns the matched tools' complete
|
||||
definitions. Once a tool's schema appears in that result, it is callable.
|
||||
|
||||
Query forms:
|
||||
- "select:Read,Edit,Grep" — fetch these exact tools by name
|
||||
- "notebook jupyter" — keyword search, up to max_results best matches
|
||||
- "+slack send" — require "slack" in the name, rank by remaining terms
|
||||
|
||||
Args:
|
||||
query: Query to find deferred tools. Use "select:<tool_name>" for
|
||||
direct selection, or keywords to search.
|
||||
|
||||
Returns:
|
||||
Matched tool definitions as JSON array.
|
||||
Must be called after skill/agent tool-policy filtering so the catalog never
|
||||
exposes a tool the current agent is not allowed to use.
|
||||
"""
|
||||
registry = get_deferred_registry()
|
||||
if not registry:
|
||||
return "No deferred tools available."
|
||||
|
||||
matched_tools = registry.search(query)
|
||||
if not matched_tools:
|
||||
return f"No tools found matching: {query}"
|
||||
|
||||
# Use LangChain's built-in serialization to produce OpenAI function format.
|
||||
# This is model-agnostic: all LLMs understand this standard schema.
|
||||
tool_defs = [convert_to_openai_function(t) for t in matched_tools[:MAX_RESULTS]]
|
||||
|
||||
# Promote matched tools so the DeferredToolFilterMiddleware stops filtering
|
||||
# them from bind_tools — the LLM now has the full schema and can invoke them.
|
||||
registry.promote({t.name for t in matched_tools[:MAX_RESULTS]})
|
||||
|
||||
return json.dumps(tool_defs, indent=2, ensure_ascii=False)
|
||||
if not enabled:
|
||||
return DeferredToolSetup(None, frozenset(), None)
|
||||
deferred = [t for t in filtered_tools if _is_mcp_tool(t)]
|
||||
if not deferred:
|
||||
return DeferredToolSetup(None, frozenset(), None)
|
||||
catalog = DeferredToolCatalog(tuple(deferred))
|
||||
return DeferredToolSetup(build_tool_search_tool(catalog), catalog.names, catalog.hash)
|
||||
|
||||
@@ -7,7 +7,6 @@ from deerflow.config.app_config import AppConfig
|
||||
from deerflow.reflection import resolve_variable
|
||||
from deerflow.sandbox.security import is_host_bash_allowed
|
||||
from deerflow.tools.builtins import ask_clarification_tool, present_file_tool, task_tool, view_image_tool
|
||||
from deerflow.tools.builtins.tool_search import get_deferred_registry
|
||||
from deerflow.tools.sync import make_sync_tool_wrapper
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@@ -127,57 +126,13 @@ def get_available_tools(
|
||||
if mcp_tools:
|
||||
logger.info(f"Using {len(mcp_tools)} cached MCP tool(s)")
|
||||
|
||||
# When tool_search is enabled, register MCP tools in the
|
||||
# deferred registry and add tool_search to builtin tools.
|
||||
if config.tool_search.enabled:
|
||||
from deerflow.tools.builtins.tool_search import DeferredToolRegistry, set_deferred_registry
|
||||
from deerflow.tools.builtins.tool_search import tool_search as tool_search_tool
|
||||
|
||||
# Reuse the existing registry if one is already set for
|
||||
# this async context. ``get_available_tools`` is
|
||||
# re-entered whenever a subagent is spawned
|
||||
# (``task_tool`` calls it to build the child agent's
|
||||
# toolset), and previously we used to unconditionally
|
||||
# rebuild the registry — wiping out the parent agent's
|
||||
# tool_search promotions. The
|
||||
# ``DeferredToolFilterMiddleware`` then re-hid those
|
||||
# tools from subsequent model calls, leaving the agent
|
||||
# able to see a tool's name but unable to invoke it
|
||||
# (issue #2884). ``contextvars`` already gives us the
|
||||
# lifetime semantics we want: a fresh request / graph
|
||||
# run starts in a new asyncio task with the
|
||||
# ContextVar at its default of ``None``, so reuse is
|
||||
# only triggered for re-entrant calls inside one run.
|
||||
#
|
||||
# Intentionally NOT reconciling against the current
|
||||
# ``mcp_tools`` snapshot. The MCP cache only refreshes
|
||||
# on ``extensions_config.json`` mtime changes, which
|
||||
# in practice happens between graph runs — not inside
|
||||
# one. And even if a refresh did happen mid-run, the
|
||||
# already-built lead agent's ``ToolNode`` still holds
|
||||
# the *previous* tool set (LangGraph binds tools at
|
||||
# graph construction time), so a brand-new MCP tool
|
||||
# couldn't actually be invoked anyway. The
|
||||
# ``DeferredToolRegistry`` doesn't retain the names
|
||||
# of previously-promoted tools (``promote()`` drops
|
||||
# the entry entirely), so re-syncing the registry
|
||||
# against a fresh ``mcp_tools`` list would
|
||||
# mis-classify those promotions as new tools and
|
||||
# re-register them as deferred — exactly the bug
|
||||
# this fix exists to prevent.
|
||||
existing_registry = get_deferred_registry()
|
||||
if existing_registry is None:
|
||||
registry = DeferredToolRegistry()
|
||||
for t in mcp_tools:
|
||||
registry.register(t)
|
||||
set_deferred_registry(registry)
|
||||
logger.info(f"Tool search active: {len(mcp_tools)} tools deferred")
|
||||
else:
|
||||
mcp_tool_names = {t.name for t in mcp_tools}
|
||||
still_deferred = len(existing_registry)
|
||||
promoted_count = max(0, len(mcp_tool_names) - still_deferred)
|
||||
logger.info(f"Tool search active (preserved promotions): {still_deferred} tools deferred, {promoted_count} already promoted")
|
||||
builtin_tools.append(tool_search_tool)
|
||||
# Tag MCP-sourced tools so deferred-tool assembly (done at
|
||||
# the agent construction site, AFTER tool-policy filtering)
|
||||
# can identify them. No ContextVar / registry is built here;
|
||||
# the deferred catalog + tool_search tool are assembled per
|
||||
# agent from the policy-filtered tool list.
|
||||
for t in mcp_tools:
|
||||
t.metadata = {**(t.metadata or {}), "deerflow_mcp": True}
|
||||
except ImportError:
|
||||
logger.warning("MCP module not available. Install 'langchain-mcp-adapters' package to enable MCP tools.")
|
||||
except Exception as e:
|
||||
|
||||
Reference in New Issue
Block a user