mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-10 09:25:57 +00:00
fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes (#3417)
* fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes
ToolOutputBudgetMiddleware persisted oversized tool results to the host
filesystem and returned a /mnt/user-data/outputs virtual path. For sandboxes
that do not use thread-data mounts (e.g. remote AIO sandbox), that virtual
path does not exist inside the sandbox, so the model's read_file tool could
not read it back and reported 'file not found'.
Branch on SandboxProvider.uses_thread_data_mounts:
- Mounted sandboxes (local Docker, AIO + LocalContainerBackend) keep the
original host-disk path; the host outputs dir is bind-mounted to the same
virtual path inside the sandbox, so behavior is unchanged.
- Non-mounted (remote) sandboxes externalize into the sandbox itself via
execute_command('mkdir -p ...') + write_file + 'test -s' validation. The
validation step is required because AIO sandbox execute_command returns
'Error: ...' as a string on failure instead of raising, so a silent mkdir
failure would otherwise leak through.
Any failure (rejected subdir, mkdir/write/validate error) falls back to the
existing inline head+tail truncation, so an unreadable path is never returned
to the model.
The sandbox resolver reads the sandbox_id that SandboxMiddleware already
writes into runtime.state['sandbox']; it never calls provider.acquire(),
keeping the tool-call hot path free of blocking I/O. Tools that do not use a
sandbox (web_search, MCP, ...) resolve to None and fall through to inline
truncation, which is the safe behavior for them.
Fixes #3416
* fix(middleware): address Copilot review feedback on sandbox externalization
- Make get_sandbox_provider() lookup best-effort in _budget_content: only
query when outputs_path or sandbox is available, and fall back to inline
truncation if provider initialization raises rather than propagating
the error. A resolved sandbox instance is sufficient on its own to take
the non-mounted externalization branch.
- Strict-match the sandbox post-write validation echo
(check.strip() == 'OK') to avoid false positives if execute_command
ever surfaces unrelated stdout/stderr containing 'OK' as a substring.
Refs: #3417
* test: fix flaky tests relying on /nonexistent/... path under container root
Two tests in this module (test_returns_none_on_invalid_path and
test_fallback_when_disk_write_fails) used paths like
'/nonexistent/impossible/path' to trigger _externalize's OSError
fallback. These paths are creatable when the test process runs as root
inside the CI container: os.makedirs(..., exist_ok=True) successfully
creates the entire chain under /, so the OSError branch is never hit
and the tests fail. Reproducible on main independently of this PR.
Switch to '/dev/null/cannot-mkdir-here'. /dev/null is a character
device on both Linux and macOS, so os.makedirs always fails with
NotADirectoryError regardless of privileges, reliably exercising the
OSError fallback.
* fix(tool-output-budget): only consult sandbox provider when a sandbox is resolved
The previous revision called get_sandbox_provider() whenever externalization
was triggered, including on the legacy host-disk path. Environments without
a configured sandbox -- in particular CI runners without a config.yaml --
would raise FileNotFoundError there, get caught, and silently fall back to
inline truncation. That defeated the host-disk externalization path that
predates this PR and was the root cause of the regressing legacy tests.
Restructure the branching so the provider is only consulted when a sandbox
has actually been resolved for the current tool call:
- sandbox resolved + provider.uses_thread_data_mounts: host-disk write
(bind-mounted into the sandbox, equivalent to a sandbox-side write).
- sandbox resolved + non-mounted provider: sandbox write (#3416).
- no sandbox + outputs_path: host-disk write
(legacy / non-sandbox tools, no provider call at all).
- otherwise: inline fallback.
No test changes; the legacy externalization tests are provider-agnostic by
construction and now pass without monkeypatching.
Refs: #3416
* test(tool-output-budget): assert legacy path does not call sandbox provider
Lock in the contract introduced by d6e2d25b: when no sandbox is resolved
for a tool call, _budget_content must externalize to the host outputs
directory without consulting get_sandbox_provider(). Regressing this would
re-break legacy / non-sandbox tools in environments without a configured
sandbox (e.g. CI without config.yaml), which is the failure mode #3416's
fix avoids.
The test injects a get_sandbox_provider that raises on call, so any
future refactor that moves the provider lookup out of the sandbox-only
branch will fail loudly.
Refs: #3416
This commit is contained in:
+175
-21
@@ -11,10 +11,11 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import shlex
|
||||
import uuid
|
||||
from collections.abc import Awaitable, Callable
|
||||
from dataclasses import replace as dc_replace
|
||||
from typing import Any, override
|
||||
from typing import TYPE_CHECKING, Any, override
|
||||
|
||||
from langchain.agents import AgentState
|
||||
from langchain.agents.middleware import AgentMiddleware
|
||||
@@ -24,9 +25,19 @@ from langgraph.prebuilt.tool_node import ToolCallRequest
|
||||
from langgraph.types import Command
|
||||
|
||||
from deerflow.config.tool_output_config import ToolOutputConfig
|
||||
from deerflow.sandbox.sandbox_provider import get_sandbox_provider
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from deerflow.sandbox.sandbox import Sandbox
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Virtual outputs root inside the sandbox. Host-mounted sandboxes map this to
|
||||
# the thread outputs dir on the host; for non-mounted (remote) sandboxes the
|
||||
# same path is written directly into the sandbox filesystem so the model's
|
||||
# ``read_file`` tool can read it back (issue #3416).
|
||||
_VIRTUAL_OUTPUTS_BASE = "/mnt/user-data/outputs"
|
||||
|
||||
|
||||
def _default_config() -> ToolOutputConfig:
|
||||
return ToolOutputConfig()
|
||||
@@ -94,6 +105,18 @@ def _sanitize_tool_name(name: str) -> str:
|
||||
return safe or "unknown"
|
||||
|
||||
|
||||
def _build_externalized_filename(*, tool_name: str, tool_call_id: str) -> str:
|
||||
"""Build the on-disk filename for an externalized tool output.
|
||||
|
||||
Shared by the host-disk and sandbox externalization paths so both
|
||||
produce the identical naming scheme.
|
||||
"""
|
||||
safe_name = _sanitize_tool_name(tool_name)
|
||||
ext = _EXT_MAP.get(tool_name, "txt")
|
||||
short_id = uuid.uuid4().hex[:12]
|
||||
return f"{safe_name}-{short_id}.{ext}"
|
||||
|
||||
|
||||
def _externalize(
|
||||
content: str,
|
||||
*,
|
||||
@@ -111,10 +134,7 @@ def _externalize(
|
||||
except OSError:
|
||||
return None
|
||||
|
||||
safe_name = _sanitize_tool_name(tool_name)
|
||||
ext = _EXT_MAP.get(tool_name, "txt")
|
||||
short_id = uuid.uuid4().hex[:12]
|
||||
filename = f"{safe_name}-{short_id}.{ext}"
|
||||
filename = _build_externalized_filename(tool_name=tool_name, tool_call_id=tool_call_id)
|
||||
filepath = os.path.join(storage_dir, filename)
|
||||
|
||||
if not os.path.abspath(filepath).startswith(os.path.abspath(storage_dir)):
|
||||
@@ -126,8 +146,56 @@ def _externalize(
|
||||
except OSError:
|
||||
return None
|
||||
|
||||
virtual_base = "/mnt/user-data/outputs"
|
||||
return f"{virtual_base}/{storage_subdir}/{filename}"
|
||||
return f"{_VIRTUAL_OUTPUTS_BASE}/{storage_subdir}/{filename}"
|
||||
|
||||
|
||||
def _externalize_to_sandbox(
|
||||
content: str,
|
||||
*,
|
||||
tool_name: str,
|
||||
tool_call_id: str,
|
||||
storage_subdir: str,
|
||||
sandbox: Sandbox,
|
||||
) -> str | None:
|
||||
"""Write *content* into the sandbox filesystem and return the virtual path.
|
||||
|
||||
Used when the sandbox does not use thread-data mounts (e.g. a remote AIO
|
||||
sandbox): the host-side :func:`_externalize` virtual path would not exist
|
||||
inside the sandbox, so the model's ``read_file`` tool could not read it
|
||||
back (issue #3416). Returns the same virtual-path contract on success, or
|
||||
``None`` to signal the caller to fall back to inline truncation.
|
||||
"""
|
||||
if os.path.isabs(storage_subdir) or ".." in storage_subdir:
|
||||
return None
|
||||
filename = _build_externalized_filename(tool_name=tool_name, tool_call_id=tool_call_id)
|
||||
virtual_dir = f"{_VIRTUAL_OUTPUTS_BASE}/{storage_subdir}"
|
||||
virtual_path = f"{virtual_dir}/{filename}"
|
||||
try:
|
||||
# AIO sandbox write_file does NOT create parent directories, so create
|
||||
# them explicitly before writing. execute_command returns its stdout
|
||||
# verbatim (including an "Error: ..." string on failure) rather than
|
||||
# raising, so we cannot rely on exception propagation here.
|
||||
sandbox.execute_command(f"mkdir -p {shlex.quote(virtual_dir)}")
|
||||
sandbox.write_file(virtual_path, content)
|
||||
# Validate the file landed: execute_command may have silently failed
|
||||
# to create the directory, and write_file backends differ. Refuse to
|
||||
# hand the model an unreadable read_file path.
|
||||
check = sandbox.execute_command(f"test -s {shlex.quote(virtual_path)} && echo OK || echo MISSING")
|
||||
if not isinstance(check, str) or check.strip() != "OK":
|
||||
logger.warning(
|
||||
"Sandbox externalize validation failed: path=%s, check=%r",
|
||||
virtual_path,
|
||||
check,
|
||||
)
|
||||
return None
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"Failed to externalize %s output to sandbox (call_id=%s)",
|
||||
tool_name,
|
||||
tool_call_id,
|
||||
)
|
||||
return None
|
||||
return virtual_path
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -227,6 +295,33 @@ def _resolve_outputs_path(request: ToolCallRequest) -> str | None:
|
||||
return outputs_path if isinstance(outputs_path, str) else None
|
||||
|
||||
|
||||
def _resolve_sandbox(request: ToolCallRequest) -> Sandbox | None:
|
||||
"""Resolve the active sandbox for the current tool call, or ``None``.
|
||||
|
||||
Reads the sandbox_id that ``SandboxMiddleware`` (and the sandbox tools
|
||||
themselves) write into ``runtime.state["sandbox"]``. We intentionally do
|
||||
NOT call ``provider.acquire`` here: acquiring a sandbox can trigger
|
||||
blocking remote I/O, and this resolver runs on every tool call. Tools
|
||||
that do not use a sandbox (``web_search``, MCP, ...) will return ``None``
|
||||
here, which is fine -- the caller falls back to inline truncation.
|
||||
"""
|
||||
runtime = getattr(request, "runtime", None)
|
||||
state = getattr(runtime, "state", None)
|
||||
if not isinstance(state, dict):
|
||||
return None
|
||||
sandbox_state = state.get("sandbox")
|
||||
if not isinstance(sandbox_state, dict):
|
||||
return None
|
||||
sandbox_id = sandbox_state.get("sandbox_id")
|
||||
if not sandbox_id:
|
||||
return None
|
||||
try:
|
||||
return get_sandbox_provider().get(sandbox_id)
|
||||
except Exception:
|
||||
logger.exception("Failed to look up sandbox %s for tool-output externalization", sandbox_id)
|
||||
return None
|
||||
|
||||
|
||||
def _budget_content(
|
||||
content: str,
|
||||
*,
|
||||
@@ -234,6 +329,7 @@ def _budget_content(
|
||||
tool_call_id: str,
|
||||
outputs_path: str | None,
|
||||
config: ToolOutputConfig,
|
||||
sandbox: Sandbox | None = None,
|
||||
) -> str | None:
|
||||
"""Apply budget to *content*. Returns ``None`` if no change needed."""
|
||||
threshold = config.tool_overrides.get(tool_name, config.externalize_min_chars)
|
||||
@@ -242,14 +338,50 @@ def _budget_content(
|
||||
if len(content) <= threshold and len(content) <= config.fallback_max_chars:
|
||||
return None
|
||||
|
||||
if threshold > 0 and len(content) > threshold and outputs_path:
|
||||
virtual_path = _externalize(
|
||||
content,
|
||||
tool_name=tool_name,
|
||||
tool_call_id=tool_call_id,
|
||||
outputs_path=outputs_path,
|
||||
storage_subdir=config.storage_subdir,
|
||||
)
|
||||
if threshold > 0 and len(content) > threshold:
|
||||
virtual_path: str | None = None
|
||||
# Decide persistence target based on what's available, without touching
|
||||
# the sandbox provider unless a sandbox was actually resolved for this
|
||||
# call. This keeps the legacy host-disk path provider-free, so callers
|
||||
# without a configured sandbox (and CI environments without a
|
||||
# config.yaml) continue to externalize to the host as before.
|
||||
if sandbox is not None:
|
||||
provider = None
|
||||
try:
|
||||
provider = get_sandbox_provider()
|
||||
except Exception:
|
||||
logger.exception("Failed to get sandbox provider for tool-output externalization; falling back to inline truncation")
|
||||
if provider is not None and getattr(provider, "uses_thread_data_mounts", False):
|
||||
# Host-mounted sandbox: host outputs path is bind-mounted into
|
||||
# the sandbox at the same virtual path, so writing host-side is
|
||||
# equivalent. Preserve the original behavior to avoid extra
|
||||
# sandbox round-trips.
|
||||
if outputs_path:
|
||||
virtual_path = _externalize(
|
||||
content,
|
||||
tool_name=tool_name,
|
||||
tool_call_id=tool_call_id,
|
||||
outputs_path=outputs_path,
|
||||
storage_subdir=config.storage_subdir,
|
||||
)
|
||||
else:
|
||||
virtual_path = _externalize_to_sandbox(
|
||||
content,
|
||||
tool_name=tool_name,
|
||||
tool_call_id=tool_call_id,
|
||||
storage_subdir=config.storage_subdir,
|
||||
sandbox=sandbox,
|
||||
)
|
||||
elif outputs_path:
|
||||
# No sandbox in this call (legacy / non-sandbox tools): write to
|
||||
# host outputs path directly, no provider needed.
|
||||
virtual_path = _externalize(
|
||||
content,
|
||||
tool_name=tool_name,
|
||||
tool_call_id=tool_call_id,
|
||||
outputs_path=outputs_path,
|
||||
storage_subdir=config.storage_subdir,
|
||||
)
|
||||
if virtual_path is not None:
|
||||
logger.info(
|
||||
"Externalized %s output (%d chars) to %s",
|
||||
@@ -288,7 +420,12 @@ def _budget_content(
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _patch_tool_message(msg: ToolMessage, config: ToolOutputConfig, outputs_path: str | None) -> ToolMessage:
|
||||
def _patch_tool_message(
|
||||
msg: ToolMessage,
|
||||
config: ToolOutputConfig,
|
||||
outputs_path: str | None,
|
||||
sandbox: Sandbox | None = None,
|
||||
) -> ToolMessage:
|
||||
"""Apply budget to a single ToolMessage. Returns the original if unchanged."""
|
||||
tool_name = msg.name or "unknown"
|
||||
if tool_name in config.exempt_tools:
|
||||
@@ -304,6 +441,7 @@ def _patch_tool_message(msg: ToolMessage, config: ToolOutputConfig, outputs_path
|
||||
tool_call_id=msg.tool_call_id or "",
|
||||
outputs_path=outputs_path,
|
||||
config=config,
|
||||
sandbox=sandbox,
|
||||
)
|
||||
if replacement is None:
|
||||
return msg
|
||||
@@ -355,10 +493,15 @@ def _needs_budget(result: ToolMessage | Command, config: ToolOutputConfig) -> bo
|
||||
return False
|
||||
|
||||
|
||||
def _patch_result(result: ToolMessage | Command, config: ToolOutputConfig, outputs_path: str | None) -> ToolMessage | Command:
|
||||
def _patch_result(
|
||||
result: ToolMessage | Command,
|
||||
config: ToolOutputConfig,
|
||||
outputs_path: str | None,
|
||||
sandbox: Sandbox | None = None,
|
||||
) -> ToolMessage | Command:
|
||||
"""Apply budget to a tool call result (ToolMessage or Command)."""
|
||||
if isinstance(result, ToolMessage):
|
||||
return _patch_tool_message(result, config, outputs_path)
|
||||
return _patch_tool_message(result, config, outputs_path, sandbox)
|
||||
|
||||
update = getattr(result, "update", None)
|
||||
if not isinstance(update, dict):
|
||||
@@ -372,7 +515,7 @@ def _patch_result(result: ToolMessage | Command, config: ToolOutputConfig, outpu
|
||||
changed = False
|
||||
for msg in messages:
|
||||
if isinstance(msg, ToolMessage):
|
||||
patched = _patch_tool_message(msg, config, outputs_path)
|
||||
patched = _patch_tool_message(msg, config, outputs_path, sandbox)
|
||||
if patched is not msg:
|
||||
changed = True
|
||||
new_messages.append(patched)
|
||||
@@ -392,6 +535,11 @@ def _patch_model_messages(messages: list[Any], config: ToolOutputConfig) -> list
|
||||
ToolMessage exceeds the budget — the common case once every result has
|
||||
already been budgeted at tool-call time, so a long history is not rebuilt
|
||||
on every model call.
|
||||
|
||||
Historical messages do not get a ``sandbox`` argument: any oversized tool
|
||||
message in history was already budgeted (and possibly externalized) at
|
||||
tool-call time, so the only thing left for the history path to do is
|
||||
inline fallback truncation, which needs no sandbox.
|
||||
"""
|
||||
if not any(isinstance(msg, ToolMessage) and _tool_message_over_budget(msg, config) for msg in messages):
|
||||
return None
|
||||
@@ -442,7 +590,8 @@ class ToolOutputBudgetMiddleware(AgentMiddleware[AgentState]):
|
||||
if not _needs_budget(result, self._config):
|
||||
return result
|
||||
outputs_path = _resolve_outputs_path(request)
|
||||
return _patch_result(result, self._config, outputs_path)
|
||||
sandbox = _resolve_sandbox(request)
|
||||
return _patch_result(result, self._config, outputs_path, sandbox)
|
||||
|
||||
@override
|
||||
async def awrap_tool_call(
|
||||
@@ -456,7 +605,12 @@ class ToolOutputBudgetMiddleware(AgentMiddleware[AgentState]):
|
||||
if not _needs_budget(result, self._config):
|
||||
return result
|
||||
outputs_path = _resolve_outputs_path(request)
|
||||
return await asyncio.to_thread(_patch_result, result, self._config, outputs_path)
|
||||
# _resolve_sandbox only touches runtime.state and the provider's
|
||||
# in-memory sandbox registry, so it is safe to call on the event
|
||||
# loop. The actual sandbox I/O (mkdir/write/test) happens inside
|
||||
# _patch_result, which is offloaded to a worker thread below.
|
||||
sandbox = _resolve_sandbox(request)
|
||||
return await asyncio.to_thread(_patch_result, result, self._config, outputs_path, sandbox)
|
||||
|
||||
# -- model call hooks (historical message truncation) ------------------
|
||||
|
||||
|
||||
Reference in New Issue
Block a user