fix(sandbox): uphold /mnt/user-data contract at Sandbox API boundary (#2873) (#2881)

* fix(sandbox): uphold /mnt/user-data contract at Sandbox API boundary (#2873)

LocalSandboxProvider used a process-wide singleton with no /mnt/user-data
mapping, forcing every caller to translate virtual paths via tools.py
before invoking the public Sandbox API. AIO already exposes /mnt/user-data
natively (per-thread bind mounts), so the same code path behaved
differently across implementations — and direct callers like
uploads.py:282 / feishu.py:389 only worked thanks to the
`uses_thread_data_mounts` workaround flag.

Switch the provider to a dual-track cache: keep the `"local"` singleton
for legacy acquire(None) callers (backward-compat for existing tests and
scripts), and create a per-thread LocalSandbox with id `"local:{tid}"`
for acquire(thread_id). Each per-thread instance carries PathMapping
entries for /mnt/user-data, its three subdirs, and /mnt/acp-workspace,
mirroring how AioSandboxProvider mounts those paths into its container.

is_local_sandbox() now recognises both id formats. `_agent_written_paths`
becomes per-thread (it was a process-wide set that leaked across
threads — a latent isolation bug also fixed by this change).

Verified via TDD: a new contract test suite hits the public Sandbox API
directly (write/read/list/exec/glob/grep/update + per-thread isolation +
lifecycle). 3212 backend tests still pass, ruff is clean.

* fix(sandbox): address Copilot review on #2881

Three follow-ups from Copilot's review of the LocalSandboxProvider refactor:

1. Synchronisation: ``acquire`` / ``get`` / ``reset`` mutated the cache without
   any lock, so concurrent acquire of the same ``thread_id`` could create two
   ``LocalSandbox`` instances and lose one's ``_agent_written_paths`` state.
   Add a provider-wide ``threading.Lock`` (matching ``AioSandboxProvider``) and
   build per-thread mappings outside the lock to avoid holding it during the
   ``ensure_thread_dirs`` filesystem touch.

2. Memory bound: ``_thread_sandboxes`` grew monotonically. Replace the plain
   dict with an ``OrderedDict`` LRU capped at
   ``DEFAULT_MAX_CACHED_THREAD_SANDBOXES`` (256, configurable per provider
   instance). ``get`` promotes touched threads to the MRU end so an active
   thread isn't evicted under load. Eviction is graceful: the next ``acquire``
   rebuilds a fresh sandbox; only ``_agent_written_paths`` (reverse-resolve
   hint) is lost.

3. Docs: update ``CLAUDE.md`` to reflect the new per-thread architecture, the
   LRU cap, and that ``is_local_sandbox`` recognises both id formats.

New regression tests:
- Concurrent ``acquire("alpha")`` from 8 threads yields a single instance
  (slow-init injection forces the race window wide open).
- Concurrent ``acquire`` of distinct thread_ids yields distinct instances.
- The cache evicts the least-recently-used thread once the cap is exceeded.
- ``get`` promotes recency so a polled thread survives a later acquire-storm.
This commit is contained in:
Xinmin Zeng
2026-05-17 08:26:04 +08:00
committed by GitHub
parent 4538c32298
commit 380255f722
4 changed files with 592 additions and 25 deletions
+3 -3
View File
@@ -232,14 +232,14 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti
**Interface**: Abstract `Sandbox` with `execute_command`, `read_file`, `write_file`, `list_dir`
**Provider Pattern**: `SandboxProvider` with `acquire`, `get`, `release` lifecycle
**Implementations**:
- `LocalSandboxProvider` - Singleton local filesystem execution with path mappings
- `LocalSandboxProvider` - Local filesystem execution. `acquire(thread_id)` returns a per-thread `LocalSandbox` (id `local:{thread_id}`) whose `path_mappings` resolve `/mnt/user-data/{workspace,uploads,outputs}` and `/mnt/acp-workspace` to that thread's host directories, so the public `Sandbox` API honours the `/mnt/user-data` contract uniformly with AIO. `acquire()` / `acquire(None)` keeps the legacy generic singleton (id `local`) for callers without a thread context. Per-thread sandboxes are held in an LRU cache (default 256 entries) guarded by a `threading.Lock`.
- `AioSandboxProvider` (`packages/harness/deerflow/community/`) - Docker-based isolation
**Virtual Path System**:
- Agent sees: `/mnt/user-data/{workspace,uploads,outputs}`, `/mnt/skills`
- Physical: `backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/...`, `deer-flow/skills/`
- Translation: `replace_virtual_path()` / `replace_virtual_paths_in_command()`
- Detection: `is_local_sandbox()` checks `sandbox_id == "local"`
- Translation: `LocalSandboxProvider` builds per-thread `PathMapping`s for the user-data prefixes at acquire time; `tools.py` keeps `replace_virtual_path()` / `replace_virtual_paths_in_command()` as a defense-in-depth layer (and for path validation). AIO has the directories volume-mounted at the same virtual paths inside its container, so both implementations accept `/mnt/user-data/...` natively.
- Detection: `is_local_sandbox()` accepts both `sandbox_id == "local"` (legacy / no-thread) and `sandbox_id.startswith("local:")` (per-thread)
**Sandbox Tools** (in `packages/harness/deerflow/sandbox/tools.py`):
- `bash` - Execute commands with path translation and error handling
@@ -1,4 +1,6 @@
import logging
import threading
from collections import OrderedDict
from pathlib import Path
from deerflow.sandbox.local.local_sandbox import LocalSandbox, PathMapping
@@ -7,25 +9,87 @@ from deerflow.sandbox.sandbox_provider import SandboxProvider
logger = logging.getLogger(__name__)
# Module-level alias kept for backward compatibility with older callers/tests
# that reach into ``local_sandbox_provider._singleton`` directly. New code reads
# the provider instance attributes (``_generic_sandbox`` / ``_thread_sandboxes``)
# instead.
_singleton: LocalSandbox | None = None
# Virtual prefixes that must be reserved by the per-thread mappings created in
# ``acquire`` — custom mounts from ``config.yaml`` may not overlap with these.
_USER_DATA_VIRTUAL_PREFIX = "/mnt/user-data"
_ACP_WORKSPACE_VIRTUAL_PREFIX = "/mnt/acp-workspace"
# Default upper bound on per-thread LocalSandbox instances retained in memory.
# Each cached instance is cheap (a small Python object with a list of
# PathMapping and a set of agent-written paths used for reverse resolve), but
# in a long-running gateway the number of distinct thread_ids is unbounded.
# When the cap is exceeded the least-recently-used entry is dropped; the next
# ``acquire(thread_id)`` for that thread simply rebuilds the sandbox at the
# cost of losing its accumulated ``_agent_written_paths`` (read_file falls
# back to no reverse resolution, which is the same behaviour as a fresh run).
DEFAULT_MAX_CACHED_THREAD_SANDBOXES = 256
class LocalSandboxProvider(SandboxProvider):
"""Local-filesystem sandbox provider with per-thread path scoping.
Earlier revisions of this provider returned a single process-wide
``LocalSandbox`` keyed by the literal id ``"local"``. That singleton could
not honour the documented ``/mnt/user-data/...`` contract at the public
``Sandbox`` API boundary because the corresponding host directory is
per-thread (``{base_dir}/users/{user_id}/threads/{thread_id}/user-data/``).
The provider now produces a fresh ``LocalSandbox`` per ``thread_id`` whose
``path_mappings`` include thread-scoped entries for
``/mnt/user-data/{workspace,uploads,outputs}`` and ``/mnt/acp-workspace``,
mirroring how :class:`AioSandboxProvider` bind-mounts those paths into its
docker container. The legacy ``acquire()`` / ``acquire(None)`` call still
returns a generic singleton with id ``"local"`` for callers (and tests)
that do not have a thread context.
Thread-safety: ``acquire``, ``get`` and ``reset`` may be invoked from
multiple threads (Gateway tool dispatch, subagent worker pools, the
background memory updater, …) so all cache state changes are serialised
through a provider-wide :class:`threading.Lock`. This matches the pattern
used by :class:`AioSandboxProvider`.
Memory bound: ``_thread_sandboxes`` is an LRU cache capped at
``max_cached_threads`` (default :data:`DEFAULT_MAX_CACHED_THREAD_SANDBOXES`).
When the cap is exceeded the least-recently-used entry is evicted on the
next ``acquire``; the evicted thread's next ``acquire`` rebuilds a fresh
sandbox (losing only its ``_agent_written_paths`` reverse-resolve hint,
which gracefully degrades read_file output).
"""
uses_thread_data_mounts = True
def __init__(self):
"""Initialize the local sandbox provider with path mappings."""
def __init__(self, max_cached_threads: int = DEFAULT_MAX_CACHED_THREAD_SANDBOXES):
"""Initialize the local sandbox provider with static path mappings.
Args:
max_cached_threads: Upper bound on per-thread sandboxes retained in
the LRU cache. When exceeded, the least-recently-used entry is
evicted on the next ``acquire``.
"""
self._path_mappings = self._setup_path_mappings()
self._generic_sandbox: LocalSandbox | None = None
self._thread_sandboxes: OrderedDict[str, LocalSandbox] = OrderedDict()
self._max_cached_threads = max_cached_threads
self._lock = threading.Lock()
def _setup_path_mappings(self) -> list[PathMapping]:
"""
Setup path mappings for local sandbox.
Setup static path mappings shared by every sandbox this provider yields.
Maps container paths to actual local paths, including skills directory
and any custom mounts configured in config.yaml.
Static mappings cover the skills directory and any custom mounts from
``config.yaml`` — both are process-wide and identical for every thread.
Per-thread ``/mnt/user-data/...`` and ``/mnt/acp-workspace`` mappings
are appended inside :meth:`acquire` because they depend on
``thread_id`` and the effective ``user_id``.
Returns:
List of path mappings
List of static path mappings
"""
mappings: list[PathMapping] = []
@@ -48,7 +112,11 @@ class LocalSandboxProvider(SandboxProvider):
)
# Map custom mounts from sandbox config
_RESERVED_CONTAINER_PREFIXES = [container_path, "/mnt/acp-workspace", "/mnt/user-data"]
_RESERVED_CONTAINER_PREFIXES = [
container_path,
_ACP_WORKSPACE_VIRTUAL_PREFIX,
_USER_DATA_VIRTUAL_PREFIX,
]
sandbox_config = config.sandbox
if sandbox_config and sandbox_config.mounts:
for mount in sandbox_config.mounts:
@@ -99,33 +167,162 @@ class LocalSandboxProvider(SandboxProvider):
return mappings
@staticmethod
def _build_thread_path_mappings(thread_id: str) -> list[PathMapping]:
"""Build per-thread path mappings for /mnt/user-data and /mnt/acp-workspace.
Resolves ``user_id`` via :func:`get_effective_user_id` (the same path
:class:`AioSandboxProvider` uses) and ensures the backing host
directories exist before they are mapped into the sandbox view.
"""
from deerflow.config.paths import get_paths
from deerflow.runtime.user_context import get_effective_user_id
paths = get_paths()
user_id = get_effective_user_id()
paths.ensure_thread_dirs(thread_id, user_id=user_id)
return [
# Aggregate parent mapping so ``ls /mnt/user-data`` and other
# parent-level operations behave the same as inside AIO (where the
# parent directory is real and contains the three subdirs). Longer
# subpath mappings below still win for ``/mnt/user-data/workspace/...``
# because ``_find_path_mapping`` sorts by container_path length.
PathMapping(
container_path=_USER_DATA_VIRTUAL_PREFIX,
local_path=str(paths.sandbox_user_data_dir(thread_id, user_id=user_id)),
read_only=False,
),
PathMapping(
container_path=f"{_USER_DATA_VIRTUAL_PREFIX}/workspace",
local_path=str(paths.sandbox_work_dir(thread_id, user_id=user_id)),
read_only=False,
),
PathMapping(
container_path=f"{_USER_DATA_VIRTUAL_PREFIX}/uploads",
local_path=str(paths.sandbox_uploads_dir(thread_id, user_id=user_id)),
read_only=False,
),
PathMapping(
container_path=f"{_USER_DATA_VIRTUAL_PREFIX}/outputs",
local_path=str(paths.sandbox_outputs_dir(thread_id, user_id=user_id)),
read_only=False,
),
PathMapping(
container_path=_ACP_WORKSPACE_VIRTUAL_PREFIX,
local_path=str(paths.acp_workspace_dir(thread_id, user_id=user_id)),
read_only=False,
),
]
def acquire(self, thread_id: str | None = None) -> str:
"""Return a sandbox id scoped to *thread_id* (or the generic singleton).
- ``thread_id=None`` keeps the legacy singleton with id ``"local"`` for
callers that have no thread context (e.g. legacy tests, scripts).
- ``thread_id="abc"`` yields a per-thread ``LocalSandbox`` with id
``"local:abc"`` whose ``path_mappings`` resolve ``/mnt/user-data/...``
to that thread's host directories.
Thread-safe under concurrent invocation: the cache check + insert is
guarded by ``self._lock`` so two callers racing on the same
``thread_id`` always observe the same LocalSandbox instance.
"""
global _singleton
if _singleton is None:
_singleton = LocalSandbox("local", path_mappings=self._path_mappings)
return _singleton.id
if thread_id is None:
with self._lock:
if self._generic_sandbox is None:
self._generic_sandbox = LocalSandbox("local", path_mappings=list(self._path_mappings))
_singleton = self._generic_sandbox
return self._generic_sandbox.id
# Fast path under lock.
with self._lock:
cached = self._thread_sandboxes.get(thread_id)
if cached is not None:
# Mark as most-recently used so frequently-touched threads
# survive eviction.
self._thread_sandboxes.move_to_end(thread_id)
return cached.id
# ``_build_thread_path_mappings`` touches the filesystem
# (``ensure_thread_dirs``); release the lock during I/O.
new_mappings = list(self._path_mappings) + self._build_thread_path_mappings(thread_id)
with self._lock:
# Re-check after the lock-free I/O: another caller may have
# populated the cache while we were computing mappings.
cached = self._thread_sandboxes.get(thread_id)
if cached is None:
cached = LocalSandbox(f"local:{thread_id}", path_mappings=new_mappings)
self._thread_sandboxes[thread_id] = cached
self._evict_until_within_cap_locked()
else:
self._thread_sandboxes.move_to_end(thread_id)
return cached.id
def _evict_until_within_cap_locked(self) -> None:
"""LRU-evict cached thread sandboxes once the cap is exceeded.
Caller MUST hold ``self._lock``.
"""
while len(self._thread_sandboxes) > self._max_cached_threads:
evicted_thread_id, _ = self._thread_sandboxes.popitem(last=False)
logger.info(
"Evicting LocalSandbox cache entry for thread %s (cap=%d)",
evicted_thread_id,
self._max_cached_threads,
)
def get(self, sandbox_id: str) -> Sandbox | None:
if sandbox_id == "local":
if _singleton is None:
with self._lock:
generic = self._generic_sandbox
if generic is None:
self.acquire()
return _singleton
with self._lock:
return self._generic_sandbox
return generic
if isinstance(sandbox_id, str) and sandbox_id.startswith("local:"):
thread_id = sandbox_id[len("local:") :]
with self._lock:
cached = self._thread_sandboxes.get(thread_id)
if cached is not None:
# Touching a thread via ``get`` (used by tools.py to look
# up the sandbox once per tool call) promotes it in LRU
# order so an active thread isn't evicted under load.
self._thread_sandboxes.move_to_end(thread_id)
return cached
return None
def release(self, sandbox_id: str) -> None:
# LocalSandbox uses singleton pattern - no cleanup needed.
# LocalSandbox has no resources to release; keep the cached instance so
# that ``_agent_written_paths`` (used to reverse-resolve agent-authored
# file contents on read) survives between turns. LRU eviction in
# ``acquire`` and explicit ``reset()`` / ``shutdown()`` are the only
# paths that drop cached entries.
#
# Note: This method is intentionally not called by SandboxMiddleware
# to allow sandbox reuse across multiple turns in a thread.
# For Docker-based providers (e.g., AioSandboxProvider), cleanup
# happens at application shutdown via the shutdown() method.
pass
def reset(self) -> None:
# reset_sandbox_provider() must also clear the module singleton.
"""Drop all cached LocalSandbox instances.
``reset_sandbox_provider()`` calls this to ensure config / mount
changes take effect on the next ``acquire()``. We also reset the
module-level ``_singleton`` alias so older callers/tests that reach
into it see a fresh state.
"""
global _singleton
_singleton = None
with self._lock:
self._generic_sandbox = None
self._thread_sandboxes.clear()
_singleton = None
def shutdown(self) -> None:
# LocalSandboxProvider has no extra resources beyond the shared
# singleton, so shutdown uses the same cleanup path as reset.
# LocalSandboxProvider has no extra resources beyond the cached
# ``LocalSandbox`` instances, so shutdown uses the same cleanup path
# as ``reset``.
self.reset()
@@ -1006,8 +1006,9 @@ def get_thread_data(runtime: Runtime | None) -> ThreadDataState | None:
def is_local_sandbox(runtime: Runtime | None) -> bool:
"""Check if the current sandbox is a local sandbox.
Path replacement is only needed for local sandbox since aio sandbox
already has /mnt/user-data mounted in the container.
Accepts both the legacy generic id ``"local"`` (acquire with no thread
context) and the per-thread id format ``"local:{thread_id}"`` produced by
:meth:`LocalSandboxProvider.acquire` once a thread is known.
"""
if runtime is None:
return False
@@ -1016,7 +1017,10 @@ def is_local_sandbox(runtime: Runtime | None) -> bool:
sandbox_state = runtime.state.get("sandbox")
if sandbox_state is None:
return False
return sandbox_state.get("sandbox_id") == "local"
sandbox_id = sandbox_state.get("sandbox_id")
if not isinstance(sandbox_id, str):
return False
return sandbox_id == "local" or sandbox_id.startswith("local:")
def sandbox_from_runtime(runtime: Runtime | None = None) -> Sandbox:
@@ -0,0 +1,366 @@
"""Issue #2873 regression — the public Sandbox API must honor the documented
/mnt/user-data contract uniformly across implementations.
Today AIO sandbox already accepts /mnt/user-data/... paths directly because the
container has those paths bind-mounted per-thread. LocalSandbox, however,
externalises that translation to ``deerflow.sandbox.tools`` via ``thread_data``,
so any caller that bypasses tools.py (e.g. ``uploads.py`` syncing files into a
remote sandbox via ``sandbox.update_file(virtual_path, ...)``) sees inconsistent
behaviour.
These tests pin down the **public Sandbox API boundary**: when a caller obtains
a ``LocalSandbox`` from ``LocalSandboxProvider.acquire(thread_id)`` and invokes
its abstract methods with documented virtual paths, those paths must resolve to
the thread's user-data directory automatically — no tools.py / thread_data
shim required.
"""
from __future__ import annotations
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch
import pytest
from deerflow.config.sandbox_config import SandboxConfig
from deerflow.sandbox.local.local_sandbox_provider import LocalSandboxProvider
def _build_config(skills_dir: Path) -> SimpleNamespace:
"""Minimal app config covering what ``LocalSandboxProvider`` reads at init."""
return SimpleNamespace(
skills=SimpleNamespace(
container_path="/mnt/skills",
get_skills_path=lambda: skills_dir,
use="deerflow.skills.storage.local_skill_storage:LocalSkillStorage",
),
sandbox=SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider", mounts=[]),
)
@pytest.fixture
def isolated_paths(monkeypatch, tmp_path):
"""Redirect ``get_paths().base_dir`` to ``tmp_path`` and reset its singleton.
Without this, per-thread directories would be created under the developer's
real ``.deer-flow/`` tree.
"""
monkeypatch.setenv("DEER_FLOW_HOME", str(tmp_path))
from deerflow.config import paths as paths_module
monkeypatch.setattr(paths_module, "_paths", None)
yield tmp_path
monkeypatch.setattr(paths_module, "_paths", None)
@pytest.fixture
def provider(isolated_paths, tmp_path):
"""Provider with a real skills dir and no custom mounts."""
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
cfg = _build_config(skills_dir)
with patch("deerflow.config.get_app_config", return_value=cfg):
yield LocalSandboxProvider()
# ──────────────────────────────────────────────────────────────────────────
# 1. Direct Sandbox API accepts the virtual path contract for ``acquire(tid)``
# ──────────────────────────────────────────────────────────────────────────
def test_acquire_with_thread_id_returns_per_thread_id(provider):
sandbox_id = provider.acquire("alpha")
assert sandbox_id == "local:alpha"
def test_acquire_without_thread_id_remains_legacy_local_id(provider):
"""Backward-compat: ``acquire()`` with no thread keeps the singleton id."""
assert provider.acquire() == "local"
assert provider.acquire(None) == "local"
def test_write_then_read_via_public_api_with_virtual_path(provider):
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
assert sbx is not None
virtual = "/mnt/user-data/workspace/hello.txt"
sbx.write_file(virtual, "hi there")
assert sbx.read_file(virtual) == "hi there"
def test_list_dir_via_public_api_with_virtual_path(provider):
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
sbx.write_file("/mnt/user-data/workspace/foo.txt", "x")
entries = sbx.list_dir("/mnt/user-data/workspace")
# entries should be reverse-resolved back to the virtual prefix
assert any("/mnt/user-data/workspace/foo.txt" in e for e in entries)
def test_execute_command_with_virtual_path(provider):
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
sbx.write_file("/mnt/user-data/uploads/note.txt", "payload")
output = sbx.execute_command("ls /mnt/user-data/uploads")
assert "note.txt" in output
def test_glob_with_virtual_path(provider):
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
sbx.write_file("/mnt/user-data/outputs/report.md", "# r")
matches, _ = sbx.glob("/mnt/user-data/outputs", "*.md")
assert any(m.endswith("/mnt/user-data/outputs/report.md") for m in matches)
def test_grep_with_virtual_path(provider):
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
sbx.write_file("/mnt/user-data/workspace/findme.txt", "needle line\nother line")
matches, _ = sbx.grep("/mnt/user-data/workspace", "needle", literal=True)
assert matches
assert matches[0].path.endswith("/mnt/user-data/workspace/findme.txt")
def test_execute_command_lists_aggregate_user_data_root(provider):
"""``ls /mnt/user-data`` (the parent prefix itself) must list the three
subdirs — matching the AIO container's natural filesystem view."""
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
# Touch all three subdirs so they materialise on disk
sbx.write_file("/mnt/user-data/workspace/.keep", "")
sbx.write_file("/mnt/user-data/uploads/.keep", "")
sbx.write_file("/mnt/user-data/outputs/.keep", "")
output = sbx.execute_command("ls /mnt/user-data")
assert "workspace" in output
assert "uploads" in output
assert "outputs" in output
def test_update_file_with_virtual_path_for_remote_sync_scenario(provider):
"""This is the exact code path used by ``uploads.py:282`` and ``feishu.py:389``.
They build a ``virtual_path`` like ``/mnt/user-data/uploads/foo.pdf`` and hand
raw bytes to the sandbox. Before this fix LocalSandbox would try to write to
the literal host path ``/mnt/user-data/uploads/foo.pdf`` and fail.
"""
sandbox_id = provider.acquire("alpha")
sbx = provider.get(sandbox_id)
sbx.update_file("/mnt/user-data/uploads/blob.bin", b"\x00\x01\x02binary")
assert sbx.read_file("/mnt/user-data/uploads/blob.bin").startswith("\x00\x01\x02")
# ──────────────────────────────────────────────────────────────────────────
# 2. Per-thread isolation (no cross-thread state leaks)
# ──────────────────────────────────────────────────────────────────────────
def test_two_threads_get_distinct_sandboxes(provider):
sid_a = provider.acquire("alpha")
sid_b = provider.acquire("beta")
assert sid_a != sid_b
sbx_a = provider.get(sid_a)
sbx_b = provider.get(sid_b)
assert sbx_a is not sbx_b
def test_per_thread_user_data_mapping_isolated(provider, isolated_paths):
"""Files written via one thread's sandbox must not be visible through another."""
sid_a = provider.acquire("alpha")
sid_b = provider.acquire("beta")
sbx_a = provider.get(sid_a)
sbx_b = provider.get(sid_b)
sbx_a.write_file("/mnt/user-data/workspace/secret.txt", "alpha-only")
# The same virtual path resolves to a different host path in thread "beta"
with pytest.raises(FileNotFoundError):
sbx_b.read_file("/mnt/user-data/workspace/secret.txt")
def test_agent_written_paths_per_thread_isolation(provider):
"""``_agent_written_paths`` tracks files this sandbox wrote so reverse-resolve
runs on read. The set must not leak across threads."""
sid_a = provider.acquire("alpha")
sid_b = provider.acquire("beta")
sbx_a = provider.get(sid_a)
sbx_b = provider.get(sid_b)
sbx_a.write_file("/mnt/user-data/workspace/in-a.txt", "marker")
assert sbx_a._agent_written_paths
assert not sbx_b._agent_written_paths
# ──────────────────────────────────────────────────────────────────────────
# 3. Lifecycle: get / release / reset
# ──────────────────────────────────────────────────────────────────────────
def test_get_returns_cached_instance_for_known_id(provider):
sid = provider.acquire("alpha")
assert provider.get(sid) is provider.get(sid)
def test_get_unknown_id_returns_none(provider):
assert provider.get("local:nonexistent") is None
def test_release_is_noop_keeps_instance_available(provider):
"""Local has no resources to release; the cached instance stays alive across
turns so ``_agent_written_paths`` persists for reverse-resolve on later reads."""
sid = provider.acquire("alpha")
sbx_before = provider.get(sid)
provider.release(sid)
sbx_after = provider.get(sid)
assert sbx_before is sbx_after
def test_reset_clears_both_generic_and_per_thread_caches(provider):
provider.acquire() # populate generic
provider.acquire("alpha") # populate per-thread
assert provider._generic_sandbox is not None
assert provider._thread_sandboxes
provider.reset()
assert provider._generic_sandbox is None
assert not provider._thread_sandboxes
# ──────────────────────────────────────────────────────────────────────────
# 4. is_local_sandbox detects both legacy and per-thread ids
# ──────────────────────────────────────────────────────────────────────────
def test_is_local_sandbox_accepts_both_id_formats():
from deerflow.sandbox.tools import is_local_sandbox
legacy = SimpleNamespace(state={"sandbox": {"sandbox_id": "local"}}, context={})
per_thread = SimpleNamespace(state={"sandbox": {"sandbox_id": "local:alpha"}}, context={})
foreign = SimpleNamespace(state={"sandbox": {"sandbox_id": "aio-12345"}}, context={})
unset = SimpleNamespace(state={}, context={})
assert is_local_sandbox(legacy) is True
assert is_local_sandbox(per_thread) is True
assert is_local_sandbox(foreign) is False
assert is_local_sandbox(unset) is False
# ──────────────────────────────────────────────────────────────────────────
# 5. Concurrency safety (Copilot review feedback)
# ──────────────────────────────────────────────────────────────────────────
def test_concurrent_acquire_same_thread_yields_single_instance(provider):
"""Two threads racing on ``acquire("alpha")`` must share one LocalSandbox.
Without the provider lock the check-then-act in ``acquire`` is non-atomic:
both racers would see an empty cache, both would build their own
LocalSandbox, and one would overwrite the other — losing the loser's
``_agent_written_paths`` and any in-flight state on it.
"""
import threading
import time
from deerflow.sandbox.local import local_sandbox as local_sandbox_module
# Force a wide race window by slowing the LocalSandbox constructor down.
original_init = local_sandbox_module.LocalSandbox.__init__
def slow_init(self, *args, **kwargs):
time.sleep(0.05)
original_init(self, *args, **kwargs)
barrier = threading.Barrier(8)
results: list[str] = []
results_lock = threading.Lock()
def racer():
barrier.wait()
sid = provider.acquire("alpha")
with results_lock:
results.append(sid)
with patch.object(local_sandbox_module.LocalSandbox, "__init__", slow_init):
threads = [threading.Thread(target=racer) for _ in range(8)]
for t in threads:
t.start()
for t in threads:
t.join()
# Every racer must observe the same ``sandbox_id``…
assert len(set(results)) == 1, f"Racers saw different ids: {results}"
# …and the cache must hold exactly one instance for ``alpha``.
assert len(provider._thread_sandboxes) == 1
assert "alpha" in provider._thread_sandboxes
def test_concurrent_acquire_distinct_threads_yields_distinct_instances(provider):
"""Different thread_ids race-acquired in parallel each get their own sandbox."""
import threading
barrier = threading.Barrier(6)
sids: dict[str, str] = {}
lock = threading.Lock()
def racer(name: str):
barrier.wait()
sid = provider.acquire(name)
with lock:
sids[name] = sid
threads = [threading.Thread(target=racer, args=(f"t{i}",)) for i in range(6)]
for t in threads:
t.start()
for t in threads:
t.join()
assert set(sids.values()) == {f"local:t{i}" for i in range(6)}
assert set(provider._thread_sandboxes.keys()) == {f"t{i}" for i in range(6)}
# ──────────────────────────────────────────────────────────────────────────
# 6. Bounded memory growth (Copilot review feedback)
# ──────────────────────────────────────────────────────────────────────────
def test_thread_sandbox_cache_is_bounded(isolated_paths, tmp_path):
"""The LRU cap must evict the least-recently-used thread sandboxes once
exceeded — otherwise long-running gateways would accumulate cache entries
for every distinct ``thread_id`` ever served."""
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
cfg = _build_config(skills_dir)
with patch("deerflow.config.get_app_config", return_value=cfg):
provider = LocalSandboxProvider(max_cached_threads=3)
for i in range(5):
provider.acquire(f"t{i}")
# Only the 3 most-recent thread_ids should be retained.
assert set(provider._thread_sandboxes.keys()) == {"t2", "t3", "t4"}
assert provider.get("local:t0") is None
assert provider.get("local:t4") is not None
def test_lru_promotes_recently_used_thread(isolated_paths, tmp_path):
"""``get`` on a cached thread should mark it as most-recently used so a
later acquire-storm doesn't evict an active thread that is being polled."""
skills_dir = tmp_path / "skills"
skills_dir.mkdir()
cfg = _build_config(skills_dir)
with patch("deerflow.config.get_app_config", return_value=cfg):
provider = LocalSandboxProvider(max_cached_threads=3)
for name in ["a", "b", "c"]:
provider.acquire(name)
# Touch "a" via ``get`` so it becomes most-recently used.
provider.get("local:a")
# Adding a fourth thread should evict "b" (the new LRU), not "a".
provider.acquire("d")
assert "a" in provider._thread_sandboxes
assert "b" not in provider._thread_sandboxes
assert {"a", "c", "d"} == set(provider._thread_sandboxes.keys())