Commit Graph

6 Commits

Author SHA1 Message Date
stphtt f212da9f89 fix(sandbox): create shell session before retrying on a fresh id (#3577)
* fix(sandbox): create shell session before retrying on a fresh id

The AIO sandbox recovery path generated a UUID and passed it straight to
exec_command(id=...). The sandbox image only auto-creates a session when
exec_command is called with *no* id; an exec carrying an unknown id returns
HTTP 404 "Session not found". So every ErrorObservation recovery itself
404'd, turning a transient session lapse into an unrecoverable tool error
that looped the run up to the LangGraph recursion limit.

Explicitly create_session(id=fresh_id) before targeting that id on retry.
create_session is idempotent (returns the existing session if the id already
exists), so this is safe under the serializing lock.

Updated the regression test to assert the retry targets exactly the
created session id rather than a fabricated, uncreated one.

* fix(sandbox): release the one-shot recovery session after retry

The fresh session created on the ErrorObservation recovery path is used for
exactly one command -- the next execute_command runs with no id and returns
to the default session. Under persistent session corruption every command
would create another session that is never reused or released, accumulating
sessions on the container.

Release it best-effort with cleanup_session() in a finally, swallowing any
cleanup error so it never masks a successful retry.

Addresses review feedback on #3577.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-17 10:21:27 +08:00
Ryker_Feng 5dc2d6cbf5 fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872) (#3245)
* fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872)

AioSandbox allocates a host-side agent_sandbox client (wrapping an
httpx.Client) in __init__, but AioSandboxProvider.release/destroy/shutdown
only popped provider state and tore down the backend container — the
client/transport owned by each cached AioSandbox was never explicitly
closed, accumulating unreclaimed sockets in long-running services.

- Add AioSandbox.close(): best-effort, idempotent close of the wrapped
  httpx_client (falls back to top-level client.close()); errors are
  logged but never raised so backend cleanup is never blocked.
- AioSandboxProvider.release()/destroy() now close the cached AioSandbox
  before dropping it; shutdown() inherits this via destroy().

* fix(sandbox): close the real httpx.Client owned by AioSandbox (#2872)

The previous close() only walked one level (wrapper.httpx_client), which resolves to the Fern-generated HttpClient wrapper that has no close(). The real socket-owning httpx.Client lives one level deeper at _client_wrapper.httpx_client.httpx_client, so the close path never fired and host-side sockets still leaked.

Resolve the real httpx.Client with graceful degradation; clear self._client under the lock for use-after-close and concurrent double-close safety; mark provider release()/destroy() try/except as defense-in-depth; rewrite TestClose against the real nested structure to lock down the original no-op bug.
2026-06-02 22:55:59 +08:00
Xun e37912e2c8 feat(sandbox) Adds download file interface in Sandbox (#3038)
* Add download interface in Sandbox

* fix

* fix

* del invalidate test

* fix

* safe download

* improve
2026-05-20 10:16:31 +08:00
Willem Jiang 189b82405c fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination (#2685)
* fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination

  The agent_sandbox library's shell API defaults no_change_timeout to 120
  seconds. When AioSandbox.execute_command() called exec_command() without
  this parameter, commands producing no output for 120s would return with
  NO_CHANGE_TIMEOUT status even though the script was still running.

  Pass no_change_timeout=600 to all exec_command calls (matching the
  client-level HTTP timeout) so long-running commands are not cut short.

  Fixes #2668

* test(sandbox): add assertions for no_change_timeout in execute_command and list_dir

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-01 22:27:02 +08:00
Varian_米泽 a2cb38f62b fix: prevent concurrent subagent file write conflicts in sandbox tools (#1714)
* fix: prevent concurrent subagent file write conflicts

Serialize same-path str_replace operations in sandbox tools

Guard AioSandbox write_file/update_file with the existing sandbox lock

Add regression tests for concurrent str_replace and append races

Verify with backend full tests and ruff lint checks

* fix(sandbox): Fix the concurrency issue of file operations on the same path in isolated sandboxes.

Ensure that different sandbox instances use independent locks for file operations on the same virtual path to avoid concurrency conflicts. Change the lock key from a single path to a composite key of (sandbox.id, path), and add tests to verify the concurrent safety of isolated sandboxes.

* feat(sandbox): Extract file operation lock logic to standalone module and fix concurrency issues

Extract file operation lock related logic from tools.py into a separate file_operation_lock.py module.
Fix data race issues during concurrent str_replace and write_file operations.
2026-04-02 15:39:41 +08:00
Matt Van Horn a3bfea631c fix(sandbox): serialize concurrent exec_command calls in AioSandbox (#1435)
* fix(sandbox): serialize concurrent exec_command calls in AioSandbox

The AIO sandbox container maintains a single persistent shell session
that corrupts when multiple exec_command requests arrive concurrently
(e.g. when ToolNode issues parallel tool_calls). The corrupted session
returns 'ErrorObservation' strings as output, cascading into subsequent
commands.

Add a threading.Lock to AioSandbox to serialize shell commands. As a
secondary defense, detect ErrorObservation in output and retry with a
fresh session ID.

Fixes #1433

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(sandbox): address Copilot review findings

- Fix shell injection in list_dir: use shlex.quote(path) to escape
  user-provided paths in the find command
- Narrow ErrorObservation retry condition from broad substring match
  to the specific corruption signature to prevent false retries
- Improve test_lock_prevents_concurrent_execution: use threading.Barrier
  to ensure all workers contend for the lock simultaneously
- Improve test_list_dir_uses_lock: assert lock.locked() is True during
  exec_command to verify lock acquisition

* style: auto-format with ruff

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:33:35 +08:00