deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-06-17 13:05:58 +00:00

Author	SHA1	Message	Date
stphtt	f212da9f89	fix(sandbox): create shell session before retrying on a fresh id (#3577 ) * fix(sandbox): create shell session before retrying on a fresh id The AIO sandbox recovery path generated a UUID and passed it straight to exec_command(id=...). The sandbox image only auto-creates a session when exec_command is called with no id; an exec carrying an unknown id returns HTTP 404 "Session not found". So every ErrorObservation recovery itself 404'd, turning a transient session lapse into an unrecoverable tool error that looped the run up to the LangGraph recursion limit. Explicitly create_session(id=fresh_id) before targeting that id on retry. create_session is idempotent (returns the existing session if the id already exists), so this is safe under the serializing lock. Updated the regression test to assert the retry targets exactly the created session id rather than a fabricated, uncreated one. * fix(sandbox): release the one-shot recovery session after retry The fresh session created on the ErrorObservation recovery path is used for exactly one command -- the next execute_command runs with no id and returns to the default session. Under persistent session corruption every command would create another session that is never reused or released, accumulating sessions on the container. Release it best-effort with cleanup_session() in a finally, swallowing any cleanup error so it never masks a successful retry. Addresses review feedback on #3577. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-17 10:21:27 +08:00
Ryker_Feng	5dc2d6cbf5	fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872 ) (#3245 ) * fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872) AioSandbox allocates a host-side agent_sandbox client (wrapping an httpx.Client) in __init__, but AioSandboxProvider.release/destroy/shutdown only popped provider state and tore down the backend container — the client/transport owned by each cached AioSandbox was never explicitly closed, accumulating unreclaimed sockets in long-running services. - Add AioSandbox.close(): best-effort, idempotent close of the wrapped httpx_client (falls back to top-level client.close()); errors are logged but never raised so backend cleanup is never blocked. - AioSandboxProvider.release()/destroy() now close the cached AioSandbox before dropping it; shutdown() inherits this via destroy(). * fix(sandbox): close the real httpx.Client owned by AioSandbox (#2872) The previous close() only walked one level (wrapper.httpx_client), which resolves to the Fern-generated HttpClient wrapper that has no close(). The real socket-owning httpx.Client lives one level deeper at _client_wrapper.httpx_client.httpx_client, so the close path never fired and host-side sockets still leaked. Resolve the real httpx.Client with graceful degradation; clear self._client under the lock for use-after-close and concurrent double-close safety; mark provider release()/destroy() try/except as defense-in-depth; rewrite TestClose against the real nested structure to lock down the original no-op bug.	2026-06-02 22:55:59 +08:00
Xun	e37912e2c8	feat(sandbox) Adds download file interface in Sandbox (#3038 ) * Add download interface in Sandbox * fix * fix * del invalidate test * fix * safe download * improve	2026-05-20 10:16:31 +08:00
Willem Jiang	189b82405c	fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination (#2685 ) * fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination The agent_sandbox library's shell API defaults no_change_timeout to 120 seconds. When AioSandbox.execute_command() called exec_command() without this parameter, commands producing no output for 120s would return with NO_CHANGE_TIMEOUT status even though the script was still running. Pass no_change_timeout=600 to all exec_command calls (matching the client-level HTTP timeout) so long-running commands are not cut short. Fixes #2668 * test(sandbox): add assertions for no_change_timeout in execute_command and list_dir Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-01 22:27:02 +08:00
Varian_米泽	a2cb38f62b	fix: prevent concurrent subagent file write conflicts in sandbox tools (#1714 ) * fix: prevent concurrent subagent file write conflicts Serialize same-path str_replace operations in sandbox tools Guard AioSandbox write_file/update_file with the existing sandbox lock Add regression tests for concurrent str_replace and append races Verify with backend full tests and ruff lint checks * fix(sandbox): Fix the concurrency issue of file operations on the same path in isolated sandboxes. Ensure that different sandbox instances use independent locks for file operations on the same virtual path to avoid concurrency conflicts. Change the lock key from a single path to a composite key of (sandbox.id, path), and add tests to verify the concurrent safety of isolated sandboxes. * feat(sandbox): Extract file operation lock logic to standalone module and fix concurrency issues Extract file operation lock related logic from tools.py into a separate file_operation_lock.py module. Fix data race issues during concurrent str_replace and write_file operations.	2026-04-02 15:39:41 +08:00
Matt Van Horn	a3bfea631c	fix(sandbox): serialize concurrent exec_command calls in AioSandbox (#1435 ) * fix(sandbox): serialize concurrent exec_command calls in AioSandbox The AIO sandbox container maintains a single persistent shell session that corrupts when multiple exec_command requests arrive concurrently (e.g. when ToolNode issues parallel tool_calls). The corrupted session returns 'ErrorObservation' strings as output, cascading into subsequent commands. Add a threading.Lock to AioSandbox to serialize shell commands. As a secondary defense, detect ErrorObservation in output and retry with a fresh session ID. Fixes #1433 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(sandbox): address Copilot review findings - Fix shell injection in list_dir: use shlex.quote(path) to escape user-provided paths in the find command - Narrow ErrorObservation retry condition from broad substring match to the specific corruption signature to prevent false retries - Improve test_lock_prevents_concurrent_execution: use threading.Barrier to ensure all workers contend for the lock simultaneously - Improve test_list_dir_uses_lock: assert lock.locked() is True during exec_command to verify lock acquisition * style: auto-format with ruff --------- Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:33:35 +08:00

6 Commits