mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-05-24 17:06:00 +00:00
fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976)
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks

  Sandbox containers were never cleaned up when the managing process restarted, because all lifecycle tracking lived in in-memory dictionaries. This adds startup reconciliation that enumerates running containers via `docker ps` and either destroys orphans (age > idle_timeout) or adopts them into the warm pool.

  Closes #1972

* fix(sandbox): address Copilot review: adopt-all strategy, improved error handling

  - Reconciliation now adopts all containers into the warm pool unconditionally, letting the idle checker decide cleanup. This avoids destroying containers that another concurrent process may still be using.
  - list_running() logs stderr on docker ps failure and catches FileNotFoundError/OSError.
  - Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
  - E2E test docstring corrected to match actual coverage scope.

* fix(sandbox): address maintainer review: batch inspect, lock tightening, import hygiene

  - _reconcile_orphans(): merge check-and-insert into a single lock acquisition per container to eliminate the TOCTOU window.
  - list_running(): batch the per-container docker inspect into a single call. Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect). Parse port and created_at from the inspect JSON payload.
  - Extract _parse_docker_timestamp() and _extract_host_port() as module-level pure helpers and test them directly.
  - Move datetime/json imports to module top level.
  - _make_provider_for_reconciliation(): document the __new__ bypass and the lockstep coupling to AioSandboxProvider.__init__.
  - Add assertion that list_running() makes exactly ONE inspect call.
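The adopt-all strategy with a single lock acquisition per container, as described in the commit message, can be sketched roughly as follows. This is only an illustration, not the code from this commit: `WarmPoolSketch`, its `_active` and `_warm_pool` dictionaries, and the two-field `SandboxInfo` stand-in are hypothetical names chosen for the example.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class SandboxInfo:
    # Hypothetical stand-in; the real SandboxInfo in the repository has its own fields.
    sandbox_id: str
    created_at: float


class WarmPoolSketch:
    """Hypothetical provider state: in-memory maps guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}     # sandbox_id -> SandboxInfo currently in use
        self._warm_pool = {}  # sandbox_id -> (SandboxInfo, adopted_at)

    def reconcile(self, running, now=None):
        """Adopt every untracked running sandbox into the warm pool.

        Nothing is destroyed here; an idle checker (not shown) is assumed
        to decide later which adopted sandboxes are removed.
        """
        now = time.time() if now is None else now
        adopted = []
        for info in running:
            # The membership check and the insert share one lock acquisition,
            # so no other thread can claim the sandbox between the two steps.
            with self._lock:
                if info.sandbox_id in self._active or info.sandbox_id in self._warm_pool:
                    continue
                self._warm_pool[info.sandbox_id] = (info, now)
            adopted.append(info.sandbox_id)
        return adopted
```

Calling `reconcile()` a second time with the same list adopts nothing, because every sandbox is already tracked by then.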
@@ -96,3 +96,19 @@ class SandboxBackend(ABC):
             SandboxInfo if found and healthy, None otherwise.
         """
         ...
+
+    def list_running(self) -> list[SandboxInfo]:
+        """Enumerate all running sandboxes managed by this backend.
+
+        Used for startup reconciliation: when the process restarts, it needs
+        to discover containers started by previous processes so they can be
+        adopted into the warm pool or destroyed if idle too long.
+
+        The default implementation returns an empty list, which is correct
+        for backends that don't manage local containers (e.g., RemoteSandboxBackend
+        delegates lifecycle to the provisioner which handles its own cleanup).
+
+        Returns:
+            A list of SandboxInfo for all currently running sandboxes.
+        """
+        return []
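A backend that does manage local containers would override `list_running()` and, per the commit message, read the port and creation time from a single batched `docker inspect` JSON payload. The sketch below shows one way such parsing could look. It is not the commit's implementation: `parse_docker_timestamp`, `extract_host_port`, and `summarize_inspect` are hypothetical helpers, and the only assumption about Docker is the standard inspect layout (`Name`, `Created`, `NetworkSettings.Ports`).

```python
import json
import re
from datetime import datetime

# Matches an RFC 3339 timestamp with optional fractional seconds and UTC offset.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?([+-]\d{2}:\d{2})?$")


def parse_docker_timestamp(raw):
    """Convert a Docker `Created` string to epoch seconds, or 0.0 if unparseable."""
    if not raw:
        return 0.0
    text = raw.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    match = _TS.match(text)
    if not match:
        return 0.0
    base, frac, tz = match.groups()
    # Docker emits nanoseconds; keep six fractional digits for fromisoformat.
    frac = (frac or ".0")[:7].ljust(7, "0")
    try:
        return datetime.fromisoformat(f"{base}{frac}{tz or '+00:00'}").timestamp()
    except ValueError:
        return 0.0


def extract_host_port(entry, container_port):
    """Return the first published host port for `container_port`/tcp, or None."""
    ports = (entry.get("NetworkSettings") or {}).get("Ports") or {}
    for binding in ports.get(f"{container_port}/tcp") or []:
        host_port = binding.get("HostPort")
        if host_port and host_port.isdigit():
            return int(host_port)
    return None


def summarize_inspect(payload, container_port):
    """Turn one batched inspect payload (a JSON array) into (name, port, created) tuples."""
    return [
        (
            entry.get("Name", "").lstrip("/"),
            extract_host_port(entry, container_port),
            parse_docker_timestamp(entry.get("Created", "")),
        )
        for entry in json.loads(payload)
    ]
```

Because both helpers are pure functions over already-parsed data, they can be unit tested without a Docker daemon, which matches the commit's stated reason for extracting them.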