fix(sandbox): add startup reconciliation to prevent orphaned container leaks (#1976)

* fix(sandbox): add startup reconciliation to prevent orphaned container leaks

Sandbox containers were never cleaned up when the managing process restarted,
because all lifecycle tracking lived in in-memory dictionaries. This adds
startup reconciliation that enumerates running containers via `docker ps` and
either destroys orphans (age > idle_timeout) or adopts them into the warm pool.

Closes #1972

* fix(sandbox): address Copilot review — adopt-all strategy, improved error handling

- Reconciliation now adopts all containers into warm pool unconditionally,
  letting the idle checker decide cleanup. Avoids destroying containers
  that another concurrent process may still be using.
- list_running() logs stderr on docker ps failure and catches
  FileNotFoundError/OSError.
- Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
- E2E test docstring corrected to match actual coverage scope.

* fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene

- _reconcile_orphans(): merge check-and-insert into a single lock acquisition
  per container to eliminate the TOCTOU window.
- list_running(): batch the per-container docker inspect into a single call.
  Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect).
  Parse port and created_at from the inspect JSON payload.
- Extract _parse_docker_timestamp() and _extract_host_port() as module-level
  pure helpers and test them directly.
- Move datetime/json imports to module top level.
- _make_provider_for_reconciliation(): document the __new__ bypass and the
  lockstep coupling to AioSandboxProvider.__init__.
- Add assertion that list_running() makes exactly ONE inspect call.
This commit is contained in:
Xinmin Zeng
2026-04-09 17:21:23 +08:00
committed by GitHub
parent 140907ce1d
commit 0b6fa8b9e1
5 changed files with 1020 additions and 4 deletions
@@ -96,3 +96,19 @@ class SandboxBackend(ABC):
SandboxInfo if found and healthy, None otherwise.
"""
...
def list_running(self) -> list[SandboxInfo]:
"""Enumerate all running sandboxes managed by this backend.
Used for startup reconciliation: when the process restarts, it needs
to discover containers started by previous processes so they can be
adopted into the warm pool or destroyed if idle too long.
The default implementation returns an empty list, which is correct
for backends that don't manage local containers (e.g., RemoteSandboxBackend
delegates lifecycle to the provisioner which handles its own cleanup).
Returns:
A list of SandboxInfo for all currently running sandboxes.
"""
return []