fix the unit test error

Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-21 15:36:48 +00:00 · 2026-05-21 19:41:24 +08:00 · 2026-05-21 19:27:33 +08:00 · 2026-05-21 17:51:56 +08:00 · 2026-05-21 16:49:31 +08:00 · 2026-05-21 16:42:26 +08:00
82 changed files with 5908 additions and 506 deletions
@@ -1,6 +1,6 @@
 # DeerFlow - Unified Development Environment

-.PHONY: help config config-upgrade check install setup doctor dev dev-daemon start start-daemon stop up down clean docker-init docker-start docker-stop docker-logs docker-logs-frontend docker-logs-gateway
+.PHONY: help config config-upgrade check install setup doctor detect-thread-boundaries dev dev-daemon start start-daemon stop up down clean docker-init docker-start docker-stop docker-logs docker-logs-frontend docker-logs-gateway

 BASH ?= bash
 BACKEND_UV_RUN = cd backend && uv run
@@ -23,6 +23,7 @@ help:
 	@echo "  make config          - Generate local config files (aborts if config already exists)"
 	@echo "  make config-upgrade  - Merge new fields from config.example.yaml into config.yaml"
 	@echo "  make check           - Check if all required tools are installed"
+	@echo "  make detect-thread-boundaries - Inventory async/thread boundary points"
 	@echo "  make install         - Install all dependencies (frontend + backend + pre-commit hooks)"
 	@echo "  make setup-sandbox   - Pre-pull sandbox container image (recommended)"
 	@echo "  make dev             - Start all services in development mode (with hot-reloading)"
@@ -51,6 +52,9 @@ setup:
 doctor:
 	@$(BACKEND_UV_RUN) python ../scripts/doctor.py

+detect-thread-boundaries:
+	@$(PYTHON) ./scripts/detect_thread_boundaries.py
+
 config:
 	@$(PYTHON) ./scripts/configure.py

@@ -546,6 +546,15 @@ LANGFUSE_BASE_URL=https://cloud.langfuse.com

 If you are using a self-hosted Langfuse instance, set `LANGFUSE_BASE_URL` to your deployment URL.

+**Trace correlation fields.** Every agent run is annotated with Langfuse's reserved trace attributes, so the Langfuse Sessions and Users pages are populated automatically:
+
+- `session_id` = LangGraph `thread_id` — groups every trace of the same conversation
+- `user_id` = effective user from `get_effective_user_id()` (falls back to `default` in no-auth mode)
+- `trace_name` = assistant id (defaults to `lead-agent`)
+- `tags` = `[env:<DEER_FLOW_ENV>, model:<model_name>]` (omitted when not set)
+
+These are injected into `RunnableConfig.metadata` at the graph invocation root for both the gateway path (`runtime/runs/worker.py::run_agent`) and the embedded path (`client.py::DeerFlowClient.stream`), so any LangChain-compatible callback can read them. Set `DEER_FLOW_ENV` (or `ENVIRONMENT`) to tag traces by deployment environment.
+
 #### Using Both Providers

 If both LangSmith and Langfuse are enabled, DeerFlow attaches both tracing callbacks and reports the same model activity to both systems.
@@ -236,7 +236,7 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti
 ### Sandbox System (`packages/harness/deerflow/sandbox/`)

 **Interface**: Abstract `Sandbox` with `execute_command`, `read_file`, `write_file`, `list_dir`
-**Provider Pattern**: `SandboxProvider` with `acquire`, `get`, `release` lifecycle
+**Provider Pattern**: `SandboxProvider` with `acquire`, `acquire_async`, `get`, `release` lifecycle. Async agent/tool paths call async sandbox lifecycle hooks so Docker sandbox creation, discovery, cross-process locking, readiness polling, and release stay off the event loop.
 **Implementations**:
 - `LocalSandboxProvider` - Local filesystem execution. `acquire(thread_id)` returns a per-thread `LocalSandbox` (id `local:{thread_id}`) whose `path_mappings` resolve `/mnt/user-data/{workspace,uploads,outputs}` and `/mnt/acp-workspace` to that thread's host directories, so the public `Sandbox` API honours the `/mnt/user-data` contract uniformly with AIO. `acquire()` / `acquire(None)` keeps the legacy generic singleton (id `local`) for callers without a thread context. Per-thread sandboxes are held in an LRU cache (default 256 entries) guarded by a `threading.Lock`.
 - `AioSandboxProvider` (`packages/harness/deerflow/community/`) - Docker-based isolation
@@ -397,6 +397,24 @@ Focused regression coverage for the updater lives in `backend/tests/test_memory_
 - `resolve_variable(path)` - Import module and return variable (e.g., `module.path:variable_name`)
 - `resolve_class(path, base_class)` - Import and validate class against base class

+### Tracing System (`packages/harness/deerflow/tracing/`)
+
+LangSmith and Langfuse are both supported. The wiring lives in two layers:
+
+- `factory.py::build_tracing_callbacks()` — returns the LangChain `CallbackHandler` list for the providers currently enabled via env vars (`LANGSMITH_TRACING`, `LANGFUSE_TRACING`, etc.). The handlers are attached at the **graph invocation root** for in-graph runs (`make_lead_agent` and `DeerFlowClient.stream` both append them to `config["callbacks"]` before invoking the graph) so a single run produces one trace with all node / LLM / tool calls as child spans. Standalone callers — anything that invokes a model outside such a graph (e.g. `MemoryUpdater`) — keep `create_chat_model`'s default `attach_tracing=True`, which falls back to model-level callback attachment.
+- `metadata.py::build_langfuse_trace_metadata()` — builds the Langfuse-reserved trace attributes for `RunnableConfig.metadata`. The Langfuse v4 `langchain.CallbackHandler` lifts these onto the root trace (see its `_parse_langfuse_trace_attributes`), but only when it sees `on_chain_start(parent_run_id=None)` — which is why the callbacks have to live at the graph root, not the model.
+
+**Trace-attribute injection points**: both `runtime/runs/worker.py::run_agent` (gateway path) and `client.py::DeerFlowClient.stream` (embedded path) merge the metadata into `config["metadata"]` right before constructing the graph. Caller-supplied keys win via `setdefault`, so an external `session_id` override is preserved. Field mapping:
+
+| Langfuse field         | Source                                       |
+|-----------------------|----------------------------------------------|
+| `langfuse_session_id` | LangGraph `thread_id`                         |
+| `langfuse_user_id`    | `get_effective_user_id()` (`default` in no-auth) |
+| `langfuse_trace_name` | `RunRecord.assistant_id` / client `agent_name` (defaults to `lead-agent`) |
+| `langfuse_tags`       | `env:<DEER_FLOW_ENV>` + `model:<model_name>`  |
+
+`build_langfuse_trace_metadata()` returns `{}` when Langfuse is not among the enabled providers, so LangSmith-only deployments are unaffected. Set `DEER_FLOW_ENV` (or `ENVIRONMENT`) to tag traces by deployment environment. Tests live in `tests/test_tracing_factory.py`, `tests/test_tracing_metadata.py`, `tests/test_worker_langfuse_metadata.py`, and `tests/test_client_langfuse_metadata.py`.
+
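The field mapping and the "caller-supplied keys win via `setdefault`" rule described above can be illustrated with a minimal sketch. The function names, parameters, and the plain-dict config below are assumptions made for illustration; the real signature of `build_langfuse_trace_metadata()` is not shown in this diff and may differ.

```python
import os


def build_trace_metadata(thread_id, user_id=None, assistant_id=None, model_name=None):
    """Hypothetical stand-in for build_langfuse_trace_metadata()."""
    tags = []
    # DEER_FLOW_ENV takes precedence; ENVIRONMENT is the documented fallback.
    env = os.environ.get("DEER_FLOW_ENV") or os.environ.get("ENVIRONMENT")
    if env:
        tags.append(f"env:{env}")
    if model_name:
        tags.append(f"model:{model_name}")
    metadata = {
        "langfuse_session_id": thread_id,
        "langfuse_user_id": user_id or "default",
        "langfuse_trace_name": assistant_id or "lead-agent",
    }
    if tags:  # tags are omitted entirely when nothing is set
        metadata["langfuse_tags"] = tags
    return metadata


def merge_trace_metadata(config, trace_metadata):
    """Merge into config["metadata"] without overwriting caller-supplied keys."""
    target = config.setdefault("metadata", {})
    for key, value in trace_metadata.items():
        target.setdefault(key, value)
    return config


# A caller that already set a session id keeps it after the merge.
config = {"metadata": {"langfuse_session_id": "external-session"}}
merge_trace_metadata(config, build_trace_metadata("thread-1", model_name="demo-model"))
```

Using `setdefault` per key, rather than `dict.update`, is what preserves an external `session_id` override while still filling in the remaining fields.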
 ### Config Schema

 **`config.yaml`** key sections:
@@ -2,13 +2,13 @@ install:
 	uv sync

 dev:
-	PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001 --reload
+	PYTHONPATH=. PYTHONIOENCODING=utf-8 PYTHONUTF8=1 uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001 --reload

 gateway:
-	PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001
+	PYTHONPATH=. PYTHONIOENCODING=utf-8 PYTHONUTF8=1 uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001

 test:
-	PYTHONPATH=. uv run pytest tests/ -v
+	PYTHONPATH=. PYTHONIOENCODING=utf-8 PYTHONUTF8=1 uv run pytest tests/ -v

 lint:
 	uvx ruff check .
@@ -69,7 +69,7 @@ Middlewares execute in strict order, each handling a specific concern:
 Per-thread isolated execution with virtual path translation:

 - **Abstract interface**: `execute_command`, `read_file`, `write_file`, `list_dir`
- **Providers**: `LocalSandboxProvider` (filesystem) and `AioSandboxProvider` (Docker, in community/)
+- **Providers**: `LocalSandboxProvider` (filesystem) and `AioSandboxProvider` (Docker, in community/). Async runtime paths use async sandbox lifecycle hooks so startup, readiness polling, and release do not block the event loop.
 - **Virtual paths**: `/mnt/user-data/{workspace,uploads,outputs}` → thread-specific physical directories
 - **Skills path**: `/mnt/skills` → `deer-flow/skills/` directory
 - **Skills loading**: Recursively discovers nested `SKILL.md` files under `skills/{public,custom}` and preserves nested container paths
@@ -146,13 +146,6 @@ def _normalize_custom_agent_name(raw_value: str) -> str:
    return normalized


-def _strip_loop_warning_text(text: str) -> str:
-    """Remove middleware-authored loop warning lines from display text."""
-    if "[LOOP DETECTED]" not in text:
-        return text
-    return "\n".join(line for line in text.splitlines() if "[LOOP DETECTED]" not in line).strip()
-
-
 def _extract_response_text(result: dict | list) -> str:
    """Extract the last AI message text from a LangGraph runs.wait result.

@@ -162,7 +155,6 @@ def _extract_response_text(result: dict | list) -> str:
    Handles special cases:
    - Regular AI text responses
    - Clarification interrupts (``ask_clarification`` tool messages)
-    - Strips loop-detection warnings attached to tool-call AI messages
    """
    if isinstance(result, list):
        messages = result
@@ -192,12 +184,7 @@ def _extract_response_text(result: dict | list) -> str:
        # Regular AI message with text content
        if msg_type == "ai":
            content = msg.get("content", "")
-            has_tool_calls = bool(msg.get("tool_calls"))
            if isinstance(content, str) and content:
-                if has_tool_calls:
-                    content = _strip_loop_warning_text(content)
-                    if not content:
-                        continue
                return content
            # content can be a list of content blocks
            if isinstance(content, list):
@@ -208,8 +195,6 @@ def _extract_response_text(result: dict | list) -> str:
                    elif isinstance(block, str):
                        parts.append(block)
                text = "".join(parts)
-                if has_tool_calls:
-                    text = _strip_loop_warning_text(text)
                if text:
                    return text
    return ""
@@ -63,6 +63,99 @@ class McpConfigUpdateRequest(BaseModel):
    )


+_MASKED_VALUE = "***"
+
+
+def _mask_server_config(server: McpServerConfigResponse) -> McpServerConfigResponse:
+    """Return a copy of server config with sensitive fields masked.
+
+    Masks env values, header values, and removes OAuth secrets so they
+    are not exposed through the GET API endpoint.
+    """
+    masked_env = {k: _MASKED_VALUE for k in server.env}
+    masked_headers = {k: _MASKED_VALUE for k in server.headers}
+    masked_oauth = None
+    if server.oauth is not None:
+        masked_oauth = server.oauth.model_copy(
+            update={
+                "client_secret": None,
+                "refresh_token": None,
+            }
+        )
+    return server.model_copy(
+        update={
+            "env": masked_env,
+            "headers": masked_headers,
+            "oauth": masked_oauth,
+        }
+    )
+
+
+def _merge_preserving_secrets(
+    incoming: McpServerConfigResponse,
+    existing: McpServerConfigResponse,
+) -> McpServerConfigResponse:
+    """Merge incoming config with existing, preserving secrets masked by GET.
+
+    When the frontend toggles ``enabled`` it round-trips the full config:
+    GET (masked) → modify enabled → PUT (masked values sent back).
+    This function ensures masked values (``***``) are replaced with the
+    real secrets from the current on-disk config.
+
+    ``***`` is only accepted for keys that already exist in *existing*.
+    New keys must provide a real value.
+
+    For OAuth secrets, ``None`` means "preserve the existing stored value"
+    so masked GET responses can be safely round-tripped. To explicitly clear
+    a stored secret, clients may send an empty string, which is converted
+    to ``None`` before persisting.
+    """
+    merged_env = {}
+    for k, v in incoming.env.items():
+        if v == _MASKED_VALUE:
+            if k in existing.env:
+                merged_env[k] = existing.env[k]
+            else:
+                raise HTTPException(
+                    status_code=400,
+                    detail=f"Cannot set env key '{k}' to masked value '***'; provide a real value.",
+                )
+        else:
+            merged_env[k] = v
+
+    merged_headers = {}
+    for k, v in incoming.headers.items():
+        if v == _MASKED_VALUE:
+            if k in existing.headers:
+                merged_headers[k] = existing.headers[k]
+            else:
+                raise HTTPException(
+                    status_code=400,
+                    detail=f"Cannot set header '{k}' to masked value '***'; provide a real value.",
+                )
+        else:
+            merged_headers[k] = v
+
+    merged_oauth = incoming.oauth
+    if incoming.oauth is not None and existing.oauth is not None:
+        # None = preserve (masked round-trip), "" = explicitly clear, else = new value
+        merged_client_secret = existing.oauth.client_secret if incoming.oauth.client_secret is None else (None if incoming.oauth.client_secret == "" else incoming.oauth.client_secret)
+        merged_refresh_token = existing.oauth.refresh_token if incoming.oauth.refresh_token is None else (None if incoming.oauth.refresh_token == "" else incoming.oauth.refresh_token)
+        merged_oauth = incoming.oauth.model_copy(
+            update={
+                "client_secret": merged_client_secret,
+                "refresh_token": merged_refresh_token,
+            }
+        )
+    return incoming.model_copy(
+        update={
+            "env": merged_env,
+            "headers": merged_headers,
+            "oauth": merged_oauth,
+        }
+    )
+
+
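The GET-mask / PUT-merge round trip implemented by `_mask_server_config` and `_merge_preserving_secrets` can be reduced to a small sketch over plain dictionaries. This is a simplified illustration, not the PR's code: it covers only one flat mapping (such as `env`), uses hypothetical helper names, and raises `ValueError` where the endpoint raises `HTTPException`.

```python
MASKED = "***"


def mask_values(values: dict[str, str]) -> dict[str, str]:
    """Replace every value with the mask, keeping the keys visible."""
    return {key: MASKED for key in values}


def merge_preserving_secrets(incoming: dict[str, str], existing: dict[str, str]) -> dict[str, str]:
    """Restore real secrets for masked values; reject masks on unknown keys."""
    merged = {}
    for key, value in incoming.items():
        if value != MASKED:
            merged[key] = value  # a real value always wins
        elif key in existing:
            merged[key] = existing[key]  # masked round trip: keep the stored secret
        else:
            raise ValueError(f"Cannot set '{key}' to the masked value; provide a real value.")
    return merged


stored = {"GITHUB_TOKEN": "ghp_real"}
round_tripped = mask_values(stored)  # what a GET would return
restored = merge_preserving_secrets(round_tripped, stored)  # what a PUT would persist
```

A client that only toggles `enabled` therefore sends back `***` and the stored secret survives, while a brand-new key must carry a real value.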
@router.get(
    "/mcp/config",
    response_model=McpConfigResponse,
@@ -83,7 +176,7 @@ async def get_mcp_configuration() -> McpConfigResponse:
                    "enabled": true,
                    "command": "npx",
                    "args": ["-y", "@modelcontextprotocol/server-github"],
-                    "env": {"GITHUB_TOKEN": "ghp_xxx"},
+                    "env": {"GITHUB_TOKEN": "***"},
                    "description": "GitHub MCP server for repository operations"
                }
            }
@@ -92,7 +185,8 @@ async def get_mcp_configuration() -> McpConfigResponse:
    """
    config = get_extensions_config()

-    return McpConfigResponse(mcp_servers={name: McpServerConfigResponse(**server.model_dump()) for name, server in config.mcp_servers.items()})
+    servers = {name: _mask_server_config(McpServerConfigResponse(**server.model_dump())) for name, server in config.mcp_servers.items()}
+    return McpConfigResponse(mcp_servers=servers)


@router.put(
@@ -142,14 +236,39 @@ async def update_mcp_configuration(request: McpConfigUpdateRequest) -> McpConfig
            config_path = Path.cwd().parent / "extensions_config.json"
            logger.info(f"No existing extensions config found. Creating new config at: {config_path}")

-        # Load current config to preserve skills configuration
+        # Load current config to preserve skills
        current_config = get_extensions_config()

-        # Convert request to dict format for JSON serialization
-        config_data = {
-            "mcpServers": {name: server.model_dump() for name, server in request.mcp_servers.items()},
-            "skills": {name: {"enabled": skill.enabled} for name, skill in current_config.skills.items()},
-        }
+        # Load raw (un-resolved) JSON from disk to use as the merge source.
+        # This preserves $VAR placeholders in env values and top-level keys
+        # like mcpInterceptors that would otherwise be lost.
+        raw_servers: dict[str, dict] = {}
+        raw_other_keys: dict = {}
+        if config_path is not None and config_path.exists():
+            with open(config_path, encoding="utf-8") as f:
+                raw_data = json.load(f)
+            raw_servers = raw_data.get("mcpServers", {})
+            # Preserve any top-level keys beyond mcpServers/skills
+            for key, value in raw_data.items():
+                if key not in ("mcpServers", "skills"):
+                    raw_other_keys[key] = value
+
+        # Merge incoming server configs with raw on-disk secrets
+        merged_servers: dict[str, McpServerConfigResponse] = {}
+        for name, incoming in request.mcp_servers.items():
+            raw_server = raw_servers.get(name)
+            if raw_server is not None:
+                merged_servers[name] = _merge_preserving_secrets(
+                    incoming,
+                    McpServerConfigResponse(**raw_server),
+                )
+            else:
+                merged_servers[name] = incoming
+
+        # Build config data preserving all top-level keys from the original file
+        config_data = dict(raw_other_keys)
+        config_data["mcpServers"] = {name: server.model_dump() for name, server in merged_servers.items()}
+        config_data["skills"] = {name: {"enabled": skill.enabled} for name, skill in current_config.skills.items()}

        # Write the configuration to file
        with open(config_path, "w", encoding="utf-8") as f:
@@ -162,7 +281,8 @@ async def update_mcp_configuration(request: McpConfigUpdateRequest) -> McpConfig

        # Reload the configuration and update the global cache
        reloaded_config = reload_extensions_config()
-        return McpConfigResponse(mcp_servers={name: McpServerConfigResponse(**server.model_dump()) for name, server in reloaded_config.mcp_servers.items()})
+        servers = {name: _mask_server_config(McpServerConfigResponse(**server.model_dump())) for name, server in reloaded_config.mcp_servers.items()}
+        return McpConfigResponse(mcp_servers=servers)

    except Exception as e:
        logger.error(f"Failed to update MCP configuration: {e}", exc_info=True)
@@ -74,6 +74,25 @@ def _make_file_sandbox_writable(file_path: os.PathLike[str] | str) -> None:
    os.chmod(file_path, writable_mode, **chmod_kwargs)


+def _make_file_sandbox_readable(file_path: os.PathLike[str] | str) -> None:
+    """Ensure uploaded files are readable by the sandbox process.
+
+    For Docker sandboxes (AIO), the gateway writes files as root with 0o600
+    permissions, then bind-mounts the host directory into the container. The
+    sandbox process inside the container runs as a non-root user and may be
+    unable to read those files without broader read access. To avoid making
+    uploads world-readable on the host, only the group read bit is added here.
+    """
+    file_stat = os.lstat(file_path)
+    if stat.S_ISLNK(file_stat.st_mode):
+        logger.warning("Skipping sandbox chmod for symlinked upload path: %s", file_path)
+        return
+
+    readable_mode = stat.S_IMODE(file_stat.st_mode) | stat.S_IRGRP
+    chmod_kwargs = {"follow_symlinks": False} if os.chmod in os.supports_follow_symlinks else {}
+    os.chmod(file_path, readable_mode, **chmod_kwargs)
+
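The permission change made by `_make_file_sandbox_readable` comes down to OR-ing the group read bit into the existing mode. The sketch below shows that arithmetic on a temporary file; it assumes a POSIX filesystem, uses a hypothetical helper name, and leaves out the symlink check and `follow_symlinks` handling from the function above.

```python
import os
import stat
import tempfile


def add_group_read(path: str) -> int:
    """Add only the group read bit to the file's current permission bits."""
    current = stat.S_IMODE(os.lstat(path).st_mode)
    new_mode = current | stat.S_IRGRP
    os.chmod(path, new_mode)
    return new_mode


# Start from the 0o600 mode that the gateway uses for uploads.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o600)
demo_mode = add_group_read(demo_path)  # 0o600 | 0o040 == 0o640
os.unlink(demo_path)
```

Because only `stat.S_IRGRP` is added, the "other" read bit stays clear and the upload does not become world-readable on the host.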
+
 def _uses_thread_data_mounts(sandbox_provider: SandboxProvider) -> bool:
    return bool(getattr(sandbox_provider, "uses_thread_data_mounts", False))

@@ -276,6 +295,15 @@ async def upload_files(
            _cleanup_uploaded_paths(written_paths)
            raise HTTPException(status_code=500, detail=f"Failed to upload {file.filename}: {str(e)}")

+    # When the sandbox uses bind-mounted thread data directories (e.g. AIO with
+    # LocalContainerBackend), uploaded files are visible inside the container but
+    # retain the 0o600 permissions set by the gateway.  The sandbox process runs
+    # as a different user and cannot read them.  Adjust permissions to add
+    # only the group read bit so the sandbox can access the files.
+    if not sync_to_sandbox and getattr(sandbox_provider, "needs_upload_permission_adjustment", True):
+        for file_path in written_paths:
+            _make_file_sandbox_readable(file_path)
+
    if sync_to_sandbox:
        for file_path, virtual_path in sandbox_sync_targets:
            _make_file_sandbox_writable(file_path)
@@ -32,6 +32,7 @@ from deerflow.runtime import (
    UnsupportedStrategyError,
    run_agent,
 )
+from deerflow.runtime.runs.naming import resolve_root_run_name

 logger = logging.getLogger(__name__)

@@ -235,6 +236,7 @@ def build_run_config(
            target = config.setdefault("configurable", {})
        if target is not None and "agent_name" not in target:
            target["agent_name"] = normalized
+        config.setdefault("run_name", resolve_root_run_name(config, normalized))
    if metadata:
        config.setdefault("metadata", {}).update(metadata)
    return config
@@ -4,22 +4,22 @@

 The full middleware chain that `create_deerflow_agent` assembles through `RuntimeFeatures` (with all features enabled, the default):

-| # | Middleware | `before_agent` | `before_model` | `after_model` | `after_agent` | `wrap_tool_call` | Lead agent | Subagent | Source |
-|---|-----------|:-:|:-:|:-:|:-:|:-:|:-:|:-:|------|
-| 0 | ThreadDataMiddleware | ✓ | | | | | ✓ | ✓ | `sandbox` |
-| 1 | UploadsMiddleware | ✓ | | | | | ✓ | ✗ | `sandbox` |
-| 2 | SandboxMiddleware | ✓ | | | ✓ | | ✓ | ✓ | `sandbox` |
-| 3 | DanglingToolCallMiddleware | | | ✓ | | | ✓ | ✗ | always on |
-| 4 | GuardrailMiddleware | | | | | ✓ | ✓ | ✓ | *added in Phase 2* |
-| 5 | ToolErrorHandlingMiddleware | | | | | ✓ | ✓ | ✓ | always on |
-| 6 | SummarizationMiddleware | | | ✓ | | | ✓ | ✗ | `summarization` |
-| 7 | TodoMiddleware | | | ✓ | | | ✓ | ✗ | `plan_mode` parameter |
-| 8 | TitleMiddleware | | | ✓ | | | ✓ | ✗ | `auto_title` |
-| 9 | MemoryMiddleware | | | | ✓ | | ✓ | ✗ | `memory` |
-| 10 | ViewImageMiddleware | | ✓ | | | | ✓ | ✗ | `vision` |
-| 11 | SubagentLimitMiddleware | | | ✓ | | | ✓ | ✗ | `subagent` |
-| 12 | LoopDetectionMiddleware | | | ✓ | | | ✓ | ✗ | always on |
-| 13 | ClarificationMiddleware | | | ✓ | | | ✓ | ✗ | always last |
+| # | Middleware | `before_agent` | `before_model` | `after_model` | `after_agent` | `wrap_model_call` | `wrap_tool_call` | Lead agent | Subagent | Source |
+|---|-----------|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|------|
+| 0 | ThreadDataMiddleware | ✓ | | | | | | ✓ | ✓ | `sandbox` |
+| 1 | UploadsMiddleware | ✓ | | | | | | ✓ | ✗ | `sandbox` |
+| 2 | SandboxMiddleware | ✓ | | | ✓ | | | ✓ | ✓ | `sandbox` |
+| 3 | DanglingToolCallMiddleware | | | | | ✓ | | ✓ | ✗ | always on |
+| 4 | GuardrailMiddleware | | | | | | ✓ | ✓ | ✓ | *added in Phase 2* |
+| 5 | ToolErrorHandlingMiddleware | | | | | | ✓ | ✓ | ✓ | always on |
+| 6 | SummarizationMiddleware | | ✓ | | | | | ✓ | ✗ | `summarization` |
+| 7 | TodoMiddleware | | ✓ | ✓ | | ✓ | | ✓ | ✗ | `plan_mode` parameter |
+| 8 | TitleMiddleware | | | ✓ | | | | ✓ | ✗ | `auto_title` |
+| 9 | MemoryMiddleware | | | | ✓ | | | ✓ | ✗ | `memory` |
+| 10 | ViewImageMiddleware | | ✓ | | | | | ✓ | ✗ | `vision` |
+| 11 | SubagentLimitMiddleware | | | ✓ | | | | ✓ | ✗ | `subagent` |
+| 12 | LoopDetectionMiddleware | ✓ | | ✓ | ✓ | ✓ | | ✓ | ✗ | always on |
+| 13 | ClarificationMiddleware | | | | | | ✓ | ✓ | ✗ | always last |

 The lead agent has **14** middlewares (`make_lead_agent`); a subagent has **4** (ThreadData, Sandbox, Guardrail, ToolErrorHandling). Phase 1 of `create_deerflow_agent` implements **13** (Guardrail only supports custom instances and has no built-in default).

@@ -35,7 +35,7 @@ graph TB

     subgraph BA ["<b>before_agent</b> forward order 0→N"]
        direction TB
-        TD["[0] ThreadData<br/>create thread directories"] --> UL["[1] Uploads<br/>scan uploaded files"] --> SB["[2] Sandbox<br/>acquire sandbox"]
+        TD["[0] ThreadData<br/>create thread directories"] --> UL["[1] Uploads<br/>scan uploaded files"] --> SB["[2] Sandbox<br/>acquire sandbox"] --> LD_BA["[12] LoopDetection<br/>clear stale warnings"]
    end

     subgraph BM ["<b>before_model</b> forward order 0→N"]
@@ -43,34 +43,42 @@ graph TB
         VI["[10] ViewImage<br/>inject image base64"]
    end

-    SB --> VI
-    VI --> M["<b>MODEL</b>"]
+    subgraph WM ["<b>wrap_model_call</b>"]
+        direction TB
+        DTC_WM["[3] DanglingToolCall<br/>patch dangling ToolMessage"] --> LD_WM["[12] LoopDetection<br/>inject current-run warning"]
+    end
+
+    LD_BA --> VI
+    VI --> DTC_WM
+    LD_WM --> M["<b>MODEL</b>"]

     subgraph AM ["<b>after_model</b> reverse order N→0"]
        direction TB
-        CL["[13] Clarification<br/>intercept ask_clarification"] --> LD["[12] LoopDetection<br/>detect loops"] --> SL["[11] SubagentLimit<br/>truncate excess tasks"] --> TI["[8] Title<br/>generate title"] --> SM["[6] Summarization<br/>compress context"] --> DTC["[3] DanglingToolCall<br/>patch missing ToolMessage"]
+        LD["[12] LoopDetection<br/>detect loops / queue warning"] --> SL["[11] SubagentLimit<br/>truncate excess tasks"] --> TI["[8] Title<br/>generate title"]
    end

-    M --> CL
+    M --> LD

     subgraph AA ["<b>after_agent</b> reverse order N→0"]
        direction TB
-        SBR["[2] Sandbox<br/>release sandbox"] --> MEM["[9] Memory<br/>enqueue memory"]
+        LD_CLEAN["[12] LoopDetection<br/>clear pending warnings"] --> MEM["[9] Memory<br/>enqueue memory"] --> SBR["[2] Sandbox<br/>release sandbox"]
    end

-    DTC --> SBR
-    MEM --> END(["response"])
+    TI --> LD_CLEAN
+    SBR --> END(["response"])

    classDef beforeNode fill:#a0a8b5,stroke:#636b7a,color:#2d3239
    classDef modelNode fill:#b5a8a0,stroke:#7a6b63,color:#2d3239
+    classDef wrapModelNode fill:#a8a0b5,stroke:#6b637a,color:#2d3239
    classDef afterModelNode fill:#b5a0a8,stroke:#7a636b,color:#2d3239
    classDef afterAgentNode fill:#a0b5a8,stroke:#637a6b,color:#2d3239
    classDef terminalNode fill:#a8b5a0,stroke:#6b7a63,color:#2d3239

-    class TD,UL,SB,VI beforeNode
+    class TD,UL,SB,LD_BA,VI beforeNode
+    class DTC_WM,LD_WM wrapModelNode
    class M modelNode
-    class CL,LD,SL,TI,SM,DTC afterModelNode
-    class SBR,MEM afterAgentNode
+    class LD,SL,TI afterModelNode
+    class LD_CLEAN,SBR,MEM afterAgentNode
    class START,END terminalNode
 ```

@@ -82,13 +90,12 @@ sequenceDiagram
    participant TD as ThreadDataMiddleware
    participant UL as UploadsMiddleware
    participant SB as SandboxMiddleware
+    participant LD as LoopDetectionMiddleware
    participant VI as ViewImageMiddleware
+    participant DTC as DanglingToolCallMiddleware
    participant M as MODEL
-    participant CL as ClarificationMiddleware
    participant SL as SubagentLimitMiddleware
    participant TI as TitleMiddleware
-    participant SM as SummarizationMiddleware
-    participant DTC as DanglingToolCallMiddleware
    participant MEM as MemoryMiddleware

    U ->> TD: invoke
@@ -103,19 +110,26 @@ sequenceDiagram
    activate SB
     Note right of SB: before_agent acquires sandbox

-    SB ->> VI: before_model
+    SB ->> LD: before_agent
+    activate LD
+    Note right of LD: before_agent clears pending warnings from older runs on the same thread
+    LD ->> VI: before_model
    activate VI
     Note right of VI: before_model injects image base64

-    VI ->> M: messages + tools
+    VI ->> DTC: wrap_model_call
+    activate DTC
+    Note right of DTC: wrap_model_call patches dangling ToolMessage
+    DTC ->> LD: wrap_model_call
+    Note right of LD: wrap_model_call drains current-run warnings and appends them at the end
+    LD ->> M: messages + tools
    activate M
-    M -->> CL: AI response
+    M -->> LD: AI response
    deactivate M

-    activate CL
-    Note right of CL: after_model intercepts ask_clarification
-    CL -->> SL: after_model
-    deactivate CL
+    Note right of LD: after_model detects loops; warnings are queued, hard-stop clears tool_calls
+    LD -->> SL: after_model
+    deactivate LD

    activate SL
     Note right of SL: after_model truncates excess tasks
@@ -124,22 +138,18 @@ sequenceDiagram

    activate TI
     Note right of TI: after_model generates title
-    TI -->> SM: after_model
+    TI -->> DTC: done
    deactivate TI

-    activate SM
-    Note right of SM: after_model compresses context
-    SM -->> DTC: after_model
-    deactivate SM
-
-    activate DTC
-    Note right of DTC: after_model patches missing ToolMessage
-    DTC -->> VI: done
    deactivate DTC

    VI -->> SB: done
    deactivate VI

+    Note right of LD: after_agent clears unconsumed warnings of the current run
+
+    Note right of MEM: after_agent enqueues memory
+
     Note right of SB: after_agent releases sandbox
    SB -->> UL: done
    deactivate SB
@@ -147,8 +157,6 @@ sequenceDiagram
    UL -->> TD: done
    deactivate UL

-    Note right of MEM: after_agent enqueues memory
-
    TD -->> U: response
    deactivate TD
 ```
@@ -224,12 +232,12 @@ sequenceDiagram
    participant TD as ThreadData
    participant UL as Uploads
    participant SB as Sandbox
+    participant LD as LoopDetection
    participant VI as ViewImage
+    participant DTC as DanglingToolCall
    participant M as MODEL
-    participant CL as Clarification
    participant SL as SubagentLimit
    participant TI as Title
-    participant SM as Summarization
    participant MEM as Memory

    U ->> TD: invoke
@@ -238,34 +246,40 @@ sequenceDiagram
     Note right of UL: before_agent scans files
     UL ->> SB: .
     Note right of SB: before_agent acquires sandbox
+    SB ->> LD: .
+    Note right of LD: before_agent clears stale pending warnings

     loop Each turn (tool-call loop)
         SB ->> VI: .
         Note right of VI: before_model injects images
-        VI ->> M: messages + tools
-        M -->> CL: AI response
-        Note right of CL: after_model intercepts ask_clarification
-        CL -->> SL: .
+        VI ->> DTC: .
+        Note right of DTC: wrap_model_call patches dangling tool results
+        DTC ->> LD: .
+        Note right of LD: wrap_model_call injects current-run warning
+        LD ->> M: messages + tools
+        M -->> LD: AI response
+        Note right of LD: after_model detects loops / queues warning
+        LD -->> SL: .
         Note right of SL: after_model truncates excess tasks
         SL -->> TI: .
         Note right of TI: after_model generates title
-        TI -->> SM: .
-        Note right of SM: after_model compresses context
    end

-    Note right of SB: after_agent releases sandbox
-    SB -->> MEM: .
+    Note right of LD: after_agent clears pending warnings of the current run
+    LD -->> MEM: .
     Note right of MEM: after_agent enqueues memory
-    MEM -->> U: response
+    MEM -->> SB: .
+    Note right of SB: after_agent releases sandbox
+    SB -->> U: response
 ```

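 > [!warning] Not an onion
-> Of the 14 middlewares, only SandboxMiddleware has before/after symmetry (acquire/release). The rest are one-directional: each does its work either only in `before_*` or only in `after_*`. `before_agent` / `after_agent` run once, while `before_model` / `after_model` run on every loop iteration.
+> Most middlewares use a single phase. SandboxMiddleware uses `before_agent`/`after_agent` to acquire and release its resource; LoopDetectionMiddleware uses the same two hooks, but to clear run-scoped pending warnings rather than to mirror a resource lifecycle. `before_agent` / `after_agent` run once, while `before_model` / `after_model` / `wrap_model_call` run on every loop iteration.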

 There are only 2 hard ordering dependencies:

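 1. **ThreadData before Sandbox**: the sandbox needs the thread directories
-2. **Clarification last in the list**: it runs first when `after_model` executes in reverse order, so it is the first to intercept `ask_clarification`
+2. **Clarification last in the list**: its `wrap_tool_call` intercepts `ask_clarification` first and interrupts execution via `Command(goto=END)`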

 ### Conclusion

@@ -273,19 +287,19 @@ sequenceDiagram
 |---|---|---|
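 | Each middleware | Symmetric before + after | Mostly uses a single hook |
 | Activation bars | Nested (outer long, inner short) | Not nested (sequential) |
-| Purpose of reverse order | Pairs cleanup with initialization | Only affects the execution priority of after_model |
+| Purpose of reverse order | Pairs cleanup with initialization | Affects the execution priority of `after_model` / `after_agent` |
 | Typical example | Auth: validate token / clean up context | ThreadData: only creates directories, no cleanup |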
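The forward and reverse ordering used in the diagrams can be sketched with a short function: `before_*` hooks run in list order (0→N), and `after_*` hooks run in reverse (N→0). The chain below is a hand-picked subset of the table for illustration, and `hook_order` is a hypothetical helper, not part of the codebase.

```python
def hook_order(middlewares, hook):
    """Return middleware names in the order a given hook would run."""
    names = [name for name, hooks in middlewares if hook in hooks]
    if hook.startswith("after_"):
        names.reverse()  # after_* hooks run from the end of the list backwards
    return names


# Subset of the chain, listed in ascending position from the table.
CHAIN = [
    ("ThreadData", {"before_agent"}),
    ("Sandbox", {"before_agent", "after_agent"}),
    ("Title", {"after_model"}),
    ("Memory", {"after_agent"}),
    ("SubagentLimit", {"after_model"}),
    ("LoopDetection", {"before_agent", "after_model", "after_agent"}),
]

before_agent = hook_order(CHAIN, "before_agent")
after_model = hook_order(CHAIN, "after_model")
after_agent = hook_order(CHAIN, "after_agent")
```

With this subset, `after_agent` yields LoopDetection, then Memory, then Sandbox, which matches the updated diagram where the sandbox is released last.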

 ## Key design points

 ### Why is ClarificationMiddleware last in the list?

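-Being last means its `after_model` runs first. It needs to be the **first** to see the model output and check for an `ask_clarification` tool call. If there is one, it interrupts immediately (`Command(goto=END)`), and the `after_model` hooks of the remaining middlewares no longer run.
+Being last lets it intercept `ask_clarification` first in the tool-call wrapping chain. On a match, it returns `Command(goto=END)`, writes the formatted clarification question as a `ToolMessage`, and interrupts execution.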

 ### SandboxMiddleware symmetry

 `before_agent` (3rd in forward order) acquires the sandbox, and `after_agent` (1st in reverse order) releases it. Outer enter → outer exit: a natural onion symmetry.

-### Most middlewares use only one hook
+### Why does LoopDetectionMiddleware use several hooks at once?

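-Of the 14 middlewares, only SandboxMiddleware uses both `before_agent` and `after_agent` (acquire/release). The rest each run in a single phase. The reverse-order property of the onion model mainly affects execution order in the `after_model` phase.
+`after_model` only detects: when repeated tool calls reach the warning threshold, it puts a warning into a pending queue scoped by `(thread_id, run_id)`. The actual injection happens in the next `wrap_model_call`: by then the `ToolMessage` matching the previous turn's `AIMessage(tool_calls)` is already in the request, so the warning is appended at the end and does not break OpenAI/Moonshot tool-call pairing. `before_agent` clears leftover warnings from older runs on the same thread, and `after_agent` clears warnings of the current run that were never consumed.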
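The `(thread_id, run_id)`-scoped pending queue described above could look roughly like the sketch below. The class and method names are hypothetical and messages are plain strings; the actual LoopDetectionMiddleware implementation is not shown in this diff and may be structured differently.

```python
from collections import defaultdict


class PendingWarnings:
    """Hypothetical run-scoped warning queue keyed by (thread_id, run_id)."""

    def __init__(self):
        self._queues = defaultdict(list)

    def enqueue(self, thread_id, run_id, warning):
        # after_model: detection only, nothing is injected yet
        self._queues[(thread_id, run_id)].append(warning)

    def drain(self, thread_id, run_id):
        # wrap_model_call: take every pending warning for the current run
        return self._queues.pop((thread_id, run_id), [])

    def clear_stale(self, thread_id, current_run_id):
        # before_agent: drop leftovers from older runs on the same thread
        stale = [key for key in self._queues if key[0] == thread_id and key[1] != current_run_id]
        for key in stale:
            del self._queues[key]

    def clear_run(self, thread_id, run_id):
        # after_agent: drop warnings the current run never consumed
        self._queues.pop((thread_id, run_id), None)


pending = PendingWarnings()
pending.enqueue("t1", "old-run", "stale warning")
pending.clear_stale("t1", "new-run")
pending.enqueue("t1", "new-run", "loop warning")

# The warning is appended after the tool result, so tool-call pairing stays intact.
messages = ["ai: tool_calls", "tool: result"]
messages += pending.drain("t1", "new-run")
```

Deferring injection to the next model call is the design choice that keeps each `AIMessage(tool_calls)` immediately followed by its `ToolMessage`.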
@@ -1,3 +1,23 @@
+"""Lead agent factory.
+
+INVARIANT — tracing callback placement
+======================================
+
+Tracing callbacks (Langfuse, LangSmith) are attached at the **graph
+invocation root** in :func:`_make_lead_agent` (see the
+``build_tracing_callbacks()`` block that appends to ``config["callbacks"]``).
+Every ``create_chat_model(...)`` call inside this module — and inside any
+middleware reachable from this graph (e.g. ``TitleMiddleware``) — MUST pass
+``attach_tracing=False``.
+
+Forgetting that flag emits duplicate spans (one rooted at the graph, one at
+the model) AND prevents the Langfuse handler's ``propagate_attributes``
+path from firing, so ``session_id`` / ``user_id`` never reach the trace.
+The four current sites are: bootstrap agent, default agent, summarization
+middleware, and the async path inside ``TitleMiddleware``. Any new in-graph
+``create_chat_model`` call must be added to this list and pass the flag.
+"""
+
 import logging

 from langchain.agents import create_agent
@@ -22,6 +42,7 @@ from deerflow.config.app_config import AppConfig, get_app_config
 from deerflow.models import create_chat_model
 from deerflow.skills.tool_policy import filter_tools_by_skill_allowed_tools
 from deerflow.skills.types import Skill
+from deerflow.tracing import build_tracing_callbacks

 logger = logging.getLogger(__name__)

@@ -73,10 +94,14 @@ def _create_summarization_middleware(*, app_config: AppConfig | None = None) ->
    # Bind "middleware:summarize" tag so RunJournal identifies these LLM calls
    # as middleware rather than lead_agent (SummarizationMiddleware is a
    # LangChain built-in, so we tag the model at creation time).
+    # attach_tracing=False because the graph-level RunnableConfig (set in
+    # ``_make_lead_agent``) already carries tracing callbacks; binding them
+    # again at the model level would emit duplicate spans and break
+    # ``session_id`` / ``user_id`` propagation.
    if config.model_name:
-        model = create_chat_model(name=config.model_name, thinking_enabled=False, app_config=resolved_app_config)
+        model = create_chat_model(name=config.model_name, thinking_enabled=False, app_config=resolved_app_config, attach_tracing=False)
    else:
-        model = create_chat_model(thinking_enabled=False, app_config=resolved_app_config)
+        model = create_chat_model(thinking_enabled=False, app_config=resolved_app_config, attach_tracing=False)
    model = model.with_config(tags=["middleware:summarize"])

    # Prepare kwargs
@@ -408,13 +433,26 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
        }
    )

+    # Inject tracing callbacks at the graph invocation root so a single LangGraph
+    # run produces one trace with all node / LLM / tool calls as child spans,
+    # AND so the Langfuse handler sees ``on_chain_start(parent_run_id=None)`` and
+    # actually propagates ``langfuse_session_id`` / ``langfuse_user_id`` from
+    # ``config["metadata"]`` onto the trace. Without root-level attachment the
+    # model is a nested observation and the handler strips ``langfuse_*`` keys.
+    tracing_callbacks = build_tracing_callbacks()
+    if tracing_callbacks:
+        existing = config.get("callbacks") or []
+        if not isinstance(existing, list):
+            existing = list(existing)
+        config["callbacks"] = [*existing, *tracing_callbacks]
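The callback merge in this hunk can be read in isolation as follows. This is a minimal sketch that treats the config as a plain dict and callbacks as opaque objects; `merge_callbacks` is a hypothetical helper name, not a function in the repository.

```python
from typing import Any


def merge_callbacks(config: dict[str, Any], extra: list[Any]) -> dict[str, Any]:
    """Append extra callbacks to config["callbacks"], keeping existing ones first."""
    if not extra:
        # Nothing to add: leave the config untouched, as the hunk does.
        return config
    existing = config.get("callbacks") or []
    if not isinstance(existing, list):
        # Tuples or other iterables are normalised to a list before merging.
        existing = list(existing)
    config["callbacks"] = [*existing, *extra]
    return config


with_tuple = merge_callbacks({"callbacks": ("user_cb",)}, ["tracing_cb"])
with_none = merge_callbacks({"callbacks": None}, ["tracing_cb"])
untouched = merge_callbacks({}, [])
```

Existing callbacks stay ahead of the tracing ones, so caller-supplied handlers are not displaced.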
+
    skills_for_tool_policy = _load_enabled_skills_for_tool_policy(available_skills, app_config=resolved_app_config)

    if is_bootstrap:
        # Special bootstrap agent with minimal prompt for initial custom agent creation flow
        tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
        return create_agent(
-            model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, app_config=resolved_app_config),
+            model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, app_config=resolved_app_config, attach_tracing=False),
            tools=filter_tools_by_skill_allowed_tools(tools, skills_for_tool_policy),
            middleware=_build_middlewares(config, model_name=model_name, app_config=resolved_app_config),
            system_prompt=apply_prompt_template(
@@ -432,7 +470,7 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
    # Default lead agent (unchanged behavior)
    tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
    return create_agent(
-        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort, app_config=resolved_app_config),
+        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort, app_config=resolved_app_config, attach_tracing=False),
        tools=filter_tools_by_skill_allowed_tools(tools + extra_tools, skills_for_tool_policy),
        middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config),
        system_prompt=apply_prompt_template(
@@ -338,7 +338,7 @@ class MemoryUpdater:
            reinforcement_detected=reinforcement_detected,
        )
        prompt = MEMORY_UPDATE_PROMPT.format(
-            current_memory=json.dumps(current_memory, indent=2),
+            current_memory=json.dumps(current_memory, indent=2, ensure_ascii=False),
            conversation=conversation_text,
            correction_hint=correction_hint,
        )
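The effect of `ensure_ascii=False` in this hunk can be seen with a small standalone example. The `memory` dict below is made-up sample data, not the real memory schema.

```python
import json

memory = {"preference": "用户喜欢简洁的回答"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(memory, indent=2)
# With ensure_ascii=False the original characters are kept in the prompt text.
readable = json.dumps(memory, indent=2, ensure_ascii=False)
```

Both strings decode to the same object; only the text placed into the prompt differs, and the unescaped form is shorter and directly readable.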
@@ -6,10 +6,36 @@ arguments indefinitely until the recursion limit kills the run.
 Detection strategy:
  1. After each model response, hash the tool calls (name + args).
  2. Track recent hashes in a sliding window.
-  3. If the same hash appears >= warn_threshold times, inject a
-     "you are repeating yourself — wrap up" system message (once per hash).
+  3. If the same hash appears >= warn_threshold times, queue a
+     "you are repeating yourself — wrap up" warning for the current
+     thread/run. The warning is **injected at the next model call** (in
+     ``wrap_model_call``) as a ``HumanMessage`` appended to the message
+     list, *after* all ToolMessage responses to the previous
+     AIMessage(tool_calls).
  4. If it appears >= hard_limit times, strip all tool_calls from the
     response so the agent is forced to produce a final text answer.
+
+Why the warning is injected at ``wrap_model_call`` instead of
+``after_model``:
+
+  ``after_model`` fires immediately after the model emits an
+  ``AIMessage`` that may carry ``tool_calls``. The tools node has not
+  run yet, so no matching ``ToolMessage`` exists in the history. Any
+  message we add here lands *between* the assistant's tool_calls and
+  their responses. OpenAI/Moonshot reject the next request with
+  ``"tool_call_ids did not have response messages"`` because their
+  validators require the assistant's tool_calls to be followed
+  immediately by tool messages. Anthropic also disallows mid-stream
+  ``SystemMessage``. By deferring the warning to ``wrap_model_call``,
+  every prior ToolMessage is already present in the request's message
+  list and the warning is appended at the end — pairing intact, no
+  ``AIMessage`` semantics are mutated.
+
+Queued warnings are intentionally transient. If a run ends before the
+next model request drains a queued warning, ``after_agent`` drops it
+instead of carrying it into a later invocation for the same thread. The
+hard-stop path still forces termination when the configured safety limit
+is reached.
 """

 from __future__ import annotations
@@ -19,11 +45,14 @@ import json
 import logging
 import threading
 from collections import OrderedDict, defaultdict
+from collections.abc import Awaitable, Callable
 from copy import deepcopy
 from typing import TYPE_CHECKING, override

 from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
+from langchain.agents.middleware.types import ModelCallResult, ModelRequest, ModelResponse
+from langchain_core.messages import HumanMessage
 from langgraph.runtime import Runtime

 if TYPE_CHECKING:
@@ -38,6 +67,7 @@ _DEFAULT_WINDOW_SIZE = 20  # track last N tool calls
 _DEFAULT_MAX_TRACKED_THREADS = 100  # LRU eviction limit
 _DEFAULT_TOOL_FREQ_WARN = 30  # warn after 30 calls to the same tool type
 _DEFAULT_TOOL_FREQ_HARD_LIMIT = 50  # force-stop after 50 calls to the same tool type
+_MAX_PENDING_WARNINGS_PER_RUN = 4


 def _normalize_tool_call_args(raw_args: object) -> tuple[dict, str | None]:
@@ -195,6 +225,12 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
        self._warned: dict[str, set[str]] = defaultdict(set)
        self._tool_freq: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self._tool_freq_warned: dict[str, set[str]] = defaultdict(set)
+        # Per-thread/run queue of warnings to inject at the next model call.
+        # Populated by ``after_model`` (detection) and drained by
+        # ``wrap_model_call`` (injection); see module docstring.
+        self._pending_warnings: dict[tuple[str, str], list[str]] = defaultdict(list)
+        self._pending_warning_touch_order: OrderedDict[tuple[str, str], None] = OrderedDict()
+        self._max_pending_warning_keys = max(1, self.max_tracked_threads * 2)

    @classmethod
    def from_config(cls, config: LoopDetectionConfig) -> LoopDetectionMiddleware:
@@ -213,9 +249,20 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
        """Extract thread_id from runtime context for per-thread tracking."""
        thread_id = runtime.context.get("thread_id") if runtime.context else None
        if thread_id:
-            return thread_id
+            return str(thread_id)
        return "default"

+    def _get_run_id(self, runtime: Runtime) -> str:
+        """Extract run_id from runtime context for per-run warning scoping."""
+        run_id = runtime.context.get("run_id") if runtime.context else None
+        if run_id:
+            return str(run_id)
+        return "default"
+
+    def _pending_key(self, runtime: Runtime) -> tuple[str, str]:
+        """Return the pending-warning key for the current thread/run."""
+        return self._get_thread_id(runtime), self._get_run_id(runtime)
+
    def _evict_if_needed(self) -> None:
        """Evict least recently used threads if over the limit.

@@ -226,8 +273,52 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
            self._warned.pop(evicted_id, None)
            self._tool_freq.pop(evicted_id, None)
            self._tool_freq_warned.pop(evicted_id, None)
+            for key in list(self._pending_warnings):
+                if key[0] == evicted_id:
+                    self._drop_pending_warning_key_locked(key)
            logger.debug("Evicted loop tracking for thread %s (LRU)", evicted_id)

+    def _drop_pending_warning_key_locked(self, key: tuple[str, str]) -> None:
+        """Drop all pending-warning bookkeeping for one thread/run key.
+
+        Must be called while holding self._lock.
+        """
+        self._pending_warnings.pop(key, None)
+        self._pending_warning_touch_order.pop(key, None)
+
+    def _touch_pending_warning_key_locked(self, key: tuple[str, str]) -> None:
+        """Mark a pending-warning key as recently used.
+
+        Must be called while holding self._lock.
+        """
+        self._pending_warning_touch_order[key] = None
+        self._pending_warning_touch_order.move_to_end(key)
+
+    def _prune_pending_warning_state_locked(self, protected_key: tuple[str, str]) -> None:
+        """Cap pending-warning state across abnormal or concurrent runs.
+
+        Must be called while holding self._lock.
+        """
+        overflow = len(self._pending_warning_touch_order) - self._max_pending_warning_keys
+        if overflow <= 0:
+            return
+
+        candidates = [key for key in self._pending_warning_touch_order if key != protected_key]
+        for key in candidates[:overflow]:
+            self._drop_pending_warning_key_locked(key)
+
+    def _queue_pending_warning(self, runtime: Runtime, warning: str) -> None:
+        """Queue one transient warning for the current thread/run with caps."""
+        pending_key = self._pending_key(runtime)
+        with self._lock:
+            warnings = self._pending_warnings[pending_key]
+            if warning not in warnings:
+                warnings.append(warning)
+            if len(warnings) > _MAX_PENDING_WARNINGS_PER_RUN:
+                del warnings[: len(warnings) - _MAX_PENDING_WARNINGS_PER_RUN]
+            self._touch_pending_warning_key_locked(pending_key)
+            self._prune_pending_warning_state_locked(protected_key=pending_key)
+
    def _track_and_check(self, state: AgentState, runtime: Runtime) -> tuple[str | None, bool]:
        """Track tool calls and check for loops.

@@ -268,6 +359,12 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
            if len(history) > self.window_size:
                history[:] = history[-self.window_size :]

+            warned_hashes = self._warned.get(thread_id)
+            if warned_hashes is not None:
+                warned_hashes.intersection_update(history)
+                if not warned_hashes:
+                    self._warned.pop(thread_id, None)
+
            count = history.count(call_hash)
            tool_names = [tc.get("name", "?") for tc in tool_calls]

@@ -381,7 +478,10 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
        warning, hard_stop = self._track_and_check(state, runtime)

        if hard_stop:
-            # Strip tool_calls from the last AIMessage to force text output
+            # Strip tool_calls from the last AIMessage to force text output.
+            # Once tool_calls are stripped, the AIMessage no longer requires
+            # matching ToolMessage responses, so mutating it in place here
+            # is safe for OpenAI/Moonshot pairing validators.
            messages = state.get("messages", [])
            last_msg = messages[-1]
            content = self._append_text(last_msg.content, warning or _HARD_STOP_MSG)
@@ -389,33 +489,48 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
            return {"messages": [stripped_msg]}

        if warning:
-            # WORKAROUND for v2.0-m1 — see #2724.
-            #
-            # Append the warning to the AIMessage content instead of
-            # injecting a separate HumanMessage. Inserting any non-tool
-            # message between an AIMessage(tool_calls=...) and its
-            # ToolMessage responses breaks OpenAI/Moonshot strict pairing
-            # validation ("tool_call_ids did not have response messages")
-            # because the tools node has not run yet at after_model time.
-            # tool_calls are preserved so the tools node still executes.
-            #
-            # This is a temporary mitigation: mutating an existing
-            # AIMessage to carry framework-authored text leaks loop-warning
-            # text into downstream consumers (MemoryMiddleware fact
-            # extraction, TitleMiddleware, telemetry, model replay) as if
-            # the model said it. The proper fix is to defer warning
-            # injection from after_model to wrap_model_call so every prior
-            # ToolMessage is already in the request — see RFC #2517 (which
-            # lists "loop intervention does not leave invalid
-            # tool-call/tool-message state" as acceptance criteria) and
-            # the prototype on `fix/loop-detection-tool-call-pairing`.
-            messages = state.get("messages", [])
-            last_msg = messages[-1]
-            patched_msg = last_msg.model_copy(update={"content": self._append_text(last_msg.content, warning)})
-            return {"messages": [patched_msg]}
+            # Defer injection to the next model call. We must NOT alter the
+            # AIMessage(tool_calls=...) here (would put framework words in
+            # the model's mouth, polluting downstream consumers like
+            # MemoryMiddleware), nor insert a separate non-tool message
+            # (would break OpenAI/Moonshot tool-call pairing because the
+            # tools node has not produced ToolMessage responses yet). The
+            # warning is delivered via ``wrap_model_call`` below.
+            self._queue_pending_warning(runtime, warning)
+            return None

        return None

+    def _clear_other_run_pending_warnings(self, runtime: Runtime) -> None:
+        """Drop stale pending warnings for previous runs in this thread."""
+        thread_id, current_run_id = self._pending_key(runtime)
+        with self._lock:
+            for key in list(self._pending_warnings):
+                if key[0] == thread_id and key[1] != current_run_id:
+                    self._drop_pending_warning_key_locked(key)
+
+    def _clear_current_run_pending_warnings(self, runtime: Runtime) -> None:
+        """Drop pending warnings owned by the current thread/run."""
+        pending_key = self._pending_key(runtime)
+        with self._lock:
+            self._drop_pending_warning_key_locked(pending_key)
+
+    @staticmethod
+    def _format_warning_message(warnings: list[str]) -> str:
+        """Merge pending warnings into one prompt message."""
+        deduped = list(dict.fromkeys(warnings))
+        return "\n\n".join(deduped)
+
+    @override
+    def before_agent(self, state: AgentState, runtime: Runtime) -> dict | None:
+        self._clear_other_run_pending_warnings(runtime)
+        return None
+
+    @override
+    async def abefore_agent(self, state: AgentState, runtime: Runtime) -> dict | None:
+        self._clear_other_run_pending_warnings(runtime)
+        return None
+
    @override
    def after_model(self, state: AgentState, runtime: Runtime) -> dict | None:
        return self._apply(state, runtime)
@@ -424,6 +539,59 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
    async def aafter_model(self, state: AgentState, runtime: Runtime) -> dict | None:
        return self._apply(state, runtime)

+    @override
+    def after_agent(self, state: AgentState, runtime: Runtime) -> dict | None:
+        self._clear_current_run_pending_warnings(runtime)
+        return None
+
+    @override
+    async def aafter_agent(self, state: AgentState, runtime: Runtime) -> dict | None:
+        self._clear_current_run_pending_warnings(runtime)
+        return None
+
+    def _drain_pending_warnings(self, runtime: Runtime) -> list[str]:
+        """Pop and return all queued warnings for *runtime*'s thread/run."""
+        pending_key = self._pending_key(runtime)
+        with self._lock:
+            warnings = self._pending_warnings.pop(pending_key, [])
+            self._pending_warning_touch_order.pop(pending_key, None)
+        return warnings
+
+    def _augment_request(self, request: ModelRequest) -> ModelRequest:
+        """Append queued loop warnings (if any) to the outgoing message list.
+
+        The warning is placed *after* every existing message, including the
+        ToolMessage responses to the previous AIMessage(tool_calls). This
+        keeps ``assistant tool_calls -> tool_messages`` pairing intact for
+        OpenAI/Moonshot, avoids the Anthropic mid-stream SystemMessage
+        restriction (we use HumanMessage), and never mutates an existing
+        AIMessage.
+        """
+        warnings = self._drain_pending_warnings(request.runtime)
+        if not warnings:
+            return request
+        new_messages = [
+            *request.messages,
+            HumanMessage(content=self._format_warning_message(warnings), name="loop_warning"),
+        ]
+        return request.override(messages=new_messages)
+
+    @override
+    def wrap_model_call(
+        self,
+        request: ModelRequest,
+        handler: Callable[[ModelRequest], ModelResponse],
+    ) -> ModelCallResult:
+        return handler(self._augment_request(request))
+
+    @override
+    async def awrap_model_call(
+        self,
+        request: ModelRequest,
+        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
+    ) -> ModelCallResult:
+        return await handler(self._augment_request(request))
+
    def reset(self, thread_id: str | None = None) -> None:
        """Clear tracking state. If thread_id given, clear only that thread."""
        with self._lock:
@@ -432,8 +600,13 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
                self._warned.pop(thread_id, None)
                self._tool_freq.pop(thread_id, None)
                self._tool_freq_warned.pop(thread_id, None)
+                for key in list(self._pending_warnings):
+                    if key[0] == thread_id:
+                        self._drop_pending_warning_key_locked(key)
            else:
                self._history.clear()
                self._warned.clear()
                self._tool_freq.clear()
                self._tool_freq_warned.clear()
+                self._pending_warnings.clear()
+                self._pending_warning_touch_order.clear()
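The pending-warning bookkeeping added to this file (dedupe, per-run cap, LRU pruning across keys, drain) can be sketched without LangChain. This is a simplified standalone model under stated assumptions: `PendingWarnings` and its parameter names are hypothetical, and keys are passed in directly instead of being derived from a `Runtime`.

```python
import threading
from collections import OrderedDict, defaultdict

Key = tuple[str, str]  # (thread_id, run_id)


class PendingWarnings:
    def __init__(self, max_per_run: int = 4, max_keys: int = 200) -> None:
        self._lock = threading.Lock()
        self._queues: dict[Key, list[str]] = defaultdict(list)
        self._touch_order: OrderedDict[Key, None] = OrderedDict()
        self._max_per_run = max_per_run
        self._max_keys = max_keys

    def queue(self, key: Key, warning: str) -> None:
        with self._lock:
            warnings = self._queues[key]
            if warning not in warnings:
                warnings.append(warning)
            # Keep only the newest entries for this run.
            del warnings[: max(0, len(warnings) - self._max_per_run)]
            self._touch_order[key] = None
            self._touch_order.move_to_end(key)
            # Evict least recently touched keys, never the current one.
            overflow = len(self._touch_order) - self._max_keys
            stale_keys = [k for k in self._touch_order if k != key][: max(0, overflow)]
            for stale in stale_keys:
                self._queues.pop(stale, None)
                self._touch_order.pop(stale, None)

    def drain(self, key: Key) -> list[str]:
        """Pop and return everything queued for one thread/run."""
        with self._lock:
            self._touch_order.pop(key, None)
            return self._queues.pop(key, [])


q = PendingWarnings(max_per_run=2, max_keys=2)
for text in ["a", "a", "b", "c"]:
    q.queue(("t1", "r1"), text)
first_drain = q.drain(("t1", "r1"))
second_drain = q.drain(("t1", "r1"))
```

Draining removes the key entirely, so a warning is delivered at most once per run in this model.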
@@ -160,7 +160,11 @@ class TitleMiddleware(AgentMiddleware[TitleMiddlewareState]):
        prompt, user_msg = self._build_title_prompt(state)

        try:
-            model_kwargs = {"thinking_enabled": False}
+            # attach_tracing=False because ``_get_runnable_config()`` inherits
+            # the graph-level RunnableConfig (set in ``_make_lead_agent``) whose
+            # callbacks already carry tracing handlers; binding them again at
+            # the model level would emit duplicate spans.
+            model_kwargs = {"thinking_enabled": False, "attach_tracing": False}
            if self._app_config is not None:
                model_kwargs["app_config"] = self._app_config
            if config.model_name:
@@ -19,6 +19,7 @@ import asyncio
 import json
 import logging
 import mimetypes
+import os
 import shutil
 import tempfile
 import uuid
@@ -42,6 +43,7 @@ from deerflow.config.paths import get_paths
 from deerflow.models import create_chat_model
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.skills.storage import get_or_new_skill_storage
+from deerflow.tracing import build_tracing_callbacks, inject_langfuse_metadata
 from deerflow.uploads.manager import (
    claim_unique_filename,
    delete_file_safe,
@@ -123,6 +125,7 @@ class DeerFlowClient:
        agent_name: str | None = None,
        available_skills: set[str] | None = None,
        middlewares: Sequence[AgentMiddleware] | None = None,
+        environment: str | None = None,
    ):
        """Initialize the client.

@@ -140,6 +143,12 @@ class DeerFlowClient:
            agent_name: Name of the agent to use.
            available_skills: Optional set of skill names to make available. If None (default), all scanned skills are available.
            middlewares: Optional list of custom middlewares to inject into the agent.
+            environment: Deployment environment label that ends up in
+                ``langfuse_tags`` (e.g. ``"production"`` / ``"staging"``).
+                When ``None`` the worker/client falls back to the
+                ``DEER_FLOW_ENV`` or ``ENVIRONMENT`` env vars. Pass an
+                explicit value for programmatic callers that do not want
+                env-var coupling.
        """
        if config_path is not None:
            reload_app_config(config_path)
@@ -156,6 +165,7 @@ class DeerFlowClient:
        self._agent_name = agent_name
        self._available_skills = set(available_skills) if available_skills is not None else None
        self._middlewares = list(middlewares) if middlewares else []
+        self._environment = environment

        # Lazy agent — created on first call, recreated when config changes.
        self._agent = None
@@ -228,7 +238,11 @@ class DeerFlowClient:
        max_concurrent_subagents = cfg.get("max_concurrent_subagents", 3)

        kwargs: dict[str, Any] = {
-            "model": create_chat_model(name=model_name, thinking_enabled=thinking_enabled),
+            # attach_tracing=False because ``stream()`` injects tracing
+            # callbacks at the graph invocation root so a single embedded run
+            # produces one trace with correct session_id / user_id propagation.
+            # Attaching them again on the model would emit duplicate spans.
+            "model": create_chat_model(name=model_name, thinking_enabled=thinking_enabled, attach_tracing=False),
            "tools": self._get_tools(model_name=model_name, subagent_enabled=subagent_enabled),
            "middleware": _build_middlewares(config, model_name=model_name, agent_name=self._agent_name, custom_middlewares=self._middlewares),
            "system_prompt": apply_prompt_template(
@@ -571,6 +585,28 @@ class DeerFlowClient:
            thread_id = str(uuid.uuid4())

        config = self._get_runnable_config(thread_id, **kwargs)
+
+        # Inject tracing callbacks and Langfuse trace metadata at the graph
+        # invocation root so the embedded client matches the gateway worker's
+        # behaviour: a single ``stream()`` produces one trace with all node /
+        # LLM / tool calls nested under it, and the trace carries the reserved
+        # ``langfuse_session_id`` / ``langfuse_user_id`` keys that the Langfuse
+        # CallbackHandler lifts onto the root trace's ``sessionId`` / ``userId``.
+        tracing_callbacks = build_tracing_callbacks()
+        if tracing_callbacks:
+            existing_callbacks = list(config.get("callbacks") or [])
+            config["callbacks"] = [*existing_callbacks, *tracing_callbacks]
+
+        configurable = config.get("configurable") or {}
+        inject_langfuse_metadata(
+            config,
+            thread_id=thread_id,
+            user_id=get_effective_user_id(),
+            assistant_id=self._agent_name or "lead-agent",
+            model_name=configurable.get("model_name") or self._model_name,
+            environment=self._environment or os.environ.get("DEER_FLOW_ENV") or os.environ.get("ENVIRONMENT"),
+        )
+
        self._ensure_agent(config)

        state: dict[str, Any] = {"messages": [HumanMessage(content=message)]}
@@ -1,4 +1,5 @@
 import base64
+import errno
 import logging
 import shlex
 import threading
@@ -6,11 +7,14 @@ import uuid

 from agent_sandbox import Sandbox as AioSandboxClient

+from deerflow.config.paths import VIRTUAL_PATH_PREFIX
 from deerflow.sandbox.sandbox import Sandbox
 from deerflow.sandbox.search import GrepMatch, path_matches, should_ignore_path, truncate_line

 logger = logging.getLogger(__name__)

+_MAX_DOWNLOAD_SIZE = 100 * 1024 * 1024  # 100 MB
+
 _ERROR_OBSERVATION_SIGNATURE = "'ErrorObservation' object has no attribute 'exit_code'"


@@ -102,6 +106,49 @@ class AioSandbox(Sandbox):
            logger.error(f"Failed to read file in sandbox: {e}")
            return f"Error: {e}"

+    def download_file(self, path: str) -> bytes:
+        """Download file bytes from the sandbox.
+
+        Raises:
+            PermissionError: If the path contains '..' traversal segments or is
+                outside ``VIRTUAL_PATH_PREFIX``.
+            OSError: If the file cannot be retrieved from the sandbox.
+        """
+        # Reject path traversal before sending to the container API.
+        # LocalSandbox gets this implicitly via _resolve_path;
+        # here the path is forwarded verbatim so we must check explicitly.
+        normalised = path.replace("\\", "/")
+        for segment in normalised.split("/"):
+            if segment == "..":
+                logger.error(f"Refused download due to path traversal: {path}")
+                raise PermissionError(f"Access denied: path traversal detected in '{path}'")
+
+        stripped_path = normalised.lstrip("/")
+        allowed_prefix = VIRTUAL_PATH_PREFIX.lstrip("/")
+        if stripped_path != allowed_prefix and not stripped_path.startswith(f"{allowed_prefix}/"):
+            logger.error("Refused download outside allowed directory: path=%s, allowed_prefix=%s", path, VIRTUAL_PATH_PREFIX)
+            raise PermissionError(f"Access denied: path must be under '{VIRTUAL_PATH_PREFIX}': '{path}'")
+
+        with self._lock:
+            try:
+                chunks: list[bytes] = []
+                total = 0
+                for chunk in self._client.file.download_file(path=path):
+                    total += len(chunk)
+                    if total > _MAX_DOWNLOAD_SIZE:
+                        raise OSError(
+                            errno.EFBIG,
+                            f"File exceeds maximum download size of {_MAX_DOWNLOAD_SIZE} bytes",
+                            path,
+                        )
+                    chunks.append(chunk)
+                return b"".join(chunks)
+            except OSError:
+                raise
+            except Exception as e:
+                logger.error(f"Failed to download file in sandbox: {e}")
+                raise OSError(f"Failed to download file '{path}' from sandbox: {e}") from e
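The two path checks in `download_file` can be exercised on their own. This is a minimal sketch: `check_virtual_path` and `is_allowed` are hypothetical helpers, and the default prefix `/mnt/user-data` is an assumed stand-in for `VIRTUAL_PATH_PREFIX`, whose real value is not shown in this diff.

```python
def check_virtual_path(path: str, allowed_prefix: str = "/mnt/user-data") -> str:
    """Reject '..' segments and paths outside allowed_prefix (assumed value)."""
    normalised = path.replace("\\", "/")
    if ".." in normalised.split("/"):
        raise PermissionError(f"path traversal detected in {path!r}")
    stripped = normalised.lstrip("/")
    prefix = allowed_prefix.lstrip("/")
    # Compare on a segment boundary so a sibling such as
    # "/mnt/user-data-evil" does not pass a plain startswith check.
    if stripped != prefix and not stripped.startswith(f"{prefix}/"):
        raise PermissionError(f"path must be under {allowed_prefix!r}: {path!r}")
    return normalised


def is_allowed(path: str) -> bool:
    try:
        check_virtual_path(path)
    except PermissionError:
        return False
    return True
```

The trailing-slash comparison is the part that is easy to get wrong: matching on the bare prefix string would accept directories that merely share the same leading characters.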
+
    def list_dir(self, path: str, max_depth: int = 2) -> list[str]:
        """List the contents of a directory in the sandbox.

@@ -10,6 +10,7 @@ The provider itself handles:
 - Mount computation (thread-specific, skills)
 """

+import asyncio
 import atexit
 import hashlib
 import logging
@@ -18,6 +19,7 @@ import signal
 import threading
 import time
 import uuid
+from concurrent.futures import ThreadPoolExecutor

 try:
    import fcntl
@@ -32,7 +34,7 @@ from deerflow.sandbox.sandbox import Sandbox
 from deerflow.sandbox.sandbox_provider import SandboxProvider

 from .aio_sandbox import AioSandbox
-from .backend import SandboxBackend, wait_for_sandbox_ready
+from .backend import SandboxBackend, wait_for_sandbox_ready, wait_for_sandbox_ready_async
 from .local_backend import LocalContainerBackend
 from .remote_backend import RemoteSandboxBackend
 from .sandbox_info import SandboxInfo
@@ -46,6 +48,9 @@ DEFAULT_CONTAINER_PREFIX = "deer-flow-sandbox"
 DEFAULT_IDLE_TIMEOUT = 600  # 10 minutes in seconds
 DEFAULT_REPLICAS = 3  # Maximum concurrent sandbox containers
 IDLE_CHECK_INTERVAL = 60  # Check every 60 seconds
+THREAD_LOCK_EXECUTOR_WORKERS = min(32, (os.cpu_count() or 1) + 4)
+_THREAD_LOCK_EXECUTOR = ThreadPoolExecutor(max_workers=THREAD_LOCK_EXECUTOR_WORKERS, thread_name_prefix="sandbox-lock-wait")
+atexit.register(_THREAD_LOCK_EXECUTOR.shutdown, wait=False, cancel_futures=True)


 def _lock_file_exclusive(lock_file) -> None:
@@ -66,6 +71,40 @@ def _unlock_file(lock_file) -> None:
    msvcrt.locking(lock_file.fileno(), msvcrt.LK_UNLCK, 1)


+def _open_lock_file(lock_path):
+    return open(lock_path, "a", encoding="utf-8")
+
+
+async def _acquire_thread_lock_async(lock: threading.Lock) -> None:
+    """Acquire a threading.Lock without polling or using the default executor."""
+    loop = asyncio.get_running_loop()
+    acquire_future = loop.run_in_executor(_THREAD_LOCK_EXECUTOR, lock.acquire, True)
+
+    try:
+        acquired = await asyncio.shield(acquire_future)
+    except asyncio.CancelledError:
+        acquire_future.add_done_callback(lambda task: _release_cancelled_lock_acquire(lock, task))
+        raise
+
+    if not acquired:
+        raise RuntimeError("Failed to acquire sandbox thread lock")
+
+
+def _release_cancelled_lock_acquire(lock: threading.Lock, task: asyncio.Future[bool]) -> None:
+    """Release a lock acquired after its awaiting coroutine was cancelled."""
+    if task.cancelled():
+        return
+
+    try:
+        acquired = task.result()
+    except Exception as e:
+        logger.warning(f"Cancelled sandbox lock acquisition finished with error: {e}")
+        return
+
+    if acquired:
+        lock.release()
+
+
 class AioSandboxProvider(SandboxProvider):
    """Sandbox provider that manages containers running the AIO sandbox.

@@ -416,6 +455,96 @@ class AioSandboxProvider(SandboxProvider):
                self._thread_locks[thread_id] = threading.Lock()
            return self._thread_locks[thread_id]

+    def _sandbox_id_for_thread(self, thread_id: str | None) -> str:
+        """Return deterministic IDs for thread sandboxes and random IDs otherwise."""
+        return self._deterministic_sandbox_id(thread_id) if thread_id else str(uuid.uuid4())[:8]
+
+    def _reuse_in_process_sandbox(self, thread_id: str | None, *, post_lock: bool = False) -> str | None:
+        """Reuse an active in-process sandbox for a thread if one is still tracked."""
+        if thread_id is None:
+            return None
+
+        with self._lock:
+            if thread_id not in self._thread_sandboxes:
+                return None
+
+            existing_id = self._thread_sandboxes[thread_id]
+            if existing_id in self._sandboxes:
+                suffix = " (post-lock check)" if post_lock else ""
+                logger.info(f"Reusing in-process sandbox {existing_id} for thread {thread_id}{suffix}")
+                self._last_activity[existing_id] = time.time()
+                return existing_id
+
+            del self._thread_sandboxes[thread_id]
+            return None
+
+    def _reclaim_warm_pool_sandbox(self, thread_id: str | None, sandbox_id: str, *, post_lock: bool = False) -> str | None:
+        """Promote a warm-pool sandbox back to active tracking if available."""
+        if thread_id is None:
+            return None
+
+        with self._lock:
+            if sandbox_id not in self._warm_pool:
+                return None
+
+            info, _ = self._warm_pool.pop(sandbox_id)
+            sandbox = AioSandbox(id=sandbox_id, base_url=info.sandbox_url)
+            self._sandboxes[sandbox_id] = sandbox
+            self._sandbox_infos[sandbox_id] = info
+            self._last_activity[sandbox_id] = time.time()
+            self._thread_sandboxes[thread_id] = sandbox_id
+
+        suffix = " (post-lock check)" if post_lock else f" at {info.sandbox_url}"
+        logger.info(f"Reclaimed warm-pool sandbox {sandbox_id} for thread {thread_id}{suffix}")
+        return sandbox_id
+
+    def _recheck_cached_sandbox(self, thread_id: str, sandbox_id: str) -> str | None:
+        """Re-check in-memory caches after acquiring the cross-process file lock."""
+        return self._reuse_in_process_sandbox(thread_id, post_lock=True) or self._reclaim_warm_pool_sandbox(thread_id, sandbox_id, post_lock=True)
+
+    def _register_discovered_sandbox(self, thread_id: str, info: SandboxInfo) -> str:
+        """Track a sandbox discovered through the backend."""
+        sandbox = AioSandbox(id=info.sandbox_id, base_url=info.sandbox_url)
+        with self._lock:
+            self._sandboxes[info.sandbox_id] = sandbox
+            self._sandbox_infos[info.sandbox_id] = info
+            self._last_activity[info.sandbox_id] = time.time()
+            self._thread_sandboxes[thread_id] = info.sandbox_id
+
+        logger.info(f"Discovered existing sandbox {info.sandbox_id} for thread {thread_id} at {info.sandbox_url}")
+        return info.sandbox_id
+
+    def _register_created_sandbox(self, thread_id: str | None, sandbox_id: str, info: SandboxInfo) -> str:
+        """Track a newly-created sandbox in the active maps."""
+        sandbox = AioSandbox(id=sandbox_id, base_url=info.sandbox_url)
+        with self._lock:
+            self._sandboxes[sandbox_id] = sandbox
+            self._sandbox_infos[sandbox_id] = info
+            self._last_activity[sandbox_id] = time.time()
+            if thread_id:
+                self._thread_sandboxes[thread_id] = sandbox_id
+
+        logger.info(f"Created sandbox {sandbox_id} for thread {thread_id} at {info.sandbox_url}")
+        return sandbox_id
+
+    def _replica_count(self) -> tuple[int, int]:
+        """Return configured replicas and currently tracked sandbox count."""
+        replicas = self._config.get("replicas", DEFAULT_REPLICAS)
+        with self._lock:
+            total = len(self._sandboxes) + len(self._warm_pool)
+        return replicas, total
+
+    def _log_replicas_soft_cap(self, replicas: int, sandbox_id: str, evicted: str | None) -> None:
+        """Log the result of enforcing the warm-pool replica budget."""
+        if evicted:
+            logger.info(f"Evicted warm-pool sandbox {evicted} to stay within replicas={replicas}")
+            return
+
+        # All slots are occupied by active sandboxes — proceed anyway and log.
+        # The replicas limit is a soft cap; we never forcibly stop a container
+        # that is actively serving a thread.
+        logger.warning(f"All {replicas} replica slots are in active use; creating sandbox {sandbox_id} beyond the soft limit")
+
    # ── Core: acquire / get / release / shutdown ─────────────────────────

    def acquire(self, thread_id: str | None = None) -> str:
@@ -440,6 +569,23 @@ class AioSandboxProvider(SandboxProvider):
        else:
            return self._acquire_internal(thread_id)

+    async def acquire_async(self, thread_id: str | None = None) -> str:
+        """Acquire a sandbox environment without blocking the event loop.
+
+        Mirrors ``acquire()`` while keeping blocking backend operations off the
+        event loop and using async-native readiness polling for newly created
+        sandboxes.
+        """
+        if thread_id:
+            thread_lock = self._get_thread_lock(thread_id)
+            await _acquire_thread_lock_async(thread_lock)
+            try:
+                return await self._acquire_internal_async(thread_id)
+            finally:
+                thread_lock.release()
+
+        return await self._acquire_internal_async(thread_id)
+
    def _acquire_internal(self, thread_id: str | None) -> str:
        """Internal sandbox acquisition with two-layer consistency.

@@ -448,33 +594,17 @@ class AioSandboxProvider(SandboxProvider):
                 sandbox_id is deterministic from thread_id so no shared state file
                 is needed — any process can derive the same container name)
        """
-        # ── Layer 1: In-process cache (fast path) ──
-        if thread_id:
-            with self._lock:
-                if thread_id in self._thread_sandboxes:
-                    existing_id = self._thread_sandboxes[thread_id]
-                    if existing_id in self._sandboxes:
-                        logger.info(f"Reusing in-process sandbox {existing_id} for thread {thread_id}")
-                        self._last_activity[existing_id] = time.time()
-                        return existing_id
-                    else:
-                        del self._thread_sandboxes[thread_id]
+        cached_id = self._reuse_in_process_sandbox(thread_id)
+        if cached_id is not None:
+            return cached_id

        # Deterministic ID for thread-specific, random for anonymous
-        sandbox_id = self._deterministic_sandbox_id(thread_id) if thread_id else str(uuid.uuid4())[:8]
+        sandbox_id = self._sandbox_id_for_thread(thread_id)

        # ── Layer 1.5: Warm pool (container still running, no cold-start) ──
-        if thread_id:
-            with self._lock:
-                if sandbox_id in self._warm_pool:
-                    info, _ = self._warm_pool.pop(sandbox_id)
-                    sandbox = AioSandbox(id=sandbox_id, base_url=info.sandbox_url)
-                    self._sandboxes[sandbox_id] = sandbox
-                    self._sandbox_infos[sandbox_id] = info
-                    self._last_activity[sandbox_id] = time.time()
-                    self._thread_sandboxes[thread_id] = sandbox_id
-                    logger.info(f"Reclaimed warm-pool sandbox {sandbox_id} for thread {thread_id} at {info.sandbox_url}")
-                    return sandbox_id
+        reclaimed_id = self._reclaim_warm_pool_sandbox(thread_id, sandbox_id)
+        if reclaimed_id is not None:
+            return reclaimed_id

        # ── Layer 2: Backend discovery + create (protected by cross-process lock) ──
        # Use a file lock so that two processes racing to create the same sandbox
@@ -485,6 +615,26 @@ class AioSandboxProvider(SandboxProvider):

        return self._create_sandbox(thread_id, sandbox_id)

+    async def _acquire_internal_async(self, thread_id: str | None) -> str:
+        """Async counterpart to ``_acquire_internal``."""
+        cached_id = self._reuse_in_process_sandbox(thread_id)
+        if cached_id is not None:
+            return cached_id
+
+        # Deterministic ID for thread-specific, random for anonymous
+        sandbox_id = self._sandbox_id_for_thread(thread_id)
+
+        # ── Layer 1.5: Warm pool (container still running, no cold-start) ──
+        reclaimed_id = self._reclaim_warm_pool_sandbox(thread_id, sandbox_id)
+        if reclaimed_id is not None:
+            return reclaimed_id
+
+        # ── Layer 2: Backend discovery + create (protected by cross-process lock) ──
+        if thread_id:
+            return await self._discover_or_create_with_lock_async(thread_id, sandbox_id)
+
+        return await self._create_sandbox_async(thread_id, sandbox_id)
+
    def _discover_or_create_with_lock(self, thread_id: str, sandbox_id: str) -> str:
        """Discover an existing sandbox or create a new one under a cross-process file lock.

@@ -503,40 +653,50 @@ class AioSandboxProvider(SandboxProvider):
                locked = True
                # Re-check in-process caches under the file lock in case another
                # thread in this process won the race while we were waiting.
-                with self._lock:
-                    if thread_id in self._thread_sandboxes:
-                        existing_id = self._thread_sandboxes[thread_id]
-                        if existing_id in self._sandboxes:
-                            logger.info(f"Reusing in-process sandbox {existing_id} for thread {thread_id} (post-lock check)")
-                            self._last_activity[existing_id] = time.time()
-                            return existing_id
-                    if sandbox_id in self._warm_pool:
-                        info, _ = self._warm_pool.pop(sandbox_id)
-                        sandbox = AioSandbox(id=sandbox_id, base_url=info.sandbox_url)
-                        self._sandboxes[sandbox_id] = sandbox
-                        self._sandbox_infos[sandbox_id] = info
-                        self._last_activity[sandbox_id] = time.time()
-                        self._thread_sandboxes[thread_id] = sandbox_id
-                        logger.info(f"Reclaimed warm-pool sandbox {sandbox_id} for thread {thread_id} (post-lock check)")
-                        return sandbox_id
+                cached_id = self._recheck_cached_sandbox(thread_id, sandbox_id)
+                if cached_id is not None:
+                    return cached_id

                # Backend discovery: another process may have created the container.
                discovered = self._backend.discover(sandbox_id)
                if discovered is not None:
-                    sandbox = AioSandbox(id=discovered.sandbox_id, base_url=discovered.sandbox_url)
-                    with self._lock:
-                        self._sandboxes[discovered.sandbox_id] = sandbox
-                        self._sandbox_infos[discovered.sandbox_id] = discovered
-                        self._last_activity[discovered.sandbox_id] = time.time()
-                        self._thread_sandboxes[thread_id] = discovered.sandbox_id
-                    logger.info(f"Discovered existing sandbox {discovered.sandbox_id} for thread {thread_id} at {discovered.sandbox_url}")
-                    return discovered.sandbox_id
+                    return self._register_discovered_sandbox(thread_id, discovered)

                return self._create_sandbox(thread_id, sandbox_id)
            finally:
                if locked:
                    _unlock_file(lock_file)

+    async def _discover_or_create_with_lock_async(self, thread_id: str, sandbox_id: str) -> str:
+        """Async counterpart to ``_discover_or_create_with_lock``."""
+        paths = get_paths()
+        user_id = get_effective_user_id()
+        await asyncio.to_thread(paths.ensure_thread_dirs, thread_id, user_id=user_id)
+        lock_path = paths.thread_dir(thread_id, user_id=user_id) / f"{sandbox_id}.lock"
+
+        lock_file = await asyncio.to_thread(_open_lock_file, lock_path)
+        locked = False
+        try:
+            await asyncio.to_thread(_lock_file_exclusive, lock_file)
+            locked = True
+            # Re-check in-process caches under the file lock in case another
+            # thread in this process won the race while we were waiting.
+            cached_id = self._recheck_cached_sandbox(thread_id, sandbox_id)
+            if cached_id is not None:
+                return cached_id
+
+            # Backend discovery is synchronous: local discovery may inspect Docker
+            # and run a health check, so call it in a worker thread, off the event loop.
+            discovered = await asyncio.to_thread(self._backend.discover, sandbox_id)
+            if discovered is not None:
+                return self._register_discovered_sandbox(thread_id, discovered)
+
+            return await self._create_sandbox_async(thread_id, sandbox_id)
+        finally:
+            if locked:
+                await asyncio.to_thread(_unlock_file, lock_file)
+            await asyncio.to_thread(lock_file.close)
+
    def _evict_oldest_warm(self) -> str | None:
        """Destroy the oldest container in the warm pool to free capacity.

@@ -574,18 +734,10 @@ class AioSandboxProvider(SandboxProvider):

        # Enforce replicas: only warm-pool containers count toward eviction budget.
        # Active sandboxes are in use by live threads and must not be forcibly stopped.
-        replicas = self._config.get("replicas", DEFAULT_REPLICAS)
-        with self._lock:
-            total = len(self._sandboxes) + len(self._warm_pool)
+        replicas, total = self._replica_count()
        if total >= replicas:
            evicted = self._evict_oldest_warm()
-            if evicted:
-                logger.info(f"Evicted warm-pool sandbox {evicted} to stay within replicas={replicas}")
-            else:
-                # All slots are occupied by active sandboxes — proceed anyway and log.
-                # The replicas limit is a soft cap; we never forcibly stop a container
-                # that is actively serving a thread.
-                logger.warning(f"All {replicas} replica slots are in active use; creating sandbox {sandbox_id} beyond the soft limit")
+            self._log_replicas_soft_cap(replicas, sandbox_id, evicted)

        info = self._backend.create(thread_id, sandbox_id, extra_mounts=extra_mounts or None)

@@ -594,16 +746,27 @@ class AioSandboxProvider(SandboxProvider):
            self._backend.destroy(info)
            raise RuntimeError(f"Sandbox {sandbox_id} failed to become ready within timeout at {info.sandbox_url}")

-        sandbox = AioSandbox(id=sandbox_id, base_url=info.sandbox_url)
-        with self._lock:
-            self._sandboxes[sandbox_id] = sandbox
-            self._sandbox_infos[sandbox_id] = info
-            self._last_activity[sandbox_id] = time.time()
-            if thread_id:
-                self._thread_sandboxes[thread_id] = sandbox_id
+        return self._register_created_sandbox(thread_id, sandbox_id, info)

-        logger.info(f"Created sandbox {sandbox_id} for thread {thread_id} at {info.sandbox_url}")
-        return sandbox_id
+    async def _create_sandbox_async(self, thread_id: str | None, sandbox_id: str) -> str:
+        """Async counterpart to ``_create_sandbox``."""
+        extra_mounts = await asyncio.to_thread(self._get_extra_mounts, thread_id)
+
+        # Enforce replicas: only warm-pool containers count toward eviction budget.
+        # Active sandboxes are in use by live threads and must not be forcibly stopped.
+        replicas, total = self._replica_count()
+        if total >= replicas:
+            evicted = await asyncio.to_thread(self._evict_oldest_warm)
+            self._log_replicas_soft_cap(replicas, sandbox_id, evicted)
+
+        info = await asyncio.to_thread(self._backend.create, thread_id, sandbox_id, extra_mounts=extra_mounts or None)
+
+        # Wait for sandbox to be ready without blocking the event loop.
+        if not await wait_for_sandbox_ready_async(info.sandbox_url, timeout=60):
+            await asyncio.to_thread(self._backend.destroy, info)
+            raise RuntimeError(f"Sandbox {sandbox_id} failed to become ready within timeout at {info.sandbox_url}")
+
+        return self._register_created_sandbox(thread_id, sandbox_id, info)

    def get(self, sandbox_id: str) -> Sandbox | None:
        """Get a sandbox by ID. Updates last activity timestamp.
@@ -2,10 +2,12 @@

 from __future__ import annotations

+import asyncio
 import logging
 import time
 from abc import ABC, abstractmethod

+import httpx
 import requests

 from .sandbox_info import SandboxInfo
@@ -35,6 +37,34 @@ def wait_for_sandbox_ready(sandbox_url: str, timeout: int = 30) -> bool:
    return False


+async def wait_for_sandbox_ready_async(sandbox_url: str, timeout: int = 30, poll_interval: float = 1.0) -> bool:
+    """Async variant of sandbox readiness polling.
+
+    Use this from async runtime paths so sandbox startup waits do not block the
+    event loop. The synchronous ``wait_for_sandbox_ready`` function remains for
+    existing synchronous backend/provider call sites.
+    """
+    loop = asyncio.get_running_loop()
+    deadline = loop.time() + timeout
+
+    async with httpx.AsyncClient(timeout=5) as client:
+        while True:
+            remaining = deadline - loop.time()
+            if remaining <= 0:
+                break
+            try:
+                response = await client.get(f"{sandbox_url}/v1/sandbox", timeout=min(5.0, remaining))
+                if response.status_code == 200:
+                    return True
+            except httpx.RequestError:
+                pass
+            remaining = deadline - loop.time()
+            if remaining <= 0:
+                break
+            await asyncio.sleep(min(poll_interval, remaining))
+    return False
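
The deadline arithmetic in `wait_for_sandbox_ready_async` can be exercised without a sandbox or `httpx`. This is a rough sketch under that assumption: `poll_until_ready` and its `probe` callable are hypothetical names, with `probe` standing in for the HTTP readiness request.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def poll_until_ready(
    probe: Callable[[], Awaitable[bool]],
    timeout: float,
    poll_interval: float = 1.0,
) -> bool:
    """Call ``probe`` until it returns True or the deadline passes."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout  # monotonic clock, immune to wall-clock jumps

    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False
        try:
            # Never let a single probe outlive the overall deadline.
            if await asyncio.wait_for(probe(), timeout=remaining):
                return True
        except (asyncio.TimeoutError, OSError):
            pass  # treat probe failures as "not ready yet"
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False
        # Sleep at most until the deadline, so the total wait stays bounded.
        await asyncio.sleep(min(poll_interval, remaining))
```

Clamping both the per-request timeout and the sleep to the remaining budget is what keeps the total wait close to `timeout`, rather than `timeout` plus one extra request or sleep.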
+
+
 class SandboxBackend(ABC):
    """Abstract base for sandbox provisioning backends.

@@ -44,7 +74,7 @@ class SandboxBackend(ABC):
    """

    @abstractmethod
-    def create(self, thread_id: str, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
+    def create(self, thread_id: str | None, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
        """Create/provision a new sandbox.

        Args:
@@ -241,7 +241,7 @@ class LocalContainerBackend(SandboxBackend):

    # ── SandboxBackend interface ──────────────────────────────────────────

-    def create(self, thread_id: str, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
+    def create(self, thread_id: str | None, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
        """Start a new container and return its connection info.

        Args:
@@ -59,7 +59,7 @@ class RemoteSandboxBackend(SandboxBackend):

    def create(
        self,
-        thread_id: str,
+        thread_id: str | None,
        sandbox_id: str,
        extra_mounts: list[tuple[str, str, bool]] | None = None,
    ) -> SandboxInfo:
@@ -132,7 +132,7 @@ class RemoteSandboxBackend(SandboxBackend):
            logger.warning("Provisioner list_running failed: %s", exc)
            return []

-    def _provisioner_create(self, thread_id: str, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
+    def _provisioner_create(self, thread_id: str | None, sandbox_id: str, extra_mounts: list[tuple[str, str, bool]] | None = None) -> SandboxInfo:
        """POST /api/sandboxes → create Pod + Service."""
        try:
            resp = requests.post(
@@ -141,7 +141,7 @@ class ExtensionsConfig(BaseModel):
        try:
            with open(resolved_path, encoding="utf-8") as f:
                config_data = json.load(f)
-            cls.resolve_env_variables(config_data)
+            config_data = cls.resolve_env_variables(config_data)
            return cls.model_validate(config_data)
        except json.JSONDecodeError as e:
            raise ValueError(f"Extensions config file at {resolved_path} is not valid JSON: {e}") from e
@@ -149,7 +149,7 @@ class ExtensionsConfig(BaseModel):
            raise RuntimeError(f"Failed to load extensions config from {resolved_path}: {e}") from e

    @classmethod
-    def resolve_env_variables(cls, config: dict[str, Any]) -> dict[str, Any]:
+    def resolve_env_variables(cls, config: Any) -> Any:
        """Recursively resolve environment variables in the config.

        Environment variables are resolved using the `os.getenv` function. Example: $OPENAI_API_KEY
@@ -160,23 +160,26 @@ class ExtensionsConfig(BaseModel):
        Returns:
            The config with environment variables resolved.
        """
-        for key, value in config.items():
-            if isinstance(value, str):
-                if value.startswith("$"):
-                    env_value = os.getenv(value[1:])
-                    if env_value is None:
-                        # Unresolved placeholder — store empty string so downstream
-                        # consumers (e.g. MCP servers) don't receive the literal "$VAR"
-                        # token as an actual environment value.
-                        config[key] = ""
-                    else:
-                        config[key] = env_value
-                else:
-                    config[key] = value
-            elif isinstance(value, dict):
-                config[key] = cls.resolve_env_variables(value)
-            elif isinstance(value, list):
-                config[key] = [cls.resolve_env_variables(item) if isinstance(item, dict) else item for item in value]
+        if isinstance(config, str):
+            if not config.startswith("$"):
+                return config
+            env_value = os.getenv(config[1:])
+            if env_value is None:
+                # Unresolved placeholder — store empty string so downstream
+                # consumers (e.g. MCP servers) don't receive the literal "$VAR"
+                # token as an actual environment value.
+                return ""
+            return env_value
+
+        if isinstance(config, dict):
+            return {key: cls.resolve_env_variables(value) for key, value in config.items()}
+
+        if isinstance(config, list):
+            return [cls.resolve_env_variables(item) for item in config]
+
+        if isinstance(config, tuple):
+            return tuple(cls.resolve_env_variables(item) for item in config)
+
        return config
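
The behavioural change in `resolve_env_variables` is that it now returns a new structure instead of mutating in place, and it also resolves strings nested in lists and tuples. A standalone sketch of the same recursion follows; `resolve_env` and the `DEMO_*` variable names are hypothetical and exist only for illustration.

```python
import os
from typing import Any


def resolve_env(value: Any) -> Any:
    """Return a copy of ``value`` with every "$NAME" string resolved from the environment."""
    if isinstance(value, str):
        if not value.startswith("$"):
            return value
        # Unset variables resolve to "" rather than leaking the literal "$NAME".
        return os.getenv(value[1:], "")
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    if isinstance(value, tuple):
        return tuple(resolve_env(item) for item in value)
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    os.environ["DEMO_TOKEN"] = "secret"
    os.environ.pop("DEMO_MISSING", None)
    raw = {"env": {"TOKEN": "$DEMO_TOKEN"}, "args": ["--key", "$DEMO_MISSING"]}
    print(resolve_env(raw))
    print(raw)  # the input is left untouched
```

Because the result is a fresh object, callers have to use the return value, which is why the call site in the hunk above now reassigns `config_data`.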

    def get_enabled_mcp_servers(self) -> dict[str, McpServerConfig]:
@@ -51,3 +51,16 @@ def load_title_config_from_dict(config_dict: dict) -> None:
    """Load title configuration from a dictionary."""
    global _title_config
    _title_config = TitleConfig(**config_dict)
+
+
+def reset_title_config() -> None:
+    """Restore the title configuration to its pristine ``TitleConfig()`` default.
+
+    Public API so that tests do not have to reach into the private
+    ``_title_config`` module attribute. ``AppConfig.from_file()`` calls
+    :func:`load_title_config_from_dict`, which permanently mutates the
+    singleton; tests that need a clean slate should call this function
+    between cases.
+    """
+    global _title_config
+    _title_config = TitleConfig()
@@ -147,3 +147,15 @@ def validate_enabled_tracing_providers() -> None:
 def is_tracing_enabled() -> bool:
    """Check if any tracing provider is enabled and fully configured."""
    return get_tracing_config().is_configured
+
+
+def reset_tracing_config() -> None:
+    """Discard the cached :class:`TracingConfig` so the next call rebuilds it.
+
+    Public API so that tests do not have to reach into the private
+    ``_tracing_config`` module attribute. A future internal rename would
+    silently break callers that mutate the attribute directly.
+    """
+    global _tracing_config
+    with _config_lock:
+        _tracing_config = None
@@ -47,11 +47,24 @@ def _enable_stream_usage_by_default(model_use_path: str, model_settings_from_con
        model_settings_from_config["stream_usage"] = True


-def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *, app_config: AppConfig | None = None, **kwargs) -> BaseChatModel:
+def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *, app_config: AppConfig | None = None, attach_tracing: bool = True, **kwargs) -> BaseChatModel:
    """Create a chat model instance from the config.

    Args:
        name: The name of the model to create. If None, the first model in the config will be used.
+        thinking_enabled: Enable the model's extended-thinking mode when supported.
+        app_config: Explicit application config; falls back to the cached global if omitted.
+        attach_tracing: When True (default), attach tracing callbacks (Langfuse,
+            LangSmith) directly to the model instance. Standalone callers — anything
+            that invokes the model outside a LangGraph run that already wires tracing
+            at the invocation root (``MemoryUpdater``, ad-hoc utilities, etc.) — keep
+            this default so the model-level callback still produces traces. Callers
+            that already attach tracing at the graph root (``make_lead_agent``, the
+            in-graph ``TitleMiddleware``) MUST pass ``attach_tracing=False``; otherwise
+            the same LLM call emits duplicate spans (one rooted at the graph, one at
+            the model) and ``session_id`` / ``user_id`` metadata never reach the trace
+            because the model becomes a nested observation whose ``langfuse_*`` keys
+            get stripped.

    Returns:
        A chat model instance.
@@ -149,9 +162,10 @@ def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *

    model_instance = model_class(**kwargs, **model_settings_from_config)

-    callbacks = build_tracing_callbacks()
-    if callbacks:
-        existing_callbacks = model_instance.callbacks or []
-        model_instance.callbacks = [*existing_callbacks, *callbacks]
-        logger.debug(f"Tracing attached to model '{name}' with providers={len(callbacks)}")
+    if attach_tracing:
+        callbacks = build_tracing_callbacks()
+        if callbacks:
+            existing_callbacks = model_instance.callbacks or []
+            model_instance.callbacks = [*existing_callbacks, *callbacks]
+            logger.debug(f"Tracing attached to model '{name}' with providers={len(callbacks)}")
    return model_instance
@@ -13,6 +13,7 @@ from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

 from deerflow.persistence.feedback.model import FeedbackRow
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
+from deerflow.utils.time import coerce_iso


 class FeedbackRepository:
@@ -24,7 +25,8 @@ class FeedbackRepository:
        d = row.to_dict()
        val = d.get("created_at")
        if isinstance(val, datetime):
-            d["created_at"] = val.isoformat()
+            # SQLite drops tzinfo on read; normalize via ``coerce_iso`` so output is always tz-aware.
+            d["created_at"] = coerce_iso(val)
        return d

    async def create(
@@ -17,6 +17,7 @@ from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from deerflow.persistence.run.model import RunRow
 from deerflow.runtime.runs.store.base import RunStore
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
+from deerflow.utils.time import coerce_iso


 class RunRepository(RunStore):
@@ -68,11 +69,13 @@ class RunRepository(RunStore):
        # Remap JSON columns to match RunStore interface
        d["metadata"] = d.pop("metadata_json", {})
        d["kwargs"] = d.pop("kwargs_json", {})
-        # Convert datetime to ISO string for consistency with MemoryRunStore
+        # Convert datetime to ISO string for consistency with MemoryRunStore.
+        # SQLite drops tzinfo on read despite ``DateTime(timezone=True)`` —
+        # ``coerce_iso`` normalizes naive datetimes as UTC.
        for key in ("created_at", "updated_at"):
            val = d.get(key)
            if isinstance(val, datetime):
-                d[key] = val.isoformat()
+                d[key] = coerce_iso(val)
        return d

    async def put(
@@ -13,6 +13,7 @@ from deerflow.persistence.json_compat import json_match
 from deerflow.persistence.thread_meta.base import InvalidMetadataFilterError, ThreadMetaStore
 from deerflow.persistence.thread_meta.model import ThreadMetaRow
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
+from deerflow.utils.time import coerce_iso

 logger = logging.getLogger(__name__)

@@ -28,7 +29,9 @@ class ThreadMetaRepository(ThreadMetaStore):
        for key in ("created_at", "updated_at"):
            val = d.get(key)
            if isinstance(val, datetime):
-                d[key] = val.isoformat()
+                # SQLite drops tzinfo despite ``DateTime(timezone=True)``;
+                # ``coerce_iso`` normalizes naive values as UTC so the wire format always carries tz.
+                d[key] = coerce_iso(val)
        return d

    async def create(
@@ -17,6 +17,7 @@ from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from deerflow.persistence.models.run_event import RunEventRow
 from deerflow.runtime.events.store.base import RunEventStore
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, get_current_user, resolve_user_id
+from deerflow.utils.time import coerce_iso

 logger = logging.getLogger(__name__)

@@ -32,7 +33,9 @@ class DbRunEventStore(RunEventStore):
        d["metadata"] = d.pop("event_metadata", {})
        val = d.get("created_at")
        if isinstance(val, datetime):
-            d["created_at"] = val.isoformat()
+            # SQLite drops tzinfo on read despite ``DateTime(timezone=True)``;
+            # ``coerce_iso`` normalizes naive datetimes as UTC.
+            d["created_at"] = coerce_iso(val)
        d.pop("id", None)
        # Restore structured content that was JSON-serialized on write.
        raw = d.get("content", "")
@@ -258,12 +258,17 @@ class RunManager:
            action: "interrupt" keeps checkpoint, "rollback" reverts to pre-run state.

        Sets the abort event with the action reason and cancels the asyncio task.
-        Returns ``True`` if the run was in-flight and cancellation was initiated.
+        Returns ``True`` if cancellation was initiated **or** the run was already
+        interrupted (idempotent — a second cancel is a no-op success).
+        Returns ``False`` only when the run is unknown to this worker or has
+        reached a terminal state other than interrupted (completed, failed, etc.).
        """
        async with self._lock:
            record = self._runs.get(run_id)
            if record is None:
                return False
+            if record.status == RunStatus.interrupted:
+                return True  # idempotent — already cancelled on this worker
            if record.status not in (RunStatus.pending, RunStatus.running):
                return False
            record.abort_action = action
@@ -0,0 +1,16 @@
+"""Run naming helpers for LangChain/LangSmith tracing."""
+
+from __future__ import annotations
+
+from collections.abc import Mapping
+from typing import Any
+
+
+def resolve_root_run_name(config: Mapping[str, Any], assistant_id: str | None) -> str:
+    for container_name in ("context", "configurable"):
+        container = config.get(container_name)
+        if isinstance(container, Mapping):
+            agent_name = container.get("agent_name")
+            if isinstance(agent_name, str) and agent_name.strip():
+                return agent_name
+    return assistant_id or "lead_agent"
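
Review note: the lookup order in `resolve_root_run_name` can be checked with a minimal sketch. The function body is copied from the new module above so the snippet runs on its own, with `Optional[str]` used in place of `str | None` only to avoid needing the `__future__` import. The agent names (`researcher`, `coder`) and the id `assistant-1` are hypothetical values chosen for illustration.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_root_run_name(config: Mapping[str, Any], assistant_id: Optional[str]) -> str:
    # Copied from the module in this diff so the example is self-contained.
    for container_name in ("context", "configurable"):
        container = config.get(container_name)
        if isinstance(container, Mapping):
            agent_name = container.get("agent_name")
            if isinstance(agent_name, str) and agent_name.strip():
                return agent_name
    return assistant_id or "lead_agent"


# "context" is consulted before "configurable".
from_context = resolve_root_run_name(
    {"context": {"agent_name": "researcher"}, "configurable": {"agent_name": "coder"}},
    assistant_id="assistant-1",
)

# A whitespace-only name is skipped, so the lookup falls through to "configurable".
from_configurable = resolve_root_run_name(
    {"context": {"agent_name": "   "}, "configurable": {"agent_name": "coder"}},
    assistant_id="assistant-1",
)

# With no usable agent_name, assistant_id is used, then the literal default.
from_assistant = resolve_root_run_name({}, assistant_id="assistant-1")
from_default = resolve_root_run_name({}, assistant_id=None)
```

Because the worker applies the result with `config.setdefault("run_name", ...)`, a `run_name` already supplied by the caller is left as-is.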
@@ -19,6 +19,7 @@ import asyncio
 import copy
 import inspect
 import logging
+import os
 from dataclasses import dataclass, field
 from functools import lru_cache
 from typing import TYPE_CHECKING, Any, Literal, cast
@@ -31,8 +32,11 @@ if TYPE_CHECKING:
 from deerflow.config.app_config import AppConfig
 from deerflow.runtime.serialization import serialize
 from deerflow.runtime.stream_bridge import StreamBridge
+from deerflow.runtime.user_context import get_effective_user_id
+from deerflow.tracing import inject_langfuse_metadata

 from .manager import RunManager, RunRecord
+from .naming import resolve_root_run_name
 from .schemas import RunStatus

 logger = logging.getLogger(__name__)
@@ -224,6 +228,22 @@ async def run_agent(
        if journal is not None:
            config.setdefault("callbacks", []).append(journal)

+        # Inject Langfuse trace-attribute metadata so the langchain CallbackHandler
+        # can lift session_id / user_id / trace_name / tags onto the root trace.
+        # Shared helper with ``DeerFlowClient.stream`` so both entry points stay
+        # in sync; caller-provided metadata wins via setdefault inside the helper.
+        inject_langfuse_metadata(
+            config,
+            thread_id=thread_id,
+            user_id=get_effective_user_id(),
+            assistant_id=record.assistant_id,
+            model_name=record.model_name,
+            environment=os.environ.get("DEER_FLOW_ENV") or os.environ.get("ENVIRONMENT"),
+        )
+
+        # Resolve after runtime context installation so context/configurable reflect
+        # the agent name that this run will actually execute.
+        config.setdefault("run_name", resolve_root_run_name(config, record.assistant_id))
        runnable_config = RunnableConfig(**config)
        if ctx.app_config is not None and _agent_factory_supports_app_config(agent_factory):
            agent = agent_factory(config=runnable_config, app_config=ctx.app_config)
@@ -1,4 +1,5 @@
 import errno
+import logging
 import ntpath
 import os
 import shutil
@@ -7,10 +8,13 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import NamedTuple

+from deerflow.config.paths import VIRTUAL_PATH_PREFIX
 from deerflow.sandbox.local.list_dir import list_dir
 from deerflow.sandbox.sandbox import Sandbox
 from deerflow.sandbox.search import GrepMatch, find_glob_matches, find_grep_matches

+logger = logging.getLogger(__name__)
+

 @dataclass(frozen=True)
 class PathMapping:
@@ -379,6 +383,28 @@ class LocalSandbox(Sandbox):
            # Re-raise with the original path for clearer error messages, hiding internal resolved paths
            raise type(e)(e.errno, e.strerror, path) from None

+    def download_file(self, path: str) -> bytes:
+        normalised = path.replace("\\", "/")
+        stripped_path = normalised.lstrip("/")
+        allowed_prefix = VIRTUAL_PATH_PREFIX.lstrip("/")
+        if stripped_path != allowed_prefix and not stripped_path.startswith(f"{allowed_prefix}/"):
+            logger.error("Refused download outside allowed directory: path=%s, allowed_prefix=%s", path, VIRTUAL_PATH_PREFIX)
+            raise PermissionError(errno.EACCES, f"Access denied: path must be under '{VIRTUAL_PATH_PREFIX}'", path)
+
+        resolved_path = self._resolve_path(path)
+        max_download_size = 100 * 1024 * 1024
+        try:
+            file_size = os.path.getsize(resolved_path)
+            if file_size > max_download_size:
+                raise OSError(errno.EFBIG, f"File exceeds maximum download size of {max_download_size} bytes", path)
+            # TOCTOU note: the file could grow between getsize() and read(); accepted
+            # tradeoff since this is a controlled sandbox environment.
+            with open(resolved_path, "rb") as f:
+                return f.read()
+        except OSError as e:
+            # Re-raise with the original path for clearer error messages, hiding internal resolved paths
+            raise type(e)(e.errno, e.strerror, path) from None
+
    def write_file(self, path: str, content: str, append: bool = False) -> None:
        resolved = self._resolve_path_with_mapping(path)
        resolved_path = resolved.path
@@ -63,6 +63,7 @@ class LocalSandboxProvider(SandboxProvider):
    """

    uses_thread_data_mounts = True
+    needs_upload_permission_adjustment = False

    def __init__(self, max_cached_threads: int = DEFAULT_MAX_CACHED_THREAD_SANDBOXES):
        """Initialize the local sandbox provider with static path mappings.
@@ -1,3 +1,4 @@
+import asyncio
 import logging
 from typing import NotRequired, override

@@ -48,6 +49,15 @@ class SandboxMiddleware(AgentMiddleware[SandboxMiddlewareState]):
        logger.info(f"Acquiring sandbox {sandbox_id}")
        return sandbox_id

+    async def _acquire_sandbox_async(self, thread_id: str) -> str:
+        provider = get_sandbox_provider()
+        sandbox_id = await provider.acquire_async(thread_id)
+        logger.info(f"Acquiring sandbox {sandbox_id}")
+        return sandbox_id
+
+    async def _release_sandbox_async(self, sandbox_id: str) -> None:
+        await asyncio.to_thread(get_sandbox_provider().release, sandbox_id)
+
    @override
    def before_agent(self, state: SandboxMiddlewareState, runtime: Runtime) -> dict | None:
        # Skip acquisition if lazy_init is enabled
@@ -64,6 +74,23 @@ class SandboxMiddleware(AgentMiddleware[SandboxMiddlewareState]):
            return {"sandbox": {"sandbox_id": sandbox_id}}
        return super().before_agent(state, runtime)

+    @override
+    async def abefore_agent(self, state: SandboxMiddlewareState, runtime: Runtime) -> dict | None:
+        # Skip acquisition if lazy_init is enabled
+        if self._lazy_init:
+            return await super().abefore_agent(state, runtime)
+
+        # Eager initialization (original behavior), but use the async provider
+        # hook so blocking sandbox startup/polling runs outside the event loop.
+        if "sandbox" not in state or state["sandbox"] is None:
+            thread_id = (runtime.context or {}).get("thread_id")
+            if thread_id is None:
+                return await super().abefore_agent(state, runtime)
+            sandbox_id = await self._acquire_sandbox_async(thread_id)
+            logger.info(f"Assigned sandbox {sandbox_id} to thread {thread_id}")
+            return {"sandbox": {"sandbox_id": sandbox_id}}
+        return await super().abefore_agent(state, runtime)
+
    @override
    def after_agent(self, state: SandboxMiddlewareState, runtime: Runtime) -> dict | None:
        sandbox = state.get("sandbox")
@@ -81,3 +108,21 @@ class SandboxMiddleware(AgentMiddleware[SandboxMiddlewareState]):

        # No sandbox to release
        return super().after_agent(state, runtime)
+
+    @override
+    async def aafter_agent(self, state: SandboxMiddlewareState, runtime: Runtime) -> dict | None:
+        sandbox = state.get("sandbox")
+        if sandbox is not None:
+            sandbox_id = sandbox["sandbox_id"]
+            logger.info(f"Releasing sandbox {sandbox_id}")
+            await self._release_sandbox_async(sandbox_id)
+            return None
+
+        if (runtime.context or {}).get("sandbox_id") is not None:
+            sandbox_id = runtime.context.get("sandbox_id")
+            logger.info(f"Releasing sandbox {sandbox_id} from context")
+            await self._release_sandbox_async(sandbox_id)
+            return None
+
+        # No sandbox to release
+        return await super().aafter_agent(state, runtime)
@@ -39,6 +39,25 @@ class Sandbox(ABC):
        """
        pass

+    @abstractmethod
+    def download_file(self, path: str) -> bytes:
+        """Download the binary content of a file.
+
+        Args:
+            path: The absolute path of the file to download.
+
+        Returns:
+            Raw file bytes.
+
+        Raises:
+            PermissionError: If path traversal is detected or the path is outside
+                the allowed virtual prefix.
+            OSError: If the file cannot be read or does not exist.  Both local
+                and remote implementations must raise ``OSError`` so callers
+                have a single exception type to handle.
+        """
+        pass
+
    @abstractmethod
    def list_dir(self, path: str, max_depth=2) -> list[str]:
        """List the contents of a directory.
@@ -1,3 +1,4 @@
+import asyncio
 from abc import ABC, abstractmethod

 from deerflow.config import get_app_config
@@ -9,6 +10,7 @@ class SandboxProvider(ABC):
    """Abstract base class for sandbox providers"""

    uses_thread_data_mounts: bool = False
+    needs_upload_permission_adjustment: bool = True

    @abstractmethod
    def acquire(self, thread_id: str | None = None) -> str:
@@ -19,6 +21,16 @@ class SandboxProvider(ABC):
        """
        pass

+    async def acquire_async(self, thread_id: str | None = None) -> str:
+        """Acquire a sandbox without blocking the event loop.
+
+        Most sandbox providers expose a synchronous lifecycle API because local
+        Docker/provisioner operations are blocking. Async runtimes should call
+        this method so those blocking operations run in a worker thread instead
+        of stalling the event loop.
+        """
+        return await asyncio.to_thread(self.acquire, thread_id)
+
    @abstractmethod
    def get(self, sandbox_id: str) -> Sandbox | None:
        """Get a sandbox environment by ID.
@@ -1,6 +1,8 @@
+import asyncio
 import posixpath
 import re
 import shlex
+from collections.abc import Callable
 from pathlib import Path

 from langchain.tools import tool
@@ -1111,6 +1113,68 @@ def ensure_sandbox_initialized(runtime: Runtime | None = None) -> Sandbox:
    return sandbox


+async def ensure_sandbox_initialized_async(runtime: Runtime | None = None) -> Sandbox:
+    """Async counterpart to ``ensure_sandbox_initialized`` for tool runtimes.
+
+    This keeps lazy sandbox acquisition on the async provider hook, so AIO
+    sandbox startup and readiness polling do not fall back to synchronous
+    ``provider.acquire()`` during async tool execution.
+    """
+    if runtime is None:
+        raise SandboxRuntimeError("Tool runtime not available")
+
+    if runtime.state is None:
+        raise SandboxRuntimeError("Tool runtime state not available")
+
+    sandbox_state = runtime.state.get("sandbox")
+    if sandbox_state is not None:
+        sandbox_id = sandbox_state.get("sandbox_id")
+        if sandbox_id is not None:
+            sandbox = get_sandbox_provider().get(sandbox_id)
+            if sandbox is not None:
+                if runtime.context is not None:
+                    runtime.context["sandbox_id"] = sandbox_id
+                return sandbox
+
+    thread_id = runtime.context.get("thread_id") if runtime.context else None
+    if thread_id is None:
+        thread_id = runtime.config.get("configurable", {}).get("thread_id") if runtime.config else None
+    if thread_id is None:
+        raise SandboxRuntimeError("Thread ID not available in runtime context")
+
+    provider = get_sandbox_provider()
+    sandbox_id = await provider.acquire_async(thread_id)
+
+    runtime.state["sandbox"] = {"sandbox_id": sandbox_id}
+
+    sandbox = provider.get(sandbox_id)
+    if sandbox is None:
+        raise SandboxNotFoundError("Sandbox not found after acquisition", sandbox_id=sandbox_id)
+
+    if runtime.context is not None:
+        runtime.context["sandbox_id"] = sandbox_id
+    return sandbox
+
+
+async def _run_sync_tool_after_async_sandbox_init(
+    func: Callable[..., str] | None,
+    runtime: Runtime,
+    *args: object,
+) -> str:
+    """Initialize lazily via async provider, then run sync tool body off-thread."""
+    try:
+        await ensure_sandbox_initialized_async(runtime)
+    except SandboxError as e:
+        return f"Error: {e}"
+    except Exception as e:
+        return f"Error: Unexpected error initializing sandbox: {_sanitize_error(e, runtime)}"
+
+    if func is None:
+        return "Error: Tool implementation not available"
+
+    return await asyncio.to_thread(func, runtime, *args)
+
+
 def ensure_thread_directories_exist(runtime: Runtime | None) -> None:
    """Ensure thread data directories (workspace, uploads, outputs) exist.

@@ -1273,6 +1337,13 @@ def bash_tool(runtime: Runtime, description: str, command: str) -> str:
        return f"Error: Unexpected error executing command: {_sanitize_error(e, runtime)}"


+async def _bash_tool_async(runtime: Runtime, description: str, command: str) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(bash_tool.func, runtime, description, command)
+
+
+bash_tool.coroutine = _bash_tool_async
+
+
 @tool("ls", parse_docstring=True)
 def ls_tool(runtime: Runtime, description: str, path: str) -> str:
    """List the contents of a directory up to 2 levels deep in tree format.
@@ -1320,6 +1391,13 @@ def ls_tool(runtime: Runtime, description: str, path: str) -> str:
        return f"Error: Unexpected error listing directory: {_sanitize_error(e, runtime)}"


+async def _ls_tool_async(runtime: Runtime, description: str, path: str) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(ls_tool.func, runtime, description, path)
+
+
+ls_tool.coroutine = _ls_tool_async
+
+
 @tool("glob", parse_docstring=True)
 def glob_tool(
    runtime: Runtime,
@@ -1370,6 +1448,28 @@ def glob_tool(
        return f"Error: Unexpected error searching paths: {_sanitize_error(e, runtime)}"


+async def _glob_tool_async(
+    runtime: Runtime,
+    description: str,
+    pattern: str,
+    path: str,
+    include_dirs: bool = False,
+    max_results: int = _DEFAULT_GLOB_MAX_RESULTS,
+) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(
+        glob_tool.func,
+        runtime,
+        description,
+        pattern,
+        path,
+        include_dirs,
+        max_results,
+    )
+
+
+glob_tool.coroutine = _glob_tool_async
+
+
 @tool("grep", parse_docstring=True)
 def grep_tool(
    runtime: Runtime,
@@ -1440,6 +1540,32 @@ def grep_tool(
        return f"Error: Unexpected error searching file contents: {_sanitize_error(e, runtime)}"


+async def _grep_tool_async(
+    runtime: Runtime,
+    description: str,
+    pattern: str,
+    path: str,
+    glob: str | None = None,
+    literal: bool = False,
+    case_sensitive: bool = False,
+    max_results: int = _DEFAULT_GREP_MAX_RESULTS,
+) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(
+        grep_tool.func,
+        runtime,
+        description,
+        pattern,
+        path,
+        glob,
+        literal,
+        case_sensitive,
+        max_results,
+    )
+
+
+grep_tool.coroutine = _grep_tool_async
+
+
 @tool("read_file", parse_docstring=True)
 def read_file_tool(
    runtime: Runtime,
@@ -1495,6 +1621,19 @@ def read_file_tool(
        return f"Error: Unexpected error reading file: {_sanitize_error(e, runtime)}"


+async def _read_file_tool_async(
+    runtime: Runtime,
+    description: str,
+    path: str,
+    start_line: int | None = None,
+    end_line: int | None = None,
+) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(read_file_tool.func, runtime, description, path, start_line, end_line)
+
+
+read_file_tool.coroutine = _read_file_tool_async
+
+
 @tool("write_file", parse_docstring=True)
 def write_file_tool(
    runtime: Runtime,
@@ -1536,6 +1675,19 @@ def write_file_tool(
        return f"Error: Unexpected error writing file: {_sanitize_error(e, runtime)}"


+async def _write_file_tool_async(
+    runtime: Runtime,
+    description: str,
+    path: str,
+    content: str,
+    append: bool = False,
+) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(write_file_tool.func, runtime, description, path, content, append)
+
+
+write_file_tool.coroutine = _write_file_tool_async
+
+
 @tool("str_replace", parse_docstring=True)
 def str_replace_tool(
    runtime: Runtime,
@@ -1585,3 +1737,25 @@ def str_replace_tool(
        return f"Error: Permission denied accessing file: {requested_path}"
    except Exception as e:
        return f"Error: Unexpected error replacing string: {_sanitize_error(e, runtime)}"
+
+
+async def _str_replace_tool_async(
+    runtime: Runtime,
+    description: str,
+    path: str,
+    old_str: str,
+    new_str: str,
+    replace_all: bool = False,
+) -> str:
+    return await _run_sync_tool_after_async_sandbox_init(
+        str_replace_tool.func,
+        runtime,
+        description,
+        path,
+        old_str,
+        new_str,
+        replace_all,
+    )
+
+
+str_replace_tool.coroutine = _str_replace_tool_async
@@ -383,9 +383,6 @@ async def task_tool(
            # Polling timeout as a safety net (in case thread pool timeout doesn't work)
            # Set to execution timeout + 60s buffer, in 5s poll intervals
            # This catches edge cases where the background task gets stuck
-            # Note: We don't call cleanup_background_task here because the task may
-            # still be running in the background. The cleanup will happen when the
-            # executor completes and sets a terminal status.
            if poll_count > max_poll_count:
                timeout_minutes = config.timeout_seconds // 60
                logger.error(f"[trace={trace_id}] Task {task_id} polling timed out after {poll_count} polls (should have been caught by thread pool timeout)")
@@ -393,6 +390,11 @@ async def task_tool(
                usage = _summarize_usage(getattr(result, "token_usage_records", None))
                _cache_subagent_usage(tool_call_id, usage, enabled=cache_token_usage)
                writer({"type": "task_timed_out", "task_id": task_id, "usage": usage})
+                # The task may still be running in the background. Signal cooperative
+                # cancellation and schedule deferred cleanup to remove the entry from
+                # _background_tasks once the background thread reaches a terminal state.
+                request_cancel_background_task(task_id)
+                _schedule_deferred_subagent_cleanup(task_id, trace_id, max_poll_count)
                return f"Task polling timed out after {timeout_minutes} minutes. This may indicate the background task is stuck. Status: {result.status.value}"
    except asyncio.CancelledError:
        # Signal the background subagent thread to stop cooperatively.
@@ -3,9 +3,13 @@
 import asyncio
 import atexit
 import concurrent.futures
+import contextvars
+import functools
 import logging
 from collections.abc import Callable
-from typing import Any
+from typing import Any, get_type_hints
+
+from langchain_core.runnables import RunnableConfig

 logger = logging.getLogger(__name__)

@@ -15,10 +19,49 @@ _SYNC_TOOL_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=10, thre
 atexit.register(lambda: _SYNC_TOOL_EXECUTOR.shutdown(wait=False))


-def make_sync_tool_wrapper(coro: Callable[..., Any], tool_name: str) -> Callable[..., Any]:
-    """Build a synchronous wrapper for an asynchronous tool coroutine."""
+def _get_runnable_config_param(func: Callable[..., Any]) -> str | None:
+    """Return the coroutine parameter that expects LangChain RunnableConfig."""
+    if isinstance(func, functools.partial):
+        func = func.func

-    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
+    try:
+        type_hints = get_type_hints(func)
+    except Exception:
+        return None
+
+    for name, type_ in type_hints.items():
+        if type_ is RunnableConfig:
+            return name
+    return None
+
+
+def make_sync_tool_wrapper(coro: Callable[..., Any], tool_name: str) -> Callable[..., Any]:
+    """Build a synchronous wrapper for an asynchronous tool coroutine.
+
+    Args:
+        coro: Async callable backing a LangChain tool.
+        tool_name: Tool name used in error logs.
+
+    Returns:
+        A sync callable suitable for ``BaseTool.func``.
+
+    Notes:
+        If ``coro`` declares a ``RunnableConfig`` parameter, this wrapper
+        exposes ``config: RunnableConfig`` so LangChain can inject runtime
+        config and then forwards it to the coroutine's detected config
+        parameter. This covers DeerFlow's current config-sensitive tools, such
+        as ``invoke_acp_agent``.
+
+        This wrapper intentionally does not synthesize a dynamic function
+        signature. A future async tool with a normal user-facing argument named
+        ``config`` and a separate ``RunnableConfig`` parameter named something
+        else, such as ``run_config``, may collide with LangChain's injected
+        ``config`` argument. Rename that user-facing field or extend this
+        helper before using that signature.
+    """
+    config_param = _get_runnable_config_param(coro)
+
+    def run_coroutine(*args: Any, **kwargs: Any) -> Any:
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
@@ -26,11 +69,24 @@ def make_sync_tool_wrapper(coro: Callable[..., Any], tool_name: str) -> Callable

        try:
            if loop is not None and loop.is_running():
-                future = _SYNC_TOOL_EXECUTOR.submit(asyncio.run, coro(*args, **kwargs))
+                context = contextvars.copy_context()
+                future = _SYNC_TOOL_EXECUTOR.submit(context.run, lambda: asyncio.run(coro(*args, **kwargs)))
                return future.result()
            return asyncio.run(coro(*args, **kwargs))
        except Exception as e:
            logger.error("Error invoking tool %r via sync wrapper: %s", tool_name, e, exc_info=True)
            raise

+    if config_param:
+
+        def sync_wrapper(*args: Any, config: RunnableConfig = None, **kwargs: Any) -> Any:
+            if config is not None or config_param not in kwargs:
+                kwargs[config_param] = config
+            return run_coroutine(*args, **kwargs)
+
+        return sync_wrapper
+
+    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
+        return run_coroutine(*args, **kwargs)
+
    return sync_wrapper
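
Review note: the switch to `contextvars.copy_context()` matters because pool worker threads do not normally see `ContextVar` values set by the submitting thread. The following is a minimal sketch of that technique, not the project's code: the `request_id` variable, the `read_request_id` coroutine, and the `run_in_worker` helper are hypothetical names introduced only for this illustration.

```python
import asyncio
import concurrent.futures
import contextvars

# Hypothetical context variable standing in for per-request state.
request_id = contextvars.ContextVar("request_id", default="unset")


async def read_request_id() -> str:
    # asyncio.run() executes this in a task that inherits the context
    # active in the thread that called asyncio.run().
    return request_id.get()


def run_in_worker(executor: concurrent.futures.Executor, propagate: bool) -> str:
    if propagate:
        # Snapshot the caller's context and run the worker callable inside it,
        # mirroring the pattern used in the diff above.
        context = contextvars.copy_context()
        future = executor.submit(context.run, lambda: asyncio.run(read_request_id()))
    else:
        # Plain submit: the worker thread uses its own context.
        future = executor.submit(lambda: asyncio.run(read_request_id()))
    return future.result()


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    request_id.set("req-42")
    propagated = run_in_worker(executor, propagate=True)
    bare = run_in_worker(executor, propagate=False)
```

Here `propagated` carries the value set by the submitting thread. `bare` typically falls back to the default, although that depends on how the interpreter is configured to start threads, so the sketch makes no claim about it.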
@@ -205,7 +205,7 @@ def get_available_tools(
    # Deduplicate by tool name — config-loaded tools take priority, followed by
    # built-ins, MCP tools, and ACP tools.  Duplicate names cause the LLM to
    # receive ambiguous or concatenated function schemas (issue #1803).
-    all_tools = loaded_tools + builtin_tools + mcp_tools + acp_tools
+    all_tools = [_ensure_sync_invocable_tool(t) for t in loaded_tools + builtin_tools + mcp_tools + acp_tools]
    seen_names: set[str] = set()
    unique_tools: list[BaseTool] = []
    for t in all_tools:
@@ -1,3 +1,8 @@
 from .factory import build_tracing_callbacks
+from .metadata import build_langfuse_trace_metadata, inject_langfuse_metadata

-__all__ = ["build_tracing_callbacks"]
+__all__ = [
+    "build_langfuse_trace_metadata",
+    "build_tracing_callbacks",
+    "inject_langfuse_metadata",
+]
@@ -0,0 +1,105 @@
+"""Langfuse trace-attribute metadata builders.
+
+The Langfuse v4 ``langchain.CallbackHandler`` lifts a fixed set of reserved
+keys from ``RunnableConfig.metadata`` onto the root trace:
+
+- ``langfuse_session_id`` → groups traces (LangGraph thread → Langfuse Session)
+- ``langfuse_user_id``    → trace user_id (powers the Users page)
+- ``langfuse_trace_name`` → human-readable trace name
+- ``langfuse_tags``       → trace tags
+
+See ``langfuse/langchain/CallbackHandler.py::_parse_langfuse_trace_attributes``
+and https://langfuse.com/docs/observability/features/sessions for the
+contract. Builders here exist so the gateway/run worker can inject the
+right metadata without leaking Langfuse internals into the call sites.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+from deerflow.config import get_enabled_tracing_providers
+
+# ``DEFAULT_USER_ID`` is imported lazily below to avoid a circular import:
+# ``deerflow.runtime`` eagerly imports the run worker, which needs ``deerflow.tracing``.
+_DEFAULT_TRACE_NAME = "lead-agent"
+
+
+def build_langfuse_trace_metadata(
+    *,
+    thread_id: str | None,
+    user_id: str | None = None,
+    assistant_id: str | None = None,
+    model_name: str | None = None,
+    environment: str | None = None,
+) -> dict[str, Any]:
+    """Return Langfuse trace-attribute metadata for ``RunnableConfig.metadata``.
+
+    Returns ``{}`` when Langfuse is not in the enabled tracing providers so
+    callers can unconditionally merge the result without affecting LangSmith
+    or other tracers.
+
+    Args:
+        thread_id: LangGraph thread id; mapped to ``langfuse_session_id``.
+        user_id: Effective user id; falls back to ``DEFAULT_USER_ID`` when
+            ``None`` so the Langfuse Users page works in no-auth mode.
+        assistant_id: Agent identifier, mapped to ``langfuse_trace_name``; defaults to ``"lead-agent"``.
+        model_name: Model name; emitted as ``model:<name>`` in ``langfuse_tags``.
+        environment: Deployment env (e.g. ``"production"``); emitted as
+            ``env:<value>`` in ``langfuse_tags``.
+    """
+    if "langfuse" not in get_enabled_tracing_providers():
+        return {}
+
+    from deerflow.runtime.user_context import DEFAULT_USER_ID
+
+    metadata: dict[str, Any] = {
+        "langfuse_session_id": thread_id,
+        "langfuse_user_id": user_id or DEFAULT_USER_ID,
+        "langfuse_trace_name": assistant_id or _DEFAULT_TRACE_NAME,
+    }
+
+    tags: list[str] = []
+    if environment:
+        tags.append(f"env:{environment}")
+    if model_name:
+        tags.append(f"model:{model_name}")
+    if tags:
+        metadata["langfuse_tags"] = tags
+
+    return metadata
+
+
+def inject_langfuse_metadata(
+    config: dict,
+    *,
+    thread_id: str | None,
+    user_id: str | None = None,
+    assistant_id: str | None = None,
+    model_name: str | None = None,
+    environment: str | None = None,
+) -> None:
+    """Merge Langfuse trace-attribute metadata into ``config["metadata"]``.
+
+    Shared by the gateway worker (``runtime/runs/worker.py``) and the
+    embedded client (``client.py``) so the two paths cannot drift apart.
+
+    Caller-supplied metadata wins via ``setdefault`` — an upstream value
+    for e.g. ``langfuse_session_id`` set by the frontend stays untouched.
+    The ``config`` dict is mutated in place; the call is a no-op when
+    Langfuse is not in the enabled tracing providers.
+    """
+    langfuse_metadata = build_langfuse_trace_metadata(
+        thread_id=thread_id,
+        user_id=user_id,
+        assistant_id=assistant_id,
+        model_name=model_name,
+        environment=environment,
+    )
+    if not langfuse_metadata:
+        return
+
+    merged_metadata = dict(config.get("metadata") or {})
+    for key, value in langfuse_metadata.items():
+        merged_metadata.setdefault(key, value)
+    config["metadata"] = merged_metadata
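
Review note: the "caller-supplied metadata wins" behaviour comes entirely from `dict.setdefault`. A minimal sketch of the merge step follows, assuming a hypothetical `merge_defaults` helper and made-up ids (`frontend-session`, `thread-123`, `default`). It mirrors the last four lines of `inject_langfuse_metadata` but does not consult the tracing configuration.

```python
from typing import Any


def merge_defaults(config: dict[str, Any], defaults: dict[str, Any]) -> None:
    """Hypothetical helper: fill config["metadata"] without overriding the caller."""
    merged = dict(config.get("metadata") or {})
    for key, value in defaults.items():
        # setdefault only writes when the key is absent.
        merged.setdefault(key, value)
    config["metadata"] = merged


defaults = {"langfuse_session_id": "thread-123", "langfuse_user_id": "default"}

# A caller-provided session id is kept; the missing user id is filled in.
with_caller_value = {"metadata": {"langfuse_session_id": "frontend-session"}}
merge_defaults(with_caller_value, defaults)

# A missing or None metadata container is treated as empty.
without_metadata = {"metadata": None}
merge_defaults(without_metadata, defaults)

# A key that is present with an explicit None is also left untouched.
with_explicit_none = {"metadata": {"langfuse_user_id": None}}
merge_defaults(with_explicit_none, defaults)
```

The last case is worth noting in review: an upstream caller that sends an explicit `None` for a reserved key suppresses the default for that key.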
@@ -176,6 +176,31 @@ def _reset_skill_storage_singleton():
        reset_skill_storage()


+@pytest.fixture(autouse=True)
+def _restore_title_config_singleton():
+    """Reset ``_title_config`` to its pristine default after every test.
+
+    ``AppConfig.from_file()`` writes the on-disk ``title`` block into the
+    module-level singleton (``config/app_config.py`` calls
+    ``load_title_config_from_dict``). Any test that loads the real
+    ``config.yaml`` therefore leaves the singleton in a state that
+    ``test_title_middleware_core_logic.py`` does not expect; that suite
+    relies on the pristine ``TitleConfig()`` default (``enabled=True``).
+    We restore the default after every test so test files stay
+    independent regardless of order.
+    """
+    try:
+        from deerflow.config.title_config import reset_title_config
+    except ImportError:
+        yield
+        return
+
+    try:
+        yield
+    finally:
+        reset_title_config()
+
+
 @pytest.fixture(autouse=True)
 def _auto_user_context(request):
    """Inject a default ``test-user-autouse`` into the contextvar.
@@ -0,0 +1,507 @@
+#!/usr/bin/env python3
+"""Inventory async/thread boundary points for developer review.
+
+This detector is intentionally non-invasive: it parses Python source with AST
+and reports places where code crosses sync/async/thread boundaries. Findings
+are review evidence, not automatic bug decisions.
+"""
+
+from __future__ import annotations
+
+import argparse
+import ast
+import json
+import os
+import sys
+from collections.abc import Iterable, Sequence
+from dataclasses import asdict, dataclass
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[4]
+DEFAULT_SCAN_PATHS = (
+    REPO_ROOT / "backend" / "app",
+    REPO_ROOT / "backend" / "packages" / "harness" / "deerflow",
+)
+IGNORED_DIR_NAMES = {
+    ".git",
+    ".mypy_cache",
+    ".pytest_cache",
+    ".ruff_cache",
+    ".venv",
+    "__pycache__",
+    "node_modules",
+}
+SEVERITY_ORDER = {"INFO": 0, "WARN": 1, "FAIL": 2}
+
+
+@dataclass(frozen=True)
+class BoundaryFinding:
+    severity: str
+    category: str
+    path: str
+    line: int
+    column: int
+    function: str
+    async_context: bool
+    symbol: str
+    message: str
+    code: str
+
+    def to_dict(self) -> dict[str, object]:
+        return asdict(self)
+
+
+@dataclass(frozen=True)
+class _FunctionContext:
+    name: str
+    is_async: bool
+
+
+@dataclass(frozen=True)
+class _CallRule:
+    severity: str
+    category: str
+    message: str
+
+
+EXACT_CALL_RULES: dict[str, _CallRule] = {
+    "asyncio.run": _CallRule(
+        "WARN",
+        "SYNC_ASYNC_BRIDGE",
+        "Runs a coroutine from synchronous code by creating an event loop boundary.",
+    ),
+    "asyncio.to_thread": _CallRule(
+        "INFO",
+        "ASYNC_THREAD_OFFLOAD",
+        "Offloads synchronous work from an async context into a worker thread.",
+    ),
+    "asyncio.new_event_loop": _CallRule(
+        "WARN",
+        "NEW_EVENT_LOOP",
+        "Creates a separate event loop; review resource ownership across loops.",
+    ),
+    "asyncio.run_coroutine_threadsafe": _CallRule(
+        "WARN",
+        "CROSS_THREAD_COROUTINE",
+        "Submits a coroutine to an event loop from another thread.",
+    ),
+    "concurrent.futures.ThreadPoolExecutor": _CallRule(
+        "INFO",
+        "THREAD_POOL",
+        "Creates a thread pool boundary.",
+    ),
+    "threading.Thread": _CallRule(
+        "INFO",
+        "RAW_THREAD",
+        "Creates a raw thread; ContextVar values do not propagate automatically.",
+    ),
+    "threading.Timer": _CallRule(
+        "INFO",
+        "RAW_TIMER_THREAD",
+        "Creates a timer-backed raw thread; ContextVar values do not propagate automatically.",
+    ),
+    "make_sync_tool_wrapper": _CallRule(
+        "INFO",
+        "SYNC_TOOL_WRAPPER",
+        "Adapts an async tool coroutine for synchronous tool invocation.",
+    ),
+}
+THREAD_POOL_CONSTRUCTORS = {"concurrent.futures.ThreadPoolExecutor"}
+ASYNC_TOOL_FACTORY_CALLS = {
+    "StructuredTool.from_function",
+    "langchain.tools.StructuredTool.from_function",
+    "langchain_core.tools.StructuredTool.from_function",
+}
+LANGCHAIN_INVOKE_RECEIVER_NAMES = {
+    "agent",
+    "chain",
+    "chat_model",
+    "graph",
+    "llm",
+    "model",
+    "runnable",
+}
+LANGCHAIN_INVOKE_RECEIVER_SUFFIXES = (
+    "_agent",
+    "_chain",
+    "_graph",
+    "_llm",
+    "_model",
+    "_runnable",
+)
+
+ASYNC_BLOCKING_CALL_RULES: dict[str, _CallRule] = {
+    "time.sleep": _CallRule(
+        "WARN",
+        "BLOCKING_CALL_IN_ASYNC",
+        "Blocks the event loop when called directly inside async code.",
+    ),
+    "subprocess.run": _CallRule(
+        "WARN",
+        "BLOCKING_SUBPROCESS_IN_ASYNC",
+        "Runs a blocking subprocess from async code.",
+    ),
+    "subprocess.check_call": _CallRule(
+        "WARN",
+        "BLOCKING_SUBPROCESS_IN_ASYNC",
+        "Runs a blocking subprocess from async code.",
+    ),
+    "subprocess.check_output": _CallRule(
+        "WARN",
+        "BLOCKING_SUBPROCESS_IN_ASYNC",
+        "Runs a blocking subprocess from async code.",
+    ),
+    "subprocess.Popen": _CallRule(
+        "WARN",
+        "BLOCKING_SUBPROCESS_IN_ASYNC",
+        "Starts a subprocess from async code; review whether it blocks later.",
+    ),
+}
+
+
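+def dotted_name(node: ast.AST | None) -> str | None:
+    """Return the dotted form of a Name/Attribute chain (e.g. ``a.b.c``), or None for other nodes."""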
+    if isinstance(node, ast.Name):
+        return node.id
+    if isinstance(node, ast.Attribute):
+        parent = dotted_name(node.value)
+        if parent:
+            return f"{parent}.{node.attr}"
+        return node.attr
+    return None
+
+
+def call_receiver_name(node: ast.Call) -> str | None:
+    if not isinstance(node.func, ast.Attribute):
+        return None
+    return dotted_name(node.func.value)
+
+
+def is_none_node(node: ast.AST | None) -> bool:
+    return isinstance(node, ast.Constant) and node.value is None
+
+
+class BoundaryVisitor(ast.NodeVisitor):
+    def __init__(self, path: Path, relative_path: str, source_lines: Sequence[str]) -> None:
+        self.path = path
+        self.relative_path = relative_path
+        self.source_lines = source_lines
+        self.findings: list[BoundaryFinding] = []
+        self.function_stack: list[_FunctionContext] = []
+        self.import_aliases: dict[str, str] = {}
+        self.executor_names: set[str] = set()
+
+    @property
+    def current_function(self) -> str:
+        if not self.function_stack:
+            return "<module>"
+        return ".".join(context.name for context in self.function_stack)
+
+    @property
+    def in_async_context(self) -> bool:
+        """True only when the innermost enclosing function is ``async def``."""
+        return bool(self.function_stack and self.function_stack[-1].is_async)
+
+    def visit_Import(self, node: ast.Import) -> None:
+        for alias in node.names:
+            local_name = alias.asname or alias.name.split(".", 1)[0]
+            canonical_name = alias.name if alias.asname else local_name
+            self.import_aliases[local_name] = canonical_name
+
+    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
+        if node.module is None:
+            return
+        for alias in node.names:
+            local_name = alias.asname or alias.name
+            self.import_aliases[local_name] = f"{node.module}.{alias.name}"
+
+    def visit_Assign(self, node: ast.Assign) -> None:
+        self._record_executor_targets(node.value, node.targets)
+        self.generic_visit(node)
+
+    def visit_AnnAssign(self, node: ast.AnnAssign) -> None:
+        if node.value is not None:
+            self._record_executor_targets(node.value, [node.target])
+        self.generic_visit(node)
+
+    def visit_With(self, node: ast.With) -> None:
+        for item in node.items:
+            if item.optional_vars is not None:
+                self._record_executor_targets(item.context_expr, [item.optional_vars])
+        self.generic_visit(node)
+
+    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
+        self.function_stack.append(_FunctionContext(node.name, is_async=False))
+        try:
+            self.generic_visit(node)
+        finally:
+            # Mirror visit_AsyncFunctionDef so the stack stays balanced on errors.
+            self.function_stack.pop()
+
+    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
+        self.function_stack.append(_FunctionContext(node.name, is_async=True))
+        try:
+            self._check_async_tool_definition(node)
+            self.generic_visit(node)
+        finally:
+            self.function_stack.pop()
+
+    def visit_Call(self, node: ast.Call) -> None:
+        call_name = self._canonical_name(dotted_name(node.func))
+        if call_name:
+            self._check_call(node, call_name)
+        self.generic_visit(node)
+
+    def _check_async_tool_definition(self, node: ast.AsyncFunctionDef) -> None:
+        for decorator in node.decorator_list:
+            decorator_call = decorator.func if isinstance(decorator, ast.Call) else decorator
+            decorator_name = self._canonical_name(dotted_name(decorator_call))
+            if decorator_name in {"langchain.tools.tool", "langchain_core.tools.tool"}:
+                self._emit(
+                    node,
+                    severity="INFO",
+                    category="ASYNC_TOOL_DEFINITION",
+                    symbol=decorator_name,
+                    message="Defines an async LangChain tool; sync clients need a wrapper before invoke().",
+                )
+                return
+
+    def _check_call(self, node: ast.Call, call_name: str) -> None:
+        rule = EXACT_CALL_RULES.get(call_name)
+        if rule:
+            self._emit_rule(node, call_name, rule)
+
+        if call_name.endswith(".run_until_complete"):
+            self._emit(
+                node,
+                severity="WARN",
+                category="RUN_UNTIL_COMPLETE",
+                symbol=call_name,
+                message="Drives an event loop from synchronous code; review nested-loop behavior.",
+            )
+
+        if self._is_executor_submit(node, call_name):
+            self._emit(
+                node,
+                severity="INFO",
+                category="EXECUTOR_SUBMIT",
+                symbol=call_name,
+                message="Submits work to an executor; review context propagation and cancellation.",
+            )
+
+        if call_name in ASYNC_TOOL_FACTORY_CALLS:
+            if any(keyword.arg == "coroutine" and not is_none_node(keyword.value) for keyword in node.keywords):
+                self._emit(
+                    node,
+                    severity="INFO",
+                    category="ASYNC_ONLY_TOOL_FACTORY",
+                    symbol=call_name,
+                    message="Creates a StructuredTool from a coroutine; sync clients need a wrapper.",
+                )
+
+        if self.in_async_context and call_name in ASYNC_BLOCKING_CALL_RULES:
+            self._emit_rule(node, call_name, ASYNC_BLOCKING_CALL_RULES[call_name])
+
+        if self.in_async_context and self._is_langchain_invoke(node, call_name, method_name="invoke"):
+            self._emit(
+                node,
+                severity="WARN",
+                category="SYNC_INVOKE_IN_ASYNC",
+                symbol=call_name,
+                message="Calls a synchronous invoke() from async code; review event-loop blocking.",
+            )
+
+        if not self.in_async_context and self._is_langchain_invoke(node, call_name, method_name="ainvoke"):
+            self._emit(
+                node,
+                severity="WARN",
+                category="ASYNC_INVOKE_IN_SYNC",
+                symbol=call_name,
+                message="Calls async ainvoke() from sync code; review how the coroutine is awaited.",
+            )
+
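+    def _canonical_name(self, name: str | None) -> str | None:
+        """Expand a leading import alias to its canonical path (e.g. ``cf.X`` -> ``concurrent.futures.X``)."""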
+        if name is None:
+            return None
+        parts = name.split(".")
+        if parts and parts[0] in self.import_aliases:
+            return ".".join((self.import_aliases[parts[0]], *parts[1:]))
+        return name
+
+    def _record_executor_targets(self, value: ast.AST, targets: Sequence[ast.AST]) -> None:
+        if not isinstance(value, ast.Call):
+            return
+        call_name = self._canonical_name(dotted_name(value.func))
+        if call_name not in THREAD_POOL_CONSTRUCTORS:
+            return
+        for target in targets:
+            for name in self._target_names(target):
+                self.executor_names.add(name)
+
+    def _target_names(self, target: ast.AST) -> Iterable[str]:
+        if isinstance(target, ast.Name):
+            yield target.id
+        elif isinstance(target, (ast.Tuple, ast.List)):
+            for element in target.elts:
+                yield from self._target_names(element)
+
+    def _is_executor_submit(self, node: ast.Call, call_name: str) -> bool:
+        if not call_name.endswith(".submit"):
+            return False
+        receiver_name = call_receiver_name(node)
+        return receiver_name in self.executor_names
+
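+    def _is_langchain_invoke(self, node: ast.Call, call_name: str, *, method_name: str) -> bool:
+        """Heuristically match ``<receiver>.<method_name>()`` by the receiver's leaf name (name-based, not type-based)."""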
+        if not call_name.endswith(f".{method_name}"):
+            return False
+        receiver_name = call_receiver_name(node)
+        if receiver_name is None:
+            return False
+        receiver_leaf = receiver_name.rsplit(".", 1)[-1]
+        return receiver_leaf in LANGCHAIN_INVOKE_RECEIVER_NAMES or receiver_leaf.endswith(LANGCHAIN_INVOKE_RECEIVER_SUFFIXES)
+
+    def _emit_rule(self, node: ast.AST, symbol: str, rule: _CallRule) -> None:
+        self._emit(
+            node,
+            severity=rule.severity,
+            category=rule.category,
+            symbol=symbol,
+            message=rule.message,
+        )
+
+    def _emit(self, node: ast.AST, *, severity: str, category: str, symbol: str, message: str) -> None:
+        line = getattr(node, "lineno", 0)
+        column = getattr(node, "col_offset", 0)
+        code = ""
+        if 0 < line <= len(self.source_lines):
+            code = self.source_lines[line - 1].strip()
+        self.findings.append(
+            BoundaryFinding(
+                severity=severity,
+                category=category,
+                path=self.relative_path,
+                line=line,
+                column=column,
+                function=self.current_function,
+                async_context=self.in_async_context,
+                symbol=symbol,
+                message=message,
+                code=code,
+            )
+        )
+
+
+def relative_to_repo(path: Path, repo_root: Path = REPO_ROOT) -> str:
+    try:
+        return path.resolve().relative_to(repo_root.resolve()).as_posix()
+    except ValueError:
+        return path.as_posix()
+
+
+def scan_file(path: Path, *, repo_root: Path = REPO_ROOT) -> list[BoundaryFinding]:
+    source = path.read_text(encoding="utf-8")
+    source_lines = source.splitlines()
+    relative_path = relative_to_repo(path, repo_root)
+    try:
+        tree = ast.parse(source, filename=str(path))
+    except SyntaxError as exc:
+        line = exc.lineno or 0
+        code = source_lines[line - 1].strip() if 0 < line <= len(source_lines) else ""
+        return [
+            BoundaryFinding(
+                severity="WARN",
+                category="PARSE_ERROR",
+                path=relative_path,
+                line=line,
+                column=max((exc.offset or 1) - 1, 0),
+                function="<module>",
+                async_context=False,
+                symbol="SyntaxError",
+                message=str(exc),
+                code=code,
+            )
+        ]
+
+    visitor = BoundaryVisitor(path, relative_path, source_lines)
+    visitor.visit(tree)
+    return visitor.findings
+
+
+def is_ignored_path(path: Path) -> bool:
+    return any(part in IGNORED_DIR_NAMES for part in path.parts)
+
+
+def iter_python_files(paths: Iterable[Path]) -> Iterable[Path]:
+    for path in paths:
+        if not path.exists() or is_ignored_path(path):
+            continue
+        if path.is_file():
+            if path.suffix == ".py":
+                yield path
+            continue
+        for dirpath, dirnames, filenames in os.walk(path):
+            dirnames[:] = [dirname for dirname in dirnames if dirname not in IGNORED_DIR_NAMES]
+            for filename in filenames:
+                if filename.endswith(".py"):
+                    yield Path(dirpath) / filename
+
+
+def scan_paths(paths: Iterable[Path], *, repo_root: Path = REPO_ROOT) -> list[BoundaryFinding]:
+    findings: list[BoundaryFinding] = []
+    for path in sorted(iter_python_files(paths)):
+        findings.extend(scan_file(path, repo_root=repo_root))
+    return sorted(findings, key=lambda finding: (finding.path, finding.line, finding.column, finding.category))
+
+
+def filter_findings(findings: Iterable[BoundaryFinding], min_severity: str) -> list[BoundaryFinding]:
+    threshold = SEVERITY_ORDER[min_severity]
+    return [finding for finding in findings if SEVERITY_ORDER[finding.severity] >= threshold]
+
+
+def format_text(findings: Sequence[BoundaryFinding]) -> str:
+    if not findings:
+        return "No async/thread boundary findings."
+
+    lines: list[str] = []
+    for finding in findings:
+        lines.append(f"{finding.severity} {finding.category} {finding.path}:{finding.line}:{finding.column + 1} in {finding.function} async={str(finding.async_context).lower()}")
+        lines.append(f"  symbol: {finding.symbol}")
+        lines.append(f"  note: {finding.message}")
+        if finding.code:
+            lines.append(f"  code: {finding.code}")
+    return "\n".join(lines)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Detect async/thread boundary points for developer review. Findings are an inventory, not automatic bug decisions.")
+    parser.add_argument(
+        "paths",
+        nargs="*",
+        type=Path,
+        help="Files or directories to scan. Defaults to backend app and harness sources.",
+    )
+    parser.add_argument(
+        "--format",
+        choices=("text", "json"),
+        default="text",
+        help="Output format.",
+    )
+    parser.add_argument(
+        "--min-severity",
+        choices=tuple(SEVERITY_ORDER),
+        default="INFO",
+        help="Only show findings at or above this severity.",
+    )
+    return parser
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+    paths = args.paths or list(DEFAULT_SCAN_PATHS)
+    findings = filter_findings(scan_paths(paths), args.min_severity)
+
+    if args.format == "json":
+        print(json.dumps([finding.to_dict() for finding in findings], indent=2, sort_keys=True))
+    else:
+        print(format_text(findings))
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
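Review note: the detector above depends on resolving import aliases to canonical dotted names before looking up rules. The following is a minimal standalone sketch of that idea, not the script itself. `dotted` and `resolve_calls` are hypothetical names introduced for illustration, and unlike `BoundaryVisitor` this sketch collects all imports up front instead of in visitation order.

```python
import ast


def dotted(node):
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        parent = dotted(node.value)
        return f"{parent}.{node.attr}" if parent else node.attr
    return None


def resolve_calls(source):
    """Hypothetical helper: canonical dotted names of every call in `source`."""
    tree = ast.parse(source)

    # Pass 1: map each local import name to its canonical module path.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                local = alias.asname or alias.name.split(".", 1)[0]
                aliases[local] = alias.name if alias.asname else local
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    # Pass 2: rewrite the first segment of each call name through the alias map.
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted(node.func)
            if name:
                head, *rest = name.split(".")
                names.append(".".join([aliases.get(head, head), *rest]))
    return names
```

With this resolution step, `cf.ThreadPoolExecutor()` under `import concurrent.futures as cf` and `T(...)` under `from threading import Thread as T` both map onto the keys used by a rule table such as `EXACT_CALL_RULES`.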
@@ -233,3 +233,88 @@ class TestConcurrentFileWrites:
            thread.join()

        assert storage["content"] in {"seed\nA\nB\n", "seed\nB\nA\n"}
+
+
+class TestDownloadFile:
+    """Tests for AioSandbox.download_file."""
+
+    def test_returns_concatenated_bytes(self, sandbox):
+        """download_file should join chunks from the client iterator into bytes."""
+        sandbox._client.file.download_file = MagicMock(return_value=[b"hel", b"lo"])
+
+        result = sandbox.download_file("/mnt/user-data/outputs/file.bin")
+
+        assert result == b"hello"
+        sandbox._client.file.download_file.assert_called_once_with(path="/mnt/user-data/outputs/file.bin")
+
+    def test_returns_empty_bytes_for_empty_file(self, sandbox):
+        """download_file should return b'' when the iterator yields nothing."""
+        sandbox._client.file.download_file = MagicMock(return_value=iter([]))
+
+        result = sandbox.download_file("/mnt/user-data/outputs/empty.bin")
+
+        assert result == b""
+
+    def test_uses_lock_during_download(self, sandbox):
+        """download_file should hold the lock while calling the client."""
+        lock_was_held = []
+
+        def tracking_download(path):
+            lock_was_held.append(sandbox._lock.locked())
+            return iter([b"data"])
+
+        sandbox._client.file.download_file = tracking_download
+
+        sandbox.download_file("/mnt/user-data/outputs/file.bin")
+
+        assert lock_was_held == [True], "download_file must hold the lock during client call"
+
+    def test_raises_oserror_on_client_error(self, sandbox):
+        """download_file should wrap client exceptions as OSError."""
+        sandbox._client.file.download_file = MagicMock(side_effect=RuntimeError("network error"))
+
+        with pytest.raises(OSError, match="network error"):
+            sandbox.download_file("/mnt/user-data/outputs/file.bin")
+
+    def test_preserves_oserror_from_client(self, sandbox):
+        """OSError raised by the client should propagate without re-wrapping."""
+        sandbox._client.file.download_file = MagicMock(side_effect=OSError("disk error"))
+
+        with pytest.raises(OSError, match="disk error"):
+            sandbox.download_file("/mnt/user-data/outputs/file.bin")
+
+    def test_rejects_path_outside_virtual_prefix_and_logs_error(self, sandbox, caplog):
+        """download_file must reject downloads outside /mnt/user-data and log the reason."""
+        sandbox._client.file.download_file = MagicMock()
+
+        with caplog.at_level("ERROR"):
+            with pytest.raises(PermissionError, match="must be under"):
+                sandbox.download_file("/etc/passwd")
+
+        assert "outside allowed directory" in caplog.text
+        sandbox._client.file.download_file.assert_not_called()
+
+    @pytest.mark.parametrize(
+        "path",
+        [
+            "/mnt/workspace/../../etc/passwd",
+            "../secret",
+            "/a/b/../../../etc/shadow",
+        ],
+    )
+    def test_rejects_path_traversal(self, sandbox, path):
+        """download_file must reject paths containing '..' before calling the client."""
+        sandbox._client.file.download_file = MagicMock()
+
+        with pytest.raises(PermissionError, match="path traversal"):
+            sandbox.download_file(path)
+
+        sandbox._client.file.download_file.assert_not_called()
+
+    def test_single_chunk(self, sandbox):
+        """download_file should work correctly with a single-chunk response."""
+        sandbox._client.file.download_file = MagicMock(return_value=[b"single-chunk"])
+
+        result = sandbox.download_file("/mnt/user-data/outputs/single.bin")
+
+        assert result == b"single-chunk"
@@ -1,5 +1,6 @@
 """Tests for AioSandboxProvider mount helpers."""

+import asyncio
 import importlib
 from types import SimpleNamespace
 from unittest.mock import MagicMock, patch
@@ -140,6 +141,182 @@ def test_discover_or_create_only_unlocks_when_lock_succeeds(tmp_path, monkeypatc
    assert unlock_calls == []


+@pytest.mark.anyio
+async def test_acquire_async_uses_async_readiness_polling(monkeypatch):
+    """AioSandboxProvider async creation must not use sync readiness polling."""
+    aio_mod = importlib.import_module("deerflow.community.aio_sandbox.aio_sandbox_provider")
+    provider = _make_provider(None)
+    provider._config = {"replicas": 3}
+    provider._thread_locks = {}
+    provider._warm_pool = {}
+    provider._sandbox_infos = {}
+    provider._thread_sandboxes = {}
+    provider._last_activity = {}
+    provider._lock = aio_mod.threading.Lock()
+    provider._backend = SimpleNamespace(
+        create=MagicMock(return_value=aio_mod.SandboxInfo(sandbox_id="sandbox-async", sandbox_url="http://sandbox")),
+        destroy=MagicMock(),
+        discover=MagicMock(return_value=None),
+    )
+
+    async_readiness_calls: list[tuple[str, int]] = []
+
+    async def fake_wait_for_sandbox_ready_async(sandbox_url: str, timeout: int = 30, poll_interval: float = 1.0) -> bool:
+        async_readiness_calls.append((sandbox_url, timeout))
+        return True
+
+    monkeypatch.setattr(aio_mod, "wait_for_sandbox_ready_async", fake_wait_for_sandbox_ready_async)
+    monkeypatch.setattr(
+        aio_mod,
+        "wait_for_sandbox_ready",
+        lambda *_args, **_kwargs: (_ for _ in ()).throw(AssertionError("sync readiness should not be used")),
+    )
+
+    sandbox_id = await provider._create_sandbox_async("thread-async", "sandbox-async")
+
+    assert sandbox_id == "sandbox-async"
+    assert async_readiness_calls == [("http://sandbox", 60)]
+    assert provider._backend.destroy.call_count == 0
+    assert provider._thread_sandboxes["thread-async"] == "sandbox-async"
+
+
+@pytest.mark.anyio
+async def test_discover_or_create_with_lock_async_offloads_lock_file_open_and_close(tmp_path, monkeypatch):
+    """Async lock path must not open or close lock files on the event loop."""
+    aio_mod = importlib.import_module("deerflow.community.aio_sandbox.aio_sandbox_provider")
+    provider = _make_provider(tmp_path)
+    provider._discover_or_create_with_lock_async = aio_mod.AioSandboxProvider._discover_or_create_with_lock_async.__get__(
+        provider,
+        aio_mod.AioSandboxProvider,
+    )
+    provider._thread_locks = {}
+    provider._warm_pool = {}
+    provider._sandbox_infos = {}
+    provider._thread_sandboxes = {"thread-async-lock": "sandbox-async-lock"}
+    provider._sandboxes = {"sandbox-async-lock": aio_mod.AioSandbox(id="sandbox-async-lock", base_url="http://sandbox")}
+    provider._last_activity = {}
+    provider._lock = aio_mod.threading.Lock()
+    provider._backend = SimpleNamespace(discover=MagicMock(return_value=None))
+
+    monkeypatch.setattr(aio_mod, "get_paths", lambda: Paths(base_dir=tmp_path))
+
+    to_thread_calls: list[object] = []
+
+    async def fake_to_thread(func, /, *args, **kwargs):
+        to_thread_calls.append(func)
+        return func(*args, **kwargs)
+
+    monkeypatch.setattr(aio_mod.asyncio, "to_thread", fake_to_thread)
+
+    sandbox_id = await provider._discover_or_create_with_lock_async("thread-async-lock", "sandbox-async-lock")
+
+    assert sandbox_id == "sandbox-async-lock"
+    assert aio_mod._open_lock_file in to_thread_calls
+    assert any(getattr(func, "__name__", "") == "close" for func in to_thread_calls)
+
+
+@pytest.mark.anyio
+async def test_acquire_thread_lock_async_uses_dedicated_executor(monkeypatch):
+    """Per-thread lock waits should not consume the default asyncio.to_thread pool."""
+    aio_mod = importlib.import_module("deerflow.community.aio_sandbox.aio_sandbox_provider")
+    lock = aio_mod.threading.Lock()
+
+    async def fail_to_thread(*_args, **_kwargs):
+        raise AssertionError("thread-lock acquisition must not use asyncio.to_thread")
+
+    monkeypatch.setattr(aio_mod.asyncio, "to_thread", fail_to_thread)
+
+    await aio_mod._acquire_thread_lock_async(lock)
+    try:
+        assert not lock.acquire(blocking=False)
+    finally:
+        lock.release()
+
+
+@pytest.mark.anyio
+async def test_acquire_async_cancellation_does_not_leak_thread_lock(tmp_path):
+    """Cancelled async lock waiters must not leave the per-thread lock held."""
+    aio_mod = importlib.import_module("deerflow.community.aio_sandbox.aio_sandbox_provider")
+    provider = _make_provider(tmp_path)
+    provider._thread_locks = {}
+    provider._warm_pool = {}
+    provider._sandbox_infos = {}
+    provider._thread_sandboxes = {}
+    provider._last_activity = {}
+    provider._lock = aio_mod.threading.Lock()
+
+    thread_id = "thread-cancel-lock"
+    thread_lock = provider._get_thread_lock(thread_id)
+    thread_lock.acquire()
+
+    task = asyncio.create_task(provider.acquire_async(thread_id))
+    await asyncio.sleep(0.05)
+    task.cancel()
+
+    try:
+        await task
+    except asyncio.CancelledError:
+        pass
+
+    thread_lock.release()
+    deadline = asyncio.get_running_loop().time() + 1
+    while asyncio.get_running_loop().time() < deadline:
+        acquired = thread_lock.acquire(blocking=False)
+        if acquired:
+            thread_lock.release()
+            return
+        await asyncio.sleep(0.01)
+
+    pytest.fail("provider thread lock was leaked after cancelling acquire_async")
+
+
+@pytest.mark.anyio
+async def test_acquire_async_cancelled_waiter_does_not_block_successor(tmp_path, monkeypatch):
+    """A cancelled waiter must not prevent the next live waiter from acquiring."""
+    aio_mod = importlib.import_module("deerflow.community.aio_sandbox.aio_sandbox_provider")
+    provider = _make_provider(tmp_path)
+    provider._thread_locks = {}
+    provider._warm_pool = {}
+    provider._sandbox_infos = {}
+    provider._thread_sandboxes = {}
+    provider._last_activity = {}
+    provider._lock = aio_mod.threading.Lock()
+
+    async def fake_acquire_internal_async(thread_id: str | None) -> str:
+        assert thread_id == "thread-successor-lock"
+        await asyncio.sleep(0)
+        return "sandbox-successor"
+
+    monkeypatch.setattr(provider, "_acquire_internal_async", fake_acquire_internal_async)
+
+    thread_id = "thread-successor-lock"
+    thread_lock = provider._get_thread_lock(thread_id)
+    thread_lock.acquire()
+
+    cancelled_waiter = asyncio.create_task(provider.acquire_async(thread_id))
+    await asyncio.sleep(0.05)
+    cancelled_waiter.cancel()
+    try:
+        await cancelled_waiter
+    except asyncio.CancelledError:
+        pass
+
+    live_waiter = asyncio.create_task(provider.acquire_async(thread_id))
+    thread_lock.release()
+
+    assert await asyncio.wait_for(live_waiter, timeout=1) == "sandbox-successor"
+
+    deadline = asyncio.get_running_loop().time() + 1
+    while asyncio.get_running_loop().time() < deadline:
+        acquired = thread_lock.acquire(blocking=False)
+        if acquired:
+            thread_lock.release()
+            return
+        await asyncio.sleep(0.01)
+
+    pytest.fail("provider thread lock was not released after successor acquire_async")
+
+
 def test_remote_backend_create_forwards_effective_user_id(monkeypatch):
    """Provisioner mode must receive user_id so PVC subPath matches user isolation."""
    remote_mod = importlib.import_module("deerflow.community.aio_sandbox.remote_backend")
@@ -0,0 +1,119 @@
+from __future__ import annotations
+
+from types import SimpleNamespace
+
+import pytest
+
+from deerflow.community.aio_sandbox import backend as readiness
+
+
+class _FakeAsyncClient:
+    def __init__(self, *, responses: list[object], calls: list[str], timeout: float, request_timeouts: list[float] | None = None) -> None:
+        self._responses = responses
+        self._calls = calls
+        self._timeout = timeout
+        self._request_timeouts = request_timeouts
+
+    async def __aenter__(self) -> _FakeAsyncClient:
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb) -> None:
+        return None
+
+    async def get(self, url: str, *, timeout: float):
+        self._calls.append(url)
+        if self._request_timeouts is not None:
+            self._request_timeouts.append(timeout)
+        response = self._responses.pop(0)
+        if isinstance(response, BaseException):
+            raise response
+        return response
+
+
+class _FakeLoop:
+    def __init__(self, times: list[float]) -> None:
+        self._times = times
+        self._index = 0
+
+    def time(self) -> float:
+        value = self._times[self._index]
+        self._index += 1
+        return value
+
+
+@pytest.mark.anyio
+async def test_wait_for_sandbox_ready_async_uses_nonblocking_polling(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls: list[str] = []
+    sleeps: list[float] = []
+
+    def fake_client(*, timeout: float):
+        return _FakeAsyncClient(
+            responses=[SimpleNamespace(status_code=503), SimpleNamespace(status_code=200)],
+            calls=calls,
+            timeout=timeout,
+        )
+
+    async def fake_sleep(delay: float) -> None:
+        sleeps.append(delay)
+
+    monkeypatch.setattr(readiness.httpx, "AsyncClient", fake_client)
+    monkeypatch.setattr(readiness.asyncio, "sleep", fake_sleep)
+    monkeypatch.setattr(readiness.requests, "get", lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("requests.get should not be used")))
+    monkeypatch.setattr(readiness.time, "sleep", lambda *_args, **_kwargs: (_ for _ in ()).throw(AssertionError("time.sleep should not be used")))
+
+    assert await readiness.wait_for_sandbox_ready_async("http://sandbox", timeout=5, poll_interval=0.05) is True
+
+    assert calls == ["http://sandbox/v1/sandbox", "http://sandbox/v1/sandbox"]
+    assert sleeps == [0.05]
+
+
+@pytest.mark.anyio
+async def test_wait_for_sandbox_ready_async_retries_request_errors(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls: list[str] = []
+    sleeps: list[float] = []
+
+    def fake_client(*, timeout: float):
+        return _FakeAsyncClient(
+            responses=[readiness.httpx.ConnectError("not ready"), SimpleNamespace(status_code=200)],
+            calls=calls,
+            timeout=timeout,
+        )
+
+    async def fake_sleep(delay: float) -> None:
+        sleeps.append(delay)
+
+    monkeypatch.setattr(readiness.httpx, "AsyncClient", fake_client)
+    monkeypatch.setattr(readiness.asyncio, "sleep", fake_sleep)
+
+    assert await readiness.wait_for_sandbox_ready_async("http://sandbox", timeout=5, poll_interval=0.01) is True
+
+    assert len(calls) == 2
+    assert sleeps == [0.01]
+
+
+@pytest.mark.anyio
+async def test_wait_for_sandbox_ready_async_clamps_request_and_sleep_to_deadline(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls: list[str] = []
+    request_timeouts: list[float] = []
+    sleeps: list[float] = []
+
+    def fake_client(*, timeout: float):
+        return _FakeAsyncClient(
+            responses=[SimpleNamespace(status_code=503)],
+            calls=calls,
+            timeout=timeout,
+            request_timeouts=request_timeouts,
+        )
+
+    async def fake_sleep(delay: float) -> None:
+        sleeps.append(delay)
+
+    monkeypatch.setattr(readiness.httpx, "AsyncClient", fake_client)
+    monkeypatch.setattr(readiness.asyncio, "sleep", fake_sleep)
+    monkeypatch.setattr(readiness.asyncio, "get_running_loop", lambda: _FakeLoop([100.0, 100.5, 101.75, 102.0]))
+
+    assert await readiness.wait_for_sandbox_ready_async("http://sandbox", timeout=2, poll_interval=1.0) is False
+
+    assert calls == ["http://sandbox/v1/sandbox"]
+    assert request_timeouts == [1.5]
+    assert sleeps == [0.25]
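Review note: the last test above pins down deadline clamping, meaning both the per-request timeout and the sleep between polls are capped by the time remaining. The sketch below shows one way such a loop could be written, assuming an injected clock and probe. `poll_until_ready`, `probe`, `clock`, and `sleep` are hypothetical names for illustration only. It is not the actual `wait_for_sandbox_ready_async`, which uses `httpx` and the running loop's clock.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def poll_until_ready(
    probe: Callable[[float], Awaitable[bool]],
    *,
    timeout: float,
    poll_interval: float,
    clock: Callable[[], float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # The probe receives the remaining budget as its own timeout.
        if await probe(remaining):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        await sleep(min(poll_interval, remaining))
```

Injecting `clock` and `sleep` keeps the loop deterministic under test, in the same spirit as the `_FakeLoop` and `fake_sleep` doubles used above.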
@@ -0,0 +1,142 @@
+"""Tests for idempotent run cancellation (issue #3055).
+
+RunManager.cancel() returns True when a run is already interrupted so that
+a second cancel request from the same worker is treated as a no-op success
+(202) rather than a conflict (409).  Both the POST cancel endpoint and the
+POST stream endpoint share this behaviour through the same cancel() call.
+"""
+
+from __future__ import annotations
+
+import asyncio
+
+from _router_auth_helpers import make_authed_test_app
+from fastapi.testclient import TestClient
+
+from app.gateway.routers import thread_runs
+from deerflow.runtime import RunManager, RunStatus
+
+THREAD_ID = "thread-cancel-test"
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_app(mgr: RunManager) -> TestClient:
+    app = make_authed_test_app()
+    app.include_router(thread_runs.router)
+    app.state.run_manager = mgr
+    return TestClient(app, raise_server_exceptions=False)
+
+
+def _create_interrupted_run(mgr: RunManager) -> str:
+    """Create a run and cancel it, returning its run_id."""
+
+    async def _setup():
+        record = await mgr.create(THREAD_ID)
+        await mgr.set_status(record.run_id, RunStatus.running)
+        await mgr.cancel(record.run_id)
+        return record.run_id
+
+    return asyncio.run(_setup())
+
+
+# ---------------------------------------------------------------------------
+# RunManager.cancel() unit tests
+# ---------------------------------------------------------------------------
+
+
+class TestRunManagerCancelIdempotency:
+    def test_cancel_returns_true_for_already_interrupted_run(self):
+        """cancel() must return True when the run is already interrupted."""
+
+        async def run():
+            mgr = RunManager()
+            record = await mgr.create(THREAD_ID)
+            await mgr.set_status(record.run_id, RunStatus.running)
+            first = await mgr.cancel(record.run_id)
+            assert first is True
+            second = await mgr.cancel(record.run_id)
+            assert second is True  # idempotent
+
+        asyncio.run(run())
+
+    def test_cancel_returns_false_for_successful_run(self):
+        """cancel() must still return False for runs that completed successfully."""
+
+        async def run():
+            mgr = RunManager()
+            record = await mgr.create(THREAD_ID)
+            await mgr.set_status(record.run_id, RunStatus.running)
+            await mgr.set_status(record.run_id, RunStatus.success)
+            result = await mgr.cancel(record.run_id)
+            assert result is False
+
+        asyncio.run(run())
+
+    def test_cancel_returns_false_for_unknown_run(self):
+        async def run():
+            mgr = RunManager()
+            result = await mgr.cancel("nonexistent-run-id")
+            assert result is False
+
+        asyncio.run(run())
+
+
+# ---------------------------------------------------------------------------
+# POST /cancel endpoint — idempotent 202
+# ---------------------------------------------------------------------------
+
+
+class TestCancelRunEndpointIdempotency:
+    def test_double_cancel_returns_202_not_409(self):
+        """Second cancel on an already-interrupted run must return 202, not 409."""
+        mgr = RunManager()
+        run_id = _create_interrupted_run(mgr)
+        client = _make_app(mgr)
+
+        resp = client.post(f"/api/threads/{THREAD_ID}/runs/{run_id}/cancel")
+        assert resp.status_code == 202, f"Expected 202, got {resp.status_code}: {resp.text}"
+
+    def test_cancel_unknown_run_returns_404(self):
+        mgr = RunManager()
+        client = _make_app(mgr)
+        resp = client.post(f"/api/threads/{THREAD_ID}/runs/no-such-run/cancel")
+        assert resp.status_code == 404
+
+    def test_cancel_successful_run_returns_409(self):
+        """Successfully-completed runs cannot be cancelled — must return 409."""
+
+        async def _setup():
+            mgr = RunManager()
+            record = await mgr.create(THREAD_ID)
+            await mgr.set_status(record.run_id, RunStatus.running)
+            await mgr.set_status(record.run_id, RunStatus.success)
+            return mgr, record.run_id
+
+        mgr, run_id = asyncio.run(_setup())
+        client = _make_app(mgr)
+        resp = client.post(f"/api/threads/{THREAD_ID}/runs/{run_id}/cancel")
+        assert resp.status_code == 409
+
+
+# ---------------------------------------------------------------------------
+# POST /{thread_id}/runs/{run_id}/join (stream_existing_run) — idempotent cancel
+# ---------------------------------------------------------------------------
+
+
+class TestStreamExistingRunIdempotentCancel:
+    def test_stream_cancel_already_interrupted_returns_not_409(self):
+        """stream_existing_run with action=interrupt on an already-interrupted run
+        must not raise 409 — the idempotent cancel path returns 202/SSE."""
+        mgr = RunManager()
+        run_id = _create_interrupted_run(mgr)
+        client = _make_app(mgr)
+
+        resp = client.post(
+            f"/api/threads/{THREAD_ID}/runs/{run_id}/join",
+            params={"action": "interrupt"},
+        )
+        assert resp.status_code != 409, f"Should not 409 on idempotent cancel, got {resp.status_code}"
@@ -372,37 +372,6 @@ class TestExtractResponseText:
        # Should return "" (no text in current turn), NOT "Hi there!" from previous turn
        assert _extract_response_text(result) == ""

-    def test_does_not_publish_loop_warning_on_tool_calling_ai_message(self):
-        """Loop-detection warning text on a tool-calling AI message is middleware-authored."""
-        from app.channels.manager import _extract_response_text
-
-        result = {
-            "messages": [
-                {"type": "human", "content": "search the repo"},
-                {
-                    "type": "ai",
-                    "content": "[LOOP DETECTED] You are repeating the same tool calls.",
-                    "tool_calls": [{"name": "grep", "args": {"pattern": "TODO"}, "id": "call_1"}],
-                },
-            ]
-        }
-        assert _extract_response_text(result) == ""
-
-    def test_preserves_visible_text_when_stripping_loop_warning(self):
-        from app.channels.manager import _extract_response_text
-
-        result = {
-            "messages": [
-                {"type": "human", "content": "prepare the report"},
-                {
-                    "type": "ai",
-                    "content": "Here is the report.\n\n[LOOP DETECTED] You are repeating the same tool calls.",
-                    "tool_calls": [{"name": "present_files", "args": {"filepaths": ["/mnt/user-data/outputs/report.md"]}, "id": "call_1"}],
-                },
-            ]
-        }
-        assert _extract_response_text(result) == "Here is the report."
-

 # ---------------------------------------------------------------------------
 # ChannelManager tests
@@ -0,0 +1,159 @@
+"""Tests for DeerFlowClient's graph-root tracing wiring.
+
+Regression coverage for the Copilot review on PR #2944: when the title
+and summarization middlewares request ``attach_tracing=False`` we must
+make sure ``DeerFlowClient`` injects the tracing callbacks at the graph
+invocation root instead; otherwise those middlewares produce untraced
+LLM calls.
+"""
+
+from __future__ import annotations
+
+from types import SimpleNamespace
+from typing import Any
+
+import pytest
+
+from deerflow.client import DeerFlowClient
+
+
+class _FakeAgent:
+    """Capture the ``config`` handed to ``agent.stream``."""
+
+    def __init__(self) -> None:
+        self.captured_config: dict | None = None
+        self.checkpointer = None
+        self.store = None
+
+    def stream(self, state, *, config, context, stream_mode):
+        self.captured_config = config
+        return iter(())  # empty stream
+
+
+@pytest.fixture(autouse=True)
+def _clear_langfuse_env(monkeypatch):
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    for name in ("LANGFUSE_TRACING", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL"):
+        monkeypatch.delenv(name, raising=False)
+    reset_tracing_config()
+    yield
+    reset_tracing_config()
+
+
+def _stub_agent_creation(monkeypatch, fake_agent: _FakeAgent) -> dict[str, Any]:
+    """Short-circuit the heavy parts of ``_ensure_agent`` so we can drive
+    ``stream()`` against a fake graph without touching real models, tools
+    or middleware factories.
+    """
+    captured: dict[str, Any] = {}
+
+    def _stub_ensure_agent(self, config):
+        captured["config"] = config
+        self._agent = fake_agent
+        self._agent_config_key = ("stub",)
+
+    monkeypatch.setattr(DeerFlowClient, "_ensure_agent", _stub_ensure_agent)
+    return captured
+
+
+def _make_client(_monkeypatch) -> DeerFlowClient:
+    """Build a client without going through ``__init__`` so we never load
+    config.yaml or perform any other side-effectful startup work."""
+    fake_app_config = SimpleNamespace(models=[SimpleNamespace(name="stub-model")])
+    client = DeerFlowClient.__new__(DeerFlowClient)
+    client._app_config = fake_app_config
+    client._extensions_config = None
+    client._model_name = "stub-model"
+    client._thinking_enabled = False
+    client._plan_mode = False
+    client._subagent_enabled = False
+    client._agent_name = None
+    client._available_skills = None
+    client._middlewares = None
+    client._checkpointer = None
+    client._agent = None
+    client._agent_config_key = None
+    client._environment = None
+    return client
+
+
+def test_stream_injects_langfuse_metadata_when_enabled(monkeypatch):
+    monkeypatch.setenv("LANGFUSE_TRACING", "true")
+    monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
+    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    reset_tracing_config()
+
+    class _SentinelHandler:
+        pass
+
+    sentinel = _SentinelHandler()
+    monkeypatch.setattr("deerflow.client.build_tracing_callbacks", lambda: [sentinel])
+
+    fake_agent = _FakeAgent()
+    captured = _stub_agent_creation(monkeypatch, fake_agent)
+    client = _make_client(monkeypatch)
+
+    list(client.stream("hi", thread_id="thread-client-1"))
+
+    config = captured["config"]
+    metadata = config.get("metadata") or {}
+    assert metadata.get("langfuse_session_id") == "thread-client-1"
+    assert metadata.get("langfuse_trace_name") == "lead-agent"
+    # Default no-auth context falls back to the ``"default"`` user; ``"test-user-autouse"`` is also accepted in case a test fixture has preset the user.
+    assert metadata.get("langfuse_user_id") in {"default", "test-user-autouse"}
+    callbacks = config.get("callbacks") or []
+    assert sentinel in callbacks
+
+
+def test_stream_is_inert_when_langfuse_disabled(monkeypatch):
+    monkeypatch.setattr("deerflow.client.build_tracing_callbacks", lambda: [])
+
+    fake_agent = _FakeAgent()
+    captured = _stub_agent_creation(monkeypatch, fake_agent)
+    client = _make_client(monkeypatch)
+
+    list(client.stream("hi", thread_id="thread-client-2"))
+
+    config = captured["config"]
+    assert "callbacks" not in config or not config["callbacks"]
+    metadata = config.get("metadata") or {}
+    assert "langfuse_session_id" not in metadata
+    assert "langfuse_user_id" not in metadata
+
+
+def test_stream_preserves_caller_metadata_overrides(monkeypatch):
+    monkeypatch.setenv("LANGFUSE_TRACING", "true")
+    monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
+    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    reset_tracing_config()
+    monkeypatch.setattr("deerflow.client.build_tracing_callbacks", lambda: [])
+
+    fake_agent = _FakeAgent()
+    captured = _stub_agent_creation(monkeypatch, fake_agent)
+    client = _make_client(monkeypatch)
+
+    # Drive stream with pre-populated metadata so that the worker-equivalent
+    # ``setdefault`` semantics are exercised.
+    original_get_config = DeerFlowClient._get_runnable_config
+
+    def patched_get_runnable_config(self, thread_id, **overrides):
+        cfg = original_get_config(self, thread_id, **overrides)
+        cfg["metadata"] = {
+            "langfuse_session_id": "explicit-session-override",
+            "langfuse_user_id": "explicit-user",
+        }
+        return cfg
+
+    monkeypatch.setattr(DeerFlowClient, "_get_runnable_config", patched_get_runnable_config)
+    list(client.stream("hi", thread_id="thread-client-3"))
+
+    metadata = captured["config"].get("metadata") or {}
+    assert metadata["langfuse_session_id"] == "explicit-session-override"
+    assert metadata["langfuse_user_id"] == "explicit-user"
+    # ``langfuse_trace_name`` was not supplied by the caller, so the client still fills in the default.
+    assert metadata["langfuse_trace_name"] == "lead-agent"
@@ -190,6 +190,24 @@ class TestBuildPatchedMessagesPatching:
        assert [patched[1].tool_call_id, patched[2].tool_call_id] == ["call_1", "call_2"]
        assert isinstance(patched[3], HumanMessage)

+    def test_non_tool_message_inserted_between_partial_tool_results_is_regrouped(self):
+        mw = DanglingToolCallMiddleware()
+        msgs = [
+            _ai_with_tool_calls([_tc("bash", "call_1"), _tc("read", "call_2")]),
+            _tool_msg("call_1", "bash"),
+            HumanMessage(content="interruption"),
+            _tool_msg("call_2", "read"),
+        ]
+
+        patched = mw._build_patched_messages(msgs)
+
+        assert patched is not None
+        assert isinstance(patched[0], AIMessage)
+        assert isinstance(patched[1], ToolMessage)
+        assert isinstance(patched[2], ToolMessage)
+        assert [patched[1].tool_call_id, patched[2].tool_call_id] == ["call_1", "call_2"]
+        assert isinstance(patched[3], HumanMessage)
+
    def test_valid_adjacent_tool_results_are_unchanged(self):
        mw = DanglingToolCallMiddleware()
        msgs = [
@@ -237,7 +255,8 @@ class TestBuildPatchedMessagesPatching:
        assert isinstance(patched[0], AIMessage)
        assert isinstance(patched[1], ToolMessage)
        assert patched[1].tool_call_id == "call_1"
-        assert orphan in patched
+        assert patched[2] is orphan
+        assert isinstance(patched[3], HumanMessage)
        assert patched.count(orphan) == 1

    def test_invalid_tool_call_is_patched(self):
@@ -0,0 +1,182 @@
+from __future__ import annotations
+
+import json
+import textwrap
+from pathlib import Path
+
+from support.detectors import thread_boundaries as detector
+
+
+def _write_python(path: Path, source: str) -> Path:
+    path.write_text(textwrap.dedent(source).strip() + "\n", encoding="utf-8")
+    return path
+
+
+def test_scan_file_detects_async_thread_and_tool_boundaries(tmp_path):
+    source_file = _write_python(
+        tmp_path / "sample.py",
+        """
+        import asyncio
+        import threading
+        import time
+        from concurrent.futures import ThreadPoolExecutor
+        from langchain.tools import tool
+        from langchain_core.tools import StructuredTool
+
+        @tool
+        async def async_tool(value: int) -> str:
+            return str(value)
+
+        async def handler(model):
+            await asyncio.to_thread(str, "x")
+            model.invoke("blocking")
+            time.sleep(1)
+
+        def sync_entry():
+            asyncio.run(handler(None))
+            pool = ThreadPoolExecutor(max_workers=1)
+            pool.submit(str, "x")
+            threading.Thread(target=sync_entry).start()
+            return StructuredTool.from_function(
+                name="factory_tool",
+                description="factory",
+                coroutine=async_tool,
+            )
+        """,
+    )
+
+    findings = detector.scan_file(source_file, repo_root=tmp_path)
+    categories = {finding.category for finding in findings}
+    async_tool_finding = next(finding for finding in findings if finding.category == "ASYNC_TOOL_DEFINITION")
+
+    assert "ASYNC_TOOL_DEFINITION" in categories
+    assert async_tool_finding.function == "async_tool"
+    assert async_tool_finding.async_context is True
+    assert "ASYNC_THREAD_OFFLOAD" in categories
+    assert "SYNC_INVOKE_IN_ASYNC" in categories
+    assert "BLOCKING_CALL_IN_ASYNC" in categories
+    assert "SYNC_ASYNC_BRIDGE" in categories
+    assert "THREAD_POOL" in categories
+    assert "EXECUTOR_SUBMIT" in categories
+    assert "RAW_THREAD" in categories
+    assert "ASYNC_ONLY_TOOL_FACTORY" in categories
+
+
+def test_scan_file_ignores_unqualified_threads_and_generic_method_names(tmp_path):
+    source_file = _write_python(
+        tmp_path / "sample.py",
+        """
+        class Thread:
+            pass
+
+        class Timer:
+            pass
+
+        async def handler(form, runner):
+            form.submit()
+            runner.invoke("not a langchain model")
+
+        def sync_entry(runner):
+            Thread()
+            Timer()
+            runner.ainvoke("not a langchain model")
+        """,
+    )
+
+    findings = detector.scan_file(source_file, repo_root=tmp_path)
+    categories = {finding.category for finding in findings}
+
+    assert "RAW_THREAD" not in categories
+    assert "RAW_TIMER_THREAD" not in categories
+    assert "EXECUTOR_SUBMIT" not in categories
+    assert "SYNC_INVOKE_IN_ASYNC" not in categories
+    assert "ASYNC_INVOKE_IN_SYNC" not in categories
+
+
+def test_scan_file_uses_import_evidence_for_thread_and_executor_aliases(tmp_path):
+    source_file = _write_python(
+        tmp_path / "sample.py",
+        """
+        from concurrent.futures import ThreadPoolExecutor as Pool
+        from threading import Thread as WorkerThread, Timer
+
+        def sync_entry():
+            pool = Pool(max_workers=1)
+            pool.submit(str, "x")
+            WorkerThread(target=sync_entry).start()
+            Timer(1, sync_entry).start()
+        """,
+    )
+
+    findings = detector.scan_file(source_file, repo_root=tmp_path)
+    categories = {finding.category for finding in findings}
+
+    assert "THREAD_POOL" in categories
+    assert "EXECUTOR_SUBMIT" in categories
+    assert "RAW_THREAD" in categories
+    assert "RAW_TIMER_THREAD" in categories
+
+
+def test_scan_paths_ignores_virtualenv_like_directories(tmp_path):
+    scanned_file = _write_python(
+        tmp_path / "app.py",
+        """
+        import asyncio
+
+        def main():
+            return asyncio.run(asyncio.sleep(0))
+        """,
+    )
+    ignored_dir = tmp_path / ".venv"
+    ignored_dir.mkdir()
+    _write_python(
+        ignored_dir / "ignored.py",
+        """
+        import threading
+
+        thread = threading.Thread(target=lambda: None)
+        """,
+    )
+
+    findings = detector.scan_paths([tmp_path], repo_root=tmp_path)
+
+    assert any(finding.path == scanned_file.name for finding in findings)
+    assert all(".venv" not in finding.path for finding in findings)
+
+
+def test_json_output_and_min_severity_filter(tmp_path, capsys):
+    source_file = _write_python(
+        tmp_path / "sample.py",
+        """
+        import asyncio
+
+        async def handler(model):
+            await asyncio.to_thread(str, "x")
+            model.invoke("blocking")
+        """,
+    )
+
+    exit_code = detector.main(["--format", "json", "--min-severity", "WARN", str(source_file)])
+
+    assert exit_code == 0
+    payload = json.loads(capsys.readouterr().out)
+    categories = {finding["category"] for finding in payload}
+    assert categories == {"SYNC_INVOKE_IN_ASYNC"}
+
+
+def test_parse_errors_are_reported_as_findings(tmp_path):
+    source_file = _write_python(
+        tmp_path / "broken.py",
+        """
+        def broken(:
+            pass
+        """,
+    )
+
+    findings = detector.scan_file(source_file, repo_root=tmp_path)
+
+    assert len(findings) == 1
+    assert findings[0].category == "PARSE_ERROR"
+    assert findings[0].severity == "WARN"
+    assert findings[0].column == 11
+    assert f"{source_file.name}:1:12" in detector.format_text(findings)
@@ -114,6 +114,7 @@ def test_build_run_config_custom_agent_injects_agent_name():

    config = build_run_config("thread-1", None, None, assistant_id="finalis")
    assert config["configurable"]["agent_name"] == "finalis"
+    assert config["run_name"] == "finalis"


 def test_build_run_config_lead_agent_no_agent_name():
@@ -122,6 +123,7 @@ def test_build_run_config_lead_agent_no_agent_name():

    config = build_run_config("thread-1", None, None, assistant_id="lead_agent")
    assert "agent_name" not in config["configurable"]
+    assert "run_name" not in config


 def test_build_run_config_none_assistant_id_no_agent_name():
@@ -130,6 +132,7 @@ def test_build_run_config_none_assistant_id_no_agent_name():

    config = build_run_config("thread-1", None, None, assistant_id=None)
    assert "agent_name" not in config["configurable"]
+    assert "run_name" not in config


 def test_build_run_config_explicit_agent_name_not_overwritten():
@@ -143,6 +146,7 @@ def test_build_run_config_explicit_agent_name_not_overwritten():
        assistant_id="other-agent",
    )
    assert config["configurable"]["agent_name"] == "explicit-agent"
+    assert config["run_name"] == "explicit-agent"


 def test_build_run_config_context_custom_agent_injects_agent_name():
@@ -699,6 +699,92 @@ def test_get_available_tools_includes_invoke_acp_agent_when_agents_configured(mo
    load_acp_config_from_dict({})


+def test_get_available_tools_sync_invoke_acp_agent_preserves_thread_workspace(monkeypatch, tmp_path):
+    from deerflow.config import paths as paths_module
+    from deerflow.runtime import user_context as uc_module
+
+    monkeypatch.setattr(paths_module, "get_paths", lambda: paths_module.Paths(base_dir=tmp_path))
+    monkeypatch.setattr(uc_module, "get_effective_user_id", lambda: None)
+    monkeypatch.setattr(
+        "deerflow.config.extensions_config.ExtensionsConfig.from_file",
+        classmethod(lambda cls: ExtensionsConfig(mcp_servers={}, skills={})),
+    )
+    monkeypatch.setattr("deerflow.tools.tools.is_host_bash_allowed", lambda config=None: True)
+
+    captured: dict[str, object] = {}
+
+    class DummyClient:
+        @property
+        def collected_text(self) -> str:
+            return "ok"
+
+        async def session_update(self, session_id, update, **kwargs):
+            pass
+
+        async def request_permission(self, options, session_id, tool_call, **kwargs):
+            raise AssertionError("should not be called")
+
+    class DummyConn:
+        async def initialize(self, **kwargs):
+            pass
+
+        async def new_session(self, **kwargs):
+            return SimpleNamespace(session_id="s1")
+
+        async def prompt(self, **kwargs):
+            pass
+
+    class DummyProcessContext:
+        def __init__(self, client, cmd, *args, env=None, cwd):
+            captured["cwd"] = cwd
+
+        async def __aenter__(self):
+            return DummyConn(), object()
+
+        async def __aexit__(self, exc_type, exc, tb):
+            return False
+
+    monkeypatch.setitem(
+        sys.modules,
+        "acp",
+        SimpleNamespace(
+            PROTOCOL_VERSION="2026-03-24",
+            Client=DummyClient,
+            spawn_agent_process=lambda client, cmd, *args, env=None, cwd: DummyProcessContext(client, cmd, *args, env=env, cwd=cwd),
+            text_block=lambda text: {"type": "text", "text": text},
+        ),
+    )
+    monkeypatch.setitem(
+        sys.modules,
+        "acp.schema",
+        SimpleNamespace(
+            ClientCapabilities=lambda: {},
+            Implementation=lambda **kwargs: kwargs,
+            TextContentBlock=type("TextContentBlock", (), {"__init__": lambda self, text: setattr(self, "text", text)}),
+        ),
+    )
+
+    explicit_config = SimpleNamespace(
+        tools=[],
+        models=[],
+        tool_search=SimpleNamespace(enabled=False),
+        skill_evolution=SimpleNamespace(enabled=False),
+        sandbox=SimpleNamespace(),
+        get_model_config=lambda name: None,
+        acp_agents={"codex": ACPAgentConfig(command="codex-acp", description="Codex CLI")},
+    )
+    tools = get_available_tools(include_mcp=False, subagent_enabled=False, app_config=explicit_config)
+    tool = next(tool for tool in tools if tool.name == "invoke_acp_agent")
+
+    thread_id = "thread-sync-123"
+    tool.invoke(
+        {"agent": "codex", "prompt": "Do something"},
+        config={"configurable": {"thread_id": thread_id}},
+    )
+
+    assert captured["cwd"] == str(tmp_path / "threads" / thread_id / "acp-workspace")
+
+
 def test_get_available_tools_uses_explicit_app_config_for_acp_agents(monkeypatch):
    explicit_agents = {"codex": ACPAgentConfig(command="codex-acp", description="Codex CLI")}
    explicit_config = SimpleNamespace(
@@ -41,6 +41,49 @@ def test_make_lead_agent_signature_matches_langgraph_server_factory_abi():
    assert list(inspect.signature(lead_agent_module.make_lead_agent).parameters) == ["config"]


+def test_make_lead_agent_attaches_tracing_callbacks_at_graph_root(monkeypatch):
+    """Regression guard: tracing handlers must be appended to
+    ``config["callbacks"]`` (graph invocation root), and every in-graph
+    ``create_chat_model`` call must pass ``attach_tracing=False``.
+
+    Guards against new in-graph model creation that omits the flag,
+    which would silently produce duplicate spans and break Langfuse
+    session/user propagation.
+    """
+    app_config = _make_app_config([_make_model("safe-model", supports_thinking=False)])
+
+    import deerflow.tools as tools_module
+
+    monkeypatch.setattr(lead_agent_module, "get_app_config", lambda: app_config)
+    monkeypatch.setattr(tools_module, "get_available_tools", lambda **kwargs: [])
+    monkeypatch.setattr(lead_agent_module, "_build_middlewares", lambda config, model_name, agent_name=None, **kwargs: [])
+
+    sentinel_handler = object()
+    monkeypatch.setattr(lead_agent_module, "build_tracing_callbacks", lambda: [sentinel_handler])
+
+    seen_attach_tracing: list[bool] = []
+
+    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
+        seen_attach_tracing.append(attach_tracing)
+        return object()
+
+    monkeypatch.setattr(lead_agent_module, "create_chat_model", _fake_create_chat_model)
+    monkeypatch.setattr(lead_agent_module, "create_agent", lambda **kwargs: kwargs)
+
+    config: dict = {"configurable": {"model_name": "safe-model"}}
+    lead_agent_module._make_lead_agent(config, app_config=app_config)
+
+    # Handler must land on the graph invocation config so the Langfuse
+    # CallbackHandler fires ``on_chain_start(parent_run_id=None)`` and
+    # propagates ``session_id`` / ``user_id`` onto the trace.
+    assert sentinel_handler in (config.get("callbacks") or []), "build_tracing_callbacks output must be appended to config['callbacks']"
+
+    # Every in-graph create_chat_model call must opt out of model-level
+    # tracing to avoid duplicate spans.
+    assert seen_attach_tracing, "_make_lead_agent did not call create_chat_model"
+    assert all(flag is False for flag in seen_attach_tracing), f"in-graph create_chat_model must pass attach_tracing=False; got {seen_attach_tracing}"
+
+
 def test_internal_make_lead_agent_uses_explicit_app_config(monkeypatch):
    app_config = _make_app_config([_make_model("explicit-model", supports_thinking=False)])

@@ -55,7 +98,7 @@ def test_internal_make_lead_agent_uses_explicit_app_config(monkeypatch):

    captured: dict[str, object] = {}

-    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None):
+    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
        captured["name"] = name
        captured["app_config"] = app_config
        return object()
@@ -89,7 +132,7 @@ def test_make_lead_agent_uses_runtime_app_config_from_context_without_global_rea

    captured: dict[str, object] = {}

-    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None):
+    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
        captured["name"] = name
        captured["app_config"] = app_config
        return object()
@@ -168,7 +211,7 @@ def test_make_lead_agent_disables_thinking_when_model_does_not_support_it(monkey

    captured: dict[str, object] = {}

-    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None):
+    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
        captured["name"] = name
        captured["thinking_enabled"] = thinking_enabled
        captured["reasoning_effort"] = reasoning_effort
@@ -212,7 +255,7 @@ def test_make_lead_agent_reads_runtime_options_from_context(monkeypatch):

    captured: dict[str, object] = {}

-    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None):
+    def _fake_create_chat_model(*, name, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
        captured["name"] = name
        captured["thinking_enabled"] = thinking_enabled
        captured["reasoning_effort"] = reasoning_effort
@@ -407,7 +450,7 @@ def test_create_summarization_middleware_uses_configured_model_alias(monkeypatch
    fake_model = MagicMock()
    fake_model.with_config.return_value = fake_model

-    def _fake_create_chat_model(*, name=None, thinking_enabled, reasoning_effort=None, app_config=None):
+    def _fake_create_chat_model(*, name=None, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
        captured["name"] = name
        captured["thinking_enabled"] = thinking_enabled
        captured["reasoning_effort"] = reasoning_effort
@@ -441,7 +484,7 @@ def test_create_summarization_middleware_threads_resolved_app_config_to_model(mo
    fake_model = MagicMock()
    fake_model.with_config.return_value = fake_model

-    def _fake_create_chat_model(*, name=None, thinking_enabled, reasoning_effort=None, app_config=None):
+    def _fake_create_chat_model(*, name=None, thinking_enabled, reasoning_effort=None, app_config=None, attach_tracing=True):
        captured["app_config"] = app_config
        return fake_model

@@ -204,6 +204,26 @@ class TestSymlinkEscapes:

        assert exc_info.value.errno == errno.EACCES

+    def test_download_file_blocks_symlink_escape_from_mount(self, tmp_path):
+        mount_dir = tmp_path / "mount"
+        mount_dir.mkdir()
+        outside_dir = tmp_path / "outside"
+        outside_dir.mkdir()
+        (outside_dir / "secret.bin").write_bytes(b"\x00secret")
+        _symlink_to(outside_dir, mount_dir / "escape", target_is_directory=True)
+
+        sandbox = LocalSandbox(
+            "test",
+            [
+                PathMapping(container_path="/mnt/user-data", local_path=str(mount_dir), read_only=False),
+            ],
+        )
+
+        with pytest.raises(PermissionError) as exc_info:
+            sandbox.download_file("/mnt/user-data/escape/secret.bin")
+
+        assert exc_info.value.errno == errno.EACCES
+
    def test_write_file_blocks_symlink_escape_from_mount(self, tmp_path):
        mount_dir = tmp_path / "mount"
        mount_dir.mkdir()
@@ -334,6 +354,74 @@ class TestSymlinkEscapes:
        assert existing.read_bytes() == b"original"


+class TestDownloadFileMappings:
+    """download_file must use _resolve_path_with_mapping so path resolution, symlink
+    containment, and read-only awareness are consistent with read_file."""
+
+    def test_resolves_container_path_via_mapping(self, tmp_path):
+        """download_file should resolve container paths through path mappings."""
+        data_dir = tmp_path / "data"
+        data_dir.mkdir()
+        (data_dir / "asset.bin").write_bytes(b"\x01\x02\x03")
+
+        sandbox = LocalSandbox(
+            "test",
+            [PathMapping(container_path="/mnt/user-data", local_path=str(data_dir))],
+        )
+
+        result = sandbox.download_file("/mnt/user-data/asset.bin")
+
+        assert result == b"\x01\x02\x03"
+
+    def test_raises_oserror_with_original_path_when_missing(self, tmp_path):
+        """OSError filename should show the container path, not the resolved host path."""
+        data_dir = tmp_path / "data"
+        data_dir.mkdir()
+
+        sandbox = LocalSandbox(
+            "test",
+            [PathMapping(container_path="/mnt/user-data", local_path=str(data_dir))],
+        )
+
+        with pytest.raises(OSError) as exc_info:
+            sandbox.download_file("/mnt/user-data/missing.bin")
+
+        assert exc_info.value.filename == "/mnt/user-data/missing.bin"
+
+    def test_rejects_path_outside_virtual_prefix_and_logs_error(self, tmp_path, caplog):
+        """download_file must reject paths outside /mnt/user-data and log the reason."""
+        data_dir = tmp_path / "data"
+        data_dir.mkdir()
+        (data_dir / "model.bin").write_bytes(b"weights")
+
+        sandbox = LocalSandbox(
+            "test",
+            [PathMapping(container_path="/mnt/user-data", local_path=str(data_dir), read_only=True)],
+        )
+
+        with caplog.at_level("ERROR"):
+            with pytest.raises(PermissionError) as exc_info:
+                sandbox.download_file("/mnt/skills/model.bin")
+
+        assert exc_info.value.errno == errno.EACCES
+        assert "outside allowed directory" in caplog.text
+
+    def test_readable_from_read_only_mount(self, tmp_path):
+        """Read-only mounts must not block download_file — read-only only restricts writes."""
+        skills_dir = tmp_path / "skills"
+        skills_dir.mkdir()
+        (skills_dir / "model.bin").write_bytes(b"weights")
+
+        sandbox = LocalSandbox(
+            "test",
+            [PathMapping(container_path="/mnt/user-data", local_path=str(skills_dir), read_only=True)],
+        )
+
+        result = sandbox.download_file("/mnt/user-data/model.bin")
+
+        assert result == b"weights"
+
+
 class TestMultipleMounts:
    def test_multiple_read_write_mounts(self, tmp_path):
        skills_dir = tmp_path / "skills"
@@ -1,24 +1,94 @@
 """Tests for LoopDetectionMiddleware."""

 import copy
+from collections import OrderedDict
+from typing import Any
 from unittest.mock import MagicMock

-from langchain_core.messages import AIMessage, SystemMessage
+import pytest
+from langchain.agents import create_agent
+from langchain_core.language_models.fake_chat_models import FakeMessagesListChatModel
+from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
+from langchain_core.runnables import Runnable
+from langchain_core.tools import tool as as_tool
+from pydantic import PrivateAttr

 from deerflow.agents.middlewares.loop_detection_middleware import (
    _HARD_STOP_MSG,
+    _MAX_PENDING_WARNINGS_PER_RUN,
    LoopDetectionMiddleware,
    _hash_tool_calls,
 )


-def _make_runtime(thread_id="test-thread"):
+def _make_runtime(thread_id="test-thread", run_id="test-run"):
    """Build a minimal Runtime mock with context."""
    runtime = MagicMock()
-    runtime.context = {"thread_id": thread_id}
+    runtime.context = {"thread_id": thread_id, "run_id": run_id}
    return runtime


+def _pending_key(thread_id="test-thread", run_id="test-run"):
+    return (thread_id, run_id)
+
+
+def _make_request(messages, runtime):
+    """Build a minimal ModelRequest stand-in for wrap_model_call tests."""
+    request = MagicMock()
+    request.messages = list(messages)
+    request.runtime = runtime
+    request.override = lambda **updates: _override_request(request, updates)
+    return request
+
+
+def _override_request(request, updates):
+    """Mimic ModelRequest.override(): return a copy with fields replaced."""
+    new = MagicMock()
+    new.messages = updates.get("messages", request.messages)
+    new.runtime = updates.get("runtime", request.runtime)
+    new.override = lambda **u: _override_request(new, u)
+    return new
+
+
+def _capture_handler():
+    """Build a sync handler that records the request it was called with."""
+    captured: list = []
+
+    def handler(req):
+        captured.append(req)
+        return MagicMock()
+
+    return captured, handler
+
+
+class _CapturingFakeMessagesListChatModel(FakeMessagesListChatModel):
+    """Fake chat model that records each model request's messages."""
+
+    _seen_messages: list[list[Any]] = PrivateAttr(default_factory=list)
+
+    @property
+    def seen_messages(self) -> list[list[Any]]:
+        return self._seen_messages
+
+    def bind_tools(
+        self,
+        tools: Any,
+        *,
+        tool_choice: Any = None,
+        **kwargs: Any,
+    ) -> Runnable:
+        return self
+
+    def _generate(self, messages, stop=None, run_manager=None, **kwargs):
+        self._seen_messages.append(list(messages))
+        return super()._generate(
+            messages,
+            stop=stop,
+            run_manager=run_manager,
+            **kwargs,
+        )
+
+
 def _make_state(tool_calls=None, content=""):
    """Build a minimal AgentState dict with an AIMessage.

@@ -138,7 +208,15 @@ class TestLoopDetection:
            result = mw._apply(_make_state(tool_calls=call), runtime)
            assert result is None

-    def test_warn_at_threshold(self):
+    def test_warn_at_threshold_queues_but_does_not_mutate_state(self):
+        """At warn threshold, ``after_model`` enqueues a warning but returns None.
+
+        Detection observes the just-emitted AIMessage(tool_calls=...). The
+        tools node hasn't run yet, so injecting any non-tool message here
+        would split the assistant's tool_calls from their ToolMessage
+        responses and break OpenAI/Moonshot pairing. The warning is
+        delivered later from ``wrap_model_call``.
+        """
        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=5)
        runtime = _make_runtime()
        call = [_bash_call("ls")]
@@ -146,44 +224,150 @@ class TestLoopDetection:
        for _ in range(2):
            mw._apply(_make_state(tool_calls=call), runtime)

-        # Third identical call triggers warning. The warning is appended to
-        # the AIMessage content (tool_calls preserved) — never inserted as a
-        # separate HumanMessage between the AIMessage(tool_calls) and its
-        # ToolMessage responses, which would break OpenAI/Moonshot strict
-        # tool-call pairing validation.
+        # Third identical call triggers warning detection.
        result = mw._apply(_make_state(tool_calls=call), runtime)
-        assert result is not None
-        msgs = result["messages"]
-        assert len(msgs) == 1
-        assert isinstance(msgs[0], AIMessage)
-        assert len(msgs[0].tool_calls) == len(call)
-        assert msgs[0].tool_calls[0]["id"] == call[0]["id"]
-        assert "LOOP DETECTED" in msgs[0].content
+        # Detection must not mutate state — the AIMessage with tool_calls is
+        # left untouched so the tools node runs normally.
+        assert result is None
+        # ...but a warning is queued for the next model call.
+        assert mw._pending_warnings[_pending_key()]
+        assert "LOOP DETECTED" in mw._pending_warnings[_pending_key()][0]

-    def test_warn_does_not_break_tool_call_pairing(self):
-        """Regression: the warn branch must NOT inject a non-tool message
-        after an AIMessage(tool_calls=...). Moonshot/OpenAI reject the next
-        request with 'tool_call_ids did not have response messages' if any
-        non-tool message is wedged between the AIMessage and its ToolMessage
-        responses. See #2029.
+    def test_warn_injected_at_next_model_call(self):
+        """``wrap_model_call`` appends a HumanMessage(loop_warning) to the
+        outgoing messages — *after* every existing message — so that the
+        AIMessage(tool_calls=...) -> ToolMessage(...) pairing stays intact.
        """
        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
        runtime = _make_runtime()
        call = [_bash_call("ls")]
-
-        for _ in range(2):
+        for _ in range(3):
            mw._apply(_make_state(tool_calls=call), runtime)

-        result = mw._apply(_make_state(tool_calls=call), runtime)
-        assert result is not None
-        msgs = result["messages"]
-        assert len(msgs) == 1
-        assert isinstance(msgs[0], AIMessage)
-        assert len(msgs[0].tool_calls) == len(call)
-        assert msgs[0].tool_calls[0]["id"] == call[0]["id"]
+        # Build the messages the agent runtime would assemble for the next
+        # turn: prior AIMessage(tool_calls), its ToolMessage responses, ...
+        ai_msg = AIMessage(content="", tool_calls=call)
+        tool_msg = ToolMessage(content="ok", tool_call_id=call[0]["id"], name="bash")
+        request = _make_request([ai_msg, tool_msg], runtime)

-    def test_warn_only_injected_once(self):
-        """Warning for the same hash should only be injected once per thread."""
+        captured, handler = _capture_handler()
+        mw.wrap_model_call(request, handler)
+
+        sent = captured[0].messages
+        # AIMessage and ToolMessage stay in order, untouched.
+        assert sent[0] is ai_msg
+        assert sent[1] is tool_msg
+        # HumanMessage(warning) appears AFTER the ToolMessage — pairing intact.
+        assert isinstance(sent[2], HumanMessage)
+        assert sent[2].name == "loop_warning"
+        assert "LOOP DETECTED" in sent[2].content
+
+    def test_warn_queue_drained_after_injection(self):
+        """A queued warning must be emitted exactly once per detection event."""
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        runtime = _make_runtime()
+        call = [_bash_call("ls")]
+        for _ in range(3):
+            mw._apply(_make_state(tool_calls=call), runtime)
+
+        request = _make_request([AIMessage(content="hi")], runtime)
+        captured, handler = _capture_handler()
+
+        # First call: warning is appended.
+        mw.wrap_model_call(request, handler)
+        first = captured[0].messages
+        assert any(isinstance(m, HumanMessage) for m in first)
+
+        # Subsequent call without new detection: no warning re-emitted.
+        request2 = _make_request([AIMessage(content="hi")], runtime)
+        mw.wrap_model_call(request2, handler)
+        second = captured[1].messages
+        assert not any(isinstance(m, HumanMessage) for m in second)
+
+    def test_warn_queue_scoped_by_run_id(self):
+        """A warning queued for one run must not be injected into another run."""
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        runtime_a = _make_runtime(run_id="run-A")
+        runtime_b = _make_runtime(run_id="run-B")
+        call = [_bash_call("ls")]
+
+        for _ in range(3):
+            mw._apply(_make_state(tool_calls=call), runtime_a)
+
+        request_b = _make_request([AIMessage(content="hi")], runtime_b)
+        captured, handler = _capture_handler()
+        mw.wrap_model_call(request_b, handler)
+        assert not any(isinstance(m, HumanMessage) for m in captured[0].messages)
+        assert mw._pending_warnings.get(_pending_key(run_id="run-A"))
+
+        request_a = _make_request([AIMessage(content="hi")], runtime_a)
+        mw.wrap_model_call(request_a, handler)
+        assert any(isinstance(message, HumanMessage) and message.name == "loop_warning" for message in captured[1].messages)
+
+    def test_missing_run_id_uses_default_pending_scope(self):
+        """When the runtime context has no run_id, pending warnings fall back to the "default" run scope."""
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        runtime = MagicMock()
+        runtime.context = {"thread_id": "test-thread"}
+        call = [_bash_call("ls")]
+
+        for _ in range(3):
+            mw._apply(_make_state(tool_calls=call), runtime)
+
+        assert mw._pending_warnings.get(_pending_key(run_id="default"))
+
+        request = _make_request([AIMessage(content="hi")], runtime)
+        captured, handler = _capture_handler()
+        mw.wrap_model_call(request, handler)
+
+        loop_warnings = [message for message in captured[0].messages if isinstance(message, HumanMessage) and message.name == "loop_warning"]
+        assert len(loop_warnings) == 1
+        assert "LOOP DETECTED" in loop_warnings[0].content
+        assert not mw._pending_warnings.get(_pending_key(run_id="default"))
+
+    def test_before_agent_clears_stale_pending_warnings_for_thread(self):
+        """Starting a new run drops stale warnings from prior runs in the same thread."""
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        runtime_a = _make_runtime(run_id="run-A")
+        runtime_b = _make_runtime(run_id="run-B")
+        call = [_bash_call("ls")]
+
+        for _ in range(3):
+            mw._apply(_make_state(tool_calls=call), runtime_a)
+
+        assert mw._pending_warnings.get(_pending_key(run_id="run-A"))
+        mw.before_agent({"messages": []}, runtime_b)
+        assert not mw._pending_warnings.get(_pending_key(run_id="run-A"))
+
+    def test_after_agent_clears_current_run_pending_warnings(self):
+        """Run cleanup should drop warnings that never reached wrap_model_call."""
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        runtime = _make_runtime()
+        call = [_bash_call("ls")]
+
+        for _ in range(3):
+            mw._apply(_make_state(tool_calls=call), runtime)
+
+        assert mw._pending_warnings.get(_pending_key())
+        mw.after_agent({"messages": []}, runtime)
+        assert not mw._pending_warnings.get(_pending_key())
+
+    def test_multiple_pending_warnings_are_merged_into_one_message(self):
+        """Draining several queued warnings should produce one deduplicated loop_warning message."""
+        mw = LoopDetectionMiddleware()
+        runtime = _make_runtime()
+        mw._pending_warnings[_pending_key()] = ["first warning", "second warning", "first warning"]
+        request = _make_request([AIMessage(content="hi")], runtime)
+        captured, handler = _capture_handler()
+
+        mw.wrap_model_call(request, handler)
+
+        loop_warnings = [message for message in captured[0].messages if isinstance(message, HumanMessage) and message.name == "loop_warning"]
+        assert len(loop_warnings) == 1
+        assert loop_warnings[0].content == "first warning\n\nsecond warning"
+
+    def test_warn_only_queued_once_per_hash(self):
+        """The same hash repeated past the threshold should queue a warning only once."""
        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
        runtime = _make_runtime()
        call = [_bash_call("ls")]
@@ -192,14 +376,13 @@ class TestLoopDetection:
        for _ in range(2):
            mw._apply(_make_state(tool_calls=call), runtime)

-        # Third — warning injected
-        result = mw._apply(_make_state(tool_calls=call), runtime)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        # Third — warning queued
+        mw._apply(_make_state(tool_calls=call), runtime)
+        assert len(mw._pending_warnings[_pending_key()]) == 1

-        # Fourth — warning already injected, should return None
-        result = mw._apply(_make_state(tool_calls=call), runtime)
-        assert result is None
+        # Fourth — already warned for this hash, no additional enqueue.
+        mw._apply(_make_state(tool_calls=call), runtime)
+        assert len(mw._pending_warnings[_pending_key()]) == 1

    def test_hard_stop_at_limit(self):
        mw = LoopDetectionMiddleware(warn_threshold=2, hard_limit=4)
@@ -257,6 +440,7 @@ class TestLoopDetection:
        mw.reset()
        result = mw._apply(_make_state(tool_calls=call), runtime)
        assert result is None
+        assert not mw._pending_warnings.get(_pending_key())

    def test_non_ai_message_ignored(self):
        mw = LoopDetectionMiddleware()
@@ -283,15 +467,16 @@ class TestLoopDetection:
        # One call on thread B
        mw._apply(_make_state(tool_calls=call), runtime_b)

-        # Second call on thread A — triggers warning (2 >= warn_threshold)
-        result = mw._apply(_make_state(tool_calls=call), runtime_a)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        # Second call on thread A — queues warning under thread-A only.
+        mw._apply(_make_state(tool_calls=call), runtime_a)
+        assert mw._pending_warnings.get(_pending_key("thread-A"))
+        assert "LOOP DETECTED" in mw._pending_warnings[_pending_key("thread-A")][0]
+        assert not mw._pending_warnings.get(_pending_key("thread-B"))

-        # Second call on thread B — also triggers (independent tracking)
-        result = mw._apply(_make_state(tool_calls=call), runtime_b)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        # Second call on thread B — independent queue.
+        mw._apply(_make_state(tool_calls=call), runtime_b)
+        assert mw._pending_warnings.get(_pending_key("thread-B"))
+        assert "LOOP DETECTED" in mw._pending_warnings[_pending_key("thread-B")][0]

    def test_lru_eviction(self):
        """Old threads should be evicted when max_tracked_threads is exceeded."""
@@ -313,6 +498,55 @@ class TestLoopDetection:
        assert "thread-new" in mw._history
        assert len(mw._history) == 3

+    def test_warned_hashes_are_pruned_to_sliding_window(self):
+        """A long-lived thread should not keep every historical warned hash."""
+        mw = LoopDetectionMiddleware(warn_threshold=2, hard_limit=100, window_size=4)
+        runtime = _make_runtime()
+
+        for i in range(12):
+            call = [_bash_call(f"cmd_{i}")]
+            mw._apply(_make_state(tool_calls=call), runtime)
+            mw._apply(_make_state(tool_calls=call), runtime)
+
+        assert len(mw._history["test-thread"]) <= 4
+        assert set(mw._warned["test-thread"]).issubset(set(mw._history["test-thread"]))
+        assert len(mw._warned["test-thread"]) <= 4
+
+    def test_pending_warning_keys_are_capped(self):
+        """Many runs on one thread must not grow the pending-warning key set without bound."""
+        mw = LoopDetectionMiddleware(warn_threshold=2, max_tracked_threads=2)
+
+        for i in range(10):
+            runtime = _make_runtime(thread_id="same-thread", run_id=f"run-{i}")
+            mw._queue_pending_warning(runtime, f"warning-{i}")
+
+        assert len(mw._pending_warnings) == mw._max_pending_warning_keys
+        assert len(mw._pending_warning_touch_order) == mw._max_pending_warning_keys
+        assert _pending_key("same-thread", "run-9") in mw._pending_warnings
+
+    def test_pending_warning_list_is_capped_and_deduped(self):
+        """One run cannot accumulate an unbounded warning list."""
+        mw = LoopDetectionMiddleware()
+        runtime = _make_runtime()
+
+        for i in range(_MAX_PENDING_WARNINGS_PER_RUN + 4):
+            mw._queue_pending_warning(runtime, f"warning-{i}")
+        mw._queue_pending_warning(runtime, f"warning-{_MAX_PENDING_WARNINGS_PER_RUN + 3}")
+
+        warnings = mw._pending_warnings[_pending_key()]
+        assert len(warnings) == _MAX_PENDING_WARNINGS_PER_RUN
+        assert warnings == [f"warning-{i}" for i in range(4, _MAX_PENDING_WARNINGS_PER_RUN + 4)]
+
+    def test_pending_warning_touch_order_cleared_with_pending_key(self):
+        mw = LoopDetectionMiddleware()
+        runtime = _make_runtime()
+        mw._queue_pending_warning(runtime, "warning")
+
+        mw.after_agent({"messages": []}, runtime)
+
+        assert mw._pending_warnings == {}
+        assert mw._pending_warning_touch_order == OrderedDict()
+
    def test_thread_safe_mutations(self):
        """Verify lock is used for mutations (basic structural test)."""
        mw = LoopDetectionMiddleware()
@@ -331,6 +565,99 @@ class TestLoopDetection:
        assert "default" in mw._history


+class TestLoopDetectionAgentGraphIntegration:
+    def test_loop_warning_is_transient_in_real_agent_graph(self):
+        """after_model queues the warning; wrap_model_call injects it into the request only, never into state."""
+
+        @as_tool
+        def bash(command: str) -> str:
+            """Run a fake shell command."""
+            return f"ran: {command}"
+
+        repeated_calls = [[{"name": "bash", "id": f"call_ls_{i}", "args": {"command": "ls"}}] for i in range(3)]
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        model = _CapturingFakeMessagesListChatModel(
+            responses=[
+                AIMessage(content="", tool_calls=repeated_calls[0]),
+                AIMessage(content="", tool_calls=repeated_calls[1]),
+                AIMessage(content="", tool_calls=repeated_calls[2]),
+                AIMessage(content="final answer"),
+            ],
+        )
+        graph = create_agent(model=model, tools=[bash], middleware=[mw])
+
+        result = graph.invoke(
+            {"messages": [("user", "inspect the directory")]},
+            context={"thread_id": "integration-thread", "run_id": "integration-run"},
+            config={"recursion_limit": 20},
+        )
+
+        assert len(model.seen_messages) == 4
+        loop_warnings_by_call = [[message for message in messages if isinstance(message, HumanMessage) and message.name == "loop_warning"] for messages in model.seen_messages]
+        assert loop_warnings_by_call[0] == []
+        assert loop_warnings_by_call[1] == []
+        assert loop_warnings_by_call[2] == []
+        assert len(loop_warnings_by_call[3]) == 1
+        assert "LOOP DETECTED" in loop_warnings_by_call[3][0].content
+
+        fourth_request = model.seen_messages[3]
+        assert isinstance(fourth_request[-2], ToolMessage)
+        assert fourth_request[-2].tool_call_id == "call_ls_2"
+        assert fourth_request[-1] is loop_warnings_by_call[3][0]
+
+        persisted_loop_warnings = [message for message in result["messages"] if isinstance(message, HumanMessage) and message.name == "loop_warning"]
+        assert persisted_loop_warnings == []
+        assert result["messages"][-1].content == "final answer"
+        assert mw._pending_warnings == {}
+        assert mw._pending_warning_touch_order == OrderedDict()
+
+    @pytest.mark.asyncio
+    async def test_loop_warning_is_transient_in_async_agent_graph(self):
+        """awrap_model_call injects loop_warning into the request only during async graph runs."""
+
+        @as_tool
+        async def bash(command: str) -> str:
+            """Run a fake shell command."""
+            return f"ran: {command}"
+
+        repeated_calls = [[{"name": "bash", "id": f"call_async_ls_{i}", "args": {"command": "ls"}}] for i in range(3)]
+        mw = LoopDetectionMiddleware(warn_threshold=3, hard_limit=10)
+        model = _CapturingFakeMessagesListChatModel(
+            responses=[
+                AIMessage(content="", tool_calls=repeated_calls[0]),
+                AIMessage(content="", tool_calls=repeated_calls[1]),
+                AIMessage(content="", tool_calls=repeated_calls[2]),
+                AIMessage(content="async final answer"),
+            ],
+        )
+        graph = create_agent(model=model, tools=[bash], middleware=[mw])
+
+        result = await graph.ainvoke(
+            {"messages": [("user", "inspect the directory asynchronously")]},
+            context={"thread_id": "async-integration-thread", "run_id": "async-integration-run"},
+            config={"recursion_limit": 20},
+        )
+
+        assert len(model.seen_messages) == 4
+        loop_warnings_by_call = [[message for message in messages if isinstance(message, HumanMessage) and message.name == "loop_warning"] for messages in model.seen_messages]
+        assert loop_warnings_by_call[0] == []
+        assert loop_warnings_by_call[1] == []
+        assert loop_warnings_by_call[2] == []
+        assert len(loop_warnings_by_call[3]) == 1
+        assert "LOOP DETECTED" in loop_warnings_by_call[3][0].content
+
+        fourth_request = model.seen_messages[3]
+        assert isinstance(fourth_request[-2], ToolMessage)
+        assert fourth_request[-2].tool_call_id == "call_async_ls_2"
+        assert fourth_request[-1] is loop_warnings_by_call[3][0]
+
+        persisted_loop_warnings = [message for message in result["messages"] if isinstance(message, HumanMessage) and message.name == "loop_warning"]
+        assert persisted_loop_warnings == []
+        assert result["messages"][-1].content == "async final answer"
+        assert mw._pending_warnings == {}
+        assert mw._pending_warning_touch_order == OrderedDict()
+
+
 class TestAppendText:
    """Unit tests for LoopDetectionMiddleware._append_text."""

@@ -507,33 +834,29 @@ class TestToolFrequencyDetection:
        for i in range(4):
            mw._apply(_make_state(tool_calls=[self._read_call(f"/file_{i}.py")]), runtime)

-        # 5th call to read_file (different file each time) triggers freq warning
+        # 5th call queues a per-tool-type frequency warning; state untouched.
        result = mw._apply(_make_state(tool_calls=[self._read_call("/file_4.py")]), runtime)
-        assert result is not None
-        msg = result["messages"][0]
-        # Warning is appended to the AIMessage content; tool_calls preserved
-        # so the tools node still runs and Moonshot/OpenAI tool-call pairing
-        # validation does not break.
-        assert isinstance(msg, AIMessage)
-        assert msg.tool_calls
-        assert "read_file" in msg.content
-        assert "LOOP DETECTED" in msg.content
+        assert result is None
+        queued = mw._pending_warnings.get(_pending_key(), [])
+        assert queued
+        assert "read_file" in queued[0]
+        assert "LOOP DETECTED" in queued[0]

-    def test_freq_warn_only_injected_once(self):
+    def test_freq_warn_only_queued_once(self):
        mw = LoopDetectionMiddleware(tool_freq_warn=3, tool_freq_hard_limit=10)
        runtime = _make_runtime()

        for i in range(2):
            mw._apply(_make_state(tool_calls=[self._read_call(f"/file_{i}.py")]), runtime)

-        # 3rd triggers warning
-        result = mw._apply(_make_state(tool_calls=[self._read_call("/file_2.py")]), runtime)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        # 3rd queues a frequency warning.
+        mw._apply(_make_state(tool_calls=[self._read_call("/file_2.py")]), runtime)
+        assert len(mw._pending_warnings[_pending_key()]) == 1

-        # 4th should not re-warn (already warned for read_file)
+        # 4th: same tool name, no additional enqueue.
        result = mw._apply(_make_state(tool_calls=[self._read_call("/file_3.py")]), runtime)
        assert result is None
+        assert len(mw._pending_warnings[_pending_key()]) == 1

    def test_freq_hard_stop_at_limit(self):
        mw = LoopDetectionMiddleware(tool_freq_warn=3, tool_freq_hard_limit=6)
@@ -565,10 +888,10 @@ class TestToolFrequencyDetection:
            result = mw._apply(_make_state(tool_calls=[_bash_call(f"cmd_{i}")]), runtime)
            assert result is None

-        # 3rd read_file triggers (read_file count = 3)
+        # 3rd read_file triggers — warning is queued (state unchanged).
        result = mw._apply(_make_state(tool_calls=[self._read_call("/file_2.py")]), runtime)
-        assert result is not None
-        assert "read_file" in result["messages"][0].content
+        assert result is None
+        assert "read_file" in mw._pending_warnings[_pending_key()][0]

    def test_freq_reset_clears_state(self):
        mw = LoopDetectionMiddleware(tool_freq_warn=3, tool_freq_hard_limit=10)
@@ -600,10 +923,10 @@ class TestToolFrequencyDetection:
        assert "thread-A" not in mw._tool_freq
        assert "thread-A" not in mw._tool_freq_warned

-        # thread-B state should still be intact — 3rd call triggers warn
+        # thread-B state should still be intact: the 3rd call queues a warning.
        result = mw._apply(_make_state(tool_calls=[self._read_call("/b_2.py")]), runtime_b)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        assert result is None
+        assert "LOOP DETECTED" in mw._pending_warnings[_pending_key("thread-B")][0]

        # thread-A restarted from 0 — should not trigger
        result = mw._apply(_make_state(tool_calls=[self._read_call("/a_new.py")]), runtime_a)
@@ -623,10 +946,11 @@ class TestToolFrequencyDetection:
        for i in range(2):
            mw._apply(_make_state(tool_calls=[self._read_call(f"/other_{i}.py")]), runtime_b)

-        # 3rd call on thread A — triggers (count=3 for thread A only)
+        # 3rd call on thread A — queues a warning (count=3 for thread A only).
        result = mw._apply(_make_state(tool_calls=[self._read_call("/file_2.py")]), runtime_a)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        assert result is None
+        assert "LOOP DETECTED" in mw._pending_warnings[_pending_key("thread-A")][0]
+        assert not mw._pending_warnings.get(_pending_key("thread-B"))

    def test_multi_tool_single_response_counted(self):
        """When a single response has multiple tool calls, each is counted."""
@@ -643,10 +967,10 @@ class TestToolFrequencyDetection:
        result = mw._apply(_make_state(tool_calls=call), runtime)
        assert result is None

-        # Response 3: 1 more → count = 5 → triggers warn
+        # Response 3: 1 more → count = 5 → queues a warning.
        result = mw._apply(_make_state(tool_calls=[self._read_call("/e.py")]), runtime)
-        assert result is not None
-        assert "read_file" in result["messages"][0].content
+        assert result is None
+        assert "read_file" in mw._pending_warnings[_pending_key()][0]

    def test_override_tool_uses_override_thresholds(self):
        """A tool in tool_freq_overrides uses its own thresholds, not the global ones."""
@@ -674,10 +998,14 @@ class TestToolFrequencyDetection:
        for i in range(2):
            mw._apply(_make_state(tool_calls=[self._read_call(f"/file_{i}.py")]), runtime)

-        # 3rd read_file call hits global warn=3 (read_file has no override)
+        # 3rd read_file call hits global warn=3 (read_file has no override).
+        # Warning delivery is deferred to wrap_model_call so the just-emitted
+        # AIMessage(tool_calls=...) is not mutated before ToolMessages exist.
        result = mw._apply(_make_state(tool_calls=[self._read_call("/file_2.py")]), runtime)
-        assert result is not None
-        assert "read_file" in result["messages"][0].content
+        assert result is None
+        queued = mw._pending_warnings.get(_pending_key(), [])
+        assert queued
+        assert "read_file" in queued[0]

    def test_hash_detection_takes_priority(self):
        """Hash-based hard stop fires before frequency check for identical calls."""
@@ -736,11 +1064,13 @@ class TestFromConfig:
        mw = LoopDetectionMiddleware.from_config(self._config())
        assert mw._tool_freq_overrides == {}

-    def test_constructed_middleware_detects_loops(self):
+    def test_constructed_middleware_queues_loop_warning(self):
        mw = LoopDetectionMiddleware.from_config(self._config(warn_threshold=2, hard_limit=4))
        runtime = _make_runtime()
        call = [_bash_call("ls")]
        mw._apply(_make_state(tool_calls=call), runtime)
        result = mw._apply(_make_state(tool_calls=call), runtime)
-        assert result is not None
-        assert "LOOP DETECTED" in result["messages"][0].content
+        assert result is None
+        queued = mw._pending_warnings.get(_pending_key(), [])
+        assert queued
+        assert "LOOP DETECTED" in queued[0]
@@ -24,6 +24,26 @@ def test_build_server_params_stdio_success():
    }


+def test_extensions_config_resolves_env_variables_inside_nested_collections(monkeypatch):
+    monkeypatch.setenv("MCP_TOKEN", "secret")
+    monkeypatch.delenv("MISSING_TOKEN", raising=False)
+    raw_config = {
+        "args": ["--token", "$MCP_TOKEN", {"nested": ["$MCP_TOKEN", "$MISSING_TOKEN"]}],
+        "tuple_args": ("$MCP_TOKEN", "$MISSING_TOKEN"),
+        "env": {"API_KEY": "$MCP_TOKEN"},
+        "enabled": True,
+        "timeout": 30,
+    }
+
+    resolved = ExtensionsConfig.resolve_env_variables(raw_config)
+
+    assert resolved["args"] == ["--token", "secret", {"nested": ["secret", ""]}]
+    assert resolved["tuple_args"] == ("secret", "")
+    assert resolved["env"] == {"API_KEY": "secret"}
+    assert resolved["enabled"] is True
+    assert resolved["timeout"] == 30
+
+
 def test_build_server_params_stdio_requires_command():
    config = McpServerConfig(type="stdio", command=None)

@@ -0,0 +1,305 @@
+"""Tests for MCP config secret masking and preservation.
+
+Verifies that GET /api/mcp/config masks sensitive fields (env values,
+header values, OAuth secrets) and that PUT /api/mcp/config correctly
+preserves existing secrets when the frontend round-trips masked values.
+"""
+
+from __future__ import annotations
+
+import pytest
+
+from app.gateway.routers.mcp import (
+    McpOAuthConfigResponse,
+    McpServerConfigResponse,
+    _mask_server_config,
+    _merge_preserving_secrets,
+)
+
+# ---------------------------------------------------------------------------
+# _mask_server_config
+# ---------------------------------------------------------------------------
+
+
+def test_mask_replaces_env_values_with_asterisks():
+    """Env dict values should be replaced with '***'."""
+    server = McpServerConfigResponse(
+        env={"GITHUB_TOKEN": "ghp_real_secret_123", "API_KEY": "sk-abc"},
+    )
+    masked = _mask_server_config(server)
+    assert masked.env == {"GITHUB_TOKEN": "***", "API_KEY": "***"}
+
+
+def test_mask_replaces_header_values_with_asterisks():
+    """Header dict values should be replaced with '***'."""
+    server = McpServerConfigResponse(
+        headers={"Authorization": "Bearer tok_123", "X-API-Key": "key_456"},
+    )
+    masked = _mask_server_config(server)
+    assert masked.headers == {"Authorization": "***", "X-API-Key": "***"}
+
+
+def test_mask_removes_oauth_secrets():
+    """OAuth client_secret and refresh_token should be set to None."""
+    server = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_id="my-client",
+            client_secret="super-secret",
+            refresh_token="refresh-token-abc",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    masked = _mask_server_config(server)
+    assert masked.oauth is not None
+    assert masked.oauth.client_secret is None
+    assert masked.oauth.refresh_token is None
+    # Non-secret fields preserved
+    assert masked.oauth.client_id == "my-client"
+    assert masked.oauth.token_url == "https://auth.example.com/token"
+
+
+def test_mask_preserves_non_secret_fields():
+    """Non-sensitive fields should pass through unchanged."""
+    server = McpServerConfigResponse(
+        enabled=True,
+        type="stdio",
+        command="npx",
+        args=["-y", "@modelcontextprotocol/server-github"],
+        env={"KEY": "val"},
+        description="GitHub MCP server",
+    )
+    masked = _mask_server_config(server)
+    assert masked.enabled is True
+    assert masked.type == "stdio"
+    assert masked.command == "npx"
+    assert masked.args == ["-y", "@modelcontextprotocol/server-github"]
+    assert masked.description == "GitHub MCP server"
+
+
+def test_mask_handles_empty_env_and_headers():
+    """Empty env/headers dicts should remain empty."""
+    server = McpServerConfigResponse()
+    masked = _mask_server_config(server)
+    assert masked.env == {}
+    assert masked.headers == {}
+
+
+def test_mask_handles_no_oauth():
+    """A server without OAuth should keep ``oauth`` as None after masking."""
+    server = McpServerConfigResponse(oauth=None)
+    masked = _mask_server_config(server)
+    assert masked.oauth is None
+
+
+def test_mask_does_not_mutate_original():
+    """Masking should return a new object, not modify the original."""
+    server = McpServerConfigResponse(env={"KEY": "secret"})
+    masked = _mask_server_config(server)
+    assert server.env["KEY"] == "secret"
+    assert masked.env["KEY"] == "***"
+
+
+# ---------------------------------------------------------------------------
+# _merge_preserving_secrets
+# ---------------------------------------------------------------------------
+
+
+def test_merge_preserves_masked_env_values():
+    """Incoming '***' env values should be replaced with existing secrets."""
+    incoming = McpServerConfigResponse(env={"KEY": "***"})
+    existing = McpServerConfigResponse(env={"KEY": "real_secret"})
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.env["KEY"] == "real_secret"
+
+
+def test_merge_preserves_masked_header_values():
+    """Incoming '***' header values should be replaced with existing secrets."""
+    incoming = McpServerConfigResponse(headers={"Authorization": "***"})
+    existing = McpServerConfigResponse(headers={"Authorization": "Bearer real"})
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.headers["Authorization"] == "Bearer real"
+
+
+def test_merge_preserves_oauth_secrets_when_none():
+    """Incoming None oauth secrets should preserve existing values."""
+    incoming = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret=None,
+            refresh_token=None,
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    existing = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret="existing-secret",
+            refresh_token="existing-refresh",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.oauth is not None
+    assert merged.oauth.client_secret == "existing-secret"
+    assert merged.oauth.refresh_token == "existing-refresh"
+
+
+def test_merge_accepts_new_secret_values():
+    """Incoming real secret values should replace existing ones."""
+    incoming = McpServerConfigResponse(
+        env={"KEY": "new_secret"},
+        oauth=McpOAuthConfigResponse(
+            client_secret="new-client-secret",
+            refresh_token="new-refresh-token",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    existing = McpServerConfigResponse(
+        env={"KEY": "old_secret"},
+        oauth=McpOAuthConfigResponse(
+            client_secret="old-secret",
+            refresh_token="old-refresh",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.env["KEY"] == "new_secret"
+    assert merged.oauth.client_secret == "new-client-secret"
+    assert merged.oauth.refresh_token == "new-refresh-token"
+
+
+def test_merge_handles_no_existing_oauth():
+    """When existing has no oauth but incoming does, keep incoming."""
+    incoming = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret="new-secret",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    existing = McpServerConfigResponse(oauth=None)
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.oauth is not None
+    assert merged.oauth.client_secret == "new-secret"
+
+
+def test_merge_does_not_mutate_original():
+    """Merge should return a new object, not modify the original."""
+    incoming = McpServerConfigResponse(env={"KEY": "***"})
+    existing = McpServerConfigResponse(env={"KEY": "secret"})
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert incoming.env["KEY"] == "***"
+    assert existing.env["KEY"] == "secret"
+    assert merged.env["KEY"] == "secret"
+
+
+# ---------------------------------------------------------------------------
+# Review fix: a masked value for a key with no stored secret is rejected
+# ---------------------------------------------------------------------------
+
+
+def test_merge_rejects_masked_value_for_new_env_key():
+    """Sending '***' for a key absent from the existing config should raise HTTP 400."""
+    from fastapi import HTTPException
+
+    incoming = McpServerConfigResponse(env={"NEW_KEY": "***"})
+    existing = McpServerConfigResponse(env={})
+    with pytest.raises(HTTPException) as exc_info:
+        _merge_preserving_secrets(incoming, existing)
+    assert exc_info.value.status_code == 400
+    assert "NEW_KEY" in exc_info.value.detail
+
+
+def test_merge_rejects_masked_value_for_new_header_key():
+    """Sending '***' for a header key that doesn't exist should raise 400."""
+    from fastapi import HTTPException
+
+    incoming = McpServerConfigResponse(headers={"X-New-Auth": "***"})
+    existing = McpServerConfigResponse(headers={})
+    with pytest.raises(HTTPException) as exc_info:
+        _merge_preserving_secrets(incoming, existing)
+    assert exc_info.value.status_code == 400
+    assert "X-New-Auth" in exc_info.value.detail
+
+
+# ---------------------------------------------------------------------------
+# Review fix: an empty string clears a stored OAuth secret
+# ---------------------------------------------------------------------------
+
+
+def test_merge_empty_string_clears_oauth_client_secret():
+    """Sending '' for client_secret should clear the stored value."""
+    incoming = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret="",
+            refresh_token=None,
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    existing = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret="existing-secret",
+            refresh_token="existing-refresh",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.oauth.client_secret is None
+    assert merged.oauth.refresh_token == "existing-refresh"
+
+
+def test_merge_empty_string_clears_oauth_refresh_token():
+    """Sending '' for refresh_token should clear the stored value."""
+    incoming = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret=None,
+            refresh_token="",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    existing = McpServerConfigResponse(
+        oauth=McpOAuthConfigResponse(
+            client_secret="existing-secret",
+            refresh_token="existing-refresh",
+            token_url="https://auth.example.com/token",
+        ),
+    )
+    merged = _merge_preserving_secrets(incoming, existing)
+    assert merged.oauth.client_secret == "existing-secret"
+    assert merged.oauth.refresh_token is None
+
+
+# ---------------------------------------------------------------------------
+# Round-trip integration: mask → merge should preserve original secrets
+# ---------------------------------------------------------------------------
+
+
+def test_roundtrip_mask_then_merge_preserves_original_secrets():
+    """Simulates the full frontend round-trip: GET (masked) → toggle → PUT."""
+    original = McpServerConfigResponse(
+        enabled=True,
+        env={"GITHUB_TOKEN": "ghp_real_secret"},
+        headers={"Authorization": "Bearer real_token"},
+        oauth=McpOAuthConfigResponse(
+            client_id="client-123",
+            client_secret="oauth-secret",
+            refresh_token="refresh-abc",
+            token_url="https://auth.example.com/token",
+        ),
+        description="GitHub MCP server",
+    )
+
+    # Step 1: Server returns masked config (simulates GET response)
+    masked = _mask_server_config(original)
+    assert masked.env["GITHUB_TOKEN"] == "***"
+    assert masked.oauth.client_secret is None
+
+    # Step 2: Frontend toggles enabled and sends back (simulates PUT request)
+    from_frontend = masked.model_copy(update={"enabled": False})
+
+    # Step 3: Server merges with existing secrets (simulates PUT handler)
+    restored = _merge_preserving_secrets(from_frontend, original)
+    assert restored.enabled is False
+    assert restored.env["GITHUB_TOKEN"] == "ghp_real_secret"
+    assert restored.headers["Authorization"] == "Bearer real_token"
+    assert restored.oauth.client_secret == "oauth-secret"
+    assert restored.oauth.refresh_token == "refresh-abc"
+    # Non-secret fields survive the round-trip unchanged
+    assert restored.description == "GitHub MCP server"
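The mask-then-merge contract exercised by the tests above can be sketched over plain dicts. This is only an illustration, not the code in `app.gateway.routers.mcp`: the names `mask_values`, `merge_preserving`, and `MaskedValueError` are hypothetical, and the real handler raises an `HTTPException` with status 400 where this sketch raises a `ValueError` subclass.

```python
from __future__ import annotations

MASK = "***"  # placeholder value the tests expect in GET responses


class MaskedValueError(ValueError):
    """Hypothetical stand-in for the HTTP 400 raised by the real handler."""


def mask_values(secrets: dict[str, str]) -> dict[str, str]:
    """Return a new dict with every value replaced by the mask."""
    return {key: MASK for key in secrets}


def merge_preserving(incoming: dict[str, str], existing: dict[str, str]) -> dict[str, str]:
    """Restore masked values from ``existing``; accept any other value as new."""
    merged: dict[str, str] = {}
    for key, value in incoming.items():
        if value != MASK:
            # A real value was sent, so it replaces whatever was stored.
            merged[key] = value
        elif key in existing:
            # The client round-tripped the mask: keep the stored secret.
            merged[key] = existing[key]
        else:
            # A mask for a key with no stored secret cannot be resolved.
            raise MaskedValueError(f"masked value sent for unknown key {key!r}")
    return merged
```

Building a fresh dict in both helpers keeps the inputs untouched, which is the property the two `does_not_mutate_original` tests check for the real functions.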
@@ -1,7 +1,9 @@
 import asyncio
+import contextvars
 from unittest.mock import AsyncMock, MagicMock, patch

 import pytest
+from langchain_core.runnables import RunnableConfig
 from langchain_core.tools import StructuredTool
 from pydantic import BaseModel, Field

@@ -69,6 +71,58 @@ def test_mcp_tool_sync_wrapper_in_running_loop():
    assert result == "async_result: 100"


+def test_sync_wrapper_preserves_contextvars_in_running_loop():
+    """The executor branch preserves LangGraph-style contextvars."""
+    current_value: contextvars.ContextVar[str | None] = contextvars.ContextVar("current_value", default=None)
+
+    async def mock_coro() -> str | None:
+        return current_value.get()
+
+    sync_func = make_sync_tool_wrapper(mock_coro, "test_tool")
+
+    async def run_in_loop() -> str | None:
+        token = current_value.set("from-parent-context")
+        try:
+            return sync_func()
+        finally:
+            current_value.reset(token)
+
+    assert asyncio.run(run_in_loop()) == "from-parent-context"
+
+
+def test_sync_wrapper_preserves_runnable_config_injection():
+    """LangChain can still inject RunnableConfig after an async tool is wrapped."""
+    captured: dict[str, object] = {}
+
+    async def mock_coro(x: int, config: RunnableConfig = None):
+        captured["thread_id"] = ((config or {}).get("configurable") or {}).get("thread_id")
+        return f"result: {x}"
+
+    mock_tool = StructuredTool(
+        name="test_tool",
+        description="test description",
+        args_schema=MockArgs,
+        func=make_sync_tool_wrapper(mock_coro, "test_tool"),
+        coroutine=mock_coro,
+    )
+
+    result = mock_tool.invoke({"x": 42}, config={"configurable": {"thread_id": "thread-123"}})
+
+    assert result == "result: 42"
+    assert captured["thread_id"] == "thread-123"
+
+
+def test_sync_wrapper_preserves_regular_config_argument():
+    """Only RunnableConfig-annotated coroutine params get special config injection."""
+
+    async def mock_coro(config: str):
+        return config
+
+    sync_func = make_sync_tool_wrapper(mock_coro, "test_tool")
+
+    assert sync_func(config="user-config") == "user-config"
+
+
 def test_mcp_tool_sync_wrapper_exception_logging():
    """Test the shared sync wrapper's error logging."""

@@ -78,6 +78,41 @@ def test_apply_updates_skips_existing_duplicate_and_preserves_removals() -> None
    assert all(fact["id"] != "fact_remove" for fact in result["facts"])


+def test_prepare_update_prompt_preserves_non_ascii_memory_text() -> None:
+    updater = MemoryUpdater()
+    current_memory = _make_memory(
+        facts=[
+            {
+                "id": "fact_cn",
+                "content": "Deer-flow是一个非常好的框架。",
+                "category": "context",
+                "confidence": 0.9,
+                "createdAt": "2026-05-20T00:00:00Z",
+                "source": "thread-cn",
+            },
+        ]
+    )
+
+    with (
+        patch("deerflow.agents.memory.updater.get_memory_config", return_value=_memory_config(enabled=True)),
+        patch("deerflow.agents.memory.updater.get_memory_data", return_value=current_memory),
+    ):
+        msg = MagicMock()
+        msg.type = "human"
+        msg.content = "你好"
+        prepared = updater._prepare_update_prompt(
+            [msg],
+            agent_name=None,
+            correction_detected=False,
+            reinforcement_detected=False,
+        )
+
+    assert prepared is not None
+    _, prompt = prepared
+    assert "Deer-flow是一个非常好的框架。" in prompt
+    assert "\\u" not in prompt
+
+
 def test_apply_updates_skips_same_batch_duplicates_and_keeps_source_metadata() -> None:
    updater = MemoryUpdater()
    current_memory = _make_memory()
@@ -0,0 +1,106 @@
+"""Regression tests for #3120: SQLite-backed stores must emit tz-aware ISO timestamps.
+
+SQLAlchemy's ``DateTime(timezone=True)`` is a no-op on SQLite because the
+backend has no native timezone type, so values read back are naive
+``datetime`` instances. The four SQL ``_row_to_dict`` helpers therefore
+have to normalize through :func:`deerflow.utils.time.coerce_iso` instead
+of calling ``.isoformat()`` directly; otherwise the API ships
+timezone-less strings (e.g. ``"2026-05-20T06:10:22.970977"``) and the
+frontend's ``new Date(...)`` parses them as local time, shifting recent
+threads by the local UTC offset.
+"""
+
+import re
+
+import pytest
+
+_TZ_SUFFIX_RE = re.compile(r"(?:[+-]\d{2}:\d{2}|Z)$")
+
+
+def _assert_tz_aware(value: str | None, *, context: str) -> None:
+    assert value, f"{context}: expected ISO string, got {value!r}"
+    assert _TZ_SUFFIX_RE.search(value), f"{context}: timestamp lacks tz suffix: {value!r}"
+
+
+async def _init_sqlite(tmp_path):
+    from deerflow.persistence.engine import get_session_factory, init_engine
+
+    url = f"sqlite+aiosqlite:///{tmp_path / 'tz.db'}"
+    await init_engine("sqlite", url=url, sqlite_dir=str(tmp_path))
+    return get_session_factory()
+
+
+async def _cleanup():
+    from deerflow.persistence.engine import close_engine
+
+    await close_engine()
+
+
+@pytest.mark.anyio
+async def test_thread_meta_emits_tz_aware_timestamps(tmp_path):
+    from deerflow.persistence.thread_meta import ThreadMetaRepository
+
+    repo = ThreadMetaRepository(await _init_sqlite(tmp_path))
+    try:
+        created = await repo.create("t-tz", user_id="u1", display_name="tz")
+        _assert_tz_aware(created["created_at"], context="thread_meta.create.created_at")
+        _assert_tz_aware(created["updated_at"], context="thread_meta.create.updated_at")
+
+        # Second read from DB exercises the same _row_to_dict path on a
+        # value that SQLite has round-tripped (where tzinfo is lost).
+        fetched = await repo.get("t-tz", user_id="u1")
+        _assert_tz_aware(fetched["created_at"], context="thread_meta.get.created_at")
+        _assert_tz_aware(fetched["updated_at"], context="thread_meta.get.updated_at")
+
+        listed = await repo.search(user_id="u1")
+        assert listed, "search must return the created row"
+        _assert_tz_aware(listed[0]["created_at"], context="thread_meta.search.created_at")
+        _assert_tz_aware(listed[0]["updated_at"], context="thread_meta.search.updated_at")
+    finally:
+        await _cleanup()
+
+
+@pytest.mark.anyio
+async def test_run_repository_emits_tz_aware_timestamps(tmp_path):
+    from deerflow.persistence.run import RunRepository
+
+    repo = RunRepository(await _init_sqlite(tmp_path))
+    try:
+        await repo.put("r-tz", thread_id="t-tz", user_id="u1")
+        row = await repo.get("r-tz", user_id="u1")
+        _assert_tz_aware(row["created_at"], context="run.get.created_at")
+        _assert_tz_aware(row["updated_at"], context="run.get.updated_at")
+    finally:
+        await _cleanup()
+
+
+@pytest.mark.anyio
+async def test_feedback_repository_emits_tz_aware_timestamps(tmp_path):
+    from deerflow.persistence.feedback import FeedbackRepository
+
+    repo = FeedbackRepository(await _init_sqlite(tmp_path))
+    try:
+        record = await repo.create(run_id="r-tz", thread_id="t-tz", rating=1, user_id="u1")
+        _assert_tz_aware(record["created_at"], context="feedback.create.created_at")
+    finally:
+        await _cleanup()
+
+
+@pytest.mark.anyio
+async def test_run_event_store_emits_tz_aware_timestamps(tmp_path):
+    from deerflow.runtime.events.store.db import DbRunEventStore
+
+    store = DbRunEventStore(await _init_sqlite(tmp_path))
+    try:
+        await store.put(
+            thread_id="t-tz",
+            run_id="r-tz",
+            event_type="log",
+            category="log",
+            content="hello",
+        )
+        events = await store.list_events("t-tz", "r-tz", user_id=None)
+        assert events, "expected at least one event"
+        _assert_tz_aware(events[0]["created_at"], context="run_event.list.created_at")
+    finally:
+        await _cleanup()
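The normalization these regression tests guard can be sketched as follows. The helper name `to_tz_aware_iso` is hypothetical, and the sketch assumes the stored values are UTC; the actual `deerflow.utils.time.coerce_iso` may handle more input types or a different default zone.

```python
from datetime import datetime, timezone


def to_tz_aware_iso(value: datetime) -> str:
    """Format a datetime as ISO 8601, treating naive values as UTC (assumption)."""
    if value.tzinfo is None:
        # SQLite round-trips drop tzinfo, so re-attach UTC before formatting.
        value = value.replace(tzinfo=timezone.utc)
    return value.isoformat()


naive = datetime(2026, 5, 20, 6, 10, 22, 970977)
print(naive.isoformat())       # 2026-05-20T06:10:22.970977
print(to_tz_aware_iso(naive))  # 2026-05-20T06:10:22.970977+00:00
```

Already-aware values pass through with their own offset, so the helper is safe to apply to rows from backends that do preserve timezones.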
@@ -159,6 +159,26 @@ def test_provisioner_create_returns_sandbox_info(monkeypatch):
    assert info.sandbox_url == "http://k3s:31001"


+def test_provisioner_create_accepts_anonymous_thread_id(monkeypatch):
+    backend = RemoteSandboxBackend("http://provisioner:8002")
+
+    def mock_post(url: str, json: dict, timeout: int):
+        assert url == "http://provisioner:8002/api/sandboxes"
+        assert json == {
+            "sandbox_id": "anon123",
+            "thread_id": None,
+            "user_id": "test-user-autouse",
+        }
+        assert timeout == 30
+        return _StubResponse(payload={"sandbox_id": "anon123", "sandbox_url": "http://k3s:31002"})
+
+    monkeypatch.setattr(requests, "post", mock_post)
+
+    info = backend.create(None, "anon123")
+    assert info.sandbox_id == "anon123"
+    assert info.sandbox_url == "http://k3s:31002"
+
+
 def test_provisioner_create_raises_runtime_error_on_request_exception(monkeypatch):
    backend = RemoteSandboxBackend("http://provisioner:8002")

@@ -0,0 +1,34 @@
+from deerflow.runtime.runs.naming import resolve_root_run_name
+
+
+def test_resolve_root_run_name_from_context_agent_name():
+    assert resolve_root_run_name({"context": {"agent_name": "finalis"}}, "lead_agent") == "finalis"
+
+
+def test_resolve_root_run_name_from_configurable_agent_name():
+    assert resolve_root_run_name({"configurable": {"agent_name": "finalis"}}, "lead_agent") == "finalis"
+
+
+def test_resolve_root_run_name_falls_back_to_assistant_id():
+    assert resolve_root_run_name({}, "my-agent") == "my-agent"
+
+
+def test_resolve_root_run_name_falls_back_to_lead_agent():
+    assert resolve_root_run_name({}, None) == "lead_agent"
+
+
+def test_resolve_root_run_name_prefers_context_over_configurable():
+    config = {
+        "context": {"agent_name": "ctx-agent"},
+        "configurable": {"agent_name": "cfg-agent"},
+    }
+
+    assert resolve_root_run_name(config, "lead_agent") == "ctx-agent"
+
+
+def test_resolve_root_run_name_ignores_blank_agent_name():
+    assert resolve_root_run_name({"context": {"agent_name": "   "}}, "my-agent") == "my-agent"
+
+
+def test_resolve_root_run_name_ignores_non_string_agent_name():
+    assert resolve_root_run_name({"context": {"agent_name": None}}, "my-agent") == "my-agent"
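The precedence these tests pin down (context, then configurable, then assistant id, then a fixed default) could be implemented along the following lines. This is a hypothetical reimplementation for illustration, not the code in `deerflow.runtime.runs.naming`; in particular, stripping surrounding whitespace from an accepted name is an assumption the tests neither confirm nor rule out.

```python
from __future__ import annotations

from typing import Any

DEFAULT_RUN_NAME = "lead_agent"  # fallback used when nothing else is usable


def _clean_name(value: Any) -> str | None:
    """Return a stripped name, or None for blank and non-string values."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def resolve_run_name_sketch(config: dict[str, Any], assistant_id: str | None) -> str:
    """Pick the run name: context, then configurable, then assistant id, then default."""
    for section in ("context", "configurable"):
        name = _clean_name((config.get(section) or {}).get("agent_name"))
        if name:
            return name
    return _clean_name(assistant_id) or DEFAULT_RUN_NAME
```

Iterating over the sections in a fixed order keeps the precedence rule in one place, so adding another source later means extending the tuple rather than adding a branch.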
@@ -95,6 +95,108 @@ async def test_run_agent_threads_explicit_app_config_into_config_only_factory():
    bridge.cleanup.assert_awaited_once_with(record.run_id, delay=60)


+@pytest.mark.anyio
+async def test_run_agent_defaults_root_run_name_from_assistant_id():
+    run_manager = RunManager()
+    record = await run_manager.create("thread-1", assistant_id="lead_agent")
+    bridge = SimpleNamespace(
+        publish=AsyncMock(),
+        publish_end=AsyncMock(),
+        cleanup=AsyncMock(),
+    )
+    captured: dict[str, object] = {}
+
+    class DummyAgent:
+        async def astream(self, graph_input, config=None, stream_mode=None, subgraphs=False):
+            captured["astream_run_name"] = config["run_name"]
+            yield {"messages": []}
+
+    def factory(*, config):
+        captured["factory_run_name"] = config["run_name"]
+        return DummyAgent()
+
+    await run_agent(
+        bridge,
+        run_manager,
+        record,
+        ctx=RunContext(checkpointer=None),
+        agent_factory=factory,
+        graph_input={},
+        config={},
+    )
+
+    assert captured["factory_run_name"] == "lead_agent"
+    assert captured["astream_run_name"] == "lead_agent"
+
+
+@pytest.mark.anyio
+async def test_run_agent_defaults_root_run_name_from_context_agent_name():
+    run_manager = RunManager()
+    record = await run_manager.create("thread-1", assistant_id="lead_agent")
+    bridge = SimpleNamespace(
+        publish=AsyncMock(),
+        publish_end=AsyncMock(),
+        cleanup=AsyncMock(),
+    )
+    captured: dict[str, object] = {}
+
+    class DummyAgent:
+        async def astream(self, graph_input, config=None, stream_mode=None, subgraphs=False):
+            captured["astream_run_name"] = config["run_name"]
+            yield {"messages": []}
+
+    def factory(*, config):
+        captured["factory_run_name"] = config["run_name"]
+        return DummyAgent()
+
+    await run_agent(
+        bridge,
+        run_manager,
+        record,
+        ctx=RunContext(checkpointer=None),
+        agent_factory=factory,
+        graph_input={},
+        config={"context": {"agent_name": "finalis"}},
+    )
+
+    assert captured["factory_run_name"] == "finalis"
+    assert captured["astream_run_name"] == "finalis"
+
+
+@pytest.mark.anyio
+async def test_run_agent_defaults_root_run_name_from_configurable_agent_name():
+    run_manager = RunManager()
+    record = await run_manager.create("thread-1", assistant_id="lead_agent")
+    bridge = SimpleNamespace(
+        publish=AsyncMock(),
+        publish_end=AsyncMock(),
+        cleanup=AsyncMock(),
+    )
+    captured: dict[str, object] = {}
+
+    class DummyAgent:
+        async def astream(self, graph_input, config=None, stream_mode=None, subgraphs=False):
+            captured["astream_run_name"] = config["run_name"]
+            yield {"messages": []}
+
+    def factory(*, config):
+        captured["factory_run_name"] = config["run_name"]
+        return DummyAgent()
+
+    await run_agent(
+        bridge,
+        run_manager,
+        record,
+        ctx=RunContext(checkpointer=None),
+        agent_factory=factory,
+        graph_input={},
+        config={"configurable": {"agent_name": "finalis"}},
+    )
+
+    assert captured["factory_run_name"] == "finalis"
+    assert captured["astream_run_name"] == "finalis"
+
+
 @pytest.mark.anyio
 async def test_rollback_restores_snapshot_without_deleting_thread():
    checkpointer = FakeCheckpointer(put_result={"configurable": {"thread_id": "thread-1", "checkpoint_ns": "", "checkpoint_id": "restored-1"}})
@@ -0,0 +1,686 @@
+"""HTTP/runtime lifecycle E2E tests for the Gateway-owned runs API.
+
+These tests keep the external model out of scope while exercising the real
+FastAPI app, auth middleware, lifespan-created runtime dependencies,
+``start_run()``, ``run_agent()``, StreamBridge, checkpointer, run store, and
+thread metadata store.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import inspect
+import json
+import queue
+import threading
+import time
+import uuid
+from contextlib import suppress
+from pathlib import Path
+from typing import Any
+from unittest.mock import patch
+
+import pytest
+from _agent_e2e_helpers import FakeToolCallingModel, build_single_tool_call_model
+from langchain_core.messages import AIMessage, HumanMessage
+
+pytestmark = pytest.mark.no_auto_user
+
+
+_MINIMAL_CONFIG_YAML = """\
+log_level: info
+models:
+  - name: fake-test-model
+    display_name: Fake Test Model
+    use: langchain_openai:ChatOpenAI
+    model: gpt-4o-mini
+    api_key: $OPENAI_API_KEY
+    base_url: $OPENAI_API_BASE
+sandbox:
+  use: deerflow.sandbox.local:LocalSandboxProvider
+agents_api:
+  enabled: true
+title:
+  enabled: false
+memory:
+  enabled: false
+database:
+  backend: sqlite
+run_events:
+  backend: memory
+"""
+
+
+class _RunController:
+    """Cross-thread controls for the fake async agent."""
+
+    def __init__(self) -> None:
+        self.started = threading.Event()
+        self.checkpoint_written = threading.Event()
+        self.cancelled = threading.Event()
+        self.release = threading.Event()
+        self.instances: list[_ScriptedAgent] = []
+
+
+class _ScriptedAgent:
+    """Deterministic runtime double for lifecycle-only tests.
+
+    This is intentionally not a full LangGraph graph. It implements only the
+    small surface that ``run_agent`` touches (``astream()``, checkpointer/store
+    attachment, metadata, and interrupt node attributes) so tests can control
+    blocking, cancellation, and rollback checkpoints. The real lead-agent
+    graph/tool dispatch path is covered separately by
+    ``test_stream_run_executes_real_lead_agent_setup_agent_business_path``.
+    """
+
+    def __init__(
+        self,
+        controller: _RunController,
+        *,
+        title: str,
+        answer: str,
+        block_after_first_chunk: bool = False,
+    ) -> None:
+        self.controller = controller
+        self.title = title
+        self.answer = answer
+        self.block_after_first_chunk = block_after_first_chunk
+        self.checkpointer: Any | None = None
+        self.store: Any | None = None
+        self.metadata = {"model_name": "fake-test-model"}
+        self.interrupt_before_nodes = None
+        self.interrupt_after_nodes = None
+        self.model = FakeToolCallingModel(responses=[AIMessage(content=self.answer)])
+
+    async def astream(self, graph_input, config=None, stream_mode=None, subgraphs=False):
+        del subgraphs
+        self.controller.started.set()
+
+        thread_id = _thread_id_from_config(config)
+        human_text = _last_human_text(graph_input)
+        human = HumanMessage(content=human_text)
+        ai = await self.model.ainvoke([human], config=config)
+        state = {"messages": [human.model_dump(), ai.model_dump()], "title": self.title}
+
+        if self.checkpointer is not None:
+            await _write_checkpoint(self.checkpointer, thread_id=thread_id, state=state)
+        self.controller.checkpoint_written.set()
+
+        yield _stream_item_for_mode(stream_mode, state)
+
+        if self.block_after_first_chunk:
+            try:
+                while not self.controller.release.is_set():
+                    await asyncio.sleep(0.05)
+            except asyncio.CancelledError:
+                self.controller.cancelled.set()
+                raise
+
+
+def _make_agent_factory(controller: _RunController, **agent_kwargs):
+    def factory(*, config):
+        del config
+        agent = _ScriptedAgent(controller, **agent_kwargs)
+        controller.instances.append(agent)
+        return agent
+
+    return factory
+
+
+def _build_fake_setup_agent_model(agent_name: str):
+    """Patch target for lead_agent.agent.create_chat_model.
+
+    The graph, tool registry, ToolNode dispatch, and setup_agent implementation
+    remain production code; this fake only replaces the external LLM call.
+    """
+
+    def fake_create_chat_model(*args: Any, **kwargs: Any) -> FakeToolCallingModel:
+        del args, kwargs
+        return build_single_tool_call_model(
+            tool_name="setup_agent",
+            tool_args={
+                "soul": f"# Runtime Business E2E\n\nAgent name: {agent_name}",
+                "description": "runtime lifecycle business path",
+            },
+            tool_call_id="call_runtime_business_1",
+            final_text=f"Created {agent_name} through the real setup_agent tool.",
+        )
+
+    return fake_create_chat_model
+
+
+@pytest.fixture
+def isolated_deer_flow_home(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
+    home = tmp_path / "deer-flow-home"
+    home.mkdir()
+    monkeypatch.setenv("DEER_FLOW_HOME", str(home))
+    monkeypatch.setenv("OPENAI_API_KEY", "sk-fake-key-not-used")
+    monkeypatch.setenv("OPENAI_API_BASE", "https://example.invalid")
+
+    staged_config = tmp_path / "config.yaml"
+    staged_config.write_text(_MINIMAL_CONFIG_YAML, encoding="utf-8")
+    monkeypatch.setenv("DEER_FLOW_CONFIG_PATH", str(staged_config))
+
+    staged_extensions_config = tmp_path / "extensions_config.json"
+    staged_extensions_config.write_text('{"mcpServers": {}, "skills": {}}', encoding="utf-8")
+    monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(staged_extensions_config))
+    return home
+
+
+def _reset_process_singletons(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Clear runtime singletons that depend on this test's temporary config.
+
+    The Gateway app/lifespan path reads process-wide caches before wiring
+    request-scoped dependencies. These E2E tests stage a temporary
+    ``config.yaml``/``extensions_config.json`` and ``DEER_FLOW_HOME``, so the
+    caches below must be reset before app creation:
+
+    - app_config / extensions_config: parsed config file caches.
+    - paths: ``DEER_FLOW_HOME``-derived filesystem paths.
+    - persistence.engine: SQLAlchemy engine/session factory for the sqlite dir.
+    - app.gateway.deps: cached local auth provider/repository.
+
+    A shared public reset helper would be cleaner long-term; this test keeps
+    the reset boundary explicit because the PR is focused on runtime lifecycle
+    coverage rather than config-cache API cleanup.
+    """
+
+    from app.gateway import deps as deps_module
+    from deerflow.config import app_config as app_config_module
+    from deerflow.config import extensions_config as extensions_config_module
+    from deerflow.config import paths as paths_module
+    from deerflow.persistence import engine as engine_module
+
+    for module, attr, value in (
+        (app_config_module, "_app_config", None),
+        (app_config_module, "_app_config_path", None),
+        (app_config_module, "_app_config_mtime", None),
+        (app_config_module, "_app_config_is_custom", False),
+        (extensions_config_module, "_extensions_config", None),
+        (paths_module, "_paths_singleton", None),
+        (paths_module, "_paths", None),
+        (engine_module, "_engine", None),
+        (engine_module, "_session_factory", None),
+        (deps_module, "_cached_local_provider", None),
+        (deps_module, "_cached_repo", None),
+    ):
+        monkeypatch.setattr(module, attr, value, raising=False)
+
+
+def _preserve_process_config_singletons(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Restore config singletons mutated as a side effect of AppConfig loading.
+
+    ``AppConfig.from_file()`` calls ``_apply_singleton_configs()``, which pushes
+    nested config sections into module-level caches used by middlewares, tool
+    selection, and runtime providers. Snapshotting those attributes with
+    ``monkeypatch`` lets pytest restore the pre-test values during teardown, so
+    loading the isolated test config does not leak into later tests.
+    """
+
+    from deerflow.config import (
+        acp_config,
+        agents_api_config,
+        checkpointer_config,
+        guardrails_config,
+        memory_config,
+        stream_bridge_config,
+        subagents_config,
+        summarization_config,
+        title_config,
+        tool_search_config,
+    )
+
+    for module, attr in (
+        (title_config, "_title_config"),
+        (summarization_config, "_summarization_config"),
+        (memory_config, "_memory_config"),
+        (agents_api_config, "_agents_api_config"),
+        (subagents_config, "_subagents_config"),
+        (tool_search_config, "_tool_search_config"),
+        (guardrails_config, "_guardrails_config"),
+        (checkpointer_config, "_checkpointer_config"),
+        (stream_bridge_config, "_stream_bridge_config"),
+        (acp_config, "_acp_agents"),
+    ):
+        monkeypatch.setattr(module, attr, getattr(module, attr), raising=False)
+
+
+@pytest.fixture
+def isolated_app(isolated_deer_flow_home: Path, monkeypatch: pytest.MonkeyPatch):
+    _preserve_process_config_singletons(monkeypatch)
+    _reset_process_singletons(monkeypatch)
+
+    from deerflow.config import app_config as app_config_module
+
+    cfg = app_config_module.get_app_config()
+    cfg.database.sqlite_dir = str(isolated_deer_flow_home / "db")
+
+    from app.gateway.app import create_app
+
+    return create_app()
+
+
+def _register_user(client, *, email: str = "runtime-e2e@example.com") -> str:
+    response = client.post(
+        "/api/v1/auth/register",
+        json={"email": email, "password": "very-strong-password-123"},
+    )
+    assert response.status_code == 201, response.text
+    csrf_token = client.cookies.get("csrf_token")
+    assert csrf_token
+    return csrf_token
+
+
+def _create_thread(client, csrf_token: str) -> str:
+    thread_id = str(uuid.uuid4())
+    response = client.post(
+        "/api/threads",
+        json={"thread_id": thread_id, "metadata": {"purpose": "runtime-lifecycle-e2e"}},
+        headers={"X-CSRF-Token": csrf_token},
+    )
+    assert response.status_code == 200, response.text
+    return thread_id
+
+
+def _run_body(**overrides) -> dict[str, Any]:
+    body: dict[str, Any] = {
+        "assistant_id": "lead_agent",
+        "input": {"messages": [{"role": "user", "content": "Run lifecycle E2E prompt"}]},
+        "config": {"recursion_limit": 50},
+        "stream_mode": ["values"],
+    }
+    body.update(overrides)
+    return body
+
+
+def _drain_stream(response, *, timeout: float = 10.0, max_bytes: int = 1024 * 1024) -> str:
+    chunks: queue.Queue[bytes | BaseException | object] = queue.Queue()
+    sentinel = object()
+
+    def read_stream() -> None:
+        try:
+            for chunk in response.iter_bytes():
+                chunks.put(chunk)
+                if b"event: end" in chunk:
+                    break
+        except BaseException as exc:  # pragma: no cover - reported in the main test thread
+            chunks.put(exc)
+        finally:
+            chunks.put(sentinel)
+
+    reader = threading.Thread(target=read_stream, daemon=True)
+    reader.start()
+
+    deadline = time.monotonic() + timeout
+    body = b""
+    while True:
+        remaining = deadline - time.monotonic()
+        if remaining <= 0:
+            raise AssertionError(f"SSE stream did not finish within {timeout}s; transcript tail={body[-4000:].decode('utf-8', errors='replace')}")
+        try:
+            chunk = chunks.get(timeout=remaining)
+        except queue.Empty as exc:
+            raise AssertionError(f"SSE stream did not produce data within {timeout}s; transcript tail={body[-4000:].decode('utf-8', errors='replace')}") from exc
+        if chunk is sentinel:
+            break
+        if isinstance(chunk, BaseException):
+            raise AssertionError("SSE reader failed") from chunk
+        body += chunk
+        if b"event: end" in body:
+            break
+        if len(body) >= max_bytes:
+            raise AssertionError(f"SSE stream exceeded {max_bytes} bytes without event: end")
+    if b"event: end" not in body:
+        raise AssertionError(f"SSE stream closed before event: end; transcript tail={body[-4000:].decode('utf-8', errors='replace')}")
+    return body.decode("utf-8", errors="replace")
+
+
+def _parse_sse(transcript: str) -> list[dict[str, Any]]:
+    events: list[dict[str, Any]] = []
+    for raw_frame in transcript.split("\n\n"):
+        frame = raw_frame.strip()
+        if not frame or frame.startswith(":"):
+            continue
+        parsed: dict[str, Any] = {}
+        for line in frame.splitlines():
+            if line.startswith("event: "):
+                parsed["event"] = line.removeprefix("event: ")
+            elif line.startswith("data: "):
+                payload = line.removeprefix("data: ")
+                parsed["data"] = json.loads(payload)
+            elif line.startswith("id: "):
+                parsed["id"] = line.removeprefix("id: ")
+        if parsed:
+            events.append(parsed)
+    return events
+
+
+def _run_id_from_response(response) -> str:
+    location = response.headers.get("content-location", "")
+    assert location, "run stream response must include Content-Location"
+    return location.rstrip("/").split("/")[-1]
+
+
+def _wait_for_status(client, thread_id: str, run_id: str, status: str, *, timeout: float = 5.0) -> dict:
+    deadline = time.monotonic() + timeout
+    last: dict | None = None
+    while time.monotonic() < deadline:
+        response = client.get(f"/api/threads/{thread_id}/runs/{run_id}")
+        assert response.status_code == 200, response.text
+        last = response.json()
+        if last["status"] == status:
+            return last
+        time.sleep(0.05)
+    raise AssertionError(f"Run {run_id} did not reach {status!r}; last={last!r}")
+
+
+def _thread_id_from_config(config: dict | None) -> str:
+    config = config or {}
+    context = config.get("context") if isinstance(config.get("context"), dict) else {}
+    configurable = config.get("configurable") if isinstance(config.get("configurable"), dict) else {}
+    thread_id = context.get("thread_id") or configurable.get("thread_id")
+    assert thread_id, f"runtime config did not contain thread_id: {config!r}"
+    return str(thread_id)
+
+
+def _last_human_text(graph_input: dict) -> str:
+    messages = graph_input.get("messages") or []
+    if not messages:
+        return ""
+    last = messages[-1]
+    content = getattr(last, "content", last)
+    if isinstance(content, str):
+        return content
+    return str(content)
+
+
+async def _write_checkpoint(checkpointer: Any, *, thread_id: str, state: dict[str, Any]) -> None:
+    from langgraph.checkpoint.base import empty_checkpoint
+
+    checkpoint = empty_checkpoint()
+    checkpoint["channel_values"] = dict(state)
+    checkpoint["channel_versions"] = {key: 1 for key in state}
+    config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
+    metadata = {
+        "source": "loop",
+        "step": 1,
+        "writes": {"scripted_agent": {"title": state.get("title"), "message_count": len(state.get("messages", []))}},
+        "parents": {},
+    }
+
+    result = checkpointer.aput(config, checkpoint, metadata, {})
+    if inspect.isawaitable(result):
+        await result
+
+
+def _stream_item_for_mode(stream_mode: Any, state: dict[str, Any]) -> Any:
+    if isinstance(stream_mode, list):
+        # ``run_agent`` passes a list when multiple modes/subgraphs are active.
+        return stream_mode[0], state
+    return state
+
+
+def test_stream_run_completes_and_persists_runtime_state(isolated_app):
+    """A streaming run should traverse the real runtime and persist run, thread, and message state."""
+    from starlette.testclient import TestClient
+
+    controller = _RunController()
+    factory = _make_agent_factory(
+        controller,
+        title="Lifecycle E2E",
+        answer="Lifecycle complete.",
+    )
+
+    with (
+        patch("app.gateway.services.resolve_agent_factory", return_value=factory),
+        TestClient(isolated_app) as client,
+    ):
+        csrf_token = _register_user(client)
+        thread_id = _create_thread(client, csrf_token)
+
+        with client.stream(
+            "POST",
+            f"/api/threads/{thread_id}/runs/stream",
+            json=_run_body(),
+            headers={"X-CSRF-Token": csrf_token},
+        ) as response:
+            assert response.status_code == 200, response.read().decode()
+            run_id = _run_id_from_response(response)
+            transcript = _drain_stream(response)
+
+        events = _parse_sse(transcript)
+        assert [event["event"] for event in events] == ["metadata", "values", "end"]
+        assert events[0]["data"] == {"run_id": run_id, "thread_id": thread_id}
+        assert events[1]["data"]["title"] == "Lifecycle E2E"
+        assert events[1]["data"]["messages"][-1]["content"] == "Lifecycle complete."
+
+        run = client.get(f"/api/threads/{thread_id}/runs/{run_id}")
+        assert run.status_code == 200, run.text
+        assert run.json()["status"] == "success"
+
+        thread = client.get(f"/api/threads/{thread_id}")
+        assert thread.status_code == 200, thread.text
+        assert thread.json()["status"] == "idle"
+        assert thread.json()["values"]["title"] == "Lifecycle E2E"
+
+        messages = client.get(f"/api/threads/{thread_id}/runs/{run_id}/messages")
+        assert messages.status_code == 200, messages.text
+        message_events = messages.json()["data"]
+        event_types = [row["event_type"] for row in message_events]
+        assert "llm.human.input" in event_types
+        assert "llm.ai.response" in event_types
+        assert any(row["content"]["content"] == "Run lifecycle E2E prompt" for row in message_events if row["event_type"] == "llm.human.input")
+        assert any(row["content"]["content"] == "Lifecycle complete." for row in message_events if row["event_type"] == "llm.ai.response")
+
+
+def test_stream_run_executes_real_lead_agent_setup_agent_business_path(isolated_app, isolated_deer_flow_home: Path):
+    """A runtime stream should execute real lead-agent business code and tools."""
+    from starlette.testclient import TestClient
+
+    agent_name = "runtime-business-agent"
+
+    with (
+        patch(
+            "deerflow.agents.lead_agent.agent.create_chat_model",
+            new=_build_fake_setup_agent_model(agent_name),
+        ),
+        TestClient(isolated_app) as client,
+    ):
+        csrf_token = _register_user(client, email="business-e2e@example.com")
+        auth_user_id = client.get("/api/v1/auth/me").json()["id"]
+        thread_id = _create_thread(client, csrf_token)
+
+        body = _run_body(
+            input={
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": f"Create a custom agent named {agent_name}.",
+                    }
+                ]
+            },
+            context={
+                "agent_name": agent_name,
+                "is_bootstrap": True,
+                "thinking_enabled": False,
+                "is_plan_mode": False,
+                "subagent_enabled": False,
+            },
+        )
+
+        with client.stream(
+            "POST",
+            f"/api/threads/{thread_id}/runs/stream",
+            json=body,
+            headers={"X-CSRF-Token": csrf_token},
+        ) as response:
+            assert response.status_code == 200, response.read().decode()
+            run_id = _run_id_from_response(response)
+            transcript = _drain_stream(response, timeout=20.0)
+
+        events = _parse_sse(transcript)
+        event_names = [event["event"] for event in events]
+        assert "metadata" in event_names
+        assert "error" not in event_names, transcript
+        assert event_names[-1] == "end"
+
+        run = _wait_for_status(client, thread_id, run_id, "success", timeout=10.0)
+        assert run["assistant_id"] == "lead_agent"
+
+        expected_soul = isolated_deer_flow_home / "users" / auth_user_id / "agents" / agent_name / "SOUL.md"
+        assert expected_soul.exists(), f"setup_agent did not write SOUL.md. tmp tree: {sorted(str(p.relative_to(isolated_deer_flow_home)) for p in isolated_deer_flow_home.rglob('SOUL.md'))}"
+        assert f"Agent name: {agent_name}" in expected_soul.read_text(encoding="utf-8")
+        assert not (isolated_deer_flow_home / "users" / "default" / "agents" / agent_name).exists()
+
+
+def test_cancel_interrupt_stops_running_background_run(isolated_app):
+    """HTTP cancel?action=interrupt should stop the worker and persist interruption."""
+    from starlette.testclient import TestClient
+
+    controller = _RunController()
+    factory = _make_agent_factory(
+        controller,
+        title="Interrupt candidate",
+        answer="This run should be interrupted.",
+        block_after_first_chunk=True,
+    )
+
+    with (
+        patch("app.gateway.services.resolve_agent_factory", return_value=factory),
+        TestClient(isolated_app) as client,
+    ):
+        csrf_token = _register_user(client, email="interrupt-e2e@example.com")
+        thread_id = _create_thread(client, csrf_token)
+
+        created = client.post(
+            f"/api/threads/{thread_id}/runs",
+            json=_run_body(),
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert created.status_code == 200, created.text
+        run_id = created.json()["run_id"]
+        assert controller.started.wait(5), "fake agent never started"
+
+        cancelled = client.post(
+            f"/api/threads/{thread_id}/runs/{run_id}/cancel?wait=true&action=interrupt",
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert cancelled.status_code == 204, cancelled.text
+        assert controller.cancelled.wait(5), "fake agent task was not cancelled"
+
+        run = _wait_for_status(client, thread_id, run_id, "interrupted")
+        assert run["status"] == "interrupted"
+
+        thread = client.get(f"/api/threads/{thread_id}")
+        assert thread.status_code == 200, thread.text
+        assert thread.json()["status"] == "idle"
+
+
+@pytest.mark.anyio
+async def test_sse_consumer_disconnect_cancels_inflight_run():
+    """A disconnected SSE request should cancel an in-flight run when ``on_disconnect`` is ``DisconnectMode.cancel``."""
+    from app.gateway.services import sse_consumer
+    from deerflow.runtime import DisconnectMode, MemoryStreamBridge, RunManager, RunStatus
+
+    bridge = MemoryStreamBridge()
+    run_manager = RunManager()
+    record = await run_manager.create("thread-disconnect", on_disconnect=DisconnectMode.cancel)
+    await run_manager.set_status(record.run_id, RunStatus.running)
+    await bridge.publish(record.run_id, "metadata", {"run_id": record.run_id, "thread_id": record.thread_id})
+    worker_started = asyncio.Event()
+    worker_cancelled = asyncio.Event()
+
+    async def _pending_worker() -> None:
+        try:
+            worker_started.set()
+            await asyncio.Event().wait()
+        except asyncio.CancelledError:
+            worker_cancelled.set()
+            raise
+
+    record.task = asyncio.create_task(_pending_worker())
+    await asyncio.wait_for(worker_started.wait(), timeout=1.0)
+
+    class _DisconnectedRequest:
+        headers: dict[str, str] = {}
+
+        async def is_disconnected(self) -> bool:
+            return True
+
+    try:
+        frames = []
+        async for frame in sse_consumer(bridge, record, _DisconnectedRequest(), run_manager):
+            frames.append(frame)
+
+        assert frames == []
+        assert record.abort_event.is_set()
+        assert record.status == RunStatus.interrupted
+        await asyncio.wait_for(worker_cancelled.wait(), timeout=1.0)
+        assert record.task.cancelled()
+    finally:
+        if record.task is not None and not record.task.done():
+            record.task.cancel()
+            with suppress(asyncio.CancelledError):
+                await record.task
+
+
+def test_cancel_rollback_restores_pre_run_checkpoint(isolated_app):
+    """HTTP cancel?action=rollback should restore the checkpoint captured before run start."""
+    from starlette.testclient import TestClient
+
+    controller = _RunController()
+    factory = _make_agent_factory(
+        controller,
+        title="During rollback run",
+        answer="This answer should be rolled back.",
+        block_after_first_chunk=True,
+    )
+
+    with (
+        patch("app.gateway.services.resolve_agent_factory", return_value=factory),
+        TestClient(isolated_app) as client,
+    ):
+        csrf_token = _register_user(client, email="rollback-e2e@example.com")
+        thread_id = _create_thread(client, csrf_token)
+
+        before = client.post(
+            f"/api/threads/{thread_id}/state",
+            json={
+                "values": {
+                    "title": "Before rollback",
+                    "messages": [{"type": "human", "content": "before"}],
+                },
+                "as_node": "test_seed",
+            },
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert before.status_code == 200, before.text
+        assert before.json()["values"]["title"] == "Before rollback"
+
+        created = client.post(
+            f"/api/threads/{thread_id}/runs",
+            json=_run_body(),
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert created.status_code == 200, created.text
+        run_id = created.json()["run_id"]
+        assert controller.checkpoint_written.wait(5), "fake agent did not write in-run checkpoint"
+
+        during = client.get(f"/api/threads/{thread_id}/state")
+        assert during.status_code == 200, during.text
+        assert during.json()["values"]["title"] == "During rollback run"
+
+        rolled_back = client.post(
+            f"/api/threads/{thread_id}/runs/{run_id}/cancel?wait=true&action=rollback",
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert rolled_back.status_code == 204, rolled_back.text
+        assert controller.cancelled.wait(5), "rollback did not cancel the worker task"
+
+        run = _wait_for_status(client, thread_id, run_id, "error")
+        assert run["status"] == "error"
+
+        after = client.get(f"/api/threads/{thread_id}/state")
+        assert after.status_code == 200, after.text
+        assert after.json()["values"]["title"] == "Before rollback"
+        assert after.json()["values"]["messages"] == [{"type": "human", "content": "before"}]
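Review note: the streaming tests above depend on the transcript being a sequence of SSE frames separated by blank lines, each with an `event:` line and a JSON `data:` line. The sketch below is a standalone illustration of that parsing step, not the gateway's or the test module's implementation. The function name `parse_sse_frames`, the sample transcript, and the CRLF normalisation are assumptions added for illustration.

```python
import json
from typing import Any


def parse_sse_frames(transcript: str) -> list[dict[str, Any]]:
    """Split an SSE transcript into frames and decode each data payload as JSON."""
    events: list[dict[str, Any]] = []
    # Frames are separated by a blank line; normalise CRLF so both endings work.
    for raw_frame in transcript.replace("\r\n", "\n").split("\n\n"):
        frame = raw_frame.strip()
        if not frame or frame.startswith(":"):
            continue  # skip empty frames and comment/heartbeat frames
        parsed: dict[str, Any] = {}
        for line in frame.splitlines():
            field, sep, value = line.partition(": ")
            if sep and field in {"event", "id"}:
                parsed[field] = value
            elif sep and field == "data":
                # Assumes single-line JSON payloads, as in the tests above.
                parsed["data"] = json.loads(value)
        if parsed:
            events.append(parsed)
    return events


# Hypothetical transcript shaped like the one the lifecycle test asserts on.
sample = (
    'event: metadata\ndata: {"run_id": "r1", "thread_id": "t1"}\n\n'
    ": heartbeat\n\n"
    'event: values\ndata: {"title": "Lifecycle E2E"}\n\n'
    "event: end\ndata: null\n\n"
)
events = parse_sse_frames(sample)
print([event["event"] for event in events])
```

Multi-line `data:` fields are deliberately out of scope here; a parser for arbitrary SSE producers would need to join consecutive `data:` lines before decoding.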
@@ -0,0 +1,225 @@
+from __future__ import annotations
+
+import asyncio
+
+import pytest
+from langchain.agents.middleware import AgentMiddleware
+from langchain.tools import ToolRuntime
+from langgraph.runtime import Runtime
+
+from deerflow.sandbox.middleware import SandboxMiddleware
+from deerflow.sandbox.sandbox import Sandbox
+from deerflow.sandbox.sandbox_provider import SandboxProvider, reset_sandbox_provider, set_sandbox_provider
+from deerflow.sandbox.search import GrepMatch
+from deerflow.sandbox.tools import ls_tool
+
+
+class _SyncProvider(SandboxProvider):
+    def __init__(self) -> None:
+        self.thread_ids: list[str | None] = []
+
+    def acquire(self, thread_id: str | None = None) -> str:
+        self.thread_ids.append(thread_id)
+        return "sync-sandbox"
+
+    def get(self, sandbox_id: str) -> Sandbox | None:
+        return None
+
+    def release(self, sandbox_id: str) -> None:
+        return None
+
+
+class _SandboxStub(Sandbox):
+    def execute_command(self, command: str) -> str:
+        return "OK"
+
+    def read_file(self, path: str) -> str:
+        return "content"
+
+    def download_file(self, path: str) -> bytes:
+        return b"content"
+
+    def list_dir(self, path: str, max_depth: int = 2) -> list[str]:
+        return ["/mnt/user-data/workspace/file.txt"]
+
+    def write_file(self, path: str, content: str, append: bool = False) -> None:
+        return None
+
+    def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
+        return [], False
+
+    def grep(
+        self,
+        path: str,
+        pattern: str,
+        *,
+        glob: str | None = None,
+        literal: bool = False,
+        case_sensitive: bool = False,
+        max_results: int = 100,
+    ) -> tuple[list[GrepMatch], bool]:
+        return [], False
+
+    def update_file(self, path: str, content: bytes) -> None:
+        return None
+
+
+class _AsyncOnlyProvider(SandboxProvider):
+    def __init__(self) -> None:
+        self.thread_ids: list[str | None] = []
+        self.released_ids: list[str] = []
+        self.sandbox = _SandboxStub("async-sandbox")
+
+    def acquire(self, thread_id: str | None = None) -> str:
+        raise AssertionError("async middleware should not call sync acquire")
+
+    async def acquire_async(self, thread_id: str | None = None) -> str:
+        self.thread_ids.append(thread_id)
+        return "async-sandbox"
+
+    def get(self, sandbox_id: str) -> Sandbox | None:
+        if sandbox_id == "async-sandbox":
+            return self.sandbox
+        return None
+
+    def release(self, sandbox_id: str) -> None:
+        self.released_ids.append(sandbox_id)
+        return None
+
+
+@pytest.mark.anyio
+async def test_provider_default_acquire_async_offloads_sync_acquire(monkeypatch: pytest.MonkeyPatch) -> None:
+    provider = _SyncProvider()
+    calls: list[tuple[object, tuple[object, ...]]] = []
+
+    async def fake_to_thread(func, /, *args):
+        calls.append((func, args))
+        return func(*args)
+
+    monkeypatch.setattr(asyncio, "to_thread", fake_to_thread)
+
+    sandbox_id = await provider.acquire_async("thread-1")
+
+    assert sandbox_id == "sync-sandbox"
+    assert provider.thread_ids == ["thread-1"]
+    assert calls == [(provider.acquire, ("thread-1",))]
+
+
+@pytest.mark.anyio
+async def test_abefore_agent_uses_async_provider_acquire() -> None:
+    provider = _AsyncOnlyProvider()
+    set_sandbox_provider(provider)
+    try:
+        middleware = SandboxMiddleware(lazy_init=False)
+
+        result = await middleware.abefore_agent({}, Runtime(context={"thread_id": "thread-2"}))
+    finally:
+        reset_sandbox_provider()
+
+    assert result == {"sandbox": {"sandbox_id": "async-sandbox"}}
+    assert provider.thread_ids == ["thread-2"]
+
+
+@pytest.mark.anyio
+@pytest.mark.parametrize(
+    ("middleware", "state", "runtime"),
+    [
+        (SandboxMiddleware(lazy_init=True), {}, Runtime(context={"thread_id": "thread-lazy"})),
+        (SandboxMiddleware(lazy_init=False), {}, Runtime(context={})),
+        (SandboxMiddleware(lazy_init=False), {"sandbox": {"sandbox_id": "existing"}}, Runtime(context={"thread_id": "thread-existing"})),
+    ],
+)
+async def test_abefore_agent_delegates_to_super_when_not_acquiring(
+    monkeypatch: pytest.MonkeyPatch,
+    middleware: SandboxMiddleware,
+    state: dict,
+    runtime: Runtime,
+) -> None:
+    calls: list[tuple[dict, Runtime]] = []
+
+    async def fake_super_abefore_agent(self, state_arg, runtime_arg):
+        calls.append((state_arg, runtime_arg))
+        return {"delegated": True}
+
+    monkeypatch.setattr(AgentMiddleware, "abefore_agent", fake_super_abefore_agent)
+
+    result = await middleware.abefore_agent(state, runtime)
+
+    assert result == {"delegated": True}
+    assert calls == [(state, runtime)]
+
+
+@pytest.mark.anyio
+async def test_default_lazy_tool_acquisition_uses_async_provider() -> None:
+    provider = _AsyncOnlyProvider()
+    set_sandbox_provider(provider)
+    try:
+        runtime = ToolRuntime(
+            state={},
+            context={"thread_id": "thread-lazy"},
+            config={"configurable": {}},
+            stream_writer=lambda _: None,
+            tools=[],
+            tool_call_id="call-1",
+            store=None,
+        )
+
+        result = await ls_tool.ainvoke({"runtime": runtime, "description": "list workspace", "path": "/mnt/user-data/workspace"})
+    finally:
+        reset_sandbox_provider()
+
+    assert result == "/mnt/user-data/workspace/file.txt"
+    assert provider.thread_ids == ["thread-lazy"]
+    assert runtime.state["sandbox"] == {"sandbox_id": "async-sandbox"}
+    assert runtime.context["sandbox_id"] == "async-sandbox"
+
+
+@pytest.mark.anyio
+@pytest.mark.parametrize(
+    ("state", "runtime", "expected_sandbox_id"),
+    [
+        ({"sandbox": {"sandbox_id": "state-sandbox"}}, Runtime(context={}), "state-sandbox"),
+        ({}, Runtime(context={"sandbox_id": "context-sandbox"}), "context-sandbox"),
+    ],
+)
+async def test_aafter_agent_releases_sandbox_off_thread(
+    monkeypatch: pytest.MonkeyPatch,
+    state: dict,
+    runtime: Runtime,
+    expected_sandbox_id: str,
+) -> None:
+    provider = _AsyncOnlyProvider()
+    to_thread_calls: list[tuple[object, tuple[object, ...]]] = []
+
+    async def fake_to_thread(func, /, *args):
+        to_thread_calls.append((func, args))
+        return func(*args)
+
+    monkeypatch.setattr(asyncio, "to_thread", fake_to_thread)
+    set_sandbox_provider(provider)
+    try:
+        result = await SandboxMiddleware().aafter_agent(state, runtime)
+    finally:
+        reset_sandbox_provider()
+
+    assert result is None
+    assert provider.released_ids == [expected_sandbox_id]
+    assert to_thread_calls == [(provider.release, (expected_sandbox_id,))]
+
+
+@pytest.mark.anyio
+async def test_aafter_agent_delegates_to_super_when_no_sandbox(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls: list[tuple[dict, Runtime]] = []
+
+    async def fake_super_aafter_agent(self, state_arg, runtime_arg):
+        calls.append((state_arg, runtime_arg))
+        return {"delegated": True}
+
+    monkeypatch.setattr(AgentMiddleware, "aafter_agent", fake_super_aafter_agent)
+
+    state = {}
+    runtime = Runtime(context={})
+    result = await SandboxMiddleware().aafter_agent(state, runtime)
+
+    assert result == {"delegated": True}
+    assert calls == [(state, runtime)]
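Review note: the sandbox tests above exercise two behaviours. A provider that only implements a blocking `acquire` gets a default `acquire_async` that offloads to a worker thread, and a provider with a native `acquire_async` is called directly. The sketch below shows that pattern in isolation. `BlockingProvider` and `NativeAsyncProvider` are hypothetical stand-ins, not DeerFlow's `SandboxProvider`, and the real base class may be structured differently.

```python
import asyncio
import threading
from typing import Optional


class BlockingProvider:
    """Hypothetical provider whose only real implementation is synchronous."""

    def __init__(self) -> None:
        self.worker_thread: Optional[threading.Thread] = None

    def acquire(self, thread_id: Optional[str] = None) -> str:
        # Record which OS thread ran the blocking call.
        self.worker_thread = threading.current_thread()
        return f"sandbox-for-{thread_id}"

    async def acquire_async(self, thread_id: Optional[str] = None) -> str:
        # Default async path: run the sync method off the event loop thread.
        return await asyncio.to_thread(self.acquire, thread_id)


class NativeAsyncProvider(BlockingProvider):
    """Hypothetical provider that overrides the async path and never blocks."""

    async def acquire_async(self, thread_id: Optional[str] = None) -> str:
        await asyncio.sleep(0)  # stands in for real non-blocking I/O
        return "native-async-sandbox"


async def main() -> tuple:
    loop_thread = threading.current_thread()
    blocking = BlockingProvider()
    sandbox_id = await blocking.acquire_async("thread-1")
    native_id = await NativeAsyncProvider().acquire_async("thread-2")
    # The blocking call must not have run on the event loop thread.
    return sandbox_id, blocking.worker_thread is not loop_thread, native_id


result = asyncio.run(main())
print(result)
```

Keeping the offload in the base class means existing sync-only providers stay usable from async middleware without changes, which is the property `test_provider_default_acquire_async_offloads_sync_acquire` pins down.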
@@ -732,17 +732,27 @@ def test_cleanup_called_on_timed_out(monkeypatch):


 def test_cleanup_not_called_on_polling_safety_timeout(monkeypatch):
-    """Verify cleanup_background_task is NOT called on polling safety timeout.
+    """Verify cleanup_background_task is NOT called directly on polling safety timeout.

-    This prevents race conditions where the background task is still running
-    but the polling loop gives up. The cleanup should happen later when the
-    executor completes and sets a terminal status.
+    The task is still RUNNING so it cannot be safely removed yet. Instead,
+    cooperative cancellation is requested and a deferred cleanup is scheduled.
    """
    config = _make_subagent_config()
    # Keep max_poll_count small for test speed: (1 + 60) // 5 = 12
    config.timeout_seconds = 1
    events = []
    cleanup_calls = []
+    cancel_requests = []
+    scheduled_cleanups = []
+
+    class DummyCleanupTask:
+        def add_done_callback(self, _callback):
+            return None
+
+    def fake_create_task(coro):
+        scheduled_cleanups.append(coro)
+        coro.close()
+        return DummyCleanupTask()

    monkeypatch.setattr(task_tool_module, "SubagentStatus", FakeSubagentStatus)
    monkeypatch.setattr(
@@ -759,12 +769,18 @@ def test_cleanup_not_called_on_polling_safety_timeout(monkeypatch):
    )
    monkeypatch.setattr(task_tool_module, "get_stream_writer", lambda: events.append)
    monkeypatch.setattr(task_tool_module.asyncio, "sleep", _no_sleep)
+    monkeypatch.setattr(task_tool_module.asyncio, "create_task", fake_create_task)
    monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
    monkeypatch.setattr(
        task_tool_module,
        "cleanup_background_task",
        lambda task_id: cleanup_calls.append(task_id),
    )
+    monkeypatch.setattr(
+        task_tool_module,
+        "request_cancel_background_task",
+        lambda task_id: cancel_requests.append(task_id),
+    )

    output = _run_task_tool(
        runtime=_make_runtime(),
@@ -775,8 +791,12 @@ def test_cleanup_not_called_on_polling_safety_timeout(monkeypatch):
    )

    assert output.startswith("Task polling timed out after 0 minutes")
-    # cleanup should NOT be called because the task is still RUNNING
+    # cleanup_background_task must NOT be called directly (task is still RUNNING)
    assert cleanup_calls == []
+    # cooperative cancellation must be requested
+    assert cancel_requests == ["tc-no-cleanup-safety-timeout"]
+    # a deferred cleanup coroutine must be scheduled
+    assert len(scheduled_cleanups) == 1


 def test_cleanup_scheduled_on_cancellation(monkeypatch):
@@ -109,7 +109,7 @@ class TestTitleMiddlewareCoreLogic:
        title = result["title"]

        assert title == "短标题"
-        title_middleware_module.create_chat_model.assert_called_once_with(thinking_enabled=False)
+        title_middleware_module.create_chat_model.assert_called_once_with(thinking_enabled=False, attach_tracing=False)
        model.ainvoke.assert_awaited_once()
        assert model.ainvoke.await_args.kwargs["config"] == {
            "run_name": "title_agent",
@@ -141,6 +141,7 @@ class TestTitleMiddlewareCoreLogic:
        title_middleware_module.create_chat_model.assert_called_once_with(
            name="title-model",
            thinking_enabled=False,
+            attach_tracing=False,
            app_config=app_config,
        )

@@ -95,6 +95,64 @@ def test_config_loaded_async_only_tool_gets_sync_wrapper(mock_bash, mock_cfg):
    assert async_tool.invoke({"x": 42}) == "result: 42"


+@patch("deerflow.tools.tools.get_app_config")
+@patch("deerflow.tools.tools.is_host_bash_allowed", return_value=True)
+def test_subagent_async_only_tool_gets_sync_wrapper(mock_bash, mock_cfg):
+    """Async-only tools added through the subagent path can be invoked by sync clients."""
+
+    async def async_tool_impl(x: int) -> str:
+        return f"subagent: {x}"
+
+    async_tool = StructuredTool(
+        name="async_subagent_tool",
+        description="Async-only subagent test tool.",
+        args_schema=AsyncToolArgs,
+        func=None,
+        coroutine=async_tool_impl,
+    )
+    mock_cfg.return_value = _make_minimal_config([])
+
+    with (
+        patch("deerflow.tools.tools.BUILTIN_TOOLS", []),
+        patch("deerflow.tools.tools.SUBAGENT_TOOLS", [async_tool]),
+    ):
+        result = get_available_tools(include_mcp=False, subagent_enabled=True, app_config=mock_cfg.return_value)
+
+    assert async_tool in result
+    assert async_tool.func is not None
+    assert async_tool.invoke({"x": 7}) == "subagent: 7"
+
+
+@patch("deerflow.tools.tools.get_app_config")
+@patch("deerflow.tools.tools.is_host_bash_allowed", return_value=True)
+def test_acp_async_only_tool_gets_sync_wrapper(mock_bash, mock_cfg):
+    """Async-only ACP tools can be invoked by sync clients."""
+
+    async def async_tool_impl(x: int) -> str:
+        return f"acp: {x}"
+
+    async_tool = StructuredTool(
+        name="invoke_acp_agent",
+        description="Async-only ACP test tool.",
+        args_schema=AsyncToolArgs,
+        func=None,
+        coroutine=async_tool_impl,
+    )
+    config = _make_minimal_config([])
+    config.acp_agents = {"codex": object()}
+    mock_cfg.return_value = config
+
+    with (
+        patch("deerflow.tools.tools.BUILTIN_TOOLS", []),
+        patch("deerflow.tools.builtins.invoke_acp_agent_tool.build_invoke_acp_agent_tool", return_value=async_tool),
+    ):
+        result = get_available_tools(include_mcp=False, app_config=config)
+
+    assert async_tool in result
+    assert async_tool.func is not None
+    assert async_tool.invoke({"x": 9}) == "acp: 9"
+
+
@patch("deerflow.tools.tools.get_app_config")
@patch("deerflow.tools.tools.is_host_bash_allowed", return_value=True)
 def test_no_duplicates_returned(mock_bash, mock_cfg):
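Note on the hunk above: the new tests check that a tool defined only by a coroutine (`func=None`) can still be called through the synchronous `invoke` path. A minimal sketch of one way such a wrapper could be built, using only the standard library. `make_sync_wrapper` is a hypothetical helper for illustration, not DeerFlow's implementation, and it assumes the caller is not already inside a running event loop.

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


def make_sync_wrapper(coro_fn: Callable[..., Coroutine[Any, Any, Any]]) -> Callable[..., Any]:
    """Return a blocking function that runs ``coro_fn`` to completion.

    Hypothetical helper. asyncio.run raises RuntimeError if an event loop
    is already running in the calling thread, so this only suits sync callers.
    """

    def sync_fn(*args: Any, **kwargs: Any) -> Any:
        return asyncio.run(coro_fn(*args, **kwargs))

    return sync_fn


async def async_tool_impl(x: int) -> str:
    # Same shape as the coroutine used in the tests above.
    return f"subagent: {x}"


sync_tool = make_sync_wrapper(async_tool_impl)
result = sync_tool(7)
```

A real wrapper that must also work from inside a running loop would need a different strategy (for example, running the coroutine in a worker thread); the diff does not show which approach the project takes.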
@@ -5,10 +5,11 @@ from __future__ import annotations
 import pytest

 from deerflow.config import tracing_config as tracing_module
+from deerflow.config.tracing_config import reset_tracing_config


 def _reset_tracing_cache() -> None:
-    tracing_module._tracing_config = None
+    reset_tracing_config()


@pytest.fixture(autouse=True)
@@ -12,7 +12,7 @@ from deerflow.tracing import factory as tracing_factory

@pytest.fixture(autouse=True)
 def clear_tracing_env(monkeypatch):
-    from deerflow.config import tracing_config as tracing_module
+    from deerflow.config.tracing_config import reset_tracing_config

    for name in (
        "LANGSMITH_TRACING",
@@ -30,9 +30,9 @@ def clear_tracing_env(monkeypatch):
        "LANGFUSE_BASE_URL",
    ):
        monkeypatch.delenv(name, raising=False)
-    tracing_module._tracing_config = None
+    reset_tracing_config()
    yield
-    tracing_module._tracing_config = None
+    reset_tracing_config()


 def test_build_tracing_callbacks_returns_empty_list_when_disabled(monkeypatch):
@@ -114,12 +114,12 @@ def test_build_tracing_callbacks_raises_when_enabled_provider_fails(monkeypatch)


 def test_build_tracing_callbacks_raises_for_explicitly_enabled_misconfigured_provider(monkeypatch):
-    from deerflow.config import tracing_config as tracing_module
+    from deerflow.config.tracing_config import reset_tracing_config

    monkeypatch.setenv("LANGFUSE_TRACING", "true")
    monkeypatch.delenv("LANGFUSE_PUBLIC_KEY", raising=False)
    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
-    tracing_module._tracing_config = None
+    reset_tracing_config()

    with pytest.raises(ValueError, match="LANGFUSE_PUBLIC_KEY"):
        tracing_factory.build_tracing_callbacks()
@@ -0,0 +1,137 @@
+"""Tests for deerflow.tracing.metadata.build_langfuse_trace_metadata."""
+
+from __future__ import annotations
+
+import pytest
+
+from deerflow.tracing import metadata as tracing_metadata
+
+
+@pytest.fixture(autouse=True)
+def _clear_tracing_env(monkeypatch):
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    for name in (
+        "LANGFUSE_TRACING",
+        "LANGFUSE_PUBLIC_KEY",
+        "LANGFUSE_SECRET_KEY",
+        "LANGFUSE_BASE_URL",
+        "LANGSMITH_TRACING",
+        "LANGCHAIN_TRACING_V2",
+        "LANGCHAIN_TRACING",
+        "LANGSMITH_API_KEY",
+        "LANGCHAIN_API_KEY",
+    ):
+        monkeypatch.delenv(name, raising=False)
+    reset_tracing_config()
+    yield
+    reset_tracing_config()
+
+
+def _enable_langfuse(monkeypatch):
+    monkeypatch.setenv("LANGFUSE_TRACING", "true")
+    monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
+    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
+
+
+def test_returns_empty_when_langfuse_disabled(monkeypatch):
+    # No env vars set → langfuse not in enabled providers.
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="t-1",
+        user_id="u-1",
+        assistant_id="lead-agent",
+        model_name="gpt-4o",
+    )
+    assert result == {}
+
+
+def test_session_id_maps_to_thread_id(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="thread-abc",
+        user_id="user-42",
+    )
+
+    assert result["langfuse_session_id"] == "thread-abc"
+
+
+def test_user_id_falls_back_to_default(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="thread-abc",
+        user_id=None,
+    )
+
+    assert result["langfuse_user_id"] == "default"
+
+
+def test_user_id_explicit_value_wins(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="thread-abc",
+        user_id="alice@example.com",
+    )
+
+    assert result["langfuse_user_id"] == "alice@example.com"
+
+
+def test_trace_name_uses_assistant_id_when_provided(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="t",
+        assistant_id="custom-agent",
+    )
+
+    assert result["langfuse_trace_name"] == "custom-agent"
+
+
+def test_trace_name_defaults_to_lead_agent(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="t",
+        assistant_id=None,
+    )
+
+    assert result["langfuse_trace_name"] == "lead-agent"
+
+
+def test_tags_include_env_and_model(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="t",
+        environment="production",
+        model_name="gpt-4o",
+    )
+
+    assert result["langfuse_tags"] == ["env:production", "model:gpt-4o"]
+
+
+def test_tags_omitted_when_no_tag_inputs(monkeypatch):
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id="t",
+        user_id="u",
+    )
+
+    assert "langfuse_tags" not in result
+
+
+def test_thread_id_none_still_produces_metadata(monkeypatch):
+    # Stateless run paths may not have a thread_id — we still want
+    # user_id / trace_name to flow through so Users page works.
+    _enable_langfuse(monkeypatch)
+
+    result = tracing_metadata.build_langfuse_trace_metadata(
+        thread_id=None,
+        user_id="u-1",
+    )
+
+    assert result["langfuse_session_id"] is None
+    assert result["langfuse_user_id"] == "u-1"
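Note on the new test file above: taken together, the tests pin down the observable behavior of `build_langfuse_trace_metadata`. The sketch below is a reconstruction from those assertions only, not the code under test. The function name `build_trace_metadata_sketch` and the explicit `langfuse_enabled` flag are assumptions; the real function reads the enabled state from the tracing config.

```python
from __future__ import annotations

from typing import Any

DEFAULT_USER_ID = "default"
DEFAULT_TRACE_NAME = "lead-agent"


def build_trace_metadata_sketch(
    *,
    langfuse_enabled: bool,
    thread_id: str | None,
    user_id: str | None = None,
    assistant_id: str | None = None,
    model_name: str | None = None,
    environment: str | None = None,
) -> dict[str, Any]:
    """Return Langfuse reserved metadata keys, or {} when Langfuse is off."""
    if not langfuse_enabled:
        return {}

    metadata: dict[str, Any] = {
        # thread_id may be None for stateless runs; the key is still emitted.
        "langfuse_session_id": thread_id,
        "langfuse_user_id": user_id or DEFAULT_USER_ID,
        "langfuse_trace_name": assistant_id or DEFAULT_TRACE_NAME,
    }

    # Tags are only attached when at least one tag input is present.
    tags: list[str] = []
    if environment:
        tags.append(f"env:{environment}")
    if model_name:
        tags.append(f"model:{model_name}")
    if tags:
        metadata["langfuse_tags"] = tags
    return metadata
```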
@@ -218,6 +218,7 @@ def test_upload_files_does_not_adjust_permissions_for_local_sandbox(tmp_path):

    provider = MagicMock()
    provider.uses_thread_data_mounts = True
+    provider.needs_upload_permission_adjustment = False
    provider.acquire.return_value = "local"
    sandbox = MagicMock()
    provider.get.return_value = sandbox
@@ -227,12 +228,14 @@ def test_upload_files_does_not_adjust_permissions_for_local_sandbox(tmp_path):
        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
        patch.object(uploads, "get_sandbox_provider", return_value=provider),
        patch.object(uploads, "_make_file_sandbox_writable") as make_writable,
+        patch.object(uploads, "_make_file_sandbox_readable") as make_readable,
    ):
        file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
        result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))

    assert result.success is True
    make_writable.assert_not_called()
+    make_readable.assert_not_called()


 def test_upload_files_acquires_non_local_sandbox_before_writing(tmp_path):
@@ -430,6 +433,58 @@ def test_make_file_sandbox_writable_skips_symlinks(tmp_path):
    chmod.assert_not_called()


+def test_make_file_sandbox_readable_adds_read_bits_for_regular_files(tmp_path):
+    file_path = tmp_path / "data.csv"
+    file_path.write_bytes(b"csv-data")
+    # Simulate the 0o600 permissions set by open_upload_file_no_symlink
+    file_path.chmod(0o600)
+
+    uploads._make_file_sandbox_readable(file_path)
+
+    updated_mode = stat.S_IMODE(file_path.stat().st_mode)
+    assert updated_mode & stat.S_IRUSR
+    assert updated_mode & stat.S_IRGRP
+
+
+def test_make_file_sandbox_readable_skips_symlinks(tmp_path):
+    file_path = tmp_path / "target-link.txt"
+    file_path.write_text("hello", encoding="utf-8")
+    symlink_stat = MagicMock(st_mode=stat.S_IFLNK)
+
+    with (
+        patch.object(uploads.os, "lstat", return_value=symlink_stat),
+        patch.object(uploads.os, "chmod") as chmod,
+    ):
+        uploads._make_file_sandbox_readable(file_path)
+
+    chmod.assert_not_called()
+
+
+def test_upload_files_adjusts_read_permissions_for_mounted_non_local_sandbox(tmp_path):
+    thread_uploads_dir = tmp_path / "uploads"
+    thread_uploads_dir.mkdir(parents=True)
+
+    # AIO sandbox with LocalContainerBackend: uses_thread_data_mounts=True
+    # but needs_upload_permission_adjustment=True (default)
+    provider = MagicMock()
+    provider.uses_thread_data_mounts = True
+    provider.needs_upload_permission_adjustment = True
+
+    with (
+        patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "get_sandbox_provider", return_value=provider),
+        patch.object(uploads, "_make_file_sandbox_readable") as make_readable,
+    ):
+        file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
+        result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=[file], config=SimpleNamespace()))
+
+    assert result.success is True
+    make_readable.assert_called_once()
+    called_path = make_readable.call_args[0][0]
+    assert called_path.name == "notes.txt"
+
+
 def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):
    thread_uploads_dir = tmp_path / "uploads"
    thread_uploads_dir.mkdir(parents=True)
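Note on the hunk above: the new tests expect `_make_file_sandbox_readable` to add read bits to regular files and to leave symlinks alone. A minimal sketch of that behavior with the standard library. `make_readable_sketch` is a hypothetical name, and setting only the owner and group read bits is an assumption based on what the test asserts; the diff does not show whether the real helper also sets the other-read bit.

```python
import os
import stat
import tempfile
from pathlib import Path


def make_readable_sketch(path: Path) -> None:
    """Add owner/group read bits to a regular file; skip anything else."""
    st = os.lstat(path)  # lstat does not follow symlinks
    if not stat.S_ISREG(st.st_mode):
        return
    read_bits = stat.S_IRUSR | stat.S_IRGRP
    os.chmod(path, stat.S_IMODE(st.st_mode) | read_bits)


# Usage: start from the 0o600 mode mentioned in the test comment.
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "data.csv"
    file_path.write_bytes(b"csv-data")
    file_path.chmod(0o600)
    make_readable_sketch(file_path)
    updated_mode = stat.S_IMODE(file_path.stat().st_mode)
```

Checking the mode via `os.lstat` before calling `os.chmod` is what makes the symlink case a no-op, which is the behavior the mocked `lstat` test verifies.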
@@ -0,0 +1,248 @@
+"""Integration test: worker.run_agent injects Langfuse trace metadata.
+
+Verifies that the agent factory's resulting graph receives a
+``RunnableConfig`` whose ``metadata`` carries the Langfuse reserved keys
+(``langfuse_session_id`` / ``langfuse_user_id`` / ``langfuse_trace_name``).
+"""
+
+from __future__ import annotations
+
+import asyncio
+
+import pytest
+
+from deerflow.runtime.runs.manager import RunRecord
+from deerflow.runtime.runs.schemas import DisconnectMode, RunStatus
+from deerflow.runtime.runs.worker import RunContext, run_agent
+
+
+class _FakeAgent:
+    """Minimal LangGraph-like graph that captures the runnable config."""
+
+    def __init__(self) -> None:
+        self.captured_config: dict | None = None
+        self.metadata: dict = {}
+        # The worker may assign these attributes, so they must exist.
+        self.checkpointer = None
+        self.store = None
+        self.interrupt_before_nodes: list[str] = []
+        self.interrupt_after_nodes: list[str] = []
+
+    async def astream(self, graph_input, *, config, stream_mode, **kwargs):
+        self.captured_config = config
+        # Empty async generator — no chunks produced.
+        return
+        yield  # pragma: no cover (makes this an async generator)
+
+
+class _FakeRunManager:
+    async def set_status(self, *_args, **_kwargs) -> None:
+        return None
+
+    async def update_model_name(self, *_args, **_kwargs) -> None:
+        return None
+
+    async def update_run_completion(self, *_args, **_kwargs) -> None:
+        return None
+
+
+class _FakeBridge:
+    def __init__(self) -> None:
+        self.events: list[tuple[str, object]] = []
+
+    async def publish(self, _run_id, event, payload) -> None:
+        self.events.append((event, payload))
+
+    async def publish_end(self, _run_id) -> None:
+        self.events.append(("end", None))
+
+    async def cleanup(self, _run_id, *, delay: int = 0) -> None:
+        return None
+
+
+@pytest.fixture(autouse=True)
+def _clear_tracing_env(monkeypatch):
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    for name in ("LANGFUSE_TRACING", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_BASE_URL"):
+        monkeypatch.delenv(name, raising=False)
+    reset_tracing_config()
+    yield
+    reset_tracing_config()
+
+
+@pytest.mark.asyncio
+async def test_run_agent_injects_langfuse_metadata(monkeypatch):
+    monkeypatch.setenv("LANGFUSE_TRACING", "true")
+    monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
+    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    reset_tracing_config()
+
+    fake_agent = _FakeAgent()
+
+    def agent_factory(config):
+        return fake_agent
+
+    record = RunRecord(
+        run_id="run-1",
+        thread_id="thread-xyz",
+        assistant_id="lead-agent",
+        status=RunStatus.pending,
+        on_disconnect=DisconnectMode.cancel,
+        model_name="gpt-4o",
+    )
+    record.abort_event = asyncio.Event()
+    ctx = RunContext(checkpointer=None)
+
+    await run_agent(
+        _FakeBridge(),
+        _FakeRunManager(),
+        record,
+        ctx=ctx,
+        agent_factory=agent_factory,
+        graph_input={"messages": []},
+        config={"configurable": {"thread_id": "thread-xyz"}},
+    )
+
+    assert fake_agent.captured_config is not None, "astream was not invoked"
+    metadata = fake_agent.captured_config.get("metadata") or {}
+    assert metadata.get("langfuse_session_id") == "thread-xyz"
+    # conftest.py autouse fixture injects ``test-user-autouse`` into the
+    # contextvar — the worker should read it via ``get_effective_user_id``.
+    user_id = metadata.get("langfuse_user_id")
+    assert user_id == "test-user-autouse", f"expected test-user-autouse, got {user_id}"
+    assert metadata.get("langfuse_trace_name") == "lead-agent"
+    tags = metadata.get("langfuse_tags") or []
+    assert "model:gpt-4o" in tags
+
+
+@pytest.mark.asyncio
+async def test_run_agent_falls_back_to_default_user_when_unset(monkeypatch):
+    """When no user is in the contextvar, langfuse_user_id falls back to 'default'.
+
+    Uses ``monkeypatch.setattr`` to make ``get_effective_user_id`` return
+    ``"default"`` instead of mutating the contextvar directly. Direct contextvar
+    changes across pytest test boundaries have leaked state between test files
+    when combined with the Langfuse OTel global tracer provider.
+    """
+    monkeypatch.setenv("LANGFUSE_TRACING", "true")
+    monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
+    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
+    from deerflow.config.tracing_config import reset_tracing_config
+    from deerflow.runtime.runs import worker as worker_module
+    from deerflow.runtime.user_context import DEFAULT_USER_ID
+
+    reset_tracing_config()
+    monkeypatch.setattr(worker_module, "get_effective_user_id", lambda: DEFAULT_USER_ID)
+
+    fake_agent = _FakeAgent()
+
+    def agent_factory(config):
+        return fake_agent
+
+    record = RunRecord(
+        run_id="run-fallback",
+        thread_id="thread-fb",
+        assistant_id="lead-agent",
+        status=RunStatus.pending,
+        on_disconnect=DisconnectMode.cancel,
+    )
+    record.abort_event = asyncio.Event()
+    ctx = RunContext(checkpointer=None)
+
+    await run_agent(
+        _FakeBridge(),
+        _FakeRunManager(),
+        record,
+        ctx=ctx,
+        agent_factory=agent_factory,
+        graph_input={"messages": []},
+        config={"configurable": {"thread_id": "thread-fb"}},
+    )
+
+    metadata = fake_agent.captured_config.get("metadata") or {}
+    assert metadata.get("langfuse_user_id") == "default"
+
+
+@pytest.mark.asyncio
+async def test_run_agent_preserves_caller_metadata_overrides(monkeypatch):
+    """Caller-provided langfuse_* keys must NOT be overridden by the default injection."""
+    monkeypatch.setenv("LANGFUSE_TRACING", "true")
+    monkeypatch.setenv("LANGFUSE_PUBLIC_KEY", "pk-lf-test")
+    monkeypatch.setenv("LANGFUSE_SECRET_KEY", "sk-lf-test")
+    from deerflow.config.tracing_config import reset_tracing_config
+
+    reset_tracing_config()
+
+    fake_agent = _FakeAgent()
+
+    def agent_factory(config):
+        return fake_agent
+
+    record = RunRecord(
+        run_id="run-2",
+        thread_id="thread-default",
+        assistant_id="lead-agent",
+        status=RunStatus.pending,
+        on_disconnect=DisconnectMode.cancel,
+    )
+    record.abort_event = asyncio.Event()
+    ctx = RunContext(checkpointer=None)
+
+    await run_agent(
+        _FakeBridge(),
+        _FakeRunManager(),
+        record,
+        ctx=ctx,
+        agent_factory=agent_factory,
+        graph_input={"messages": []},
+        config={
+            "configurable": {"thread_id": "thread-default"},
+            "metadata": {
+                "langfuse_session_id": "custom-session-id",
+                "langfuse_user_id": "explicit-user",
+            },
+        },
+    )
+
+    metadata = fake_agent.captured_config.get("metadata") or {}
+    # Caller-supplied keys win.
+    assert metadata["langfuse_session_id"] == "custom-session-id"
+    assert metadata["langfuse_user_id"] == "explicit-user"
+    # Worker still fills in keys that the caller didn't set.
+    assert metadata["langfuse_trace_name"] == "lead-agent"
+
+
+@pytest.mark.asyncio
+async def test_run_agent_skips_metadata_when_langfuse_disabled(monkeypatch):
+    fake_agent = _FakeAgent()
+
+    def agent_factory(config):
+        return fake_agent
+
+    record = RunRecord(
+        run_id="run-3",
+        thread_id="thread-noop",
+        assistant_id="lead-agent",
+        status=RunStatus.pending,
+        on_disconnect=DisconnectMode.cancel,
+    )
+    record.abort_event = asyncio.Event()
+    ctx = RunContext(checkpointer=None)
+
+    await run_agent(
+        _FakeBridge(),
+        _FakeRunManager(),
+        record,
+        ctx=ctx,
+        agent_factory=agent_factory,
+        graph_input={"messages": []},
+        config={"configurable": {"thread_id": "thread-noop"}},
+    )
+
+    metadata = fake_agent.captured_config.get("metadata") or {}
+    assert "langfuse_session_id" not in metadata
+    assert "langfuse_user_id" not in metadata
+    assert "langfuse_trace_name" not in metadata
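Note on the new test file above: `test_run_agent_preserves_caller_metadata_overrides` requires that caller-supplied `langfuse_*` keys win while the worker still fills in missing ones. One way to get that precedence is a dict merge with the defaults first. `merge_trace_metadata` is a hypothetical helper for illustration; the diff does not show how `run_agent` performs the merge.

```python
from typing import Any


def merge_trace_metadata(config: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``config`` whose metadata is defaults overlaid by caller values."""
    caller_metadata = config.get("metadata") or {}
    # Later entries win, so caller keys override the injected defaults.
    merged_metadata = {**defaults, **caller_metadata}
    return {**config, "metadata": merged_metadata}


defaults = {
    "langfuse_session_id": "thread-default",
    "langfuse_user_id": "default",
    "langfuse_trace_name": "lead-agent",
}
caller_config = {
    "configurable": {"thread_id": "thread-default"},
    "metadata": {
        "langfuse_session_id": "custom-session-id",
        "langfuse_user_id": "explicit-user",
    },
}
merged = merge_trace_metadata(caller_config, defaults)
```

With an empty `defaults` dict (the Langfuse-disabled case), the merge leaves the caller's metadata unchanged, which matches the last test in the file.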
@@ -1,5 +1,5 @@
 version = 1
-revision = 2
+revision = 3
 requires-python = ">=3.12"
 resolution-markers = [
    "python_full_version >= '3.14' and sys_platform == 'win32'",
@@ -1504,11 +1504,11 @@ wheels = [

 [[package]]
 name = "idna"
-version = "3.13"
+version = "3.15"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/ce/cc/762dfb036166873f0059f3b7de4565e1b5bc3d6f28a414c13da27e442f99/idna-3.13.tar.gz", hash = "sha256:585ea8fe5d69b9181ec1afba340451fba6ba764af97026f92a91d4eef164a242", size = 194210, upload-time = "2026-04-22T16:42:42.314Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/82/77/7b3966d0b9d1d31a36ddf1746926a11dface89a83409bf1483f0237aa758/idna-3.15.tar.gz", hash = "sha256:ca962446ea538f7092a95e057da437618e886f4d349216d2b1e294abfdb65fdc", size = 199245, upload-time = "2026-05-12T22:45:57.011Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/5d/13/ad7d7ca3808a898b4612b6fe93cde56b53f3034dcde235acb1f0e1df24c6/idna-3.13-py3-none-any.whl", hash = "sha256:892ea0cde124a99ce773decba204c5552b69c3c67ffd5f232eb7696135bc8bb3", size = 68629, upload-time = "2026-04-22T16:42:40.909Z" },
+    { url = "https://files.pythonhosted.org/packages/d2/23/408243171aa9aaba178d3e2559159c24c1171a641aa83b67bdd3394ead8e/idna-3.15-py3-none-any.whl", hash = "sha256:048adeaf8c2d788c40fee287673ccaa74c24ffd8dcf09ffa555a2fbb59f10ac8", size = 72340, upload-time = "2026-05-12T22:45:55.733Z" },
 ]

 [[package]]
@@ -1731,128 +1731,128 @@ packages:
    resolution: {integrity: sha512-FqALmHI8D4o6lk/LRWDnhw95z5eO+eAa6ORjVg09YRR7BkcM6oPHU9uyC0gtQG5vpFLvgpeU4+zEAz2H8APHNw==}
    engines: {node: '>= 10'}

-  '@rollup/rollup-android-arm-eabi@4.60.3':
-    resolution: {integrity: sha512-x35CNW/ANXG3hE/EZpRU8MXX1JDN86hBb2wMGAtltkz7pc6cxgjpy1OMMfDosOQ+2hWqIkag/fGok1Yady9nGw==}
+  '@rollup/rollup-android-arm-eabi@4.60.4':
+    resolution: {integrity: sha512-F5QXMSiFebS9hKZj02XhWLLnRpJ3B3AROP0tWbFBSj+6kCbg5m9j5JoHKd4mmSVy5mS/IMQloYgYxCuJC0fxEQ==}
    cpu: [arm]
    os: [android]

-  '@rollup/rollup-android-arm64@4.60.3':
-    resolution: {integrity: sha512-xw3xtkDApIOGayehp2+Rz4zimfkaX65r4t47iy+ymQB2G4iJCBBfj0ogVg5jpvjpn8UWn/+q9tprxleYeNp3Hw==}
+  '@rollup/rollup-android-arm64@4.60.4':
+    resolution: {integrity: sha512-GxxTKApUpzRhof7poWvCJHRF51C67u1R7D6DiluBE8wKU1u5GWE8t+v81JvJYtbawoBFX1hLv5Ei4eVjkWokaw==}
    cpu: [arm64]
    os: [android]

-  '@rollup/rollup-darwin-arm64@4.60.3':
-    resolution: {integrity: sha512-vo6Y5Qfpx7/5EaamIwi0WqW2+zfiusVihKatLvtN1VFVy3D13uERk/6gZLU1UiHRL6fDXqj/ELIeVRGnvcTE1g==}
+  '@rollup/rollup-darwin-arm64@4.60.4':
+    resolution: {integrity: sha512-tua0TaJxMOB1R0V0RS1jFZ/RpURFDJIOR2A6jWwQeawuFyS4gBW+rntLRaQd0EQ4bd6Vp44Z2rXW+YYDBsj6IA==}
    cpu: [arm64]
    os: [darwin]

-  '@rollup/rollup-darwin-x64@4.60.3':
-    resolution: {integrity: sha512-D+0QGcZhBzTN82weOnsSlY7V7+RMmPuF1CkbxyMAGE8+ZHeUjyb76ZiWmBlCu//AQQONvxcqRbwZTajZKqjuOw==}
+  '@rollup/rollup-darwin-x64@4.60.4':
+    resolution: {integrity: sha512-CSKq7MsP+5PFIcydhAiR1K0UhEI1A2jWXVKHPCBZ151yOutENwvnPocgVHkivu2kviURtCEB6zUQw0vs8RrhMg==}
    cpu: [x64]
    os: [darwin]

-  '@rollup/rollup-freebsd-arm64@4.60.3':
-    resolution: {integrity: sha512-6HnvHCT7fDyj6R0Ph7A6x8dQS/S38MClRWeDLqc0MdfWkxjiu1HSDYrdPhqSILzjTIC/pnXbbJbo+ft+gy/9hQ==}
+  '@rollup/rollup-freebsd-arm64@4.60.4':
+    resolution: {integrity: sha512-+O8OkVdyvXMtJEciu2wS/pzm1IxntEEQx3z5TAVy4l32G0etZn+RsA48ARRrFm6Ri8fvqPQfgrvNxSjKAbnd3g==}
    cpu: [arm64]
    os: [freebsd]

-  '@rollup/rollup-freebsd-x64@4.60.3':
-    resolution: {integrity: sha512-KHLgC3WKlUYW3ShFKnnosZDOJ0xjg9zp7au3sIm2bs/tGBeC2ipmvRh/N7JKi0t9Ue20C0dpEshi8WUubg+cnA==}
+  '@rollup/rollup-freebsd-x64@4.60.4':
+    resolution: {integrity: sha512-Iw3oMskH3AfNuhU0MSN7vNbdi4me/NiYo2azqPz/Le16zHSa+3RRmliCMWWQmh4lcndccU40xcJuTYJZxNo/lw==}
    cpu: [x64]
    os: [freebsd]

-  '@rollup/rollup-linux-arm-gnueabihf@4.60.3':
-    resolution: {integrity: sha512-DV6fJoxEYWJOvaZIsok7KrYl0tPvga5OZ2yvKHNNYyk/2roMLqQAbGhr78EQ5YhHpnhLKJD3S1WFusAkmUuV5g==}
+  '@rollup/rollup-linux-arm-gnueabihf@4.60.4':
+    resolution: {integrity: sha512-EIPRXTVQpHyF8WOo219AD2yEltPehLTcTMz2fn6JsatLYSzQf00hj3rulF+yauOlF9/FtM2WpkT/hJh/KJFGhA==}
    cpu: [arm]
    os: [linux]

-  '@rollup/rollup-linux-arm-musleabihf@4.60.3':
-    resolution: {integrity: sha512-mQKoJAzvuOs6F+TZybQO4GOTSMUu7v0WdxEk24krQ/uUxXoPTtHjuaUuPmFhtBcM4K0ons8nrE3JyhTuCFtT/w==}
+  '@rollup/rollup-linux-arm-musleabihf@4.60.4':
+    resolution: {integrity: sha512-J3Yh9PzzF1Ovah2At+lHiGQdsYgArxBbXv/zHfSyaiFQEqvNv7DcW98pCrmdjCZBrqBiKrKKe2V+aaSGWuBe/w==}
    cpu: [arm]
    os: [linux]

-  '@rollup/rollup-linux-arm64-gnu@4.60.3':
-    resolution: {integrity: sha512-Whjj2qoiJ6+OOJMGptTYazaJvjOJm+iKHpXQM1P3LzGjt7Ff++Tp7nH4N8J/BUA7R9IHfDyx4DJIflifwnbmIA==}
+  '@rollup/rollup-linux-arm64-gnu@4.60.4':
+    resolution: {integrity: sha512-BFDEZMYfUvLn37ONE1yMBojPxnMlTFsdyNoqncT0qFq1mAfllL+ATMMJd8TeuVMiX84s1KbcxcZbXInmcO2mRg==}
    cpu: [arm64]
    os: [linux]

-  '@rollup/rollup-linux-arm64-musl@4.60.3':
-    resolution: {integrity: sha512-4YTNHKqGng5+yiZt3mg77nmyuCfmNfX4fPmyUapBcIk+BdwSwmCWGXOUxhXbBEkFHtoN5boLj/5NON+u5QC9tg==}
+  '@rollup/rollup-linux-arm64-musl@4.60.4':
+    resolution: {integrity: sha512-pc9EYOSlOgdQ2uPl1o9PF6/kLSgaUosia7gOuS8mB69IxJvlclko1MECXysjs5ryez1/5zjYqx3+xYU0TU6R1A==}
    cpu: [arm64]
    os: [linux]

-  '@rollup/rollup-linux-loong64-gnu@4.60.3':
-    resolution: {integrity: sha512-SU3kNlhkpI4UqlUc2VXPGK9o886ZsSeGfMAX2ba2b8DKmMXq4AL7KUrkSWVbb7koVqx41Yczx6dx5PNargIrEA==}
+  '@rollup/rollup-linux-loong64-gnu@4.60.4':
+    resolution: {integrity: sha512-NxnomyxYerDh5n4iLrNa+sH+Z+U4BMEE46V2PgQ/hoB909i8gV1M5wPojWg9fk1jWpO3IQnOs20K4wyZuFLEFQ==}
    cpu: [loong64]
    os: [linux]

-  '@rollup/rollup-linux-loong64-musl@4.60.3':
-    resolution: {integrity: sha512-6lDLl5h4TXpB1mTf2rQWnAk/LcXrx9vBfu/DT5TIPhvMhRWaZ5MxkIc8u4lJAmBo6klTe1ywXIUHFjylW505sg==}
+  '@rollup/rollup-linux-loong64-musl@4.60.4':
+    resolution: {integrity: sha512-nbJnQ8a3z1mtmrwImCYhc6BGpThAyYVRQxw9uKSKG4wR6aAYno9sVjJ0zaZcW9BPJX1GbrDPf+SvdWjgTuDmnw==}
    cpu: [loong64]
    os: [linux]

-  '@rollup/rollup-linux-ppc64-gnu@4.60.3':
-    resolution: {integrity: sha512-BMo8bOw8evlup/8G+cj5xWtPyp93xPdyoSN16Zy90Q2QZ0ZYRhCt6ZJSwbrRzG9HApFabjwj2p25TUPDWrhzqQ==}
+  '@rollup/rollup-linux-ppc64-gnu@4.60.4':
+    resolution: {integrity: sha512-2EU6acNrQLd8tYvo/LXW535wupT3m6fo7HKo6lr7ktQoItxTyOL1ZCR/GfGCuXl2vR+zmfI6eRXkSemafv+iVg==}
    cpu: [ppc64]
    os: [linux]

-  '@rollup/rollup-linux-ppc64-musl@4.60.3':
-    resolution: {integrity: sha512-E0L8X1dZN1/Rph+5VPF6Xj2G7JJvMACVXtamTJIDrVI44Y3K+G8gQaMEAavbqCGTa16InptiVrX6eM6pmJ+7qA==}
+  '@rollup/rollup-linux-ppc64-musl@4.60.4':
+    resolution: {integrity: sha512-WeBtoMuaMxiiIrO2IYP3xs6GMWkJP2C0EoT8beTLkUPmzV1i/UcOSVw1d5r9KBODtHKilG5yFxsGRnBbK3wJ4A==}
    cpu: [ppc64]
    os: [linux]

-  '@rollup/rollup-linux-riscv64-gnu@4.60.3':
-    resolution: {integrity: sha512-oZJ/WHaVfHUiRAtmTAeo3DcevNsVvH8mbvodjZy7D5QKvCefO371SiKRpxoDcCxB3PTRTLayWBkvmDQKTcX/sw==}
+  '@rollup/rollup-linux-riscv64-gnu@4.60.4':
+    resolution: {integrity: sha512-FJHFfqpKUI3A10WrWKiFbBZ7yVbGT4q4B5o1qKFFojqpaYoh9LrQgqWCmmcxQzVSXYtyB5bzkXrYzlHTs21MYA==}
    cpu: [riscv64]
    os: [linux]

-  '@rollup/rollup-linux-riscv64-musl@4.60.3':
-    resolution: {integrity: sha512-Dhbyh7j9FybM3YaTgaHmVALwA8AkUwTPccyCQ79TG9AJUsMQqgN1DDEZNr4+QUfwiWvLDumW5vdwzoeUF+TNxQ==}
+  '@rollup/rollup-linux-riscv64-musl@4.60.4':
+    resolution: {integrity: sha512-mcEl6CUT5IAUmQf1m9FYSmVqCJlpQ8r8eyftFUHG8i9OhY7BkBXSUdnLH5DOf0wCOjcP9v/QO93zpmF1SptCCw==}
    cpu: [riscv64]
    os: [linux]

-  '@rollup/rollup-linux-s390x-gnu@4.60.3':
-    resolution: {integrity: sha512-cJd1X5XhHHlltkaypz1UcWLA8AcoIi1aWhsvaWDskD1oz2eKCypnqvTQ8ykMNI0RSmm7NkTdSqSSD7zM0xa6Ig==}
+  '@rollup/rollup-linux-s390x-gnu@4.60.4':
+    resolution: {integrity: sha512-ynt3JxVd2w2buzoKDWIyiV1pJW93xlQic1THVLXilz429oijRpSHivZAgp65KBu+cMcgf1eVVjdnTLvPxgCuoQ==}
    cpu: [s390x]
    os: [linux]

-  '@rollup/rollup-linux-x64-gnu@4.60.3':
-    resolution: {integrity: sha512-DAZDBHQfG2oQuhY7mc6I3/qB4LU2fQCjRvxbDwd/Jdvb9fypP4IJ4qmtu6lNjes6B531AI8cg1aKC2di97bUxA==}
+  '@rollup/rollup-linux-x64-gnu@4.60.4':
+    resolution: {integrity: sha512-Boiz5+MsaROEWDf+GGEwF8VMHGhlUoQMtIPjOgA5fv4osupqTVnJteQNKJwUcnUog2G55jYXH7KZFFiJe0TEzQ==}
    cpu: [x64]
    os: [linux]

-  '@rollup/rollup-linux-x64-musl@4.60.3':
-    resolution: {integrity: sha512-cRxsE8c13mZOh3vP+wLDxpQBRrOHDIGOWyDL93Sy0Ga8y515fBcC2pjUfFwUe5T7tqvTvWbCpg1URM/AXdWIXA==}
+  '@rollup/rollup-linux-x64-musl@4.60.4':
+    resolution: {integrity: sha512-+qfSY27qIrFfI/Hom04KYFw3GKZSGU4lXus51wsb5EuySfFlWRwjkKWoE9emgRw/ukoT4Udsj4W/+xxG8VbPKg==}
    cpu: [x64]
    os: [linux]

-  '@rollup/rollup-openbsd-x64@4.60.3':
-    resolution: {integrity: sha512-QaWcIgRxqEdQdhJqW4DJctsH6HCmo5vHxY0krHSX4jMtOqfzC+dqDGuHM87bu4H8JBeibWx7jFz+h6/4C8wA5Q==}
+  '@rollup/rollup-openbsd-x64@4.60.4':
+    resolution: {integrity: sha512-VpTfOPHgVXEBeeR8hZ2O0F3aSso+JDWqTWmTmzcQKted54IAdUVbxE+j/MVxUsKa8L20HJhv3vUezVPoquqWjA==}
    cpu: [x64]
    os: [openbsd]

-  '@rollup/rollup-openharmony-arm64@4.60.3':
-    resolution: {integrity: sha512-AaXwSvUi3QIPtroAUw1t5yHGIyqKEXwH54WUocFolZhpGDruJcs8c+xPNDRn4XiQsS7MEwnYsHW2l0MBLDMkWg==}
+  '@rollup/rollup-openharmony-arm64@4.60.4':
+    resolution: {integrity: sha512-IPOsh5aRYuLv/nkU51X10Bf75Bsf6+gZdx1X+QP5QM6lIJFHHqbHLG0uJn/hWthzo13UAc2umiUorqZy3axoZg==}
    cpu: [arm64]
    os: [openharmony]

-  '@rollup/rollup-win32-arm64-msvc@4.60.3':
-    resolution: {integrity: sha512-65LAKM/bAWDqKNEelHlcHvm2V+Vfb8C6INFxQXRHCvaVN1rJfwr4NvdP4FyzUaLqWfaCGaadf6UbTm8xJeYfEg==}
+  '@rollup/rollup-win32-arm64-msvc@4.60.4':
+    resolution: {integrity: sha512-4QzE9E81OohJ/HKzHhsqU+zcYYojVOXlFMs1DdyMT6qXl/niOH7AVElmmEdUNHHS/oRkc++d5k6Vy85zFs0DEw==}
    cpu: [arm64]
    os: [win32]

-  '@rollup/rollup-win32-ia32-msvc@4.60.3':
-    resolution: {integrity: sha512-EEM2gyhBF5MFnI6vMKdX1LAosE627RGBzIoGMdLloPZkXrUN0Ckqgr2Qi8+J3zip/8NVVro3/FjB+tjhZUgUHA==}
+  '@rollup/rollup-win32-ia32-msvc@4.60.4':
+    resolution: {integrity: sha512-zTPgT1YuHHcd+Tmx7h8aml0FWFVelV5N54oHow9SLj+GfoDy/huQ+UV396N/C7KpMDMiPspRktzM1/0r1usYEA==}
    cpu: [ia32]
    os: [win32]

-  '@rollup/rollup-win32-x64-gnu@4.60.3':
-    resolution: {integrity: sha512-E5Eb5H/DpxaoXH++Qkv28RcUJboMopmdDUALBczvHMf7hNIxaDZqwY5lK12UK1BHacSmvupoEWGu+n993Z0y1A==}
+  '@rollup/rollup-win32-x64-gnu@4.60.4':
+    resolution: {integrity: sha512-DRS4G7mi9lJxqEDezIkKCaUIKCrLUUDCUaCsTPCi/rtqaC6D/jjwslMQyiDU50Ka0JKpeXeRBFBAXwArY52vBw==}
    cpu: [x64]
    os: [win32]

-  '@rollup/rollup-win32-x64-msvc@4.60.3':
-    resolution: {integrity: sha512-hPt/bgL5cE+Qp+/TPHBqptcAgPzgj46mPcg/16zNUmbQk0j+mOEQV/+Lqu8QRtDV3Ek95Q6FeFITpuhl6OTsAA==}
+  '@rollup/rollup-win32-x64-msvc@4.60.4':
+    resolution: {integrity: sha512-QVTUovf40zgTqlFVrKA1uXMVvU2QWEFWfAH8Wdc48IxLvrJMQVMBRjuQyUpzZCDkakImib9eVazbWlC6ksWtJw==}
    cpu: [x64]
    os: [win32]

@@ -4079,8 +4079,8 @@ packages:
    resolution: {integrity: sha512-lyuxPGr/Wfhrlem2CL/UcnUc1zcqKAImBDzukY7Y5F/yQiNdko6+fRLevlw1HgMySw7f611UIY408EtxRSoK3Q==}
    hasBin: true

-  lru-cache@11.3.6:
-    resolution: {integrity: sha512-Gf/KoL3C/MlI7Bt0PGI9I+TeTC/I6r/csU58N4BSNc4lppLBeKsOdFYkK+dX0ABDUMJNfCHTyPpzwwO21Awd3A==}
+  lru-cache@11.5.0:
+    resolution: {integrity: sha512-5YgH9UJd7wVb9hIouI2adWpgqrrICkt070Dnj8EUY1+B4B2P9eRLPAkAAo6NICA7CEhOIeBHl46u9zSNpNu7zA==}
    engines: {node: 20 || >=22}

  lucide-react@0.542.0:
@@ -4671,8 +4671,8 @@ packages:
    resolution: {integrity: sha512-PS08Iboia9mts/2ygV3eLpY5ghnUcfLV/EXTOW1E2qYxJKGGBUtNjN76FYHnMs36RmARn41bC0AZmn+rR0OVpQ==}
    engines: {node: ^10 || ^12 || >=14}

-  postcss@8.5.14:
-    resolution: {integrity: sha512-SoSL4+OSEtR99LHFZQiJLkT59C5B1amGO1NzTwj7TT1qCUgUO6hxOvzkOYxD+vMrXBM3XJIKzokoERdqQq/Zmg==}
+  postcss@8.5.15:
+    resolution: {integrity: sha512-FfR8sjd4em2T6fb3I2MwAJU7HWVMr9zba+enmQeeWFfCbm+UOC/0X4DS8XtpUTMwWMGbjKYP7xjfNekzyGmB3A==}
    engines: {node: ^10 || ^12 || >=14}

  postcss@8.5.6:
@@ -4962,8 +4962,8 @@ packages:
  robust-predicates@3.0.2:
    resolution: {integrity: sha512-IXgzBWvWQwE6PrDI05OvmXUIruQTcoMDzRsOd5CDvHCVLcLHMTSYvOK5Cm46kWqlV3yAbuSpBZdJ5oP5OUoStg==}

-  rollup@4.60.3:
-    resolution: {integrity: sha512-pAQK9HalE84QSm4Po3EmWIZPd3FnjkShVkiMlz1iligWYkWQ7wHYd1PF/T7QZ5TVSD6uSTon5gBVMSM4JfBV+A==}
+  rollup@4.60.4:
+    resolution: {integrity: sha512-WHeFSbZYsPu3+bLoNRUuAO+wavNlocOPf3wSHTP7hcFKVnJeWsYlCDbr3mTS14FCizf9ccIxXA8sGL8zKeQN3g==}
    engines: {node: '>=18.0.0', npm: '>=8.0.0'}
    hasBin: true

@@ -7297,79 +7297,79 @@ snapshots:

  '@resvg/resvg-wasm@2.6.2': {}

-  '@rollup/rollup-android-arm-eabi@4.60.3':
+  '@rollup/rollup-android-arm-eabi@4.60.4':
    optional: true

-  '@rollup/rollup-android-arm64@4.60.3':
+  '@rollup/rollup-android-arm64@4.60.4':
    optional: true

-  '@rollup/rollup-darwin-arm64@4.60.3':
+  '@rollup/rollup-darwin-arm64@4.60.4':
    optional: true

-  '@rollup/rollup-darwin-x64@4.60.3':
+  '@rollup/rollup-darwin-x64@4.60.4':
    optional: true

-  '@rollup/rollup-freebsd-arm64@4.60.3':
+  '@rollup/rollup-freebsd-arm64@4.60.4':
    optional: true

-  '@rollup/rollup-freebsd-x64@4.60.3':
+  '@rollup/rollup-freebsd-x64@4.60.4':
    optional: true

-  '@rollup/rollup-linux-arm-gnueabihf@4.60.3':
+  '@rollup/rollup-linux-arm-gnueabihf@4.60.4':
    optional: true

-  '@rollup/rollup-linux-arm-musleabihf@4.60.3':
+  '@rollup/rollup-linux-arm-musleabihf@4.60.4':
    optional: true

-  '@rollup/rollup-linux-arm64-gnu@4.60.3':
+  '@rollup/rollup-linux-arm64-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-linux-arm64-musl@4.60.3':
+  '@rollup/rollup-linux-arm64-musl@4.60.4':
    optional: true

-  '@rollup/rollup-linux-loong64-gnu@4.60.3':
+  '@rollup/rollup-linux-loong64-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-linux-loong64-musl@4.60.3':
+  '@rollup/rollup-linux-loong64-musl@4.60.4':
    optional: true

-  '@rollup/rollup-linux-ppc64-gnu@4.60.3':
+  '@rollup/rollup-linux-ppc64-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-linux-ppc64-musl@4.60.3':
+  '@rollup/rollup-linux-ppc64-musl@4.60.4':
    optional: true

-  '@rollup/rollup-linux-riscv64-gnu@4.60.3':
+  '@rollup/rollup-linux-riscv64-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-linux-riscv64-musl@4.60.3':
+  '@rollup/rollup-linux-riscv64-musl@4.60.4':
    optional: true

-  '@rollup/rollup-linux-s390x-gnu@4.60.3':
+  '@rollup/rollup-linux-s390x-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-linux-x64-gnu@4.60.3':
+  '@rollup/rollup-linux-x64-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-linux-x64-musl@4.60.3':
+  '@rollup/rollup-linux-x64-musl@4.60.4':
    optional: true

-  '@rollup/rollup-openbsd-x64@4.60.3':
+  '@rollup/rollup-openbsd-x64@4.60.4':
    optional: true

-  '@rollup/rollup-openharmony-arm64@4.60.3':
+  '@rollup/rollup-openharmony-arm64@4.60.4':
    optional: true

-  '@rollup/rollup-win32-arm64-msvc@4.60.3':
+  '@rollup/rollup-win32-arm64-msvc@4.60.4':
    optional: true

-  '@rollup/rollup-win32-ia32-msvc@4.60.3':
+  '@rollup/rollup-win32-ia32-msvc@4.60.4':
    optional: true

-  '@rollup/rollup-win32-x64-gnu@4.60.3':
+  '@rollup/rollup-win32-x64-gnu@4.60.4':
    optional: true

-  '@rollup/rollup-win32-x64-msvc@4.60.3':
+  '@rollup/rollup-win32-x64-msvc@4.60.4':
    optional: true

  '@rtsao/scc@1.1.0': {}
@@ -8067,7 +8067,7 @@ snapshots:
      '@vue/shared': 3.5.28
      estree-walker: 2.0.2
      magic-string: 0.30.21
-      postcss: 8.5.14
+      postcss: 8.5.15
      source-map-js: 1.2.1

  '@vue/compiler-ssr@3.5.28':
@@ -9947,7 +9947,7 @@ snapshots:
    dependencies:
      js-tokens: 4.0.0

-  lru-cache@11.3.6: {}
+  lru-cache@11.5.0: {}

  lucide-react@0.542.0(react@19.2.4):
    dependencies:
@@ -10941,7 +10941,7 @@ snapshots:
      picocolors: 1.1.1
      source-map-js: 1.2.1

-  postcss@8.5.14:
+  postcss@8.5.15:
    dependencies:
      nanoid: 3.3.12
      picocolors: 1.1.1
@@ -11282,35 +11282,35 @@ snapshots:

  robust-predicates@3.0.2: {}

-  rollup@4.60.3:
+  rollup@4.60.4:
    dependencies:
      '@types/estree': 1.0.8
    optionalDependencies:
-      '@rollup/rollup-android-arm-eabi': 4.60.3
-      '@rollup/rollup-android-arm64': 4.60.3
-      '@rollup/rollup-darwin-arm64': 4.60.3
-      '@rollup/rollup-darwin-x64': 4.60.3
-      '@rollup/rollup-freebsd-arm64': 4.60.3
-      '@rollup/rollup-freebsd-x64': 4.60.3
-      '@rollup/rollup-linux-arm-gnueabihf': 4.60.3
-      '@rollup/rollup-linux-arm-musleabihf': 4.60.3
-      '@rollup/rollup-linux-arm64-gnu': 4.60.3
-      '@rollup/rollup-linux-arm64-musl': 4.60.3
-      '@rollup/rollup-linux-loong64-gnu': 4.60.3
-      '@rollup/rollup-linux-loong64-musl': 4.60.3
-      '@rollup/rollup-linux-ppc64-gnu': 4.60.3
-      '@rollup/rollup-linux-ppc64-musl': 4.60.3
-      '@rollup/rollup-linux-riscv64-gnu': 4.60.3
-      '@rollup/rollup-linux-riscv64-musl': 4.60.3
-      '@rollup/rollup-linux-s390x-gnu': 4.60.3
-      '@rollup/rollup-linux-x64-gnu': 4.60.3
-      '@rollup/rollup-linux-x64-musl': 4.60.3
-      '@rollup/rollup-openbsd-x64': 4.60.3
-      '@rollup/rollup-openharmony-arm64': 4.60.3
-      '@rollup/rollup-win32-arm64-msvc': 4.60.3
-      '@rollup/rollup-win32-ia32-msvc': 4.60.3
-      '@rollup/rollup-win32-x64-gnu': 4.60.3
-      '@rollup/rollup-win32-x64-msvc': 4.60.3
+      '@rollup/rollup-android-arm-eabi': 4.60.4
+      '@rollup/rollup-android-arm64': 4.60.4
+      '@rollup/rollup-darwin-arm64': 4.60.4
+      '@rollup/rollup-darwin-x64': 4.60.4
+      '@rollup/rollup-freebsd-arm64': 4.60.4
+      '@rollup/rollup-freebsd-x64': 4.60.4
+      '@rollup/rollup-linux-arm-gnueabihf': 4.60.4
+      '@rollup/rollup-linux-arm-musleabihf': 4.60.4
+      '@rollup/rollup-linux-arm64-gnu': 4.60.4
+      '@rollup/rollup-linux-arm64-musl': 4.60.4
+      '@rollup/rollup-linux-loong64-gnu': 4.60.4
+      '@rollup/rollup-linux-loong64-musl': 4.60.4
+      '@rollup/rollup-linux-ppc64-gnu': 4.60.4
+      '@rollup/rollup-linux-ppc64-musl': 4.60.4
+      '@rollup/rollup-linux-riscv64-gnu': 4.60.4
+      '@rollup/rollup-linux-riscv64-musl': 4.60.4
+      '@rollup/rollup-linux-s390x-gnu': 4.60.4
+      '@rollup/rollup-linux-x64-gnu': 4.60.4
+      '@rollup/rollup-linux-x64-musl': 4.60.4
+      '@rollup/rollup-openbsd-x64': 4.60.4
+      '@rollup/rollup-openharmony-arm64': 4.60.4
+      '@rollup/rollup-win32-arm64-msvc': 4.60.4
+      '@rollup/rollup-win32-ia32-msvc': 4.60.4
+      '@rollup/rollup-win32-x64-gnu': 4.60.4
+      '@rollup/rollup-win32-x64-msvc': 4.60.4
      fsevents: 2.3.3

  roughjs@4.6.6:
@@ -11908,7 +11908,7 @@ snapshots:
      chokidar: 5.0.0
      destr: 2.0.5
      h3: 1.15.11
-      lru-cache: 11.3.6
+      lru-cache: 11.5.0
      node-fetch-native: 1.6.7
      ofetch: 1.5.1
      ufo: 1.6.4
@@ -11985,8 +11985,8 @@ snapshots:
      esbuild: 0.27.7
      fdir: 6.5.0(picomatch@4.0.4)
      picomatch: 4.0.4
-      postcss: 8.5.14
-      rollup: 4.60.3
+      postcss: 8.5.15
+      rollup: 4.60.4
      tinyglobby: 0.2.16
    optionalDependencies:
      '@types/node': 20.19.33
@@ -50,6 +50,8 @@ Intercepts clarification tool calls and converts them into proper user-facing re

 Detects when the agent is making the same tool call repeatedly without making progress. When a loop is detected, the middleware intervenes to break the cycle and prevents the agent from burning turns indefinitely.

+Warning interventions are queued per thread and per run, then drained on the next model call as a single hidden `HumanMessage(name="loop_warning")` appended after the existing tool results. This keeps every tool call paired with its tool message, which providers validate. Run start/end hooks clear stale or undelivered warnings, and hard stops still strip tool calls before forcing a final text response.
+
 **Configuration**: built-in, no user configuration.

 ---
@@ -50,6 +50,8 @@ import { Callout } from "nextra/components";

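 Detects when the agent is repeatedly making the same tool call without making progress. When a loop is detected, the middleware intervenes to break the cycle and prevents the agent from burning turns indefinitely.

+Warning interventions are queued per thread and per run and, on the next model call, merged into a single hidden `HumanMessage(name="loop_warning")` appended after the existing tool results. This does not break the provider's validation of tool-call/tool-message pairing. Stale or undelivered warnings are cleared when a run starts and ends; when a hard stop is reached, tool calls are still cleared and a final text response is forced.

 **Configuration**: built-in, no user configuration required.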

 ---
@@ -120,7 +120,20 @@ if [ -z "$BETTER_AUTH_SECRET" ]; then
        echo -e "${GREEN}✓ BETTER_AUTH_SECRET loaded from $_secret_file${NC}"
    else
        export BETTER_AUTH_SECRET
-        BETTER_AUTH_SECRET="$(python3 -c 'import secrets; print(secrets.token_hex(32))')"
+        if command -v python3 > /dev/null 2>&1 && \
+            BETTER_AUTH_SECRET="$(python3 -c 'import sys; sys.version_info >= (3, 6) or sys.exit(1); import secrets; print(secrets.token_hex(32))' 2>/dev/null)"; then
+            true
+        elif command -v python > /dev/null 2>&1 && \
+            BETTER_AUTH_SECRET="$(python -c 'import sys; sys.version_info >= (3, 6) or sys.exit(1); import secrets; print(secrets.token_hex(32))' 2>/dev/null)"; then
+            true
+        elif command -v openssl > /dev/null 2>&1 && \
+            BETTER_AUTH_SECRET="$(openssl rand -hex 32)"; then
+            true
+        else
+            echo -e "${RED}✗ Cannot generate BETTER_AUTH_SECRET: python3, python, and openssl are all unavailable.${NC}" >&2
+            echo -e "${RED}  Set BETTER_AUTH_SECRET manually before running make up.${NC}" >&2
+            exit 1
+        fi
        echo "$BETTER_AUTH_SECRET" > "$_secret_file"
        chmod 600 "$_secret_file"
        echo -e "${GREEN}✓ BETTER_AUTH_SECRET generated → $_secret_file${NC}"
@@ -0,0 +1,23 @@
+#!/usr/bin/env python3
+"""CLI wrapper for the async/thread boundary detector."""
+
+from __future__ import annotations
+
+import sys
+from collections.abc import Sequence
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+TEST_SUPPORT_PATH = REPO_ROOT / "backend" / "tests"
+if str(TEST_SUPPORT_PATH) not in sys.path:
+    sys.path.insert(0, str(TEST_SUPPORT_PATH))
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    from support.detectors.thread_boundaries import main as detector_main
+
+    return detector_main(argv)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
@@ -62,27 +62,129 @@ done

 # ── Stop helper ──────────────────────────────────────────────────────────────

-_kill_port() {
+_is_repo_pid() {
+    local pid=$1
+    lsof -p "$pid" 2>/dev/null | grep -F "$REPO_ROOT" >/dev/null
+}
+
+_kill_repo_processes() {
+    local pattern=$1
+    local pid
+    local pids=""
+
+    while IFS= read -r pid; do
+        if [ -n "$pid" ] && _is_repo_pid "$pid"; then
+            case " $pids " in
+                *" $pid "*) ;;
+                *) pids="$pids $pid" ;;
+            esac
+        fi
+    done < <(pgrep -f "$pattern" 2>/dev/null || true)
+
+    if [ -n "$pids" ]; then
+        kill $pids 2>/dev/null || true
+    fi
+}
+
+_kill_repo_port() {
    local port=$1
    local pid
-    pid=$(lsof -ti :"$port" 2>/dev/null) || true
-    if [ -n "$pid" ]; then
-        kill -9 $pid 2>/dev/null || true
+    local pids=""
+
+    while IFS= read -r pid; do
+        if [ -n "$pid" ] && _is_repo_pid "$pid"; then
+            case " $pids " in
+                *" $pid "*) ;;
+                *) pids="$pids $pid" ;;
+            esac
+        fi
+    done < <(lsof -nP -iTCP:"$port" -sTCP:LISTEN -t 2>/dev/null || true)
+
+    if [ -n "$pids" ]; then
+        kill -9 $pids 2>/dev/null || true
+    fi
+}
+
+_is_port_listening() {
+    local port=$1
+
+    if command -v lsof >/dev/null 2>&1; then
+        if lsof -nP -iTCP:"$port" -sTCP:LISTEN -t >/dev/null 2>&1; then
+            return 0
+        fi
+    fi
+
+    if command -v ss >/dev/null 2>&1; then
+        if ss -ltn "( sport = :$port )" 2>/dev/null | tail -n +2 | grep -q .; then
+            return 0
+        fi
+    fi
+
+    if command -v netstat >/dev/null 2>&1; then
+        if netstat -ltn 2>/dev/null | awk '{print $4}' | grep -Eq "(^|[.:])${port}$"; then
+            return 0
+        fi
+    fi
+
+    return 1
+}
+
+_is_repo_nginx_pid() {
+    local pid=$1
+    local command
+    local args
+
+    command=$(ps -p "$pid" -o comm= 2>/dev/null) || return 1
+    case "$command" in
+        nginx|*/nginx) ;;
+        *) return 1 ;;
+    esac
+
+    args=$(ps -p "$pid" -o args= 2>/dev/null) || return 1
+    case "$args" in
+        *"$REPO_ROOT"*) return 0 ;;  # also covers $REPO_ROOT/docker/nginx/nginx.local.conf
+    esac
+
+    _is_repo_pid "$pid"
+}
+
+_kill_repo_nginx() {
+    local pid
+    local pids=""
+
+    if [ -f "$REPO_ROOT/logs/nginx.pid" ]; then
+        read -r pid < "$REPO_ROOT/logs/nginx.pid" || true
+        if [ -n "$pid" ] && _is_repo_nginx_pid "$pid"; then
+            pids="$pids $pid"
+        fi
+    fi
+
+    while IFS= read -r pid; do
+        if [ -n "$pid" ] && _is_repo_nginx_pid "$pid"; then
+            case " $pids " in
+                *" $pid "*) ;;
+                *) pids="$pids $pid" ;;
+            esac
+        fi
+    done < <(pgrep -f nginx 2>/dev/null || true)
+
+    if [ -n "$pids" ]; then
+        kill -9 $pids 2>/dev/null || true
    fi
 }

 stop_all() {
    echo "Stopping all services..."
-    pkill -f "uvicorn app.gateway.app:app" 2>/dev/null || true
-    pkill -f "next dev" 2>/dev/null || true
-    pkill -f "next start" 2>/dev/null || true
-    pkill -f "next-server" 2>/dev/null || true
+    _kill_repo_processes "uvicorn app.gateway.app:app"
+    _kill_repo_processes "next dev"
+    _kill_repo_processes "next start"
+    _kill_repo_processes "next-server"
    nginx -c "$REPO_ROOT/docker/nginx/nginx.local.conf" -p "$REPO_ROOT" -s quit 2>/dev/null || true
    sleep 1
-    pkill -9 nginx 2>/dev/null || true
+    _kill_repo_nginx
    # Force-kill any survivors still holding the service ports
-    _kill_port 8001
-    _kill_port 3000
+    _kill_repo_port 8001
+    _kill_repo_port 3000
    ./scripts/cleanup-containers.sh deer-flow-sandbox 2>/dev/null || true
    echo "✓ All services stopped"
 }
@@ -216,13 +318,15 @@ echo ""
 # ── Cleanup handler ──────────────────────────────────────────────────────────

 cleanup() {
+    local status="${1:-0}"
    trap - INT TERM
    echo ""
    stop_all
-    exit 0
+    exit "$status"
 }

-trap cleanup INT TERM
+trap 'cleanup 130' INT
+trap 'cleanup 143' TERM

 # ── Helper: start a service ──────────────────────────────────────────────────

@@ -231,6 +335,12 @@ trap cleanup INT TERM
 run_service() {
    local name="$1" cmd="$2" port="$3" timeout="$4"

+    if _is_port_listening "$port"; then
+        echo "✗ $name cannot start because port $port is already in use."
+        echo "  If it belongs to this worktree, run 'make stop'; otherwise free the port manually."
+        cleanup 1
+    fi
+
    echo "Starting $name..."
    if $DAEMON_MODE; then
        nohup sh -c "$cmd" > /dev/null 2>&1 &
@@ -242,7 +352,7 @@ run_service() {
        local logfile="logs/$(echo "$name" | tr '[:upper:]' '[:lower:]' | tr ' ' '-').log"
        echo "✗ $name failed to start."
        [ -f "$logfile" ] && tail -20 "$logfile"
-        cleanup
+        cleanup 1
    }
    echo "✓ $name started on localhost:$port"
 }