fix(auth): use getBackendBaseURL() in auth-related fetch calls

Auth pages (login, setup) and components (AuthProvider, account-settings, workspace layout) used hardcoded relative paths like /api/v1/auth/... instead of the configurable getBackendBaseURL() used by the rest of the codebase. This prevented them from reaching the backend when NEXT_PUBLIC_BACKEND_BASE_URL was set to a different origin.

Closes #2859
perf(harness): push thread metadata filters into SQL (#2865)
2026-05-13 15:46:29 +08:00 · 2026-05-12 23:21:22 +08:00 · 2026-05-12 23:18:54 +08:00 · 2026-05-12 23:15:11 +08:00 · 2026-05-12 23:07:11 +08:00 · 2026-05-12 16:19:21 +08:00
97 changed files with 4595 additions and 833 deletions
@@ -9,8 +9,9 @@ JINA_API_KEY=your-jina-api-key

 # InfoQuest API Key
 INFOQUEST_API_KEY=your-infoquest-api-key
-# CORS Origins (comma-separated) - e.g., http://localhost:3000,http://localhost:3001
-# CORS_ORIGINS=http://localhost:3000
+# Browser CORS allowlist for split-origin or port-forwarded deployments (comma-separated exact origins).
+# Leave unset when using the unified nginx endpoint, e.g. http://localhost:2026.
+# GATEWAY_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000

 # Optional:
 # FIRECRAWL_API_KEY=your-firecrawl-api-key
@@ -46,12 +46,12 @@ Docker provides a consistent, isolated environment with all dependencies pre-con
   All services will start with hot-reload enabled:
   - Frontend changes are automatically reloaded
   - Backend changes trigger automatic restart
-   - LangGraph server supports hot-reload
+   - Gateway-hosted LangGraph-compatible runtime supports hot-reload

 4. **Access the application**:
   - Web Interface: http://localhost:2026
   - API Gateway: http://localhost:2026/api/*
-   - LangGraph: http://localhost:2026/api/langgraph/*
+   - LangGraph-compatible API: http://localhost:2026/api/langgraph/*

 #### Docker Commands

@@ -94,7 +94,7 @@ Use these as practical starting points for development and review environments:
 If `make docker-init`, `make docker-start`, or `make docker-stop` fails on Linux with an error like the one below, your current user likely lacks permission to access the Docker daemon socket:

 ```text
-unable to get image 'deer-flow-dev-langgraph': permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock
+unable to get image 'deer-flow-gateway': permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock
 ```

 Recommended fix: add your current user to the `docker` group so Docker commands work without `sudo`.
@@ -131,9 +131,8 @@ Host Machine
 Docker Compose (deer-flow-dev)
  ├→ nginx (port 2026) ← Reverse proxy
  ├→ web (port 3000) ← Frontend with hot-reload
-  ├→ api (port 8001) ← Gateway API with hot-reload
-   ├→ langgraph (port 2024) ← LangGraph server with hot-reload
-   └→ provisioner (optional, port 8002) ← Started only in provisioner/K8s sandbox mode
+  ├→ gateway (port 8001) ← Gateway API + LangGraph-compatible runtime with hot-reload
+  └→ provisioner (optional, port 8002) ← Started only in provisioner/K8s sandbox mode
 ```

 **Benefits of Docker Development**:
@@ -184,17 +183,13 @@ Required tools:

 If you need to start services individually:

-1. **Start backend services**:
+1. **Start backend service**:
   ```bash
-   # Terminal 1: Start LangGraph Server (port 2024)
+   # Terminal 1: Start Gateway API + embedded agent runtime (port 8001)
   cd backend
   make dev

-   # Terminal 2: Start Gateway API (port 8001)
-   cd backend
-   make gateway
-
-   # Terminal 3: Start Frontend (port 3000)
+   # Terminal 2: Start Frontend (port 3000)
   cd frontend
   pnpm dev
   ```
@@ -212,10 +207,10 @@ If you need to start services individually:

 The nginx configuration provides:
 - Unified entry point on port 2026
-- Routes `/api/langgraph/*` to LangGraph Server (2024)
+- Rewrites `/api/langgraph/*` to Gateway's LangGraph-compatible API (8001)
 - Routes other `/api/*` endpoints to Gateway API (8001)
 - Routes non-API requests to Frontend (3000)
-- Centralized CORS handling
+- Same-origin API routing; split-origin or port-forwarded browser clients should use the Gateway `GATEWAY_CORS_ORIGINS` allowlist
 - SSE/streaming support for real-time agent responses
 - Optimized timeouts for long-running operations

@@ -235,8 +230,8 @@ deer-flow/
 │       └── nginx.local.conf # Nginx config for local dev
 ├── backend/                 # Backend application
 │   ├── src/
-│   │   ├── gateway/        # Gateway API (port 8001)
-│   │   ├── agents/         # LangGraph agents (port 2024)
+│   │   ├── gateway/        # Gateway API and LangGraph-compatible runtime (port 8001)
+│   │   ├── agents/         # LangGraph agent runtime used by Gateway
 │   │   ├── mcp/            # Model Context Protocol integration
 │   │   ├── skills/         # Skills system
 │   │   └── sandbox/        # Sandbox execution
@@ -256,8 +251,7 @@ Browser
  ↓
 Nginx (port 2026) ← Unified entry point
  ├→ Frontend (port 3000) ← / (non-API requests)
-  ├→ Gateway API (port 8001) ← /api/models, /api/mcp, /api/skills, /api/threads/*/artifacts
-  └→ LangGraph Server (port 2024) ← /api/langgraph/* (agent interactions)
+  └→ Gateway API (port 8001) ← /api/* and /api/langgraph/* (LangGraph-compatible agent interactions)
 ```

 ## Development Workflow
@@ -245,6 +245,8 @@ make down   # Stop and remove containers

 Access: http://localhost:2026

+The unified nginx endpoint is same-origin by default and does not emit browser CORS headers. If you run a split-origin or port-forwarded browser client, set `GATEWAY_CORS_ORIGINS` to comma-separated exact origins such as `http://localhost:3000`; the Gateway then applies the CORS allowlist and matching CSRF origin checks.
+
 See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed Docker development guide.

 #### Option 2: Local Development
@@ -228,7 +228,7 @@ make down   # Stop and remove containers
 ```

 > [!NOTE]
-> The LangGraph agent server currently runs via `langgraph dev` (the open-source CLI server).
+> The agent runtime currently runs inside the Gateway. nginx rewrites `/api/langgraph/*` to the LangGraph-compatible API served by the Gateway.

 Access: http://localhost:2026

@@ -296,8 +296,8 @@ DeerFlow peut recevoir des tâches depuis des applications de messagerie. Les ca

 ```yaml
 channels:
-  # LangGraph Server URL (default: http://localhost:2024)
-  langgraph_url: http://localhost:2024
+  # LangGraph-compatible Gateway API base URL (default: http://localhost:8001/api)
+  langgraph_url: http://localhost:8001/api
  # Gateway API URL (default: http://localhost:8001)
  gateway_url: http://localhost:8001

@@ -181,7 +181,7 @@ make down   # コンテナを停止して削除
 ```

 > [!NOTE]
-> The LangGraph agent server currently runs via `langgraph dev` (the open-source CLI server).
+> The agent runtime currently runs inside the Gateway. `/api/langgraph/*` is rewritten by nginx to the Gateway's LangGraph-compatible API.

 Access: http://localhost:2026

@@ -249,8 +249,8 @@ DeerFlowはメッセージングアプリからのタスク受信をサポート

 ```yaml
 channels:
-  # LangGraph server URL (default: http://localhost:2024)
-  langgraph_url: http://localhost:2024
+  # LangGraph-compatible Gateway API base URL (default: http://localhost:8001/api)
+  langgraph_url: http://localhost:8001/api
  # Gateway API URL (default: http://localhost:8001)
  gateway_url: http://localhost:8001

@@ -184,7 +184,7 @@ make down   # 停止并移除容器
 ```

 > [!NOTE]
-> The LangGraph agent server currently runs via the open-source CLI server `langgraph dev`.
+> The agent runtime currently runs embedded in the Gateway; `/api/langgraph/*` is rewritten by nginx to the Gateway's LangGraph-compatible API.

 Access: http://localhost:2026

@@ -254,8 +254,8 @@ DeerFlow 支持从即时通讯应用接收任务。只要配置完成，对应

 ```yaml
 channels:
-  # LangGraph Server URL (default: http://localhost:2024)
-  langgraph_url: http://localhost:2024
+  # LangGraph-compatible Gateway API base URL (default: http://localhost:8001/api)
+  langgraph_url: http://localhost:8001/api
  # Gateway API URL (default: http://localhost:8001)
  gateway_url: http://localhost:8001

@@ -207,6 +207,8 @@ Configuration priority:

 FastAPI application on port 8001 with health check at `GET /health`. Set `GATEWAY_ENABLE_DOCS=false` to disable `/docs`, `/redoc`, and `/openapi.json` in production (default: enabled).

+CORS is same-origin by default when requests enter through nginx on port 2026. Split-origin or port-forwarded browser clients must opt in with `GATEWAY_CORS_ORIGINS` (comma-separated exact origins); Gateway `CORSMiddleware` and `CSRFMiddleware` both read that variable so browser CORS and auth-origin checks stay aligned.
+
 **Routers**:

 | Router | Endpoints |
@@ -223,7 +225,7 @@ FastAPI application on port 8001 with health check at `GET /health`. Set `GATEWA
 | **Feedback** (`/api/threads/{id}/runs/{rid}/feedback`) | `PUT /` - upsert feedback; `DELETE /` - delete user feedback; `POST /` - create feedback; `GET /` - list feedback; `GET /stats` - aggregate stats; `DELETE /{fid}` - delete specific |
 | **Runs** (`/api/runs`) | `POST /stream` - stateless run + SSE; `POST /wait` - stateless run + block; `GET /{rid}/messages` - paginated messages by run_id `{data, has_more}` (cursor: `after_seq`/`before_seq`); `GET /{rid}/feedback` - list feedback by run_id |

-Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` → Gateway.
+Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runtime, all other `/api/*` → Gateway REST APIs.

 ### Sandbox System (`packages/harness/deerflow/sandbox/`)

@@ -243,7 +245,7 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →
 - `bash` - Execute commands with path translation and error handling
 - `ls` - Directory listing (tree format, max 2 levels)
 - `read_file` - Read file contents with optional line range
-- `write_file` - Write/append to files, creates directories
+- `write_file` - Write/append to files, creates directories; overwrites by default and exposes the `append` argument in the model-facing schema for end-of-file writes
 - `str_replace` - Substring replacement (single or all occurrences); same-path serialization is scoped to `(sandbox.id, path)` so isolated sandboxes do not contend on identical virtual paths inside one process

 ### Subagent System (`packages/harness/deerflow/subagents/`)
@@ -56,11 +56,8 @@ export OPENAI_API_KEY="your-api-key"
 ### Run the Development Server

 ```bash
-# Terminal 1: LangGraph server
+# Gateway API + embedded agent runtime
 make dev
-
-# Terminal 2: Gateway API
-make gateway
 ```

 ## Project Structure
@@ -11,31 +11,26 @@ DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent
                        │          Nginx (Port 2026)           │
                        │      Unified reverse proxy           │
                        └───────┬──────────────────┬───────────┘
-                                │                  │
-              /api/langgraph/*  │                  │  /api/* (other)
-                                ▼                  ▼
-               ┌────────────────────┐  ┌────────────────────────┐
-               │ LangGraph Server   │  │   Gateway API (8001)   │
-               │    (Port 2024)     │  │   FastAPI REST         │
-               │                    │  │                        │
-               │ ┌────────────────┐ │  │ Models, MCP, Skills,   │
-               │ │  Lead Agent    │ │  │ Memory, Uploads,       │
-               │ │  ┌──────────┐  │ │  │ Artifacts              │
-               │ │  │Middleware│  │ │  └────────────────────────┘
-               │ │  │  Chain   │  │ │
-               │ │  └──────────┘  │ │
-               │ │  ┌──────────┐  │ │
-               │ │  │  Tools   │  │ │
-               │ │  └──────────┘  │ │
-               │ │  ┌──────────┐  │ │
-               │ │  │Subagents │  │ │
-               │ │  └──────────┘  │ │
-               │ └────────────────┘ │
-               └────────────────────┘
+                                │
+            /api/langgraph/*    │    /api/* (other)
+            rewritten to /api/* │
+                                ▼
+               ┌────────────────────────────────────────┐
+               │        Gateway API (8001)              │
+               │        FastAPI REST + agent runtime    │
+               │                                        │
+               │ Models, MCP, Skills, Memory, Uploads,  │
+               │ Artifacts, Threads, Runs, Streaming    │
+               │                                        │
+               │ ┌────────────────────────────────────┐ │
+               │ │ Lead Agent                         │ │
+               │ │ Middleware Chain, Tools, Subagents │ │
+               │ └────────────────────────────────────┘ │
+               └────────────────────────────────────────┘
 ```

 **Request Routing** (via Nginx):
-- `/api/langgraph/*` → LangGraph Server - agent interactions, threads, streaming
+- `/api/langgraph/*` → Gateway LangGraph-compatible API - agent interactions, threads, streaming
 - `/api/*` (other) → Gateway API - models, MCP, skills, memory, artifacts, uploads, thread-local cleanup
 - `/` (non-API) → Frontend - Next.js web interface

@@ -79,7 +74,7 @@ Per-thread isolated execution with virtual path translation:
 - **Skills path**: `/mnt/skills` → `deer-flow/skills/` directory
 - **Skills loading**: Recursively discovers nested `SKILL.md` files under `skills/{public,custom}` and preserves nested container paths
 - **File-write safety**: `str_replace` serializes read-modify-write per `(sandbox.id, path)` so isolated sandboxes keep concurrency even when virtual paths match
-- **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace` (`bash` is disabled by default when using `LocalSandboxProvider`; use `AioSandboxProvider` for isolated shell access)
+- **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace` (`write_file` overwrites by default and exposes `append` for end-of-file writes; `bash` is disabled by default when using `LocalSandboxProvider`; use `AioSandboxProvider` for isolated shell access)

 ### Subagent System

@@ -193,7 +188,7 @@ export OPENAI_API_KEY="your-api-key-here"
 **Full Application** (from project root):

 ```bash
-make dev  # Starts LangGraph + Gateway + Frontend + Nginx
+make dev  # Starts Gateway + Frontend + Nginx
 ```

 Access at: http://localhost:2026
@@ -201,14 +196,11 @@ Access at: http://localhost:2026
 **Backend Only** (from backend directory):

 ```bash
-# Terminal 1: LangGraph server
+# Gateway API + embedded agent runtime
 make dev
-
-# Terminal 2: Gateway API
-make gateway
 ```

-Direct access: LangGraph at http://localhost:2024, Gateway at http://localhost:8001
+Direct access: Gateway at http://localhost:8001

 ---

@@ -244,12 +236,16 @@ backend/
 │   └── utils/                  # Utilities
 ├── docs/                       # Documentation
 ├── tests/                      # Test suite
-├── langgraph.json              # LangGraph server configuration
+├── langgraph.json              # LangGraph graph registry for tooling/Studio compatibility
 ├── pyproject.toml              # Python dependencies
 ├── Makefile                    # Development commands
 └── Dockerfile                  # Container build
 ```

+`langgraph.json` is not the default service entrypoint.  The scripts and Docker
+deployments run the Gateway embedded runtime; the file is kept for LangGraph
+tooling, Studio, or direct LangGraph Server compatibility.
+
 ---

 ## Configuration
@@ -362,8 +358,8 @@ If a provider is explicitly enabled but required credentials are missing, or the

 ```bash
 make install    # Install dependencies
-make dev        # Run LangGraph server (port 2024)
-make gateway    # Run Gateway API (port 8001)
+make dev        # Run Gateway API + embedded agent runtime (port 8001)
+make gateway    # Run Gateway API without reload (port 8001)
 make lint       # Run linter (ruff)
 make format     # Format code (ruff)
 ```
@@ -1,6 +1,5 @@
 import asyncio
 import logging
-import os
 from collections.abc import AsyncGenerator
 from contextlib import asynccontextmanager

@@ -9,7 +8,7 @@ from fastapi.middleware.cors import CORSMiddleware

 from app.gateway.auth_middleware import AuthMiddleware
 from app.gateway.config import get_gateway_config
-from app.gateway.csrf_middleware import CSRFMiddleware
+from app.gateway.csrf_middleware import CSRFMiddleware, get_configured_cors_origins
 from app.gateway.deps import langgraph_runtime
 from app.gateway.routers import (
    agents,
@@ -63,7 +62,7 @@ async def _ensure_admin_user(app: FastAPI) -> None:

    Subsequent boots (admin already exists):
      - Runs the one-time "no-auth → with-auth" orphan thread migration for
-        existing LangGraph thread metadata that has no owner_id.
+        existing LangGraph thread metadata that has no user_id.

    No SQL persistence migration is needed: the four user_id columns
    (threads_meta, runs, run_events, feedback) only come into existence
@@ -178,7 +177,7 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    async with langgraph_runtime(app):
        logger.info("LangGraph runtime initialised")

-        # Ensure admin user exists (auto-create on first boot)
+        # Check admin bootstrap state and migrate orphan threads after admin exists.
        # Must run AFTER langgraph_runtime so app.state.store is available for thread migration
        await _ensure_admin_user(app)

@@ -219,7 +218,9 @@ def create_app() -> FastAPI:
        Configured FastAPI application instance.
    """
    config = get_gateway_config()
-    docs_kwargs = {"docs_url": "/docs", "redoc_url": "/redoc", "openapi_url": "/openapi.json"} if config.enable_docs else {"docs_url": None, "redoc_url": None, "openapi_url": None}
+    docs_url = "/docs" if config.enable_docs else None
+    redoc_url = "/redoc" if config.enable_docs else None
+    openapi_url = "/openapi.json" if config.enable_docs else None

    app = FastAPI(
        title="DeerFlow API Gateway",
@@ -239,12 +240,14 @@ API Gateway for DeerFlow - A LangGraph-based AI agent backend with sandbox execu

 ### Architecture

-LangGraph requests are handled by nginx reverse proxy.
-This gateway provides custom endpoints for models, MCP configuration, skills, and artifacts.
+LangGraph-compatible requests are routed through nginx to this gateway.
+This gateway provides runtime endpoints for agent runs plus custom endpoints for models, MCP configuration, skills, and artifacts.
        """,
        version="0.1.0",
        lifespan=lifespan,
-        **docs_kwargs,
+        docs_url=docs_url,
+        redoc_url=redoc_url,
+        openapi_url=openapi_url,
        openapi_tags=[
            {
                "name": "models",
@@ -307,25 +310,18 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
    # CSRF: Double Submit Cookie pattern for state-changing requests
    app.add_middleware(CSRFMiddleware)

-    # CORS: when GATEWAY_CORS_ORIGINS is set (dev without nginx), add CORS middleware.
-    # In production, nginx handles CORS and no middleware is needed.
-    cors_origins_env = os.environ.get("GATEWAY_CORS_ORIGINS", "")
-    if cors_origins_env:
-        cors_origins = [o.strip() for o in cors_origins_env.split(",") if o.strip()]
-        # Validate: wildcard origin with credentials is a security misconfiguration
-        for origin in cors_origins:
-            if origin == "*":
-                logger.error("GATEWAY_CORS_ORIGINS contains wildcard '*' with allow_credentials=True. This is a security misconfiguration — browsers will reject the response. Use explicit scheme://host:port origins instead.")
-                cors_origins = [o for o in cors_origins if o != "*"]
-                break
-        if cors_origins:
-            app.add_middleware(
-                CORSMiddleware,
-                allow_origins=cors_origins,
-                allow_credentials=True,
-                allow_methods=["*"],
-                allow_headers=["*"],
-            )
+    # CORS: the unified nginx endpoint is same-origin by default. Split-origin
+    # browser clients must opt in with this explicit Gateway allowlist so CORS
+    # and CSRF origin checks share the same source of truth.
+    cors_origins = sorted(get_configured_cors_origins())
+    if cors_origins:
+        app.add_middleware(
+            CORSMiddleware,
+            allow_origins=cors_origins,
+            allow_credentials=True,
+            allow_methods=["*"],
+            allow_headers=["*"],
+        )

    # Include routers
    # Models API is mounted at /api/models
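Reviewer note: both the removed inline block and the new `get_configured_cors_origins()` call turn a comma-separated `GATEWAY_CORS_ORIGINS` value into a set of exact origins. The body of `_configured_cors_origins()` is not part of this hunk, so the following is only a minimal sketch of that kind of parsing. The function name `parse_cors_origins` and its normalization rules (lowercasing, dropping paths, skipping wildcards and non-HTTP entries) are assumptions for illustration, not the project's implementation.

```python
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> set[str]:
    """Hypothetical helper: parse a comma-separated origin allowlist.

    Assumptions (not taken from the diff): wildcard and malformed entries
    are dropped, and each origin is normalized to scheme://host[:port].
    """
    origins: set[str] = set()
    for item in raw.split(","):
        candidate = item.strip()
        if not candidate or candidate == "*":
            # A wildcard cannot be combined with allow_credentials=True.
            continue
        parts = urlsplit(candidate)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            # Skip entries that are not explicit http(s) origins.
            continue
        origins.add(f"{parts.scheme}://{parts.netloc}".lower())
    return origins


if __name__ == "__main__":
    raw_value = "http://localhost:3000, http://127.0.0.1:3000/, *"
    # sorted() mirrors the deterministic ordering used in create_app().
    print(sorted(parse_cors_origins(raw_value)))
```

Returning a set and sorting at the call site keeps the middleware configuration deterministic while letting the CSRF check do cheap membership tests against the same values.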
@@ -374,7 +370,7 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
    app.include_router(runs.router)

    @app.get("/health", tags=["health"])
-    async def health_check() -> dict:
+    async def health_check() -> dict[str, str]:
        """Health check endpoint.

        Returns:
@@ -28,7 +28,7 @@ class User(BaseModel):
    oauth_id: str | None = Field(None, description="User ID from OAuth provider")

    # Auth lifecycle
-    needs_setup: bool = Field(default=False, description="True for auto-created admin until setup completes")
+    needs_setup: bool = Field(default=False, description="True when a reset account must complete setup")
    token_version: int = Field(default=0, description="Incremented on password change to invalidate old JWTs")


@@ -8,7 +8,6 @@ class GatewayConfig(BaseModel):

    host: str = Field(default="0.0.0.0", description="Host to bind the gateway server")
    port: int = Field(default=8001, description="Port to bind the gateway server")
-    cors_origins: list[str] = Field(default_factory=lambda: ["http://localhost:3000"], description="Allowed CORS origins")
    enable_docs: bool = Field(default=True, description="Enable Swagger/ReDoc/OpenAPI endpoints")


@@ -19,11 +18,9 @@ def get_gateway_config() -> GatewayConfig:
    """Get gateway config, loading from environment if available."""
    global _gateway_config
    if _gateway_config is None:
-        cors_origins_str = os.getenv("CORS_ORIGINS", "http://localhost:3000")
        _gateway_config = GatewayConfig(
            host=os.getenv("GATEWAY_HOST", "0.0.0.0"),
            port=int(os.getenv("GATEWAY_PORT", "8001")),
-            cors_origins=cors_origins_str.split(","),
            enable_docs=os.getenv("GATEWAY_ENABLE_DOCS", "true").lower() == "true",
        )
    return _gateway_config
@@ -6,7 +6,7 @@ State-changing operations require CSRF protection.

 import os
 import secrets
-from collections.abc import Callable
+from collections.abc import Awaitable, Callable
 from urllib.parse import urlsplit

 from fastapi import Request, Response
@@ -106,6 +106,11 @@ def _configured_cors_origins() -> set[str]:
    return origins


+def get_configured_cors_origins() -> set[str]:
+    """Return normalized explicit browser origins from GATEWAY_CORS_ORIGINS."""
+    return _configured_cors_origins()
+
+
 def _first_header_value(value: str | None) -> str | None:
    """Return the first value from a comma-separated proxy header."""
    if not value:
@@ -172,7 +177,7 @@ class CSRFMiddleware(BaseHTTPMiddleware):
    def __init__(self, app: ASGIApp) -> None:
        super().__init__(app)

-    async def dispatch(self, request: Request, call_next: Callable) -> Response:
+    async def dispatch(self, request: Request, call_next: Callable[[Request], Awaitable[Response]]) -> Response:
        _is_auth = is_auth_endpoint(request)

        if should_check_csrf(request) and _is_auth and not is_allowed_auth_origin(request):
@@ -1,8 +1,12 @@
-"""LangGraph Server auth handler — shares JWT logic with Gateway.
+"""LangGraph compatibility auth handler — shares JWT logic with Gateway.

-Loaded by LangGraph Server via langgraph.json ``auth.path``.
-Reuses the same ``decode_token`` / ``get_auth_config`` as Gateway,
-so both modes validate tokens with the same secret and rules.
+The default DeerFlow runtime is embedded in the FastAPI Gateway; scripts and
+Docker deployments do not load this module.  It is retained for LangGraph
+tooling, Studio, or direct LangGraph Server compatibility through
+``langgraph.json``'s ``auth.path``.
+
+When that compatibility path is used, this module reuses the same JWT and CSRF
+rules as Gateway so both modes validate sessions consistently.

 Two layers:
  1. @auth.authenticate — validates JWT cookie, extracts user_id,
@@ -305,7 +305,7 @@ async def login_local(
 async def register(request: Request, response: Response, body: RegisterRequest):
    """Register a new user account (always 'user' role).

-    Admin is auto-created on first boot. This endpoint creates regular users.
+    The first admin is created explicitly through /initialize. This endpoint creates regular users.
    Auto-login by setting the session cookie.
    """
    try:
@@ -90,6 +90,28 @@ class ThreadSearchRequest(BaseModel):
    offset: int = Field(default=0, ge=0, description="Pagination offset")
    status: str | None = Field(default=None, description="Filter by thread status")

+    @field_validator("metadata")
+    @classmethod
+    def _validate_metadata_filters(cls, v: dict[str, Any]) -> dict[str, Any]:
+        """Reject filter entries the SQL backend cannot compile.
+
+        Enforces consistent behaviour across SQL and memory backends.
+        See ``deerflow.persistence.json_compat`` for the shared validators.
+        """
+        if not v:
+            return v
+        from deerflow.persistence.json_compat import validate_metadata_filter_key, validate_metadata_filter_value
+
+        bad_entries: list[str] = []
+        for key, value in v.items():
+            if not validate_metadata_filter_key(key):
+                bad_entries.append(f"{key!r} (unsafe key)")
+            elif not validate_metadata_filter_value(value):
+                bad_entries.append(f"{key!r} (unsupported value type {type(value).__name__})")
+        if bad_entries:
+            raise ValueError(f"Invalid metadata filter entries: {', '.join(bad_entries)}")
+        return v
+
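Reviewer note: the validator above delegates to `validate_metadata_filter_key` and `validate_metadata_filter_value` in `deerflow.persistence.json_compat`, which this hunk does not show. A rough sketch of what such checks might look like follows. The names `is_safe_filter_key` and `is_supported_filter_value`, the key pattern, and the accepted value types are all assumptions chosen for illustration; the real rules may differ.

```python
import re
from typing import Any

# Assumption: keys are limited to an identifier-like pattern so they can be
# placed in a JSON path expression without escaping.
_SAFE_KEY = re.compile(r"[A-Za-z0-9_\-]+")


def is_safe_filter_key(key: object) -> bool:
    """Hypothetical stand-in for the shared key validator."""
    return isinstance(key, str) and _SAFE_KEY.fullmatch(key) is not None


def is_supported_filter_value(value: object) -> bool:
    """Hypothetical stand-in: accept only JSON scalars."""
    return value is None or isinstance(value, (str, int, float, bool))


def check_filters(filters: dict[str, Any]) -> dict[str, Any]:
    """Collect every bad entry, then fail once with a combined message."""
    bad_entries: list[str] = []
    for key, value in filters.items():
        if not is_safe_filter_key(key):
            bad_entries.append(f"{key!r} (unsafe key)")
        elif not is_supported_filter_value(value):
            bad_entries.append(f"{key!r} (unsupported value type {type(value).__name__})")
    if bad_entries:
        raise ValueError(f"Invalid metadata filter entries: {', '.join(bad_entries)}")
    return filters
```

Collecting all offending entries before raising, as the diff does, gives API clients one complete error message instead of forcing them to fix filters one at a time.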

 class ThreadStateResponse(BaseModel):
    """Response model for thread state."""
@@ -294,14 +316,18 @@ async def search_threads(body: ThreadSearchRequest, request: Request) -> list[Th
    (SQL-backed for sqlite/postgres, Store-backed for memory mode).
    """
    from app.gateway.deps import get_thread_store
+    from deerflow.persistence.thread_meta import InvalidMetadataFilterError

    repo = get_thread_store(request)
-    rows = await repo.search(
-        metadata=body.metadata or None,
-        status=body.status,
-        limit=body.limit,
-        offset=body.offset,
-    )
+    try:
+        rows = await repo.search(
+            metadata=body.metadata or None,
+            status=body.status,
+            limit=body.limit,
+            offset=body.offset,
+        )
+    except InvalidMetadataFilterError as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
    return [
        ThreadResponse(
            thread_id=r["thread_id"],
@@ -19,6 +19,7 @@ from langchain_core.messages import HumanMessage

 from app.gateway.deps import get_run_context, get_run_manager, get_stream_bridge
 from app.gateway.utils import sanitize_log_param
+from deerflow.config.app_config import get_app_config
 from deerflow.runtime import (
    END_SENTINEL,
    HEARTBEAT_SENTINEL,
@@ -267,6 +268,23 @@ async def start_run(

    disconnect = DisconnectMode.cancel if body.on_disconnect == "cancel" else DisconnectMode.continue_

+    body_context = getattr(body, "context", None) or {}
+    model_name = body_context.get("model_name")
+
+    # Coerce non-string model_name values to str before truncation.
+    if model_name is not None and not isinstance(model_name, str):
+        model_name = str(model_name)
+
+    # Validate model against the allowlist when a model_name is provided.
+    if model_name:
+        app_config = get_app_config()
+        resolved = app_config.get_model_config(model_name)
+        if resolved is None:
+            raise HTTPException(
+                status_code=400,
+                detail=f"Model {model_name!r} is not in the configured model allowlist",
+            )
+
    try:
        record = await run_mgr.create_or_reject(
            thread_id,
@@ -275,6 +293,7 @@ async def start_run(
            metadata=body.metadata or {},
            kwargs={"input": body.input, "config": body.config},
            multitask_strategy=body.multitask_strategy,
+            model_name=model_name,
        )
    except ConflictError as exc:
        raise HTTPException(status_code=409, detail=str(exc)) from exc
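Reviewer note: the `start_run` change above coerces `model_name` to a string and rejects names that `get_model_config` cannot resolve. The same flow can be sketched without FastAPI as below. `resolve_model_name`, `UnknownModelError`, and the plain `set` allowlist are hypothetical simplifications; the real code looks the model up through `app_config.get_model_config()` and raises `HTTPException(status_code=400)`.

```python
from typing import Any, Optional


class UnknownModelError(ValueError):
    """Hypothetical error for a model name outside the allowlist."""


def resolve_model_name(context: Optional[dict[str, Any]], allowed: set[str]) -> Optional[str]:
    """Return a validated model name, or None when the client sent none."""
    model_name = (context or {}).get("model_name")

    # Coerce non-string values (for example a numeric ID) to str.
    if model_name is not None and not isinstance(model_name, str):
        model_name = str(model_name)

    # An absent or empty name means "use the server default".
    if not model_name:
        return None

    if model_name not in allowed:
        raise UnknownModelError(f"Model {model_name!r} is not in the configured model allowlist")
    return model_name
```

Validating before `create_or_reject` means an unknown model fails fast with a client error rather than surfacing later inside the run.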
@@ -6,16 +6,16 @@ This document provides a complete reference for the DeerFlow backend APIs.

 DeerFlow backend exposes two sets of APIs:

-1. **LangGraph API** - Agent interactions, threads, and streaming (`/api/langgraph/*`)
+1. **LangGraph-compatible API** - Agent interactions, threads, and streaming (`/api/langgraph/*`)
 2. **Gateway API** - Models, MCP, skills, uploads, and artifacts (`/api/*`)

 All APIs are accessed through the Nginx reverse proxy at port 2026.

-## LangGraph API
+## LangGraph-compatible API

 Base URL: `/api/langgraph`

-The LangGraph API is provided by the LangGraph server and follows the LangGraph SDK conventions.
+The public LangGraph-compatible API follows LangGraph SDK conventions. In the unified nginx deployment, Gateway owns `/api/langgraph/*` and translates those paths to its native `/api/*` run, thread, and streaming routers.

 ### Threads

@@ -104,17 +104,11 @@ Content-Type: application/json
 **Recursion Limit:**

 `config.recursion_limit` caps the number of graph steps LangGraph will execute
-in a single run. The `/api/langgraph/*` endpoints go straight to the LangGraph
-server and therefore inherit LangGraph's native default of **25**, which is
-too low for plan-mode or subagent-heavy runs — the agent typically errors out
-with `GraphRecursionError` after the first round of subagent results comes
-back, before the lead agent can synthesize the final answer.
-
-DeerFlow's own Gateway and IM-channel paths mitigate this by defaulting to
-`100` in `build_run_config` (see `backend/app/gateway/services.py`), but
-clients calling the LangGraph API directly must set `recursion_limit`
-explicitly in the request body. `100` matches the Gateway default and is a
-safe starting point; increase it if you run deeply nested subagent graphs.
+in a single run. The unified Gateway path defaults to `100` in
+`build_run_config` (see `backend/app/gateway/services.py`), which is a safer
+starting point for plan-mode or subagent-heavy runs. Clients can still set
+`recursion_limit` explicitly in the request body; increase it if you run deeply
+nested subagent graphs.
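Reviewer note: a client that wants a limit other than the Gateway default can pass `config.recursion_limit` in the run request body. The sketch below only builds that JSON body. The helper name `build_run_body` is hypothetical, and the `input.messages` shape follows the streaming example elsewhere in this document rather than a verified schema.

```python
import json
from typing import Any, Optional


def build_run_body(message: str, recursion_limit: Optional[int] = None) -> dict[str, Any]:
    """Hypothetical helper: build a run request body.

    When recursion_limit is omitted, the config is left empty so the
    Gateway default (100, per the text above) applies.
    """
    config: dict[str, Any] = {}
    if recursion_limit is not None:
        if recursion_limit < 1:
            raise ValueError("recursion_limit must be a positive integer")
        config["recursion_limit"] = recursion_limit
    return {
        "input": {"messages": [{"role": "user", "content": message}]},
        "config": config,
    }


if __name__ == "__main__":
    # Raise the limit for a deeply nested subagent run.
    print(json.dumps(build_run_body("Plan a research report", recursion_limit=200), indent=2))
```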

 **Configurable Options:**
 - `model_name` (string): Override the default model
@@ -541,14 +535,28 @@ All APIs return errors in a consistent format:

 ## Authentication

-Currently, DeerFlow does not implement authentication. All APIs are accessible without credentials.
+DeerFlow enforces authentication for all non-public HTTP routes. Public routes are limited to health/docs metadata and these public auth endpoints:

-Note: This is about DeerFlow API authentication. MCP outbound connections can still use OAuth for configured HTTP/SSE MCP servers.
+- `POST /api/v1/auth/initialize` creates the first admin account when no admin exists.
+- `POST /api/v1/auth/login/local` logs in with email/password and sets an HttpOnly `access_token` cookie.
+- `POST /api/v1/auth/register` creates a regular `user` account and sets the session cookie.
+- `POST /api/v1/auth/logout` clears the session cookie.
+- `GET /api/v1/auth/setup-status` reports whether the first admin still needs to be created.

-For production deployments, it is recommended to:
-1. Use Nginx for basic auth or OAuth integration
-2. Deploy behind a VPN or private network
-3. Implement custom authentication middleware
+The authenticated auth endpoints are:
+
+- `GET /api/v1/auth/me` returns the current user.
+- `POST /api/v1/auth/change-password` changes password, optionally changes email during setup, increments `token_version`, and reissues the cookie.
+
+Protected state-changing requests also require the CSRF double-submit token: send the `csrf_token` cookie value as the `X-CSRF-Token` header. Login/register/initialize/logout are bootstrap auth endpoints: they are exempt from the double-submit token but still reject hostile browser `Origin` headers.
+
+User isolation is enforced from the authenticated user context:
+
+- Thread metadata is scoped by `threads_meta.user_id`; search/read/write/delete APIs only expose the current user's threads.
+- Thread files live under `{base_dir}/users/{user_id}/threads/{thread_id}/user-data/` and are exposed inside the sandbox as `/mnt/user-data/`.
+- Memory and custom agents are stored under `{base_dir}/users/{user_id}/...`.
+
+Note: MCP outbound connections can still use OAuth for configured HTTP/SSE MCP servers; that is separate from DeerFlow API authentication.

 ---

@@ -567,12 +575,13 @@ location /api/ {

 ---

-## WebSocket Support
+## Streaming Support

-The LangGraph server supports WebSocket connections for real-time streaming. Connect to:
+Gateway's LangGraph-compatible API streams run events with Server-Sent Events (SSE):

-```
-ws://localhost:2026/api/langgraph/threads/{thread_id}/runs/stream
+```http
+POST /api/langgraph/threads/{thread_id}/runs/stream
+Accept: text/event-stream
 ```

 ---
@@ -608,13 +617,21 @@ const response = await fetch('/api/models');
 const data = await response.json();
 console.log(data.models);

-// Using EventSource for streaming
-const eventSource = new EventSource(
-  `/api/langgraph/threads/${threadId}/runs/stream`
-);
-eventSource.onmessage = (event) => {
-  console.log(JSON.parse(event.data));
-};
+// Create a run and stream SSE events
+const streamResponse = await fetch(`/api/langgraph/threads/${threadId}/runs/stream`, {
+  method: "POST",
+  headers: {
+    "Content-Type": "application/json",
+    Accept: "text/event-stream",
+  },
+  body: JSON.stringify({
+    input: { messages: [{ role: "user", content: "Hello" }] },
+    stream_mode: ["values", "messages-tuple", "custom"],
+  }),
+});
+
+const reader = streamResponse.body?.getReader();
+// Decode and parse SSE frames from reader in your client code.
 ```
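
The frame decoding that the example above leaves to client code can be sketched in Python. This is a minimal sketch that assumes the standard SSE wire format (`event:` and `data:` fields, frames separated by a blank line) and JSON payloads; `parse_sse_frames` is a hypothetical helper, not part of DeerFlow or the LangGraph SDK.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse_frames(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield (event_name, decoded_json) for each complete SSE frame."""
    event: str = "message"  # SSE default event name
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current frame.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips exactly one leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```

Feeding it the decoded text lines of the response body yields one tuple per run event, which the caller can dispatch on the event name.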

 ### cURL Examples
@@ -649,7 +666,7 @@ curl -X POST http://localhost:2026/api/langgraph/threads/abc123/runs \
  }'
 ```

-> The `/api/langgraph/*` endpoints bypass DeerFlow's Gateway and inherit
-> LangGraph's native `recursion_limit` default of 25, which is too low for
-> plan-mode or subagent runs. Set `config.recursion_limit` explicitly — see
-> the [Create Run](#create-run) section for details.
+> The unified Gateway path defaults `config.recursion_limit` to 100 for
+> plan-mode and subagent-heavy runs. Clients may still set
+> `config.recursion_limit` explicitly — see the [Create Run](#create-run)
+> section for details.
@@ -14,30 +14,28 @@ This document provides a comprehensive overview of the DeerFlow backend architec
 │                          Nginx (Port 2026)                               │
 │                    Unified Reverse Proxy Entry Point                      │
 │  ┌────────────────────────────────────────────────────────────────────┐  │
-│  │  /api/langgraph/*  →  LangGraph Server (2024)                      │  │
-│  │  /api/*            →  Gateway API (8001)                           │  │
+│  │  /api/langgraph/*  →  Gateway LangGraph-compatible runtime (8001)  │  │
+│  │  /api/*            →  Gateway REST APIs (8001)                     │  │
 │  │  /*                →  Frontend (3000)                               │  │
 │  └────────────────────────────────────────────────────────────────────┘  │
 └─────────────────────────────────┬────────────────────────────────────────┘
                                  │
-          ┌───────────────────────┼───────────────────────┐
-          │                       │                       │
-          ▼                       ▼                       ▼
-┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
-│   LangGraph Server  │ │    Gateway API      │ │     Frontend        │
-│     (Port 2024)     │ │    (Port 8001)      │ │    (Port 3000)      │
-│                     │ │                     │ │                     │
-│  - Agent Runtime    │ │  - Models API       │ │  - Next.js App      │
-│  - Thread Mgmt      │ │  - MCP Config       │ │  - React UI         │
-│  - SSE Streaming    │ │  - Skills Mgmt      │ │  - Chat Interface   │
-│  - Checkpointing    │ │  - File Uploads     │ │                     │
-│                     │ │  - Thread Cleanup   │ │                     │
-│                     │ │  - Artifacts        │ │                     │
-└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
-          │                       │
-          │     ┌─────────────────┘
-          │     │
-          ▼     ▼
+          ┌───────────────────────┴───────────────────────┐
+          │                                               │
+          ▼                                               ▼
+┌─────────────────────────────────────────────┐ ┌─────────────────────┐
+│              Gateway API                    │ │     Frontend        │
+│              (Port 8001)                    │ │    (Port 3000)      │
+│                                             │ │                     │
+│  - LangGraph-compatible runs/threads API    │ │  - Next.js App      │
+│  - Embedded Agent Runtime                   │ │  - React UI         │
+│  - SSE Streaming                            │ │  - Chat Interface   │
+│  - Checkpointing                            │ │                     │
+│  - Models, MCP, Skills, Uploads, Artifacts  │ │                     │
+│  - Thread Cleanup                           │ │                     │
+└─────────────────────────────────────────────┘ └─────────────────────┘
+          │
+          ▼
 ┌──────────────────────────────────────────────────────────────────────────┐
 │                         Shared Configuration                              │
 │  ┌─────────────────────────┐  ┌────────────────────────────────────────┐ │
@@ -52,9 +50,9 @@ This document provides a comprehensive overview of the DeerFlow backend architec

 ## Component Details

-### LangGraph Server
+### Gateway Embedded Agent Runtime

-The LangGraph server is the core agent runtime, built on LangGraph for robust multi-agent workflow orchestration.
+The agent runtime is embedded in the FastAPI Gateway and built on LangGraph for robust multi-agent workflow orchestration. Nginx rewrites `/api/langgraph/*` to Gateway's native `/api/*` routes, so the public API remains compatible with LangGraph SDK clients without running a separate LangGraph server.

 **Entry Point**: `packages/harness/deerflow/agents/lead_agent/agent.py:make_lead_agent`

@@ -65,7 +63,8 @@ The LangGraph server is the core agent runtime, built on LangGraph for robust mu
 - Tool execution orchestration
 - SSE streaming for real-time responses

-**Configuration**: `langgraph.json`
+**Graph registry**: `langgraph.json` remains available for tooling, Studio, or direct LangGraph Server compatibility.
+It is not the default service entrypoint; scripts and Docker deployments run the Gateway embedded runtime.

 ```json
 {
@@ -78,12 +77,13 @@ The LangGraph server is the core agent runtime, built on LangGraph for robust mu

 ### Gateway API

-FastAPI application providing REST endpoints for non-agent operations.
+FastAPI application providing REST endpoints plus the public LangGraph-compatible `/api/langgraph/*` runtime routes.

 **Entry Point**: `app/gateway/app.py`

 **Routers**:
 - `models.py` - `/api/models` - Model listing and details
+- `thread_runs.py` / `runs.py` - `/api/threads/{id}/runs`, `/api/runs/*` - LangGraph-compatible runs and streaming
 - `mcp.py` - `/api/mcp` - MCP server configuration
 - `skills.py` - `/api/skills` - Skills management
 - `uploads.py` - `/api/threads/{id}/uploads` - File upload
@@ -91,7 +91,7 @@ FastAPI application providing REST endpoints for non-agent operations.
 - `artifacts.py` - `/api/threads/{id}/artifacts` - Artifact serving
 - `suggestions.py` - `/api/threads/{id}/suggestions` - Follow-up suggestion generation

-The web conversation delete flow is now split across both backend surfaces: LangGraph handles `DELETE /api/langgraph/threads/{thread_id}` for thread state, then the Gateway `threads.py` router removes DeerFlow-managed filesystem data via `Paths.delete_thread_dir()`.
+The web conversation delete flow first deletes Gateway-managed thread state through the LangGraph-compatible route, then the Gateway `threads.py` router removes DeerFlow-managed filesystem data via `Paths.delete_thread_dir()`.

 ### Agent Architecture

@@ -353,10 +353,10 @@ SKILL.md Format:
   POST /api/langgraph/threads/{thread_id}/runs
   {"input": {"messages": [{"role": "user", "content": "Hello"}]}}

-2. Nginx → LangGraph Server (2024)
-   Proxied to LangGraph server
+2. Nginx → Gateway API (8001)
+   `/api/langgraph/*` is rewritten to Gateway's LangGraph-compatible `/api/*` routes

-3. LangGraph Server
+3. Gateway embedded runtime
   a. Load/create thread state
   b. Execute middleware chain:
      - ThreadDataMiddleware: Set up paths
@@ -412,7 +412,7 @@ SKILL.md Format:
 ### Thread Cleanup Flow

 ```
-1. Client deletes conversation via LangGraph
+1. Client deletes conversation via the LangGraph-compatible Gateway route
   DELETE /api/langgraph/threads/{thread_id}

 2. Web UI follows up with Gateway cleanup
@@ -0,0 +1,331 @@
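+# User Authentication and Isolation Design
+
+This document describes the design of DeerFlow's current built-in authentication module, not a historical RFC. It covers browser login, API authentication, CSRF, user isolation, first-time initialization, password reset, internal calls, and upgrade migration.
+
+## Design Goals
+
+The core goal of the authentication module is to turn DeerFlow from a "local single-user tool" into an "agent runtime that can be deployed for multiple users", and to carry user identity through the HTTP API, the LangGraph-compatible runtime, the filesystem, memory, custom agents, and feedback data.
+
+Design constraints:
+
+- Authentication is enforced by default: apart from health checks, docs, and the auth bootstrap endpoints, every HTTP route requires a valid session.
+- The server owns ownership: client metadata cannot declare `user_id` or `owner_id`.
+- Isolation is on by default: repositories, file paths, memory, and agent configuration resolve against the current user by default.
+- Legacy data can be upgraded: threads left behind by the pre-auth version can be migrated to the admin once an admin exists.
+- Passwords never reach the logs: the operator sets the password during first-time initialization; `reset_admin` only writes a 0600 credential file.
+
+Non-goals:
+
+- The current OAuth endpoints are placeholders; third-party login is not implemented yet.
+- There are currently only two user roles, `admin` and `user`; fine-grained RBAC is not implemented yet.
+- Login rate limiting currently uses an in-process dictionary, so it is not an exact global limit across multiple workers.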
+
+## Core Model
+
+```mermaid
+graph TB
+  classDef actor fill:#D8CFC4,stroke:#6E6259,color:#2F2A26;
+  classDef api fill:#C9D7D2,stroke:#5D706A,color:#21302C;
+  classDef state fill:#D7D3E8,stroke:#6B6680,color:#29263A;
+  classDef data fill:#E5D2C4,stroke:#806A5B,color:#30251E;
+
+  Browser["Browser — access_token cookie and csrf_token cookie"]:::actor
+  AuthMiddleware["AuthMiddleware — strict session gate"]:::api
+  CSRFMiddleware["CSRFMiddleware — double-submit token and Origin check"]:::api
+  AuthRoutes["Auth routes — initialize login register logout me change-password"]:::api
+  UserContext["Current user ContextVar — request-scoped identity"]:::state
+  Repositories["Repositories — AUTO resolves user_id from context"]:::state
+  Files["Filesystem — users/{user_id}/threads/{thread_id}/user-data"]:::data
+  Memory["Memory and agents — users/{user_id}/memory.json and agents"]:::data
+
+  Browser --> AuthMiddleware
+  Browser --> CSRFMiddleware
+  AuthMiddleware --> AuthRoutes
+  AuthMiddleware --> UserContext
+  UserContext --> Repositories
+  UserContext --> Files
+  UserContext --> Memory
+```
+
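+### Users table
+
+User records are defined in `app.gateway.auth.models.User` and persisted to the `users` table. Key fields:
+
+| Field | Meaning |
+|---|---|
+| `id` | User primary key; used as the JWT `sub` |
+| `email` | Unique login name |
+| `password_hash` | bcrypt hash; may be empty for OAuth users |
+| `system_role` | `admin` or `user` |
+| `needs_setup` | After a reset, requires the user to complete email / password setup |
+| `token_version` | Incremented on password change or reset to invalidate old JWTs |
+
+### Runtime identity
+
+After successful authentication, `AuthMiddleware` writes the user to all of the following:
+
+- `request.state.user`
+- `request.state.auth`
+- the `ContextVar` in `deerflow.runtime.user_context`
+
+The `ContextVar` is the key boundary here. The Gateway layer above writes the identity; the persistence / file-path layers below only read the structured current user and do not depend back on the concrete types in `app.gateway.auth`.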
+
+The user argument of a repository call can be thought of as a three-state ADT:
+
+```scala
+enum UserScope:
+  case AutoFromContext
+  case Explicit(userId: String)
+  case BypassForMigration
+```
+
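+The corresponding Python implementation is `AUTO | str | None`:
+
+- `AUTO`: resolve the current user from the `ContextVar`; raise an error if there is no context.
+- `str`: an explicitly specified user, mainly for tests or admin scripts.
+- `None`: skip user filtering; only migration scripts or the admin CLI may use it.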
+
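+## Login and Initialization Flows
+
+### First-time initialization
+
+On first startup, if no admin exists, the service does not create an account automatically; it only logs a hint to visit `/setup`.
+
+Flow:
+
+1. The user visits `/setup`.
+2. The frontend calls `GET /api/v1/auth/setup-status`.
+3. If it returns `{"needs_setup": true}`, the frontend shows the create-admin form.
+4. The form submits `POST /api/v1/auth/initialize`.
+5. The server confirms that no admin exists and creates a user with `system_role="admin"` and `needs_setup=false`.
+6. The server sets the `access_token` HttpOnly cookie and the user enters the workspace.
+
+`/api/v1/auth/initialize` is only available while no admin exists. Concurrent initialization is backstopped by a database unique constraint; the losing request gets a 409.
+
+### Regular login
+
+`POST /api/v1/auth/login/local` uses `OAuth2PasswordRequestForm`:
+
+- `username` is the email.
+- `password` is the password.
+- On success a JWT is issued and placed in the `access_token` HttpOnly cookie.
+- The response body returns only `expires_in` and `needs_setup`, not the token.
+
+Failed logins are counted per client IP. IP resolution trusts `X-Real-IP` only when the TCP peer belongs to `AUTH_TRUSTED_PROXIES`, and never uses `X-Forwarded-For`.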
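
The IP resolution rule can be sketched as follows. This is a sketch under stated assumptions: that `AUTH_TRUSTED_PROXIES` is parsed into a list of CIDR strings and that header names arrive lower-cased. `resolve_client_ip` is a hypothetical helper, not the Gateway's actual function.

```python
import ipaddress
from typing import Mapping, Sequence


def resolve_client_ip(peer_ip: str, headers: Mapping[str, str], trusted_proxies: Sequence[str]) -> str:
    """Return the IP to rate-limit on; X-Real-IP counts only behind a trusted proxy."""
    peer = ipaddress.ip_address(peer_ip)
    trusted = any(peer in ipaddress.ip_network(cidr, strict=False) for cidr in trusted_proxies)
    real_ip = headers.get("x-real-ip", "").strip()
    if trusted and real_ip:
        try:
            return str(ipaddress.ip_address(real_ip))
        except ValueError:
            pass  # malformed header: fall back to the TCP peer
    # X-Forwarded-For is deliberately never consulted.
    return peer_ip
```

A client that connects directly cannot spoof its way out of the failed-login counter by sending its own `X-Real-IP`, because the header is ignored unless the peer is a configured proxy.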
+
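+### Registration
+
+`POST /api/v1/auth/register` creates a regular `user` and logs them in automatically.
+
+The current implementation allows registering a regular user while no admin exists, but `setup-status` still returns `needs_setup=true` because the admin still does not exist. This is a current product-policy boundary: if "an admin must be initialized before regular users can register" becomes a requirement, `/register` needs an admin-exists gate.
+
+### Password change and reset setup
+
+`POST /api/v1/auth/change-password` requires the current password and the new password:
+
+- Verify the current password.
+- Update the bcrypt hash.
+- `token_version += 1`, which invalidates old JWTs immediately.
+- Reissue the cookie.
+- If `needs_setup=true` and `new_email` is provided, update the email and clear `needs_setup`.
+
+`python -m app.gateway.auth.reset_admin` does the following:
+
+- Finds the admin, or the user with the given email.
+- Generates a random password.
+- Updates the password hash.
+- `token_version += 1`.
+- Sets `needs_setup=true`.
+- Writes `.deer-flow/admin_initial_credentials.txt` with mode `0600`.
+
+The command line prints only the credential file path, never the plaintext password.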
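
Writing a secret to a `0600` file without a window in which it is world-readable can be sketched like this. It is a sketch for POSIX systems; the file body format (`email:` / `password:` lines) and the name `write_credential_file` are assumptions, not the contents of `app/gateway/auth/credential_file.py`.

```python
import os
from pathlib import Path


def write_credential_file(path: Path, email: str, password: str) -> Path:
    """Write credentials to `path` with owner-only permissions and return the path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Pass the mode to os.open so a new file is created as 0600 from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"email: {email}\npassword: {password}\n")
    # Also tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
    return path
```

The caller would then log or print only the returned path, which matches the "path only, never the password" rule above.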
+
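+## HTTP Authentication Boundary
+
+`AuthMiddleware` is a fail-closed global authentication gate.
+
+Public paths:
+
+- `/health`
+- `/docs`
+- `/redoc`
+- `/openapi.json`
+- `/api/v1/auth/login/local`
+- `/api/v1/auth/register`
+- `/api/v1/auth/logout`
+- `/api/v1/auth/setup-status`
+- `/api/v1/auth/initialize`
+
+All other paths require a valid `access_token` cookie. When a cookie is present but the JWT is invalid or expired, the user does not exist, or `token_version` does not match, the middleware returns 401 directly instead of letting the request fall through to the business routes.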
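
The strict checks that follow signature verification can be sketched as below. The claim name `ver` for the token version, the in-memory `users` mapping, and the `authenticate` helper are all assumptions made for illustration; the source only states that the JWT `sub` carries the user id and that `token_version` must match.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class User:
    id: str
    token_version: int


def authenticate(claims: Optional[Mapping[str, object]], users: Mapping[str, User]) -> User:
    """Return the user for already-decoded claims, or raise PermissionError (mapped to 401)."""
    if claims is None:
        # Signature or expiry validation failed upstream.
        raise PermissionError("invalid or expired token")
    user = users.get(str(claims.get("sub")))
    if user is None:
        raise PermissionError("user not found")
    if claims.get("ver") != user.token_version:
        # The password was changed or reset after this token was issued.
        raise PermissionError("token revoked")
    return user
```

Every failure path raises, so a request can never pass merely because a cookie is present.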
+
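+Route-level owner checks are done by `require_permission(..., owner_check=True)`:
+
+- Read requests allow compatibility reads of old, untracked legacy threads.
+- Write / delete requests use `require_existing=True`, which requires the thread row to exist and belong to the current user, so that a row missing after deletion cannot let another user pass by mistake.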
+
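+## CSRF Design
+
+DeerFlow uses the Double Submit Cookie pattern:
+
+- The server sets the `csrf_token` cookie.
+- Frontend state-changing requests send the same value in the `X-CSRF-Token` header.
+- The server compares cookie and header with `secrets.compare_digest`.
+
+Methods that require CSRF:
+
+- `POST`
+- `PUT`
+- `DELETE`
+- `PATCH`
+
+The auth bootstrap endpoints (login/register/initialize/logout) do not require the double-submit token, because the browser does not have a token yet on the first call. These endpoints do validate the browser `Origin` and reject hostile Origins, to prevent login CSRF / session fixation.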
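
The double-submit comparison can be sketched as follows. This sketch assumes lower-cased header names and leaves out the bootstrap-endpoint exemption and the `Origin` check; `csrf_ok` is a hypothetical helper rather than the code in `app/gateway/csrf_middleware.py`.

```python
import secrets
from typing import Mapping

STATE_CHANGING_METHODS = frozenset({"POST", "PUT", "DELETE", "PATCH"})


def csrf_ok(method: str, cookies: Mapping[str, str], headers: Mapping[str, str]) -> bool:
    """Return True when the request passes the double-submit check."""
    if method.upper() not in STATE_CHANGING_METHODS:
        return True  # safe methods are not checked
    cookie_token = cookies.get("csrf_token", "")
    header_token = headers.get("x-csrf-token", "")
    if not cookie_token or not header_token:
        return False
    # Compare as bytes: constant-time, and tolerant of non-ASCII input.
    return secrets.compare_digest(cookie_token.encode("utf-8"), header_token.encode("utf-8"))
```

Rejecting empty tokens before the comparison matters: two missing values would otherwise compare equal.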
+
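+## User Isolation
+
+### Thread metadata
+
+Thread metadata lives in `threads_meta`; the key isolation field is `user_id`.
+
+When a thread is created:
+
+- Client-supplied `metadata.user_id` and `metadata.owner_id` are stripped.
+- `ThreadMetaRepository.create(..., user_id=AUTO)` resolves the real user from the `ContextVar`.
+- `/api/threads/search` returns only the current user's threads by default.
+
+On read / update / delete:
+
+- `get()` filters by the current user by default.
+- `check_access()` is used for route owner checks.
+- Other users' threads return 404, to avoid leaking whether the resource exists.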
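
Stripping the reserved keys from client metadata can be sketched in a few lines. `RESERVED_METADATA_KEYS` and `strip_reserved_metadata` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Mapping, Optional

# Keys the server owns; a client must never be able to set them.
RESERVED_METADATA_KEYS = frozenset({"user_id", "owner_id"})


def strip_reserved_metadata(metadata: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Return a copy of client metadata without server-owned ownership keys."""
    return {k: v for k, v in (metadata or {}).items() if k not in RESERVED_METADATA_KEYS}
```

The real owner is then attached server-side from the authenticated context, never from the request body.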
+
+### Filesystem
+
+Current thread file layout:
+
+```text
+{base_dir}/users/{user_id}/threads/{thread_id}/user-data/
+├── workspace/
+├── uploads/
+└── outputs/
+```
+
+Inside the sandbox the agent sees a unified virtual path:
+
+```text
+/mnt/user-data/workspace
+/mnt/user-data/uploads
+/mnt/user-data/outputs
+```
+
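+`ThreadDataMiddleware` uses `get_effective_user_id()` to resolve the current user and build the thread paths. Without an authentication context it falls back to the `default` user bucket, which mainly serves internal calls, the embedded client, or local execution paths without HTTP.
+
+### Memory
+
+Default memory storage: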
+
+```text
+{base_dir}/users/{user_id}/memory.json
+{base_dir}/users/{user_id}/agents/{agent_name}/memory.json
+```
+
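+With a user context, an empty or relative `memory.storage_path` uses the per-user default paths above; only an absolute `memory.storage_path` is treated as an explicit opt-out of per-user isolation, in which case all users share that path. The legacy path without a user context still resolves a relative `storage_path` under `Paths.base_dir`.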
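
The path rule for global memory can be sketched as below, for POSIX-style paths. `resolve_memory_path` is a hypothetical helper; the fallback to `memory.json` under the base directory when there is neither a user nor a configured path is an assumption, since the source does not spell out that case.

```python
from pathlib import Path
from typing import Optional


def resolve_memory_path(base_dir: Path, storage_path: str, user_id: Optional[str]) -> Path:
    """Pick the memory file location for the given config value and user context."""
    configured = Path(storage_path) if storage_path else None
    if configured is not None and configured.is_absolute():
        # Explicit opt-out of per-user isolation: every user shares this file.
        return configured
    if user_id is not None:
        # Empty or relative config: always the per-user default.
        return base_dir / "users" / user_id / "memory.json"
    # Legacy path without a user context (the default file name is assumed here).
    return base_dir / (configured if configured is not None else Path("memory.json"))
```

Only the absolute-path branch ignores `user_id`, which makes the shared-memory opt-out easy to spot in configuration reviews.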
+
+### Custom agents
+
+User-defined agents are written to:
+
+```text
+{base_dir}/users/{user_id}/agents/{agent_name}/
+├── config.yaml
+├── SOUL.md
+└── memory.json
+```
+
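+The old layout `{base_dir}/agents/{agent_name}/` is only a read-only compatibility fallback. Updating or deleting an old shared agent requires running the migration script first.
+
+## Internal Calls and IM Channels
+
+IM channel workers are not browser users and hold no browser cookie. They authenticate through Gateway internal auth:
+
+- Requests carry `X-DeerFlow-Internal-Token`.
+- They also carry a matching CSRF cookie/header.
+- The server identifies them as the internal user, with `id="default"` and `system_role="internal"`.
+
+This means that data produced by channels goes into the `default` user bucket by default. This choice suits a "platform-level bot identity", but it does not isolate each IM user separately. To isolate external IM users later, external platform users need to be mapped to DeerFlow users, and the channel manager needs to set the corresponding scoped identity.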
+
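+## LangGraph-compatible Authentication
+
+The Gateway embedded runtime path is protected by `AuthMiddleware` and `CSRFMiddleware`.
+
+The repository still keeps `app.gateway.langgraph_auth` for LangGraph Server direct-connect mode:
+
+- `@auth.authenticate` validates the JWT cookie, CSRF, user existence, and `token_version`.
+- `@auth.on` injects `user_id` when metadata is written, and returns the `{"user_id": current_user}` filter on read paths.
+
+This ensures that Gateway routes and LangGraph-compatible direct-connect mode share the same JWT semantics.
+
+## Upgrade and Migration
+
+When upgrading from the pre-auth version, historical threads without a `user_id` may exist.
+
+Current strategy:
+
+1. On first startup, if no admin exists, only prompt to visit `/setup`; do not migrate.
+2. The operator creates the admin.
+3. On later startups, `_ensure_admin_user()` finds the admin and migrates threads in the LangGraph store that lack `metadata.user_id` to the admin.
+
+Migration of the old filesystem layout is handled by a script: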
+
+```bash
+cd backend
+PYTHONPATH=. python scripts/migrate_user_isolation.py --dry-run
+PYTHONPATH=. python scripts/migrate_user_isolation.py --user-id <target-user-id>
+```
+
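+The migration script moves legacy `memory.json`, `threads/`, and `agents/` to the per-user layout.
+
+## Security Invariants
+
+Invariants that must hold long-term:
+
+- JWTs are transported only in the HttpOnly cookie and never appear in response JSON.
+- No non-public HTTP route may let a request through merely because a cookie exists; the JWT must be strictly validated.
+- A `token_version` mismatch must be rejected, so that old sessions are invalidated after a password change / reset.
+- `user_id` / `owner_id` in client metadata must be stripped.
+- The repository default `AUTO` must resolve from the current user context and must not silently degrade to a global query.
+- Only migration scripts and the admin CLI may explicitly pass `user_id=None` to bypass isolation.
+- Local file paths must be resolved through `Paths` and sandbox path validation; unvalidated user input must not be concatenated into paths.
+- Exceptions caught in authentication, migration, and background tasks must be logged; empty catch blocks are not allowed.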
+
+## Known Boundaries
+
+| Boundary | Current behavior | Future direction |
+|---|---|---|
+| Registering a regular user with no admin | Registration of a regular `user` is allowed | If the product requires admin initialization first, add a gate to `/register` |
+| Login rate limiting | In-process dict; exact with a single worker, approximate with multiple workers | Redis / DB-backed rate limiter |
+| OAuth | Placeholder endpoints, not implemented | Integrate providers and unify `token_version` / role semantics |
+| IM user isolation | Channels use the `default` internal user | Map external users to DeerFlow users |
+| Absolute memory path | Explicitly shared memory | UI / docs should clearly warn about the opt-out risk |
+
+## Related Files
+
+| File | Responsibility |
+|---|---|
+| `app/gateway/auth_middleware.py` | Global auth gate, strict JWT validation, writes the user context |
+| `app/gateway/csrf_middleware.py` | CSRF double-submit and auth Origin validation |
+| `app/gateway/routers/auth.py` | initialize/login/register/logout/me/change-password |
+| `app/gateway/auth/jwt.py` | JWT creation and parsing |
+| `app/gateway/auth/reset_admin.py` | Password reset CLI |
+| `app/gateway/auth/credential_file.py` | Writes the 0600 credential file |
+| `app/gateway/authz.py` | Route permissions and owner check |
+| `deerflow/runtime/user_context.py` | Current-user ContextVar and the `AUTO` sentinel |
+| `deerflow/persistence/thread_meta/` | Thread metadata owner filter |
+| `deerflow/config/paths.py` | Per-user filesystem layout |
+| `deerflow/agents/middlewares/thread_data_middleware.py` | Resolves the user's thread directory at run time |
+| `deerflow/agents/memory/storage.py` | Per-user memory storage |
+| `deerflow/config/agents_config.py` | Per-user custom agents |
+| `app/channels/manager.py` | Internal-auth calls from IM channels |
+| `scripts/migrate_user_isolation.py` | Migrates legacy data to the per-user layout |
+| `.deer-flow/data/deerflow.db` | Unified SQLite database containing the users / threads_meta / runs / feedback tables, among others |
+| `.deer-flow/users/{user_id}/agents/{agent_name}/` | User-defined agent config, SOUL, and agent memory |
+| `.deer-flow/admin_initial_credentials.txt` | New credential file generated by `reset_admin` (0600; delete after reading) |
@@ -24,11 +24,11 @@ All other test plan sections were executed against either:

 | Case | Title | What it covers | Why not run |
 |---|---|---|---|
-| TC-DOCKER-01 | `users.db` volume persistence | Verify the `DEER_FLOW_HOME` bind mount survives container restart | needs `docker compose up` |
+| TC-DOCKER-01 | `deerflow.db` volume persistence | Verify the `DEER_FLOW_HOME` bind mount survives container restart | needs `docker compose up` |
 | TC-DOCKER-02 | Session persistence across container restart | `AUTH_JWT_SECRET` env var keeps cookies valid after `docker compose down && up` | needs `docker compose down/up` |
 | TC-DOCKER-03 | Per-worker rate limiter divergence | Confirms in-process `_login_attempts` dict doesn't share state across `gunicorn` workers (4 by default in the compose file); known limitation, documented | needs multi-worker container |
-| TC-DOCKER-04 | IM channels skip AuthMiddleware | Verify Feishu/Slack/Telegram dispatchers run in-container against `http://langgraph:2024` without going through nginx | needs `docker logs` |
-| TC-DOCKER-05 | Admin credentials surfacing | **Updated post-simplify** — was "log scrape", now "0600 credential file in `DEER_FLOW_HOME`". The file-based behavior is already validated by TC-1.1 + TC-UPG-13 on sg_dev (non-Docker), so the only Docker-specific gap is verifying the volume mount carries the file out to the host | needs container + host volume |
+| TC-DOCKER-04 | IM channels use internal Gateway auth | Verify Feishu/Slack/Telegram dispatchers attach the process-local internal auth header plus CSRF cookie/header when calling Gateway-compatible LangGraph APIs | needs `docker logs` |
+| TC-DOCKER-05 | Reset credentials surfacing | `reset_admin` writes a 0600 credential file in `DEER_FLOW_HOME` instead of logging plaintext. The file-based behavior is validated by non-Docker reset tests, so the only Docker-specific gap is verifying the volume mount carries the file out to the host | needs container + host volume |
 | TC-DOCKER-06 | Gateway-mode Docker deploy | `./scripts/deploy.sh --gateway` produces a 3-container topology (no `langgraph` container); same auth flow as standard mode | needs `docker compose --profile gateway` |

 ## Coverage already provided by non-Docker tests
@@ -41,8 +41,8 @@ the test cases that ran on sg_dev or local:
 | TC-DOCKER-01 (volume persistence) | TC-REENT-01 on sg_dev (admin row survives gateway restart) — same SQLite file, just no container layer between |
 | TC-DOCKER-02 (session persistence) | TC-API-02/03/06 (cookie roundtrip), plus TC-REENT-04 (multi-cookie) — JWT verification is process-state-free, container restart is equivalent to `pkill uvicorn && uv run uvicorn` |
 | TC-DOCKER-03 (per-worker rate limit) | TC-GW-04 + TC-REENT-09 (single-worker rate limit + 5min expiry). The cross-worker divergence is an architectural property of the in-memory dict; no auth code path differs |
-| TC-DOCKER-04 (IM channels skip auth) | Code-level only: `app/channels/manager.py` uses `langgraph_sdk` directly with no cookie handling. The langgraph_auth handler is bypassed by going through SDK, not HTTP |
-| TC-DOCKER-05 (credential surfacing) | TC-1.1 on sg_dev (file at `~/deer-flow/backend/.deer-flow/admin_initial_credentials.txt`, mode 0600, password 22 chars) — the only Docker-unique step is whether the bind mount projects this path onto the host, which is a `docker compose` config check, not a runtime behavior change |
+| TC-DOCKER-04 (IM channels use internal auth) | Code-level: `app/channels/manager.py` creates the `langgraph_sdk` client with `create_internal_auth_headers()` plus CSRF cookie/header, so channel workers do not rely on browser cookies |
+| TC-DOCKER-05 (credential surfacing) | `reset_admin` writes `.deer-flow/admin_initial_credentials.txt` with mode 0600 and logs only the path — the only Docker-unique step is whether the bind mount projects this path onto the host, which is a `docker compose` config check, not a runtime behavior change |
 | TC-DOCKER-06 (gateway-mode container) | Section 7 (7.2) covered by TC-GW-01..05 + Section 2 (gateway-mode auth flow on sg_dev) — same Gateway code, container is just a packaging change |

 ## Reproduction steps when Docker becomes available
@@ -72,6 +72,6 @@ Then run TC-DOCKER-01..06 from the test plan as written.
  about *container packaging* details (bind mounts, multi-worker, log
  collection), not about whether the auth code paths work.
 - **TC-DOCKER-05 was updated in place** in `AUTH_TEST_PLAN.md` to reflect
-  the post-simplify reality (credentials file → 0600 file, no log leak).
+  the current reset flow (`reset_admin` → 0600 credentials file, no log leak).
  The old "grep 'Password:' in docker logs" expectation would have failed
  silently and given a false sense of coverage.
@@ -19,7 +19,7 @@

 ```bash
 # Clear existing data
-rm -f backend/.deer-flow/users.db
+rm -f backend/.deer-flow/data/deerflow.db

 # Start in the chosen mode
 make dev          # Standard mode
@@ -28,10 +28,11 @@ make dev-pro      # Gateway mode
 ```

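 **Verification points:**
- [ ] The console prints the admin email and a random password
- [ ] The password is a 22-character string in `secrets.token_urlsafe(16)` format
- [ ] The email is `admin@deerflow.dev`
- [ ] The console prompts `Change it after login: Settings -> Account`
+- [ ] The console does not print the admin email or a plaintext password
+- [ ] The console prints `First boot detected — no admin account exists.`
+- [ ] The console prompts to visit `/setup` to finish creating the admin
+- [ ] `GET /api/v1/auth/setup-status` returns `{"needs_setup": true}`
+- [ ] Visiting `/login` in the frontend redirects to `/setup`

 ### 1.2 Subsequent startup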

@@ -42,7 +43,8 @@ make dev

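 **Verification points:**
 - [ ] The console does not print a password
- [ ] If the admin still has `needs_setup=True`, the console shows a warning
+- [ ] `GET /api/v1/auth/setup-status` returns `{"needs_setup": false}`
+- [ ] A logged-in user with `needs_setup=True` is directed to `/setup` when visiting the workspace, to complete the email / password change flow

 ### 1.3 Environment variable configuration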

@@ -76,19 +78,22 @@ make dev
 curl -s $BASE/api/v1/auth/setup-status | jq .
 ```

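-**Expected:** Returns `{"needs_setup": false}` (the admin is created automatically at startup, `count_users() > 0`). It may return `true` only in the very short window before startup completes.
+**Expected:**
+- Clean database with no admin initialized yet: returns `{"needs_setup": true}`
+- Admin already exists: returns `{"needs_setup": false}`

-#### TC-API-02: Admin first login
+#### TC-API-02: First-time admin initialization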

 ```bash
-curl -s -X POST $BASE/api/v1/auth/login/local \
-  -d "username=admin@deerflow.dev&password=<console password>" \
+curl -s -X POST $BASE/api/v1/auth/initialize \
+  -H "Content-Type: application/json" \
+  -d '{"email":"admin@example.com","password":"AdminPass1!"}' \
  -c cookies.txt | jq .
 ```

 **Expected:**
- Status code 200
- Body: `{"expires_in": 604800, "needs_setup": true}`
+- Status code 201
+- Body: `{"id": "...", "email": "admin@example.com", "system_role": "admin", "needs_setup": false}`
 - `cookies.txt` contains `access_token` (HttpOnly) and `csrf_token` (not HttpOnly)

 #### TC-API-03: Get the current user
@@ -97,9 +102,9 @@ curl -s -X POST $BASE/api/v1/auth/login/local \
 curl -s $BASE/api/v1/auth/me -b cookies.txt | jq .
 ```

-**Expected:** `{"id": "...", "email": "admin@deerflow.dev", "system_role": "admin", "needs_setup": true}`
+**Expected:** `{"id": "...", "email": "admin@example.com", "system_role": "admin", "needs_setup": false}`

-#### TC-API-04: Setup flow (change email + change password)
+#### TC-API-04: Password change flow

 ```bash
 CSRF=$(grep csrf_token cookies.txt | awk '{print $NF}')
@@ -107,13 +112,36 @@ curl -s -X POST $BASE/api/v1/auth/change-password \
  -b cookies.txt \
  -H "Content-Type: application/json" \
  -H "X-CSRF-Token: $CSRF" \
-  -d '{"current_password":"<console password>","new_password":"NewPass123!","new_email":"admin@example.com"}' | jq .
+  -d '{"current_password":"AdminPass1!","new_password":"NewPass123!"}' | jq .
 ```

 **Expected:**
 - Status code 200
 - `{"message": "Password changed successfully"}`
- Calling `/auth/me` again shows the email changed to `admin@example.com` and `needs_setup` changed to `false`
+- Calling `/auth/me` again still shows `admin@example.com`, and `needs_setup` is still `false`
+
+#### TC-API-04a: Setup flow after reset_admin (change email + change password)
+
+```bash
+cd backend
+python -m app.gateway.auth.reset_admin --email admin@example.com
+# Read the post-reset password from .deer-flow/admin_initial_credentials.txt
+
+curl -s -X POST $BASE/api/v1/auth/login/local \
+  -d "username=admin@example.com&password=<credential file password>" \
+  -c cookies.txt | jq .
+
+CSRF=$(grep csrf_token cookies.txt | awk '{print $NF}')
+curl -s -X POST $BASE/api/v1/auth/change-password \
+  -b cookies.txt \
+  -H "Content-Type: application/json" \
+  -H "X-CSRF-Token: $CSRF" \
+  -d '{"current_password":"<credential file password>","new_password":"AdminPass2!","new_email":"admin2@example.com"}' | jq .
+```
+
+**Expected:**
+- Login returns `{"expires_in": 604800, "needs_setup": true}`
+- After `change-password`, `/auth/me` shows the email changed to `admin2@example.com` and `needs_setup` changed to `false`

 #### TC-API-05: Regular user registration

@@ -493,7 +521,7 @@ curl -s -X POST $BASE/api/v1/auth/register \

 ```bash
 # Inspect the database
-sqlite3 backend/.deer-flow/users.db "SELECT email, password_hash FROM users LIMIT 3;"
+sqlite3 backend/.deer-flow/data/deerflow.db "SELECT email, password_hash FROM users LIMIT 3;"
 ```

 **Expected:** `password_hash` starts with `$2b$` (bcrypt format)
@@ -506,24 +534,25 @@ sqlite3 backend/.deer-flow/users.db "SELECT email, password_hash FROM users LIMI

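 ### 4.1 First login flow

-#### TC-UI-01: Visiting the home page redirects to login
+#### TC-UI-01: Visiting the workspace with no admin redirects to setup

 1. Open `http://localhost:2026/workspace`
-2. **Expected:** Redirects automatically to `/login`
+2. **Expected:** Redirects automatically to `/setup`

-#### TC-UI-02: Login page
+#### TC-UI-02: Create the admin on the Setup page

-1. Enter the admin email and the console password
-2. Click Login
-3. **Expected:** Redirects to `/setup` (because `needs_setup=true`)
-
-#### TC-UI-03: Setup page
-
-1. Enter the new email, the console password (current), the new password, and the password confirmation
-2. Click Complete Setup
+1. Enter the admin email, the password, and the password confirmation
+2. Click Create Admin Account
 3. **Expected:** Redirects to `/workspace`
 4. Refreshing the page does not redirect back to `/setup`

+#### TC-UI-03: Login page after initialization
+
+1. Log out, then visit `/login`
+2. Enter the admin email and password
+3. Click Login
+4. **Expected:** Redirects to `/workspace`
+
 #### TC-UI-04: Setup password mismatch

 1. The new password and the confirmation do not match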
@@ -602,7 +631,7 @@ sqlite3 backend/.deer-flow/users.db "SELECT email, password_hash FROM users LIMI
 #### TC-UI-15: Log in again after reset_admin

 1. Run `cd backend && python -m app.gateway.auth.reset_admin`
-2. Log in with the new password
+2. Read the new password from `.deer-flow/admin_initial_credentials.txt` and log in
 3. **Expected:** Redirects to the `/setup` page (`needs_setup` has been reset to true)
 4. Old sessions are invalidated

@@ -645,18 +674,28 @@ make install
 make dev
 ```

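-#### TC-UPG-01: First startup creates the admin
+#### TC-UPG-01: First startup waits for admin initialization

 **Expected:**
- [ ] The console prints the admin email (`admin@deerflow.dev`) and a random password
+- [ ] The console does not print the admin email or a random password
+- [ ] The first admin can be created by visiting `/setup`
 - [ ] Starts normally with no errors

 #### TC-UPG-02: Old threads migrate to the admin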

 ```bash
+# Create the first admin
+curl -s -X POST http://localhost:2026/api/v1/auth/initialize \
+  -H "Content-Type: application/json" \
+  -d '{"email":"admin@example.com","password":"AdminPass1!"}' \
+  -c cookies.txt
+
+# Restart once: the startup migration only runs on a startup path where an admin already exists
+make stop && make dev
+
 # Log in as admin
 curl -s -X POST http://localhost:2026/api/v1/auth/login/local \
-  -d "username=admin@deerflow.dev&password=<console password>" \
+  -d "username=admin@example.com&password=AdminPass1!" \
  -c cookies.txt

 # List threads
@@ -670,8 +709,8 @@ curl -s -X POST http://localhost:2026/api/threads/search \

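 **Expected:**
 - [ ] The number of returned threads ≥ the number created by the old version
- [ ] The console log contains `Migrated N orphaned thread(s) to admin`
- [ ] Every thread's `metadata.owner_id` has been set to the admin's ID
+- [ ] The console log contains `Migrated N orphan LangGraph thread(s) to admin`
+- [ ] Old threads are visible only to the admin

 #### TC-UPG-03: Old thread content is intact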

@@ -683,7 +722,7 @@ curl -s http://localhost:2026/api/threads/<old-thread-id> \

 **Expected:**
 - [ ] `metadata.title` keeps its original value (e.g. `old-thread-1`)
- [ ] `metadata.owner_id` is populated
+- [ ] The response does not echo the server-reserved `user_id` / `owner_id`

 #### TC-UPG-04: New users cannot see old threads

@@ -706,18 +745,19 @@ curl -s -X POST http://localhost:2026/api/threads/search \

 ### 5.3 Database schema compatibility

-#### TC-UPG-05: users.db is created automatically when absent
+#### TC-UPG-05: When deerflow.db is absent, the schema is created but no default user

 ```bash
-ls -la backend/.deer-flow/users.db
+ls -la backend/.deer-flow/data/deerflow.db
+sqlite3 backend/.deer-flow/data/deerflow.db "SELECT COUNT(*) FROM users;"
 ```

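-**Expected:** The file exists, and `sqlite3` shows the `users` table with the `needs_setup` and `token_version` columns
+**Expected:** The file exists, and `sqlite3` shows the `users` table with the `needs_setup` and `token_version` columns; the user count is 0 before `/initialize` is called

-#### TC-UPG-06: users.db WAL mode
+#### TC-UPG-06: deerflow.db WAL mode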

 ```bash
-sqlite3 backend/.deer-flow/users.db "PRAGMA journal_mode;"
+sqlite3 backend/.deer-flow/data/deerflow.db "PRAGMA journal_mode;"
 ```

 **Expected:** Returns `wal`
@@ -768,9 +808,9 @@ make dev
 ```

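 **Expected:**
- [ ] The service starts normally (`users.db` is ignored; with no auth-related code present, no errors are raised)
+- [ ] The service starts normally (`deerflow.db` is ignored; with no auth-related code present, no errors are raised)
 - [ ] Old conversation data is still accessible
- [ ] The `users.db` file is left behind but does not affect operation
+- [ ] The `deerflow.db` file is left behind but does not affect operation

 #### TC-UPG-12: Upgrade to the auth branch again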

@@ -781,51 +821,47 @@ make dev
 ```

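 **Expected:**
- [ ] The existing `users.db` is detected and the admin is not recreated
- [ ] The old admin account can still log in (if `users.db` was not deleted during the rollback)
+- [ ] The existing `deerflow.db` is detected and the admin is not recreated
+- [ ] The old admin account can still log in (if `deerflow.db` was not deleted during the rollback)

-### 5.7 Dormant admin (initial password unused / unchanged)
+### 5.7 Admin initialization and reset_admin

-> First startup generates the admin + a random password, but the operator has not logged in or changed the password.
-> The password flashes by once in the console on first startup and is not shown on later startups.
+> First startup does not generate a default admin and does not log a password. If the password is forgotten, use `reset_admin`; the new password is written to a 0600 credential file.

-#### TC-UPG-13: Password is reset and printed automatically after restart
+#### TC-UPG-13: Restarting with no admin initialized does not create a default account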

 ```bash
-# First boot; record the password
-rm -f backend/.deer-flow/users.db
+rm -f backend/.deer-flow/data/deerflow.db
 make dev
-# The console prints password P0; do not log in
 make stop

-# A few days later, start again
 make dev
-# The console prints new password P1
+curl -s $BASE/api/v1/auth/setup-status | jq .
 ```

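 **Expected:**
-- [ ] The console prints `Admin account setup incomplete — password reset`
-- [ ] New password P1 is printed (P0 is no longer valid)
-- [ ] Login works with P1 but not with P0
-- [ ] After login, `needs_setup=true` and the user is redirected to `/setup`
-- [ ] `token_version` is incremented (any old sessions are invalidated as well)
+- [ ] The console does not print a password
+- [ ] `setup-status` still returns `{"needs_setup": true}`
+- [ ] Visiting `/setup` can still create the first admin

-#### TC-UPG-14: Lost password: no CLI needed, just restart
+#### TC-UPG-14: Lost password: reset_admin writes a credentials file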

 ```bash
-# Forgot the console password: just restart the service
-make stop && make dev
-# The console prints a new password automatically
+python -m app.gateway.auth.reset_admin --email admin@example.com
+ls -la backend/.deer-flow/admin_initial_credentials.txt
+cat backend/.deer-flow/admin_initial_credentials.txt
 ```

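 **Expected:**
-- [ ] No `reset_admin` is needed; restarting the service yields a new password
-- [ ] The `reset_admin` CLI remains available as a manual fallback
+- [ ] The command prints only the credentials file path, never the plaintext password
+- [ ] The credentials file has mode `0600`
+- [ ] The credentials file contains email and password lines
+- [ ] The user's next login returns `needs_setup=true`

-#### TC-UPG-15: A regular user registers while the admin is dormant
+#### TC-UPG-15: Policy boundary for regular-user registration before an admin is initialized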

 ```bash
-# The admin exists but has never logged in; a regular user registers first
+# No admin exists yet; a regular user tries to register
 curl -s -X POST $BASE/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"earlybird@example.com","password":"EarlyPass1!"}' \
@@ -833,11 +869,11 @@ curl -s -X POST $BASE/api/v1/auth/register \
 ```

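 **Expected:**
-- [ ] Registration succeeds (201) with role `user`
-- [ ] The user cannot escalate to admin
-- [ ] Regular-user data is isolated from the admin's
+- [ ] The current code allows a regular user to register and be logged in automatically (201, role `user`)
+- [ ] However, `setup-status` still returns `{"needs_setup": true}` because no admin exists yet
+- [ ] This is a product policy boundary: if an admin must exist first, `/register` needs an admin-exists gate

-#### TC-UPG-16: A dormant admin does not affect later operations
+#### TC-UPG-16: Regular-user data is isolated from an admin created later

 ```bash
 # A regular user creates threads and sends messages as usual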
@@ -849,14 +885,13 @@ curl -s -X POST $BASE/api/threads \
  -d '{"metadata":{}}' | jq .thread_id
 ```

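-**Expected:** the thread is created normally, unaffected by the dormant admin
+**Expected:** the regular user creates the thread normally; an admin created later cannot find that user's thread through search

-#### TC-UPG-17: The dormant admin finally completes setup
+#### TC-UPG-17: Complete setup after reset_admin

 ```bash
-# The operator finally logs in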
 curl -s -X POST $BASE/api/v1/auth/login/local \
-  -d "username=admin@deerflow.dev&password=<P0 or P1>" \
+  -d "username=admin@example.com&password=<password from credentials file>" \
   -c admin.txt | jq .needs_setup
 # Expected: true

@@ -866,7 +901,7 @@ curl -s -X POST $BASE/api/v1/auth/change-password \
  -b admin.txt \
  -H "Content-Type: application/json" \
  -H "X-CSRF-Token: $CSRF" \
-  -d '{"current_password":"<password>","new_password":"AdminFinal1!","new_email":"admin@real.com"}' \
+  -d '{"current_password":"<password from credentials file>","new_password":"AdminFinal1!","new_email":"admin@real.com"}' \
   -c admin.txt

 # Verify
@@ -876,7 +911,7 @@ curl -s $BASE/api/v1/auth/me -b admin.txt | jq '{email, needs_setup}'
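 **Expected:**
 - [ ] `email` becomes `admin@real.com`
 - [ ] `needs_setup` becomes `false`
-- [ ] Later restarts no longer print a warning to the console
+- [ ] Subsequent logins use the new password

 #### TC-UPG-18: JWT secret rotation after a long period of disuse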

@@ -890,8 +925,8 @@ make stop && make dev

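 **Expected:**
 - [ ] The service starts normally
-- [ ] The old password still works for login (passwords are stored in the DB, independent of the JWT secret)
-- [ ] Old JWT tokens are invalidated (the secret changed, so signatures no longer match), although no old tokens exist because no one ever logged in
+- [ ] The account password still works for login (passwords are stored in the DB, independent of the JWT secret)
+- [ ] Old JWT tokens are invalidated (the secret changed, so signatures no longer match)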

 ---

@@ -910,7 +945,7 @@ for i in 1 2 3; do
 done

 # Check the number of admins
-sqlite3 backend/.deer-flow/users.db \
+sqlite3 backend/.deer-flow/data/deerflow.db \
  "SELECT COUNT(*) FROM users WHERE system_role='admin';"
 ```

@@ -1055,7 +1090,7 @@ curl -s -X POST $BASE/api/v1/auth/register \
 wait

 # Check the number of users
-sqlite3 backend/.deer-flow/users.db \
+sqlite3 backend/.deer-flow/data/deerflow.db \
  "SELECT COUNT(*) FROM users WHERE email='race@example.com';"
 ```

@@ -1165,13 +1200,16 @@ curl -s -w "%{http_code}" -X DELETE "$BASE/api/threads/$TID" \
 ```bash
 cd backend
 python -m app.gateway.auth.reset_admin
-# Record password P1
+cp .deer-flow/admin_initial_credentials.txt /tmp/deerflow-reset-p1.txt
+P1=$(awk -F': ' '/^password:/ {print $2}' /tmp/deerflow-reset-p1.txt)

 python -m app.gateway.auth.reset_admin
-# Record password P2
+cp .deer-flow/admin_initial_credentials.txt /tmp/deerflow-reset-p2.txt
+P2=$(awk -F': ' '/^password:/ {print $2}' /tmp/deerflow-reset-p2.txt)
 ```

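 **Expected:**
+- [ ] `.deer-flow/admin_initial_credentials.txt` is rewritten on every run, with file mode `0600`
 - [ ] P1 ≠ P2 (a new random password is generated each time)
 - [ ] P1 no longer works; only P2 is valid
 - [ ] `token_version` has increased by 2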
@@ -1324,7 +1362,8 @@ done
 ```bash
 GW=http://localhost:8001

-for path in /health /api/v1/auth/setup-status /api/v1/auth/login/local /api/v1/auth/register; do
+for path in /health /api/v1/auth/setup-status /api/v1/auth/login/local \
+            /api/v1/auth/register /api/v1/auth/initialize /api/v1/auth/logout; do
  echo "$path: $(curl -s -w '%{http_code}' -o /dev/null $GW$path)"
 done
 # Expected: 200 or 405/422 (wrong method, but not 401)
@@ -1399,9 +1438,9 @@ done
 >
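 > Prerequisites:
 > - Set `AUTH_JWT_SECRET` in `.env` (otherwise every container restart invalidates all sessions)
-> - Mount `DEER_FLOW_HOME` to a host directory (to persist `users.db`)
+> - Mount `DEER_FLOW_HOME` to a host directory (to persist `deerflow.db`)

-#### TC-DOCKER-01: users.db is persisted through a volume
+#### TC-DOCKER-01: deerflow.db is persisted through a volume

 ```bash
 # Start the containers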
@@ -1416,13 +1455,13 @@ curl -s -X POST $BASE/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"docker-test@example.com","password":"DockerTest1!"}' -w "\nHTTP %{http_code}"

-# Check users.db on the host
-ls -la ${DEER_FLOW_HOME:-backend/.deer-flow}/users.db
-sqlite3 ${DEER_FLOW_HOME:-backend/.deer-flow}/users.db \
+# Check deerflow.db on the host
+ls -la ${DEER_FLOW_HOME:-backend/.deer-flow}/data/deerflow.db
+sqlite3 ${DEER_FLOW_HOME:-backend/.deer-flow}/data/deerflow.db \
  "SELECT email FROM users WHERE email='docker-test@example.com';"
 ```

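-**Expected:** users.db is in the host's `DEER_FLOW_HOME` directory, and the query returns the newly registered user.
+**Expected:** deerflow.db is in the host's `DEER_FLOW_HOME` directory, and the query returns the newly registered user.

 #### TC-DOCKER-02: Sessions persist after a container restart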

@@ -1466,22 +1505,24 @@ done

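 **Known limitation:** The in-process rate limiter is not shared across workers. Production deployments that need precise rate limiting require external storage such as Redis.

-#### TC-DOCKER-04: IM channels bypass auth
+#### TC-DOCKER-04: IM channels use internal authentication

 ```bash
-# IM channels (Feishu/Slack/Telegram) communicate through the LangGraph SDK inside the gateway container
-# They do not go through nginx or AuthMiddleware
+# IM channels (Feishu/Slack/Telegram) call the Gateway through the LangGraph SDK inside the gateway container
+# Requests carry a process-local internal auth header plus a matching CSRF cookie/header

 # Verification: check that channel manager requests in the gateway log contain no auth errors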
 docker logs deer-flow-gateway 2>&1 | grep -E "ChannelManager|channel" | head -10
 ```

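-**Expected:** no auth-related errors. Channels connect directly to the LangGraph Server (`http://langgraph:2024`) through `langgraph-sdk` and bypass the auth layer.
+**Expected:** no auth-related errors. Channels do not rely on browser cookies; the server uses the internal auth header to assign requests to the `default` user bucket.

-#### TC-DOCKER-05: The admin password is written to a 0600 credentials file (no longer logged)
+#### TC-DOCKER-05: The reset_admin password is written to a 0600 credentials file (no longer logged)

 ```bash
-# The credentials file is written under DEER_FLOW_HOME, which is mounted from the host
+# The first boot does not generate an admin password automatically. Reset an existing admin first; the credentials file is written under DEER_FLOW_HOME, which is mounted from the host.
+docker exec deer-flow-gateway python -m app.gateway.auth.reset_admin --email docker-test@example.com
+
 ls -la ${DEER_FLOW_HOME:-backend/.deer-flow}/admin_initial_credentials.txt
 # Expected file mode: -rw------- (0600)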

@@ -1512,14 +1553,15 @@ sleep 15
 docker ps --filter name=deer-flow-langgraph --format '{{.Names}}' | wc -l
 # Expected: 0

-# The auth flow works
+# The auth flow works: protected endpoints return 401 when not logged in
 curl -s -w "%{http_code}" -o /dev/null $BASE/api/models
 # Expected: 401

-curl -s -X POST $BASE/api/v1/auth/login/local \
-  -d "username=admin@deerflow.dev&password=<password from log>" \
+curl -s -X POST $BASE/api/v1/auth/initialize \
+  -H "Content-Type: application/json" \
+  -d '{"email":"admin@example.com","password":"AdminPass1!"}' \
   -c cookies.txt -w "\nHTTP %{http_code}"
-# Expected: 200
+# Expected: 201
 ```

 ### 7.4 Additional edge cases
@@ -1587,13 +1629,15 @@ curl -s -D - -X POST $BASE/api/v1/auth/login/local \
 #### TC-EDGE-05: HTTP has no max_age / HTTPS has max_age

 ```bash
+GW=http://localhost:8001
+
 # HTTP
-curl -s -D - -X POST $BASE/api/v1/auth/login/local \
+curl -s -D - -X POST $GW/api/v1/auth/login/local \
   -d "username=admin@example.com&password=<correct-password>" 2>/dev/null \
  | grep "access_token=" | grep -oi "max-age=[0-9]*" || echo "NO max-age (HTTP session cookie)"

-# HTTPS
-curl -s -D - -X POST $BASE/api/v1/auth/login/local \
+# HTTPS: X-Forwarded-Proto can simulate HTTPS only when connecting to the Gateway directly; nginx overwrites this header
+curl -s -D - -X POST $GW/api/v1/auth/login/local \
   -H "X-Forwarded-Proto: https" \
   -d "username=admin@example.com&password=<correct-password>" 2>/dev/null \
  | grep "access_token=" | grep -oi "max-age=[0-9]*"
@@ -1712,10 +1756,10 @@ curl -s -X POST $BASE/api/threads \
  -b cookies.txt \
  -H "Content-Type: application/json" \
  -H "X-CSRF-Token: $CSRF" \
-  -d '{"metadata":{"owner_id":"victim-user-id"}}' | jq .metadata.owner_id
+  -d '{"metadata":{"owner_id":"victim-user-id","user_id":"victim-user-id"}}' | jq .metadata
 ```

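-**Expected:** the returned `metadata.owner_id` should be the ID of the currently logged-in user, not the `victim-user-id` injected in the request. The server should overwrite the client-supplied `user_id`.
+**Expected:** the returned `metadata` contains neither `owner_id` nor `user_id`. Real ownership is written to `threads_meta.user_id`; it is not accepted from client metadata and is not echoed back through metadata.

 #### 7.5.6 HTTP method probing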

@@ -1796,6 +1840,6 @@ cd backend && PYTHONPATH=. uv run pytest \
 # Core endpoint smoke test
 curl -s $BASE/health                              # 200
 curl -s $BASE/api/models                          # 401 (no cookie)
-curl -s -X POST $BASE/api/v1/auth/setup-status    # 200
+curl -s $BASE/api/v1/auth/setup-status            # 200
 curl -s $BASE/api/v1/auth/me -b cookies.txt       # 200 (with cookie)
 ```
@@ -2,13 +2,16 @@

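 DeerFlow ships with a built-in authentication module. This document is for users upgrading from a version without authentication.

+See [AUTH_DESIGN.md](AUTH_DESIGN.md) for the full design.
+
 ## Core concepts

 The authentication module uses an **always enforced** policy:

-- On first boot, an admin account is created automatically and a random password is printed to the console log
+- No account is created automatically on first boot; the operator creates the first admin account on the first visit to `/setup`
 - Authentication is enforced from the start, with no race window
-- Historical conversations (threads created before the upgrade) are migrated to the admin automatically
+- Once an admin exists, service startup migrates historical conversations (threads created before the upgrade that lack a `user_id`) to the admin
+- New data is isolated per user: threads, workspace/uploads/outputs, memory, and custom agents all belong to the current user

 ## Upgrade steps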

@@ -25,39 +28,41 @@ cd backend && make install
 make dev
 ```

-The console prints:
+If no admin account exists, the console prints only this notice:

 ```
 ============================================================
-  Admin account created on first boot
-  Email:    admin@deerflow.dev
-  Password: aB3xK9mN_pQ7rT2w
-  Change it after login: Settings → Account
+  First boot detected — no admin account exists.
+  Visit /setup to complete admin account creation.
 ============================================================
 ```

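-If the service is restarted before anyone logs in, that is fine: as long as setup is incomplete, every startup resets the password and prints it to the console again.
+The first boot neither prints a random password to the log nor writes a default admin. This keeps credentials out of startup logs and avoids a guessable default identity before the operator creates an account.

-### 3. Log in
+### 3. Create the admin

-Visit `http://localhost:2026/login` and log in with the email and password printed to the console.
+Visit `http://localhost:2026/setup` and enter an email and password to create the first admin account. After creation you are logged in automatically and taken to the workspace.

-### 4. Change the password
+If this is an upgrade from a version without authentication, restart the service once after creating the admin so that the startup migration assigns historical threads lacking a `user_id` to the admin.

-After logging in, go to Settings → Account → Change Password.
+### 4. Log in
+
+From then on, visit `http://localhost:2026/login` and log in with the email and password you created.

 ### 5. Add users (optional)

-Other users register through the `/login` page and automatically receive the **user** role. Each user can see only their own conversations.
+Other users register through the `/login` page and automatically receive the **user** role. Each user can see only their own conversations, uploaded files, output files, memory, and custom agents.

 ## Security mechanisms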

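 | Mechanism | Description |
 |------|------|
 | JWT HttpOnly Cookie | The token is not exposed to JavaScript, which prevents theft via XSS |
-| CSRF Double Submit Cookie | All POST/PUT/DELETE requests must carry `X-CSRF-Token` |
+| CSRF Double Submit Cookie | Protected POST/PUT/PATCH/DELETE requests must carry `X-CSRF-Token`; login, register, initialize, and logout rely on Origin validation at the auth endpoints |
 | bcrypt password hashing | Passwords are never stored in plaintext |
-| Multi-tenant isolation | Users can access only their own threads |
+| Thread owner filter | `threads_meta.user_id` is written from the server-side auth context; search, read, update, and delete filter by the current user by default |
+| Filesystem isolation | Thread data is written to `{base_dir}/users/{user_id}/threads/{thread_id}/user-data/`; inside the sandbox it is uniformly mapped to `/mnt/user-data/` |
+| Memory / agent isolation | User memory and custom agents are written to `{base_dir}/users/{user_id}/...`; legacy shared agents serve only as a read-only compatibility fallback |
 | HTTPS adaptation | Detects `x-forwarded-proto` and sets the `Secure` cookie flag automatically |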

 ## Common operations
@@ -74,22 +79,26 @@ python -m app.gateway.auth.reset_admin
 python -m app.gateway.auth.reset_admin --email user@example.com
 ```

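-Prints a new random password.
+Writes the new random password to `.deer-flow/admin_initial_credentials.txt` with file mode `0600`. The command prints only the file path, never the plaintext password.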
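
Writing a credentials file that only its owner can read can be sketched like this on a POSIX system. The helper name `write_credentials_sketch` and the `email:` / `password:` line layout are assumptions made for illustration; the real `reset_admin` implementation is not shown in this document.

```python
import os
import secrets
import stat
import tempfile
from pathlib import Path


def write_credentials_sketch(path: Path, email: str, password: str) -> Path:
    """Write email/password lines to *path* with owner-only permissions."""
    # Create (or truncate) the file with mode 0600 from the start, so the
    # secret is never briefly readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"email: {email}\npassword: {password}\n")
    # The mode passed to os.open only applies on creation; re-assert it in
    # case the file already existed with looser permissions.
    os.chmod(path, 0o600)
    return path


demo_dir = Path(tempfile.mkdtemp())
demo_path = write_credentials_sketch(
    demo_dir / "admin_initial_credentials.txt",
    "admin@example.com",
    secrets.token_urlsafe(16),
)
demo_mode = stat.S_IMODE(demo_path.stat().st_mode)
print(demo_path.name, oct(demo_mode))
```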

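 ### Full reset

-Delete the user database; a new admin is created automatically after restart:
+Delete the unified SQLite database, then after restarting visit `/setup` again to create a new admin: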

 ```bash
-rm -f backend/.deer-flow/users.db
-# Restart the service; the console prints a new password
+rm -f backend/.deer-flow/data/deerflow.db
+# After restarting the service, visit http://localhost:2026/setup
 ```

 ## Data storage

 | File | Contents |
 |------|------|
-| `.deer-flow/users.db` | SQLite user database (password hashes, roles) |
+| `.deer-flow/data/deerflow.db` | Unified SQLite database (application data such as users, threads_meta, runs, and feedback) |
+| `.deer-flow/users/{user_id}/threads/{thread_id}/user-data/` | Workspace, uploads, and outputs for the user's threads |
+| `.deer-flow/users/{user_id}/memory.json` | User-level memory |
+| `.deer-flow/users/{user_id}/agents/{agent_name}/` | The user's custom agent configuration, SOUL, and agent memory |
+| `.deer-flow/admin_initial_credentials.txt` | New credentials file generated by `reset_admin` (0600; delete after reading) |
 | `AUTH_JWT_SECRET` in `.env` | JWT signing secret (if unset, a temporary secret is generated automatically and sessions are invalidated on restart) |

 ### Production recommendations
@@ -111,19 +120,21 @@ python -c "import secrets; print(secrets.token_urlsafe(32))"
 | `/api/v1/auth/me` | GET | Get the current user's information |
 | `/api/v1/auth/change-password` | POST | Change the password |
 | `/api/v1/auth/setup-status` | GET | Check whether an admin exists |
+| `/api/v1/auth/initialize` | POST | Initialize the first admin (callable only when no admin exists) |

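 ## Compatibility

-- **Standard mode** (`make dev`): fully compatible; the admin is created automatically
+- **Standard mode** (`make dev`): fully compatible; when no admin exists, visit `/setup` to initialize
 - **Gateway mode** (`make dev-pro`): fully compatible
-- **Docker deployment**: fully compatible; `.deer-flow/users.db` must be on a persistent volume mount
-- **IM channels** (Feishu/Slack/Telegram): communicate through the LangGraph SDK and bypass the auth layer
+- **Docker deployment**: fully compatible; `.deer-flow/data/deerflow.db` must be on a persistent volume mount
+- **IM channels** (Feishu/Slack/Telegram): communicate through Gateway internal authentication and use the `default` user bucket
 - **DeerFlowClient** (embedded): does not go through HTTP and is unaffected by authentication

 ## Troubleshooting

 | Symptom | Cause | Fix |
 |------|------|------|
-| No password shown after startup | An admin already exists (not the first boot) | Reset with `reset_admin`, or delete `users.db` |
+| No password shown after startup | The current implementation does not print passwords to the startup log | On a fresh install, visit `/setup`; for a forgotten password, use `reset_admin` |
+| `/login` redirects to `/setup` automatically | The system has no admin yet | Create the first admin at `/setup` |
 | POST returns 403 after login | CSRF token missing | Confirm the frontend has been updated |
 | Login is required again after a restart | `AUTH_JWT_SECRET` is not persisted | Set a fixed secret in `.env` |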
@@ -8,6 +8,7 @@ This directory contains detailed documentation for the DeerFlow backend.
 |----------|-------------|
 | [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture overview |
 | [API.md](API.md) | Complete API reference |
+| [AUTH_DESIGN.md](AUTH_DESIGN.md) | User authentication, CSRF, and per-user isolation design |
 | [CONFIGURATION.md](CONFIGURATION.md) | Configuration options |
 | [SETUP.md](SETUP.md) | Quick setup guide |

@@ -42,6 +43,7 @@ docs/
 ├── README.md                  # This file
 ├── ARCHITECTURE.md            # System architecture
 ├── API.md                     # API reference
+├── AUTH_DESIGN.md             # User authentication and isolation design
 ├── CONFIGURATION.md           # Configuration guide
 ├── SETUP.md                   # Setup instructions
 ├── FILE_UPLOAD.md             # File upload feature
@@ -36,42 +36,73 @@ class DanglingToolCallMiddleware(AgentMiddleware[AgentState]):

    @staticmethod
    def _message_tool_calls(msg) -> list[dict]:
-        """Return normalized tool calls from structured fields or raw provider payloads."""
+        """Return normalized tool calls from structured fields or raw provider payloads.
+
+        LangChain stores malformed provider function calls in ``invalid_tool_calls``.
+        They do not execute, but provider adapters may still serialize enough of
+        the call id/name back into the next request that strict OpenAI-compatible
+        validators expect a matching ToolMessage. Treat them as dangling calls so
+        the next model request stays well-formed and the model sees a recoverable
+        tool error instead of another provider 400.
+        """
+        normalized: list[dict] = []
+
        tool_calls = getattr(msg, "tool_calls", None) or []
-        if tool_calls:
-            return list(tool_calls)
+        normalized.extend(list(tool_calls))

        raw_tool_calls = (getattr(msg, "additional_kwargs", None) or {}).get("tool_calls") or []
-        normalized: list[dict] = []
-        for raw_tc in raw_tool_calls:
-            if not isinstance(raw_tc, dict):
+        if not tool_calls:
+            for raw_tc in raw_tool_calls:
+                if not isinstance(raw_tc, dict):
+                    continue
+
+                function = raw_tc.get("function")
+                name = raw_tc.get("name")
+                if not name and isinstance(function, dict):
+                    name = function.get("name")
+
+                args = raw_tc.get("args", {})
+                if not args and isinstance(function, dict):
+                    raw_args = function.get("arguments")
+                    if isinstance(raw_args, str):
+                        try:
+                            parsed_args = json.loads(raw_args)
+                        except (TypeError, ValueError, json.JSONDecodeError):
+                            parsed_args = {}
+                        args = parsed_args if isinstance(parsed_args, dict) else {}
+
+                normalized.append(
+                    {
+                        "id": raw_tc.get("id"),
+                        "name": name or "unknown",
+                        "args": args if isinstance(args, dict) else {},
+                    }
+                )
+
+        for invalid_tc in getattr(msg, "invalid_tool_calls", None) or []:
+            if not isinstance(invalid_tc, dict):
                continue
-
-            function = raw_tc.get("function")
-            name = raw_tc.get("name")
-            if not name and isinstance(function, dict):
-                name = function.get("name")
-
-            args = raw_tc.get("args", {})
-            if not args and isinstance(function, dict):
-                raw_args = function.get("arguments")
-                if isinstance(raw_args, str):
-                    try:
-                        parsed_args = json.loads(raw_args)
-                    except (TypeError, ValueError, json.JSONDecodeError):
-                        parsed_args = {}
-                    args = parsed_args if isinstance(parsed_args, dict) else {}
-
            normalized.append(
                {
-                    "id": raw_tc.get("id"),
-                    "name": name or "unknown",
-                    "args": args if isinstance(args, dict) else {},
+                    "id": invalid_tc.get("id"),
+                    "name": invalid_tc.get("name") or "unknown",
+                    "args": {},
+                    "invalid": True,
+                    "error": invalid_tc.get("error"),
                }
            )

        return normalized

+    @staticmethod
+    def _synthetic_tool_message_content(tool_call: dict) -> str:
+        if tool_call.get("invalid"):
+            error = tool_call.get("error")
+            if isinstance(error, str) and error:
+                return f"[Tool call could not be executed because its arguments were invalid: {error}]"
+            return "[Tool call could not be executed because its arguments were invalid.]"
+        return "[Tool call was interrupted and did not return a result.]"
+
    def _build_patched_messages(self, messages: list) -> list | None:
        """Return a new message list with patches inserted at the correct positions.

@@ -114,7 +145,7 @@ class DanglingToolCallMiddleware(AgentMiddleware[AgentState]):
                if tc_id and tc_id not in existing_tool_msg_ids and tc_id not in patched_ids:
                    patched.append(
                        ToolMessage(
-                            content="[Tool call was interrupted and did not return a result.]",
+                            content=self._synthetic_tool_message_content(tc),
                            tool_call_id=tc_id,
                            name=tc.get("name", "unknown"),
                            status="error",
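
The patching behavior in this hunk can be illustrated without LangChain. The sketch below uses plain dicts in place of message objects; `patch_dangling_sketch` and the dict layout are hypothetical stand-ins, and only the message wording is taken from the diff.

```python
def synthetic_content(tool_call: dict) -> str:
    """Mirror the wording chosen for invalid vs. interrupted tool calls."""
    if tool_call.get("invalid"):
        error = tool_call.get("error")
        if isinstance(error, str) and error:
            return f"[Tool call could not be executed because its arguments were invalid: {error}]"
        return "[Tool call could not be executed because its arguments were invalid.]"
    return "[Tool call was interrupted and did not return a result.]"


def patch_dangling_sketch(messages: list[dict]) -> list[dict]:
    """Insert an error tool message after any call that has no result."""
    answered = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    patched: list[dict] = []
    for msg in messages:
        patched.append(msg)
        calls = list(msg.get("tool_calls") or [])
        # Invalid calls never execute, so they are always dangling.
        calls += [{**tc, "invalid": True} for tc in msg.get("invalid_tool_calls") or []]
        for tc in calls:
            tc_id = tc.get("id")
            if tc_id and tc_id not in answered:
                answered.add(tc_id)
                patched.append(
                    {
                        "role": "tool",
                        "tool_call_id": tc_id,
                        "name": tc.get("name") or "unknown",
                        "status": "error",
                        "content": synthetic_content(tc),
                    }
                )
    return patched


history = [
    {
        "role": "assistant",
        "tool_calls": [{"id": "a", "name": "search"}],
        "invalid_tool_calls": [{"id": "b", "name": "write", "error": "bad JSON"}],
    },
    {"role": "tool", "tool_call_id": "a", "content": "ok"},
]
patched_history = patch_dangling_sketch(history)
for m in patched_history:
    print(m["role"], m.get("tool_call_id"), m.get("content"))
```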
@@ -1,11 +1,6 @@
 """Load MCP tools using langchain-mcp-adapters."""

-import asyncio
-import atexit
-import concurrent.futures
 import logging
-from collections.abc import Callable
-from typing import Any

 from langchain_core.tools import BaseTool

@@ -13,46 +8,10 @@ from deerflow.config.extensions_config import ExtensionsConfig
 from deerflow.mcp.client import build_servers_config
 from deerflow.mcp.oauth import build_oauth_tool_interceptor, get_initial_oauth_headers
 from deerflow.reflection import resolve_variable
+from deerflow.tools.sync import make_sync_tool_wrapper

 logger = logging.getLogger(__name__)

-# Global thread pool for sync tool invocation in async environments
-_SYNC_TOOL_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=10, thread_name_prefix="mcp-sync-tool")
-
-# Register shutdown hook for the global executor
-atexit.register(lambda: _SYNC_TOOL_EXECUTOR.shutdown(wait=False))
-
-
-def _make_sync_tool_wrapper(coro: Callable[..., Any], tool_name: str) -> Callable[..., Any]:
-    """Build a synchronous wrapper for an asynchronous tool coroutine.
-
-    Args:
-        coro: The tool's asynchronous coroutine.
-        tool_name: Name of the tool (for logging).
-
-    Returns:
-        A synchronous function that correctly handles nested event loops.
-    """
-
-    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
-        try:
-            loop = asyncio.get_running_loop()
-        except RuntimeError:
-            loop = None
-
-        try:
-            if loop is not None and loop.is_running():
-                # Use global executor to avoid nested loop issues and improve performance
-                future = _SYNC_TOOL_EXECUTOR.submit(asyncio.run, coro(*args, **kwargs))
-                return future.result()
-            else:
-                return asyncio.run(coro(*args, **kwargs))
-        except Exception as e:
-            logger.error(f"Error invoking MCP tool '{tool_name}' via sync wrapper: {e}", exc_info=True)
-            raise
-
-    return sync_wrapper
-

 async def get_mcp_tools() -> list[BaseTool]:
    """Get all tools from enabled MCP servers.
@@ -126,7 +85,7 @@ async def get_mcp_tools() -> list[BaseTool]:
        # Patch tools to support sync invocation, as deerflow client streams synchronously
        for tool in tools:
            if getattr(tool, "func", None) is None and getattr(tool, "coroutine", None) is not None:
-                tool.func = _make_sync_tool_wrapper(tool.coroutine, tool.name)
+                tool.func = make_sync_tool_wrapper(tool.coroutine, tool.name)

        return tools
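
The sync patching above relies on a wrapper that can run a coroutine from synchronous code, even when an event loop is already running in the calling thread. The sketch below follows the logic of the helper removed in this diff; whether `deerflow.tools.sync.make_sync_tool_wrapper` behaves identically is an assumption, and `make_sync_wrapper_sketch` is a hypothetical name.

```python
import asyncio
import concurrent.futures

# Worker threads give each call a fresh event loop when the caller's thread
# already has one running.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def make_sync_wrapper_sketch(coro_fn):
    """Return a blocking function that runs *coro_fn* to completion."""

    def sync_wrapper(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop in this thread: asyncio.run can be used directly.
            return asyncio.run(coro_fn(*args, **kwargs))
        # A loop is running here, and asyncio.run would raise inside it,
        # so run the coroutine on a worker thread and block on the result.
        future = _EXECUTOR.submit(asyncio.run, coro_fn(*args, **kwargs))
        return future.result()

    return sync_wrapper


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b


sync_add = make_sync_wrapper_sketch(add)
plain_result = sync_add(1, 2)  # called with no running loop


async def caller() -> int:
    return sync_add(3, 4)  # called from inside a running loop


nested_result = asyncio.run(caller())
_EXECUTOR.shutdown(wait=True)
print(plain_result, nested_result)
```

Blocking on `future.result()` stalls the calling event loop for the duration of the call, which is acceptable for a synchronous client but worth keeping in mind elsewhere.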

@@ -0,0 +1,195 @@
+"""Dialect-aware JSON value matching for SQLAlchemy (SQLite + PostgreSQL)."""
+
+from __future__ import annotations
+
+import re
+from dataclasses import dataclass
+from typing import Any
+
+from sqlalchemy import BigInteger, Float, String, bindparam
+from sqlalchemy.ext.compiler import compiles
+from sqlalchemy.sql.compiler import SQLCompiler
+from sqlalchemy.sql.expression import ColumnElement
+from sqlalchemy.sql.visitors import InternalTraversal
+from sqlalchemy.types import Boolean, TypeEngine
+
+# Key is interpolated into compiled SQL; restrict charset to prevent injection.
+_KEY_CHARSET_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
+
+# Allowed value types for metadata filter values (same set accepted by JsonMatch).
+ALLOWED_FILTER_VALUE_TYPES: tuple[type, ...] = (type(None), bool, int, float, str)
+
+# SQLite raises an overflow when binding values outside signed 64-bit range;
+# PostgreSQL overflows during BIGINT cast. Reject at validation time instead.
+_INT64_MIN = -(2**63)
+_INT64_MAX = 2**63 - 1
+
+
+def validate_metadata_filter_key(key: object) -> bool:
+    """Return True if *key* is safe for use as a JSON metadata filter key.
+
+    A key is "safe" when it is a string matching ``[A-Za-z0-9_-]+``. The
+    charset is restricted because the key is interpolated into the
+    compiled SQL path expression (``$."<key>"`` / ``->`` literal), so any
+    laxer pattern would open a SQL/JSONPath injection surface.
+    """
+    return isinstance(key, str) and bool(_KEY_CHARSET_RE.match(key))
+
+
+def validate_metadata_filter_value(value: object) -> bool:
+    """Return True if *value* is an allowed type for a JSON metadata filter.
+
+    Matches the set of types ``_build_clause`` knows how to compile into
+    a dialect-portable predicate. Anything else (list/dict/bytes/...) is
+    intentionally rejected rather than silently coerced via ``str()`` —
+    silent coercion would (a) produce wrong matches and (b) break
+    SQLAlchemy's ``inherit_cache`` invariant when ``value`` is unhashable.
+
+    Integer values are additionally restricted to the signed 64-bit range
+    ``[-2**63, 2**63 - 1]``: SQLite overflows when binding larger values
+    and PostgreSQL overflows during the ``BIGINT`` cast.
+    """
+    if not isinstance(value, ALLOWED_FILTER_VALUE_TYPES):
+        return False
+    if isinstance(value, int) and not isinstance(value, bool):
+        if not (_INT64_MIN <= value <= _INT64_MAX):
+            return False
+    return True
+
+
+class JsonMatch(ColumnElement):
+    """Dialect-portable ``column[key] == value`` for JSON columns.
+
+    Compiles to ``json_type``/``json_extract`` on SQLite and
+    ``json_typeof``/``->>`` on PostgreSQL, with type-safe comparison
+    that distinguishes bool vs int and NULL vs missing key.
+
+    *key* must be a single literal key matching ``[A-Za-z0-9_-]+``.
+    *value* must be one of: ``None``, ``bool``, ``int`` (signed 64-bit), ``float``, ``str``.
+    """
+
+    inherit_cache = True
+    type = Boolean()
+    _is_implicitly_boolean = True
+
+    _traverse_internals = [
+        ("column", InternalTraversal.dp_clauseelement),
+        ("key", InternalTraversal.dp_string),
+        ("value", InternalTraversal.dp_plain_obj),
+    ]
+
+    def __init__(self, column: ColumnElement, key: str, value: object) -> None:
+        if not validate_metadata_filter_key(key):
+            raise ValueError(f"JsonMatch key must match {_KEY_CHARSET_RE.pattern!r}; got: {key!r}")
+        if not validate_metadata_filter_value(value):
+            if isinstance(value, int) and not isinstance(value, bool):
+                raise TypeError(f"JsonMatch int value out of signed 64-bit range [-2**63, 2**63-1]: {value!r}")
+            raise TypeError(f"JsonMatch value must be None, bool, int, float, or str; got: {type(value).__name__!r}")
+        self.column = column
+        self.key = key
+        self.value = value
+        super().__init__()
+
+
+@dataclass(frozen=True)
+class _Dialect:
+    """Per-dialect names used when emitting JSON type/value comparisons."""
+
+    null_type: str
+    num_types: tuple[str, ...]
+    num_cast: str
+    int_types: tuple[str, ...]
+    int_cast: str
+    # None for SQLite where json_type already returns 'integer'/'real';
+    # regex literal for PostgreSQL where json_typeof returns 'number' for
+    # both ints and floats, so an extra guard prevents CAST errors on floats.
+    int_guard: str | None
+    string_type: str
+    bool_type: str | None
+
+
+_SQLITE = _Dialect(
+    null_type="null",
+    num_types=("integer", "real"),
+    num_cast="REAL",
+    int_types=("integer",),
+    int_cast="INTEGER",
+    int_guard=None,
+    string_type="text",
+    bool_type=None,
+)
+
+_PG = _Dialect(
+    null_type="null",
+    num_types=("number",),
+    num_cast="DOUBLE PRECISION",
+    int_types=("number",),
+    int_cast="BIGINT",
+    int_guard="'^-?[0-9]+$'",
+    string_type="string",
+    bool_type="boolean",
+)
+
+
+def _bind(compiler: SQLCompiler, value: object, sa_type: TypeEngine[Any], **kw: Any) -> str:
+    param = bindparam(None, value, type_=sa_type)
+    return compiler.process(param, **kw)
+
+
+def _type_check(typeof: str, types: tuple[str, ...]) -> str:
+    if len(types) == 1:
+        return f"{typeof} = '{types[0]}'"
+    quoted = ", ".join(f"'{t}'" for t in types)
+    return f"{typeof} IN ({quoted})"
+
+
+def _build_clause(compiler: SQLCompiler, typeof: str, extract: str, value: object, dialect: _Dialect, **kw: Any) -> str:
+    if value is None:
+        return f"{typeof} = '{dialect.null_type}'"
+    if isinstance(value, bool):
+        # bool check must precede int check — bool is a subclass of int in Python
+        bool_str = "true" if value else "false"
+        if dialect.bool_type is None:
+            return f"{typeof} = '{bool_str}'"
+        return f"({typeof} = '{dialect.bool_type}' AND {extract} = '{bool_str}')"
+    if isinstance(value, int):
+        bp = _bind(compiler, value, BigInteger(), **kw)
+        if dialect.int_guard:
+            # CASE prevents CAST error when json_typeof = 'number' also matches floats
+            return f"(CASE WHEN {_type_check(typeof, dialect.int_types)} AND {extract} ~ {dialect.int_guard} THEN CAST({extract} AS {dialect.int_cast}) END = {bp})"
+        return f"({_type_check(typeof, dialect.int_types)} AND CAST({extract} AS {dialect.int_cast}) = {bp})"
+    if isinstance(value, float):
+        bp = _bind(compiler, value, Float(), **kw)
+        return f"({_type_check(typeof, dialect.num_types)} AND CAST({extract} AS {dialect.num_cast}) = {bp})"
+    bp = _bind(compiler, str(value), String(), **kw)
+    return f"({typeof} = '{dialect.string_type}' AND {extract} = {bp})"
+
+
+@compiles(JsonMatch, "sqlite")
+def _compile_sqlite(element: JsonMatch, compiler: SQLCompiler, **kw: Any) -> str:
+    if not validate_metadata_filter_key(element.key):
+        raise ValueError(f"Key escaped validation: {element.key!r}")
+    col = compiler.process(element.column, **kw)
+    path = f'$."{element.key}"'
+    typeof = f"json_type({col}, '{path}')"
+    extract = f"json_extract({col}, '{path}')"
+    return _build_clause(compiler, typeof, extract, element.value, _SQLITE, **kw)
+
+
+@compiles(JsonMatch, "postgresql")
+def _compile_pg(element: JsonMatch, compiler: SQLCompiler, **kw: Any) -> str:
+    if not validate_metadata_filter_key(element.key):
+        raise ValueError(f"Key escaped validation: {element.key!r}")
+    col = compiler.process(element.column, **kw)
+    typeof = f"json_typeof({col} -> '{element.key}')"
+    extract = f"({col} ->> '{element.key}')"
+    return _build_clause(compiler, typeof, extract, element.value, _PG, **kw)
+
+
+@compiles(JsonMatch)
+def _compile_default(element: JsonMatch, compiler: SQLCompiler, **kw: Any) -> str:
+    raise NotImplementedError(f"JsonMatch supports only sqlite and postgresql; got dialect: {compiler.dialect.name}")
+
+
+def json_match(column: ColumnElement, key: str, value: object) -> JsonMatch:
+    return JsonMatch(column, key, value)
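
The type guards that `JsonMatch` emits for SQLite can be checked directly with the standard-library `sqlite3` module. This sketch assumes the bundled SQLite has the JSON functions available; the table and column names are chosen for illustration, and the SQL is written by hand rather than compiled by SQLAlchemy.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads_meta (thread_id TEXT, metadata_json TEXT)")
rows = [
    ("t-bool", {"flag": True}),
    ("t-int", {"flag": 1}),
    ("t-null", {"flag": None}),
    ("t-missing", {}),
]
conn.executemany(
    "INSERT INTO threads_meta VALUES (?, ?)",
    [(tid, json.dumps(meta)) for tid, meta in rows],
)


def match_ids(where: str, params: tuple = ()) -> list[str]:
    """Return thread ids matching a hand-written WHERE clause."""
    sql = f"SELECT thread_id FROM threads_meta WHERE {where} ORDER BY thread_id"
    return [row[0] for row in conn.execute(sql, params)]


path = '$."flag"'  # same quoted-key path shape as the SQLite compiler above

# json_type reports 'true'/'false' for booleans, so True never matches 1.
bool_ids = match_ids("json_type(metadata_json, ?) = 'true'", (path,))

# The integer branch checks the JSON type before comparing the value.
int_ids = match_ids(
    "json_type(metadata_json, ?) = 'integer' "
    "AND CAST(json_extract(metadata_json, ?) AS INTEGER) = ?",
    (path, path, 1),
)

# A JSON null has type 'null'; a missing key yields SQL NULL and no match.
null_ids = match_ids("json_type(metadata_json, ?) = 'null'", (path,))

print(bool_ids, int_ids, null_ids)
conn.close()
```

Without the type guard, `json_extract` would return `1` for both `true` and `1`, so a filter for one would also match the other.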
@@ -23,6 +23,18 @@ class RunRepository(RunStore):
    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
        self._sf = session_factory

+    @staticmethod
+    def _normalize_model_name(model_name: str | None) -> str | None:
+        """Normalize model_name for storage: strip whitespace, truncate to 128 chars."""
+        if model_name is None:
+            return None
+        if not isinstance(model_name, str):
+            model_name = str(model_name)
+        normalized = model_name.strip()
+        if len(normalized) > 128:
+            normalized = normalized[:128]
+        return normalized
+
    @staticmethod
    def _safe_json(obj: Any) -> Any:
        """Ensure obj is JSON-serializable. Falls back to model_dump() or str()."""
@@ -70,6 +82,7 @@ class RunRepository(RunStore):
        thread_id,
        assistant_id=None,
        user_id: str | None | _AutoSentinel = AUTO,
+        model_name: str | None = None,
        status="pending",
        multitask_strategy="reject",
        metadata=None,
@@ -85,6 +98,7 @@ class RunRepository(RunStore):
            thread_id=thread_id,
            assistant_id=assistant_id,
            user_id=resolved_user_id,
+            model_name=self._normalize_model_name(model_name),
            status=status,
            multitask_strategy=multitask_strategy,
            metadata_json=self._safe_json(metadata) or {},
@@ -4,7 +4,7 @@ from __future__ import annotations

 from typing import TYPE_CHECKING

-from deerflow.persistence.thread_meta.base import ThreadMetaStore
+from deerflow.persistence.thread_meta.base import InvalidMetadataFilterError, ThreadMetaStore
 from deerflow.persistence.thread_meta.memory import MemoryThreadMetaStore
 from deerflow.persistence.thread_meta.model import ThreadMetaRow
 from deerflow.persistence.thread_meta.sql import ThreadMetaRepository
@@ -14,6 +14,7 @@ if TYPE_CHECKING:
    from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

 __all__ = [
+    "InvalidMetadataFilterError",
    "MemoryThreadMetaStore",
    "ThreadMetaRepository",
    "ThreadMetaRow",
@@ -15,10 +15,15 @@ three-state semantics (see :mod:`deerflow.runtime.user_context`):
 from __future__ import annotations

 import abc
+from typing import Any

 from deerflow.runtime.user_context import AUTO, _AutoSentinel


+class InvalidMetadataFilterError(ValueError):
+    """Raised when all client-supplied metadata filter keys are rejected."""
+
+
 class ThreadMetaStore(abc.ABC):
    @abc.abstractmethod
    async def create(
@@ -40,12 +45,12 @@ class ThreadMetaStore(abc.ABC):
    async def search(
        self,
        *,
-        metadata: dict | None = None,
+        metadata: dict[str, Any] | None = None,
        status: str | None = None,
        limit: int = 100,
        offset: int = 0,
        user_id: str | None | _AutoSentinel = AUTO,
-    ) -> list[dict]:
+    ) -> list[dict[str, Any]]:
        pass

    @abc.abstractmethod
@@ -69,12 +69,12 @@ class MemoryThreadMetaStore(ThreadMetaStore):
    async def search(
        self,
        *,
-        metadata: dict | None = None,
+        metadata: dict[str, Any] | None = None,
        status: str | None = None,
        limit: int = 100,
        offset: int = 0,
        user_id: str | None | _AutoSentinel = AUTO,
-    ) -> list[dict]:
+    ) -> list[dict[str, Any]]:
        resolved_user_id = resolve_user_id(user_id, method_name="MemoryThreadMetaStore.search")
        filter_dict: dict[str, Any] = {}
        if metadata:
@@ -2,16 +2,20 @@

 from __future__ import annotations

+import logging
 from datetime import UTC, datetime
 from typing import Any

 from sqlalchemy import select, update
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

-from deerflow.persistence.thread_meta.base import ThreadMetaStore
+from deerflow.persistence.json_compat import json_match
+from deerflow.persistence.thread_meta.base import InvalidMetadataFilterError, ThreadMetaStore
 from deerflow.persistence.thread_meta.model import ThreadMetaRow
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id

+logger = logging.getLogger(__name__)
+

 class ThreadMetaRepository(ThreadMetaStore):
    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
@@ -20,7 +24,7 @@ class ThreadMetaRepository(ThreadMetaStore):
    @staticmethod
    def _row_to_dict(row: ThreadMetaRow) -> dict[str, Any]:
        d = row.to_dict()
-        d["metadata"] = d.pop("metadata_json", {})
+        d["metadata"] = d.pop("metadata_json", None) or {}
        for key in ("created_at", "updated_at"):
            val = d.get(key)
            if isinstance(val, datetime):
@@ -104,39 +108,43 @@ class ThreadMetaRepository(ThreadMetaStore):
    async def search(
        self,
        *,
-        metadata: dict | None = None,
+        metadata: dict[str, Any] | None = None,
        status: str | None = None,
        limit: int = 100,
        offset: int = 0,
        user_id: str | None | _AutoSentinel = AUTO,
-    ) -> list[dict]:
+    ) -> list[dict[str, Any]]:
        """Search threads with optional metadata and status filters.

        Owner filter is enforced by default: caller must be in a user
        context. Pass ``user_id=None`` to bypass (migration/CLI).
        """
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.search")
-        stmt = select(ThreadMetaRow).order_by(ThreadMetaRow.updated_at.desc())
+        stmt = select(ThreadMetaRow).order_by(ThreadMetaRow.updated_at.desc(), ThreadMetaRow.thread_id.desc())
        if resolved_user_id is not None:
            stmt = stmt.where(ThreadMetaRow.user_id == resolved_user_id)
        if status:
            stmt = stmt.where(ThreadMetaRow.status == status)

        if metadata:
-            # When metadata filter is active, fetch a larger window and filter
-            # in Python. TODO(Phase 2): use JSON DB operators (Postgres @>,
-            # SQLite json_extract) for server-side filtering.
-            stmt = stmt.limit(limit * 5 + offset)
-            async with self._sf() as session:
-                result = await session.execute(stmt)
-                rows = [self._row_to_dict(r) for r in result.scalars()]
-            rows = [r for r in rows if all(r.get("metadata", {}).get(k) == v for k, v in metadata.items())]
-            return rows[offset : offset + limit]
-        else:
-            stmt = stmt.limit(limit).offset(offset)
-            async with self._sf() as session:
-                result = await session.execute(stmt)
-                return [self._row_to_dict(r) for r in result.scalars()]
+            applied = 0
+            for key, value in metadata.items():
+                try:
+                    stmt = stmt.where(json_match(ThreadMetaRow.metadata_json, key, value))
+                    applied += 1
+                except (ValueError, TypeError) as exc:
+                    logger.warning("Skipping metadata filter key %s: %s", ascii(key), exc)
+            if applied == 0:
+                # Comma-separated plain string (no list repr / nested
+                # quoting) so the 400 detail surfaced by the Gateway is
+                # easy for clients to read. Sorted for determinism.
+                rejected_keys = ", ".join(sorted(str(k) for k in metadata))
+                raise InvalidMetadataFilterError(f"All metadata filter keys were rejected as unsafe: {rejected_keys}")
+
+        stmt = stmt.limit(limit).offset(offset)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]

    async def _check_ownership(self, session: AsyncSession, thread_id: str, resolved_user_id: str | None) -> bool:
        """Return True if the row exists and is owned (or filter bypassed)."""
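The skip-then-reject behaviour of the new `search` loop can be sketched without a database. This is an in-memory illustration only: `build_predicate` is a hypothetical stand-in for `json_match` (whose real validation rules are not shown in this diff), and the `_SAFE_KEY` pattern is an assumption made for the example.

```python
from __future__ import annotations

import re
from typing import Any, Callable


class InvalidMetadataFilterError(ValueError):
    """Raised when every supplied metadata filter key is rejected."""


_SAFE_KEY = re.compile(r"^[A-Za-z0-9_\-]+$")  # hypothetical safety rule


def build_predicate(key: Any, value: Any) -> Callable[[dict[str, Any]], bool]:
    """Stand-in for json_match: validate the key, then return a matcher."""
    if not isinstance(key, str) or not _SAFE_KEY.match(key):
        raise ValueError(f"unsafe key: {key!r}")
    return lambda meta: meta.get(key) == value


def apply_metadata_filters(rows: list[dict[str, Any]], metadata: dict[Any, Any]) -> list[dict[str, Any]]:
    predicates = []
    for key, value in metadata.items():
        try:
            predicates.append(build_predicate(key, value))
        except (ValueError, TypeError):
            # A single bad key is skipped, not fatal.
            continue
    if metadata and not predicates:
        # Every key was rejected: fail loudly instead of returning unfiltered rows.
        rejected = ", ".join(sorted(str(k) for k in metadata))
        raise InvalidMetadataFilterError(f"All metadata filter keys were rejected: {rejected}")
    return [r for r in rows if all(p(r.get("metadata") or {}) for p in predicates)]
```

Raising when nothing was applied matters because silently dropping all filters would widen the result set to every thread the caller owns.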
@@ -20,12 +20,13 @@ from __future__ import annotations
 import asyncio
 import logging
 import time
+from collections.abc import Mapping
 from datetime import UTC, datetime
 from typing import TYPE_CHECKING, Any, cast
 from uuid import UUID

 from langchain_core.callbacks import BaseCallbackHandler
-from langchain_core.messages import AnyMessage, BaseMessage, HumanMessage, ToolMessage
+from langchain_core.messages import AIMessage, AnyMessage, BaseMessage, HumanMessage, ToolMessage
 from langgraph.types import Command

 if TYPE_CHECKING:
@@ -63,6 +64,16 @@ class RunJournal(BaseCallbackHandler):
        self._total_tokens = 0
        self._llm_call_count = 0

+        # Caller-bucketed token accumulators
+        self._lead_agent_tokens = 0
+        self._subagent_tokens = 0
+        self._middleware_tokens = 0
+
+        # Dedup: on_llm_end may fire more than once per run_id, and external usage records may be re-reported
+        self._counted_llm_run_ids: set[str] = set()
+        self._counted_external_source_ids: set[str] = set()
+        self._counted_message_llm_run_ids: set[str] = set()
+
        # Convenience fields
        self._last_ai_msg: str | None = None
        self._first_human_msg: str | None = None
@@ -77,6 +88,50 @@ class RunJournal(BaseCallbackHandler):

    # -- Lifecycle callbacks --

+    @staticmethod
+    def _message_text(message: BaseMessage) -> str:
+        """Extract displayable text from a message's mixed content shape."""
+        content = getattr(message, "content", None)
+        if isinstance(content, str):
+            return content
+        if isinstance(content, list):
+            parts: list[str] = []
+            for block in content:
+                if isinstance(block, str):
+                    parts.append(block)
+                elif isinstance(block, Mapping):
+                    text = block.get("text")
+                    if isinstance(text, str):
+                        parts.append(text)
+                    else:
+                        nested = block.get("content")
+                        if isinstance(nested, str):
+                            parts.append(nested)
+            return "".join(parts)
+        if isinstance(content, Mapping):
+            for key in ("text", "content"):
+                value = content.get(key)
+                if isinstance(value, str):
+                    return value
+
+        text = getattr(message, "text", None)
+        if isinstance(text, str):
+            return text
+        return ""
+
+    def _record_message_summary(self, message: BaseMessage, *, caller: str | None = None) -> None:
+        """Update run-level convenience fields for persisted run rows."""
+        self._msg_count += 1
+
+        # ``last_ai_message`` should represent the lead agent's user-facing
+        # answer. Middleware/subagent model calls and empty tool-call-only
+        # AI messages must not overwrite the last useful assistant text.
+        is_ai_message = isinstance(message, AIMessage) or getattr(message, "type", None) == "ai"
+        if is_ai_message and (caller is None or caller == "lead_agent"):
+            text = self._message_text(message).strip()
+            if text:
+                self._last_ai_msg = text[:2000]
+
    def on_chain_start(
        self,
        serialized: dict[str, Any],
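The content shapes handled by `_message_text` are easier to see on plain values. The sketch below works on the `content` value directly rather than on a `BaseMessage`, and it omits the final fallback to the message's `text` attribute. The function name `content_text` is illustrative.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def content_text(content: Any) -> str:
    """Flatten string, list-of-blocks, or mapping content into display text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: list[str] = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, Mapping):
                # Prefer "text"; fall back to a nested string "content".
                text = block.get("text")
                if isinstance(text, str):
                    parts.append(text)
                elif isinstance(block.get("content"), str):
                    parts.append(block["content"])
        return "".join(parts)
    if isinstance(content, Mapping):
        for key in ("text", "content"):
            value = content.get(key)
            if isinstance(value, str):
                return value
    return ""
```

Blocks that carry no text, such as a tool-call-only block, contribute nothing, which is why an AI message consisting only of tool calls does not overwrite `last_ai_message`.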
@@ -155,6 +210,7 @@ class RunJournal(BaseCallbackHandler):
                            content=m.model_dump(),
                            metadata={"caller": caller},
                        )
+                        self._record_message_summary(m, caller=caller)
                        break
                if self._first_human_msg:
                    break
@@ -213,20 +269,34 @@ class RunJournal(BaseCallbackHandler):
                    "llm_call_index": call_index,
                },
            )
+            if rid not in self._counted_message_llm_run_ids:
+                self._record_message_summary(message, caller=caller)

-            # Token accumulation
+            # Token accumulation (dedup by langchain run_id to avoid double-counting
+            # when the callback fires more than once for the same response)
            if self._track_tokens:
                input_tk = usage_dict.get("input_tokens", 0) or 0
                output_tk = usage_dict.get("output_tokens", 0) or 0
                total_tk = usage_dict.get("total_tokens", 0) or 0
                if total_tk == 0:
                    total_tk = input_tk + output_tk
-                if total_tk > 0:
+                if total_tk > 0 and rid not in self._counted_llm_run_ids:
+                    self._counted_llm_run_ids.add(rid)
                    self._total_input_tokens += input_tk
                    self._total_output_tokens += output_tk
                    self._total_tokens += total_tk
                    self._llm_call_count += 1

+                    if caller.startswith("subagent:"):
+                        self._subagent_tokens += total_tk
+                    elif caller.startswith("middleware:"):
+                        self._middleware_tokens += total_tk
+                    else:
+                        self._lead_agent_tokens += total_tk
+
+        if messages:
+            self._counted_message_llm_run_ids.add(str(run_id))
+
    def on_llm_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
        self._llm_start_times.pop(str(run_id), None)
        self._put(event_type="llm.error", category="trace", content=str(error))
@@ -242,12 +312,14 @@ class RunJournal(BaseCallbackHandler):
            if isinstance(output, ToolMessage):
                msg = cast(ToolMessage, output)
                self._put(event_type="llm.tool.result", category="message", content=msg.model_dump())
+                self._record_message_summary(msg)
            elif isinstance(output, Command):
                cmd = cast(Command, output)
                messages = cmd.update.get("messages", [])
                for message in messages:
                    if isinstance(message, BaseMessage):
                        self._put(event_type="llm.tool.result", category="message", content=message.model_dump())
+                        self._record_message_summary(message)
                    else:
                        logger.warning(f"on_tool_end {run_id}: command update message is not BaseMessage: {type(message)}")
            else:
@@ -330,6 +402,49 @@ class RunJournal(BaseCallbackHandler):

    # -- Public methods (called by worker) --

+    def record_external_llm_usage_records(
+        self,
+        records: list[dict[str, int | str]],
+    ) -> None:
+        """Record token usage from external sources (e.g., subagents).
+
+        Each record should contain:
+            source_run_id: Unique identifier to prevent double-counting
+            caller: Caller tag (e.g. "subagent:general-purpose")
+            input_tokens: Input token count
+            output_tokens: Output token count
+            total_tokens: Total token count (computed from input+output if 0/missing)
+        """
+        if not self._track_tokens:
+            return
+        for record in records:
+            source_id = str(record.get("source_run_id", ""))
+            if not source_id:
+                continue
+            if source_id in self._counted_external_source_ids:
+                continue
+
+            total_tk = record.get("total_tokens", 0) or 0
+            if total_tk <= 0:
+                input_tk = record.get("input_tokens", 0) or 0
+                output_tk = record.get("output_tokens", 0) or 0
+                total_tk = input_tk + output_tk
+            if total_tk <= 0:
+                continue
+
+            self._counted_external_source_ids.add(source_id)
+            self._total_input_tokens += record.get("input_tokens", 0) or 0
+            self._total_output_tokens += record.get("output_tokens", 0) or 0
+            self._total_tokens += total_tk
+
+            caller = str(record.get("caller", ""))
+            if caller.startswith("subagent:"):
+                self._subagent_tokens += total_tk
+            elif caller.startswith("middleware:"):
+                self._middleware_tokens += total_tk
+            else:
+                self._lead_agent_tokens += total_tk
+
    def set_first_human_message(self, content: str) -> None:
        """Record the first human message for convenience fields."""
        self._first_human_msg = content[:2000] if content else None
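The accounting rules shared by `on_llm_end` and `record_external_llm_usage_records` (dedup by ID, derive the total when it is missing, bucket by caller prefix) can be sketched as a small ledger. The `UsageLedger` class and the `middleware:title` tag used in testing are hypothetical; only the `subagent:` and `middleware:` prefixes come from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UsageLedger:
    total: int = 0
    lead_agent: int = 0
    subagent: int = 0
    middleware: int = 0
    seen: set[str] = field(default_factory=set)

    def record(
        self,
        source_run_id: str,
        caller: str,
        input_tokens: int = 0,
        output_tokens: int = 0,
        total_tokens: int = 0,
    ) -> bool:
        """Count one usage record; return False if it was skipped."""
        if not source_run_id or source_run_id in self.seen:
            return False  # missing ID or already counted
        total = total_tokens if total_tokens > 0 else input_tokens + output_tokens
        if total <= 0:
            return False  # nothing to count, so the ID is not marked as seen
        self.seen.add(source_run_id)
        self.total += total
        if caller.startswith("subagent:"):
            self.subagent += total
        elif caller.startswith("middleware:"):
            self.middleware += total
        else:
            self.lead_agent += total
        return True
```

Marking an ID as seen only after a positive total is found means a later record for the same ID that does carry usage can still be counted.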
@@ -376,6 +491,9 @@ class RunJournal(BaseCallbackHandler):
            "total_output_tokens": self._total_output_tokens,
            "total_tokens": self._total_tokens,
            "llm_call_count": self._llm_call_count,
+            "lead_agent_tokens": self._lead_agent_tokens,
+            "subagent_tokens": self._subagent_tokens,
+            "middleware_tokens": self._middleware_tokens,
            "message_count": self._msg_count,
            "last_ai_message": self._last_ai_msg,
            "first_human_message": self._first_human_msg,
@@ -36,6 +36,7 @@ class RunRecord:
    abort_event: asyncio.Event = field(default_factory=asyncio.Event, repr=False)
    abort_action: str = "interrupt"
    error: str | None = None
+    model_name: str | None = None


 class RunManager:
@@ -65,6 +66,7 @@ class RunManager:
                metadata=record.metadata or {},
                kwargs=record.kwargs or {},
                created_at=record.created_at,
+                model_name=record.model_name,
            )
        except Exception:
            logger.warning("Failed to persist run %s to store", record.run_id, exc_info=True)
@@ -137,6 +139,18 @@ class RunManager:
                logger.warning("Failed to persist status update for run %s", run_id, exc_info=True)
        logger.info("Run %s -> %s", run_id, status.value)

+    async def update_model_name(self, run_id: str, model_name: str | None) -> None:
+        """Update the model name for a run."""
+        async with self._lock:
+            record = self._runs.get(run_id)
+            if record is None:
+                logger.warning("update_model_name called for unknown run %s", run_id)
+                return
+            record.model_name = model_name
+            record.updated_at = _now_iso()
+        await self._persist_to_store(record)
+        logger.info("Run %s model_name=%s", run_id, model_name)
+
    async def cancel(self, run_id: str, *, action: str = "interrupt") -> bool:
        """Request cancellation of a run.

@@ -171,6 +185,7 @@ class RunManager:
        metadata: dict | None = None,
        kwargs: dict | None = None,
        multitask_strategy: str = "reject",
+        model_name: str | None = None,
    ) -> RunRecord:
        """Atomically check for inflight runs and create a new one.

@@ -221,6 +236,7 @@ class RunManager:
                kwargs=kwargs or {},
                created_at=now,
                updated_at=now,
+                model_name=model_name,
            )
            self._runs[run_id] = record

@@ -23,6 +23,7 @@ class RunStore(abc.ABC):
        thread_id: str,
        assistant_id: str | None = None,
        user_id: str | None = None,
+        model_name: str | None = None,
        status: str = "pending",
        multitask_strategy: str = "reject",
        metadata: dict[str, Any] | None = None,
@@ -22,6 +22,7 @@ class MemoryRunStore(RunStore):
        thread_id,
        assistant_id=None,
        user_id=None,
+        model_name=None,
        status="pending",
        multitask_strategy="reject",
        metadata=None,
@@ -35,6 +36,7 @@ class MemoryRunStore(RunStore):
            "thread_id": thread_id,
            "assistant_id": assistant_id,
            "user_id": user_id,
+            "model_name": model_name,
            "status": status,
            "multitask_strategy": multitask_strategy,
            "metadata": metadata or {},
@@ -230,6 +230,17 @@ async def run_agent(
        else:
            agent = agent_factory(config=runnable_config)

+        # Capture the effective (resolved) model name from the agent's metadata.
+        # _resolve_model_name in agent.py may return the default model if the
+        # requested name is not in the allowlist — this update ensures the
+        # persisted model_name reflects the actual model used.
+        if record.model_name is not None:
+            resolved = getattr(agent, "metadata", {}) or {}
+            if isinstance(resolved, dict):
+                effective = resolved.get("model_name")
+                if effective and effective != record.model_name:
+                    await run_manager.update_model_name(record.run_id, effective)
+
        # 4. Attach checkpointer and store
        if checkpointer is not None:
            agent.checkpointer = checkpointer
@@ -109,6 +109,34 @@ def get_effective_user_id() -> str:
    return str(user.id)


+def resolve_runtime_user_id(runtime: object | None) -> str:
+    """Single source of truth for a tool/middleware's effective user_id.
+
+    Resolution order (most authoritative first):
+      1. ``runtime.context["user_id"]`` — set by ``inject_authenticated_user_context``
+         in the gateway from the auth-validated ``request.state.user``. This is
+         the only source that survives boundaries where the contextvar may have
+         been lost (background tasks scheduled outside the request task,
+         worker pools that don't copy_context, future cross-process drivers).
+      2. The ``_current_user`` ContextVar — set by the auth middleware at
+         request entry. Reliable for in-task work; copied by ``asyncio``
+         child tasks and by ``ContextThreadPoolExecutor``.
+      3. ``DEFAULT_USER_ID`` — last-resort fallback so unauthenticated
+         CLI / migration / test paths keep working without raising.
+
+    Tools that persist user-scoped state (custom agents, memory, uploads)
+    MUST call this instead of ``get_effective_user_id()`` directly so they
+    benefit from the runtime.context channel that ``setup_agent`` already
+    relies on.
+    """
+    context = getattr(runtime, "context", None)
+    if isinstance(context, dict):
+        ctx_user_id = context.get("user_id")
+        if ctx_user_id:
+            return str(ctx_user_id)
+    return get_effective_user_id()
+
+
 # ---------------------------------------------------------------------------
 # Sentinel-based user_id resolution
 # ---------------------------------------------------------------------------
@@ -119,3 +119,13 @@ class LocalSandboxProvider(SandboxProvider):
        # For Docker-based providers (e.g., AioSandboxProvider), cleanup
        # happens at application shutdown via the shutdown() method.
        pass
+
+    def reset(self) -> None:
+        # reset_sandbox_provider() must also clear the module singleton.
+        global _singleton
+        _singleton = None
+
+    def shutdown(self) -> None:
+        # LocalSandboxProvider has no extra resources beyond the shared
+        # singleton, so shutdown uses the same cleanup path as reset.
+        self.reset()
@@ -37,6 +37,10 @@ class SandboxProvider(ABC):
        """
        pass

+    def reset(self) -> None:
+        """Clear cached state that survives provider instance replacement."""
+        pass
+

 _default_sandbox_provider: SandboxProvider | None = None

@@ -65,11 +69,18 @@ def reset_sandbox_provider() -> None:
    The next call to `get_sandbox_provider()` will create a new instance.
    Useful for testing or when switching configurations.

+    Providers can override `reset()` to clear any module-level state they keep
+    alive across instances (for example, `LocalSandboxProvider`'s cached
+    `LocalSandbox` singleton). Without it, config/mount changes would not take
+    effect on the next acquire().
+
    Note: If the provider has active sandboxes, they will be orphaned.
    Use `shutdown_sandbox_provider()` for proper cleanup.
    """
    global _default_sandbox_provider
-    _default_sandbox_provider = None
+    if _default_sandbox_provider is not None:
+        _default_sandbox_provider.reset()
+        _default_sandbox_provider = None


 def shutdown_sandbox_provider() -> None:
@@ -1499,12 +1499,13 @@ def write_file_tool(
    content: str,
    append: bool = False,
 ) -> str:
-    """Write text content to a file.
+    """Write text content to a file. By default this overwrites the target file; set append to true to add content to the end without replacing existing content.

    Args:
        description: Explain why you are writing to this file in short words. ALWAYS PROVIDE THIS PARAMETER FIRST.
        path: The **absolute** path to the file to write to. ALWAYS PROVIDE THIS PARAMETER SECOND.
        content: The content to write to the file. ALWAYS PROVIDE THIS PARAMETER THIRD.
+        append: Whether to append content to the end of the file instead of overwriting it. Defaults to false.
    """
    try:
        sandbox = ensure_sandbox_initialized(runtime)
@@ -26,7 +26,7 @@ class SubagentConfig:

    name: str
    description: str
-    system_prompt: str
+    system_prompt: str | None = None
    tools: list[str] | None = None
    disallowed_tools: list[str] | None = field(default_factory=lambda: ["task"])
    skills: list[str] | None = None
@@ -26,6 +26,7 @@ from deerflow.models import create_chat_model
 from deerflow.skills.tool_policy import filter_tools_by_skill_allowed_tools
 from deerflow.skills.types import Skill
 from deerflow.subagents.config import SubagentConfig, resolve_subagent_model_name
+from deerflow.subagents.token_collector import SubagentTokenCollector

 logger = logging.getLogger(__name__)

@@ -70,6 +71,8 @@ class SubagentResult:
    started_at: datetime | None = None
    completed_at: datetime | None = None
    ai_messages: list[dict[str, Any]] | None = None
+    token_usage_records: list[dict[str, int | str]] = field(default_factory=list)
+    usage_reported: bool = False
    cancel_event: threading.Event = field(default_factory=threading.Event, repr=False)

    def __post_init__(self):
@@ -283,11 +286,13 @@ class SubagentExecutor:
        # Reuse shared middleware composition with lead agent.
        middlewares = build_subagent_runtime_middlewares(app_config=app_config, model_name=self.model_name, lazy_init=True)

+        # system_prompt is included in the initial state messages (see _build_initial_state)
+        # to avoid multiple SystemMessages, which some LLM APIs reject.
        return create_agent(
            model=model,
            tools=tools if tools is not None else self.tools,
            middleware=middlewares,
-            system_prompt=self.config.system_prompt,
+            system_prompt=None,
            state_schema=ThreadState,
        )

@@ -362,14 +367,25 @@ class SubagentExecutor:
        Returns:
            Initial state dictionary and tools filtered by loaded skill metadata.
        """
+
        # Load skills as conversation items (Codex pattern)
        skills = await self._load_skills()
        filtered_tools = self._apply_skill_allowed_tools(skills)
        skill_messages = await self._load_skill_messages(skills)

+        # Combine system_prompt and skills into a single SystemMessage.
+        # Some LLM APIs reject multiple SystemMessages with
+        # "System message must be at the beginning."
+        system_parts: list[str] = []
+        if self.config.system_prompt:
+            system_parts.append(self.config.system_prompt)
+        for skill_msg in skill_messages:
+            system_parts.append(skill_msg.content)
+
        messages: list[Any] = []
-        # Skill content injected as developer/system messages before the task
-        messages.extend(skill_messages)
+        if system_parts:
+            messages.append(SystemMessage(content="\n\n".join(system_parts)))
+
        # Then the actual task
        messages.append(HumanMessage(content=task))
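The merge of the system prompt and skill content into one leading system message can be sketched with plain dictionaries in place of LangChain message classes. The `build_messages` function and the role/content dict shape are assumptions made to keep the example dependency-free.

```python
from __future__ import annotations


def build_messages(system_prompt: str | None, skill_texts: list[str], task: str) -> list[dict[str, str]]:
    """Return at most one system message followed by the task."""
    system_parts: list[str] = []
    if system_prompt:
        system_parts.append(system_prompt)
    system_parts.extend(skill_texts)

    messages: list[dict[str, str]] = []
    if system_parts:
        # A single system message, placed first, joined with blank lines.
        messages.append({"role": "system", "content": "\n\n".join(system_parts)})
    messages.append({"role": "user", "content": task})
    return messages


if __name__ == "__main__":
    for message in build_messages("You are a researcher.", ["Skill A"], "Summarize the report"):
        print(message)
```

When there is neither a system prompt nor any skill content, no system message is emitted at all, which matches making `system_prompt` optional in `SubagentConfig`.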

@@ -412,13 +428,20 @@ class SubagentExecutor:
            ai_messages = []
            result.ai_messages = ai_messages

+        collector: SubagentTokenCollector | None = None
        try:
            state, filtered_tools = await self._build_initial_state(task)
            agent = self._create_agent(filtered_tools)

+            # Token collector for subagent LLM calls
+            collector_caller = f"subagent:{self.config.name}"
+            collector = SubagentTokenCollector(caller=collector_caller)
+
            # Build config with thread_id for sandbox access and recursion limit
            run_config: RunnableConfig = {
                "recursion_limit": self.config.max_turns,
+                "callbacks": [collector],
+                "tags": [collector_caller],
            }
            context: dict[str, Any] = {}
            if self.thread_id:
@@ -441,6 +464,8 @@ class SubagentExecutor:
                        result.status = SubagentStatus.CANCELLED
                        result.error = "Cancelled by user"
                        result.completed_at = datetime.now()
+                if collector is not None:
+                    result.token_usage_records = collector.snapshot_records()
                return result

            async for chunk in agent.astream(state, config=run_config, context=context, stream_mode="values"):  # type: ignore[arg-type]
@@ -455,6 +480,7 @@ class SubagentExecutor:
                            result.status = SubagentStatus.CANCELLED
                            result.error = "Cancelled by user"
                            result.completed_at = datetime.now()
+                    result.token_usage_records = collector.snapshot_records()
                    return result

                final_state = chunk
@@ -481,6 +507,7 @@ class SubagentExecutor:
                            logger.info(f"[trace={self.trace_id}] Subagent {self.config.name} captured AI message #{len(ai_messages)}")

            logger.info(f"[trace={self.trace_id}] Subagent {self.config.name} completed async execution")
+            result.token_usage_records = collector.snapshot_records()

            if final_state is None:
                logger.warning(f"[trace={self.trace_id}] Subagent {self.config.name} no final state")
@@ -560,6 +587,8 @@ class SubagentExecutor:
            result.status = SubagentStatus.FAILED
            result.error = str(e)
            result.completed_at = datetime.now()
+            if collector is not None:
+                result.token_usage_records = collector.snapshot_records()

        return result

@@ -0,0 +1,63 @@
+"""Callback handler that collects LLM token usage within a subagent.
+
+Each subagent execution creates its own collector. After the subagent
+finishes, the collected records are transferred to the parent RunJournal
+via :meth:`RunJournal.record_external_llm_usage_records`.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+from langchain_core.callbacks import BaseCallbackHandler
+
+
+class SubagentTokenCollector(BaseCallbackHandler):
+    """Lightweight callback handler that collects LLM token usage within a subagent."""
+
+    def __init__(self, caller: str):
+        super().__init__()
+        self.caller = caller
+        self._records: list[dict[str, int | str]] = []
+        self._counted_run_ids: set[str] = set()
+
+    def on_llm_end(
+        self,
+        response: Any,
+        *,
+        run_id: Any,
+        tags: list[str] | None = None,
+        **kwargs: Any,
+    ) -> None:
+        rid = str(run_id)
+        if rid in self._counted_run_ids:
+            return
+
+        for generation in response.generations:
+            for gen in generation:
+                if not hasattr(gen, "message"):
+                    continue
+                usage = getattr(gen.message, "usage_metadata", None)
+                usage_dict = dict(usage) if usage else {}
+                input_tk = usage_dict.get("input_tokens", 0) or 0
+                output_tk = usage_dict.get("output_tokens", 0) or 0
+                total_tk = usage_dict.get("total_tokens", 0) or 0
+                if total_tk <= 0:
+                    total_tk = input_tk + output_tk
+                if total_tk <= 0:
+                    continue
+                self._counted_run_ids.add(rid)
+                self._records.append(
+                    {
+                        "source_run_id": rid,
+                        "caller": self.caller,
+                        "input_tokens": input_tk,
+                        "output_tokens": output_tk,
+                        "total_tokens": total_tk,
+                    }
+                )
+                return
+
+    def snapshot_records(self) -> list[dict[str, int | str]]:
+        """Return a copy of the accumulated usage records."""
+        return list(self._records)
@@ -7,20 +7,13 @@ from langgraph.types import Command

 from deerflow.config.agents_config import validate_agent_name
 from deerflow.config.paths import get_paths
-from deerflow.runtime.user_context import get_effective_user_id
+from deerflow.runtime.user_context import resolve_runtime_user_id
 from deerflow.tools.types import Runtime

 logger = logging.getLogger(__name__)


-def _get_runtime_user_id(runtime: Runtime) -> str:
-    context_user_id = runtime.context.get("user_id") if runtime.context else None
-    if context_user_id:
-        return str(context_user_id)
-    return get_effective_user_id()
-
-
-@tool
+@tool(parse_docstring=True)
 def setup_agent(
    soul: str,
    description: str,
@@ -45,7 +38,7 @@ def setup_agent(
        if agent_name:
            # Custom agents are persisted under the current user's bucket so
            # different users do not see each other's agents.
-            user_id = _get_runtime_user_id(runtime)
+            user_id = resolve_runtime_user_id(runtime)
            agent_dir = paths.user_agent_dir(user_id, agent_name)
        else:
            # Default agent (no agent_name): SOUL.md lives at the global base dir.
@@ -27,6 +27,92 @@ if TYPE_CHECKING:
 logger = logging.getLogger(__name__)


+def _is_subagent_terminal(result: Any) -> bool:
+    """Return whether a background subagent result is safe to clean up."""
+    return result.status in {SubagentStatus.COMPLETED, SubagentStatus.FAILED, SubagentStatus.CANCELLED, SubagentStatus.TIMED_OUT} or getattr(result, "completed_at", None) is not None
+
+
+async def _await_subagent_terminal(task_id: str, max_polls: int) -> Any | None:
+    """Poll until the subagent is terminal; return None if the task disappears or polls run out."""
+    for _ in range(max_polls):
+        result = get_background_task_result(task_id)
+        if result is None:
+            return None
+        if _is_subagent_terminal(result):
+            return result
+        await asyncio.sleep(5)
+    return None
+
+
+async def _deferred_cleanup_subagent_task(task_id: str, trace_id: str, max_polls: int) -> None:
+    """Keep polling a cancelled subagent until it can be safely removed."""
+    cleanup_poll_count = 0
+    while True:
+        result = get_background_task_result(task_id)
+        if result is None:
+            return
+        if _is_subagent_terminal(result):
+            cleanup_background_task(task_id)
+            return
+        if cleanup_poll_count >= max_polls:
+            logger.warning(f"[trace={trace_id}] Deferred cleanup for task {task_id} timed out after {cleanup_poll_count} polls")
+            return
+        await asyncio.sleep(5)
+        cleanup_poll_count += 1
+
+
+def _log_cleanup_failure(cleanup_task: asyncio.Task[None], *, trace_id: str, task_id: str) -> None:
+    if cleanup_task.cancelled():
+        return
+
+    exc = cleanup_task.exception()
+    if exc is not None:
+        logger.error(f"[trace={trace_id}] Deferred cleanup failed for task {task_id}: {exc}")
+
+
+def _schedule_deferred_subagent_cleanup(task_id: str, trace_id: str, max_polls: int) -> None:
+    logger.debug(f"[trace={trace_id}] Scheduling deferred cleanup for cancelled task {task_id}")
+    cleanup_task = asyncio.create_task(_deferred_cleanup_subagent_task(task_id, trace_id, max_polls))
+    cleanup_task.add_done_callback(lambda task: _log_cleanup_failure(task, trace_id=trace_id, task_id=task_id))
+
+
+def _find_usage_recorder(runtime: Any) -> Any | None:
+    """Find a callback handler with ``record_external_llm_usage_records`` in the runtime config."""
+    if runtime is None:
+        return None
+    config = getattr(runtime, "config", None)
+    if not isinstance(config, dict):
+        return None
+    callbacks = config.get("callbacks", [])
+    if not callbacks:
+        return None
+    for cb in callbacks:
+        if hasattr(cb, "record_external_llm_usage_records"):
+            return cb
+    return None
+
+
+def _report_subagent_usage(runtime: Any, result: Any) -> None:
+    """Report subagent token usage to the parent RunJournal, if available.
+
+    Each subagent task must be reported only once (guarded by usage_reported).
+    """
+    if getattr(result, "usage_reported", True):
+        return
+    records = getattr(result, "token_usage_records", None) or []
+    if not records:
+        return
+    journal = _find_usage_recorder(runtime)
+    if journal is None:
+        logger.debug("No usage recorder found in runtime callbacks — subagent token usage not recorded")
+        return
+    try:
+        journal.record_external_llm_usage_records(records)
+        result.usage_reported = True
+    except Exception:
+        logger.warning("Failed to report subagent token usage", exc_info=True)
+
+
 def _get_runtime_app_config(runtime: Any) -> "AppConfig | None":
    context = getattr(runtime, "context", None)
    if isinstance(context, dict):
@@ -227,21 +313,25 @@ async def task_tool(

            # Check if task completed, failed, or timed out
            if result.status == SubagentStatus.COMPLETED:
+                _report_subagent_usage(runtime, result)
                writer({"type": "task_completed", "task_id": task_id, "result": result.result})
                logger.info(f"[trace={trace_id}] Task {task_id} completed after {poll_count} polls")
                cleanup_background_task(task_id)
                return f"Task Succeeded. Result: {result.result}"
            elif result.status == SubagentStatus.FAILED:
+                _report_subagent_usage(runtime, result)
                writer({"type": "task_failed", "task_id": task_id, "error": result.error})
                logger.error(f"[trace={trace_id}] Task {task_id} failed: {result.error}")
                cleanup_background_task(task_id)
                return f"Task failed. Error: {result.error}"
            elif result.status == SubagentStatus.CANCELLED:
+                _report_subagent_usage(runtime, result)
                writer({"type": "task_cancelled", "task_id": task_id, "error": result.error})
                logger.info(f"[trace={trace_id}] Task {task_id} cancelled: {result.error}")
                cleanup_background_task(task_id)
                return "Task cancelled by user."
            elif result.status == SubagentStatus.TIMED_OUT:
+                _report_subagent_usage(runtime, result)
                writer({"type": "task_timed_out", "task_id": task_id, "error": result.error})
                logger.warning(f"[trace={trace_id}] Task {task_id} timed out: {result.error}")
                cleanup_background_task(task_id)
@@ -260,43 +350,28 @@ async def task_tool(
            if poll_count > max_poll_count:
                timeout_minutes = config.timeout_seconds // 60
                logger.error(f"[trace={trace_id}] Task {task_id} polling timed out after {poll_count} polls (should have been caught by thread pool timeout)")
+                _report_subagent_usage(runtime, result)
                writer({"type": "task_timed_out", "task_id": task_id})
                return f"Task polling timed out after {timeout_minutes} minutes. This may indicate the background task is stuck. Status: {result.status.value}"
    except asyncio.CancelledError:
        # Signal the background subagent thread to stop cooperatively.
-        # Without this, the thread (running in ThreadPoolExecutor with its
-        # own event loop via asyncio.run) would continue executing even
-        # after the parent task is cancelled.
        request_cancel_background_task(task_id)

-        async def cleanup_when_done() -> None:
-            max_cleanup_polls = max_poll_count
-            cleanup_poll_count = 0
+        # Wait (shielded) for the subagent to reach a terminal state so the
+        # final token usage snapshot is reported to the parent RunJournal
+        # before the parent worker persists get_completion_data().
+        terminal_result = None
+        try:
+            terminal_result = await asyncio.shield(_await_subagent_terminal(task_id, max_poll_count))
+        except asyncio.CancelledError:
+            pass

-            while True:
-                result = get_background_task_result(task_id)
-                if result is None:
-                    return
-
-                if result.status in {SubagentStatus.COMPLETED, SubagentStatus.FAILED, SubagentStatus.CANCELLED, SubagentStatus.TIMED_OUT} or getattr(result, "completed_at", None) is not None:
-                    cleanup_background_task(task_id)
-                    return
-
-                if cleanup_poll_count > max_cleanup_polls:
-                    logger.warning(f"[trace={trace_id}] Deferred cleanup for task {task_id} timed out after {cleanup_poll_count} polls")
-                    return
-
-                await asyncio.sleep(5)
-                cleanup_poll_count += 1
-
-        def log_cleanup_failure(cleanup_task: asyncio.Task[None]) -> None:
-            if cleanup_task.cancelled():
-                return
-
-            exc = cleanup_task.exception()
-            if exc is not None:
-                logger.error(f"[trace={trace_id}] Deferred cleanup failed for task {task_id}: {exc}")
-
-        logger.debug(f"[trace={trace_id}] Scheduling deferred cleanup for cancelled task {task_id}")
-        asyncio.create_task(cleanup_when_done()).add_done_callback(log_cleanup_failure)
+        # Report whatever the subagent collected (even if we timed out).
+        final_result = terminal_result or get_background_task_result(task_id)
+        if final_result is not None:
+            _report_subagent_usage(runtime, final_result)
+        if final_result is not None and _is_subagent_terminal(final_result):
+            cleanup_background_task(task_id)
+        else:
+            _schedule_deferred_subagent_cleanup(task_id, trace_id, max_poll_count)
        raise
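
Review note: the cancellation branch above relies on `asyncio.shield` so the parent can still wait for the subagent's final state after it has itself been cancelled. The following is a minimal standalone sketch of that pattern, not the project's code. `wait_for_terminal`, `parent`, and the `events` list are hypothetical stand-ins for `_await_subagent_terminal`, `task_tool`, and the usage report.

```python
import asyncio

events: list[str] = []


async def wait_for_terminal() -> str:
    # Hypothetical stand-in for polling a background task until it is terminal.
    await asyncio.sleep(0.05)
    events.append("terminal")
    return "final-usage"


async def parent() -> None:
    try:
        await asyncio.sleep(10)  # the normal polling loop would live here
    except asyncio.CancelledError:
        final = None
        try:
            # shield() lets the inner wait keep running even if this task
            # receives a second cancellation while waiting.
            final = await asyncio.shield(wait_for_terminal())
        except asyncio.CancelledError:
            pass
        # Report whatever was collected, then propagate the cancellation.
        events.append(f"reported:{final}")
        raise


async def main() -> bool:
    task = asyncio.create_task(parent())
    await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False


was_cancelled = asyncio.run(main())
```

In this sketch the report step runs before the `CancelledError` is re-raised, which mirrors the ordering the diff's comment asks for (usage reported before the parent's completion data is persisted).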
@@ -27,7 +27,7 @@ from langgraph.types import Command
 from deerflow.config.agents_config import load_agent_config, validate_agent_name
 from deerflow.config.app_config import get_app_config
 from deerflow.config.paths import get_paths
-from deerflow.runtime.user_context import get_effective_user_id
+from deerflow.runtime.user_context import resolve_runtime_user_id
 from deerflow.tools.types import Runtime

 logger = logging.getLogger(__name__)
@@ -67,7 +67,7 @@ def _cleanup_temps(temps: list[Path]) -> None:
            logger.debug("Failed to clean up temp file %s", tmp, exc_info=True)


-@tool
+@tool(parse_docstring=True)
 def update_agent(
    runtime: Runtime,
    soul: str | None = None,
@@ -118,9 +118,13 @@ def update_agent(
        return _err("update_agent is only available inside a custom agent's chat. There is no agent_name in the current runtime context, so there is nothing to update. If you are inside the bootstrap flow, use setup_agent instead.")

    # Resolve the active user so that updates only affect this user's agent.
-    # ``get_effective_user_id`` returns DEFAULT_USER_ID when no auth context
-    # is set (matching how memory and thread storage behave).
-    user_id = get_effective_user_id()
+    # ``resolve_runtime_user_id`` prefers ``runtime.context["user_id"]`` (set by
+    # the gateway from the auth-validated request) and falls back to the
+    # contextvar, then DEFAULT_USER_ID. This matches setup_agent so a user
+    # creating an agent and later refining it always touches the same files,
+    # even if the contextvar gets lost across an async/thread boundary
+    # (issue #2782 / #2862 class of bugs).
+    user_id = resolve_runtime_user_id(runtime)

    # Reject an unknown ``model`` *before* touching the filesystem. Otherwise
    # ``_resolve_model_name`` silently falls back to the default at runtime
@@ -10,11 +10,11 @@ from weakref import WeakValueDictionary
 from langchain.tools import tool

 from deerflow.agents.lead_agent.prompt import refresh_skills_system_prompt_cache_async
-from deerflow.mcp.tools import _make_sync_tool_wrapper
 from deerflow.skills.security_scanner import scan_skill_content
 from deerflow.skills.storage import get_or_new_skill_storage
 from deerflow.skills.storage.skill_storage import SkillStorage
 from deerflow.skills.types import SKILL_MD_FILE
+from deerflow.tools.sync import make_sync_tool_wrapper
 from deerflow.tools.types import Runtime

 logger = logging.getLogger(__name__)
@@ -235,4 +235,4 @@ async def skill_manage_tool(
    )


-skill_manage_tool.func = _make_sync_tool_wrapper(_skill_manage_impl, "skill_manage")
+skill_manage_tool.func = make_sync_tool_wrapper(_skill_manage_impl, "skill_manage")
@@ -0,0 +1,36 @@
+"""Utilities for invoking async tools from synchronous agent paths."""
+
+import asyncio
+import atexit
+import concurrent.futures
+import logging
+from collections.abc import Callable
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+# Shared thread pool for sync tool invocation in async environments.
+_SYNC_TOOL_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=10, thread_name_prefix="tool-sync")
+
+atexit.register(lambda: _SYNC_TOOL_EXECUTOR.shutdown(wait=False))
+
+
+def make_sync_tool_wrapper(coro: Callable[..., Any], tool_name: str) -> Callable[..., Any]:
+    """Build a synchronous wrapper for an asynchronous tool coroutine."""
+
+    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
+        try:
+            loop = asyncio.get_running_loop()
+        except RuntimeError:
+            loop = None
+
+        try:
+            if loop is not None and loop.is_running():
+                future = _SYNC_TOOL_EXECUTOR.submit(asyncio.run, coro(*args, **kwargs))
+                return future.result()
+            return asyncio.run(coro(*args, **kwargs))
+        except Exception as e:
+            logger.error("Error invoking tool %r via sync wrapper: %s", tool_name, e, exc_info=True)
+            raise
+
+    return sync_wrapper
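
Review note: the two branches in `sync_wrapper` are easy to misread, so here is a simplified standalone sketch of the same control flow. It is an illustration under assumptions: `make_sync_wrapper`, `double`, and the demo executor are hypothetical names, logging is omitted, and the real helper lives in `deerflow.tools.sync`.

```python
import asyncio
import concurrent.futures
from collections.abc import Callable
from typing import Any

# Hypothetical stand-in for the shared module-level pool in the diff.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2, thread_name_prefix="tool-sync-demo")


def make_sync_wrapper(coro_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Simplified copy of the wrapper's control flow (no logging)."""

    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            asyncio.get_running_loop()
            loop_running = True
        except RuntimeError:
            loop_running = False

        if loop_running:
            # A loop already runs in this thread, so asyncio.run() here would
            # raise RuntimeError; run the coroutine on a worker thread instead.
            return _EXECUTOR.submit(asyncio.run, coro_fn(*args, **kwargs)).result()
        return asyncio.run(coro_fn(*args, **kwargs))

    return sync_wrapper


async def double(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2


sync_double = make_sync_wrapper(double)

# Path 1: no running loop in the calling thread.
result_plain = sync_double(21)


async def call_from_loop() -> int:
    # Path 2: called while a loop is running. This blocks the loop thread
    # until the worker finishes, which is the trade-off of this approach.
    return sync_double(4)


result_in_loop = asyncio.run(call_from_loop())
_EXECUTOR.shutdown(wait=True)
```

The second path keeps sync callers working inside async code, at the cost of blocking the calling loop for the duration of the tool call.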
@@ -8,6 +8,7 @@ from deerflow.reflection import resolve_variable
 from deerflow.sandbox.security import is_host_bash_allowed
 from deerflow.tools.builtins import ask_clarification_tool, present_file_tool, task_tool, view_image_tool
 from deerflow.tools.builtins.tool_search import reset_deferred_registry
+from deerflow.tools.sync import make_sync_tool_wrapper

 logger = logging.getLogger(__name__)

@@ -33,6 +34,13 @@ def _is_host_bash_tool(tool: object) -> bool:
    return False


+def _ensure_sync_invocable_tool(tool: BaseTool) -> BaseTool:
+    """Attach a sync wrapper to async-only tools used by sync agent callers."""
+    if getattr(tool, "func", None) is None and getattr(tool, "coroutine", None) is not None:
+        tool.func = make_sync_tool_wrapper(tool.coroutine, tool.name)
+    return tool
+
+
 def get_available_tools(
    groups: list[str] | None = None,
    include_mcp: bool = True,
@@ -77,7 +85,7 @@ def get_available_tools(
                cfg.use,
            )

-    loaded_tools = [t for _, t in loaded_tools_raw]
+    loaded_tools = [_ensure_sync_invocable_tool(t) for _, t in loaded_tools_raw]

    # Conditionally add tools based on config
    builtin_tools = BUILTIN_TOOLS.copy()
@@ -0,0 +1,68 @@
+"""Shared helpers for user-isolation e2e tests on the custom-agent tooling.
+
+Centralises the small fake-LLM shim and a few test-data builders that the
+three e2e files in this PR (``test_setup_agent_e2e_user_isolation``,
+``test_update_agent_e2e_user_isolation``, ``test_setup_agent_http_e2e_real_server``)
+all need. The shim is what lets a real ``langchain.agents.create_agent``
+graph run without an API key — every other layer in those tests is real
+production code, which is the entire point of the test design.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+from langchain_core.language_models.fake_chat_models import FakeMessagesListChatModel
+from langchain_core.messages import AIMessage
+from langchain_core.runnables import Runnable
+
+
+class FakeToolCallingModel(FakeMessagesListChatModel):
+    """FakeMessagesListChatModel plus a no-op ``bind_tools`` for create_agent.
+
+    ``langchain.agents.create_agent`` calls ``model.bind_tools(...)`` to
+    expose the tool schemas to the model; the upstream fake raises
+    ``NotImplementedError`` there. We just return ``self`` because we
+    drive deterministic tool_call output via ``responses=...``, so no
+    schema handling is needed.
+    """
+
+    def bind_tools(  # type: ignore[override]
+        self,
+        tools: Any,
+        *,
+        tool_choice: Any = None,
+        **kwargs: Any,
+    ) -> Runnable:
+        return self
+
+
+def build_single_tool_call_model(
+    *,
+    tool_name: str,
+    tool_args: dict[str, Any],
+    tool_call_id: str = "call_e2e_1",
+    final_text: str = "done",
+) -> FakeToolCallingModel:
+    """Build a fake model that emits exactly one tool_call then finishes.
+
+    Two-turn behaviour, identical across our e2e tests:
+      turn 1 → AIMessage with a single tool_call for *tool_name*
+      turn 2 → AIMessage with *final_text* (terminates the agent loop)
+    """
+    return FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": tool_name,
+                        "args": tool_args,
+                        "id": tool_call_id,
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content=final_text),
+        ]
+    )
@@ -14,6 +14,10 @@ def _ai_with_tool_calls(tool_calls):
    return AIMessage(content="", tool_calls=tool_calls)


+def _ai_with_invalid_tool_calls(invalid_tool_calls):
+    return AIMessage(content="", tool_calls=[], invalid_tool_calls=invalid_tool_calls)
+
+
 def _tool_msg(tool_call_id, name="test_tool"):
    return ToolMessage(content="result", tool_call_id=tool_call_id, name=name)

@@ -22,6 +26,16 @@ def _tc(name="bash", tc_id="call_1"):
    return {"name": name, "id": tc_id, "args": {}}


+def _invalid_tc(name="write_file", tc_id="write_file:36", error="Failed to parse tool arguments: malformed JSON"):
+    return {
+        "type": "invalid_tool_call",
+        "name": name,
+        "id": tc_id,
+        "args": '{"description":"write report","path":"/mnt/user-data/outputs/report.md","content":"bad {"json"}"}',
+        "error": error,
+    }
+
+
 class TestBuildPatchedMessagesNoPatch:
    def test_empty_messages(self):
        mw = DanglingToolCallMiddleware()
@@ -144,6 +158,42 @@ class TestBuildPatchedMessagesPatching:
        assert patched[1].name == "bash"
        assert patched[1].status == "error"

+    def test_invalid_tool_call_is_patched(self):
+        mw = DanglingToolCallMiddleware()
+        msgs = [_ai_with_invalid_tool_calls([_invalid_tc()])]
+        patched = mw._build_patched_messages(msgs)
+        assert patched is not None
+        assert len(patched) == 2
+        assert isinstance(patched[1], ToolMessage)
+        assert patched[1].tool_call_id == "write_file:36"
+        assert patched[1].name == "write_file"
+        assert patched[1].status == "error"
+        assert "arguments were invalid" in patched[1].content
+        assert "Failed to parse tool arguments" in patched[1].content
+
+    def test_valid_and_invalid_tool_calls_are_both_patched(self):
+        mw = DanglingToolCallMiddleware()
+        msgs = [
+            AIMessage(
+                content="",
+                tool_calls=[_tc("bash", "call_1")],
+                invalid_tool_calls=[_invalid_tc()],
+            )
+        ]
+        patched = mw._build_patched_messages(msgs)
+        assert patched is not None
+        tool_msgs = [m for m in patched if isinstance(m, ToolMessage)]
+        assert len(tool_msgs) == 2
+        assert {tm.tool_call_id for tm in tool_msgs} == {"call_1", "write_file:36"}
+
+    def test_invalid_tool_call_already_responded_is_not_patched(self):
+        mw = DanglingToolCallMiddleware()
+        msgs = [
+            _ai_with_invalid_tool_calls([_invalid_tc()]),
+            _tool_msg("write_file:36", "write_file"),
+        ]
+        assert mw._build_patched_messages(msgs) is None
+

 class TestWrapModelCall:
    def test_no_patch_passthrough(self):
@@ -122,3 +122,45 @@ def test_health_still_works_when_docs_disabled():
        resp = client.get("/health")
        assert resp.status_code == 200
        assert resp.json()["status"] == "healthy"
+
+
+# ---------------------------------------------------------------------------
+# Runtime CORS behavior
+# ---------------------------------------------------------------------------
+
+
+def _make_gateway_client(cors_origins: str) -> TestClient:
+    with patch.dict(os.environ, {"GATEWAY_CORS_ORIGINS": cors_origins}):
+        _reset_gateway_config()
+        from app.gateway.app import create_app
+
+        return TestClient(create_app())
+
+
+def test_gateway_cors_allows_configured_origin():
+    """GATEWAY_CORS_ORIGINS should control actual browser CORS responses."""
+    client = _make_gateway_client("https://app.example")
+
+    response = client.get("/health", headers={"Origin": "https://app.example"})
+
+    assert response.status_code == 200
+    assert response.headers["access-control-allow-origin"] == "https://app.example"
+    assert response.headers["access-control-allow-credentials"] == "true"
+
+
+def test_gateway_cors_rejects_unconfigured_origin():
+    client = _make_gateway_client("https://app.example")
+
+    response = client.get("/health", headers={"Origin": "https://evil.example"})
+
+    assert response.status_code == 200
+    assert "access-control-allow-origin" not in response.headers
+
+
+def test_gateway_cors_normalizes_configured_default_port():
+    client = _make_gateway_client("https://app.example:443")
+
+    response = client.get("/health", headers={"Origin": "https://app.example"})
+
+    assert response.status_code == 200
+    assert response.headers["access-control-allow-origin"] == "https://app.example"
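
Review note: the last test expects a configured `https://app.example:443` to match a browser `Origin` of `https://app.example`. The gateway's actual normalization code is not shown in this hunk, so the following is only a sketch of one way such matching could work, assuming origins are compared after dropping the scheme's default port. `normalize_origin` is a hypothetical helper name, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit

# Default ports that browsers omit from the Origin header.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_origin(origin: str) -> str:
    """Lowercase scheme and host and drop the port when it is the scheme default."""
    parts = urlsplit(origin.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == _DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


configured = {normalize_origin(o) for o in "https://app.example:443,http://localhost:3000".split(",")}
```

With this normalization, a request origin is allowed when `normalize_origin(request_origin) in configured`.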
@@ -53,6 +53,29 @@ def test_nginx_routes_official_langgraph_prefix_to_gateway_api():
        assert "proxy_pass http://gateway" in content or "proxy_pass http://$gateway_upstream" in content


+def test_nginx_defers_cors_to_gateway_allowlist():
+    for path in ("docker/nginx/nginx.local.conf", "docker/nginx/nginx.conf"):
+        content = _read(path)
+
+        assert "Access-Control-Allow-Origin" not in content
+        assert "Access-Control-Allow-Methods" not in content
+        assert "Access-Control-Allow-Headers" not in content
+        assert "Access-Control-Allow-Credentials" not in content
+        assert "proxy_hide_header 'Access-Control-Allow-" not in content
+        assert "if ($request_method = 'OPTIONS')" not in content
+
+
+def test_gateway_cors_configuration_uses_gateway_allowlist():
+    gateway_config = _read("backend/app/gateway/config.py")
+    gateway_app = _read("backend/app/gateway/app.py")
+    csrf_middleware = _read("backend/app/gateway/csrf_middleware.py")
+
+    assert not re.search(r"(?<!GATEWAY_)[\"']CORS_ORIGINS[\"']", gateway_config)
+    assert "cors_origins" not in gateway_config
+    assert "get_configured_cors_origins" in gateway_app
+    assert "GATEWAY_CORS_ORIGINS" in csrf_middleware
+
+
 def test_frontend_rewrites_langgraph_prefix_to_gateway():
    next_config = _read("frontend/next.config.js")
    api_client = _read("frontend/src/core/api/api-client.ts")
@@ -639,3 +639,148 @@ class TestLocalSandboxProviderMounts:
            provider = LocalSandboxProvider()

        assert [m.container_path for m in provider._path_mappings] == ["/mnt/skills", "/mnt/data"]
+
+
+class TestLocalSandboxProviderResetClearsSingleton:
+    """Regression coverage for issue #2815.
+
+    The module-level LocalSandbox singleton must be cleared whenever the
+    provider is reset or shut down — otherwise stale path mappings and
+    mount policy survive config reloads and test teardown.
+    """
+
+    def _build_config(self, skills_dir, mounts):
+        from deerflow.config.sandbox_config import SandboxConfig
+
+        sandbox_config = SandboxConfig(
+            use="deerflow.sandbox.local:LocalSandboxProvider",
+            mounts=mounts,
+        )
+        return SimpleNamespace(
+            skills=SimpleNamespace(
+                container_path="/mnt/skills",
+                get_skills_path=lambda: skills_dir,
+                use="deerflow.skills.storage.local_skill_storage:LocalSkillStorage",
+            ),
+            sandbox=sandbox_config,
+        )
+
+    def test_reset_sandbox_provider_clears_local_singleton(self, tmp_path):
+        from deerflow.config.sandbox_config import VolumeMountConfig
+        from deerflow.sandbox import local as local_module
+        from deerflow.sandbox.local import local_sandbox_provider as lsp_module
+        from deerflow.sandbox.sandbox_provider import (
+            get_sandbox_provider,
+            reset_sandbox_provider,
+        )
+
+        skills_dir = tmp_path / "skills"
+        skills_dir.mkdir()
+        first_dir = tmp_path / "first"
+        first_dir.mkdir()
+        second_dir = tmp_path / "second"
+        second_dir.mkdir()
+
+        first_cfg = self._build_config(
+            skills_dir,
+            [VolumeMountConfig(host_path=str(first_dir), container_path="/mnt/first", read_only=False)],
+        )
+        second_cfg = self._build_config(
+            skills_dir,
+            [VolumeMountConfig(host_path=str(second_dir), container_path="/mnt/second", read_only=False)],
+        )
+
+        # Make sure no leftover singleton from a prior test interferes.
+        lsp_module._singleton = None
+        reset_sandbox_provider()
+
+        try:
+            with patch("deerflow.sandbox.sandbox_provider.get_app_config", return_value=first_cfg), patch("deerflow.config.get_app_config", return_value=first_cfg):
+                provider = get_sandbox_provider()
+                provider.acquire()
+
+            assert lsp_module._singleton is not None
+            first_container_paths = {m.container_path for m in lsp_module._singleton.path_mappings}
+            assert "/mnt/first" in first_container_paths
+
+            reset_sandbox_provider()
+
+            # The whole point of the regression: reset must drop the cached LocalSandbox.
+            assert lsp_module._singleton is None
+
+            with patch("deerflow.sandbox.sandbox_provider.get_app_config", return_value=second_cfg), patch("deerflow.config.get_app_config", return_value=second_cfg):
+                provider2 = get_sandbox_provider()
+                provider2.acquire()
+
+            assert provider2 is not provider
+            second_container_paths = {m.container_path for m in lsp_module._singleton.path_mappings}
+            assert "/mnt/second" in second_container_paths
+            assert "/mnt/first" not in second_container_paths
+        finally:
+            lsp_module._singleton = None
+            reset_sandbox_provider()
+
+        # Sanity: the local sandbox module still exposes the singleton symbol
+        # at the same module path (guards against accidental rename).
+        assert hasattr(local_module.local_sandbox_provider, "_singleton")
+
+    def test_shutdown_sandbox_provider_clears_local_singleton(self, tmp_path):
+        from deerflow.config.sandbox_config import VolumeMountConfig
+        from deerflow.sandbox.local import local_sandbox_provider as lsp_module
+        from deerflow.sandbox.sandbox_provider import (
+            get_sandbox_provider,
+            reset_sandbox_provider,
+            shutdown_sandbox_provider,
+        )
+
+        skills_dir = tmp_path / "skills"
+        skills_dir.mkdir()
+        mount_dir = tmp_path / "mount"
+        mount_dir.mkdir()
+
+        cfg = self._build_config(
+            skills_dir,
+            [VolumeMountConfig(host_path=str(mount_dir), container_path="/mnt/data", read_only=False)],
+        )
+
+        lsp_module._singleton = None
+        reset_sandbox_provider()
+
+        try:
+            with patch("deerflow.sandbox.sandbox_provider.get_app_config", return_value=cfg), patch("deerflow.config.get_app_config", return_value=cfg):
+                provider = get_sandbox_provider()
+                provider.acquire()
+
+            assert lsp_module._singleton is not None
+
+            shutdown_sandbox_provider()
+
+            assert lsp_module._singleton is None
+        finally:
+            lsp_module._singleton = None
+            reset_sandbox_provider()
+
+    def test_provider_reset_method_is_idempotent(self, tmp_path):
+        from deerflow.sandbox.local import local_sandbox_provider as lsp_module
+        from deerflow.sandbox.local.local_sandbox_provider import LocalSandboxProvider
+
+        skills_dir = tmp_path / "skills"
+        skills_dir.mkdir()
+        cfg = self._build_config(skills_dir, [])
+
+        lsp_module._singleton = None
+
+        try:
+            with patch("deerflow.config.get_app_config", return_value=cfg):
+                provider = LocalSandboxProvider()
+                provider.acquire()
+            assert lsp_module._singleton is not None
+
+            provider.reset()
+            assert lsp_module._singleton is None
+
+            # Calling reset again on an already-cleared singleton is safe.
+            provider.reset()
+            assert lsp_module._singleton is None
+        finally:
+            lsp_module._singleton = None
@@ -5,7 +5,8 @@ import pytest
 from langchain_core.tools import StructuredTool
 from pydantic import BaseModel, Field

-from deerflow.mcp.tools import _make_sync_tool_wrapper, get_mcp_tools
+from deerflow.mcp.tools import get_mcp_tools
+from deerflow.tools.sync import make_sync_tool_wrapper


 class MockArgs(BaseModel):
@@ -51,14 +52,13 @@ def test_mcp_tool_sync_wrapper_generation():


 def test_mcp_tool_sync_wrapper_in_running_loop():
-    """Test the actual helper function from production code (Fix for Comment 1 & 3)."""
+    """Test the shared sync wrapper from production code."""

    async def mock_coro(x: int):
        await asyncio.sleep(0.01)
        return f"async_result: {x}"

-    # Test the real helper function exported from deerflow.mcp.tools
-    sync_func = _make_sync_tool_wrapper(mock_coro, "test_tool")
+    sync_func = make_sync_tool_wrapper(mock_coro, "test_tool")

    async def run_in_loop():
        # This call should succeed due to ThreadPoolExecutor in the real helper
@@ -70,16 +70,16 @@ def test_mcp_tool_sync_wrapper_in_running_loop():


 def test_mcp_tool_sync_wrapper_exception_logging():
-    """Test the actual helper's error logging (Fix for Comment 3)."""
+    """Test the shared sync wrapper's error logging."""

    async def error_coro():
        raise ValueError("Tool failure")

-    sync_func = _make_sync_tool_wrapper(error_coro, "error_tool")
+    sync_func = make_sync_tool_wrapper(error_coro, "error_tool")

-    with patch("deerflow.mcp.tools.logger.error") as mock_log_error:
+    with patch("deerflow.tools.sync.logger.error") as mock_log_error:
        with pytest.raises(ValueError, match="Tool failure"):
            sync_func()
        mock_log_error.assert_called_once()
        # Verify the tool name is in the log message
-        assert "error_tool" in mock_log_error.call_args[0][0]
+        assert mock_log_error.call_args[0][1] == "error_tool"
@@ -339,6 +339,99 @@ class TestConvenienceFields:
        data = j.get_completion_data()
        assert data["first_human_message"] == "What is AI?"

+    @pytest.mark.anyio
+    async def test_completion_data_counts_human_ai_and_tool_messages(self, journal_setup):
+        from langchain_core.messages import HumanMessage, ToolMessage
+
+        j, _ = journal_setup
+        j.on_chat_model_start({}, [[HumanMessage(content="Question")]], run_id=uuid4(), tags=["lead_agent"])
+        j.on_llm_end(_make_llm_response("Answer"), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+        j.on_tool_end(ToolMessage(content="Tool result", tool_call_id="call_1", name="search"), run_id=uuid4())
+
+        data = j.get_completion_data()
+
+        assert data["message_count"] == 3
+        assert data["first_human_message"] == "Question"
+        assert data["last_ai_message"] == "Answer"
+
+    @pytest.mark.anyio
+    async def test_tool_call_only_ai_does_not_clear_last_ai_message(self, journal_setup):
+        j, _ = journal_setup
+        j.on_llm_end(_make_llm_response("Useful answer"), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+        j.on_llm_end(
+            _make_llm_response("", tool_calls=[{"id": "call_1", "name": "search", "args": {}}]),
+            run_id=uuid4(),
+            parent_run_id=None,
+            tags=["lead_agent"],
+        )
+
+        data = j.get_completion_data()
+
+        assert data["message_count"] == 2
+        assert data["last_ai_message"] == "Useful answer"
+
+    @pytest.mark.anyio
+    async def test_last_ai_message_extracts_mixed_content_without_extra_newlines(self, journal_setup):
+        j, _ = journal_setup
+        j.on_llm_end(
+            _make_llm_response(
+                [
+                    {"type": "text", "text": "First "},
+                    {"type": "text", "content": "second"},
+                    " third",
+                    {"type": "image", "url": "ignored"},
+                ]
+            ),
+            run_id=uuid4(),
+            parent_run_id=None,
+            tags=["lead_agent"],
+        )
+
+        data = j.get_completion_data()
+
+        assert data["message_count"] == 1
+        assert data["last_ai_message"] == "First second third"
+
+    @pytest.mark.anyio
+    async def test_last_ai_message_extracts_mapping_content(self, journal_setup):
+        j, _ = journal_setup
+        j.on_llm_end(_make_llm_response({"content": "Nested answer"}), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+
+        data = j.get_completion_data()
+
+        assert data["message_count"] == 1
+        assert data["last_ai_message"] == "Nested answer"
+
+    @pytest.mark.anyio
+    async def test_duplicate_llm_run_id_does_not_double_count_message_summary(self, journal_setup):
+        j, _ = journal_setup
+        run_id = uuid4()
+
+        j.on_llm_end(_make_llm_response("Answer", usage=None), run_id=run_id, parent_run_id=None, tags=["lead_agent"])
+        j.on_llm_end(
+            _make_llm_response("Answer", usage={"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}),
+            run_id=run_id,
+            parent_run_id=None,
+            tags=["lead_agent"],
+        )
+
+        data = j.get_completion_data()
+
+        assert data["message_count"] == 1
+        assert data["last_ai_message"] == "Answer"
+        assert data["total_tokens"] == 15
+
+    @pytest.mark.anyio
+    async def test_subagent_ai_does_not_overwrite_lead_last_ai_message(self, journal_setup):
+        j, _ = journal_setup
+        j.on_llm_end(_make_llm_response("Lead answer"), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+        j.on_llm_end(_make_llm_response("Subagent detail"), run_id=uuid4(), parent_run_id=None, tags=["subagent:research"])
+
+        data = j.get_completion_data()
+
+        assert data["message_count"] == 2
+        assert data["last_ai_message"] == "Lead answer"
+
    @pytest.mark.anyio
    async def test_get_completion_data(self, journal_setup):
        j, _ = journal_setup
@@ -383,6 +476,244 @@ class TestMiddlewareEvents:
        assert "middleware:guardrail" in event_types


+class TestCallerBucketing:
+    """Tests for caller-bucketed token accumulation (lead_agent / subagent / middleware)."""
+
+    def test_lead_agent_bucketing(self, journal_setup):
+        j, _ = journal_setup
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        j.on_llm_end(_make_llm_response("A", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+        assert j._lead_agent_tokens == 15
+        assert j._subagent_tokens == 0
+        assert j._middleware_tokens == 0
+
+    def test_subagent_bucketing(self, journal_setup):
+        j, _ = journal_setup
+        usage = {"input_tokens": 20, "output_tokens": 10, "total_tokens": 30}
+        j.on_llm_end(_make_llm_response("B", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["subagent:research"])
+        assert j._subagent_tokens == 30
+        assert j._lead_agent_tokens == 0
+        assert j._middleware_tokens == 0
+
+    def test_middleware_bucketing(self, journal_setup):
+        j, _ = journal_setup
+        usage = {"input_tokens": 5, "output_tokens": 2, "total_tokens": 7}
+        j.on_llm_end(_make_llm_response("C", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["middleware:summarize"])
+        assert j._middleware_tokens == 7
+        assert j._lead_agent_tokens == 0
+        assert j._subagent_tokens == 0
+
+    def test_mixed_callers_sum_independently(self, journal_setup):
+        j, _ = journal_setup
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        j.on_llm_end(_make_llm_response("A", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+        j.on_llm_end(_make_llm_response("B", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["subagent:bash"])
+        j.on_llm_end(_make_llm_response("C", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["middleware:title"])
+        assert j._lead_agent_tokens == 15
+        assert j._subagent_tokens == 15
+        assert j._middleware_tokens == 15
+        assert j._total_tokens == 45
+
+    def test_get_completion_data_includes_buckets(self, journal_setup):
+        j, _ = journal_setup
+        j._lead_agent_tokens = 100
+        j._subagent_tokens = 200
+        j._middleware_tokens = 50
+        data = j.get_completion_data()
+        assert data["lead_agent_tokens"] == 100
+        assert data["subagent_tokens"] == 200
+        assert data["middleware_tokens"] == 50
+
+    def test_dedup_same_run_id(self, journal_setup):
+        """Same langchain run_id in on_llm_end must not double-count."""
+        j, _ = journal_setup
+        run_id = uuid4()
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        j.on_llm_end(_make_llm_response("A", usage=usage), run_id=run_id, parent_run_id=None, tags=["lead_agent"])
+        j.on_llm_end(_make_llm_response("A", usage=usage), run_id=run_id, parent_run_id=None, tags=["lead_agent"])
+        assert j._total_tokens == 15
+        assert j._lead_agent_tokens == 15
+        assert j._llm_call_count == 1
+
+    def test_first_no_usage_second_with_usage(self, journal_setup):
+        """First callback with no usage must not block second callback with usage for same run_id."""
+        j, _ = journal_setup
+        run_id = uuid4()
+        j.on_llm_end(_make_llm_response("A", usage=None), run_id=run_id, parent_run_id=None, tags=["lead_agent"])
+        assert str(run_id) not in j._counted_llm_run_ids
+        # Second callback for the same run_id with actual usage must still count
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        j.on_llm_end(_make_llm_response("A", usage=usage), run_id=run_id, parent_run_id=None, tags=["lead_agent"])
+        assert j._total_tokens == 15
+        assert j._lead_agent_tokens == 15
+
+    def test_track_token_usage_false_skips_buckets(self):
+        """When token tracking is disabled, caller buckets stay at 0."""
+        store = MemoryRunEventStore()
+        j = RunJournal("r1", "t1", store, track_token_usage=False, flush_threshold=100)
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        j.on_llm_end(_make_llm_response("X", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["subagent:research"])
+        assert j._subagent_tokens == 0
+        assert j._lead_agent_tokens == 0
+
+    def test_default_no_tags_buckets_as_lead_agent(self, journal_setup):
+        """LLM calls without explicit tags default to lead_agent bucket."""
+        j, _ = journal_setup
+        usage = {"input_tokens": 5, "output_tokens": 5, "total_tokens": 10}
+        j.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4(), parent_run_id=None)
+        assert j._lead_agent_tokens == 10
+        assert j._subagent_tokens == 0
+        assert j._middleware_tokens == 0
+
+    def test_unknown_tag_buckets_as_lead_agent(self, journal_setup):
+        """Calls with unrecognized tags (not lead_agent/subagent:/middleware:) go to lead_agent."""
+        j, _ = journal_setup
+        usage = {"input_tokens": 5, "output_tokens": 5, "total_tokens": 10}
+        j.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["some_random_tag"])
+        assert j._lead_agent_tokens == 10
+
+
+class TestExternalUsageRecords:
+    """Tests for record_external_llm_usage_records."""
+
+    def test_records_added_to_subagent_bucket(self, journal_setup):
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "ext-1",
+                "caller": "subagent:general-purpose",
+                "input_tokens": 100,
+                "output_tokens": 50,
+                "total_tokens": 150,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._subagent_tokens == 150
+        assert j._total_tokens == 150
+        assert j._total_input_tokens == 100
+        assert j._total_output_tokens == 50
+
+    def test_records_added_to_middleware_bucket(self, journal_setup):
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "ext-2",
+                "caller": "middleware:summarize",
+                "input_tokens": 30,
+                "output_tokens": 10,
+                "total_tokens": 40,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._middleware_tokens == 40
+        assert j._lead_agent_tokens == 0
+        assert j._subagent_tokens == 0
+
+    def test_records_added_to_lead_agent_bucket(self, journal_setup):
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "ext-3",
+                "caller": "lead_agent",
+                "input_tokens": 10,
+                "output_tokens": 5,
+                "total_tokens": 15,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._lead_agent_tokens == 15
+
+    def test_dedup_same_source_run_id(self, journal_setup):
+        """Same source_run_id must not be double-counted."""
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "dup-1",
+                "caller": "subagent:research",
+                "input_tokens": 50,
+                "output_tokens": 25,
+                "total_tokens": 75,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        j.record_external_llm_usage_records(records)
+        assert j._subagent_tokens == 75
+        assert j._total_tokens == 75
+
+    def test_total_tokens_missing_computed_from_input_output(self, journal_setup):
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "ext-4",
+                "caller": "subagent:bash",
+                "input_tokens": 200,
+                "output_tokens": 100,
+                "total_tokens": 0,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._subagent_tokens == 300
+        assert j._total_tokens == 300
+
+    def test_total_tokens_zero_no_count(self, journal_setup):
+        """Records with zero total and zero input+output must not be counted."""
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "ext-5",
+                "caller": "subagent:research",
+                "input_tokens": 0,
+                "output_tokens": 0,
+                "total_tokens": 0,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._total_tokens == 0
+        assert j._subagent_tokens == 0
+
+    def test_empty_source_run_id_skipped(self, journal_setup):
+        j, _ = journal_setup
+        records = [
+            {
+                "source_run_id": "",
+                "caller": "subagent:research",
+                "input_tokens": 50,
+                "output_tokens": 25,
+                "total_tokens": 75,
+            }
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._total_tokens == 0
+
+    def test_multiple_records_in_single_call(self, journal_setup):
+        j, _ = journal_setup
+        records = [
+            {"source_run_id": "r1", "caller": "subagent:gp", "input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
+            {"source_run_id": "r2", "caller": "subagent:bash", "input_tokens": 20, "output_tokens": 10, "total_tokens": 30},
+        ]
+        j.record_external_llm_usage_records(records)
+        assert j._subagent_tokens == 45
+        assert j._total_tokens == 45
+
+    def test_external_records_coexist_with_inline_callbacks(self, journal_setup):
+        """External records and inline on_llm_end must not interfere."""
+        j, _ = journal_setup
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        j.on_llm_end(_make_llm_response("A", usage=usage), run_id=uuid4(), parent_run_id=None, tags=["lead_agent"])
+        j.record_external_llm_usage_records([{"source_run_id": "ext-6", "caller": "subagent:gp", "input_tokens": 100, "output_tokens": 50, "total_tokens": 150}])
+        assert j._lead_agent_tokens == 15
+        assert j._subagent_tokens == 150
+        assert j._total_tokens == 165
+
+    def test_track_token_usage_false_skips_external_records(self):
+        """When token tracking is disabled, external records must not accumulate."""
+        store = MemoryRunEventStore()
+        j = RunJournal("r1", "t1", store, track_token_usage=False, flush_threshold=100)
+        j.record_external_llm_usage_records([{"source_run_id": "ext-7", "caller": "subagent:gp", "input_tokens": 100, "output_tokens": 50, "total_tokens": 150}])
+        assert j._total_tokens == 0
+        assert j._subagent_tokens == 0
+
+
 class TestChatModelStartHumanMessage:
    """Tests for on_chat_model_start extracting the first human message."""

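Review note: the bucketing and dedup rules that `TestCallerBucketing` and `TestExternalUsageRecords` pin down can be summarised in a small standalone sketch. This is not the `RunJournal` implementation. `classify_caller` and `TokenBuckets` are hypothetical names, and the single shared dedup set is an assumption made for brevity. It only mirrors the behaviour the assertions above describe: prefix-based buckets, lead_agent as the default, one count per id, a fallback to input + output when `total_tokens` is zero, and no dedup marker when a callback carries no usage.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def classify_caller(tags: list[str] | None) -> str:
    """Map caller tags to a bucket name (hypothetical helper)."""
    for tag in tags or []:
        if tag.startswith("subagent:"):
            return "subagent"
        if tag.startswith("middleware:"):
            return "middleware"
    # "lead_agent", unknown tags, and missing tags all land in lead_agent.
    return "lead_agent"


@dataclass
class TokenBuckets:
    """Illustrative accumulator; an assumption, not the real RunJournal."""

    totals: dict[str, int] = field(
        default_factory=lambda: {"lead_agent": 0, "subagent": 0, "middleware": 0}
    )
    counted_ids: set[str] = field(default_factory=set)

    def add(self, source_id: str, tags: list[str] | None, usage: dict | None) -> bool:
        """Count usage at most once per source_id; return True if counted."""
        if not source_id or not usage:
            return False
        # Fall back to input + output when total_tokens is missing or zero.
        total = usage.get("total_tokens") or (
            usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        )
        if total <= 0 or source_id in self.counted_ids:
            return False
        # Mark the id only once usage is actually counted, so an earlier
        # usage-less callback cannot block a later one for the same id.
        self.counted_ids.add(source_id)
        self.totals[classify_caller(tags)] += total
        return True

    @property
    def total(self) -> int:
        return sum(self.totals.values())
```

An external record would map onto this sketch as `buckets.add(record["source_run_id"], [record["caller"]], record)`, since the `caller` strings in the tests use the same `subagent:` / `middleware:` prefixes as the callback tags.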
@@ -5,6 +5,7 @@ import re
 import pytest

 from deerflow.runtime import RunManager, RunStatus
+from deerflow.runtime.runs.store.memory import MemoryRunStore

 ISO_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

@@ -141,3 +142,53 @@ async def test_create_defaults(manager: RunManager):
    assert record.kwargs == {}
    assert record.multitask_strategy == "reject"
    assert record.assistant_id is None
+
+
+@pytest.mark.anyio
+async def test_model_name_create_or_reject():
+    """create_or_reject should accept and persist model_name."""
+    from deerflow.runtime.runs.schemas import DisconnectMode
+
+    store = MemoryRunStore()
+    mgr = RunManager(store=store)
+
+    record = await mgr.create_or_reject(
+        "thread-1",
+        assistant_id="lead_agent",
+        on_disconnect=DisconnectMode.cancel,
+        metadata={"key": "val"},
+        kwargs={"input": {}},
+        multitask_strategy="reject",
+        model_name="anthropic.claude-sonnet-4-20250514-v1:0",
+    )
+    assert record.model_name == "anthropic.claude-sonnet-4-20250514-v1:0"
+    assert record.status == RunStatus.pending
+
+    # Verify model_name was persisted to store
+    stored = await store.get(record.run_id)
+    assert stored is not None
+    assert stored["model_name"] == "anthropic.claude-sonnet-4-20250514-v1:0"
+
+    # Verify retrieval returns the model_name via in-memory record
+    fetched = mgr.get(record.run_id)
+    assert fetched is not None
+    assert fetched.model_name == "anthropic.claude-sonnet-4-20250514-v1:0"
+
+
+@pytest.mark.anyio
+async def test_model_name_default_is_none():
+    """create_or_reject without model_name should default to None."""
+    from deerflow.runtime.runs.schemas import DisconnectMode
+
+    store = MemoryRunStore()
+    mgr = RunManager(store=store)
+
+    record = await mgr.create_or_reject(
+        "thread-1",
+        on_disconnect=DisconnectMode.cancel,
+        model_name=None,
+    )
+    assert record.model_name is None
+
+    stored = await store.get(record.run_id)
+    assert stored["model_name"] is None
@@ -249,3 +249,32 @@ class TestRunRepository:
        rows = await repo.list_by_thread("t1", user_id=None)
        assert len(rows) == 2
        await _cleanup()
+
+    @pytest.mark.anyio
+    async def test_model_name_persistence(self, tmp_path):
+        """RunRepository should persist, normalize, and truncate model_name correctly via SQL."""
+        from deerflow.persistence.engine import get_session_factory, init_engine
+
+        url = f"sqlite+aiosqlite:///{tmp_path / 'test.db'}"
+        await init_engine("sqlite", url=url, sqlite_dir=str(tmp_path))
+        repo = RunRepository(get_session_factory())
+
+        await repo.put("run-1", thread_id="thread-1", model_name="gpt-4o")
+        row = await repo.get("run-1")
+        assert row is not None
+        assert row["model_name"] == "gpt-4o"
+
+        long_name = "a" * 200
+        await repo.put("run-2", thread_id="thread-1", model_name=long_name)
+        row2 = await repo.get("run-2")
+        assert row2["model_name"] == "a" * 128
+
+        await repo.put("run-3", thread_id="thread-1", model_name=123)
+        row3 = await repo.get("run-3")
+        assert row3["model_name"] == "123"
+
+        await repo.put("run-4", thread_id="thread-1", model_name=None)
+        row4 = await repo.get("run-4")
+        assert row4["model_name"] is None
+
+        await _cleanup()
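Review note: `test_model_name_persistence` implies a normalization step before the value reaches the column. A minimal sketch of that contract follows, assuming a 128-character limit (taken from the test's expectation, not from the schema) and a hypothetical helper name `normalize_model_name`. The actual repository code may do this differently.

```python
from __future__ import annotations

MAX_MODEL_NAME_LEN = 128  # assumed limit, inferred from the test's "a" * 128


def normalize_model_name(value: object | None, max_len: int = MAX_MODEL_NAME_LEN) -> str | None:
    """Coerce a model name to str and truncate it; None passes through."""
    if value is None:
        return None
    return str(value)[:max_len]
```

The four cases in the test correspond to `normalize_model_name("gpt-4o")`, `normalize_model_name("a" * 200)`, `normalize_model_name(123)`, and `normalize_model_name(None)`.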
@@ -0,0 +1,429 @@
+"""End-to-end verification for issue #2862 (and the regression of #2782).
+
+Goal: prove — without trusting any single layer's claim — that an authenticated
+user creating a custom agent through the real ``setup_agent`` tool, driven by a
+real LangGraph ``create_agent`` graph, ends up with files under
+``users/<auth_uid>/agents/<name>`` and **not** under ``users/default/agents/...``.
+
+We intentionally exercise the full pipeline:
+
+    HTTP body shape (mimics LangGraph SDK wire format)
+      -> app.gateway.services.start_run config-assembly chain
+      -> deerflow.runtime.runs.worker._build_runtime_context
+      -> langchain.agents.create_agent graph
+      -> ToolNode dispatch
+      -> setup_agent tool
+
+The only thing we mock is the LLM (FakeToolCallingModel); every layer
+that handles ``user_id`` is the real production code path. If the
+``user_id`` propagation is broken anywhere in this chain, these tests will
+fail.
+
+These tests are intentionally marked ``no_auto_user`` so that the ``contextvar``
+fallback would put files into ``default/`` if propagation breaks.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from types import SimpleNamespace
+from unittest.mock import patch
+from uuid import UUID
+
+import pytest
+from _agent_e2e_helpers import FakeToolCallingModel
+from langchain_core.messages import AIMessage, HumanMessage
+
+from app.gateway.services import (
+    build_run_config,
+    inject_authenticated_user_context,
+    merge_run_context_overrides,
+)
+from deerflow.runtime.runs.worker import _build_runtime_context, _install_runtime_context
+
+# ---------------------------------------------------------------------------
+# Helpers — real production code paths
+# ---------------------------------------------------------------------------
+
+
+def _make_request(user_id_str: str | None) -> SimpleNamespace:
+    """Build a fake FastAPI Request that carries an authenticated user."""
+    if user_id_str is None:
+        user = None
+    else:
+        # User.id is UUID in production; honour that
+        user = SimpleNamespace(id=UUID(user_id_str), email="alice@local")
+    return SimpleNamespace(state=SimpleNamespace(user=user))
+
+
+def _assemble_config(
+    *,
+    body_config: dict | None,
+    body_context: dict | None,
+    request_user_id: str | None,
+    thread_id: str = "thread-e2e",
+    assistant_id: str = "lead_agent",
+) -> dict:
+    """Replay the **exact** start_run config-assembly sequence."""
+    config = build_run_config(thread_id, body_config, None, assistant_id=assistant_id)
+    merge_run_context_overrides(config, body_context)
+    inject_authenticated_user_context(config, _make_request(request_user_id))
+    return config
+
+
+def _make_paths_mock(tmp_path: Path):
+    """Mirror the production paths.user_agent_dir signature."""
+    from unittest.mock import MagicMock
+
+    paths = MagicMock()
+    paths.base_dir = tmp_path
+    paths.agent_dir = lambda name: tmp_path / "agents" / name
+    paths.user_agent_dir = lambda user_id, name: tmp_path / "users" / user_id / "agents" / name
+    return paths
+
+
+# ---------------------------------------------------------------------------
+# L1-L3: HTTP wire format → start_run → worker._build_runtime_context
+# ---------------------------------------------------------------------------
+
+
+class TestConfigAssembly:
+    """Covers L1-L3: validate that user_id reaches runtime_ctx for every wire shape."""
+
+    def test_typical_wire_format_user_id_in_runtime_ctx(self):
+        """Real frontend: body.config={recursion_limit}, body.context={agent_name,...}."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={"agent_name": "myagent", "is_bootstrap": True, "mode": "flash"},
+            request_user_id="11111111-2222-3333-4444-555555555555",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "11111111-2222-3333-4444-555555555555"
+        assert runtime_ctx["agent_name"] == "myagent"
+
+    def test_body_context_none_still_injects_user_id(self):
+        """If frontend omits body.context entirely, inject must still create it."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context=None,
+            request_user_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
+
+    def test_body_context_empty_dict_still_injects_user_id(self):
+        """body.context={} (falsy) path: inject must still produce user_id."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={},
+            request_user_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
+
+    def test_body_config_already_contains_context_field(self):
+        """body.config={'context': {...}} (LG 0.6 alt wire): inject still wins."""
+        config = _assemble_config(
+            body_config={"context": {"agent_name": "myagent"}, "recursion_limit": 1000},
+            body_context=None,
+            request_user_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
+
+    def test_client_supplied_user_id_is_overridden(self):
+        """Spoofed client user_id must be overwritten by inject (auth-trusted source)."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={"agent_name": "myagent", "user_id": "spoofed"},
+            request_user_id="11111111-2222-3333-4444-555555555555",
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert runtime_ctx["user_id"] == "11111111-2222-3333-4444-555555555555"
+
+    def test_unauthenticated_request_does_not_inject(self):
+        """If request.state.user is missing (impossible under fail-closed auth, but
+        verify defensively), inject must not write user_id and runtime_ctx must
+        therefore lack it — forcing the tool fallback path to reveal itself."""
+        config = _assemble_config(
+            body_config={"recursion_limit": 1000},
+            body_context={"agent_name": "myagent"},
+            request_user_id=None,
+        )
+        runtime_ctx = _build_runtime_context("thread-e2e", "run-1", config.get("context"), None)
+        assert "user_id" not in runtime_ctx
+
+
+# ---------------------------------------------------------------------------
+# L4-L7: Real LangGraph create_agent driving the real setup_agent tool
+# ---------------------------------------------------------------------------
+
+
+def _build_real_bootstrap_graph(authenticated_user_id: str):
+    """Construct a real LangGraph using create_agent + the real setup_agent tool.
+
+    The LLM is faked (FakeToolCallingModel) so we don't need an API key.
+    Everything else — ToolNode dispatch, runtime injection, middleware — is
+    the real production code path.
+    """
+    from langchain.agents import create_agent
+
+    from deerflow.tools.builtins.setup_agent_tool import setup_agent
+
+    # First model turn: emit a tool_call for setup_agent
+    # Second model turn (after tool result): final answer (terminates the loop)
+    fake_model = FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": "setup_agent",
+                        "args": {
+                            "soul": "# My E2E Agent\n\nA SOUL written by the model.",
+                            "description": "End-to-end test agent",
+                        },
+                        "id": "call_setup_1",
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content=f"Done. Agent created for user {authenticated_user_id}."),
+        ]
+    )
+
+    graph = create_agent(
+        model=fake_model,
+        tools=[setup_agent],
+        system_prompt="You are a bootstrap agent. Call setup_agent immediately.",
+    )
+    return graph
+
+
+@pytest.mark.no_auto_user
+@pytest.mark.asyncio
+async def test_real_graph_real_setup_agent_writes_to_authenticated_user_dir(tmp_path: Path):
+    """The smoking-gun test for issue #2862.
+
+    Under no_auto_user (contextvar = empty), if user_id propagation through
+    runtime.context is broken, setup_agent will fall back to DEFAULT_USER_ID
+    and write to users/default/agents/... The assertion that this directory
+    DOES NOT exist is what makes this test load-bearing.
+    """
+    from langgraph.runtime import Runtime
+
+    auth_uid = "abcdef01-2345-6789-abcd-ef0123456789"
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "e2e-agent", "is_bootstrap": True},
+        request_user_id=auth_uid,
+        thread_id="thread-e2e-1",
+    )
+
+    # Replay worker.run_agent's runtime construction. This is the key step:
+    # it is what makes ToolRuntime.context contain user_id when the tool
+    # actually fires.
+    runtime_ctx = _build_runtime_context("thread-e2e-1", "run-1", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_real_bootstrap_graph(auth_uid)
+
+    # Patch get_paths only (the file-system rooting); everything else is real
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        # Drive the real graph. This goes through real ToolNode + real Runtime merge.
+        final_state = await graph.ainvoke(
+            {"messages": [HumanMessage(content="Create an agent named e2e-agent")]},
+            config=config,
+        )
+
+    expected_dir = tmp_path / "users" / auth_uid / "agents" / "e2e-agent"
+    default_dir = tmp_path / "users" / "default" / "agents" / "e2e-agent"
+
+    # Load-bearing assertions:
+    assert expected_dir.exists(), f"Agent directory not found at the authenticated user's path. Expected: {expected_dir}. tmp_path tree: {[str(p) for p in tmp_path.rglob('*')]}"
+    assert (expected_dir / "SOUL.md").read_text() == "# My E2E Agent\n\nA SOUL written by the model."
+    assert (expected_dir / "config.yaml").exists()
+    assert not default_dir.exists(), "REGRESSION: agent landed under users/default/. user_id propagation broke somewhere between HTTP layer and ToolRuntime.context."
+
+    # And final state should reflect tool success
+    last = final_state["messages"][-1]
+    assert "Done" in (last.content if isinstance(last.content, str) else str(last.content))
+
+
+@pytest.mark.no_auto_user
+@pytest.mark.asyncio
+async def test_inject_failure_falls_back_to_default_proving_test_is_load_bearing(tmp_path: Path):
+    """Negative control: if inject does NOT happen (no user in request), and
+    contextvar is empty (no_auto_user), setup_agent must land in default/.
+
+    This proves the positive test is actually load-bearing — i.e. it would
+    have failed before PR #2784, not passed accidentally.
+    """
+    from langgraph.runtime import Runtime
+
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "fallback-agent", "is_bootstrap": True},
+        request_user_id=None,  # no auth — inject is a no-op
+        thread_id="thread-e2e-2",
+    )
+
+    runtime_ctx = _build_runtime_context("thread-e2e-2", "run-2", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_real_bootstrap_graph("does-not-matter")
+
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        await graph.ainvoke(
+            {"messages": [HumanMessage(content="Create fallback-agent")]},
+            config=config,
+        )
+
+    default_dir = tmp_path / "users" / "default" / "agents" / "fallback-agent"
+    assert default_dir.exists(), "Negative control failed: even without inject + contextvar, agent did not land in default/. The test infrastructure may not be reproducing the bug condition."
+
+
+# ---------------------------------------------------------------------------
+# L5: Sub-graph runtime propagation (the task tool case)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.no_auto_user
+@pytest.mark.asyncio
+async def test_subgraph_invocation_preserves_user_id_in_runtime(tmp_path: Path):
+    """When a parent graph invokes a child graph (the pattern used by
+    subagents), parent_runtime.merge() must keep user_id intact.
+
+    We construct a child graph that contains setup_agent and invoke it
+    directly with the parent's config. If LangGraph re-creates the Runtime
+    and drops user_id at the sub-graph boundary, this fails.
+    """
+    from langchain.agents import create_agent
+    from langgraph.runtime import Runtime
+
+    from deerflow.tools.builtins.setup_agent_tool import setup_agent
+
+    auth_uid = "deadbeef-0000-1111-2222-333344445555"
+
+    # Inner graph: same as the bootstrap flow
+    inner_model = FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": "setup_agent",
+                        "args": {"soul": "# Inner", "description": "subgraph"},
+                        "id": "call_inner_1",
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content="inner done"),
+        ]
+    )
+    inner_graph = create_agent(
+        model=inner_model,
+        tools=[setup_agent],
+        system_prompt="inner",
+    )
+
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "subgraph-agent", "is_bootstrap": True},
+        request_user_id=auth_uid,
+        thread_id="thread-e2e-3",
+    )
+    runtime_ctx = _build_runtime_context("thread-e2e-3", "run-3", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        # Direct sub-graph invoke (mimics what a subagent invocation looks like
+        # — distinct ainvoke call, but parent config carries the same runtime).
+        await inner_graph.ainvoke(
+            {"messages": [HumanMessage(content="Create subgraph-agent")]},
+            config=config,
+        )
+
+    expected_dir = tmp_path / "users" / auth_uid / "agents" / "subgraph-agent"
+    default_dir = tmp_path / "users" / "default" / "agents" / "subgraph-agent"
+    assert expected_dir.exists()
+    assert not default_dir.exists()
+
+
+# ---------------------------------------------------------------------------
+# L6: Sync tool path through ContextThreadPoolExecutor
+# ---------------------------------------------------------------------------
+
+
+def test_sync_tool_dispatch_through_thread_pool_uses_runtime_context(tmp_path: Path):
+    """setup_agent is a sync function. When dispatched through ToolNode's
+    ContextThreadPoolExecutor, runtime.context must still carry user_id:
+    not via copy_context propagation (which only carries contextvars), but
+    because it was passed in as the ToolRuntime constructor argument.
+    """
+    from langchain.agents import create_agent
+    from langgraph.runtime import Runtime
+
+    from deerflow.tools.builtins.setup_agent_tool import setup_agent
+
+    auth_uid = "11112222-3333-4444-5555-666677778888"
+
+    fake_model = FakeToolCallingModel(
+        responses=[
+            AIMessage(
+                content="",
+                tool_calls=[
+                    {
+                        "name": "setup_agent",
+                        "args": {"soul": "# Sync", "description": "sync path"},
+                        "id": "call_sync_1",
+                        "type": "tool_call",
+                    }
+                ],
+            ),
+            AIMessage(content="sync done"),
+        ]
+    )
+    graph = create_agent(model=fake_model, tools=[setup_agent], system_prompt="sync")
+
+    config = _assemble_config(
+        body_config={"recursion_limit": 50},
+        body_context={"agent_name": "sync-agent", "is_bootstrap": True},
+        request_user_id=auth_uid,
+        thread_id="thread-e2e-4",
+    )
+    runtime_ctx = _build_runtime_context("thread-e2e-4", "run-4", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    with patch(
+        "deerflow.tools.builtins.setup_agent_tool.get_paths",
+        return_value=_make_paths_mock(tmp_path),
+    ):
+        # Use SYNC invoke to hit the ContextThreadPoolExecutor path
+        graph.invoke(
+            {"messages": [HumanMessage(content="Create sync-agent")]},
+            config=config,
+        )
+
+    expected_dir = tmp_path / "users" / auth_uid / "agents" / "sync-agent"
+    default_dir = tmp_path / "users" / "default" / "agents" / "sync-agent"
+    assert expected_dir.exists()
+    assert not default_dir.exists()
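The sync-path docstring above distinguishes between state carried by contextvars and state passed as an explicit argument. As a rough illustration only (it uses a plain `ThreadPoolExecutor` from the standard library, not LangGraph's `ContextThreadPoolExecutor`, and the `current_user` variable and both reader functions are hypothetical names), the difference can be sketched like this:

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an ambient "current user" variable.
current_user: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_user", default="default"
)


def read_ambient() -> str:
    """Read the user from the ambient ContextVar."""
    return current_user.get()


def read_explicit(context: dict) -> str:
    """Read the user from a context object handed in as an argument."""
    return context["user_id"]


def demo() -> tuple[str, str, str]:
    current_user.set("alice")
    with ThreadPoolExecutor(max_workers=1) as pool:
        # A plain submit runs in the worker thread's own context, so the
        # value set above is typically not visible there.
        plain = pool.submit(read_ambient).result()
        # Copying the caller's context and running inside it carries the
        # ContextVar value across the thread boundary.
        ctx = contextvars.copy_context()
        copied = pool.submit(ctx.run, read_ambient).result()
        # An explicit argument needs no context propagation at all.
        explicit = pool.submit(read_explicit, {"user_id": "alice"}).result()
    return plain, copied, explicit
```

The explicit-argument path is the one the test above relies on: it keeps working regardless of how, or whether, the executor copies contextvars.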
@@ -0,0 +1,326 @@
+"""Real HTTP end-to-end verification for issue #2862's setup_agent path.
+
+This test drives the **entire** FastAPI gateway through ``starlette.testclient.TestClient``:
+
+  starlette.testclient.TestClient (real ASGI stack)
+    -> AuthMiddleware (real cookie parsing, real JWT decode)
+    -> /api/v1/auth/register endpoint (real password hash + sqlite write)
+    -> /api/threads/{id}/runs/stream endpoint (real start_run config-assembly)
+    -> background asyncio.create_task(run_agent) (real worker, real Runtime)
+    -> langchain.agents.create_agent graph (real, with fake LLM)
+    -> ToolNode dispatch (real)
+    -> setup_agent tool (real file I/O)
+
+The only mock is the LLM (no API key needed). Every layer that participates
+in ``user_id`` propagation — auth, ContextVar, ``inject_authenticated_user_context``,
+``worker._build_runtime_context``, ``Runtime.merge`` — is the real production
+code path. If the chain is broken at any layer, this test fails.
+
+This is what real verification looks like for a server that lives behind authentication:
+register a user, log in (cookie), POST to /runs/stream, wait for the run to
+finish, then read the filesystem.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+from unittest.mock import patch
+
+import pytest
+from _agent_e2e_helpers import FakeToolCallingModel, build_single_tool_call_model
+
+
+def _build_fake_create_chat_model(agent_name: str):
+    """Return a callable matching the real ``create_chat_model`` signature.
+
+    Whenever the lead agent constructs a chat model during the bootstrap flow,
+    we hand it a fake that emits a single setup_agent tool_call on its first
+    turn, then a benign final answer on its second turn.
+    """
+
+    def fake_create_chat_model(*args: Any, **kwargs: Any) -> FakeToolCallingModel:
+        return build_single_tool_call_model(
+            tool_name="setup_agent",
+            tool_args={
+                "soul": f"# Real HTTP E2E SOUL for {agent_name}",
+                "description": "real-http-e2e agent",
+            },
+            tool_call_id="call_real_http_1",
+            final_text=f"Agent {agent_name} created via real HTTP e2e.",
+        )
+
+    return fake_create_chat_model
+
+
+@pytest.fixture
+def isolated_deer_flow_home(tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
+    """Stand up an isolated DeerFlow data root + config under tmp_path.
+
+    - Sets ``DEER_FLOW_HOME`` so paths land under tmp_path, not the real
+      ``.deer-flow`` directory.
+    - Writes a minimal, self-sufficient ``config.yaml`` under tmp_path (a fresh
+      CI checkout only ships ``config.example.yaml``) and pins
+      ``DEER_FLOW_CONFIG_PATH`` to it, so lifespan boot doesn't depend on the
+      developer's local config layout.
+    - Sets placeholder ``OPENAI_API_KEY`` / ``OPENAI_API_BASE`` values because the
+      config references both and they get resolved at parse time; the LLM itself
+      is mocked, so any non-empty value works.
+    """
+    home = tmp_path / "deer-flow-home"
+    home.mkdir()
+    monkeypatch.setenv("DEER_FLOW_HOME", str(home))
+    monkeypatch.setenv("OPENAI_API_KEY", "sk-fake-key-not-used-because-llm-is-mocked")
+    monkeypatch.setenv("OPENAI_API_BASE", "https://example.invalid")
+
+    # Hermetic config: do not depend on whether the dev machine has a real
+    # ``config.yaml`` at the repo root. CI's ``actions/checkout`` only ships
+    # ``config.example.yaml`` (and its ``models:`` list is commented out, so
+    # AppConfig validation would reject it). Write a minimal, self-sufficient
+    # config to tmp_path and pin ``DEER_FLOW_CONFIG_PATH`` to it.
+    staged_config = tmp_path / "config.yaml"
+    staged_config.write_text(_MINIMAL_CONFIG_YAML, encoding="utf-8")
+    monkeypatch.setenv("DEER_FLOW_CONFIG_PATH", str(staged_config))
+
+    return home
+
+
+# Minimal config that satisfies AppConfig + LeadAgent's _resolve_model_name.
+# The model `use` path must resolve to a real class for config parsing to
+# succeed; the test patches ``create_chat_model`` on the lead agent module,
+# so the model is never actually instantiated. SandboxConfig.use is required
+# at schema level; LocalSandboxProvider is the only sandbox that runs without
+# Docker.
+_MINIMAL_CONFIG_YAML = """\
+log_level: info
+models:
+  - name: fake-test-model
+    display_name: Fake Test Model
+    use: langchain_openai:ChatOpenAI
+    model: gpt-4o-mini
+    api_key: $OPENAI_API_KEY
+    base_url: $OPENAI_API_BASE
+sandbox:
+  use: deerflow.sandbox.local:LocalSandboxProvider
+agents_api:
+  enabled: true
+database:
+  backend: sqlite
+"""
+
+
+def _reset_process_singletons(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Reset every process-wide cache that would survive across tests.
+
+    The ``isolated_app`` fixture stands up a full FastAPI app + sqlite DB + LangGraph runtime
+    inside ``tmp_path``. To get true per-test isolation we have to invalidate
+    a handful of module-level caches that production normally never resets,
+    so they pick up our test-only ``DEER_FLOW_HOME`` and sqlite path:
+
+    - ``deerflow.config.app_config`` caches the parsed ``config.yaml``.
+    - ``deerflow.config.paths`` caches the ``Paths`` singleton derived from
+      ``DEER_FLOW_HOME`` at first access.
+    - ``deerflow.persistence.engine`` caches the SQLAlchemy engine and
+      session factory after the first call to ``init_engine_from_config``.
+
+    ``raising=False`` keeps this helper resilient if upstream renames or
+    drops one of these attributes: the helper will simply skip that reset
+    instead of failing with a confusing AttributeError, and the next test
+    to call ``get_app_config()``/``get_paths()`` will surface the real
+    incompatibility loudly.
+    """
+    from deerflow.config import app_config as app_config_module
+    from deerflow.config import paths as paths_module
+    from deerflow.persistence import engine as engine_module
+
+    for module, attr in (
+        (app_config_module, "_app_config"),
+        (app_config_module, "_app_config_path"),
+        (app_config_module, "_app_config_mtime"),
+        (paths_module, "_paths_singleton"),
+        (engine_module, "_engine"),
+        (engine_module, "_session_factory"),
+    ):
+        monkeypatch.setattr(module, attr, None, raising=False)
+
+
+@pytest.fixture
+def isolated_app(isolated_deer_flow_home: Path, monkeypatch: pytest.MonkeyPatch):
+    """Build a fresh FastAPI app inside a clean DEER_FLOW_HOME.
+
+    Each test gets its own sqlite DB and checkpoint store under ``tmp_path``,
+    with no cross-test contamination.
+    """
+    _reset_process_singletons(monkeypatch)
+
+    # Re-resolve the config from the test-only DEER_FLOW_HOME and pin its
+    # sqlite path into tmp_path so the lifespan-time engine init lands there.
+    from deerflow.config import app_config as app_config_module
+
+    cfg = app_config_module.get_app_config()
+    cfg.database.sqlite_dir = str(isolated_deer_flow_home / "db")
+
+    from app.gateway.app import create_app
+
+    return create_app()
+
+
+def _drain_stream(response, *, timeout: float = 30.0, max_bytes: int = 4 * 1024 * 1024) -> str:
+    """Consume an SSE response body until the run terminates and return the text.
+
+    Bounded to keep the test fail-fast:
+      - Stops as soon as an ``event: end`` SSE frame is observed (the gateway
+        sends this when the background run finishes — see ``services.format_sse``
+        and ``StreamBridge.publish_end``).
+      - Stops once ``timeout`` seconds have elapsed (checked after each chunk), so a
+        stuck run / runaway heartbeat loop fails the test instead of hanging pytest.
+      - Stops at ``max_bytes`` so a runaway producer can't OOM the test process.
+    """
+    import time as _time
+
+    deadline = _time.monotonic() + timeout
+    body = b""
+    for chunk in response.iter_bytes():
+        body += chunk
+        if b"event: end" in body:
+            break
+        if len(body) >= max_bytes:
+            break
+        if _time.monotonic() >= deadline:
+            break
+    return body.decode("utf-8", errors="replace")
+
+
+def _wait_for_file(path: Path, *, timeout: float = 10.0) -> bool:
+    """Block until *path* exists or *timeout* elapses.
+
+    The run completes inside ``asyncio.create_task`` after start_run returns,
+    so the test must wait for the background task to flush its writes.
+    """
+    import time as _time
+
+    deadline = _time.monotonic() + timeout
+    while _time.monotonic() < deadline:
+        if path.exists():
+            return True
+        _time.sleep(0.05)
+    return False
+
+
+@pytest.mark.no_auto_user
+def test_real_http_create_agent_lands_in_authenticated_user_dir(
+    isolated_app: Any,
+    isolated_deer_flow_home: Path,
+    monkeypatch: pytest.MonkeyPatch,
+):
+    """The full real-server contract test.
+
+    1. Register a real user via POST /api/v1/auth/register (also auto-logs in)
+    2. POST to /api/threads/{tid}/runs/stream with the **exact** body shape the
+       frontend (LangGraph SDK) sends during the bootstrap flow.
+    3. Wait for the background run to finish.
+    4. Assert SOUL.md exists under users/<authenticated_uid>/agents/<name>/.
+    5. Assert NOTHING exists under users/default/agents/<name>/.
+    """
+    # ``deerflow.agents.lead_agent.agent`` imports ``create_chat_model`` with
+    # ``from deerflow.models import create_chat_model`` at module load time,
+    # rebinding the symbol into its own namespace. So the only patch that
+    # intercepts the call is the bound name on ``lead_agent.agent`` — patching
+    # ``deerflow.models.create_chat_model`` would be too late.
+    agent_name = "real-http-agent"
+
+    from starlette.testclient import TestClient
+
+    with (
+        patch(
+            "deerflow.agents.lead_agent.agent.create_chat_model",
+            new=_build_fake_create_chat_model(agent_name),
+        ),
+        TestClient(isolated_app) as client,
+    ):
+        # --- 1. Register & auto-login ---
+        register = client.post(
+            "/api/v1/auth/register",
+            json={"email": "e2e-user@example.com", "password": "very-strong-password-123"},
+        )
+        assert register.status_code == 201, register.text
+        registered = register.json()
+        auth_uid = registered["id"]
+        # The endpoint sets both access_token (auth) and csrf_token (CSRF Double
+        # Submit Cookie) cookies; the TestClient cookie jar propagates them.
+        assert client.cookies.get("access_token"), "register endpoint must set session cookie"
+        csrf_token = client.cookies.get("csrf_token")
+        assert csrf_token, "register endpoint must set csrf_token cookie"
+
+        # --- 2. Create a thread (require_existing=True on /runs/stream means
+        # we must call POST /api/threads first; the React frontend does the
+        # same via the LangGraph SDK's threads.create) ---
+        import uuid as _uuid
+
+        thread_id = str(_uuid.uuid4())
+        created = client.post(
+            "/api/threads",
+            json={"thread_id": thread_id, "metadata": {}},
+            headers={"X-CSRF-Token": csrf_token},
+        )
+        assert created.status_code == 200, created.text
+
+        # --- 3. POST /runs/stream with the bootstrap wire format ---
+        # This is the EXACT shape the React frontend sends after PR #2784:
+        #   thread.submit(input, {config, context}) ->
+        #   POST /api/threads/{id}/runs/stream body =
+        #     {assistant_id, input, config, context}
+        body = {
+            "assistant_id": "lead_agent",
+            "input": {
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": (f"The new custom agent name is {agent_name}. Help me design its SOUL.md before saving it."),
+                    }
+                ]
+            },
+            "config": {"recursion_limit": 50},
+            "context": {
+                "agent_name": agent_name,
+                "is_bootstrap": True,
+                "mode": "flash",
+                "thinking_enabled": False,
+                "is_plan_mode": False,
+                "subagent_enabled": False,
+            },
+            "stream_mode": ["values"],
+        }
+        # The /stream endpoint returns SSE; we drain it so the server-side
+        # background task (run_agent) gets to completion before we look at disk.
+        with client.stream(
+            "POST",
+            f"/api/threads/{thread_id}/runs/stream",
+            json=body,
+            headers={"X-CSRF-Token": csrf_token},
+        ) as resp:
+            assert resp.status_code == 200, resp.read().decode()
+            transcript = _drain_stream(resp)
+
+        # Sanity: the stream should have produced at least one event
+        assert "event:" in transcript, f"no SSE events in response: {transcript[:500]!r}"
+
+        # --- 4. Verify filesystem outcome ---
+        expected_dir = isolated_deer_flow_home / "users" / auth_uid / "agents" / agent_name
+        default_dir = isolated_deer_flow_home / "users" / "default" / "agents" / agent_name
+
+        # The setup_agent tool runs inside the background asyncio task spawned
+        # by start_run; SSE-drain typically waits for it, but we add a bounded
+        # poll to be robust against scheduler jitter.
+        assert _wait_for_file(expected_dir / "SOUL.md", timeout=15.0), (
+            "SOUL.md did not appear under users/<auth_uid>/agents/. "
+            f"Expected: {expected_dir / 'SOUL.md'}. "
+            f"tmp tree: {sorted(str(p.relative_to(isolated_deer_flow_home)) for p in isolated_deer_flow_home.rglob('SOUL.md'))}. "
+            f"SSE transcript tail: {transcript[-1000:]!r}"
+        )
+
+        soul_text = (expected_dir / "SOUL.md").read_text()
+        assert agent_name in soul_text, f"unexpected SOUL content: {soul_text!r}"
+
+        # The smoking-gun assertion: the agent must NOT have landed in default/
+        assert not default_dir.exists(), f"REGRESSION: agent landed under users/default/{agent_name} instead of the authenticated user. Default-dir contents: {list(default_dir.rglob('*')) if default_dir.exists() else 'n/a'}"
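The comment in the test above about patching the bound name on `lead_agent.agent` rather than `deerflow.models.create_chat_model` reflects general Python import behaviour. The following is a minimal sketch using two throwaway modules (`demo_provider` and `demo_consumer` are hypothetical names built in memory, not part of DeerFlow) to show why only the consumer-side patch intercepts the call:

```python
import sys
import types
from unittest.mock import patch

# Hypothetical provider module exposing a factory function.
provider = types.ModuleType("demo_provider")
exec("def create_model():\n    return 'real'\n", provider.__dict__)
sys.modules["demo_provider"] = provider

# Hypothetical consumer module that binds the name at import time,
# the same way ``from deerflow.models import create_chat_model`` does.
consumer = types.ModuleType("demo_consumer")
sys.modules["demo_consumer"] = consumer
exec(
    "from demo_provider import create_model\n"
    "def build():\n"
    "    return create_model()\n",
    consumer.__dict__,
)


def build_with_patch(target: str) -> str:
    """Call consumer.build() while *target* is replaced by a fake factory."""
    with patch(target, new=lambda: "fake"):
        return consumer.build()
```

Patching `demo_provider.create_model` leaves `consumer.build()` untouched, because the consumer looks the name up in its own namespace; patching `demo_consumer.create_model` is what takes effect.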
@@ -291,7 +291,7 @@ class TestAgentConstruction:
        assert captured["agent"]["model"] is model
        assert captured["agent"]["middleware"] is middlewares
        assert captured["agent"]["tools"] == []
-        assert captured["agent"]["system_prompt"] == base_config.system_prompt
+        assert captured["agent"]["system_prompt"] is None  # system_prompt is merged into initial state messages

    @pytest.mark.anyio
    async def test_load_skill_messages_uses_explicit_app_config_for_skill_storage(
@@ -331,6 +331,124 @@ class TestAgentConstruction:
        assert len(messages) == 1
        assert "Use demo skill" in messages[0].content

+    @pytest.mark.anyio
+    async def test_build_initial_state_consolidates_system_prompt_and_skills(
+        self,
+        classes,
+        base_config,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path,
+    ):
+        """_build_initial_state merges system_prompt and skills into one SystemMessage."""
+        SubagentExecutor = classes["SubagentExecutor"]
+
+        skill_dir = tmp_path / "my-skill"
+        skill_dir.mkdir()
+        skill_file = skill_dir / "SKILL.md"
+        skill_file.write_text("Skill instructions here", encoding="utf-8")
+
+        monkeypatch.setattr(
+            sys.modules["deerflow.skills.storage"],
+            "get_or_new_skill_storage",
+            lambda *, app_config=None: SimpleNamespace(load_skills=lambda *, enabled_only: [SimpleNamespace(name="my-skill", skill_file=skill_file, allowed_tools=None)]),
+        )
+
+        executor = SubagentExecutor(
+            config=base_config,
+            tools=[],
+            thread_id="test-thread",
+        )
+
+        state, _filtered_tools = await executor._build_initial_state("Do the task")
+
+        messages = state["messages"]
+        # Should have exactly 2 messages: one combined SystemMessage + one HumanMessage
+        assert len(messages) == 2
+
+        from langchain_core.messages import HumanMessage, SystemMessage
+
+        assert isinstance(messages[0], SystemMessage)
+        assert isinstance(messages[1], HumanMessage)
+        # SystemMessage should contain both the system_prompt and skill content
+        assert base_config.system_prompt in messages[0].content
+        assert "Skill instructions here" in messages[0].content
+        # HumanMessage should be the task
+        assert messages[1].content == "Do the task"
+
+    @pytest.mark.anyio
+    async def test_build_initial_state_no_skills_only_system_prompt(
+        self,
+        classes,
+        base_config,
+        monkeypatch: pytest.MonkeyPatch,
+    ):
+        """_build_initial_state works when there are no skills."""
+        SubagentExecutor = classes["SubagentExecutor"]
+
+        monkeypatch.setattr(
+            sys.modules["deerflow.skills.storage"],
+            "get_or_new_skill_storage",
+            lambda *, app_config=None: SimpleNamespace(load_skills=lambda *, enabled_only: []),
+        )
+
+        executor = SubagentExecutor(
+            config=base_config,
+            tools=[],
+            thread_id="test-thread",
+        )
+
+        state, _filtered_tools = await executor._build_initial_state("Do the task")
+
+        messages = state["messages"]
+        from langchain_core.messages import HumanMessage, SystemMessage
+
+        assert len(messages) == 2
+        assert isinstance(messages[0], SystemMessage)
+        assert base_config.system_prompt in messages[0].content
+        assert isinstance(messages[1], HumanMessage)
+
+    @pytest.mark.anyio
+    async def test_build_initial_state_no_system_prompt_with_skills(
+        self,
+        classes,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path,
+    ):
+        """_build_initial_state works when there is no system_prompt but there are skills."""
+        SubagentConfig = classes["SubagentConfig"]
+
+        config = SubagentConfig(
+            name="test-agent",
+            description="Test agent",
+            system_prompt=None,
+            max_turns=10,
+            timeout_seconds=60,
+        )
+
+        skill_dir = tmp_path / "my-skill"
+        skill_dir.mkdir()
+        skill_file = skill_dir / "SKILL.md"
+        skill_file.write_text("Skill content", encoding="utf-8")
+
+        monkeypatch.setattr(
+            sys.modules["deerflow.skills.storage"],
+            "get_or_new_skill_storage",
+            lambda *, app_config=None: SimpleNamespace(load_skills=lambda *, enabled_only: [SimpleNamespace(name="my-skill", skill_file=skill_file, allowed_tools=None)]),
+        )
+
+        SubagentExecutor = classes["SubagentExecutor"]
+        executor = SubagentExecutor(config=config, tools=[], thread_id="test-thread")
+
+        state, _filtered_tools = await executor._build_initial_state("Do the task")
+
+        messages = state["messages"]
+        from langchain_core.messages import HumanMessage, SystemMessage
+
+        assert len(messages) == 2
+        assert isinstance(messages[0], SystemMessage)
+        assert "Skill content" in messages[0].content
+        assert isinstance(messages[1], HumanMessage)
+

 # -----------------------------------------------------------------------------
 # Async Execution Path Tests
@@ -514,6 +632,70 @@ class TestAsyncExecutionPath:
        assert result.status == SubagentStatus.COMPLETED
        assert "Task" in result.result

+    @pytest.mark.anyio
+    async def test_aexecute_passes_at_most_one_system_message_to_agent(
+        self,
+        classes,
+        base_config,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path,
+    ):
+        """Regression: messages sent to agent.astream must contain at most one
+        SystemMessage and it must be the first message.
+
+        This catches any regression where system_prompt would be re-injected
+        via create_agent() (e.g. system_prompt not passed as None) and appear
+        as a second SystemMessage, which providers like vLLM and Xinference
+        reject with "System message must be at the beginning."
+        """
+        from langchain_core.messages import AIMessage, SystemMessage
+
+        SubagentExecutor = classes["SubagentExecutor"]
+        SubagentStatus = classes["SubagentStatus"]
+
+        # Set up a skill so both system_prompt AND skill content are present,
+        # maximising the chance of catching a double-SystemMessage regression.
+        skill_dir = tmp_path / "regression-skill"
+        skill_dir.mkdir()
+        (skill_dir / "SKILL.md").write_text("Skill instruction text", encoding="utf-8")
+
+        monkeypatch.setattr(
+            sys.modules["deerflow.skills.storage"],
+            "get_or_new_skill_storage",
+            lambda *, app_config=None: SimpleNamespace(load_skills=lambda *, enabled_only: [SimpleNamespace(name="regression-skill", skill_file=skill_dir / "SKILL.md", allowed_tools=None)]),
+        )
+
+        captured_states: list[dict] = []
+
+        async def capturing_astream(state, **kwargs):
+            captured_states.append(state)
+            yield {"messages": [AIMessage(content="Done", id="msg-1")]}
+
+        mock_agent = MagicMock()
+        mock_agent.astream = capturing_astream
+
+        executor = SubagentExecutor(
+            config=base_config,
+            tools=[],
+            thread_id="test-thread",
+        )
+
+        with patch.object(executor, "_create_agent", return_value=mock_agent):
+            result = await executor._aexecute("Do something")
+
+        assert result.status == SubagentStatus.COMPLETED
+        assert len(captured_states) == 1, "astream should be called exactly once"
+        initial_messages = captured_states[0]["messages"]
+
+        system_messages = [m for m in initial_messages if isinstance(m, SystemMessage)]
+        assert len(system_messages) <= 1, f"Expected at most 1 SystemMessage but got {len(system_messages)}: {system_messages}"
+        if system_messages:
+            assert initial_messages[0] is system_messages[0], "SystemMessage must be the first message in the conversation"
+            # The consolidated SystemMessage must carry both the system_prompt
+            # and all skill content — nothing should be split across two messages.
+            assert base_config.system_prompt in system_messages[0].content
+            assert "Skill instruction text" in system_messages[0].content
+

 class TestSkillAllowedTools:
    @pytest.mark.anyio
@@ -0,0 +1,161 @@
+"""Tests for SubagentTokenCollector callback handler."""
+
+from unittest.mock import MagicMock
+from uuid import uuid4
+
+from deerflow.subagents.token_collector import SubagentTokenCollector
+
+
+def _make_llm_response(content="Hello", usage=None):
+    """Create a mock LLM response with a message."""
+    msg = MagicMock()
+    msg.content = content
+    msg.usage_metadata = usage
+
+    gen = MagicMock()
+    gen.message = msg
+
+    response = MagicMock()
+    response.generations = [[gen]]
+    return response
+
+
+def _make_llm_response_from_usages(usages):
+    """Create a mock LLM response with one generation per usage entry."""
+    generations = []
+    for usage in usages:
+        msg = MagicMock()
+        msg.content = "chunk"
+        msg.usage_metadata = usage
+
+        gen = MagicMock()
+        gen.message = msg
+        generations.append([gen])
+
+    response = MagicMock()
+    response.generations = generations
+    return response
+
+
+class TestSubagentTokenCollector:
+    def test_collects_usage_from_response(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage = {"input_tokens": 100, "output_tokens": 50, "total_tokens": 150}
+        collector.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 1
+        assert records[0]["caller"] == "subagent:test"
+        assert records[0]["input_tokens"] == 100
+        assert records[0]["output_tokens"] == 50
+        assert records[0]["total_tokens"] == 150
+        assert "source_run_id" in records[0]
+
+    def test_total_tokens_zero_uses_input_plus_output(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage = {"input_tokens": 200, "output_tokens": 100, "total_tokens": 0}
+        collector.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 1
+        assert records[0]["total_tokens"] == 300
+
+    def test_total_tokens_missing_uses_input_plus_output(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage = {"input_tokens": 30, "output_tokens": 20}
+        collector.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 1
+        assert records[0]["total_tokens"] == 50
+
+    def test_dedup_same_run_id(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        run_id = uuid4()
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        collector.on_llm_end(_make_llm_response("A", usage=usage), run_id=run_id)
+        collector.on_llm_end(_make_llm_response("A", usage=usage), run_id=run_id)
+        records = collector.snapshot_records()
+        assert len(records) == 1
+
+    def test_no_usage_no_record(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        collector.on_llm_end(_make_llm_response("Hi", usage=None), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 0
+
+    def test_zero_usage_no_record(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
+        collector.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 0
+
+    def test_skips_empty_generation_and_records_later_usage(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        response = _make_llm_response_from_usages(
+            [
+                None,
+                {"input_tokens": 20, "output_tokens": 10, "total_tokens": 30},
+            ]
+        )
+
+        collector.on_llm_end(response, run_id=uuid4())
+
+        records = collector.snapshot_records()
+        assert len(records) == 1
+        assert records[0]["total_tokens"] == 30
+
+    def test_snapshot_returns_copy(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        collector.on_llm_end(_make_llm_response("Hi", usage=usage), run_id=uuid4())
+        snap1 = collector.snapshot_records()
+        snap2 = collector.snapshot_records()
+        assert snap1 == snap2
+        assert snap1 is not snap2
+        # Mutating snapshot does not affect internal records
+        snap1.append({"source_run_id": "fake"})
+        assert len(collector.snapshot_records()) == 1
+
+    def test_multiple_calls_accumulate(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        collector.on_llm_end(_make_llm_response("A", usage=usage), run_id=uuid4())
+        collector.on_llm_end(_make_llm_response("B", usage=usage), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 2
+
+    def test_different_run_ids_accumulate_separately(self):
+        collector = SubagentTokenCollector(caller="subagent:test")
+        usage1 = {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
+        usage2 = {"input_tokens": 20, "output_tokens": 10, "total_tokens": 30}
+        collector.on_llm_end(_make_llm_response("A", usage=usage1), run_id=uuid4())
+        collector.on_llm_end(_make_llm_response("B", usage=usage2), run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 2
+        assert records[0]["total_tokens"] == 15
+        assert records[1]["total_tokens"] == 30
+
+    def test_message_without_usage_metadata_skipped(self):
+        """A response where message has no usage_metadata attribute must be skipped."""
+        collector = SubagentTokenCollector(caller="subagent:test")
+
+        msg = MagicMock(spec=[])  # object without usage_metadata
+        gen = MagicMock()
+        gen.message = msg
+        response = MagicMock()
+        response.generations = [[gen]]
+
+        collector.on_llm_end(response, run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 0
+
+    def test_generation_without_message_skipped(self):
+        """A generation without a message attribute must be skipped."""
+        collector = SubagentTokenCollector(caller="subagent:test")
+
+        gen = MagicMock(spec=[])  # object without message
+        response = MagicMock()
+        response.generations = [[gen]]
+
+        collector.on_llm_end(response, run_id=uuid4())
+        records = collector.snapshot_records()
+        assert len(records) == 0
@@ -777,22 +777,27 @@ def test_cleanup_not_called_on_polling_safety_timeout(monkeypatch):


 def test_cleanup_scheduled_on_cancellation(monkeypatch):
-    """Verify cancellation schedules deferred cleanup for the background task."""
+    """Verify cancellation handler synchronously cleans up after shielded wait."""
    config = _make_subagent_config()
    events = []
    cleanup_calls = []
-    scheduled_cleanup_coros = []
    poll_count = 0

    def get_result(_: str):
        nonlocal poll_count
        poll_count += 1
-        if poll_count == 1:
+        # Main loop polls RUNNING twice, then shielded wait gets COMPLETED
+        if poll_count <= 2:
            return _make_result(FakeSubagentStatus.RUNNING, ai_messages=[])
        return _make_result(FakeSubagentStatus.COMPLETED, result="done")

-    async def cancel_on_first_sleep(_: float) -> None:
-        raise asyncio.CancelledError
+    sleep_count = 0
+
+    async def cancel_on_second_sleep(_: float) -> None:
+        nonlocal sleep_count
+        sleep_count += 1
+        if sleep_count == 2:
+            raise asyncio.CancelledError

    monkeypatch.setattr(task_tool_module, "SubagentStatus", FakeSubagentStatus)
    monkeypatch.setattr(
@@ -804,12 +809,7 @@ def test_cleanup_scheduled_on_cancellation(monkeypatch):

    monkeypatch.setattr(task_tool_module, "get_background_task_result", get_result)
    monkeypatch.setattr(task_tool_module, "get_stream_writer", lambda: events.append)
-    monkeypatch.setattr(task_tool_module.asyncio, "sleep", cancel_on_first_sleep)
-    monkeypatch.setattr(
-        task_tool_module.asyncio,
-        "create_task",
-        lambda coro: scheduled_cleanup_coros.append(coro) or _DummyScheduledTask(),
-    )
+    monkeypatch.setattr(task_tool_module.asyncio, "sleep", cancel_on_second_sleep)
    monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
    monkeypatch.setattr(
        task_tool_module,
@@ -826,25 +826,48 @@ def test_cleanup_scheduled_on_cancellation(monkeypatch):
            tool_call_id="tc-cancelled-cleanup",
        )

-    assert cleanup_calls == []
-    assert len(scheduled_cleanup_coros) == 1
-
-    asyncio.run(scheduled_cleanup_coros.pop())
-
+    # Cleanup happens synchronously within the cancellation handler
    assert cleanup_calls == ["tc-cancelled-cleanup"]


 def test_cancelled_cleanup_stops_after_timeout(monkeypatch):
-    """Verify deferred cleanup gives up after a bounded number of polls."""
+    """Verify cancellation handler survives a shielded-wait timeout gracefully.
+
+    When the subagent never reaches a terminal state, the shielded wait times
+    out (or is interrupted), the handler reports whatever usage it can, skips
+    immediate cleanup, schedules a deferred cleanup task, and re-raises.
+    """
    config = _make_subagent_config()
-    config.timeout_seconds = 1
    events = []
+    report_calls = []
    cleanup_calls = []
-    scheduled_cleanup_coros = []
+    scheduled_cleanups = []
+
+    # Always return RUNNING — subagent never finishes
+    monkeypatch.setattr(
+        task_tool_module,
+        "get_background_task_result",
+        lambda _: _make_result(FakeSubagentStatus.RUNNING, ai_messages=[]),
+    )

    async def cancel_on_first_sleep(_: float) -> None:
        raise asyncio.CancelledError

+    def fake_report_subagent_usage(runtime, result):
+        report_calls.append((runtime, result))
+
+    class DummyCleanupTask:
+        def __init__(self, coro):
+            self.coro = coro
+
+        def add_done_callback(self, callback):
+            self.callback = callback
+
+    def fake_create_task(coro):
+        scheduled_cleanups.append(coro)
+        coro.close()
+        return DummyCleanupTask(coro)
+
    monkeypatch.setattr(task_tool_module, "SubagentStatus", FakeSubagentStatus)
    monkeypatch.setattr(
        task_tool_module,
@@ -852,19 +875,10 @@ def test_cancelled_cleanup_stops_after_timeout(monkeypatch):
        type("DummyExecutor", (), {"__init__": lambda self, **kwargs: None, "execute_async": lambda self, prompt, task_id=None: task_id}),
    )
    monkeypatch.setattr(task_tool_module, "get_subagent_config", lambda _: config)
-
-    monkeypatch.setattr(
-        task_tool_module,
-        "get_background_task_result",
-        lambda _: _make_result(FakeSubagentStatus.RUNNING, ai_messages=[]),
-    )
    monkeypatch.setattr(task_tool_module, "get_stream_writer", lambda: events.append)
    monkeypatch.setattr(task_tool_module.asyncio, "sleep", cancel_on_first_sleep)
-    monkeypatch.setattr(
-        task_tool_module.asyncio,
-        "create_task",
-        lambda coro: scheduled_cleanup_coros.append(coro) or _DummyScheduledTask(),
-    )
+    monkeypatch.setattr(task_tool_module.asyncio, "create_task", fake_create_task)
+    monkeypatch.setattr(task_tool_module, "_report_subagent_usage", fake_report_subagent_usage)
    monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
    monkeypatch.setattr(
        task_tool_module,
@@ -881,13 +895,73 @@ def test_cancelled_cleanup_stops_after_timeout(monkeypatch):
            tool_call_id="tc-cancelled-timeout",
        )

-    async def bounded_sleep(_seconds: float) -> None:
-        return None
-
-    monkeypatch.setattr(task_tool_module.asyncio, "sleep", bounded_sleep)
-    asyncio.run(scheduled_cleanup_coros.pop())
-
+    # Non-terminal tasks cannot be cleaned immediately; a deferred cleanup
+    # keeps polling after the parent cancellation path exits.
    assert cleanup_calls == []
+    assert len(scheduled_cleanups) == 1
+    # _report_subagent_usage is still called once (the real function would skip a result with no usage records)
+    assert len(report_calls) == 1
+
+
+def test_cancellation_wait_uses_subagent_polling_budget(monkeypatch):
+    """Cancelled parent waits on the existing subagent polling budget, not a fixed timeout."""
+    config = _make_subagent_config()
+    events = []
+    report_calls = []
+    cleanup_calls = []
+    sleep_count = 0
+    result_polls = 0
+    terminal_result = _make_result(FakeSubagentStatus.COMPLETED, result="done")
+
+    def get_result(_: str):
+        nonlocal result_polls
+        result_polls += 1
+        if result_polls < 5:
+            return _make_result(FakeSubagentStatus.RUNNING, ai_messages=[])
+        return terminal_result
+
+    async def cancel_then_continue(_: float) -> None:
+        nonlocal sleep_count
+        sleep_count += 1
+        if sleep_count == 1:
+            raise asyncio.CancelledError
+
+    def fake_report_subagent_usage(runtime, result):
+        report_calls.append((runtime, result))
+
+    async def fail_on_fixed_timeout(awaitable, *, timeout=None):
+        raise AssertionError(f"cancellation wait should not use fixed timeout={timeout}")
+
+    monkeypatch.setattr(task_tool_module, "SubagentStatus", FakeSubagentStatus)
+    monkeypatch.setattr(
+        task_tool_module,
+        "SubagentExecutor",
+        type("DummyExecutor", (), {"__init__": lambda self, **kwargs: None, "execute_async": lambda self, prompt, task_id=None: task_id}),
+    )
+    monkeypatch.setattr(task_tool_module, "get_subagent_config", lambda _: config)
+    monkeypatch.setattr(task_tool_module, "get_background_task_result", get_result)
+    monkeypatch.setattr(task_tool_module, "get_stream_writer", lambda: events.append)
+    monkeypatch.setattr(task_tool_module.asyncio, "sleep", cancel_then_continue)
+    monkeypatch.setattr(task_tool_module.asyncio, "wait_for", fail_on_fixed_timeout)
+    monkeypatch.setattr(task_tool_module, "_report_subagent_usage", fake_report_subagent_usage)
+    monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
+    monkeypatch.setattr(
+        task_tool_module,
+        "cleanup_background_task",
+        lambda task_id: cleanup_calls.append(task_id),
+    )
+
+    with pytest.raises(asyncio.CancelledError):
+        _run_task_tool(
+            runtime=_make_runtime(),
+            description="执行任务",
+            prompt="cancel task",
+            subagent_type="general-purpose",
+            tool_call_id="tc-cancel-budget",
+        )
+
+    assert report_calls == [(_make_runtime(), terminal_result)]
+    assert cleanup_calls == ["tc-cancel-budget"]


 def test_cancellation_calls_request_cancel(monkeypatch):
@@ -895,7 +969,6 @@ def test_cancellation_calls_request_cancel(monkeypatch):
    config = _make_subagent_config()
    events = []
    cancel_requests = []
-    scheduled_cleanup_coros = []

    async def cancel_on_first_sleep(_: float) -> None:
        raise asyncio.CancelledError
@@ -915,11 +988,6 @@ def test_cancellation_calls_request_cancel(monkeypatch):
    )
    monkeypatch.setattr(task_tool_module, "get_stream_writer", lambda: events.append)
    monkeypatch.setattr(task_tool_module.asyncio, "sleep", cancel_on_first_sleep)
-    monkeypatch.setattr(
-        task_tool_module.asyncio,
-        "create_task",
-        lambda coro: (coro.close(), scheduled_cleanup_coros.append(None))[-1] or _DummyScheduledTask(),
-    )
    monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
    monkeypatch.setattr(
        task_tool_module,
@@ -987,3 +1055,80 @@ def test_task_tool_returns_cancelled_message(monkeypatch):
    assert output == "Task cancelled by user."
    assert any(e.get("type") == "task_cancelled" for e in events)
    assert cleanup_calls == ["tc-poll-cancelled"]
+
+
+def test_cancellation_reports_subagent_usage(monkeypatch):
+    """Verify cancellation handler waits (shielded) for subagent terminal state,
+    then reports the final token usage before re-raising CancelledError.
+
+    The report must happen synchronously within the cancellation handler so
+    the parent worker's finally block sees the updated journal totals.
+    """
+    config = _make_subagent_config()
+    events = []
+    report_calls = []
+    cleanup_calls = []
+
+    # Terminal result with token usage collected after cancellation processing
+    cancel_result = _make_result(FakeSubagentStatus.CANCELLED, error="Cancelled by user")
+    cancel_result.token_usage_records = [{"source_run_id": "sub-run-1", "caller": "subagent:gp", "input_tokens": 50, "output_tokens": 25, "total_tokens": 75}]
+    cancel_result.usage_reported = False
+
+    poll_count = 0
+
+    def get_result(_: str):
+        nonlocal poll_count
+        poll_count += 1
+        # Main loop polls 3 times (RUNNING each time to keep looping)
+        if poll_count <= 3:
+            running = _make_result(FakeSubagentStatus.RUNNING, ai_messages=[])
+            running.token_usage_records = []
+            running.usage_reported = False
+            return running
+        # Shielded wait poll gets the terminal result
+        return cancel_result
+
+    sleep_count = 0
+
+    async def cancel_on_third_sleep(_: float) -> None:
+        nonlocal sleep_count
+        sleep_count += 1
+        if sleep_count == 3:
+            raise asyncio.CancelledError
+
+    def fake_report_subagent_usage(runtime, result):
+        report_calls.append((runtime, result))
+
+    monkeypatch.setattr(task_tool_module, "SubagentStatus", FakeSubagentStatus)
+    monkeypatch.setattr(
+        task_tool_module,
+        "SubagentExecutor",
+        type("DummyExecutor", (), {"__init__": lambda self, **kwargs: None, "execute_async": lambda self, prompt, task_id=None: task_id}),
+    )
+    monkeypatch.setattr(task_tool_module, "get_subagent_config", lambda _: config)
+    monkeypatch.setattr(task_tool_module, "get_background_task_result", get_result)
+    monkeypatch.setattr(task_tool_module, "get_stream_writer", lambda: events.append)
+    monkeypatch.setattr(task_tool_module.asyncio, "sleep", cancel_on_third_sleep)
+    monkeypatch.setattr(task_tool_module, "_report_subagent_usage", fake_report_subagent_usage)
+    monkeypatch.setattr("deerflow.tools.get_available_tools", lambda **kwargs: [])
+    monkeypatch.setattr(task_tool_module, "request_cancel_background_task", lambda _: None)
+    monkeypatch.setattr(
+        task_tool_module,
+        "cleanup_background_task",
+        lambda task_id: cleanup_calls.append(task_id),
+    )
+
+    with pytest.raises(asyncio.CancelledError):
+        _run_task_tool(
+            runtime=_make_runtime(),
+            description="执行任务",
+            prompt="cancel me",
+            subagent_type="general-purpose",
+            tool_call_id="tc-cancel-report",
+        )
+
+    # _report_subagent_usage is called synchronously within the cancellation
+    # handler (after the shielded wait), before CancelledError is re-raised.
+    assert len(report_calls) == 1
+    assert report_calls[0][1] is cancel_result
+    assert cleanup_calls == ["tc-cancel-report"]
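The shielded-wait behavior that the cancellation tests above exercise can be sketched roughly as follows. This is a minimal standalone illustration, not the task tool's actual implementation: `poll_until_terminal`, `wait_and_report`, `demo`, and the status strings are all hypothetical names chosen for the example.

```python
import asyncio

# Assumed terminal states for this sketch (not the project's SubagentStatus enum).
TERMINAL = {"completed", "failed", "cancelled"}


async def poll_until_terminal(get_status, interval=0.01):
    """Poll a hypothetical status getter until it returns a terminal state."""
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        await asyncio.sleep(interval)


async def wait_and_report(get_status, report):
    """Wait for the task; if cancelled, finish the wait shielded, report, re-raise."""
    try:
        return await poll_until_terminal(get_status)
    except asyncio.CancelledError:
        # shield() protects the inner wait from any further cancellation
        # of this coroutine while the task drains to a terminal state.
        final = await asyncio.shield(poll_until_terminal(get_status))
        report(final)  # runs before the caller observes CancelledError
        raise


async def demo():
    polls = 0
    reports = []

    def get_status():
        nonlocal polls
        polls += 1
        # Stays "running" for three polls, then settles as "cancelled".
        return "running" if polls <= 3 else "cancelled"

    task = asyncio.create_task(wait_and_report(get_status, reports.append))
    await asyncio.sleep(0)  # let the task reach its first sleep
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return reports


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the pattern is ordering: the report callback runs inside the `except` block, so any caller that catches the re-raised `CancelledError` already sees the reported usage.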
@@ -1,28 +1,25 @@
 """Tests for ThreadMetaRepository (SQLAlchemy-backed)."""

+import logging
+
 import pytest

-from deerflow.persistence.thread_meta import ThreadMetaRepository
+from deerflow.persistence.thread_meta import InvalidMetadataFilterError, ThreadMetaRepository


-async def _make_repo(tmp_path):
-    from deerflow.persistence.engine import get_session_factory, init_engine
+@pytest.fixture
+async def repo(tmp_path):
+    from deerflow.persistence.engine import close_engine, get_session_factory, init_engine

    url = f"sqlite+aiosqlite:///{tmp_path / 'test.db'}"
    await init_engine("sqlite", url=url, sqlite_dir=str(tmp_path))
-    return ThreadMetaRepository(get_session_factory())
-
-
-async def _cleanup():
-    from deerflow.persistence.engine import close_engine
-
+    yield ThreadMetaRepository(get_session_factory())
    await close_engine()


 class TestThreadMetaRepository:
    @pytest.mark.anyio
-    async def test_create_and_get(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_create_and_get(self, repo):
        record = await repo.create("t1")
        assert record["thread_id"] == "t1"
        assert record["status"] == "idle"
@@ -31,148 +28,523 @@ class TestThreadMetaRepository:
        fetched = await repo.get("t1")
        assert fetched is not None
        assert fetched["thread_id"] == "t1"
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_create_with_assistant_id(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_create_with_assistant_id(self, repo):
        record = await repo.create("t1", assistant_id="agent1")
        assert record["assistant_id"] == "agent1"
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_create_with_owner_and_display_name(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_create_with_owner_and_display_name(self, repo):
        record = await repo.create("t1", user_id="user1", display_name="My Thread")
        assert record["user_id"] == "user1"
        assert record["display_name"] == "My Thread"
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_create_with_metadata(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_create_with_metadata(self, repo):
        record = await repo.create("t1", metadata={"key": "value"})
        assert record["metadata"] == {"key": "value"}
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_get_nonexistent(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_get_nonexistent(self, repo):
        assert await repo.get("nonexistent") is None
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_no_record_allows(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_check_access_no_record_allows(self, repo):
        assert await repo.check_access("unknown", "user1") is True
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_owner_matches(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_check_access_owner_matches(self, repo):
        await repo.create("t1", user_id="user1")
        assert await repo.check_access("t1", "user1") is True
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_owner_mismatch(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_check_access_owner_mismatch(self, repo):
        await repo.create("t1", user_id="user1")
        assert await repo.check_access("t1", "user2") is False
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_no_owner_allows_all(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_check_access_no_owner_allows_all(self, repo):
        # Explicit user_id=None to bypass the new AUTO default that
        # would otherwise pick up the test user from the autouse fixture.
        await repo.create("t1", user_id=None)
        assert await repo.check_access("t1", "anyone") is True
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_strict_missing_row_denied(self, tmp_path):
+    async def test_check_access_strict_missing_row_denied(self, repo):
        """require_existing=True flips the missing-row case to *denied*.

        Closes the delete-idempotence cross-user gap: after a thread is
        deleted, the row is gone, and the permissive default would let any
        caller "claim" it as untracked. The strict mode demands a row.
        """
-        repo = await _make_repo(tmp_path)
        assert await repo.check_access("never-existed", "user1", require_existing=True) is False
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_strict_owner_match_allowed(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_check_access_strict_owner_match_allowed(self, repo):
        await repo.create("t1", user_id="user1")
        assert await repo.check_access("t1", "user1", require_existing=True) is True
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_strict_owner_mismatch_denied(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_check_access_strict_owner_mismatch_denied(self, repo):
        await repo.create("t1", user_id="user1")
        assert await repo.check_access("t1", "user2", require_existing=True) is False
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_check_access_strict_null_owner_still_allowed(self, tmp_path):
+    async def test_check_access_strict_null_owner_still_allowed(self, repo):
        """Even in strict mode, a row with NULL user_id stays shared.

        The strict flag tightens the *missing row* case, not the *shared
        row* case — legacy pre-auth rows that survived a clean migration
        without an owner are still everyone's.
        """
-        repo = await _make_repo(tmp_path)
        await repo.create("t1", user_id=None)
        assert await repo.check_access("t1", "anyone", require_existing=True) is True
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_update_status(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_update_status(self, repo):
        await repo.create("t1")
        await repo.update_status("t1", "busy")
        record = await repo.get("t1")
        assert record["status"] == "busy"
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_delete(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_delete(self, repo):
        await repo.create("t1")
        await repo.delete("t1")
        assert await repo.get("t1") is None
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_delete_nonexistent_is_noop(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_delete_nonexistent_is_noop(self, repo):
        await repo.delete("nonexistent")  # should not raise
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_update_metadata_merges(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_update_metadata_merges(self, repo):
        await repo.create("t1", metadata={"a": 1, "b": 2})
        await repo.update_metadata("t1", {"b": 99, "c": 3})
        record = await repo.get("t1")
        # Existing key preserved, overlapping key overwritten, new key added
        assert record["metadata"] == {"a": 1, "b": 99, "c": 3}
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_update_metadata_on_empty(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_update_metadata_on_empty(self, repo):
        await repo.create("t1")
        await repo.update_metadata("t1", {"k": "v"})
        record = await repo.get("t1")
        assert record["metadata"] == {"k": "v"}
-        await _cleanup()

    @pytest.mark.anyio
-    async def test_update_metadata_nonexistent_is_noop(self, tmp_path):
-        repo = await _make_repo(tmp_path)
+    async def test_update_metadata_nonexistent_is_noop(self, repo):
        await repo.update_metadata("nonexistent", {"k": "v"})  # should not raise
-        await _cleanup()
+
+    # --- search with metadata filter (SQL push-down) ---
+
+    @pytest.mark.anyio
+    async def test_search_metadata_filter_string(self, repo):
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "staging"})
+        await repo.create("t3", metadata={"env": "prod", "region": "us"})
+
+        results = await repo.search(metadata={"env": "prod"})
+        ids = {r["thread_id"] for r in results}
+        assert ids == {"t1", "t3"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_filter_numeric(self, repo):
+        await repo.create("t1", metadata={"priority": 1})
+        await repo.create("t2", metadata={"priority": 2})
+        await repo.create("t3", metadata={"priority": 1, "extra": "x"})
+
+        results = await repo.search(metadata={"priority": 1})
+        ids = {r["thread_id"] for r in results}
+        assert ids == {"t1", "t3"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_filter_multiple_keys(self, repo):
+        await repo.create("t1", metadata={"env": "prod", "region": "us"})
+        await repo.create("t2", metadata={"env": "prod", "region": "eu"})
+        await repo.create("t3", metadata={"env": "staging", "region": "us"})
+
+        results = await repo.search(metadata={"env": "prod", "region": "us"})
+        assert len(results) == 1
+        assert results[0]["thread_id"] == "t1"
+
+    @pytest.mark.anyio
+    async def test_search_metadata_no_match(self, repo):
+        await repo.create("t1", metadata={"env": "prod"})
+
+        results = await repo.search(metadata={"env": "dev"})
+        assert results == []
+
+    @pytest.mark.anyio
+    async def test_search_metadata_pagination_correct(self, repo):
+        """Regression: SQL push-down makes limit/offset exact even when most rows don't match."""
+        for i in range(30):
+            meta = {"target": "yes"} if i % 3 == 0 else {"target": "no"}
+            await repo.create(f"t{i:03d}", metadata=meta)
+
+        # Total matching rows: i in {0,3,6,9,12,15,18,21,24,27} = 10 rows
+        all_matches = await repo.search(metadata={"target": "yes"}, limit=100)
+        assert len(all_matches) == 10
+
+        # Paginate: first page
+        page1 = await repo.search(metadata={"target": "yes"}, limit=3, offset=0)
+        assert len(page1) == 3
+
+        # Paginate: second page
+        page2 = await repo.search(metadata={"target": "yes"}, limit=3, offset=3)
+        assert len(page2) == 3
+
+        # No overlap between pages
+        page1_ids = {r["thread_id"] for r in page1}
+        page2_ids = {r["thread_id"] for r in page2}
+        assert page1_ids.isdisjoint(page2_ids)
+
+        # Last page
+        page_last = await repo.search(metadata={"target": "yes"}, limit=3, offset=9)
+        assert len(page_last) == 1
+
+    @pytest.mark.anyio
+    async def test_search_metadata_with_status_filter(self, repo):
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "prod"})
+        await repo.update_status("t1", "busy")
+
+        results = await repo.search(metadata={"env": "prod"}, status="busy")
+        assert len(results) == 1
+        assert results[0]["thread_id"] == "t1"
+
+    @pytest.mark.anyio
+    async def test_search_without_metadata_still_works(self, repo):
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2")
+
+        results = await repo.search(limit=10)
+        assert len(results) == 2
+
+    @pytest.mark.anyio
+    async def test_search_metadata_missing_key_no_match(self, repo):
+        """Rows without the requested metadata key should not match."""
+        await repo.create("t1", metadata={"other": "val"})
+        await repo.create("t2", metadata={"env": "prod"})
+
+        results = await repo.search(metadata={"env": "prod"})
+        assert len(results) == 1
+        assert results[0]["thread_id"] == "t2"
+
+    @pytest.mark.anyio
+    async def test_search_metadata_all_unsafe_keys_raises(self, repo, caplog):
+        """When ALL metadata keys are unsafe, raises InvalidMetadataFilterError."""
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "staging"})
+
+        with caplog.at_level(logging.WARNING, logger="deerflow.persistence.thread_meta.sql"):
+            with pytest.raises(InvalidMetadataFilterError, match="rejected") as exc_info:
+                await repo.search(metadata={"bad;key": "x"})
+        assert any("bad;key" in r.message for r in caplog.records)
+        # Subclass of ValueError for backward compatibility
+        assert isinstance(exc_info.value, ValueError)
+
+    @pytest.mark.anyio
+    async def test_search_metadata_partial_unsafe_key_skipped(self, repo, caplog):
+        """Valid keys filter rows; only the invalid key is warned and skipped."""
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "staging"})
+
+        with caplog.at_level(logging.WARNING, logger="deerflow.persistence.thread_meta.sql"):
+            results = await repo.search(metadata={"env": "prod", "bad;key": "x"})
+        ids = {r["thread_id"] for r in results}
+        assert ids == {"t1"}
+        assert any("bad;key" in r.message for r in caplog.records)
+
+    @pytest.mark.anyio
+    async def test_search_metadata_filter_boolean(self, repo):
+        """True matches only boolean true, not integer 1."""
+        await repo.create("t1", metadata={"active": True})
+        await repo.create("t2", metadata={"active": False})
+        await repo.create("t3", metadata={"active": True, "extra": "x"})
+        await repo.create("t4", metadata={"active": 1})
+
+        results = await repo.search(metadata={"active": True})
+        ids = {r["thread_id"] for r in results}
+        assert ids == {"t1", "t3"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_filter_none(self, repo):
+        """Only rows with explicit JSON null match; missing key does not."""
+        await repo.create("t1", metadata={"tag": None})
+        await repo.create("t2", metadata={"tag": "present"})
+        await repo.create("t3", metadata={"other": "val"})
+
+        results = await repo.search(metadata={"tag": None})
+        ids = {r["thread_id"] for r in results}
+        assert ids == {"t1"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_non_string_key_skipped(self, repo, caplog):
+        """A non-string key is warned about and skipped; as the only key, the search raises InvalidMetadataFilterError."""
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "staging"})
+
+        with caplog.at_level(logging.WARNING, logger="deerflow.persistence.thread_meta.sql"):
+            with pytest.raises(InvalidMetadataFilterError, match="rejected"):
+                await repo.search(metadata={1: "x"})
+        assert any("1" in r.message for r in caplog.records)
+
+    @pytest.mark.anyio
+    async def test_search_metadata_unsupported_value_type_skipped(self, repo, caplog):
+        """An unsupported value type (list, dict) is rejected; as the only filter, the search raises InvalidMetadataFilterError."""
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "staging"})
+
+        with caplog.at_level(logging.WARNING, logger="deerflow.persistence.thread_meta.sql"):
+            with pytest.raises(InvalidMetadataFilterError, match="rejected"):
+                await repo.search(metadata={"env": ["prod", "staging"]})
+
+    @pytest.mark.anyio
+    async def test_search_metadata_dotted_key_raises(self, repo, caplog):
+        """Dotted keys are rejected; when ALL keys are dotted, raises InvalidMetadataFilterError."""
+        await repo.create("t1", metadata={"env": "prod"})
+        await repo.create("t2", metadata={"env": "staging"})
+
+        with caplog.at_level(logging.WARNING, logger="deerflow.persistence.thread_meta.sql"):
+            with pytest.raises(InvalidMetadataFilterError, match="rejected"):
+                await repo.search(metadata={"a.b": "anything"})
+        assert any("a.b" in r.message for r in caplog.records)
+
+    # --- dialect-aware type-safe filtering edge cases ---
+
+    @pytest.mark.anyio
+    async def test_search_metadata_bool_vs_int_distinction(self, repo):
+        """True must not match 1; False must not match 0."""
+        await repo.create("bool_true", metadata={"flag": True})
+        await repo.create("bool_false", metadata={"flag": False})
+        await repo.create("int_one", metadata={"flag": 1})
+        await repo.create("int_zero", metadata={"flag": 0})
+
+        true_hits = {r["thread_id"] for r in await repo.search(metadata={"flag": True})}
+        assert true_hits == {"bool_true"}
+
+        false_hits = {r["thread_id"] for r in await repo.search(metadata={"flag": False})}
+        assert false_hits == {"bool_false"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_int_does_not_match_bool(self, repo):
+        """Integer 1 must not match boolean True."""
+        await repo.create("bool_true", metadata={"val": True})
+        await repo.create("int_one", metadata={"val": 1})
+
+        hits = {r["thread_id"] for r in await repo.search(metadata={"val": 1})}
+        assert hits == {"int_one"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_none_excludes_missing_key(self, repo):
+        """Filtering by None matches explicit JSON null only, not missing key or empty {}."""
+        await repo.create("explicit_null", metadata={"k": None})
+        await repo.create("missing_key", metadata={"other": "x"})
+        await repo.create("empty_obj", metadata={})
+
+        hits = {r["thread_id"] for r in await repo.search(metadata={"k": None})}
+        assert hits == {"explicit_null"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_float_value(self, repo):
+        await repo.create("t1", metadata={"score": 3.14})
+        await repo.create("t2", metadata={"score": 2.71})
+        await repo.create("t3", metadata={"score": 3.14})
+
+        hits = {r["thread_id"] for r in await repo.search(metadata={"score": 3.14})}
+        assert hits == {"t1", "t3"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_mixed_types_same_key(self, repo):
+        """Each type query only matches its own type, even when the key is shared."""
+        await repo.create("str_row", metadata={"x": "hello"})
+        await repo.create("int_row", metadata={"x": 42})
+        await repo.create("bool_row", metadata={"x": True})
+        await repo.create("null_row", metadata={"x": None})
+
+        assert {r["thread_id"] for r in await repo.search(metadata={"x": "hello"})} == {"str_row"}
+        assert {r["thread_id"] for r in await repo.search(metadata={"x": 42})} == {"int_row"}
+        assert {r["thread_id"] for r in await repo.search(metadata={"x": True})} == {"bool_row"}
+        assert {r["thread_id"] for r in await repo.search(metadata={"x": None})} == {"null_row"}
+
+    @pytest.mark.anyio
+    async def test_search_metadata_large_int_precision(self, repo):
+        """Integers beyond float precision (> 2**53) must match exactly."""
+        large = 2**53 + 1
+        await repo.create("t1", metadata={"id": large})
+        await repo.create("t2", metadata={"id": large - 1})
+
+        hits = {r["thread_id"] for r in await repo.search(metadata={"id": large})}
+        assert hits == {"t1"}
+
+
+class TestJsonMatchCompilation:
+    """Verify compiled SQL for both SQLite and PostgreSQL dialects."""
+
+    def test_json_match_compiles_sqlite(self):
+        from sqlalchemy import Column, MetaData, String, Table, create_engine
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+        engine = create_engine("sqlite://")
+
+        cases = [
+            (None, "json_type(t.data, '$.\"k\"') = 'null'"),
+            (True, "json_type(t.data, '$.\"k\"') = 'true'"),
+            (False, "json_type(t.data, '$.\"k\"') = 'false'"),
+        ]
+        for value, expected_fragment in cases:
+            expr = json_match(t.c.data, "k", value)
+            sql = expr.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True})
+            assert str(sql) == expected_fragment, f"value={value!r}: {sql}"
+
+        # int: uses INTEGER cast for precision, type-check narrows to 'integer' only
+        int_expr = json_match(t.c.data, "k", 42)
+        sql = str(int_expr.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True}))
+        assert "json_type" in sql
+        assert "= 'integer'" in sql
+        assert "INTEGER" in sql
+        assert "CAST" in sql
+
+        # float: uses REAL cast, type-check spans 'integer' and 'real'
+        float_expr = json_match(t.c.data, "k", 3.14)
+        sql = str(float_expr.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True}))
+        assert "json_type" in sql
+        assert "IN ('integer', 'real')" in sql
+        assert "REAL" in sql
+
+        str_expr = json_match(t.c.data, "k", "hello")
+        sql = str(str_expr.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True}))
+        assert "json_type" in sql
+        assert "'text'" in sql
+
+    def test_json_match_compiles_pg(self):
+        from sqlalchemy import Column, MetaData, String, Table
+        from sqlalchemy.dialects import postgresql
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+        dialect = postgresql.dialect()
+
+        cases = [
+            (None, "json_typeof(t.data -> 'k') = 'null'"),
+            (True, "(json_typeof(t.data -> 'k') = 'boolean' AND (t.data ->> 'k') = 'true')"),
+            (False, "(json_typeof(t.data -> 'k') = 'boolean' AND (t.data ->> 'k') = 'false')"),
+        ]
+        for value, expected_fragment in cases:
+            expr = json_match(t.c.data, "k", value)
+            sql = expr.compile(dialect=dialect, compile_kwargs={"literal_binds": True})
+            assert str(sql) == expected_fragment, f"value={value!r}: {sql}"
+
+        # int: json_typeof reports 'number' for floats too, so a CASE guard (regex on the text value) avoids a BIGINT CAST error
+        int_expr = json_match(t.c.data, "k", 42)
+        sql = str(int_expr.compile(dialect=dialect, compile_kwargs={"literal_binds": True}))
+        assert "json_typeof" in sql
+        assert "'number'" in sql
+        assert "BIGINT" in sql
+        assert "CASE WHEN" in sql
+        assert "'^-?[0-9]+$'" in sql
+
+        # float: uses DOUBLE PRECISION cast
+        float_expr = json_match(t.c.data, "k", 3.14)
+        sql = str(float_expr.compile(dialect=dialect, compile_kwargs={"literal_binds": True}))
+        assert "json_typeof" in sql
+        assert "'number'" in sql
+        assert "DOUBLE PRECISION" in sql
+
+        str_expr = json_match(t.c.data, "k", "hello")
+        sql = str(str_expr.compile(dialect=dialect, compile_kwargs={"literal_binds": True}))
+        assert "json_typeof" in sql
+        assert "'string'" in sql
+
+    def test_json_match_rejects_unsafe_key(self):
+        from sqlalchemy import Column, MetaData, String, Table
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+
+        for bad_key in ["a.b", "with space", "bad'quote", 'bad"quote', "back\\slash", "semi;colon", ""]:
+            with pytest.raises(ValueError, match="JsonMatch key must match"):
+                json_match(t.c.data, bad_key, "x")
+
+        # Non-string keys must also raise ValueError (not TypeError from re.match)
+        for non_str_key in [42, None, ("k",)]:
+            with pytest.raises(ValueError, match="JsonMatch key must match"):
+                json_match(t.c.data, non_str_key, "x")
+
+    def test_json_match_rejects_unsupported_value_type(self):
+        from sqlalchemy import Column, MetaData, String, Table
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+
+        for bad_value in [[], {}, object()]:
+            with pytest.raises(TypeError, match="JsonMatch value must be"):
+                json_match(t.c.data, "k", bad_value)
+
+    def test_json_match_unsupported_dialect_raises(self):
+        from sqlalchemy import Column, MetaData, String, Table
+        from sqlalchemy.dialects import mysql
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+        expr = json_match(t.c.data, "k", "v")
+
+        with pytest.raises(NotImplementedError, match="mysql"):
+            str(expr.compile(dialect=mysql.dialect(), compile_kwargs={"literal_binds": True}))
+
+    def test_json_match_rejects_out_of_range_int(self):
+        from sqlalchemy import Column, MetaData, String, Table
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+
+        # boundary values must be accepted
+        json_match(t.c.data, "k", 2**63 - 1)
+        json_match(t.c.data, "k", -(2**63))
+
+        # one beyond each boundary must be rejected
+        for out_of_range in [2**63, -(2**63) - 1, 10**30]:
+            with pytest.raises(TypeError, match="out of signed 64-bit range"):
+                json_match(t.c.data, "k", out_of_range)
+
+    def test_compiler_raises_on_escaped_key(self):
+        """Compiler raises ValueError even when __init__ validation is bypassed."""
+        from sqlalchemy import Column, MetaData, String, Table, create_engine
+        from sqlalchemy.dialects import postgresql
+        from sqlalchemy.types import JSON
+
+        from deerflow.persistence.json_compat import json_match
+
+        metadata = MetaData()
+        t = Table("t", metadata, Column("data", JSON), Column("id", String))
+        engine = create_engine("sqlite://")
+
+        elem = json_match(t.c.data, "k", "v")
+        elem.key = "bad.key"  # bypass __init__ validation, as if an assert-based check were stripped by python -O
+
+        with pytest.raises(ValueError, match="Key escaped validation"):
+            str(elem.compile(dialect=engine.dialect, compile_kwargs={"literal_binds": True}))
+
+        with pytest.raises(ValueError, match="Key escaped validation"):
+            str(elem.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))
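
The validation rules pinned down by the tests above (keys limited to `[A-Za-z0-9_-]+`, scalar-only values, integers inside the signed 64-bit range, `bool` kept distinct from `int`) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `deerflow.persistence.json_compat`, whose implementation is not shown in this diff: `validate_filter` and `json_kind` are hypothetical names, and the error messages only approximate the ones the tests match.

```python
import re

# Hypothetical stand-ins; the real json_match internals are not part of this diff.
_KEY_RE = re.compile(r"[A-Za-z0-9_-]+")
_INT64_MIN, _INT64_MAX = -(2**63), 2**63 - 1


def json_kind(value):
    """Classify a filter value; bool is checked before int because bool subclasses int."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"value must be None, bool, int, float, or str, got {type(value).__name__}")


def validate_filter(key, value):
    """Return the value's kind, or raise ValueError (bad key) / TypeError (bad value)."""
    # Non-string keys raise ValueError too, rather than a TypeError from the regex call.
    if not isinstance(key, str) or not _KEY_RE.fullmatch(key):
        raise ValueError(f"key must match {_KEY_RE.pattern}, got {key!r}")
    kind = json_kind(value)
    if kind == "integer" and not _INT64_MIN <= value <= _INT64_MAX:
        raise TypeError(f"integer {value} is out of signed 64-bit range")
    return kind
```

Checking `bool` before `int` is what keeps `True` and `1` from being treated as the same filter value, which is the behaviour `test_search_metadata_int_does_not_match_bool` expects from the real backend.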
@@ -10,6 +10,7 @@ from langgraph.store.memory import InMemoryStore

 from app.gateway.routers import threads
 from deerflow.config.paths import Paths
+from deerflow.persistence.thread_meta import InvalidMetadataFilterError
 from deerflow.persistence.thread_meta.memory import THREADS_NS, MemoryThreadMetaStore

 _ISO_TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
@@ -431,3 +432,56 @@ def test_get_thread_history_returns_iso_for_legacy_checkpoint_metadata() -> None
    assert entries, "expected at least one history entry"
    for entry in entries:
        assert _ISO_TIMESTAMP_RE.match(entry["created_at"]), entry
+
+
+# ── Metadata filter validation at API boundary ────────────────────────────────
+
+
+def test_search_threads_rejects_invalid_key_at_api_boundary() -> None:
+    """Keys that don't match [A-Za-z0-9_-]+ are rejected by the Pydantic
+    validator on ThreadSearchRequest.metadata — 422 from both backends.
+    """
+    app, _store, _checkpointer = _build_thread_app()
+
+    with TestClient(app) as client:
+        response = client.post("/api/threads/search", json={"metadata": {"bad;key": "x"}})
+
+    assert response.status_code == 422
+
+
+def test_search_threads_rejects_unsupported_value_type_at_api_boundary() -> None:
+    """Value types outside (None, bool, int, float, str) are rejected."""
+    app, _store, _checkpointer = _build_thread_app()
+
+    with TestClient(app) as client:
+        response = client.post("/api/threads/search", json={"metadata": {"env": ["a", "b"]}})
+
+    assert response.status_code == 422
+
+
+def test_search_threads_returns_400_for_backend_invalid_metadata_filter() -> None:
+    """If the backend still raises InvalidMetadataFilterError (defense in
+    depth), the handler surfaces it as HTTP 400.
+    """
+    app, _store, _checkpointer = _build_thread_app()
+    thread_store = app.state.thread_store
+
+    async def _raise(**kwargs):
+        raise InvalidMetadataFilterError("rejected")
+
+    with TestClient(app) as client:
+        with patch.object(thread_store, "search", side_effect=_raise):
+            response = client.post("/api/threads/search", json={"metadata": {"valid_key": "x"}})
+
+    assert response.status_code == 400
+    assert "rejected" in response.json()["detail"]
+
+
+def test_search_threads_succeeds_with_valid_metadata() -> None:
+    """Sanity check: valid metadata passes through without error."""
+    app, _store, _checkpointer = _build_thread_app()
+
+    with TestClient(app) as client:
+        response = client.post("/api/threads/search", json={"metadata": {"env": "prod"}})
+
+    assert response.status_code == 200
@@ -89,3 +89,20 @@ def test_tool_args_schema_does_not_emit_pydantic_context_warning(tool_obj, extra

    pydantic_warnings = [w for w in caught if "PydanticSerializationUnexpectedValue" in str(w.message)]
    assert not pydantic_warnings, f"{tool_obj.name} args_schema.model_dump() emitted Pydantic context serialization warnings: {[str(w.message) for w in pydantic_warnings]}"
+
+
+def test_write_file_append_is_discoverable_in_tool_schema() -> None:
+    """``append`` must be visible and described in the model-facing tool schema."""
+    assert "append" in write_file_tool.description
+
+    append_field = write_file_tool.tool_call_schema.model_fields["append"]
+    assert append_field.default is False
+    assert append_field.description
+    assert "append" in append_field.description
+
+
+@pytest.mark.parametrize("tool_obj", [case[0] for case in _TOOL_CASES], ids=[case[0].name for case in _TOOL_CASES])
+def test_model_facing_tool_parameters_have_descriptions(tool_obj) -> None:
+    """Every model-facing tool parameter should explain when and how to use it."""
+    missing_descriptions = [field_name for field_name, field in tool_obj.tool_call_schema.model_fields.items() if not field.description]
+    assert missing_descriptions == [], f"{tool_obj.name} has model-facing parameters without descriptions: {missing_descriptions}. Add an Args: section to the tool's docstring and ensure @tool(parse_docstring=True) is set."
@@ -10,7 +10,8 @@ from __future__ import annotations

 from unittest.mock import MagicMock, patch

-from langchain_core.tools import BaseTool, tool
+from langchain_core.tools import BaseTool, StructuredTool, tool
+from pydantic import BaseModel, Field

 from deerflow.tools.tools import get_available_tools

@@ -19,6 +20,10 @@ from deerflow.tools.tools import get_available_tools
 # ---------------------------------------------------------------------------


+class AsyncToolArgs(BaseModel):
+    x: int = Field(..., description="test input")
+
+
@tool
 def _tool_alpha(x: str) -> str:
    """Alpha tool."""
@@ -52,10 +57,45 @@ def _make_minimal_config(tools):
    config.tools = tools
    config.models = []
    config.tool_search.enabled = False
+    config.skill_evolution.enabled = False
    config.sandbox = MagicMock()
+    config.acp_agents = {}
    return config


+@patch("deerflow.tools.tools.get_app_config")
+@patch("deerflow.tools.tools.is_host_bash_allowed", return_value=True)
+@patch("deerflow.tools.tools.reset_deferred_registry")
+def test_config_loaded_async_only_tool_gets_sync_wrapper(mock_reset, mock_bash, mock_cfg):
+    """Config-loaded async-only tools can still be invoked by sync clients."""
+
+    async def async_tool_impl(x: int) -> str:
+        return f"result: {x}"
+
+    async_tool = StructuredTool(
+        name="async_tool",
+        description="Async-only test tool.",
+        args_schema=AsyncToolArgs,
+        func=None,
+        coroutine=async_tool_impl,
+    )
+    tool_cfg = MagicMock()
+    tool_cfg.name = "async_tool"
+    tool_cfg.group = "test"
+    tool_cfg.use = "tests.fake:async_tool"
+    mock_cfg.return_value = _make_minimal_config([tool_cfg])
+
+    with (
+        patch("deerflow.tools.tools.resolve_variable", return_value=async_tool),
+        patch("deerflow.tools.tools.BUILTIN_TOOLS", []),
+    ):
+        result = get_available_tools(include_mcp=False, app_config=mock_cfg.return_value)
+
+    assert async_tool in result
+    assert async_tool.func is not None
+    assert async_tool.invoke({"x": 42}) == "result: 42"
+
+
@patch("deerflow.tools.tools.get_app_config")
@patch("deerflow.tools.tools.is_host_bash_allowed", return_value=True)
@patch("deerflow.tools.tools.reset_deferred_registry")
@@ -0,0 +1,253 @@
+"""End-to-end verification for update_agent's user_id resolution.
+
+PR #2784 hardened setup_agent to prefer runtime.context["user_id"] over the
+contextvar. update_agent had the same latent gap: it unconditionally called
+get_effective_user_id() at module level, so any scenario where the contextvar
+was unavailable while runtime.context carried user_id (a background task
+scheduled outside the request task, a worker pool that doesn't copy_context,
+checkpoint resume on a different task) would silently route writes to
+users/default/agents/...
+
+These tests are load-bearing under @no_auto_user (contextvar empty):
+
+- The negative-control test confirms the fixture actually puts the tool in
+  the regime where the contextvar fallback would land in users/default/.
+  Without that, the positive test would be vacuously satisfied.
+- The positive test verifies update_agent honours runtime.context["user_id"]
+  injected by inject_authenticated_user_context in the gateway. Before the
+  fix in this PR, this test failed; now it passes.
+"""
+
+from __future__ import annotations
+
+from contextlib import ExitStack
+from pathlib import Path
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+from uuid import UUID
+
+import pytest
+import yaml
+from _agent_e2e_helpers import build_single_tool_call_model
+from langchain_core.messages import HumanMessage
+
+from app.gateway.services import (
+    build_run_config,
+    inject_authenticated_user_context,
+    merge_run_context_overrides,
+)
+from deerflow.runtime.runs.worker import _build_runtime_context, _install_runtime_context
+
+
+def _make_request(user_id_str: str | None) -> SimpleNamespace:
+    user = SimpleNamespace(id=UUID(user_id_str), email="alice@local") if user_id_str else None
+    return SimpleNamespace(state=SimpleNamespace(user=user))
+
+
+def _assemble_config(*, body_context: dict | None, request_user_id: str | None, thread_id: str) -> dict:
+    config = build_run_config(thread_id, {"recursion_limit": 50}, None, assistant_id="lead_agent")
+    merge_run_context_overrides(config, body_context)
+    inject_authenticated_user_context(config, _make_request(request_user_id))
+    return config
+
+
+def _seed_existing_agent(tmp_path: Path, user_id: str, agent_name: str, soul: str = "# Original"):
+    """Pre-create an agent on disk for update_agent to overwrite."""
+    agent_dir = tmp_path / "users" / user_id / "agents" / agent_name
+    agent_dir.mkdir(parents=True, exist_ok=True)
+    (agent_dir / "config.yaml").write_text(
+        yaml.dump({"name": agent_name, "description": "old"}, allow_unicode=True),
+        encoding="utf-8",
+    )
+    (agent_dir / "SOUL.md").write_text(soul, encoding="utf-8")
+    return agent_dir
+
+
+def _make_paths_mock(tmp_path: Path):
+    paths = MagicMock()
+    paths.base_dir = tmp_path
+    paths.agent_dir = lambda name: tmp_path / "agents" / name
+    paths.user_agent_dir = lambda user_id, name: tmp_path / "users" / user_id / "agents" / name
+    return paths
+
+
+def _patch_update_agent_dependencies(tmp_path: Path):
+    """update_agent reads load_agent_config + get_app_config — stub them
+    minimally so the tool can run without a real config file or LLM."""
+    fake_model_cfg = SimpleNamespace(name="fake-model")
+    fake_app_cfg = MagicMock()
+    fake_app_cfg.get_model_config = lambda name: fake_model_cfg if name == "fake-model" else None
+
+    return [
+        patch(
+            "deerflow.tools.builtins.update_agent_tool.get_paths",
+            return_value=_make_paths_mock(tmp_path),
+        ),
+        patch(
+            "deerflow.tools.builtins.update_agent_tool.get_app_config",
+            return_value=fake_app_cfg,
+        ),
+        # load_agent_config (used by update_agent to read existing config) also
+        # reads paths via its own module-level get_paths reference. Patch it too
+        # or the tool returns "Agent does not exist" before touching disk.
+        patch(
+            "deerflow.config.agents_config.get_paths",
+            return_value=_make_paths_mock(tmp_path),
+        ),
+    ]
+
+
+def _build_update_graph(*, soul_payload: str):
+    from langchain.agents import create_agent
+
+    from deerflow.tools.builtins.update_agent_tool import update_agent
+
+    fake_model = build_single_tool_call_model(
+        tool_name="update_agent",
+        tool_args={"soul": soul_payload, "description": "refined"},
+        tool_call_id="call_update_1",
+        final_text="updated",
+    )
+    return create_agent(model=fake_model, tools=[update_agent], system_prompt="updater")
+
+
+# ---------------------------------------------------------------------------
+# Negative control — proves the test environment puts update_agent in the
+# regime where the contextvar fallback would land in default/.
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.no_auto_user
+def test_update_agent_falls_back_to_default_when_no_inject_and_no_contextvar(tmp_path: Path):
+    """No request.state.user, no contextvar — update_agent must look in
+    users/default/agents/. We seed the file there so the tool succeeds and
+    we know which directory it actually consulted."""
+    from langgraph.runtime import Runtime
+
+    _seed_existing_agent(tmp_path, "default", "fallback-target")
+
+    config = _assemble_config(
+        body_context={"agent_name": "fallback-target"},
+        request_user_id=None,  # no auth, inject is no-op
+        thread_id="thread-update-1",
+    )
+    runtime_ctx = _build_runtime_context("thread-update-1", "run-1", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_update_graph(soul_payload="# Fallback Updated")
+
+    with ExitStack() as stack:
+        for p in _patch_update_agent_dependencies(tmp_path):
+            stack.enter_context(p)
+        graph.invoke(
+            {"messages": [HumanMessage(content="update fallback-target")]},
+            config=config,
+        )
+
+    soul = (tmp_path / "users" / "default" / "agents" / "fallback-target" / "SOUL.md").read_text()
+    assert soul == "# Fallback Updated", "Sanity: tool should have written under default/"
+
+
+# ---------------------------------------------------------------------------
+# Regression guard — passes on this branch, would fail on main before the fix.
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.no_auto_user
+def test_update_agent_should_use_runtime_context_user_id_when_contextvar_missing(tmp_path: Path):
+    """update_agent prefers the authenticated user_id carried in
+    runtime.context (placed there by inject_authenticated_user_context)
+    over the contextvar — same contract as setup_agent (PR #2784).
+
+    Before this PR's fix, update_agent unconditionally called
+    get_effective_user_id() and landed in default/ whenever the contextvar
+    was unavailable. This test pins the corrected behaviour.
+    """
+    from langgraph.runtime import Runtime
+
+    auth_uid = "abcdef01-2345-6789-abcd-ef0123456789"
+
+    # Seed the agent in BOTH locations so we can prove which one was opened.
+    auth_dir = _seed_existing_agent(tmp_path, auth_uid, "shared-name", soul="# Auth Original")
+    default_dir = _seed_existing_agent(tmp_path, "default", "shared-name", soul="# Default Original")
+
+    config = _assemble_config(
+        body_context={"agent_name": "shared-name"},
+        request_user_id=auth_uid,
+        thread_id="thread-update-2",
+    )
+    runtime_ctx = _build_runtime_context("thread-update-2", "run-2", config.get("context"), None)
+    assert runtime_ctx["user_id"] == auth_uid, "Pre-condition: inject must have placed user_id into runtime_ctx"
+
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_update_graph(soul_payload="# Auth Updated")
+
+    with ExitStack() as stack:
+        for p in _patch_update_agent_dependencies(tmp_path):
+            stack.enter_context(p)
+        graph.invoke(
+            {"messages": [HumanMessage(content="update shared-name")]},
+            config=config,
+        )
+
+    auth_soul = (auth_dir / "SOUL.md").read_text()
+    default_soul = (default_dir / "SOUL.md").read_text()
+
+    assert auth_soul == "# Auth Updated", f"REGRESSION: update_agent ignored runtime.context['user_id']={auth_uid!r} and routed the write to users/default/ instead. auth_soul={auth_soul!r}, default_soul={default_soul!r}"
+    assert default_soul == "# Default Original", "REGRESSION: update_agent corrupted the shared default-user agent. It should have written under the authenticated user's path."
+
+
+# ---------------------------------------------------------------------------
+# Positive — when contextvar IS the auth user (the normal HTTP case), things
+# already work. Pin it as a regression guard so future refactors don't
+# accidentally break the contextvar path in pursuit of the runtime-context fix.
+# ---------------------------------------------------------------------------
+
+
+def test_update_agent_uses_contextvar_when_present(tmp_path: Path, monkeypatch):
+    """The normal HTTP case: contextvar is set by auth_middleware. This must
+    keep working regardless of how runtime.context is populated."""
+    from types import SimpleNamespace as _SN
+
+    from deerflow.runtime.user_context import reset_current_user, set_current_user
+
+    auth_uid = "11112222-3333-4444-5555-666677778888"
+    user = _SN(id=auth_uid, email="ctxvar@local")
+
+    _seed_existing_agent(tmp_path, auth_uid, "ctxvar-agent", soul="# Original")
+
+    from langgraph.runtime import Runtime
+
+    config = _assemble_config(
+        body_context={"agent_name": "ctxvar-agent"},
+        request_user_id=auth_uid,
+        thread_id="thread-update-3",
+    )
+    runtime_ctx = _build_runtime_context("thread-update-3", "run-3", config.get("context"), None)
+    _install_runtime_context(config, runtime_ctx)
+    runtime = Runtime(context=runtime_ctx, store=None)
+    config.setdefault("configurable", {})["__pregel_runtime"] = runtime
+
+    graph = _build_update_graph(soul_payload="# CtxVar Updated")
+
+    with ExitStack() as stack:
+        for p in _patch_update_agent_dependencies(tmp_path):
+            stack.enter_context(p)
+        token = set_current_user(user)
+        try:
+            final = graph.invoke(
+                {"messages": [HumanMessage(content="update ctxvar-agent")]},
+                config=config,
+            )
+        finally:
+            reset_current_user(token)
+
+    # Collect the tool's replies so a failed assertion shows why the update did not land.
+    tool_replies = [m.content for m in final["messages"] if getattr(m, "type", "") == "tool"]
+    soul = (tmp_path / "users" / auth_uid / "agents" / "ctxvar-agent" / "SOUL.md").read_text()
+    assert soul == "# CtxVar Updated", f"tool replies: {tool_replies}"
@@ -4224,11 +4224,11 @@ wheels = [

 [[package]]
 name = "urllib3"
-version = "2.6.3"
+version = "2.7.0"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/c7/24/5f1b3bdffd70275f6661c76461e25f024d5a38a46f04aaca912426a2b1d3/urllib3-2.6.3.tar.gz", hash = "sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed", size = 435556, upload-time = "2026-01-07T16:24:43.925Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/53/0c/06f8b233b8fd13b9e5ee11424ef85419ba0d8ba0b3138bf360be2ff56953/urllib3-2.7.0.tar.gz", hash = "sha256:231e0ec3b63ceb14667c67be60f2f2c40a518cb38b03af60abc813da26505f4c", size = 433602, upload-time = "2026-05-07T16:13:18.596Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/39/08/aaaad47bc4e9dc8c725e68f9d04865dbcb2052843ff09c97b08904852d84/urllib3-2.6.3-py3-none-any.whl", hash = "sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4", size = 131584, upload-time = "2026-01-07T16:24:42.685Z" },
+    { url = "https://files.pythonhosted.org/packages/7f/3e/5db95bcf282c52709639744ca2a8b149baccf648e39c8cc87553df9eae0c/urllib3-2.7.0-py3-none-any.whl", hash = "sha256:9fb4c81ebbb1ce9531cce37674bbc6f1360472bc18ca9a553ede278ef7276897", size = 131087, upload-time = "2026-05-07T16:13:17.151Z" },
 ]

 [[package]]
@@ -28,21 +28,11 @@ http {
        set $gateway_upstream gateway:8001;
        set $frontend_upstream frontend:3000;

-        # Hide CORS headers from upstream to prevent duplicates
-        proxy_hide_header 'Access-Control-Allow-Origin';
-        proxy_hide_header 'Access-Control-Allow-Methods';
-        proxy_hide_header 'Access-Control-Allow-Headers';
-        proxy_hide_header 'Access-Control-Allow-Credentials';
-
-        # CORS headers for all responses (nginx handles CORS centrally)
-        add_header 'Access-Control-Allow-Origin' '*' always;
-        add_header 'Access-Control-Allow-Methods' 'GET, POST, PUT, DELETE, PATCH, OPTIONS' always;
-        add_header 'Access-Control-Allow-Headers' '*' always;
-
-        # Handle OPTIONS requests (CORS preflight)
-        if ($request_method = 'OPTIONS') {
-            return 204;
-        }
+        # Keep the unified nginx endpoint same-origin by default. When split
+        # frontend/backend or port-forwarded deployments need browser CORS,
+        # configure the Gateway allowlist with GATEWAY_CORS_ORIGINS so CORS and
+        # CSRF origin checks stay aligned instead of approving every origin at
+        # the proxy layer.

        # LangGraph-compatible API routes served by Gateway.
        # Rewrites /api/langgraph/* to /api/* before proxying to Gateway.
@@ -28,21 +28,11 @@ http {
        listen [::]:2026;
        server_name _;

-        # Hide CORS headers from upstream to prevent duplicates
-        proxy_hide_header 'Access-Control-Allow-Origin';
-        proxy_hide_header 'Access-Control-Allow-Methods';
-        proxy_hide_header 'Access-Control-Allow-Headers';
-        proxy_hide_header 'Access-Control-Allow-Credentials';
-
-        # CORS headers for all responses (nginx handles CORS centrally)
-        add_header 'Access-Control-Allow-Origin' '*' always;
-        add_header 'Access-Control-Allow-Methods' 'GET, POST, PUT, DELETE, PATCH, OPTIONS' always;
-        add_header 'Access-Control-Allow-Headers' '*' always;
-
-        # Handle OPTIONS requests (CORS preflight)
-        if ($request_method = 'OPTIONS') {
-            return 204;
-        }
+        # Keep the unified nginx endpoint same-origin by default. When split
+        # frontend/backend or port-forwarded deployments need browser CORS,
+        # configure the Gateway allowlist with GATEWAY_CORS_ORIGINS so CORS and
+        # CSRF origin checks stay aligned instead of approving every origin at
+        # the proxy layer.

        # LangGraph-compatible API routes served by Gateway.
        # Rewrites /api/langgraph/* to /api/* before proxying to Gateway.
@@ -82,10 +82,10 @@ pnpm start
 Key environment variables (see `.env.example` for full list):

 ```bash
-# Backend API URLs (optional, uses nginx proxy by default)
+# Backend API URL (optional, uses local Next.js/nginx proxy by default)
 NEXT_PUBLIC_BACKEND_BASE_URL="http://localhost:8001"
-# LangGraph API URLs (optional, uses nginx proxy by default)
-NEXT_PUBLIC_LANGGRAPH_BASE_URL="http://localhost:2024"
+# LangGraph-compatible API URL (optional, uses local Next.js/nginx proxy by default)
+NEXT_PUBLIC_LANGGRAPH_BASE_URL="http://localhost:8001/api"
 ```

 ## Project Structure
@@ -68,7 +68,7 @@
    "lucide-react": "^0.562.0",
    "motion": "^12.26.2",
    "nanoid": "^5.1.6",
-    "next": "^16.1.7",
+    "next": "^16.2.6",
    "next-themes": "^0.4.6",
    "nextra": "^4.6.1",
    "nextra-theme-docs": "^4.6.1",
@@ -156,17 +156,17 @@ importers:
        specifier: ^5.1.6
        version: 5.1.6
      next:
-        specifier: ^16.1.7
-        version: 16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
+        specifier: ^16.2.6
+        version: 16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      next-themes:
        specifier: ^0.4.6
        version: 0.4.6(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      nextra:
        specifier: ^4.6.1
-        version: 4.6.1(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3)
+        version: 4.6.1(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3)
      nextra-theme-docs:
        specifier: ^4.6.1
-        version: 4.6.1(@types/react@19.2.13)(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(nextra@4.6.1(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(use-sync-external-store@1.6.0(react@19.2.4))
+        version: 4.6.1(@types/react@19.2.13)(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(nextra@4.6.1(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(use-sync-external-store@1.6.0(react@19.2.4))
      nuxt-og-image:
        specifier: ^5.1.13
        version: 5.1.13(@unhead/vue@2.1.4(vue@3.5.28(typescript@5.9.3)))(unstorage@1.17.4)(vite@7.3.1(@types/node@20.19.33)(jiti@2.6.1)(lightningcss@1.30.2)(yaml@2.8.3))(vue@3.5.28(typescript@5.9.3))
@@ -437,8 +437,8 @@ packages:
  '@emnapi/core@1.8.1':
    resolution: {integrity: sha512-AvT9QFpxK0Zd8J0jopedNm+w/2fIzvtPKPjqyw9jwvBaReTTqPBk9Hixaz7KbjimP+QNz605/XnjFcDAL2pqBg==}

-  '@emnapi/runtime@1.9.0':
-    resolution: {integrity: sha512-QN75eB0IH2ywSpRpNddCRfQIhmJYBCJ1x5Lb3IscKAL8bMnVAKnRg8dCoXbHzVLLH7P38N2Z3mtulB7W0J0FKw==}
+  '@emnapi/runtime@1.10.0':
+    resolution: {integrity: sha512-ewvYlk86xUoGI0zQRNq/mC+16R1QeDlKQy21Ki3oSYXNgLb45GV1P6A0M+/s6nyCuNDqe5VpaY84BzXGwVbwFA==}

  '@emnapi/wasi-threads@1.1.0':
    resolution: {integrity: sha512-WI0DdZ8xFSbgMjR1sFsKABJ/C5OnRrjT06JXbZKexJGrDuPTzZdDYfFlsgcCXCyf+suG5QU2e/y1Wo2V/OapLQ==}
@@ -1018,56 +1018,56 @@ packages:
  '@napi-rs/wasm-runtime@0.2.12':
    resolution: {integrity: sha512-ZVWUcfwY4E/yPitQJl481FjFo3K22D6qF0DuFH6Y/nbnE11GY5uguDxZMGXPQ8WQ0128MXQD7TnfHyK4oWoIJQ==}

-  '@next/env@16.1.7':
-    resolution: {integrity: sha512-rJJbIdJB/RQr2F1nylZr/PJzamvNNhfr3brdKP6s/GW850jbtR70QlSfFselvIBbcPUOlQwBakexjFzqLzF6pg==}
+  '@next/env@16.2.6':
+    resolution: {integrity: sha512-gd8HoHN4ufj73WmR3JmVolrpJR47ILK6LouP5xElPglaVxir6e1a7VzvTvDWkOoPXT9rkkTzyCxBu4yeZfZwcw==}

  '@next/eslint-plugin-next@15.5.12':
    resolution: {integrity: sha512-+ZRSDFTv4aC96aMb5E41rMjysx8ApkryevnvEYZvPZO52KvkqP5rNExLUXJFr9P4s0f3oqNQR6vopCZsPWKDcQ==}

-  '@next/swc-darwin-arm64@16.1.7':
-    resolution: {integrity: sha512-b2wWIE8sABdyafc4IM8r5Y/dS6kD80JRtOGrUiKTsACFQfWWgUQ2NwoUX1yjFMXVsAwcQeNpnucF2ZrujsBBPg==}
+  '@next/swc-darwin-arm64@16.2.6':
+    resolution: {integrity: sha512-ZJGkkcNfYgrrMkqOdZ7zoLa1TOy0qpcMfk/z4Mh/FKUz40gVO+HNQWqmLxf67Z5WB64DRp0dhEbyHfel+6sJUg==}
    engines: {node: '>= 10'}
    cpu: [arm64]
    os: [darwin]

-  '@next/swc-darwin-x64@16.1.7':
-    resolution: {integrity: sha512-zcnVaaZulS1WL0Ss38R5Q6D2gz7MtBu8GZLPfK+73D/hp4GFMrC2sudLky1QibfV7h6RJBJs/gOFvYP0X7UVlQ==}
+  '@next/swc-darwin-x64@16.2.6':
+    resolution: {integrity: sha512-v/YLBHIY132Ced3puBJ7YJKw1lqsCrgcNo2aRJlCEyQrrCeRJlvGlnmxhPxNQI3KE3N1DN5r9TPNPvka3nq5RQ==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [darwin]

-  '@next/swc-linux-arm64-gnu@16.1.7':
-    resolution: {integrity: sha512-2ant89Lux/Q3VyC8vNVg7uBaFVP9SwoK2jJOOR0L8TQnX8CAYnh4uctAScy2Hwj2dgjVHqHLORQZJ2wH6VxhSQ==}
+  '@next/swc-linux-arm64-gnu@16.2.6':
+    resolution: {integrity: sha512-RPOvqlYBbcQjkz9VQQDZ2T2bARIjXZV1KFlt+V2Mr6SW/e4I9fcKsaA0hdyf2FHoTlsV2xnBd5Y912rP/1Ce6w==}
    engines: {node: '>= 10'}
    cpu: [arm64]
    os: [linux]

-  '@next/swc-linux-arm64-musl@16.1.7':
-    resolution: {integrity: sha512-uufcze7LYv0FQg9GnNeZ3/whYfo+1Q3HnQpm16o6Uyi0OVzLlk2ZWoY7j07KADZFY8qwDbsmFnMQP3p3+Ftprw==}
+  '@next/swc-linux-arm64-musl@16.2.6':
+    resolution: {integrity: sha512-URUTu1+dMkxJsPFgm+OeEvq9wf5sujw0EvgYy80TDGHTSLTnIHeqb0Eu8A3sC95IRgjejQL+kC4mw+4yPxiAXA==}
    engines: {node: '>= 10'}
    cpu: [arm64]
    os: [linux]

-  '@next/swc-linux-x64-gnu@16.1.7':
-    resolution: {integrity: sha512-KWVf2gxYvHtvuT+c4MBOGxuse5TD7DsMFYSxVxRBnOzok/xryNeQSjXgxSv9QpIVlaGzEn/pIuI6Koosx8CGWA==}
+  '@next/swc-linux-x64-gnu@16.2.6':
+    resolution: {integrity: sha512-DOj182mPV8G3UkrayLoREM5YEYI+Dk5wv7Ox9xl1fFibAELEsFD0lDPfHIeILlutMMfdyhlzYPELG3peuKaurw==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [linux]

-  '@next/swc-linux-x64-musl@16.1.7':
-    resolution: {integrity: sha512-HguhaGwsGr1YAGs68uRKc4aGWxLET+NevJskOcCAwXbwj0fYX0RgZW2gsOCzr9S11CSQPIkxmoSbuVaBp4Z3dA==}
+  '@next/swc-linux-x64-musl@16.2.6':
+    resolution: {integrity: sha512-HKQ5SP/V/ub73UvF7n/zeJlxk2kLmtL7Wzrg4WfmkjmNos5onJ2tKu7yZOPdL18A6Svfn3max29ym+ry7NkK4g==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [linux]

-  '@next/swc-win32-arm64-msvc@16.1.7':
-    resolution: {integrity: sha512-S0n3KrDJokKTeFyM/vGGGR8+pCmXYrjNTk2ZozOL1C/JFdfUIL9O1ATaJOl5r2POe56iRChbsszrjMAdWSv7kQ==}
+  '@next/swc-win32-arm64-msvc@16.2.6':
+    resolution: {integrity: sha512-LZXpTlPyS5v7HhSmnvsLGP3iIYgYOBnc8r8ArlT55sGHV89bR2HlDdBjWQ+PY6SJMmk8TuVGFuxalnP3k/0Dwg==}
    engines: {node: '>= 10'}
    cpu: [arm64]
    os: [win32]

-  '@next/swc-win32-x64-msvc@16.1.7':
-    resolution: {integrity: sha512-mwgtg8CNZGYm06LeEd+bNnOUfwOyNem/rOiP14Lsz+AnUY92Zq/LXwtebtUiaeVkhbroRCQ0c8GlR4UT1U+0yg==}
+  '@next/swc-win32-x64-msvc@16.2.6':
+    resolution: {integrity: sha512-F0+4i0h9J6C4eE3EAPWsoCk7UW/dbzOjyzxY0qnDUOYFu6FFmdZ6l97/XdV3/Nz3VYyO7UWjyEJUXkGqcoXfMA==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [win32]
@@ -1912,6 +1912,9 @@ packages:
  '@swc/helpers@0.5.15':
    resolution: {integrity: sha512-JQ5TuMi45Owi4/BIMAJBoSQoOJu12oOk/gADqlcUL9JEdHB8vyjUSsxqeNXnmXHjYKMi2WcYtezGEEhqUI/E2g==}

+  '@swc/helpers@0.5.21':
+    resolution: {integrity: sha512-jI/VAmtdjB/RnI8GTnokyX7Ug8c+g+ffD6QRLa6XQewtnGyukKkKSk3wLTM3b5cjt1jNh9x0jfVlagdN2gDKQg==}
+
  '@t3-oss/env-core@0.12.0':
    resolution: {integrity: sha512-lOPj8d9nJJTt81mMuN9GMk8x5veOt7q9m11OSnCBJhwp1QrL/qR+M8Y467ULBSm9SunosryWNbmQQbgoiMgcdw==}
    peerDependencies:
@@ -2652,8 +2655,8 @@ packages:
  base64-js@1.5.1:
    resolution: {integrity: sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==}

-  baseline-browser-mapping@2.10.8:
-    resolution: {integrity: sha512-PCLz/LXGBsNTErbtB6i5u4eLpHeMfi93aUv5duMmj6caNu6IphS4q6UevDnL36sZQv9lrP11dbPKGMaXPwMKfQ==}
+  baseline-browser-mapping@2.10.29:
+    resolution: {integrity: sha512-Asa2krT+XTPZINCS+2QcyS8WTkObE77RwkydwF7h6DmnKqbvlalz93m/dnphUyCa6SWSP51VgtEUf2FN+gelFQ==}
    engines: {node: '>=6.0.0'}
    hasBin: true

@@ -2710,8 +2713,8 @@ packages:
  camelize@1.0.1:
    resolution: {integrity: sha512-dU+Tx2fsypxTgtLoE36npi3UqcjSSMNYfkqgmoEhtZrraP5VWq0K7FkWVTYa8eMPtnU/G2txVsfdCJTn9uzpuQ==}

-  caniuse-lite@1.0.30001780:
-    resolution: {integrity: sha512-llngX0E7nQci5BPJDqoZSbuZ5Bcs9F5db7EtgfwBerX9XGtkkiO4NwfDDIRzHTTwcYC8vC7bmeUEPGrKlR/TkQ==}
+  caniuse-lite@1.0.30001792:
+    resolution: {integrity: sha512-hVLMUZFgR4JJ6ACt1uEESvQN1/dBVqPAKY0hgrV70eN3391K6juAfTjKZLKvOMsx8PxA7gsY1/tLMMTcfFLLpw==}

  canvas-confetti@1.9.4:
    resolution: {integrity: sha512-yxQbJkAVrFXWNbTUjPqjF7G+g6pDotOUHGbkZq2NELZUMDpiJ85rIEazVb8GTaAptNW2miJAXbs1BtioA251Pw==}
@@ -4389,8 +4392,8 @@ packages:
      react: ^16.8 || ^17 || ^18 || ^19 || ^19.0.0-rc
      react-dom: ^16.8 || ^17 || ^18 || ^19 || ^19.0.0-rc

-  next@16.1.7:
-    resolution: {integrity: sha512-WM0L7WrSvKwoLegLYr6V+mz+RIofqQgVAfHhMp9a88ms0cFX8iX9ew+snpWlSBwpkURJOUdvCEt3uLl3NNzvWg==}
+  next@16.2.6:
+    resolution: {integrity: sha512-qOVgKJg1+At15NpeUP+eJgCHvTCgXsogweq87Ri/Ix7PkqQHg4sdaXmSFqKlgaIXE4kW0g25LE68W87UANlHtw==}
    engines: {node: '>=20.9.0'}
    hasBin: true
    peerDependencies:
@@ -5013,6 +5016,11 @@ packages:
    engines: {node: '>=10'}
    hasBin: true

+  semver@7.8.0:
+    resolution: {integrity: sha512-AcM7dV/5ul4EekoQ29Agm5vri8JNqRyj39o0qpX6vDF2GZrtutZl5RwgD1XnZjiTAfncsJhMI48QQH3sN87YNA==}
+    engines: {node: '>=10'}
+    hasBin: true
+
  server-only@0.0.1:
    resolution: {integrity: sha512-qepMx2JxAa5jjfzxG79yPPq+8BuFToHd1hm7kI+Z4zAq1ftQiP7HcxMhDDItrbtwVeLg/cY2JnKnrcFkmiswNA==}

@@ -6066,7 +6074,7 @@ snapshots:
      tslib: 2.8.1
    optional: true

-  '@emnapi/runtime@1.9.0':
+  '@emnapi/runtime@1.10.0':
    dependencies:
      tslib: 2.8.1
    optional: true
@@ -6343,7 +6351,7 @@ snapshots:

  '@img/sharp-wasm32@0.34.5':
    dependencies:
-      '@emnapi/runtime': 1.9.0
+      '@emnapi/runtime': 1.10.0
    optional: true

  '@img/sharp-win32-arm64@0.34.5':
@@ -6598,38 +6606,38 @@ snapshots:
  '@napi-rs/wasm-runtime@0.2.12':
    dependencies:
      '@emnapi/core': 1.8.1
-      '@emnapi/runtime': 1.9.0
+      '@emnapi/runtime': 1.10.0
      '@tybys/wasm-util': 0.10.1
    optional: true

-  '@next/env@16.1.7': {}
+  '@next/env@16.2.6': {}

  '@next/eslint-plugin-next@15.5.12':
    dependencies:
      fast-glob: 3.3.1

-  '@next/swc-darwin-arm64@16.1.7':
+  '@next/swc-darwin-arm64@16.2.6':
    optional: true

-  '@next/swc-darwin-x64@16.1.7':
+  '@next/swc-darwin-x64@16.2.6':
    optional: true

-  '@next/swc-linux-arm64-gnu@16.1.7':
+  '@next/swc-linux-arm64-gnu@16.2.6':
    optional: true

-  '@next/swc-linux-arm64-musl@16.1.7':
+  '@next/swc-linux-arm64-musl@16.2.6':
    optional: true

-  '@next/swc-linux-x64-gnu@16.1.7':
+  '@next/swc-linux-x64-gnu@16.2.6':
    optional: true

-  '@next/swc-linux-x64-musl@16.1.7':
+  '@next/swc-linux-x64-musl@16.2.6':
    optional: true

-  '@next/swc-win32-arm64-msvc@16.1.7':
+  '@next/swc-win32-arm64-msvc@16.2.6':
    optional: true

-  '@next/swc-win32-x64-msvc@16.1.7':
+  '@next/swc-win32-x64-msvc@16.2.6':
    optional: true

  '@nodelib/fs.scandir@2.1.5':
@@ -7192,7 +7200,7 @@ snapshots:
      '@react-aria/interactions': 3.27.1(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      '@react-aria/utils': 3.33.1(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      '@react-types/shared': 3.33.1(react@19.2.4)
-      '@swc/helpers': 0.5.15
+      '@swc/helpers': 0.5.21
      clsx: 2.1.1
      react: 19.2.4
      react-dom: 19.2.4(react@19.2.4)
@@ -7203,13 +7211,13 @@ snapshots:
      '@react-aria/utils': 3.33.1(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      '@react-stately/flags': 3.1.2
      '@react-types/shared': 3.33.1(react@19.2.4)
-      '@swc/helpers': 0.5.15
+      '@swc/helpers': 0.5.21
      react: 19.2.4
      react-dom: 19.2.4(react@19.2.4)

  '@react-aria/ssr@3.9.10(react@19.2.4)':
    dependencies:
-      '@swc/helpers': 0.5.15
+      '@swc/helpers': 0.5.21
      react: 19.2.4

  '@react-aria/utils@3.33.1(react-dom@19.2.4(react@19.2.4))(react@19.2.4)':
@@ -7218,18 +7226,18 @@ snapshots:
      '@react-stately/flags': 3.1.2
      '@react-stately/utils': 3.11.0(react@19.2.4)
      '@react-types/shared': 3.33.1(react@19.2.4)
-      '@swc/helpers': 0.5.15
+      '@swc/helpers': 0.5.21
      clsx: 2.1.1
      react: 19.2.4
      react-dom: 19.2.4(react@19.2.4)

  '@react-stately/flags@3.1.2':
    dependencies:
-      '@swc/helpers': 0.5.15
+      '@swc/helpers': 0.5.21

  '@react-stately/utils@3.11.0(react@19.2.4)':
    dependencies:
-      '@swc/helpers': 0.5.15
+      '@swc/helpers': 0.5.21
      react: 19.2.4

  '@react-types/shared@3.33.1(react@19.2.4)':
@@ -7437,6 +7445,10 @@ snapshots:
    dependencies:
      tslib: 2.8.1

+  '@swc/helpers@0.5.21':
+    dependencies:
+      tslib: 2.8.1
+
  '@t3-oss/env-core@0.12.0(typescript@5.9.3)(zod@3.25.76)':
    optionalDependencies:
      typescript: 5.9.3
@@ -8249,7 +8261,7 @@ snapshots:

  base64-js@1.5.1: {}

-  baseline-browser-mapping@2.10.8: {}
+  baseline-browser-mapping@2.10.29: {}

  best-effort-json-parser@1.2.1: {}

@@ -8313,7 +8325,7 @@ snapshots:

  camelize@1.0.1: {}

-  caniuse-lite@1.0.30001780: {}
+  caniuse-lite@1.0.30001792: {}

  canvas-confetti@1.9.4: {}

@@ -9643,7 +9655,7 @@ snapshots:

  is-bun-module@2.0.0:
    dependencies:
-      semver: 7.7.4
+      semver: 7.8.0

  is-callable@1.2.7: {}

@@ -10531,25 +10543,25 @@ snapshots:
      react: 19.2.4
      react-dom: 19.2.4(react@19.2.4)

-  next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4):
+  next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4):
    dependencies:
-      '@next/env': 16.1.7
+      '@next/env': 16.2.6
      '@swc/helpers': 0.5.15
-      baseline-browser-mapping: 2.10.8
-      caniuse-lite: 1.0.30001780
+      baseline-browser-mapping: 2.10.29
+      caniuse-lite: 1.0.30001792
      postcss: 8.4.31
      react: 19.2.4
      react-dom: 19.2.4(react@19.2.4)
      styled-jsx: 5.1.6(react@19.2.4)
    optionalDependencies:
-      '@next/swc-darwin-arm64': 16.1.7
-      '@next/swc-darwin-x64': 16.1.7
-      '@next/swc-linux-arm64-gnu': 16.1.7
-      '@next/swc-linux-arm64-musl': 16.1.7
-      '@next/swc-linux-x64-gnu': 16.1.7
-      '@next/swc-linux-x64-musl': 16.1.7
-      '@next/swc-win32-arm64-msvc': 16.1.7
-      '@next/swc-win32-x64-msvc': 16.1.7
+      '@next/swc-darwin-arm64': 16.2.6
+      '@next/swc-darwin-x64': 16.2.6
+      '@next/swc-linux-arm64-gnu': 16.2.6
+      '@next/swc-linux-arm64-musl': 16.2.6
+      '@next/swc-linux-x64-gnu': 16.2.6
+      '@next/swc-linux-x64-musl': 16.2.6
+      '@next/swc-win32-arm64-msvc': 16.2.6
+      '@next/swc-win32-x64-msvc': 16.2.6
      '@opentelemetry/api': 1.9.0
      '@playwright/test': 1.59.1
      sharp: 0.34.5
@@ -10557,13 +10569,13 @@ snapshots:
      - '@babel/core'
      - babel-plugin-macros

-  nextra-theme-docs@4.6.1(@types/react@19.2.13)(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(nextra@4.6.1(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(use-sync-external-store@1.6.0(react@19.2.4)):
+  nextra-theme-docs@4.6.1(@types/react@19.2.13)(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(nextra@4.6.1(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(use-sync-external-store@1.6.0(react@19.2.4)):
    dependencies:
      '@headlessui/react': 2.2.9(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      clsx: 2.1.1
-      next: 16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
+      next: 16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      next-themes: 0.4.6(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
-      nextra: 4.6.1(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3)
+      nextra: 4.6.1(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3)
      react: 19.2.4
      react-compiler-runtime: 19.1.0-rc.3(react@19.2.4)
      react-dom: 19.2.4(react@19.2.4)
@@ -10575,7 +10587,7 @@ snapshots:
      - immer
      - use-sync-external-store

-  nextra@4.6.1(next@16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3):
+  nextra@4.6.1(next@16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4))(react-dom@19.2.4(react@19.2.4))(react@19.2.4)(typescript@5.9.3):
    dependencies:
      '@formatjs/intl-localematcher': 0.6.2
      '@headlessui/react': 2.2.9(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
@@ -10596,7 +10608,7 @@ snapshots:
      mdast-util-gfm: 3.1.0
      mdast-util-to-hast: 13.2.1
      negotiator: 1.0.0
-      next: 16.1.7(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
+      next: 16.2.6(@opentelemetry/api@1.9.0)(@playwright/test@1.59.1)(react-dom@19.2.4(react@19.2.4))(react@19.2.4)
      react: 19.2.4
      react-compiler-runtime: 19.1.0-rc.3(react@19.2.4)
      react-dom: 19.2.4(react@19.2.4)
@@ -10925,7 +10937,7 @@ snapshots:

  postcss@8.4.31:
    dependencies:
-      nanoid: 3.3.11
+      nanoid: 3.3.12
      picocolors: 1.1.1
      source-map-js: 1.2.1

@@ -11365,6 +11377,8 @@ snapshots:

  semver@7.7.4: {}

+  semver@7.8.0: {}
+
  server-only@0.0.1: {}

  set-function-length@1.2.2:
@@ -11393,7 +11407,7 @@ snapshots:
    dependencies:
      '@img/colour': 1.1.0
      detect-libc: 2.1.2
-      semver: 7.7.4
+      semver: 7.8.0
    optionalDependencies:
      '@img/sharp-darwin-arm64': 0.34.5
      '@img/sharp-darwin-x64': 0.34.5
@@ -10,6 +10,7 @@ import { FlickeringGrid } from "@/components/ui/flickering-grid";
 import { Input } from "@/components/ui/input";
 import { useAuth } from "@/core/auth/AuthProvider";
 import { parseAuthError } from "@/core/auth/types";
+import { getBackendBaseURL } from "@/core/config";

 /**
 * Validate next parameter
@@ -71,7 +72,7 @@ export default function LoginPage() {
  useEffect(() => {
    let cancelled = false;

-    void fetch("/api/v1/auth/setup-status")
+    void fetch(`${getBackendBaseURL()}/api/v1/auth/setup-status`)
      .then((r) => r.json())
      .then((data: { needs_setup?: boolean }) => {
        if (!cancelled && data.needs_setup) {
@@ -94,8 +95,8 @@ export default function LoginPage() {

    try {
      const endpoint = isLogin
-        ? "/api/v1/auth/login/local"
-        : "/api/v1/auth/register";
+        ? `${getBackendBaseURL()}/api/v1/auth/login/local`
+        : `${getBackendBaseURL()}/api/v1/auth/register`;
      const body = isLogin
        ? `username=${encodeURIComponent(email)}&password=${encodeURIComponent(password)}`
        : JSON.stringify({ email, password });
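
Note on the change above: every auth endpoint is now built by prefixing the path with `getBackendBaseURL()`. The following is a minimal sketch of that joining behaviour, not the project's actual helper. `resolveBackendBaseURL` and `authURL` are hypothetical names, and the trailing-slash handling is an assumption, since the implementation in `@/core/config` is not shown in this diff.

```typescript
// Hypothetical stand-in for getBackendBaseURL(); the real helper lives in
// "@/core/config" and its implementation is not part of this diff.
function resolveBackendBaseURL(configured: string | undefined): string {
  // Assumption: an unset or blank value means "same origin", and trailing
  // slashes are dropped so joining with "/api/..." never yields "//".
  return (configured ?? "").trim().replace(/\/+$/, "");
}

// Build an absolute or same-origin URL for an auth endpoint path.
function authURL(configured: string | undefined, path: string): string {
  return `${resolveBackendBaseURL(configured)}${path}`;
}

// Unset base: the path stays relative, matching the old hardcoded behaviour.
console.log(authURL(undefined, "/api/v1/auth/setup-status"));
// Base on another origin: the request targets the configured backend.
console.log(authURL("http://localhost:8001/", "/api/v1/auth/login/local"));
```

With an unset base the result is the same relative path the pages used before, so unified nginx deployments should be unaffected under these assumptions.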
@@ -10,6 +10,7 @@ import { Input } from "@/components/ui/input";
 import { getCsrfHeaders } from "@/core/api/fetcher";
 import { useAuth } from "@/core/auth/AuthProvider";
 import { parseAuthError } from "@/core/auth/types";
+import { getBackendBaseURL } from "@/core/config";

 type SetupMode = "loading" | "init_admin" | "change_password";

@@ -36,7 +37,7 @@ export default function SetupPage() {
      setMode("change_password");
    } else if (!isAuthenticated) {
      // Check if the system has no users yet
-      void fetch("/api/v1/auth/setup-status")
+      void fetch(`${getBackendBaseURL()}/api/v1/auth/setup-status`)
        .then((r) => r.json())
        .then((data: { needs_setup?: boolean }) => {
          if (cancelled) return;
@@ -72,7 +73,7 @@ export default function SetupPage() {

    setLoading(true);
    try {
-      const res = await fetch("/api/v1/auth/initialize", {
+      const res = await fetch(`${getBackendBaseURL()}/api/v1/auth/initialize`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        credentials: "include",
@@ -113,19 +114,22 @@ export default function SetupPage() {

    setLoading(true);
    try {
-      const res = await fetch("/api/v1/auth/change-password", {
-        method: "POST",
-        headers: {
-          "Content-Type": "application/json",
-          ...getCsrfHeaders(),
+      const res = await fetch(
+        `${getBackendBaseURL()}/api/v1/auth/change-password`,
+        {
+          method: "POST",
+          headers: {
+            "Content-Type": "application/json",
+            ...getCsrfHeaders(),
+          },
+          credentials: "include",
+          body: JSON.stringify({
+            current_password: currentPassword,
+            new_password: newPassword,
+            new_email: email || undefined,
+          }),
        },
-        credentials: "include",
-        body: JSON.stringify({
-          current_password: currentPassword,
-          new_password: newPassword,
-          new_email: email || undefined,
-        }),
-      });
+      );

      if (!res.ok) {
        const data = await res.json();
@@ -4,6 +4,7 @@ import { redirect } from "next/navigation";
 import { AuthProvider } from "@/core/auth/AuthProvider";
 import { getServerSideUser } from "@/core/auth/server";
 import { assertNever } from "@/core/auth/types";
+import { getBackendBaseURL } from "@/core/config";

 import { WorkspaceContent } from "./workspace-content";

@@ -44,7 +45,7 @@ export default async function WorkspaceLayout({
              Retry
            </Link>
            <Link
-              href="/api/v1/auth/logout"
+              href={`${getBackendBaseURL()}/api/v1/auth/logout`}
              className="text-muted-foreground hover:bg-muted rounded-md border px-4 py-2 text-sm"
            >
              Logout &amp; Reset
@@ -8,6 +8,7 @@ import { Input } from "@/components/ui/input";
 import { fetch, getCsrfHeaders } from "@/core/api/fetcher";
 import { useAuth } from "@/core/auth/AuthProvider";
 import { parseAuthError } from "@/core/auth/types";
+import { getBackendBaseURL } from "@/core/config";
 import { useI18n } from "@/core/i18n/hooks";

 import { SettingsSection } from "./settings-section";
@@ -38,17 +39,20 @@ export function AccountSettingsPage() {

    setLoading(true);
    try {
-      const res = await fetch("/api/v1/auth/change-password", {
-        method: "POST",
-        headers: {
-          "Content-Type": "application/json",
-          ...getCsrfHeaders(),
+      const res = await fetch(
+        `${getBackendBaseURL()}/api/v1/auth/change-password`,
+        {
+          method: "POST",
+          headers: {
+            "Content-Type": "application/json",
+            ...getCsrfHeaders(),
+          },
+          body: JSON.stringify({
+            current_password: currentPassword,
+            new_password: newPassword,
+          }),
        },
-        body: JSON.stringify({
-          current_password: currentPassword,
-          new_password: newPassword,
-        }),
-      });
+      );

      if (!res.ok) {
        const data = await res.json();
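
Note on the change above: once the base URL can point at a different origin, cookie-based auth depends on the fetch `credentials` mode. The browser default is `same-origin`, which omits cookies on cross-origin requests. The sketch below shows one way to decide the mode. `isCrossOrigin` and `credentialsFor` are hypothetical helpers, and whether the wrapped `fetch` from `@/core/api/fetcher` already sets credentials is not visible in this diff.

```typescript
type CredentialsMode = "same-origin" | "include";

// True when the backend base URL resolves to a different origin than the page.
// A relative or empty base resolves against the page origin, so it is same-origin.
function isCrossOrigin(backendBaseURL: string, pageOrigin: string): boolean {
  return new URL(backendBaseURL, pageOrigin).origin !== new URL(pageOrigin).origin;
}

// Cross-origin requests need "include" for the browser to attach cookies.
function credentialsFor(backendBaseURL: string, pageOrigin: string): CredentialsMode {
  return isCrossOrigin(backendBaseURL, pageOrigin) ? "include" : "same-origin";
}

// In the browser, pageOrigin would be window.location.origin.
console.log(credentialsFor("", "http://localhost:2026"));
console.log(credentialsFor("http://localhost:8001", "http://localhost:2026"));
```

For the cross-origin case the backend also has to allow the page origin explicitly and permit credentials in its CORS response headers.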
@@ -111,10 +111,9 @@ checkpointer:
 ```

 <Callout type="info">
-  The LangGraph Server manages its own state separately. The
-  <code>checkpointer</code> setting in <code>config.yaml</code> applies to the
-  embedded <code>DeerFlowClient</code> (used in direct Python integrations), not
-  to the LangGraph Server deployment used by DeerFlow App.
+  The Gateway embedded runtime uses the <code>checkpointer</code> setting in
+  <code>config.yaml</code>. The same setting is also used by
+  <code>DeerFlowClient</code> in direct Python integrations.
 </Callout>

 ### Thread data storage
@@ -23,8 +23,7 @@ Services started:

 | Service     | Port | Description              |
 | ----------- | ---- | ------------------------ |
-| LangGraph   | 2024 | DeerFlow Harness runtime |
-| Gateway API | 8001 | FastAPI backend          |
+| Gateway API | 8001 | FastAPI backend + embedded agent runtime |
 | Frontend    | 3000 | Next.js UI               |
 | nginx       | 2026 | Unified reverse proxy    |

@@ -36,13 +35,12 @@ Access the app at **http://localhost:2026**.
 make stop
 ```

-Stops all four services. Safe to run even if a service is not running.
+Stops all services. Safe to run even if a service is not running.

  </Tabs.Tab>
  <Tabs.Tab>
 ```
-logs/langgraph.log   # Agent runtime logs
-logs/gateway.log     # API gateway logs
+logs/gateway.log     # API gateway and agent runtime logs
 logs/frontend.log    # Next.js dev server logs
 logs/nginx.log       # nginx access/error logs
 ```
@@ -50,7 +48,7 @@ logs/nginx.log       # nginx access/error logs
 Tail a log in real time:

 ```bash
-tail -f logs/langgraph.log
+tail -f logs/gateway.log
 ```

  </Tabs.Tab>
@@ -74,7 +72,7 @@ export DEER_FLOW_ROOT=/path/to/deer-flow
 docker compose -f docker/docker-compose-dev.yaml up --build
 ```

-Services: nginx, frontend, gateway, langgraph, and optionally provisioner (for K8s-managed sandboxes).
+Services: nginx, frontend, gateway, and optionally provisioner (for K8s-managed sandboxes).

 Access the app at **http://localhost:2026**.

@@ -99,7 +97,7 @@ The `docker-compose*.yaml` files include an `env_file: ../.env` directive that l

 ### Data persistence

-Thread data is stored in `backend/.deer-flow/threads/`. In Docker deployments, this directory is bind-mounted into the langgraph container.
+Thread data is stored in `backend/.deer-flow/threads/`. In Docker deployments, this directory is bind-mounted into the gateway container.

 To avoid data loss when containers are recreated:

@@ -161,14 +159,7 @@ When `USERDATA_PVC_NAME` is set, the provisioner automatically uses subPath (`th

 ### nginx configuration

-nginx routes all traffic. Key environment variables that control routing:
-
-| Variable             | Default          | Description                             |
-| -------------------- | ---------------- | --------------------------------------- |
-| `LANGGRAPH_UPSTREAM` | `langgraph:2024` | LangGraph service address               |
-| `LANGGRAPH_REWRITE`  | `/`              | URL rewrite prefix for LangGraph routes |
-
-These are set in the Docker Compose environment and processed by `envsubst` at container startup.
+nginx routes all traffic to the frontend or Gateway. `/api/langgraph/*` is rewritten to Gateway's LangGraph-compatible `/api/*` routes, so no separate LangGraph upstream is required.
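
Note on the line above: the rewrite it describes can be illustrated as a pure path mapping. This is a sketch of the stated behaviour only, not the nginx configuration itself. `rewriteLangGraphPath` is a hypothetical name, and the handling of the bare `/api/langgraph` path is an assumption.

```typescript
// Map the public LangGraph-compatible prefix onto Gateway's /api/* routes.
// Only a whole "/api/langgraph" segment is rewritten; other paths pass through.
function rewriteLangGraphPath(path: string): string {
  return path.replace(/^\/api\/langgraph(?=\/|$)/, "/api");
}

console.log(rewriteLangGraphPath("/api/langgraph/threads/123/runs"));
console.log(rewriteLangGraphPath("/api/models"));
```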

 ### Authentication

@@ -186,8 +177,7 @@ openssl rand -base64 32

 | Service                         | Minimum          | Recommended      |
 | ------------------------------- | ---------------- | ---------------- |
-| LangGraph (agent runtime)       | 2 vCPU, 4 GB RAM | 4 vCPU, 8 GB RAM |
-| Gateway                         | 0.5 vCPU, 512 MB | 1 vCPU, 1 GB     |
+| Gateway + agent runtime         | 2 vCPU, 4 GB RAM | 4 vCPU, 8 GB RAM |
 | Frontend                        | 0.5 vCPU, 512 MB | 1 vCPU, 1 GB     |
 | Sandbox container (per session) | 1 vCPU, 1 GB     | 2 vCPU, 2 GB     |

@@ -199,9 +189,6 @@ After starting, verify the deployment:
 # Check Gateway health
 curl http://localhost:8001/health

-# Check LangGraph health
-curl http://localhost:2024/ok
-
 # List configured models (through nginx)
 curl http://localhost:2026/api/models
 ```
@@ -25,11 +25,11 @@ DeerFlow App is the reference implementation of what a production DeerFlow exper
 | **Streaming responses** | Real-time token streaming with thinking steps and tool call visibility                               |
 | **Artifact viewer**     | In-browser preview and download of files and outputs produced by the agent                           |
 | **Extensions UI**       | Enable/disable MCP servers and skills without editing config files                                   |
-| **Gateway API**         | FastAPI-based REST API that bridges the frontend and the LangGraph runtime                           |
+| **Gateway API**         | FastAPI-based REST API with the embedded LangGraph-compatible agent runtime                          |

 ## Architecture

-The DeerFlow App runs as four services behind a single nginx reverse proxy:
+The DeerFlow App runs behind a single nginx reverse proxy:

 ```
                ┌──────────────────┐
@@ -42,19 +42,11 @@ The DeerFlow App runs as four services behind a single nginx reverse proxy:
 │  Frontend :3000  │       │  Gateway API :8001    │
 │  (Next.js)       │       │  (FastAPI)            │
 └──────────────────┘       └──────────────────────┘
-                                      │
-                            ┌─────────┘
-                            ▼
-                  ┌──────────────────────┐
-                  │  LangGraph :2024     │
-                  │  (DeerFlow Harness)  │
-                  └──────────────────────┘
 ```

-- **nginx**: routes requests — `/api/*` to the Gateway, LangGraph streaming endpoints to LangGraph directly, and everything else to the frontend.
-- **Frontend** (Next.js + React): the browser UI. Communicates with both the Gateway and LangGraph.
-- **Gateway** (FastAPI): handles API operations — model listing, agent CRUD, memory, extensions management, file uploads.
-- **LangGraph**: the DeerFlow Harness runtime. Manages thread state, agent execution, and streaming.
+- **nginx**: routes requests — `/api/*` and `/api/langgraph/*` to Gateway, and everything else to the frontend.
+- **Frontend** (Next.js + React): the browser UI. Communicates with Gateway.
+- **Gateway** (FastAPI): handles API operations and the embedded LangGraph-compatible runtime for thread state, agent execution, and streaming.

 ## Technology stack

@@ -64,7 +56,7 @@ The DeerFlow App runs as four services behind a single nginx reverse proxy:
 | Gateway           | FastAPI, Python 3.12, uvicorn                                        |
 | Agent runtime     | LangGraph, LangChain, DeerFlow Harness                               |
 | Reverse proxy     | nginx                                                                |
-| State persistence | LangGraph Server (default) + optional SQLite/PostgreSQL checkpointer |
+| State persistence | Gateway runtime + optional SQLite/PostgreSQL checkpointer             |

 <Cards num={2}>
  <Cards.Card title="Quick Start" href="/docs/application/quick-start" />
@@ -15,15 +15,13 @@ All services write logs to the `logs/` directory when started with `make dev`:

 | File                 | Service                              |
 | -------------------- | ------------------------------------ |
-| `logs/langgraph.log` | LangGraph / DeerFlow Harness runtime |
-| `logs/gateway.log`   | FastAPI Gateway API                  |
+| `logs/gateway.log`   | FastAPI Gateway API and agent runtime |
 | `logs/frontend.log`  | Next.js frontend dev server          |
 | `logs/nginx.log`     | nginx reverse proxy                  |

 Tail logs in real time:

 ```bash
-tail -f logs/langgraph.log
 tail -f logs/gateway.log
 ```

@@ -41,9 +39,6 @@ Verify each service is responding:
 # Gateway health
 curl http://localhost:8001/health

-# LangGraph health
-curl http://localhost:2024/ok
-
 # Through nginx (verifies full proxy chain)
 curl http://localhost:2026/api/models
 ```
@@ -66,7 +61,7 @@ grep config_version config.yaml

 ### The app loads but the agent doesn't respond

-1. Check `logs/langgraph.log` for startup errors.
+1. Check `logs/gateway.log` for startup errors.
 2. Verify your model is correctly configured in `config.yaml` with a valid API key.
 3. Confirm the API key environment variable is set in the shell that ran `make dev`.
 4. Test the model endpoint directly with `curl` to rule out network issues.
@@ -126,7 +121,7 @@ Connection refused: http://provisioner:8002

 If MCP tools appear in `extensions_config.json` but are not available in the agent:

-1. Check `logs/langgraph.log` for MCP initialization errors.
+1. Check `logs/gateway.log` for MCP initialization errors.
 2. Verify the MCP server command is installed (`npx`, `uvx`, or the relevant binary).
 3. Test the server command manually to confirm it starts without errors.
 4. Set `log_level: debug` to see detailed MCP loading output.
@@ -137,7 +132,7 @@ If MCP tools appear in `extensions_config.json` but are not available in the age

 - Verify `memory.enabled: true` in `config.yaml`.
 - Check that the storage path is writable: `ls -la backend/.deer-flow/`.
-- Look for memory update errors in `logs/langgraph.log` (search for "memory").
+- Look for memory update errors in `logs/gateway.log` (search for "memory").

 ## Data backup

@@ -1,6 +1,6 @@
 ---
 title: Quick Start
-description: This guide walks you through starting DeerFlow App on your local machine using the `make dev` workflow. All four services (LangGraph, Gateway, Frontend, nginx) start together and are accessible through a single URL.
+description: This guide walks you through starting DeerFlow App on your local machine using the `make dev` workflow. Gateway, Frontend, and nginx start together and are accessible through a single URL.
 ---

 import { Callout, Cards, Steps } from "nextra/components";
@@ -12,7 +12,7 @@ import { Callout, Cards, Steps } from "nextra/components";
  Python 3.12+, Node.js 22+, and at least one LLM API key.
 </Callout>

-This guide walks you through starting DeerFlow App on your local machine using the `make dev` workflow. All four services (LangGraph, Gateway, Frontend, nginx) start together and are accessible through a single URL.
+This guide walks you through starting DeerFlow App on your local machine using the `make dev` workflow. Gateway, Frontend, and nginx start together and are accessible through a single URL.

 ## Prerequisites

@@ -88,8 +88,7 @@ make dev

 This starts:

- LangGraph server on port `2024`
- Gateway API on port `8001`
+- Gateway API and embedded agent runtime on port `8001`
 - Frontend on port `3000`
 - nginx reverse proxy on port `2026`

@@ -113,15 +112,13 @@ Log files:

 | Service   | Log file             |
 | --------- | -------------------- |
-| LangGraph | `logs/langgraph.log` |
 | Gateway   | `logs/gateway.log`   |
 | Frontend  | `logs/frontend.log`  |
 | nginx     | `logs/nginx.log`     |

 <Callout type="tip">
  If something is not working, check the log files first. Most startup errors
-  (missing API keys, config parsing failures) appear in `logs/langgraph.log` or
-  `logs/gateway.log`.
+  (missing API keys, config parsing failures) appear in `logs/gateway.log`.
 </Callout>

 <Cards num={2}>
@@ -67,6 +67,26 @@ Each agent response in the conversation may contain:

 Tool calls and thinking steps are collapsed by default. Click to expand them.

+## Understanding token usage
+
+If token usage display is enabled, DeerFlow shows one conversation-level total in
+the header and optional per-turn or debug summaries in the message list.
+
+- **Header total**: the persisted thread-level total from the backend. While the
+  current run is still streaming, the header may also include the visible
+  in-flight usage for that unfinished response.
+- **Per-turn / debug usage**: usage derived from the assistant messages that are
+  currently visible in the conversation view.
+
+This means the header total and the sum of the visible per-turn totals do **not**
+need to match exactly. The header is a thread ledger; the per-turn view is a
+rendering of the messages you can currently see.
+
+These totals may also differ from your provider's billing page. Common reasons
+include retries, failed requests, cached input tokens, reasoning tokens,
+provider-specific billing rules, and internal calls that do not appear as normal
+chat messages.
+
 ## Switching agents

 If you have created custom agents, use the **Agent** selector in the input bar to switch to a different agent. The selected agent persists for the duration of the thread.
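
Reviewer sketch for the "Understanding token usage" section added in the hunk above: a minimal illustration of why the header total and the visible per-turn sum can diverge. The `VisibleMessage` shape and both function names are hypothetical and are not taken from the DeerFlow codebase; the real aggregation may differ.

```typescript
// Hypothetical message shape, for illustration only.
interface VisibleMessage {
  role: "assistant" | "user";
  totalTokens: number;
  streaming: boolean; // true while the response is still in flight
}

// Header total: persisted thread ledger plus usage of any unfinished visible response.
function headerTotal(persistedThreadTotal: number, visible: VisibleMessage[]): number {
  const inFlight = visible
    .filter((m) => m.role === "assistant" && m.streaming)
    .reduce((sum, m) => sum + m.totalTokens, 0);
  return persistedThreadTotal + inFlight;
}

// Per-turn view: only what the currently visible assistant messages report.
function visibleTotal(visible: VisibleMessage[]): number {
  return visible
    .filter((m) => m.role === "assistant")
    .reduce((sum, m) => sum + m.totalTokens, 0);
}

// Assumed scenario: the ledger already holds 100 tokens (some from calls that
// never surfaced as chat messages), one finished reply and one streaming reply are visible.
const messages: VisibleMessage[] = [
  { role: "user", totalTokens: 0, streaming: false },
  { role: "assistant", totalTokens: 40, streaming: false },
  { role: "assistant", totalTokens: 30, streaming: true },
];
const header = headerTotal(100, messages);
const perTurn = visibleTotal(messages);
```

In this assumed scenario the two numbers differ because the ledger counts usage that is not attached to any visible message, which matches the behavior the documentation describes.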
@@ -68,7 +68,7 @@ DeerFlow ships with the following public skills:

 ### Discovery and loading

-`load_skills()` in `skills/loader.py` scans both `public/` and `custom/` directories under the configured skills path. It re-reads `ExtensionsConfig.from_file()` on every call, which means enabling or disabling a skill through the Gateway API takes effect immediately in the running LangGraph server without a restart.
+`load_skills()` in `skills/loader.py` scans both `public/` and `custom/` directories under the configured skills path. It re-reads `ExtensionsConfig.from_file()` on every call, which means enabling or disabling a skill through the Gateway API takes effect immediately in the running agent runtime without a restart.

 ### Parsing

@@ -215,7 +215,6 @@ BETTER_AUTH_SECRET=local-dev-secret-at-least-32-chars
 | `DEER_FLOW_CONFIG_PATH` | 自动发现         | `config.yaml` 的绝对路径                         |
 | `LOG_LEVEL`             | `info`           | 日志详细程度（`debug`/`info`/`warning`/`error`） |
 | `DEER_FLOW_ROOT`        | 仓库根目录       | 用于 Docker 中的技能和线程挂载                   |
-| `LANGGRAPH_UPSTREAM`    | `langgraph:2024` | nginx 代理的 LangGraph 地址                      |

 <Cards num={2}>
  <Cards.Card title="Harness 配置" href="/docs/harness/configuration" />
@@ -23,8 +23,7 @@ make dev

 | 服务        | 端口 | 描述                    |
 | ----------- | ---- | ----------------------- |
-| LangGraph   | 2024 | DeerFlow Harness 运行时 |
-| Gateway API | 8001 | FastAPI 后端            |
+| Gateway API | 8001 | FastAPI 后端 + 嵌入式 Agent 运行时 |
 | 前端        | 3000 | Next.js 界面            |
 | nginx       | 2026 | 统一反向代理            |

@@ -36,13 +35,12 @@ make dev
 make stop
 ```

-停止所有四个服务。即使某个服务没有运行也可以安全执行。
+停止所有服务。即使某个服务没有运行也可以安全执行。

  </Tabs.Tab>
  <Tabs.Tab>
 ```
-logs/langgraph.log   # Agent 运行时日志
-logs/gateway.log     # API Gateway 日志
+logs/gateway.log     # API Gateway 和 Agent 运行时日志
 logs/frontend.log    # Next.js 开发服务器日志
 logs/nginx.log       # nginx 访问/错误日志
 ```
@@ -50,7 +48,7 @@ logs/nginx.log       # nginx 访问/错误日志
 实时追踪日志：

 ```bash
-tail -f logs/langgraph.log
+tail -f logs/gateway.log
 ```

  </Tabs.Tab>
@@ -96,7 +94,7 @@ BETTER_AUTH_SECRET=your-secret-here-min-32-chars

 ### 数据持久化

-线程数据存储在 `backend/.deer-flow/threads/`。在 Docker 部署中，此目录被绑定挂载到 langgraph 容器中。
+线程数据存储在 `backend/.deer-flow/threads/`。在 Docker 部署中，此目录会绑定挂载到 gateway 容器中。

 为避免容器重建时数据丢失：

@@ -156,14 +154,7 @@ SKILLS_PVC_NAME=deer-flow-skills-pvc

 ### nginx 配置

-nginx 路由所有流量，控制路由的关键环境变量：
-
-| 变量                 | 默认值           | 描述                          |
-| -------------------- | ---------------- | ----------------------------- |
-| `LANGGRAPH_UPSTREAM` | `langgraph:2024` | LangGraph 服务地址            |
-| `LANGGRAPH_REWRITE`  | `/`              | LangGraph 路由的 URL 重写前缀 |
-
-这些在 Docker Compose 环境中设置，并在容器启动时由 `envsubst` 处理。
+nginx 将流量路由到前端或 Gateway。`/api/langgraph/*` 会被重写到 Gateway 的 LangGraph-compatible `/api/*` 路由，因此不需要单独的 LangGraph upstream。

 ### 认证配置

@@ -181,8 +172,7 @@ openssl rand -base64 32

 | 服务                      | 最低配置         | 推荐配置         |
 | ------------------------- | ---------------- | ---------------- |
-| LangGraph（Agent 运行时） | 2 vCPU、4 GB RAM | 4 vCPU、8 GB RAM |
-| Gateway                   | 0.5 vCPU、512 MB | 1 vCPU、1 GB     |
+| Gateway + Agent 运行时    | 2 vCPU、4 GB RAM | 4 vCPU、8 GB RAM |
 | 前端                      | 0.5 vCPU、512 MB | 1 vCPU、1 GB     |
 | 沙箱容器（每会话）        | 1 vCPU、1 GB     | 2 vCPU、2 GB     |

@@ -194,9 +184,6 @@ openssl rand -base64 32
 # 检查 Gateway 健康状态
 curl http://localhost:8001/health

-# 检查 LangGraph 健康状态
-curl http://localhost:2024/ok
-
 # 通过 nginx 列出配置的模型（验证完整代理链）
 curl http://localhost:2026/api/models
 ```
@@ -25,11 +25,11 @@ DeerFlow 应用是 DeerFlow 生产体验的参考实现。它将 Harness 运行
 | **流式响应**     | 实时 token 流式传输，带思考步骤和工具调用可见性       |
 | **产出物查看器** | Agent 生成文件和输出的浏览器内预览和下载              |
 | **扩展界面**     | 无需编辑配置文件即可启用/禁用 MCP 服务器和技能        |
-| **Gateway API**  | 桥接前端和 LangGraph 运行时的基于 FastAPI 的 REST API |
+| **Gateway API**  | 基于 FastAPI 的 REST API，并内置 LangGraph-compatible Agent 运行时 |

 ## 架构

-DeerFlow 应用以四个服务的形式运行，通过单个 nginx 反向代理提供：
+DeerFlow 应用通过单个 nginx 反向代理提供：

 ```
                ┌──────────────────┐
@@ -42,19 +42,11 @@ DeerFlow 应用以四个服务的形式运行，通过单个 nginx 反向代理
 │  前端 :3000      │       │  Gateway API :8001    │
 │  (Next.js)       │       │  (FastAPI)            │
 └──────────────────┘       └──────────────────────┘
-                                      │
-                            ┌─────────┘
-                            ▼
-                  ┌──────────────────────┐
-                  │  LangGraph :2024     │
-                  │  (DeerFlow Harness)  │
-                  └──────────────────────┘
 ```

- **nginx**：路由请求——`/api/*` 到 Gateway，LangGraph 流式端点到 LangGraph，其余到前端。
- **前端**（Next.js + React）：浏览器界面，与 Gateway 和 LangGraph 通信。
- **Gateway**（FastAPI）：处理 API 操作——模型列表、Agent CRUD、记忆、扩展管理、文件上传。
- **LangGraph**：DeerFlow Harness 运行时，管理线程状态、Agent 执行和流式传输。
+- **nginx**：路由请求——`/api/*` 和 `/api/langgraph/*` 到 Gateway，其余到前端。
+- **前端**（Next.js + React）：浏览器界面，与 Gateway 通信。
+- **Gateway**（FastAPI）：处理 API 操作，并通过内置 LangGraph-compatible 运行时管理线程状态、Agent 执行和流式传输。

 ## 技术栈

@@ -64,7 +56,7 @@ DeerFlow 应用以四个服务的形式运行，通过单个 nginx 反向代理
 | Gateway      | FastAPI、Python 3.12、uvicorn                           |
 | Agent 运行时 | LangGraph、LangChain、DeerFlow Harness                  |
 | 反向代理     | nginx                                                   |
-| 状态持久化   | LangGraph Server（默认）+ 可选 SQLite/PostgreSQL 检查点 |
+| 状态持久化   | Gateway 运行时 + 可选 SQLite/PostgreSQL 检查点 |

 <Cards num={2}>
  <Cards.Card title="快速上手" href="/docs/application/quick-start" />
@@ -15,16 +15,14 @@ DeerFlow 应用在 `logs/` 目录中写入每个服务的日志：

 | 文件                 | 内容                                   |
 | -------------------- | -------------------------------------- |
-| `logs/langgraph.log` | Agent 运行时、工具调用、LangGraph 错误 |
-| `logs/gateway.log`   | API 请求/响应、Gateway 错误            |
+| `logs/gateway.log`   | API 请求/响应、Agent 运行时和 Gateway 错误 |
 | `logs/frontend.log`  | Next.js 服务器日志                     |
 | `logs/nginx.log`     | 代理访问和错误日志                     |

 **实时追踪日志**：

 ```bash
-tail -f logs/langgraph.log   # 查看 Agent 活动
-tail -f logs/gateway.log     # 查看 API 请求
+tail -f logs/gateway.log     # 查看 API 请求和 Agent 活动
 ```

 **调整日志级别**：
@@ -42,9 +40,6 @@ DeerFlow 暴露健康检查端点：
 # Gateway 健康状态
 curl http://localhost:8001/health

-# LangGraph 健康状态
-curl http://localhost:2024/ok
-
 # 通过 nginx 完整代理链验证
 curl http://localhost:2026/api/models
 ```
@@ -68,8 +63,8 @@ make config-upgrade
 **诊断**：

 ```bash
-# 检查 LangGraph 日志中的模型错误
-grep -i "error\|apikey\|unauthorized" logs/langgraph.log | tail -20
+# 检查 Gateway 日志中的模型错误
+grep -i "error\|apikey\|unauthorized" logs/gateway.log | tail -20
 ```

 **解决**：
@@ -118,13 +113,13 @@ SKIP_ENV_VALIDATION=1 pnpm build

 ### MCP 服务器连接失败

-**症状**：MCP 工具未出现，`logs/langgraph.log` 中有超时错误。
+**症状**：MCP 工具未出现，`logs/gateway.log` 中有超时错误。

 **诊断**：

 ```bash
 # 检查 MCP 相关错误
-grep -i "mcp\|timeout" logs/langgraph.log | tail -20
+grep -i "mcp\|timeout" logs/gateway.log | tail -20
 ```

 **解决**：
@@ -1,6 +1,6 @@
 ---
 title: 快速上手
-description: 本指南引导你使用 `make dev` 工作流在本地机器上启动 DeerFlow 应用。所有四个服务（LangGraph、Gateway、前端、nginx）一起启动，通过单个 URL 访问。
+description: 本指南引导你使用 `make dev` 工作流在本地机器上启动 DeerFlow 应用。Gateway、前端和 nginx 会一起启动，通过单个 URL 访问。
 ---

 import { Callout, Cards, Steps } from "nextra/components";
@@ -12,7 +12,7 @@ import { Callout, Cards, Steps } from "nextra/components";
  3.12+、Node.js 22+ 的机器，以及至少一个 LLM API Key。
 </Callout>

-本指南引导你使用 `make dev` 工作流在本地机器上启动 DeerFlow 应用。所有四个服务（LangGraph、Gateway、前端、nginx）一起启动，通过单个 URL 访问。
+本指南引导你使用 `make dev` 工作流在本地机器上启动 DeerFlow 应用。Gateway、前端和 nginx 会一起启动，通过单个 URL 访问。

 ## 前置条件

@@ -88,8 +88,7 @@ make dev

 这会启动：

- LangGraph 服务，端口 `2024`
- Gateway API，端口 `8001`
+- Gateway API 和嵌入式 Agent 运行时，端口 `8001`
 - 前端，端口 `3000`
 - nginx 反向代理，端口 `2026`

@@ -113,15 +112,13 @@ make stop

 | 服务      | 日志文件             |
 | --------- | -------------------- |
-| LangGraph | `logs/langgraph.log` |
 | Gateway   | `logs/gateway.log`   |
 | 前端      | `logs/frontend.log`  |
 | nginx     | `logs/nginx.log`     |

 <Callout type="tip">
  如果有问题，先检查日志文件。大多数启动错误（缺失 API
-  Key、配置解析失败）会出现在 <code>logs/langgraph.log</code> 或{" "}
-  <code>logs/gateway.log</code> 中。
+  Key、配置解析失败）会出现在 <code>logs/gateway.log</code> 中。
 </Callout>

 <Cards num={2}>
@@ -70,6 +70,17 @@ DeerFlow 工作区是一个基于浏览器的对话界面，你可以在其中

 点击消息旁边的展开箭头查看完整的推理链。

+## 理解 Token 用量
+
+如果启用了 Token 用量显示，DeerFlow 会在顶部显示一个对话级总量，并在消息列表中按配置显示每轮或调试级别的用量摘要。
+
+- **顶部总量**：后端持久化的线程级总账。当当前回复仍在流式返回时，顶部还可能临时叠加这条未完成回复的可见进行中用量。
+- **每轮 / 调试用量**：根据当前界面里可见的 assistant 消息计算出来的用量。
+
+因此，顶部总量和当前可见的每轮总和**不要求完全相等**。顶部展示的是整个线程的总账；每轮展示的是你当前能看到的消息视图。
+
+这些数字也可能与模型供应商的账单页不同。常见原因包括重试请求、失败请求、缓存输入 token、推理 token、供应商自己的计费口径，以及不会以普通聊天消息形式显示的内部调用。
+
 ## 查看产出物

 当 Agent 生成文件（报告、图表、代码文件、演示文稿）时，它们会以**产出物**的形式出现在对话中。
@@ -10,6 +10,8 @@ import React, {
  type ReactNode,
 } from "react";

+import { getBackendBaseURL } from "@/core/config";
+
 import { type User, buildLoginUrl } from "./types";

 // Re-export for consumers
@@ -56,7 +58,7 @@ export function AuthProvider({ children, initialUser }: AuthProviderProps) {
  const refreshUser = useCallback(async () => {
    try {
      setIsLoading(true);
-      const res = await fetch("/api/v1/auth/me", {
+      const res = await fetch(`${getBackendBaseURL()}/api/v1/auth/me`, {
        credentials: "include",
      });

@@ -88,7 +90,7 @@ export function AuthProvider({ children, initialUser }: AuthProviderProps) {
    setUser(null);

    try {
-      await fetch("/api/v1/auth/logout", {
+      await fetch(`${getBackendBaseURL()}/api/v1/auth/logout`, {
        method: "POST",
        credentials: "include",
      });
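
Reviewer sketch for the two `fetch` changes above: the template strings concatenate the configured base URL with an absolute path, so a trailing slash in the configured value would produce a double slash. The helper below is hypothetical (`buildBackendUrl` does not exist in the diff), and it assumes the base URL is either an empty string for same-origin requests or an origin such as `http://localhost:8001`; the actual return value of `getBackendBaseURL()` is not shown here.

```typescript
// Hypothetical helper: joins a configurable backend origin with an API path.
// An empty base yields a relative path, which keeps same-origin deployments working.
function buildBackendUrl(baseUrl: string, path: string): string {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
  return `${trimmedBase}${normalizedPath}`;
}

const sameOrigin = buildBackendUrl("", "/api/v1/auth/me");
const splitOrigin = buildBackendUrl("http://localhost:8001/", "api/v1/auth/logout");
```

A call site could then read `fetch(buildBackendUrl(getBackendBaseURL(), "/api/v1/auth/me"), { credentials: "include" })`, though whether the extra normalization is needed depends on how the base URL is validated elsewhere.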
@@ -310,7 +310,7 @@ export const enUS: Translations = {
    unavailable:
      "No token usage yet. Usage appears only after a successful model response when the provider returns usage_metadata.",
    unavailableShort: "No usage returned",
-    note: "Header totals use persisted thread usage when available. Per-turn and debug usage come from visible messages. Totals may differ from provider billing pages.",
+    note: "Header totals use persisted thread usage, plus visible in-flight usage while a run is still streaming. Per-turn and debug usage come from currently visible messages only. Totals may differ from provider billing pages.",
    presets: {
      off: "Off",
      summary: "Summary",
@@ -296,7 +296,7 @@ export const zhCN: Translations = {
    unavailable:
      "暂无 Token 用量。只有模型成功返回且供应商提供 usage_metadata 时才会显示。",
    unavailableShort: "未返回用量",
-    note: "顶部总量优先使用后端持久化的线程用量。每轮和调试用量来自当前可见消息，可能与平台账单页不完全一致。",
+    note: "顶部总量优先使用后端持久化的线程用量；当当前回复仍在流式返回时，还会叠加可见的进行中用量。每轮和调试用量只来自当前可见消息，可能与平台账单页不完全一致。",
    presets: {
      off: "关闭",
      summary: "总览",
@@ -20,7 +20,11 @@ test("fetchThreadTokenUsage uses shared auth fetch without JSON GET headers", as
      total_tokens: 7,
      total_runs: 1,
      by_model: { unknown: { tokens: 7, runs: 1 } },
-      by_caller: {},
+      by_caller: {
+        lead_agent: 0,
+        subagent: 0,
+        middleware: 0,
+      },
    }),
  });
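
Reviewer sketch for the updated fixture above: the mocked payload now carries explicit `lead_agent`, `subagent`, and `middleware` counters under `by_caller`. The interface and type guard below are hypothetical and are inferred only from the fields visible in this test; the real response type may have more fields or different optionality.

```typescript
// Hypothetical shape inferred from the test fixture; not the project's actual type.
interface ThreadTokenUsageSketch {
  total_tokens: number;
  total_runs: number;
  by_model: Record<string, { tokens: number; runs: number }>;
  by_caller: { lead_agent: number; subagent: number; middleware: number };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Runtime check that a parsed JSON value has the fields the fixture uses.
function isThreadTokenUsage(value: unknown): value is ThreadTokenUsageSketch {
  if (!isRecord(value)) return false;
  const caller = value.by_caller;
  if (!isRecord(caller) || !isRecord(value.by_model)) return false;
  const callerKeys = ["lead_agent", "subagent", "middleware"];
  return (
    typeof value.total_tokens === "number" &&
    typeof value.total_runs === "number" &&
    callerKeys.every((key) => typeof caller[key] === "number")
  );
}

const newFixture: unknown = {
  total_tokens: 7,
  total_runs: 1,
  by_model: { unknown: { tokens: 7, runs: 1 } },
  by_caller: { lead_agent: 0, subagent: 0, middleware: 0 },
};
const oldFixture: unknown = { ...(newFixture as object), by_caller: {} };
```

Under this assumed shape, the previous fixture with an empty `by_caller` object would fail the guard, which is one possible reason for filling in the three counters.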

@@ -14,8 +14,8 @@ DeerFlow exposes two API surfaces behind an Nginx reverse proxy:

 | Service        | Direct Port | Via Proxy                        | Purpose                          |
 |----------------|-------------|----------------------------------|----------------------------------|
-| Gateway API    | 8001        | `$DEERFLOW_GATEWAY_URL`          | REST endpoints (models, skills, memory, uploads) |
-| LangGraph API  | 2024        | `$DEERFLOW_LANGGRAPH_URL`        | Agent threads, runs, streaming   |
+| Gateway API    | 8001        | `$DEERFLOW_GATEWAY_URL`          | REST endpoints and embedded agent runtime |
+| LangGraph-compatible API | 8001 | `$DEERFLOW_LANGGRAPH_URL`       | Agent threads, runs, streaming   |

 ## Environment Variables