Fix test assertions for run ordering in RunManager tests

- Updated assertions in `test_list_by_thread` to reflect the correct ordering of runs.
- Modified `test_list_by_thread_is_stable_when_timestamps_tie` to ensure stable ordering when timestamps tie.
Refactor API fetch calls to use a unified fetch function; enhance chat history loading with new hooks and UI components
2026-04-19 09:55:34 +08:00 · 2026-04-17 23:41:11 +08:00 · 2026-04-12 20:04:32 +08:00 · 2026-04-12 12:05:38 +08:00 · 2026-04-12 11:32:25 +08:00 · 2026-04-12 11:08:31 +08:00
286 changed files with 26003 additions and 1167 deletions
@@ -33,5 +33,9 @@ INFOQUEST_API_KEY=your-infoquest-api-key
 # GitHub API Token
 # GITHUB_TOKEN=your-github-token
 # Database (only needed when config.yaml has database.backend: postgres)
 # DATABASE_URL=postgresql://deerflow:password@localhost:5432/deerflow
 #
 # WECOM_BOT_ID=your-wecom-bot-id
 # WECOM_BOT_SECRET=your-wecom-bot-secret
@@ -158,7 +158,7 @@ from deerflow.config import get_app_config
 Middlewares execute in strict order in `packages/harness/deerflow/agents/lead_agent/agent.py`:
-1. **ThreadDataMiddleware** - Creates per-thread directories (`backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs}`); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local `.deer-flow/threads/{thread_id}` directory
+1. **ThreadDataMiddleware** - Creates per-thread directories under the user's isolation scope (`backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/{workspace,uploads,outputs}`); resolves `user_id` via `get_effective_user_id()` (falls back to `"default"` in no-auth mode); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local thread directory
 2. **UploadsMiddleware** - Tracks and injects newly uploaded files into conversation
 3. **SandboxMiddleware** - Acquires sandbox, stores `sandbox_id` in state
 4. **DanglingToolCallMiddleware** - Injects placeholder ToolMessages for AIMessage tool_calls that lack responses (e.g., due to user interruption)
@@ -216,6 +216,9 @@ FastAPI application on port 8001 with health check at `GET /health`.
 | **Threads** (`/api/threads/{id}`) | `DELETE /` - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
 | **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; active content types (`text/html`, `application/xhtml+xml`, `image/svg+xml`) are always forced as download attachments to reduce XSS risk; `?download=true` still forces download for other file types |
 | **Suggestions** (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
 | **Thread Runs** (`/api/threads/{id}/runs`) | `POST /` - create background run; `POST /stream` - create + SSE stream; `POST /wait` - create + block; `GET /` - list runs; `GET /{rid}` - run details; `POST /{rid}/cancel` - cancel; `GET /{rid}/join` - join SSE; `GET /{rid}/messages` - paginated messages `{data, has_more}`; `GET /{rid}/events` - full event stream; `GET /../messages` - thread messages with feedback; `GET /../token-usage` - aggregate tokens |
 | **Feedback** (`/api/threads/{id}/runs/{rid}/feedback`) | `PUT /` - upsert feedback; `DELETE /` - delete user feedback; `POST /` - create feedback; `GET /` - list feedback; `GET /stats` - aggregate stats; `DELETE /{fid}` - delete specific |
 | **Runs** (`/api/runs`) | `POST /stream` - stateless run + SSE; `POST /wait` - stateless run + block; `GET /{rid}/messages` - paginated messages by run_id `{data, has_more}` (cursor: `after_seq`/`before_seq`); `GET /{rid}/feedback` - list feedback by run_id |
 Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` → Gateway.
@@ -229,7 +232,7 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →
 **Virtual Path System**:
 - Agent sees: `/mnt/user-data/{workspace,uploads,outputs}`, `/mnt/skills`
- Physical: `backend/.deer-flow/threads/{thread_id}/user-data/...`, `deer-flow/skills/`
+- Physical: `backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/...`, `deer-flow/skills/`
 - Translation: `replace_virtual_path()` / `replace_virtual_paths_in_command()`
 - Detection: `is_local_sandbox()` checks `sandbox_id == "local"`
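The per-user layout and virtual-path translation described in this hunk can be sketched as follows. This is a minimal illustration, not the project's `replace_virtual_path()`: the helper names `thread_user_data_dir` and `to_physical`, the thread id, and the user id are assumptions made for the example.

```python
from pathlib import Path, PurePosixPath

VIRTUAL_PREFIX = "/mnt/user-data"  # prefix the agent sees (from the docs above)
DEFAULT_USER_ID = "default"        # fallback used in no-auth mode


def thread_user_data_dir(base_dir: Path, thread_id: str, user_id: str = DEFAULT_USER_ID) -> Path:
    """Hypothetical helper: physical user-data root for one thread of one user."""
    return base_dir / "users" / user_id / "threads" / thread_id / "user-data"


def to_physical(base_dir: Path, thread_id: str, virtual_path: str, user_id: str = DEFAULT_USER_ID) -> Path:
    """Hypothetical helper: map a /mnt/user-data/... path to its physical location.

    Raises ValueError if the path is outside the virtual prefix. This sketch
    does not guard against ".." segments; a real implementation should.
    """
    rel = PurePosixPath(virtual_path).relative_to(VIRTUAL_PREFIX)
    return thread_user_data_dir(base_dir, thread_id, user_id).joinpath(*rel.parts)


base = Path("backend/.deer-flow")
physical = to_physical(base, "t-123", "/mnt/user-data/outputs/report.md", user_id="alice")
fallback = to_physical(base, "t-123", "/mnt/user-data/uploads/a.txt")
```

Two users with the same `thread_id` resolve to different directories, which is the point of the `users/{user_id}` segment.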
@@ -269,7 +272,7 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →
 - `invoke_acp_agent` - Invokes external ACP-compatible agents from `config.yaml`
 - ACP launchers must be real ACP adapters. The standard `codex` CLI is not ACP-compatible by itself; configure a wrapper such as `npx -y @zed-industries/codex-acp` or an installed `codex-acp` binary
 - Missing ACP executables now return an actionable error message instead of a raw `[Errno 2]`
- Each ACP agent uses a per-thread workspace at `{base_dir}/threads/{thread_id}/acp-workspace/`. The workspace is accessible to the lead agent via the virtual path `/mnt/acp-workspace/` (read-only). In docker sandbox mode, the directory is volume-mounted into the container at `/mnt/acp-workspace` (read-only); in local sandbox mode, path translation is handled by `tools.py`
+- Each ACP agent uses a per-thread workspace at `{base_dir}/users/{user_id}/threads/{thread_id}/acp-workspace/`. The workspace is accessible to the lead agent via the virtual path `/mnt/acp-workspace/` (read-only). In docker sandbox mode, the directory is volume-mounted into the container at `/mnt/acp-workspace` (read-only); in local sandbox mode, path translation is handled by `tools.py`
 - `image_search/` - Image search via DuckDuckGo
 ### MCP System (`packages/harness/deerflow/mcp/`)
@@ -338,18 +341,27 @@ Bridges external messaging platforms (Feishu, Slack, Telegram) to the DeerFlow a
 **Components**:
 - `updater.py` - LLM-based memory updates with fact extraction, whitespace-normalized fact deduplication (trims leading/trailing whitespace before comparing), and atomic file I/O
- `queue.py` - Debounced update queue (per-thread deduplication, configurable wait time)
+- `queue.py` - Debounced update queue (per-thread deduplication, configurable wait time); captures `user_id` at enqueue time so it survives the `threading.Timer` boundary
 - `prompt.py` - Prompt templates for memory updates
 - `storage.py` - File-based storage with per-user isolation; cache keyed by `(user_id, agent_name)` tuple
-**Data Structure** (stored in `backend/.deer-flow/memory.json`):
+**Per-User Isolation**:
+- Memory is stored per-user at `{base_dir}/users/{user_id}/memory.json`
+- Per-agent per-user memory at `{base_dir}/users/{user_id}/agents/{agent_name}/memory.json`
+- `user_id` is resolved via `get_effective_user_id()` from `deerflow.runtime.user_context`
+- In no-auth mode, `user_id` defaults to `"default"` (constant `DEFAULT_USER_ID`)
+- Absolute `storage_path` in config opts out of per-user isolation
+- **Migration**: Run `PYTHONPATH=. python scripts/migrate_user_isolation.py` to move legacy `memory.json` and `threads/` into per-user layout; supports `--dry-run`
+**Data Structure** (stored in `{base_dir}/users/{user_id}/memory.json`):
 - **User Context**: `workContext`, `personalContext`, `topOfMind` (1-3 sentence summaries)
 - **History**: `recentMonths`, `earlierContext`, `longTermBackground`
 - **Facts**: Discrete facts with `id`, `content`, `category` (preference/knowledge/context/behavior/goal), `confidence` (0-1), `createdAt`, `source`
 **Workflow**:
-1. `MemoryMiddleware` filters messages (user inputs + final AI responses) and queues conversation
+1. `MemoryMiddleware` filters messages (user inputs + final AI responses), captures `user_id` via `get_effective_user_id()`, and queues conversation with the captured `user_id`
 2. Queue debounces (30s default), batches updates, deduplicates per-thread
-3. Background thread invokes LLM to extract context updates and facts
+3. Background thread invokes LLM to extract context updates and facts, using the stored `user_id` (not the contextvar, which is unavailable on timer threads)
 4. Applies updates atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
 5. Next interaction injects top 15 facts + context into `<memory>` tags in system prompt
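The reason for capturing `user_id` at enqueue time can be illustrated with a small debounce queue. This is a sketch under stated assumptions, not the contents of `queue.py`: `DebouncedQueue`, `_current_user`, and the `record` callback are hypothetical names, and a plain `ContextVar` stands in for `get_effective_user_id()`.

```python
import threading
from contextvars import ContextVar

# Stand-in for the request-scoped user context (assumption for this sketch).
_current_user: ContextVar[str] = ContextVar("current_user", default="default")


class DebouncedQueue:
    """Hypothetical sketch: one pending update per thread_id, flushed after a quiet period."""

    def __init__(self, debounce_seconds: float, process) -> None:
        self._debounce = debounce_seconds
        self._process = process  # callback(thread_id, user_id, messages)
        self._pending = {}       # thread_id -> (user_id, messages)
        self._timer = None
        self._lock = threading.Lock()

    def enqueue(self, thread_id: str, messages: list) -> None:
        user_id = _current_user.get()  # captured now, on the caller's thread
        with self._lock:
            self._pending[thread_id] = (user_id, messages)  # newer entry replaces older
            if self._timer is not None:
                self._timer.cancel()  # restart the debounce window
            self._timer = threading.Timer(self._debounce, self._flush)
            self._timer.daemon = True
            self._timer.start()

    def _flush(self) -> None:
        # Runs on the timer thread, so it relies only on the stored user_id.
        with self._lock:
            batch, self._pending = self._pending, {}
        for thread_id, (user_id, messages) in batch.items():
            self._process(thread_id, user_id, messages)


processed = []
done = threading.Event()


def record(thread_id, user_id, messages):
    processed.append((thread_id, user_id, messages))
    done.set()


queue = DebouncedQueue(0.05, record)
_current_user.set("alice")
queue.enqueue("t-1", ["hello"])
queue.enqueue("t-1", ["hello", "world"])  # same thread: replaces the first entry
done.wait(timeout=2)
```

Storing the id alongside the payload keeps the flush independent of whatever context the timer thread happens to run with.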
@@ -357,7 +369,7 @@ Focused regression coverage for the updater lives in `backend/tests/test_memory_
 **Configuration** (`config.yaml` → `memory`):
 - `enabled` / `injection_enabled` - Master switches
- `storage_path` - Path to memory.json
+- `storage_path` - Path to memory.json (absolute path opts out of per-user isolation)
 - `debounce_seconds` - Wait time before processing (default: 30)
 - `model_name` - LLM for updates (null = default model)
 - `max_facts` / `fact_confidence_threshold` - Fact storage limits (100 / 0.7)
@@ -13,6 +13,9 @@ FROM python:3.12-slim-bookworm AS builder
 ARG NODE_MAJOR=22
 ARG APT_MIRROR
 ARG UV_INDEX_URL
 # Optional extras to install (e.g. "postgres" for PostgreSQL support)
 # Usage: docker build --build-arg UV_EXTRAS=postgres ...
 ARG UV_EXTRAS
 # Optionally override apt mirror for restricted networks (e.g. APT_MIRROR=mirrors.aliyun.com)
 RUN if [ -n "${APT_MIRROR}" ]; then \
@@ -43,8 +46,9 @@ WORKDIR /app
 COPY backend ./backend
 # Install dependencies with cache mount
 # When UV_EXTRAS is set (e.g. "postgres"), installs optional dependencies.
 RUN --mount=type=cache,target=/root/.cache/uv \
-    sh -c "cd backend && UV_INDEX_URL=${UV_INDEX_URL:-https://pypi.org/simple} uv sync"
+    sh -c "cd backend && UV_INDEX_URL=${UV_INDEX_URL:-https://pypi.org/simple} uv sync ${UV_EXTRAS:+--extra $UV_EXTRAS}"
 # ── Stage 2: Dev ──────────────────────────────────────────────────────────────
 # Retains compiler toolchain from builder so startup-time `uv sync` can build
@@ -13,6 +13,7 @@ from app.channels.base import Channel
 from app.channels.commands import KNOWN_CHANNEL_COMMANDS
 from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
 from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.sandbox.sandbox_provider import get_sandbox_provider
 logger = logging.getLogger(__name__)
@@ -344,8 +345,9 @@ class FeishuChannel(Channel):
            return f"Failed to obtain the [{type}]"
        paths = get_paths()
-        paths.ensure_thread_dirs(thread_id)
-        uploads_dir = paths.sandbox_uploads_dir(thread_id).resolve()
+        user_id = get_effective_user_id()
+        paths.ensure_thread_dirs(thread_id, user_id=user_id)
+        uploads_dir = paths.sandbox_uploads_dir(thread_id, user_id=user_id).resolve()
        ext = "png" if type == "image" else "bin"
        raw_filename = getattr(response, "file_name", "") or f"feishu_{file_key[-12:]}.{ext}"
@@ -17,6 +17,7 @@ from langgraph_sdk.errors import ConflictError
 from app.channels.commands import KNOWN_CHANNEL_COMMANDS
 from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
 from app.channels.store import ChannelStore
 from deerflow.runtime.user_context import get_effective_user_id
 logger = logging.getLogger(__name__)
@@ -341,14 +342,15 @@ def _resolve_attachments(thread_id: str, artifacts: list[str]) -> list[ResolvedA
    attachments: list[ResolvedAttachment] = []
    paths = get_paths()
-    outputs_dir = paths.sandbox_outputs_dir(thread_id).resolve()
+    user_id = get_effective_user_id()
+    outputs_dir = paths.sandbox_outputs_dir(thread_id, user_id=user_id).resolve()
    for virtual_path in artifacts:
        # Security: only allow files from the agent outputs directory
        if not virtual_path.startswith(_OUTPUTS_VIRTUAL_PREFIX):
            logger.warning("[Manager] rejected non-outputs artifact path: %s", virtual_path)
            continue
        try:
-            actual = paths.resolve_virtual_path(thread_id, virtual_path)
+            actual = paths.resolve_virtual_path(thread_id, virtual_path, user_id=user_id)
            # Verify the resolved path is actually under the outputs directory
            # (guards against path-traversal even after prefix check)
            try:
@@ -1,16 +1,22 @@
 import logging
 import os
 from collections.abc import AsyncGenerator
 from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from app.gateway.auth_middleware import AuthMiddleware
 from app.gateway.config import get_gateway_config
 from app.gateway.csrf_middleware import CSRFMiddleware
 from app.gateway.deps import langgraph_runtime
 from app.gateway.routers import (
    agents,
    artifacts,
    assistants_compat,
    auth,
    channels,
    feedback,
    mcp,
    memory,
    models,
@@ -33,6 +39,108 @@ logging.basicConfig(
 logger = logging.getLogger(__name__)
 async def _ensure_admin_user(app: FastAPI) -> None:
    """Startup hook: handle first boot, otherwise migrate orphan threads.
    First boot (no admin exists):
      - Does NOT create any user accounts automatically.
      - The operator must visit ``/setup`` to create the first admin.
    Subsequent boots (admin already exists):
      - Runs the "no-auth → with-auth" orphan thread migration: threads in
        the LangGraph store whose ``metadata.user_id`` is unset are assigned
        to the admin account. Users who ran DeerFlow without authentication
        have existing LangGraph thread data that needs an owner.
    No SQL persistence migration is needed: the four user_id columns
    (threads_meta, runs, run_events, feedback) only come into existence
    alongside the auth module via create_all, so freshly created tables
    never contain NULL-owner rows.
    """
    from sqlalchemy import select
    from app.gateway.deps import get_local_provider
    from deerflow.persistence.engine import get_session_factory
    from deerflow.persistence.user.model import UserRow
    provider = get_local_provider()
    admin_count = await provider.count_admin_users()
    if admin_count == 0:
        logger.info("=" * 60)
        logger.info("  First boot detected — no admin account exists.")
        logger.info("  Visit /setup to complete admin account creation.")
        logger.info("=" * 60)
        return
    # Admin already exists — run orphan thread migration for any
    # LangGraph thread metadata that pre-dates the auth module.
    sf = get_session_factory()
    if sf is None:
        return
    async with sf() as session:
        stmt = select(UserRow).where(UserRow.system_role == "admin").limit(1)
        row = (await session.execute(stmt)).scalar_one_or_none()
    if row is None:
        return  # Should not happen (admin_count > 0 above), but be safe.
    admin_id = str(row.id)
    # LangGraph store orphan migration — non-fatal.
    # This covers the "no-auth → with-auth" upgrade path for users
    # whose existing LangGraph thread metadata has no user_id set.
    store = getattr(app.state, "store", None)
    if store is not None:
        try:
            migrated = await _migrate_orphaned_threads(store, admin_id)
            if migrated:
                logger.info("Migrated %d orphan LangGraph thread(s) to admin", migrated)
        except Exception:
            logger.exception("LangGraph thread migration failed (non-fatal)")
 async def _iter_store_items(store, namespace, *, page_size: int = 500):
    """Paginated async iterator over a LangGraph store namespace.
    Replaces the old hardcoded ``limit=1000`` call with an offset-based
    loop so that environments with more than one page of orphans do
    not silently skip items. Terminates when a page is empty OR when a
    short page arrives (indicating the last page).
    """
    offset = 0
    while True:
        batch = await store.asearch(namespace, limit=page_size, offset=offset)
        if not batch:
            return
        for item in batch:
            yield item
        if len(batch) < page_size:
            return
        offset += page_size
 async def _migrate_orphaned_threads(store, admin_user_id: str) -> int:
    """Migrate LangGraph store threads with no user_id to the given admin.
    Uses offset pagination (via ``_iter_store_items``) so all orphans are
    migrated regardless of count. Returns the number of threads migrated.
    """
    migrated = 0
    async for item in _iter_store_items(store, ("threads",)):
        metadata = item.value.get("metadata", {})
        if not metadata.get("user_id"):
            metadata["user_id"] = admin_user_id
            item.value["metadata"] = metadata
            await store.aput(("threads",), item.key, item.value)
            migrated += 1
    return migrated
@asynccontextmanager
 async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Application lifespan handler."""
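The paginated iteration and orphan migration added in this hunk can be exercised against an in-memory store. The sketch below is illustrative only: `FakeStore` and `Item` are hypothetical stand-ins that mimic just the `asearch`/`aput` calls used above, and the page size of 2 is chosen to force several pages.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Item:
    key: str
    value: dict


class FakeStore:
    """In-memory stand-in exposing only the two calls the migration uses."""

    def __init__(self, items):
        self._items = items

    async def asearch(self, namespace, *, limit, offset):
        return self._items[offset : offset + limit]

    async def aput(self, namespace, key, value):
        for item in self._items:
            if item.key == key:
                item.value = value


async def iter_items(store, namespace, *, page_size):
    offset = 0
    while True:
        batch = await store.asearch(namespace, limit=page_size, offset=offset)
        for item in batch:
            yield item
        if len(batch) < page_size:  # empty or short page: nothing left
            return
        offset += page_size


async def migrate(store, admin_id, *, page_size=2):
    migrated = 0
    async for item in iter_items(store, ("threads",), page_size=page_size):
        metadata = item.value.get("metadata", {})
        if not metadata.get("user_id"):  # missing, None, or empty string
            metadata["user_id"] = admin_id
            item.value["metadata"] = metadata
            await store.aput(("threads",), item.key, item.value)
            migrated += 1
    return migrated


store = FakeStore([
    Item("a", {"metadata": {}}),
    Item("b", {"metadata": {"user_id": "bob"}}),
    Item("c", {}),
    Item("d", {"metadata": {"user_id": ""}}),
    Item("e", {"metadata": {}}),
])
migrated_count = asyncio.run(migrate(store, "admin-1"))
owners = [item.value["metadata"]["user_id"] for item in store._items]
```

Offset paging is safe here because `aput` rewrites items in place rather than removing them, so the page boundaries do not shift during the scan.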
@@ -52,6 +160,10 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    async with langgraph_runtime(app):
        logger.info("LangGraph runtime initialised")
        # First-boot check and orphan thread migration (no account is auto-created;
        # on first boot the operator is directed to /setup).
        # Must run AFTER langgraph_runtime so app.state.store is available for thread migration
        await _ensure_admin_user(app)
        # Start IM channel service if any channels are configured
        try:
            from app.channels.service import start_channel_service
@@ -163,7 +275,31 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
        ],
    )
-    # CORS is handled by nginx - no need for FastAPI middleware
+    # Auth: reject unauthenticated requests to non-public paths (fail-closed safety net)
    app.add_middleware(AuthMiddleware)
    # CSRF: Double Submit Cookie pattern for state-changing requests
    app.add_middleware(CSRFMiddleware)
    # CORS: when GATEWAY_CORS_ORIGINS is set (dev without nginx), add CORS middleware.
    # In production, nginx handles CORS and no middleware is needed.
    cors_origins_env = os.environ.get("GATEWAY_CORS_ORIGINS", "")
    if cors_origins_env:
        cors_origins = [o.strip() for o in cors_origins_env.split(",") if o.strip()]
        # Validate: wildcard origin with credentials is a security misconfiguration
        for origin in cors_origins:
            if origin == "*":
                logger.error("GATEWAY_CORS_ORIGINS contains wildcard '*' with allow_credentials=True. This is a security misconfiguration — browsers will reject the response. Use explicit scheme://host:port origins instead.")
                cors_origins = [o for o in cors_origins if o != "*"]
                break
        if cors_origins:
            app.add_middleware(
                CORSMiddleware,
                allow_origins=cors_origins,
                allow_credentials=True,
                allow_methods=["*"],
                allow_headers=["*"],
            )
    # Include routers
    # Models API is mounted at /api/models
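The origin parsing in this hunk can be isolated into a pure function, which makes the wildcard handling easy to check. This is a sketch of the same logic, not code from the PR; `parse_cors_origins` is a hypothetical name.

```python
def parse_cors_origins(raw: str) -> tuple:
    """Split a comma-separated origin list, drop '*', and report whether it was present."""
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    had_wildcard = "*" in origins
    return [o for o in origins if o != "*"], had_wildcard


origins, had_wildcard = parse_cors_origins(" http://localhost:3000, *, ,https://app.example.com ")
empty = parse_cors_origins("")
```

A caller would log the misconfiguration when `had_wildcard` is true and register `CORSMiddleware` only if `origins` is non-empty, matching the behaviour shown above.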
@@ -199,6 +335,12 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
    # Assistants compatibility API (LangGraph Platform stub)
    app.include_router(assistants_compat.router)
    # Auth API is mounted at /api/v1/auth
    app.include_router(auth.router)
    # Feedback API is mounted at /api/threads/{thread_id}/runs/{run_id}/feedback
    app.include_router(feedback.router)
    # Thread Runs API (LangGraph Platform-compatible runs lifecycle)
    app.include_router(thread_runs.router)
@@ -0,0 +1,42 @@
 """Authentication module for DeerFlow.
 This module provides:
 - JWT-based authentication
 - Provider Factory pattern for extensible auth methods
 - UserRepository interface for storage backends (SQLite)
 """
 from app.gateway.auth.config import AuthConfig, get_auth_config, set_auth_config
 from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse, TokenError
 from app.gateway.auth.jwt import TokenPayload, create_access_token, decode_token
 from app.gateway.auth.local_provider import LocalAuthProvider
 from app.gateway.auth.models import User, UserResponse
 from app.gateway.auth.password import hash_password, verify_password
 from app.gateway.auth.providers import AuthProvider
 from app.gateway.auth.repositories.base import UserRepository
 __all__ = [
    # Config
    "AuthConfig",
    "get_auth_config",
    "set_auth_config",
    # Errors
    "AuthErrorCode",
    "AuthErrorResponse",
    "TokenError",
    # JWT
    "TokenPayload",
    "create_access_token",
    "decode_token",
    # Password
    "hash_password",
    "verify_password",
    # Models
    "User",
    "UserResponse",
    # Providers
    "AuthProvider",
    "LocalAuthProvider",
    # Repository
    "UserRepository",
 ]
@@ -0,0 +1,57 @@
 """Authentication configuration for DeerFlow."""
 import logging
 import os
 import secrets
 from dotenv import load_dotenv
 from pydantic import BaseModel, Field
 load_dotenv()
 logger = logging.getLogger(__name__)
 class AuthConfig(BaseModel):
    """JWT and auth-related configuration. Parsed once at startup.
    Note: the ``users`` table now lives in the shared persistence
    database managed by ``deerflow.persistence.engine``. The old
    ``users_db_path`` config key has been removed — user storage is
    configured through ``config.database`` like every other table.
    """
    jwt_secret: str = Field(
        ...,
        description="Secret key for JWT signing. MUST be set via AUTH_JWT_SECRET.",
    )
    token_expiry_days: int = Field(default=7, ge=1, le=30)
    oauth_github_client_id: str | None = Field(default=None)
    oauth_github_client_secret: str | None = Field(default=None)
 _auth_config: AuthConfig | None = None
 def get_auth_config() -> AuthConfig:
    """Get the global AuthConfig instance. Parses from env on first call."""
    global _auth_config
    if _auth_config is None:
        jwt_secret = os.environ.get("AUTH_JWT_SECRET")
        if not jwt_secret:
            jwt_secret = secrets.token_urlsafe(32)
            os.environ["AUTH_JWT_SECRET"] = jwt_secret
            logger.warning(
                "⚠ AUTH_JWT_SECRET is not set — using an auto-generated ephemeral secret. "
                "Sessions will be invalidated on restart. "
                "For production, add AUTH_JWT_SECRET to your .env file: "
                'python -c "import secrets; print(secrets.token_urlsafe(32))"'
            )
        _auth_config = AuthConfig(jwt_secret=jwt_secret)
    return _auth_config
 def set_auth_config(config: AuthConfig) -> None:
    """Set the global AuthConfig instance (for testing)."""
    global _auth_config
    _auth_config = config
@@ -0,0 +1,48 @@
 """Write initial admin credentials to a restricted file instead of logs.
 Logging secrets to stdout/stderr is a well-known CodeQL finding
 (py/clear-text-logging-sensitive-data) — in production those logs
 get collected into ELK/Splunk/etc and become a secret sprawl
 source. This helper writes the credential to a 0600 file that only
 the process user can read, and returns the path so the caller can
 log **the path** (not the password) for the operator to pick up.
 """
 from __future__ import annotations
 import os
 from pathlib import Path
 from deerflow.config.paths import get_paths
 _CREDENTIAL_FILENAME = "admin_initial_credentials.txt"
 def write_initial_credentials(email: str, password: str, *, label: str = "initial") -> Path:
    """Write the admin email + password to ``{base_dir}/admin_initial_credentials.txt``.
    The file is created with mode 0600 directly via ``os.open``, so the
    password is never world-readable. This avoids the window that a
    ``write_text`` followed by a separate ``chmod`` would leave open.
    ``label`` distinguishes "initial" (fresh creation) from "reset"
    (password reset) in the file header so an operator picking up the
    file after a restart can tell which event produced it.
    Returns the absolute :class:`Path` to the file.
    """
    target = get_paths().base_dir / _CREDENTIAL_FILENAME
    target.parent.mkdir(parents=True, exist_ok=True)
    content = (
        f"# DeerFlow admin {label} credentials\n# This file is generated on first boot or password reset.\n# Change the password after login via Settings -> Account,\n# then delete this file.\n#\nemail: {email}\npassword: {password}\n"
    )
    # Atomic 0600 create-or-truncate. O_TRUNC (not O_EXCL) so the
    # reset-password path can rewrite an existing file without a
    # separate unlink-then-create dance.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    return target.resolve()
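One detail worth checking when reusing this pattern: the mode passed to `os.open` only takes effect when the file is created, so a pre-existing file keeps its old permissions after `O_TRUNC`. The sketch below is a variant that adds an explicit `os.chmod`, which the hunk above does not do; `write_private_file` is a hypothetical name and the example assumes a POSIX filesystem.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_file(target: Path, content: str) -> Path:
    """Create or truncate `target` so that only the owner can read or write it."""
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # The os.open mode applies only on creation, so tighten permissions
        # explicitly before writing in case the file already existed.
        os.chmod(target, 0o600)
        fh.write(content)
    return target.resolve()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "admin_initial_credentials.txt"
    path.write_text("stale", encoding="utf-8")
    os.chmod(path, 0o644)  # simulate a pre-existing, too-permissive file
    write_private_file(path, "email: admin@example.com\n")
    mode = stat.S_IMODE(path.stat().st_mode)
    text = path.read_text(encoding="utf-8")
```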
@@ -0,0 +1,45 @@
 """Typed error definitions for auth module.
 AuthErrorCode: exhaustive enum of all auth failure conditions.
 TokenError: exhaustive enum of JWT decode failures.
 AuthErrorResponse: structured error payload for HTTP responses.
 """
 from enum import StrEnum
 from pydantic import BaseModel
 class AuthErrorCode(StrEnum):
    """Exhaustive list of auth error conditions."""
    INVALID_CREDENTIALS = "invalid_credentials"
    TOKEN_EXPIRED = "token_expired"
    TOKEN_INVALID = "token_invalid"
    USER_NOT_FOUND = "user_not_found"
    EMAIL_ALREADY_EXISTS = "email_already_exists"
    PROVIDER_NOT_FOUND = "provider_not_found"
    NOT_AUTHENTICATED = "not_authenticated"
    SYSTEM_ALREADY_INITIALIZED = "system_already_initialized"
 class TokenError(StrEnum):
    """Exhaustive list of JWT decode failure reasons."""
    EXPIRED = "expired"
    INVALID_SIGNATURE = "invalid_signature"
    MALFORMED = "malformed"
 class AuthErrorResponse(BaseModel):
    """Structured error response — replaces bare `detail` strings."""
    code: AuthErrorCode
    message: str
 def token_error_to_code(err: TokenError) -> AuthErrorCode:
    """Map TokenError to AuthErrorCode — single source of truth."""
    if err == TokenError.EXPIRED:
        return AuthErrorCode.TOKEN_EXPIRED
    return AuthErrorCode.TOKEN_INVALID
@@ -0,0 +1,55 @@
 """JWT token creation and verification."""
 from datetime import UTC, datetime, timedelta
 import jwt
 from pydantic import BaseModel
 from app.gateway.auth.config import get_auth_config
 from app.gateway.auth.errors import TokenError
 class TokenPayload(BaseModel):
    """JWT token payload."""
    sub: str  # user_id
    exp: datetime
    iat: datetime | None = None
    ver: int = 0  # token_version — must match User.token_version
 def create_access_token(user_id: str, expires_delta: timedelta | None = None, token_version: int = 0) -> str:
    """Create a JWT access token.
    Args:
        user_id: The user's UUID as string
        expires_delta: Optional custom expiry; defaults to ``token_expiry_days`` from AuthConfig (7 days by default)
        token_version: User's current token_version for invalidation
    Returns:
        Encoded JWT string
    """
    config = get_auth_config()
    expiry = expires_delta or timedelta(days=config.token_expiry_days)
    now = datetime.now(UTC)
    payload = {"sub": user_id, "exp": now + expiry, "iat": now, "ver": token_version}
    return jwt.encode(payload, config.jwt_secret, algorithm="HS256")
 def decode_token(token: str) -> TokenPayload | TokenError:
    """Decode and validate a JWT token.
    Returns:
        TokenPayload if valid, or a specific TokenError variant.
    """
    config = get_auth_config()
    try:
        payload = jwt.decode(token, config.jwt_secret, algorithms=["HS256"])
        return TokenPayload(**payload)
    except jwt.ExpiredSignatureError:
        return TokenError.EXPIRED
    except jwt.InvalidSignatureError:
        return TokenError.INVALID_SIGNATURE
    except jwt.PyJWTError:
        return TokenError.MALFORMED
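The `ver` claim is described as having to match `User.token_version`, but this hunk does not show where that comparison happens. A minimal sketch of such a check, using plain dataclasses instead of PyJWT and the real models, might look like this; `Claims`, `Account`, and `is_token_current` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Claims:
    sub: str       # user id
    exp: datetime  # expiry
    ver: int = 0   # token_version at issue time


@dataclass
class Account:
    id: str
    token_version: int = 0


def is_token_current(claims: Claims, account: Account, now: datetime) -> bool:
    """Accept decoded claims only if unexpired and issued for the account's current version."""
    return (
        claims.sub == account.id
        and claims.exp > now
        and claims.ver == account.token_version
    )


now = datetime.now(timezone.utc)
account = Account(id="u-1")
claims = Claims(sub="u-1", exp=now + timedelta(days=7), ver=account.token_version)
before_change = is_token_current(claims, account, now)

account.token_version += 1  # e.g. after a password change
after_change = is_token_current(claims, account, now)
```

Bumping the stored version invalidates every outstanding token at once without keeping a revocation list.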
@@ -0,0 +1,91 @@
 """Local email/password authentication provider."""
 from app.gateway.auth.models import User
 from app.gateway.auth.password import hash_password_async, verify_password_async
 from app.gateway.auth.providers import AuthProvider
 from app.gateway.auth.repositories.base import UserRepository
 class LocalAuthProvider(AuthProvider):
    """Email/password authentication provider using local database."""
    def __init__(self, repository: UserRepository):
        """Initialize with a UserRepository.
        Args:
            repository: UserRepository implementation (SQLite)
        """
        self._repo = repository
    async def authenticate(self, credentials: dict) -> User | None:
        """Authenticate with email and password.
        Args:
            credentials: dict with 'email' and 'password' keys
        Returns:
            User if authentication succeeds, None otherwise
        """
        email = credentials.get("email")
        password = credentials.get("password")
        if not email or not password:
            return None
        user = await self._repo.get_user_by_email(email)
        if user is None:
            return None
        if user.password_hash is None:
            # OAuth user without local password
            return None
        if not await verify_password_async(password, user.password_hash):
            return None
        return user
    async def get_user(self, user_id: str) -> User | None:
        """Get user by ID."""
        return await self._repo.get_user_by_id(user_id)
    async def create_user(self, email: str, password: str | None = None, system_role: str = "user", needs_setup: bool = False) -> User:
        """Create a new local user.
        Args:
            email: User email address
            password: Plain text password (will be hashed)
            system_role: Role to assign ("admin" or "user")
            needs_setup: If True, user must complete setup on first login
        Returns:
            Created User instance
        """
        password_hash = await hash_password_async(password) if password else None
        user = User(
            email=email,
            password_hash=password_hash,
            system_role=system_role,
            needs_setup=needs_setup,
        )
        return await self._repo.create_user(user)
    async def get_user_by_oauth(self, provider: str, oauth_id: str) -> User | None:
        """Get user by OAuth provider and ID."""
        return await self._repo.get_user_by_oauth(provider, oauth_id)
    async def count_users(self) -> int:
        """Return total number of registered users."""
        return await self._repo.count_users()
    async def count_admin_users(self) -> int:
        """Return number of admin users."""
        return await self._repo.count_admin_users()
    async def update_user(self, user: User) -> User:
        """Update an existing user."""
        return await self._repo.update_user(user)
    async def get_user_by_email(self, email: str) -> User | None:
        """Get user by email."""
        return await self._repo.get_user_by_email(email)
@@ -0,0 +1,41 @@
 """User Pydantic models for authentication."""
 from datetime import UTC, datetime
 from typing import Literal
 from uuid import UUID, uuid4
 from pydantic import BaseModel, ConfigDict, EmailStr, Field
 def _utc_now() -> datetime:
    """Return current UTC time (timezone-aware)."""
    return datetime.now(UTC)
 class User(BaseModel):
    """Internal user representation."""
    model_config = ConfigDict(from_attributes=True)
    id: UUID = Field(default_factory=uuid4, description="Primary key")
    email: EmailStr = Field(..., description="Unique email address")
    password_hash: str | None = Field(None, description="bcrypt hash, nullable for OAuth users")
    system_role: Literal["admin", "user"] = Field(default="user")
    created_at: datetime = Field(default_factory=_utc_now)
    # OAuth linkage (optional)
    oauth_provider: str | None = Field(None, description="e.g. 'github', 'google'")
    oauth_id: str | None = Field(None, description="User ID from OAuth provider")
    # Auth lifecycle
    needs_setup: bool = Field(default=False, description="True for auto-created admin until setup completes")
    token_version: int = Field(default=0, description="Incremented on password change to invalidate old JWTs")
 class UserResponse(BaseModel):
    """Response model for user info endpoint."""
    id: str
    email: str
    system_role: Literal["admin", "user"]
    needs_setup: bool = False
@@ -0,0 +1,33 @@
 """Password hashing utilities using bcrypt directly."""
 import asyncio
 import bcrypt
 def hash_password(password: str) -> str:
    """Hash a password using bcrypt."""
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt()).decode("utf-8")
 def verify_password(plain_password: str, hashed_password: str) -> bool:
    """Verify a password against its hash."""
    return bcrypt.checkpw(plain_password.encode("utf-8"), hashed_password.encode("utf-8"))
 async def hash_password_async(password: str) -> str:
    """Hash a password using bcrypt (non-blocking).
    Wraps the blocking bcrypt operation in a thread pool to avoid
    blocking the event loop during password hashing.
    """
    return await asyncio.to_thread(hash_password, password)
 async def verify_password_async(plain_password: str, hashed_password: str) -> bool:
    """Verify a password against its hash (non-blocking).
    Wraps the blocking bcrypt operation in a thread pool to avoid
    blocking the event loop during password verification.
    """
    return await asyncio.to_thread(verify_password, plain_password, hashed_password)
@@ -0,0 +1,24 @@
 """Auth provider abstraction."""
 from abc import ABC, abstractmethod
 class AuthProvider(ABC):
    """Abstract base class for authentication providers."""
    @abstractmethod
    async def authenticate(self, credentials: dict) -> "User | None":
        """Authenticate user with given credentials.
        Returns User if authentication succeeds, None otherwise.
        """
        ...
    @abstractmethod
    async def get_user(self, user_id: str) -> "User | None":
        """Retrieve user by ID."""
        ...
 # Import User at runtime to avoid circular imports
 from app.gateway.auth.models import User  # noqa: E402
@@ -0,0 +1,102 @@
 """User repository interface for abstracting database operations."""
 from abc import ABC, abstractmethod
 from app.gateway.auth.models import User
 class UserNotFoundError(LookupError):
    """Raised when a user repository operation targets a non-existent row.
    Subclass of :class:`LookupError` so callers that already catch
    ``LookupError`` for "missing entity" can keep working unchanged,
    while specific call sites can pin to this class to distinguish
    "concurrent delete during update" from other lookups.
    """
 class UserRepository(ABC):
    """Abstract interface for user data storage.
    Implement this interface to support different storage backends
    (e.g. SQLite).
    """
    @abstractmethod
    async def create_user(self, user: User) -> User:
        """Create a new user.
        Args:
            user: User object to create
        Returns:
            Created User with ID assigned
        Raises:
            ValueError: If email already exists
        """
        ...
    @abstractmethod
    async def get_user_by_id(self, user_id: str) -> User | None:
        """Get user by ID.
        Args:
            user_id: User UUID as string
        Returns:
            User if found, None otherwise
        """
        ...
    @abstractmethod
    async def get_user_by_email(self, email: str) -> User | None:
        """Get user by email.
        Args:
            email: User email address
        Returns:
            User if found, None otherwise
        """
        ...
    @abstractmethod
    async def update_user(self, user: User) -> User:
        """Update an existing user.
        Args:
            user: User object with updated fields
        Returns:
            Updated User
        Raises:
            UserNotFoundError: If no row exists for ``user.id``. This is
                a hard failure (not a no-op) so callers cannot mistake a
                concurrent-delete race for a successful update.
        """
        ...
    @abstractmethod
    async def count_users(self) -> int:
        """Return total number of registered users."""
        ...
    @abstractmethod
    async def count_admin_users(self) -> int:
        """Return number of users with system_role == 'admin'."""
        ...
    @abstractmethod
    async def get_user_by_oauth(self, provider: str, oauth_id: str) -> User | None:
        """Get user by OAuth provider and ID.
        Args:
            provider: OAuth provider name (e.g. 'github', 'google')
            oauth_id: User ID from the OAuth provider
        Returns:
            User if found, None otherwise
        """
        ...
@@ -0,0 +1,127 @@
 """SQLAlchemy-backed UserRepository implementation.
 Uses the shared async session factory from
 ``deerflow.persistence.engine`` — the ``users`` table lives in the
 same database as ``threads_meta``, ``runs``, ``run_events``, and
 ``feedback``.
 Constructor takes the session factory directly (same pattern as the
 other four repositories in ``deerflow.persistence.*``). Callers
 construct this after ``init_engine_from_config()`` has run.
 """
 from __future__ import annotations
 from datetime import UTC
 from uuid import UUID
 from sqlalchemy import func, select
 from sqlalchemy.exc import IntegrityError
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from app.gateway.auth.models import User
 from app.gateway.auth.repositories.base import UserNotFoundError, UserRepository
 from deerflow.persistence.user.model import UserRow
 class SQLiteUserRepository(UserRepository):
    """Async user repository backed by the shared SQLAlchemy engine."""
    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
        self._sf = session_factory
    # ── Converters ────────────────────────────────────────────────────
    @staticmethod
    def _row_to_user(row: UserRow) -> User:
        return User(
            id=UUID(row.id),
            email=row.email,
            password_hash=row.password_hash,
            system_role=row.system_role,  # type: ignore[arg-type]
            # SQLite loses tzinfo on read; reattach UTC so downstream
            # code can compare timestamps reliably.
            created_at=row.created_at if row.created_at.tzinfo else row.created_at.replace(tzinfo=UTC),
            oauth_provider=row.oauth_provider,
            oauth_id=row.oauth_id,
            needs_setup=row.needs_setup,
            token_version=row.token_version,
        )
    @staticmethod
    def _user_to_row(user: User) -> UserRow:
        return UserRow(
            id=str(user.id),
            email=user.email,
            password_hash=user.password_hash,
            system_role=user.system_role,
            created_at=user.created_at,
            oauth_provider=user.oauth_provider,
            oauth_id=user.oauth_id,
            needs_setup=user.needs_setup,
            token_version=user.token_version,
        )
    # ── CRUD ──────────────────────────────────────────────────────────
    async def create_user(self, user: User) -> User:
        """Insert a new user. Raises ``ValueError`` on duplicate email."""
        row = self._user_to_row(user)
        async with self._sf() as session:
            session.add(row)
            try:
                await session.commit()
            except IntegrityError as exc:
                await session.rollback()
                raise ValueError(f"Email already registered: {user.email}") from exc
        return user
    async def get_user_by_id(self, user_id: str) -> User | None:
        async with self._sf() as session:
            row = await session.get(UserRow, user_id)
            return self._row_to_user(row) if row is not None else None
    async def get_user_by_email(self, email: str) -> User | None:
        stmt = select(UserRow).where(UserRow.email == email)
        async with self._sf() as session:
            result = await session.execute(stmt)
            row = result.scalar_one_or_none()
            return self._row_to_user(row) if row is not None else None
    async def update_user(self, user: User) -> User:
        async with self._sf() as session:
            row = await session.get(UserRow, str(user.id))
            if row is None:
                # Hard fail on concurrent delete: callers (reset_admin,
                # password change handlers, _ensure_admin_user) all
                # fetched the user just before this call, so a missing
                # row here means the row vanished underneath us. Silent
                # success would let the caller log "password reset" for
                # a row that no longer exists.
                raise UserNotFoundError(f"User {user.id} no longer exists")
            row.email = user.email
            row.password_hash = user.password_hash
            row.system_role = user.system_role
            row.oauth_provider = user.oauth_provider
            row.oauth_id = user.oauth_id
            row.needs_setup = user.needs_setup
            row.token_version = user.token_version
            await session.commit()
        return user
    async def count_users(self) -> int:
        stmt = select(func.count()).select_from(UserRow)
        async with self._sf() as session:
            return await session.scalar(stmt) or 0
    async def count_admin_users(self) -> int:
        stmt = select(func.count()).select_from(UserRow).where(UserRow.system_role == "admin")
        async with self._sf() as session:
            return await session.scalar(stmt) or 0
    async def get_user_by_oauth(self, provider: str, oauth_id: str) -> User | None:
        stmt = select(UserRow).where(UserRow.oauth_provider == provider, UserRow.oauth_id == oauth_id)
        async with self._sf() as session:
            result = await session.execute(stmt)
            row = result.scalar_one_or_none()
            return self._row_to_user(row) if row is not None else None
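The timezone handling in `_row_to_user` can be tried in isolation. The following is a minimal sketch, not part of this change: `ensure_utc` is a hypothetical helper name, and it uses `timezone.utc` instead of `datetime.UTC` only so that it also runs on Python versions before 3.11.

```python
from datetime import datetime, timezone


def ensure_utc(ts: datetime) -> datetime:
    """Hypothetical helper: reattach UTC to a naive timestamp, leave aware ones alone."""
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


# A naive value, as a SQLite read would produce it, and its aware counterpart.
naive = datetime(2026, 4, 12, 12, 0, 0)
aware = datetime(2026, 4, 12, 12, 0, 0, tzinfo=timezone.utc)

fixed = ensure_utc(naive)
unchanged = ensure_utc(aware)
```

Without the reattachment, ordering comparisons between the naive and aware values raise `TypeError`, which is the failure mode the converter guards against.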
@@ -0,0 +1,91 @@
 """CLI tool to reset an admin password.
 Usage:
    python -m app.gateway.auth.reset_admin
    python -m app.gateway.auth.reset_admin --email admin@example.com
 Writes the new password to ``.deer-flow/admin_initial_credentials.txt``
 (mode 0600) instead of printing it, so CI / log aggregators never see
 the cleartext secret.
 """
 from __future__ import annotations
 import argparse
 import asyncio
 import secrets
 import sys
 from sqlalchemy import select
 from app.gateway.auth.credential_file import write_initial_credentials
 from app.gateway.auth.password import hash_password
 from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
 from deerflow.persistence.user.model import UserRow
 async def _run(email: str | None) -> int:
    from deerflow.config import get_app_config
    from deerflow.persistence.engine import (
        close_engine,
        get_session_factory,
        init_engine_from_config,
    )
    config = get_app_config()
    await init_engine_from_config(config.database)
    try:
        sf = get_session_factory()
        if sf is None:
            print("Error: persistence engine not available (check config.database).", file=sys.stderr)
            return 1
        repo = SQLiteUserRepository(sf)
        if email:
            user = await repo.get_user_by_email(email)
        else:
            # Find first admin via direct SELECT — repository does not
            # expose a "first admin" helper and we do not want to add
            # one just for this CLI.
            async with sf() as session:
                stmt = select(UserRow).where(UserRow.system_role == "admin").limit(1)
                row = (await session.execute(stmt)).scalar_one_or_none()
            if row is None:
                user = None
            else:
                user = await repo.get_user_by_id(row.id)
        if user is None:
            if email:
                print(f"Error: user '{email}' not found.", file=sys.stderr)
            else:
                print("Error: no admin user found.", file=sys.stderr)
            return 1
        new_password = secrets.token_urlsafe(16)
        user.password_hash = hash_password(new_password)
        user.token_version += 1
        user.needs_setup = True
        await repo.update_user(user)
        cred_path = write_initial_credentials(user.email, new_password, label="reset")
        print(f"Password reset for: {user.email}")
        print(f"Credentials written to: {cred_path} (mode 0600)")
        print("Next login will require setup (new email + password).")
        return 0
    finally:
        await close_engine()
 def main() -> None:
    parser = argparse.ArgumentParser(description="Reset admin password")
    parser.add_argument("--email", help="Admin email (default: first admin found)")
    args = parser.parse_args()
    exit_code = asyncio.run(_run(args.email))
    sys.exit(exit_code)
 if __name__ == "__main__":
    main()
@@ -0,0 +1,118 @@
 """Global authentication middleware — fail-closed safety net.
 Rejects unauthenticated requests to non-public paths with 401. When a
 request passes the cookie check, resolves the JWT payload to a real
 ``User`` object and stamps it into both ``request.state.user`` and the
 ``deerflow.runtime.user_context`` contextvar so that repository-layer
 owner filtering works automatically via the sentinel pattern.
 Fine-grained permission checks remain in authz.py decorators.
 """
 from collections.abc import Callable
 from fastapi import HTTPException, Request, Response
 from starlette.middleware.base import BaseHTTPMiddleware
 from starlette.responses import JSONResponse
 from starlette.types import ASGIApp
 from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse
 from app.gateway.authz import _ALL_PERMISSIONS, AuthContext
 from deerflow.runtime.user_context import reset_current_user, set_current_user
 # Paths that never require authentication.
 _PUBLIC_PATH_PREFIXES: tuple[str, ...] = (
    "/health",
    "/docs",
    "/redoc",
    "/openapi.json",
 )
 # Exact auth paths that are public (login, register, logout, setup-status, initialize).
 # /api/v1/auth/me, /api/v1/auth/change-password etc. are NOT public.
 _PUBLIC_EXACT_PATHS: frozenset[str] = frozenset(
    {
        "/api/v1/auth/login/local",
        "/api/v1/auth/register",
        "/api/v1/auth/logout",
        "/api/v1/auth/setup-status",
        "/api/v1/auth/initialize",
    }
 )
 def _is_public(path: str) -> bool:
    stripped = path.rstrip("/")
    if stripped in _PUBLIC_EXACT_PATHS:
        return True
    return any(path.startswith(prefix) for prefix in _PUBLIC_PATH_PREFIXES)
 class AuthMiddleware(BaseHTTPMiddleware):
    """Strict auth gate: reject requests without a valid session.
    Two-stage check for non-public paths:
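    1. Cookie presence: return 401 NOT_AUTHENTICATED if missing
    2. JWT validation via ``get_current_user_from_request`` (the strict
       resolver): if the token is malformed or expired, or the signed
       user does not exist / is stale, render the raised
       ``HTTPException`` as JSON so the specific ``AuthErrorCode``
       (token_expired, token_invalid, user_not_found, …) is preserved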
    On success, stamps ``request.state.user`` and the
    ``deerflow.runtime.user_context`` contextvar so that repository-layer
    owner filters work downstream without every route needing a
    ``@require_auth`` decorator. Routes that need per-resource
    authorization (e.g. "user A cannot read user B's thread by guessing
    the URL") should additionally use ``@require_permission(...,
    owner_check=True)`` for explicit enforcement — but authentication
    itself is fully handled here.
    """
    def __init__(self, app: ASGIApp) -> None:
        super().__init__(app)
    async def dispatch(self, request: Request, call_next: Callable) -> Response:
        if _is_public(request.url.path):
            return await call_next(request)
        # Non-public path: require session cookie
        if not request.cookies.get("access_token"):
            return JSONResponse(
                status_code=401,
                content={
                    "detail": AuthErrorResponse(
                        code=AuthErrorCode.NOT_AUTHENTICATED,
                        message="Authentication required",
                    ).model_dump()
                },
            )
        # Strict JWT validation: reject junk/expired tokens with 401
        # right here instead of silently passing through. This closes
        # the "junk cookie bypass" gap (AUTH_TEST_PLAN test 7.5.8):
        # without this, non-isolation routes like /api/models would
        # accept any cookie-shaped string as authentication.
        #
        # We call the *strict* resolver so that fine-grained error
        # codes (token_expired, token_invalid, user_not_found, …)
        # propagate from AuthErrorCode, not get flattened into one
        # generic code. BaseHTTPMiddleware doesn't let HTTPException
        # bubble up, so we catch and render it as JSONResponse here.
        from app.gateway.deps import get_current_user_from_request
        try:
            user = await get_current_user_from_request(request)
        except HTTPException as exc:
            return JSONResponse(status_code=exc.status_code, content={"detail": exc.detail})
        # Stamp both request.state.user (for the contextvar pattern)
        # and request.state.auth (so @require_permission finds an
        # existing AuthContext and skips its "auth is None" branch,
        # which would otherwise run the entire JWT-decode + DB-lookup
        # pipeline a second time per request).
        request.state.user = user
        request.state.auth = AuthContext(user=user, permissions=_ALL_PERMISSIONS)
        token = set_current_user(user)
        try:
            return await call_next(request)
        finally:
            reset_current_user(token)
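The public-path rule in `_is_public` has two parts that behave differently: exact paths are compared after stripping trailing slashes, while prefixes are matched against the raw path with `str.startswith`. The following is a minimal sketch that copies the two constants and the function body from the middleware (renamed `is_public` here) and evaluates a few sample paths; the sample paths themselves are illustrative assumptions.

```python
_PUBLIC_PATH_PREFIXES = ("/health", "/docs", "/redoc", "/openapi.json")
_PUBLIC_EXACT_PATHS = frozenset(
    {
        "/api/v1/auth/login/local",
        "/api/v1/auth/register",
        "/api/v1/auth/logout",
        "/api/v1/auth/setup-status",
        "/api/v1/auth/initialize",
    }
)


def is_public(path: str) -> bool:
    # Exact paths tolerate a trailing slash; prefixes use the raw path.
    stripped = path.rstrip("/")
    if stripped in _PUBLIC_EXACT_PATHS:
        return True
    return any(path.startswith(prefix) for prefix in _PUBLIC_PATH_PREFIXES)


# Illustrative sample paths (assumptions, not taken from the test plan).
sample_paths = [
    "/api/v1/auth/login/local/",
    "/api/v1/auth/me",
    "/api/v1/auth/register/extra",
    "/docs/oauth2-redirect",
    "/healthz",
]
results = {path: is_public(path) for path in sample_paths}
```

Note that the prefix match is not segment-aware: any path that merely starts with `/health` (such as `/healthz`) is treated as public.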
@@ -0,0 +1,262 @@
 """Authorization decorators and context for DeerFlow.
 Inspired by LangGraph Auth system: https://github.com/langchain-ai/langgraph/blob/main/libs/sdk-py/langgraph_sdk/auth/__init__.py
 **Usage:**
 1. Use ``@require_auth`` on routes that need authentication
 2. Use ``@require_permission("resource", "action", owner_check=...)`` for permission checks
 3. Decorators are applied bottom to top, so the topmost wrapper runs first at request time
 **Example:**
    @router.get("/{thread_id}")
    @require_auth
    @require_permission("threads", "read", owner_check=True)
    async def get_thread(thread_id: str, request: Request):
        # User is authenticated and has threads:read permission
        ...
 **Permission Model:**
 - threads:read   - View thread
 - threads:write  - Create/update thread
 - threads:delete - Delete thread
 - runs:create   - Run agent
 - runs:read     - View run
 - runs:cancel   - Cancel run
 """
 from __future__ import annotations
 import functools
 from collections.abc import Callable
 from typing import TYPE_CHECKING, Any, ParamSpec, TypeVar
 from fastapi import HTTPException, Request
 if TYPE_CHECKING:
    from app.gateway.auth.models import User
 P = ParamSpec("P")
 T = TypeVar("T")
 # Permission constants
 class Permissions:
    """Permission constants for resource:action format."""
    # Threads
    THREADS_READ = "threads:read"
    THREADS_WRITE = "threads:write"
    THREADS_DELETE = "threads:delete"
    # Runs
    RUNS_CREATE = "runs:create"
    RUNS_READ = "runs:read"
    RUNS_CANCEL = "runs:cancel"
 class AuthContext:
    """Authentication context for the current request.
    Stored in request.state.auth after require_auth decoration.
    Attributes:
        user: The authenticated user, or None if anonymous
        permissions: List of permission strings (e.g., "threads:read")
    """
    __slots__ = ("user", "permissions")
    def __init__(self, user: User | None = None, permissions: list[str] | None = None):
        self.user = user
        self.permissions = permissions or []
    @property
    def is_authenticated(self) -> bool:
        """Check if user is authenticated."""
        return self.user is not None
    def has_permission(self, resource: str, action: str) -> bool:
        """Check if context has permission for resource:action.
        Args:
            resource: Resource name (e.g., "threads")
            action: Action name (e.g., "read")
        Returns:
            True if user has permission
        """
        permission = f"{resource}:{action}"
        return permission in self.permissions
    def require_user(self) -> User:
        """Get user or raise 401.
        Raises:
            HTTPException 401 if not authenticated
        """
        if not self.user:
            raise HTTPException(status_code=401, detail="Authentication required")
        return self.user
 def get_auth_context(request: Request) -> AuthContext | None:
    """Get AuthContext from request state."""
    return getattr(request.state, "auth", None)
 _ALL_PERMISSIONS: list[str] = [
    Permissions.THREADS_READ,
    Permissions.THREADS_WRITE,
    Permissions.THREADS_DELETE,
    Permissions.RUNS_CREATE,
    Permissions.RUNS_READ,
    Permissions.RUNS_CANCEL,
 ]
 async def _authenticate(request: Request) -> AuthContext:
    """Authenticate request and return AuthContext.
    Delegates to deps.get_optional_user_from_request() for the JWT→User pipeline.
    Returns AuthContext with user=None for anonymous requests.
    """
    from app.gateway.deps import get_optional_user_from_request
    user = await get_optional_user_from_request(request)
    if user is None:
        return AuthContext(user=None, permissions=[])
    # In future, permissions could be stored in user record
    return AuthContext(user=user, permissions=_ALL_PERMISSIONS)
 def require_auth[**P, T](func: Callable[P, T]) -> Callable[P, T]:
    """Decorator that authenticates the request and sets AuthContext.
    Must be placed ABOVE other decorators so that its wrapper runs before theirs.
    Usage:
        @router.get("/{thread_id}")
        @require_auth  # Outermost auth decorator: runs first and sets request.state.auth
        @require_permission("threads", "read")
        async def get_thread(thread_id: str, request: Request):
            auth: AuthContext = request.state.auth
            ...
    Raises:
        ValueError: If 'request' parameter is missing
    """
    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        request = kwargs.get("request")
        if request is None:
            raise ValueError("require_auth decorator requires 'request' parameter")
        # Authenticate and set context
        auth_context = await _authenticate(request)
        request.state.auth = auth_context
        return await func(*args, **kwargs)
    return wrapper
 def require_permission(
    resource: str,
    action: str,
    owner_check: bool = False,
    require_existing: bool = False,
 ) -> Callable[[Callable[P, T]], Callable[P, T]]:
    """Decorator that checks permission for resource:action.
    Place BELOW ``@require_auth``. If ``request.state.auth`` is not set,
    this decorator authenticates the request itself.
    Args:
        resource: Resource name (e.g., "threads", "runs")
        action: Action name (e.g., "read", "write", "delete")
        owner_check: If True, validates that the current user owns the resource.
                     Requires 'thread_id' path parameter and performs ownership check.
        require_existing: Only meaningful with ``owner_check=True``. If True, a
                          missing ``threads_meta`` row counts as a denial (404)
                          instead of "untracked legacy thread, allow". Use on
                          **destructive / mutating** routes (DELETE, PATCH,
                          state-update) so a deleted thread can't be re-targeted
                          by another user via the missing-row code path.
    Usage:
        # Read-style: legacy untracked threads are allowed
        @require_permission("threads", "read", owner_check=True)
        async def get_thread(thread_id: str, request: Request):
            ...
        # Destructive: thread row MUST exist and be owned by caller
        @require_permission("threads", "delete", owner_check=True, require_existing=True)
        async def delete_thread(thread_id: str, request: Request):
            ...
    Raises:
        HTTPException 401: If authentication required but user is anonymous
        HTTPException 403: If user lacks permission
        HTTPException 404: If owner_check=True but user doesn't own the thread
        ValueError: If owner_check=True but 'thread_id' parameter is missing
    """
    def decorator(func: Callable[P, T]) -> Callable[P, T]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            request = kwargs.get("request")
            if request is None:
                raise ValueError("require_permission decorator requires 'request' parameter")
            auth: AuthContext | None = getattr(request.state, "auth", None)
            if auth is None:
                auth = await _authenticate(request)
                request.state.auth = auth
            if not auth.is_authenticated:
                raise HTTPException(status_code=401, detail="Authentication required")
            # Check permission
            if not auth.has_permission(resource, action):
                raise HTTPException(
                    status_code=403,
                    detail=f"Permission denied: {resource}:{action}",
                )
            # Owner check for thread-specific resources.
            #
            # 2.0-rc moved thread metadata into the SQL persistence layer
            # (``threads_meta`` table). We verify ownership via
            # ``ThreadMetaStore.check_access``: by default it returns True
            # for missing rows (untracked legacy thread) and for rows whose
            # ``user_id`` is NULL (shared / pre-auth data), so only an
            # *existing* row with a *different* user_id triggers 404.
            # With ``require_existing=True``, a missing row is denied too.
            if owner_check:
                thread_id = kwargs.get("thread_id")
                if thread_id is None:
                    raise ValueError("require_permission with owner_check=True requires 'thread_id' parameter")
                from app.gateway.deps import get_thread_store
                thread_store = get_thread_store(request)
                allowed = await thread_store.check_access(
                    thread_id,
                    str(auth.user.id),
                    require_existing=require_existing,
                )
                if not allowed:
                    raise HTTPException(
                        status_code=404,
                        detail=f"Thread {thread_id} not found",
                    )
            return await func(*args, **kwargs)
        return wrapper
    return decorator
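The ordering requirement between `@require_auth` and `@require_permission` follows from how Python stacks decorators: they are applied bottom to top, so the topmost wrapper is the outermost one and runs first when the function is called. The following is a minimal sketch with two hypothetical synchronous decorators (`outer` and `inner`, standing in for the real async ones) that records the call order.

```python
import functools

calls: list[str] = []


def outer(func):
    """Hypothetical stand-in for @require_auth."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("outer")
        return func(*args, **kwargs)
    return wrapper


def inner(func):
    """Hypothetical stand-in for @require_permission(...)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append("inner")
        return func(*args, **kwargs)
    return wrapper


@outer  # applied last, so its wrapper runs first
@inner  # applied first, so its wrapper runs second
def handler() -> str:
    calls.append("handler")
    return "ok"


result = handler()
```

After the call, `calls` holds the wrappers in top-to-bottom order followed by the handler, which is why the authenticating decorator belongs on top.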
@@ -0,0 +1,113 @@
 """CSRF protection middleware for FastAPI.
 Per RFC-001:
 State-changing operations require CSRF protection.
 """
 import secrets
 from collections.abc import Callable
 from fastapi import Request, Response
 from starlette.middleware.base import BaseHTTPMiddleware
 from starlette.responses import JSONResponse
 from starlette.types import ASGIApp
 CSRF_COOKIE_NAME = "csrf_token"
 CSRF_HEADER_NAME = "X-CSRF-Token"
 CSRF_TOKEN_LENGTH = 64  # bytes
 def is_secure_request(request: Request) -> bool:
    """Detect whether the original client request was made over HTTPS."""
    return request.headers.get("x-forwarded-proto", request.url.scheme) == "https"
 def generate_csrf_token() -> str:
    """Generate a secure random CSRF token."""
    return secrets.token_urlsafe(CSRF_TOKEN_LENGTH)
 def should_check_csrf(request: Request) -> bool:
    """Determine if a request needs CSRF validation.
    CSRF is checked for state-changing methods (POST, PUT, DELETE, PATCH).
    GET, HEAD, OPTIONS, and TRACE are exempt per RFC 7231.
    """
    if request.method not in ("POST", "PUT", "DELETE", "PATCH"):
        return False
    path = request.url.path.rstrip("/")
    # Exempt /api/v1/auth/me endpoint
    if path == "/api/v1/auth/me":
        return False
    return True
 _AUTH_EXEMPT_PATHS: frozenset[str] = frozenset(
    {
        "/api/v1/auth/login/local",
        "/api/v1/auth/logout",
        "/api/v1/auth/register",
        "/api/v1/auth/initialize",
    }
 )
 def is_auth_endpoint(request: Request) -> bool:
    """Check if the request is to an auth endpoint.
    Auth endpoints don't need CSRF validation on first call (no token).
    """
    return request.url.path.rstrip("/") in _AUTH_EXEMPT_PATHS
 class CSRFMiddleware(BaseHTTPMiddleware):
    """Middleware that implements CSRF protection using Double Submit Cookie pattern."""
    def __init__(self, app: ASGIApp) -> None:
        super().__init__(app)
    async def dispatch(self, request: Request, call_next: Callable) -> Response:
        _is_auth = is_auth_endpoint(request)
        if should_check_csrf(request) and not _is_auth:
            cookie_token = request.cookies.get(CSRF_COOKIE_NAME)
            header_token = request.headers.get(CSRF_HEADER_NAME)
            if not cookie_token or not header_token:
                return JSONResponse(
                    status_code=403,
                    content={"detail": "CSRF token missing. Include X-CSRF-Token header."},
                )
            if not secrets.compare_digest(cookie_token, header_token):
                return JSONResponse(
                    status_code=403,
                    content={"detail": "CSRF token mismatch."},
                )
        response = await call_next(request)
        # For auth endpoints that set up session, also set CSRF cookie
        if _is_auth and request.method == "POST":
            # Generate a new CSRF token for the session
            csrf_token = generate_csrf_token()
            is_https = is_secure_request(request)
            response.set_cookie(
                key=CSRF_COOKIE_NAME,
                value=csrf_token,
                httponly=False,  # Must be JS-readable for Double Submit Cookie pattern
                secure=is_https,
                samesite="strict",
            )
        return response
 def get_csrf_token(request: Request) -> str | None:
    """Get the CSRF token from the current request's cookies.
    This is useful for server-side rendering where you need to embed
    the token in forms or headers.
    """
    return request.cookies.get(CSRF_COOKIE_NAME)
@@ -8,12 +8,25 @@ Initialization is handled directly in ``app.py`` via :class:`AsyncExitStack`.
 from __future__ import annotations
-from collections.abc import AsyncGenerator
+from collections.abc import AsyncGenerator, Callable
 from contextlib import AsyncExitStack, asynccontextmanager
 from typing import TYPE_CHECKING, TypeVar, cast
 from fastapi import FastAPI, HTTPException, Request
 from langgraph.types import Checkpointer
-from deerflow.runtime import RunManager, StreamBridge
+from deerflow.persistence.feedback import FeedbackRepository
 from deerflow.runtime import RunContext, RunManager, StreamBridge
 from deerflow.runtime.events.store.base import RunEventStore
 from deerflow.runtime.runs.store.base import RunStore
 if TYPE_CHECKING:
    from app.gateway.auth.local_provider import LocalAuthProvider
    from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
    from deerflow.persistence.thread_meta.base import ThreadMetaStore
 T = TypeVar("T")
@asynccontextmanager
@@ -25,15 +38,52 @@ async def langgraph_runtime(app: FastAPI) -> AsyncGenerator[None, None]:
        async with langgraph_runtime(app):
            yield
    """
-    from deerflow.agents.checkpointer.async_provider import make_checkpointer
+    from deerflow.config import get_app_config
    from deerflow.persistence.engine import close_engine, get_session_factory, init_engine_from_config
    from deerflow.runtime import make_store, make_stream_bridge
    from deerflow.runtime.checkpointer.async_provider import make_checkpointer
    from deerflow.runtime.events.store import make_run_event_store
    async with AsyncExitStack() as stack:
        app.state.stream_bridge = await stack.enter_async_context(make_stream_bridge())
        # Initialize persistence engine BEFORE checkpointer so that
        # auto-create-database logic runs first (postgres backend).
        config = get_app_config()
        await init_engine_from_config(config.database)
        app.state.checkpointer = await stack.enter_async_context(make_checkpointer())
        app.state.store = await stack.enter_async_context(make_store())
-        app.state.run_manager = RunManager()
+
-        yield
+        # Initialize repositories — one get_session_factory() call for all.
        sf = get_session_factory()
        if sf is not None:
            from deerflow.persistence.feedback import FeedbackRepository
            from deerflow.persistence.run import RunRepository
            app.state.run_store = RunRepository(sf)
            app.state.feedback_repo = FeedbackRepository(sf)
        else:
            from deerflow.runtime.runs.store.memory import MemoryRunStore
            app.state.run_store = MemoryRunStore()
            app.state.feedback_repo = None
        from deerflow.persistence.thread_meta import make_thread_store
        app.state.thread_store = make_thread_store(sf, app.state.store)
        # Run event store (has its own factory with config-driven backend selection)
        run_events_config = getattr(config, "run_events", None)
        app.state.run_event_store = make_run_event_store(run_events_config)
        # RunManager with store backing for persistence
        app.state.run_manager = RunManager(store=app.state.run_store)
        try:
            yield
        finally:
            await close_engine()
 # ---------------------------------------------------------------------------
@@ -41,30 +91,144 @@ async def langgraph_runtime(app: FastAPI) -> AsyncGenerator[None, None]:
 # ---------------------------------------------------------------------------
-def get_stream_bridge(request: Request) -> StreamBridge:
+def _require(attr: str, label: str) -> Callable[[Request], T]:
-    """Return the global :class:`StreamBridge`, or 503."""
+    """Create a FastAPI dependency that returns ``app.state.<attr>`` or 503."""
-    bridge = getattr(request.app.state, "stream_bridge", None)
+
-    if bridge is None:
+    def dep(request: Request) -> T:
-        raise HTTPException(status_code=503, detail="Stream bridge not available")
+        val = getattr(request.app.state, attr, None)
-    return bridge
+        if val is None:
            raise HTTPException(status_code=503, detail=f"{label} not available")
        return cast(T, val)
    dep.__name__ = dep.__qualname__ = f"get_{attr}"
    return dep
-def get_run_manager(request: Request) -> RunManager:
+get_stream_bridge: Callable[[Request], StreamBridge] = _require("stream_bridge", "Stream bridge")
-    """Return the global :class:`RunManager`, or 503."""
+get_run_manager: Callable[[Request], RunManager] = _require("run_manager", "Run manager")
-    mgr = getattr(request.app.state, "run_manager", None)
+get_checkpointer: Callable[[Request], Checkpointer] = _require("checkpointer", "Checkpointer")
-    if mgr is None:
+get_run_event_store: Callable[[Request], RunEventStore] = _require("run_event_store", "Run event store")
-        raise HTTPException(status_code=503, detail="Run manager not available")
+get_feedback_repo: Callable[[Request], FeedbackRepository] = _require("feedback_repo", "Feedback")
-    return mgr
+get_run_store: Callable[[Request], RunStore] = _require("run_store", "Run store")
 def get_checkpointer(request: Request):
    """Return the global checkpointer, or 503."""
    cp = getattr(request.app.state, "checkpointer", None)
    if cp is None:
        raise HTTPException(status_code=503, detail="Checkpointer not available")
    return cp
 def get_store(request: Request):
    """Return the global store (may be ``None`` if not configured)."""
    return getattr(request.app.state, "store", None)
 def get_thread_store(request: Request) -> ThreadMetaStore:
    """Return the thread metadata store (SQL or memory-backed)."""
    val = getattr(request.app.state, "thread_store", None)
    if val is None:
        raise HTTPException(status_code=503, detail="Thread metadata store not available")
    return val
 def get_run_context(request: Request) -> RunContext:
    """Build a :class:`RunContext` from ``app.state`` singletons.
    Returns a *base* context with infrastructure dependencies.
    """
    from deerflow.config import get_app_config
    return RunContext(
        checkpointer=get_checkpointer(request),
        store=get_store(request),
        event_store=get_run_event_store(request),
        run_events_config=getattr(get_app_config(), "run_events", None),
        thread_store=get_thread_store(request),
    )
 # ---------------------------------------------------------------------------
 # Auth helpers (used by authz.py and auth middleware)
 # ---------------------------------------------------------------------------
 # Cached singletons to avoid repeated instantiation per request
 _cached_local_provider: LocalAuthProvider | None = None
 _cached_repo: SQLiteUserRepository | None = None
 def get_local_provider() -> LocalAuthProvider:
    """Get or create the cached LocalAuthProvider singleton.
    Must be called after ``init_engine_from_config()`` — the shared
    session factory is required to construct the user repository.
    """
    global _cached_local_provider, _cached_repo
    if _cached_repo is None:
        from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
        from deerflow.persistence.engine import get_session_factory
        sf = get_session_factory()
        if sf is None:
            raise RuntimeError("get_local_provider() called before init_engine_from_config(); cannot access users table")
        _cached_repo = SQLiteUserRepository(sf)
    if _cached_local_provider is None:
        from app.gateway.auth.local_provider import LocalAuthProvider
        _cached_local_provider = LocalAuthProvider(repository=_cached_repo)
    return _cached_local_provider
 async def get_current_user_from_request(request: Request):
    """Get the current authenticated user from the request cookie.
    Raises HTTPException 401 if not authenticated.
    """
    from app.gateway.auth import decode_token
    from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse, TokenError, token_error_to_code
    access_token = request.cookies.get("access_token")
    if not access_token:
        raise HTTPException(
            status_code=401,
            detail=AuthErrorResponse(code=AuthErrorCode.NOT_AUTHENTICATED, message="Not authenticated").model_dump(),
        )
    payload = decode_token(access_token)
    if isinstance(payload, TokenError):
        raise HTTPException(
            status_code=401,
            detail=AuthErrorResponse(code=token_error_to_code(payload), message=f"Token error: {payload.value}").model_dump(),
        )
    provider = get_local_provider()
    user = await provider.get_user(payload.sub)
    if user is None:
        raise HTTPException(
            status_code=401,
            detail=AuthErrorResponse(code=AuthErrorCode.USER_NOT_FOUND, message="User not found").model_dump(),
        )
    # Token version mismatch → password was changed, token is stale
    if user.token_version != payload.ver:
        raise HTTPException(
            status_code=401,
            detail=AuthErrorResponse(code=AuthErrorCode.TOKEN_INVALID, message="Token revoked (password changed)").model_dump(),
        )
    return user
 async def get_optional_user_from_request(request: Request):
    """Get the authenticated user from the request, if any.
    Returns None if not authenticated.
    """
    try:
        return await get_current_user_from_request(request)
    except HTTPException:
        return None
 async def get_current_user(request: Request) -> str | None:
    """Extract user_id from request cookie, or None if not authenticated.
    Thin adapter that returns the string id for callers that only need
    identification (e.g., ``feedback.py``). Full-user callers should use
    ``get_current_user_from_request`` or ``get_optional_user_from_request``.
    """
    user = await get_optional_user_from_request(request)
    return str(user.id) if user else None
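
Reviewer note on the `_require` factory above: the following is a minimal, framework-free sketch of the same pattern, not the code from this PR. `ServiceUnavailable` and `fake_request` are hypothetical stand-ins for `fastapi.HTTPException` and a real `Request`, introduced only so the snippet runs without FastAPI. It shows why `feedback_repo = None` (the in-memory fallback path) surfaces as a 503 rather than an `AttributeError` later in a handler.

```python
from types import SimpleNamespace
from typing import Any, Callable


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for fastapi.HTTPException(status_code=503)."""

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.status_code = 503
        self.detail = detail


def require(attr: str, label: str) -> Callable[[Any], Any]:
    """Build a getter that returns ``request.app.state.<attr>`` or raises 503."""

    def dep(request: Any) -> Any:
        val = getattr(request.app.state, attr, None)
        if val is None:
            raise ServiceUnavailable(f"{label} not available")
        return val

    # Give every generated getter a distinct name so tracebacks stay readable.
    dep.__name__ = dep.__qualname__ = f"get_{attr}"
    return dep


def fake_request(**state: Any) -> SimpleNamespace:
    """Hypothetical helper that mimics the ``request.app.state`` attribute chain."""
    return SimpleNamespace(app=SimpleNamespace(state=SimpleNamespace(**state)))


get_run_store = require("run_store", "Run store")
get_feedback_repo = require("feedback_repo", "Feedback")

# Mirrors the no-database branch: a run store exists, the feedback repo is None.
request = fake_request(run_store="memory-store", feedback_repo=None)
run_store = get_run_store(request)

try:
    get_feedback_repo(request)
    feedback_error = None
except ServiceUnavailable as exc:
    feedback_error = exc
```

One factory replaces six near-identical getters, and a missing or `None` singleton always fails the same way.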
@@ -0,0 +1,106 @@
 """LangGraph Server auth handler — shares JWT logic with Gateway.
 Loaded by LangGraph Server via langgraph.json ``auth.path``.
 Reuses the same ``decode_token`` / ``get_auth_config`` as Gateway,
 so both modes validate tokens with the same secret and rules.
 Two layers:
  1. @auth.authenticate — validates JWT cookie, extracts user_id,
     and enforces CSRF on state-changing methods (POST/PUT/DELETE/PATCH)
  2. @auth.on — returns metadata filter so each user only sees own threads
 """
 import secrets
 from langgraph_sdk import Auth
 from app.gateway.auth.errors import TokenError
 from app.gateway.auth.jwt import decode_token
 from app.gateway.deps import get_local_provider
 auth = Auth()
 # Methods that require CSRF validation (state-changing per RFC 7231).
 _CSRF_METHODS = frozenset({"POST", "PUT", "DELETE", "PATCH"})
 def _check_csrf(request) -> None:
    """Enforce Double Submit Cookie CSRF check for state-changing requests.
    Mirrors Gateway's CSRFMiddleware logic so that LangGraph routes
    proxied directly by nginx have the same CSRF protection.
    """
    method = getattr(request, "method", "") or ""
    if method.upper() not in _CSRF_METHODS:
        return
    cookie_token = request.cookies.get("csrf_token")
    header_token = request.headers.get("x-csrf-token")
    if not cookie_token or not header_token:
        raise Auth.exceptions.HTTPException(
            status_code=403,
            detail="CSRF token missing. Include X-CSRF-Token header.",
        )
    if not secrets.compare_digest(cookie_token, header_token):
        raise Auth.exceptions.HTTPException(
            status_code=403,
            detail="CSRF token mismatch.",
        )
@auth.authenticate
 async def authenticate(request):
    """Validate the session cookie, decode JWT, and check token_version.
    Same validation chain as Gateway's get_current_user_from_request:
      cookie → decode JWT → DB lookup → token_version match
    Also enforces CSRF on state-changing methods.
    """
    # CSRF check before authentication so forged cross-site requests
    # are rejected early, even if the cookie carries a valid JWT.
    _check_csrf(request)
    token = request.cookies.get("access_token")
    if not token:
        raise Auth.exceptions.HTTPException(
            status_code=401,
            detail="Not authenticated",
        )
    payload = decode_token(token)
    if isinstance(payload, TokenError):
        raise Auth.exceptions.HTTPException(
            status_code=401,
            detail=f"Token error: {payload.value}",
        )
    user = await get_local_provider().get_user(payload.sub)
    if user is None:
        raise Auth.exceptions.HTTPException(
            status_code=401,
            detail="User not found",
        )
    if user.token_version != payload.ver:
        raise Auth.exceptions.HTTPException(
            status_code=401,
            detail="Token revoked (password changed)",
        )
    return payload.sub
@auth.on
 async def add_owner_filter(ctx: Auth.types.AuthContext, value: dict):
    """Inject user_id metadata on writes; filter by user_id on reads.
    Gateway stores thread ownership as ``metadata.user_id``.
    This handler ensures LangGraph Server enforces the same isolation.
    """
    # On create/update: stamp user_id into metadata
    metadata = value.setdefault("metadata", {})
    metadata["user_id"] = ctx.user.identity
    # Return filter dict — LangGraph applies it to search/read/delete
    return {"user_id": ctx.user.identity}
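
Reviewer note on `_check_csrf` above: the Double Submit Cookie rule can be exercised in isolation. The sketch below is an illustration under assumptions, not the handler itself. It takes plain dicts instead of a request object, and `CsrfRejected` and `is_allowed` are hypothetical names standing in for the 403 `HTTPException` path.

```python
import secrets

CSRF_METHODS = frozenset({"POST", "PUT", "DELETE", "PATCH"})


class CsrfRejected(Exception):
    """Hypothetical stand-in for a 403 HTTP error."""


def check_csrf(method: str, cookies: dict[str, str], headers: dict[str, str]) -> None:
    """Reject state-changing requests whose cookie and header tokens differ."""
    if method.upper() not in CSRF_METHODS:
        return  # safe methods (GET, HEAD, ...) are not checked
    cookie_token = cookies.get("csrf_token")
    header_token = headers.get("x-csrf-token")
    if not cookie_token or not header_token:
        raise CsrfRejected("CSRF token missing")
    # Constant-time comparison; for str inputs both values must be ASCII.
    if not secrets.compare_digest(cookie_token, header_token):
        raise CsrfRejected("CSRF token mismatch")


def is_allowed(method: str, cookies: dict[str, str], headers: dict[str, str]) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        check_csrf(method, cookies, headers)
    except CsrfRejected:
        return False
    return True


token = secrets.token_urlsafe(32)
matching = is_allowed("POST", {"csrf_token": token}, {"x-csrf-token": token})
header_missing = is_allowed("POST", {"csrf_token": token}, {})
mismatch = is_allowed("delete", {"csrf_token": token}, {"x-csrf-token": "forged"})
safe_method = is_allowed("GET", {}, {})
```

A cross-site form post carries the cookie automatically but cannot set the `X-CSRF-Token` header, which is why the missing-header case is the one that matters.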
@@ -5,6 +5,7 @@ from pathlib import Path
 from fastapi import HTTPException
 from deerflow.config.paths import get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 def resolve_thread_virtual_path(thread_id: str, virtual_path: str) -> Path:
@@ -22,7 +23,7 @@ def resolve_thread_virtual_path(thread_id: str, virtual_path: str) -> Path:
        HTTPException: If the path is invalid or outside allowed directories.
    """
    try:
-        return get_paths().resolve_virtual_path(thread_id, virtual_path)
+        return get_paths().resolve_virtual_path(thread_id, virtual_path, user_id=get_effective_user_id())
    except ValueError as e:
        status = 403 if "traversal" in str(e) else 400
        raise HTTPException(status_code=status, detail=str(e))
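
Reviewer note on the `user_id` scoping above: `resolve_virtual_path` itself is not part of this hunk, so the following is only a sketch of how a user-scoped resolver could behave. `resolve_under_user_scope` and `status_for` are hypothetical names, and the directory layout is taken from the `users/{user_id}/threads/{thread_id}/user-data` path in the updated middleware docs. The status mapping mirrors the `"traversal" in str(e)` check in this file.

```python
from pathlib import Path


def resolve_under_user_scope(base: Path, user_id: str, thread_id: str, relative: str) -> Path:
    """Resolve ``relative`` inside one user's thread directory (hypothetical helper)."""
    if not relative:
        raise ValueError("empty path")
    root = (base / "users" / user_id / "threads" / thread_id / "user-data").resolve()
    candidate = (root / relative).resolve()
    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(root):
        raise ValueError("path traversal detected")
    return candidate


def status_for(error: ValueError) -> int:
    """Same mapping as the except-branch above: traversal is 403, anything else 400."""
    return 403 if "traversal" in str(error) else 400


base = Path("/tmp/deerflow-sketch")  # assumed base directory; it need not exist
inside = resolve_under_user_scope(base, "default", "t1", "outputs/report.md")

try:
    # Four levels up leaves alice's scope and tries to enter bob's.
    resolve_under_user_scope(base, "alice", "t1", "../../../../bob/threads/t1/user-data/secret.txt")
    escape_status = None
except ValueError as exc:
    escape_status = status_for(exc)

try:
    resolve_under_user_scope(base, "alice", "t1", "")
    empty_status = None
except ValueError as exc:
    empty_status = status_for(exc)
```

With the user id in the root, a traversal that stays inside `users/` but crosses into another user's tree is rejected the same way as one that escapes the data directory entirely.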
@@ -7,6 +7,7 @@ from urllib.parse import quote
 from fastapi import APIRouter, HTTPException, Request
 from fastapi.responses import FileResponse, PlainTextResponse, Response
 from app.gateway.authz import require_permission
 from app.gateway.path_utils import resolve_thread_virtual_path
 logger = logging.getLogger(__name__)
@@ -81,6 +82,7 @@ def _extract_file_from_skill_archive(zip_path: Path, internal_path: str) -> byte
    summary="Get Artifact File",
    description="Retrieve an artifact file generated by the AI agent. Text and binary files can be viewed inline, while active web content is always downloaded.",
 )
@require_permission("threads", "read", owner_check=True)
 async def get_artifact(thread_id: str, path: str, request: Request, download: bool = False) -> Response:
    """Get an artifact file by its path.
@@ -0,0 +1,459 @@
 """Authentication endpoints."""
 import logging
 import os
 import time
 from ipaddress import ip_address, ip_network
 from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
 from fastapi.security import OAuth2PasswordRequestForm
 from pydantic import BaseModel, EmailStr, Field, field_validator
 from app.gateway.auth import (
    UserResponse,
    create_access_token,
 )
 from app.gateway.auth.config import get_auth_config
 from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse
 from app.gateway.csrf_middleware import is_secure_request
 from app.gateway.deps import get_current_user_from_request, get_local_provider
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/v1/auth", tags=["auth"])
 # ── Request/Response Models ──────────────────────────────────────────────
 class LoginResponse(BaseModel):
    """Response model for login — token only lives in HttpOnly cookie."""
    expires_in: int  # seconds
    needs_setup: bool = False
 # Top common-password blocklist. Drawn from the public SecLists "10k worst
 # passwords" set, lowercased + length>=8 only (shorter ones already fail
 # the min_length check). Kept tight on purpose: this is the **lower bound**
 # defense, not a full HIBP / passlib check, and runs in-process per request.
 _COMMON_PASSWORDS: frozenset[str] = frozenset(
    {
        "password",
        "password1",
        "password12",
        "password123",
        "password1234",
        "12345678",
        "123456789",
        "1234567890",
        "qwerty12",
        "qwertyui",
        "qwerty123",
        "abc12345",
        "abcd1234",
        "iloveyou",
        "letmein1",
        "welcome1",
        "welcome123",
        "admin123",
        "administrator",
        "passw0rd",
        "p@ssw0rd",
        "monkey12",
        "trustno1",
        "sunshine",
        "princess",
        "football",
        "baseball",
        "superman",
        "batman123",
        "starwars",
        "dragon123",
        "master123",
        "shadow12",
        "michael1",
        "jennifer",
        "computer",
    }
 )
 def _password_is_common(password: str) -> bool:
    """Case-insensitive blocklist check.
    Lowercases the input so trivial mutations like ``Password`` /
    ``PASSWORD`` are also rejected. Does not normalize digit substitutions
    (``p@ssw0rd`` is included as a literal entry instead) — keeping the
    rule cheap and predictable.
    """
    return password.lower() in _COMMON_PASSWORDS
 def _validate_strong_password(value: str) -> str:
    """Pydantic field-validator body shared by Register + ChangePassword.
    Constraint = function, not type-level mixin. The two request models
    have no "is-a" relationship; they only share the password-strength
    rule. Lifting it into a free function lets each model bind it via
    ``@field_validator(field_name)`` without inheritance gymnastics.
    """
    if _password_is_common(value):
        raise ValueError("Password is too common; choose a stronger password.")
    return value
 class RegisterRequest(BaseModel):
    """Request model for user registration."""
    email: EmailStr
    password: str = Field(..., min_length=8)
    _strong_password = field_validator("password")(classmethod(lambda cls, v: _validate_strong_password(v)))
 class ChangePasswordRequest(BaseModel):
    """Request model for password change (also handles setup flow)."""
    current_password: str
    new_password: str = Field(..., min_length=8)
    new_email: EmailStr | None = None
    _strong_password = field_validator("new_password")(classmethod(lambda cls, v: _validate_strong_password(v)))
 class MessageResponse(BaseModel):
    """Generic message response."""
    message: str
 # ── Helpers ───────────────────────────────────────────────────────────────
 def _set_session_cookie(response: Response, token: str, request: Request) -> None:
    """Set the access_token HttpOnly cookie on the response."""
    config = get_auth_config()
    is_https = is_secure_request(request)
    response.set_cookie(
        key="access_token",
        value=token,
        httponly=True,
        secure=is_https,
        samesite="lax",
        max_age=config.token_expiry_days * 24 * 3600 if is_https else None,
    )
 # ── Rate Limiting ────────────────────────────────────────────────────────
 # In-process dict — not shared across workers. Sufficient for single-worker deployments.
 _MAX_LOGIN_ATTEMPTS = 5
 _LOCKOUT_SECONDS = 300  # 5 minutes
 # ip → (fail_count, lock_until_timestamp)
 _login_attempts: dict[str, tuple[int, float]] = {}
 def _trusted_proxies() -> list:
    """Parse ``AUTH_TRUSTED_PROXIES`` env var into a list of ip_network objects.
    Comma-separated CIDR or single-IP entries. Empty / unset = no proxy is
    trusted (direct mode). Invalid entries are skipped with a logger warning.
    Read live so env-var overrides take effect immediately and tests can
    ``monkeypatch.setenv`` without poking a module-level cache.
    """
    raw = os.getenv("AUTH_TRUSTED_PROXIES", "").strip()
    if not raw:
        return []
    nets = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            nets.append(ip_network(entry, strict=False))
        except ValueError:
            logger.warning("AUTH_TRUSTED_PROXIES: ignoring invalid entry %r", entry)
    return nets
 def _get_client_ip(request: Request) -> str:
    """Extract the real client IP for rate limiting.
    Trust model:
    - The TCP peer (``request.client.host``) is always the baseline. It is
      whatever the kernel reports as the connecting socket — unforgeable
      by the client itself.
    - ``X-Real-IP`` is **only** honored if the TCP peer is in the
      ``AUTH_TRUSTED_PROXIES`` allowlist (set via env var, comma-separated
      CIDR or single IPs). When set, the gateway is assumed to be behind a
      reverse proxy (nginx, Cloudflare, ALB, …) that overwrites
      ``X-Real-IP`` with the original client address.
    - With no ``AUTH_TRUSTED_PROXIES`` set, ``X-Real-IP`` is silently
      ignored — closing the bypass where any client could rotate the
      header to dodge per-IP rate limits in dev / direct-gateway mode.
    ``X-Forwarded-For`` is intentionally NOT used because it is naturally
    client-controlled at the *first* hop and the trust chain is harder to
    audit per-request.
    """
    peer_host = request.client.host if request.client else None
    trusted = _trusted_proxies()
    if trusted and peer_host:
        try:
            peer_ip = ip_address(peer_host)
            if any(peer_ip in net for net in trusted):
                real_ip = request.headers.get("x-real-ip", "").strip()
                if real_ip:
                    return real_ip
        except ValueError:
            # peer_host wasn't a parseable IP (e.g. "unknown") — fall through
            pass
    return peer_host or "unknown"
 def _check_rate_limit(ip: str) -> None:
    """Raise 429 if the IP is currently locked out."""
    record = _login_attempts.get(ip)
    if record is None:
        return
    fail_count, lock_until = record
    if fail_count >= _MAX_LOGIN_ATTEMPTS:
        if time.time() < lock_until:
            raise HTTPException(
                status_code=429,
                detail="Too many login attempts. Try again later.",
            )
        del _login_attempts[ip]
 _MAX_TRACKED_IPS = 10000
 def _record_login_failure(ip: str) -> None:
    """Record a failed login attempt for the given IP."""
    # Evict expired lockouts when dict grows too large
    if len(_login_attempts) >= _MAX_TRACKED_IPS:
        now = time.time()
        expired = [k for k, (c, t) in _login_attempts.items() if c >= _MAX_LOGIN_ATTEMPTS and now >= t]
        for k in expired:
            del _login_attempts[k]
        # If still too large, evict cheapest-to-lose half: below-threshold
        # IPs (lock_until=0.0) sort first, then earliest-expiring lockouts.
        if len(_login_attempts) >= _MAX_TRACKED_IPS:
            by_time = sorted(_login_attempts.items(), key=lambda kv: kv[1][1])
            for k, _ in by_time[: len(by_time) // 2]:
                del _login_attempts[k]
    record = _login_attempts.get(ip)
    if record is None:
        _login_attempts[ip] = (1, 0.0)
    else:
        new_count = record[0] + 1
        lock_until = time.time() + _LOCKOUT_SECONDS if new_count >= _MAX_LOGIN_ATTEMPTS else 0.0
        _login_attempts[ip] = (new_count, lock_until)
 def _record_login_success(ip: str) -> None:
    """Clear failure counter for the given IP on successful login."""
    _login_attempts.pop(ip, None)
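
Reviewer note on the lockout state machine above: the `(fail_count, lock_until)` tuple is easier to follow with a deterministic clock. This is a simplified sketch, not the module code. It wraps the dict in a hypothetical `LoginLimiter` class, takes `now` as a parameter instead of calling `time.time()`, and omits the `_MAX_TRACKED_IPS` eviction.

```python
MAX_ATTEMPTS = 5
LOCKOUT_SECONDS = 300.0


class LoginLimiter:
    """Per-IP failure counter; state lives in this process only."""

    def __init__(self) -> None:
        # ip -> (fail_count, lock_until); lock_until stays 0.0 below the threshold
        self.attempts: dict[str, tuple[int, float]] = {}

    def is_locked(self, ip: str, now: float) -> bool:
        record = self.attempts.get(ip)
        if record is None:
            return False
        fail_count, lock_until = record
        if fail_count >= MAX_ATTEMPTS:
            if now < lock_until:
                return True
            # Lockout expired: forget the IP so it starts from zero again.
            del self.attempts[ip]
        return False

    def record_failure(self, ip: str, now: float) -> None:
        count = self.attempts.get(ip, (0, 0.0))[0] + 1
        lock_until = now + LOCKOUT_SECONDS if count >= MAX_ATTEMPTS else 0.0
        self.attempts[ip] = (count, lock_until)

    def record_success(self, ip: str) -> None:
        self.attempts.pop(ip, None)


limiter = LoginLimiter()
ip = "203.0.113.7"  # documentation-range address, used only as an example

for _ in range(4):
    limiter.record_failure(ip, now=1000.0)
locked_after_four = limiter.is_locked(ip, now=1000.0)

limiter.record_failure(ip, now=1000.0)  # fifth failure starts the lockout
locked_after_five = limiter.is_locked(ip, now=1001.0)
locked_after_expiry = limiter.is_locked(ip, now=1000.0 + LOCKOUT_SECONDS)
forgotten_after_expiry = ip not in limiter.attempts
```

Note that failures below the threshold never expire on their own in this scheme; only a successful login or a completed lockout clears the counter.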
 # ── Endpoints ─────────────────────────────────────────────────────────────
@router.post("/login/local", response_model=LoginResponse)
 async def login_local(
    request: Request,
    response: Response,
    form_data: OAuth2PasswordRequestForm = Depends(),
 ):
    """Local email/password login."""
    client_ip = _get_client_ip(request)
    _check_rate_limit(client_ip)
    user = await get_local_provider().authenticate({"email": form_data.username, "password": form_data.password})
    if user is None:
        _record_login_failure(client_ip)
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="Incorrect email or password").model_dump(),
        )
    _record_login_success(client_ip)
    token = create_access_token(str(user.id), token_version=user.token_version)
    _set_session_cookie(response, token, request)
    return LoginResponse(
        expires_in=get_auth_config().token_expiry_days * 24 * 3600,
        needs_setup=user.needs_setup,
    )
@router.post("/register", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
 async def register(request: Request, response: Response, body: RegisterRequest):
    """Register a new user account (always 'user' role).
    Admin is auto-created on first boot. This endpoint creates regular users.
    Auto-login by setting the session cookie.
    """
    try:
        user = await get_local_provider().create_user(email=body.email, password=body.password, system_role="user")
    except ValueError:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=AuthErrorResponse(code=AuthErrorCode.EMAIL_ALREADY_EXISTS, message="Email already registered").model_dump(),
        )
    token = create_access_token(str(user.id), token_version=user.token_version)
    _set_session_cookie(response, token, request)
    return UserResponse(id=str(user.id), email=user.email, system_role=user.system_role)
@router.post("/logout", response_model=MessageResponse)
 async def logout(request: Request, response: Response):
    """Logout current user by clearing the cookie."""
    response.delete_cookie(key="access_token", secure=is_secure_request(request), samesite="lax")
    return MessageResponse(message="Successfully logged out")
@router.post("/change-password", response_model=MessageResponse)
 async def change_password(request: Request, response: Response, body: ChangePasswordRequest):
    """Change password for the currently authenticated user.
    Also handles the first-boot setup flow:
    - If new_email is provided, updates email (checks uniqueness)
    - If user.needs_setup is True and new_email is given, clears needs_setup
    - Always increments token_version to invalidate old sessions
    - Re-issues session cookie with new token_version
    """
    from app.gateway.auth.password import hash_password_async, verify_password_async
    user = await get_current_user_from_request(request)
    if user.password_hash is None:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="OAuth users cannot change password").model_dump())
    if not await verify_password_async(body.current_password, user.password_hash):
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="Current password is incorrect").model_dump())
    provider = get_local_provider()
    # Update email if provided
    if body.new_email is not None:
        existing = await provider.get_user_by_email(body.new_email)
        if existing and str(existing.id) != str(user.id):
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.EMAIL_ALREADY_EXISTS, message="Email already in use").model_dump())
        user.email = body.new_email
    # Update password + bump version
    user.password_hash = await hash_password_async(body.new_password)
    user.token_version += 1
    # Clear setup flag if this is the setup flow
    if user.needs_setup and body.new_email is not None:
        user.needs_setup = False
    await provider.update_user(user)
    # Re-issue cookie with new token_version
    token = create_access_token(str(user.id), token_version=user.token_version)
    _set_session_cookie(response, token, request)
    return MessageResponse(message="Password changed successfully")
@router.get("/me", response_model=UserResponse)
 async def get_me(request: Request):
    """Get current authenticated user info."""
    user = await get_current_user_from_request(request)
    return UserResponse(id=str(user.id), email=user.email, system_role=user.system_role, needs_setup=user.needs_setup)
@router.get("/setup-status")
 async def setup_status():
    """Check if an admin account exists. Returns needs_setup=True when no admin exists."""
    admin_count = await get_local_provider().count_admin_users()
    return {"needs_setup": admin_count == 0}
 class InitializeAdminRequest(BaseModel):
    """Request model for first-boot admin account creation."""
    email: EmailStr
    password: str = Field(..., min_length=8)
    _strong_password = field_validator("password")(classmethod(lambda cls, v: _validate_strong_password(v)))
@router.post("/initialize", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
 async def initialize_admin(request: Request, response: Response, body: InitializeAdminRequest):
    """Create the first admin account on initial system setup.
    Only callable when no admin exists. Returns 409 Conflict if an admin
    already exists.
    On success, the admin account is created with ``needs_setup=False`` and
    the session cookie is set.
    """
    admin_count = await get_local_provider().count_admin_users()
    if admin_count > 0:
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail=AuthErrorResponse(code=AuthErrorCode.SYSTEM_ALREADY_INITIALIZED, message="System already initialized").model_dump(),
        )
    try:
        user = await get_local_provider().create_user(email=body.email, password=body.password, system_role="admin", needs_setup=False)
    except ValueError:
        # DB unique-constraint race: another concurrent request beat us.
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail=AuthErrorResponse(code=AuthErrorCode.SYSTEM_ALREADY_INITIALIZED, message="System already initialized").model_dump(),
        )
    token = create_access_token(str(user.id), token_version=user.token_version)
    _set_session_cookie(response, token, request)
    return UserResponse(id=str(user.id), email=user.email, system_role=user.system_role)
 # ── OAuth Endpoints (Future/Placeholder) ─────────────────────────────────
@router.get("/oauth/{provider}")
 async def oauth_login(provider: str):
    """Initiate OAuth login flow.
    Redirects to the OAuth provider's authorization URL.
    Currently a placeholder - requires OAuth provider implementation.
    """
    if provider not in ["github", "google"]:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail=f"Unsupported OAuth provider: {provider}",
        )
    raise HTTPException(
        status_code=status.HTTP_501_NOT_IMPLEMENTED,
        detail="OAuth login not yet implemented",
    )
@router.get("/callback/{provider}")
 async def oauth_callback(provider: str, code: str, state: str):
    """OAuth callback endpoint.
    Handles the OAuth provider's callback after user authorization.
    Currently a placeholder.
    """
    raise HTTPException(
        status_code=status.HTTP_501_NOT_IMPLEMENTED,
        detail="OAuth callback not yet implemented",
    )
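
Reviewer note on the trust model documented in `_get_client_ip`: the behaviour is easiest to verify with the environment lookup factored out. The sketch below is an approximation under assumptions. `parse_trusted` and `client_ip` are hypothetical names, the allowlist is passed as a string instead of being read from `AUTH_TRUSTED_PROXIES`, and invalid entries are skipped silently where the module logs a warning.

```python
from __future__ import annotations

from ipaddress import ip_address, ip_network


def parse_trusted(raw: str) -> list:
    """Parse a comma-separated list of CIDR blocks or single IPs."""
    nets = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            nets.append(ip_network(entry, strict=False))
        except ValueError:
            continue  # the module logs a warning at this point
    return nets


def client_ip(peer_host: str | None, headers: dict[str, str], trusted_raw: str) -> str:
    """Honor X-Real-IP only when the TCP peer is an allowlisted proxy."""
    trusted = parse_trusted(trusted_raw)
    if trusted and peer_host:
        try:
            peer = ip_address(peer_host)
        except ValueError:
            peer = None  # peer_host was not a parseable IP; fall through
        if peer is not None and any(peer in net for net in trusted):
            real_ip = headers.get("x-real-ip", "").strip()
            if real_ip:
                return real_ip
    return peer_host or "unknown"


spoofed = {"x-real-ip": "1.2.3.4"}

# No allowlist configured: the header is ignored, the TCP peer wins.
direct_mode = client_ip("203.0.113.9", spoofed, "")
# Peer is an allowlisted proxy: the header is trusted.
behind_proxy = client_ip("10.0.0.2", {"x-real-ip": "198.51.100.4"}, "10.0.0.0/8, not-an-ip")
# Allowlist configured, but the peer is not in it: header ignored again.
untrusted_peer = client_ip("203.0.113.9", spoofed, "10.0.0.0/8")
no_peer = client_ip(None, {}, "10.0.0.0/8")
```

The `direct_mode` case is the bypass the docstring describes: without the allowlist check, a client could rotate `X-Real-IP` on every attempt and never accumulate failures under one key.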
@@ -0,0 +1,188 @@
 """Feedback endpoints — create, list, stats, delete.
 Allows users to submit thumbs-up/down feedback on runs,
 optionally scoped to a specific message.
 """
 from __future__ import annotations
 import logging
 from typing import Any
 from fastapi import APIRouter, HTTPException, Request
 from pydantic import BaseModel, Field
 from app.gateway.authz import require_permission
 from app.gateway.deps import get_current_user, get_feedback_repo, get_run_store
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/threads", tags=["feedback"])
 # ---------------------------------------------------------------------------
 # Request / response models
 # ---------------------------------------------------------------------------
 class FeedbackCreateRequest(BaseModel):
    rating: int = Field(..., description="Feedback rating: +1 (positive) or -1 (negative)")
    comment: str | None = Field(default=None, description="Optional text feedback")
    message_id: str | None = Field(default=None, description="Optional: scope feedback to a specific message")
 class FeedbackUpsertRequest(BaseModel):
    rating: int = Field(..., description="Feedback rating: +1 (positive) or -1 (negative)")
    comment: str | None = Field(default=None, description="Optional text feedback")
 class FeedbackResponse(BaseModel):
    feedback_id: str
    run_id: str
    thread_id: str
    user_id: str | None = None
    message_id: str | None = None
    rating: int
    comment: str | None = None
    created_at: str = ""
 class FeedbackStatsResponse(BaseModel):
    run_id: str
    total: int = 0
    positive: int = 0
    negative: int = 0
 # ---------------------------------------------------------------------------
 # Endpoints
 # ---------------------------------------------------------------------------
@router.put("/{thread_id}/runs/{run_id}/feedback", response_model=FeedbackResponse)
@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def upsert_feedback(
    thread_id: str,
    run_id: str,
    body: FeedbackUpsertRequest,
    request: Request,
 ) -> dict[str, Any]:
    """Create or update feedback for a run (idempotent)."""
    if body.rating not in (1, -1):
        raise HTTPException(status_code=400, detail="rating must be +1 or -1")
    user_id = await get_current_user(request)
    run_store = get_run_store(request)
    run = await run_store.get(run_id)
    if run is None:
        raise HTTPException(status_code=404, detail=f"Run {run_id} not found")
    if run.get("thread_id") != thread_id:
        raise HTTPException(status_code=404, detail=f"Run {run_id} not found in thread {thread_id}")
    feedback_repo = get_feedback_repo(request)
    return await feedback_repo.upsert(
        run_id=run_id,
        thread_id=thread_id,
        rating=body.rating,
        user_id=user_id,
        comment=body.comment,
    )
@router.delete("/{thread_id}/runs/{run_id}/feedback")
@require_permission("threads", "delete", owner_check=True, require_existing=True)
 async def delete_run_feedback(
    thread_id: str,
    run_id: str,
    request: Request,
 ) -> dict[str, bool]:
    """Delete the current user's feedback for a run."""
    user_id = await get_current_user(request)
    feedback_repo = get_feedback_repo(request)
    deleted = await feedback_repo.delete_by_run(
        thread_id=thread_id,
        run_id=run_id,
        user_id=user_id,
    )
    if not deleted:
        raise HTTPException(status_code=404, detail="No feedback found for this run")
    return {"success": True}
@router.post("/{thread_id}/runs/{run_id}/feedback", response_model=FeedbackResponse)
@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def create_feedback(
    thread_id: str,
    run_id: str,
    body: FeedbackCreateRequest,
    request: Request,
 ) -> dict[str, Any]:
    """Submit feedback (thumbs-up/down) for a run."""
    if body.rating not in (1, -1):
        raise HTTPException(status_code=400, detail="rating must be +1 or -1")
    user_id = await get_current_user(request)
    # Validate run exists and belongs to thread
    run_store = get_run_store(request)
    run = await run_store.get(run_id)
    if run is None:
        raise HTTPException(status_code=404, detail=f"Run {run_id} not found")
    if run.get("thread_id") != thread_id:
        raise HTTPException(status_code=404, detail=f"Run {run_id} not found in thread {thread_id}")
    feedback_repo = get_feedback_repo(request)
    return await feedback_repo.create(
        run_id=run_id,
        thread_id=thread_id,
        rating=body.rating,
        user_id=user_id,
        message_id=body.message_id,
        comment=body.comment,
    )
@router.get("/{thread_id}/runs/{run_id}/feedback", response_model=list[FeedbackResponse])
@require_permission("threads", "read", owner_check=True)
 async def list_feedback(
    thread_id: str,
    run_id: str,
    request: Request,
 ) -> list[dict[str, Any]]:
    """List all feedback for a run."""
    feedback_repo = get_feedback_repo(request)
    return await feedback_repo.list_by_run(thread_id, run_id)
@router.get("/{thread_id}/runs/{run_id}/feedback/stats", response_model=FeedbackStatsResponse)
@require_permission("threads", "read", owner_check=True)
 async def feedback_stats(
    thread_id: str,
    run_id: str,
    request: Request,
 ) -> dict[str, Any]:
    """Get aggregated feedback stats (positive/negative counts) for a run."""
    feedback_repo = get_feedback_repo(request)
    return await feedback_repo.aggregate_by_run(thread_id, run_id)
@router.delete("/{thread_id}/runs/{run_id}/feedback/{feedback_id}")
@require_permission("threads", "delete", owner_check=True, require_existing=True)
 async def delete_feedback(
    thread_id: str,
    run_id: str,
    feedback_id: str,
    request: Request,
 ) -> dict[str, bool]:
    """Delete a feedback record."""
    feedback_repo = get_feedback_repo(request)
    # Verify feedback belongs to the specified thread/run before deleting
    existing = await feedback_repo.get(feedback_id)
    if existing is None:
        raise HTTPException(status_code=404, detail=f"Feedback {feedback_id} not found")
    if existing.get("thread_id") != thread_id or existing.get("run_id") != run_id:
        raise HTTPException(status_code=404, detail=f"Feedback {feedback_id} not found in run {run_id}")
    deleted = await feedback_repo.delete(feedback_id)
    if not deleted:
        raise HTTPException(status_code=404, detail=f"Feedback {feedback_id} not found")
    return {"success": True}
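The `PUT` endpoint above is documented as idempotent: repeating the call updates the caller's existing feedback instead of adding a second record. A minimal sketch of that behaviour, assuming feedback is keyed by `(run_id, user_id)`. `InMemoryFeedbackRepo` is a hypothetical stand-in and not the repository returned by `get_feedback_repo`:

```python
from __future__ import annotations

import uuid
from typing import Any


class InMemoryFeedbackRepo:
    """Hypothetical in-memory repo used only to illustrate upsert semantics."""

    def __init__(self) -> None:
        # Assumption: at most one feedback row per (run_id, user_id).
        self._rows: dict[tuple[str, str | None], dict[str, Any]] = {}

    def upsert(self, *, run_id: str, thread_id: str, rating: int,
               user_id: str | None = None, comment: str | None = None) -> dict[str, Any]:
        if rating not in (1, -1):
            raise ValueError("rating must be +1 or -1")
        key = (run_id, user_id)
        row = self._rows.get(key)
        if row is None:
            # First call creates the row and assigns a stable feedback_id.
            row = {"feedback_id": str(uuid.uuid4()), "run_id": run_id,
                   "thread_id": thread_id, "user_id": user_id}
            self._rows[key] = row
        # Later calls only overwrite the mutable fields.
        row["rating"] = rating
        row["comment"] = comment
        return dict(row)

    def delete_by_run(self, *, thread_id: str, run_id: str, user_id: str | None = None) -> bool:
        # Returns False when there was nothing to delete, which the
        # endpoint above maps to a 404.
        return self._rows.pop((run_id, user_id), None) is not None
```

Calling `upsert` twice with the same `run_id` and `user_id` keeps the same `feedback_id` and changes only `rating` and `comment`.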
@@ -13,6 +13,7 @@ from deerflow.agents.memory.updater import (
    update_memory_fact,
 )
 from deerflow.config.memory_config import get_memory_config
 from deerflow.runtime.user_context import get_effective_user_id
 router = APIRouter(prefix="/api", tags=["memory"])
@@ -147,7 +148,7 @@ async def get_memory() -> MemoryResponse:
        }
        ```
    """
-    memory_data = get_memory_data()
+    memory_data = get_memory_data(user_id=get_effective_user_id())
    return MemoryResponse(**memory_data)
@@ -167,7 +168,7 @@ async def reload_memory() -> MemoryResponse:
    Returns:
        The reloaded memory data.
    """
-    memory_data = reload_memory_data()
+    memory_data = reload_memory_data(user_id=get_effective_user_id())
    return MemoryResponse(**memory_data)
@@ -181,7 +182,7 @@ async def reload_memory() -> MemoryResponse:
 async def clear_memory() -> MemoryResponse:
    """Clear all persisted memory data."""
    try:
-        memory_data = clear_memory_data()
+        memory_data = clear_memory_data(user_id=get_effective_user_id())
    except OSError as exc:
        raise HTTPException(status_code=500, detail="Failed to clear memory data.") from exc
@@ -202,6 +203,7 @@ async def create_memory_fact_endpoint(request: FactCreateRequest) -> MemoryRespo
            content=request.content,
            category=request.category,
            confidence=request.confidence,
            user_id=get_effective_user_id(),
        )
    except ValueError as exc:
        raise _map_memory_fact_value_error(exc) from exc
@@ -221,7 +223,7 @@ async def create_memory_fact_endpoint(request: FactCreateRequest) -> MemoryRespo
 async def delete_memory_fact_endpoint(fact_id: str) -> MemoryResponse:
    """Delete a single fact from memory by fact id."""
    try:
-        memory_data = delete_memory_fact(fact_id)
+        memory_data = delete_memory_fact(fact_id, user_id=get_effective_user_id())
    except KeyError as exc:
        raise HTTPException(status_code=404, detail=f"Memory fact '{fact_id}' not found.") from exc
    except OSError as exc:
@@ -245,6 +247,7 @@ async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -
            content=request.content,
            category=request.category,
            confidence=request.confidence,
            user_id=get_effective_user_id(),
        )
    except ValueError as exc:
        raise _map_memory_fact_value_error(exc) from exc
@@ -265,7 +268,7 @@ async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -
 )
 async def export_memory() -> MemoryResponse:
    """Export the current memory data."""
-    memory_data = get_memory_data()
+    memory_data = get_memory_data(user_id=get_effective_user_id())
    return MemoryResponse(**memory_data)
@@ -279,7 +282,7 @@ async def export_memory() -> MemoryResponse:
 async def import_memory(request: MemoryResponse) -> MemoryResponse:
    """Import and persist memory data."""
    try:
-        memory_data = import_memory_data(request.model_dump())
+        memory_data = import_memory_data(request.model_dump(), user_id=get_effective_user_id())
    except OSError as exc:
        raise HTTPException(status_code=500, detail="Failed to import memory data.") from exc
@@ -337,7 +340,7 @@ async def get_memory_status() -> MemoryStatusResponse:
        Combined memory configuration and current data.
    """
    config = get_memory_config()
-    memory_data = get_memory_data()
+    memory_data = get_memory_data(user_id=get_effective_user_id())
    return MemoryStatusResponse(
        config=MemoryConfigResponse(
@@ -11,10 +11,11 @@ import asyncio
 import logging
 import uuid
-from fastapi import APIRouter, Request
+from fastapi import APIRouter, HTTPException, Query, Request
 from fastapi.responses import StreamingResponse
-from app.gateway.deps import get_checkpointer, get_run_manager, get_stream_bridge
+from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer, get_feedback_repo, get_run_event_store, get_run_manager, get_run_store, get_stream_bridge
 from app.gateway.routers.thread_runs import RunCreateRequest
 from app.gateway.services import sse_consumer, start_run
 from deerflow.runtime import serialize_channel_values
@@ -85,3 +86,58 @@ async def stateless_wait(body: RunCreateRequest, request: Request) -> dict:
        logger.exception("Failed to fetch final state for run %s", record.run_id)
    return {"status": record.status.value, "error": record.error}
 # ---------------------------------------------------------------------------
 # Run-scoped read endpoints
 # ---------------------------------------------------------------------------
 async def _resolve_run(run_id: str, request: Request) -> dict:
    """Fetch run by run_id with user ownership check. Raises 404 if not found."""
    run_store = get_run_store(request)
    record = await run_store.get(run_id)  # user_id=AUTO filters by contextvar
    if record is None:
        raise HTTPException(status_code=404, detail=f"Run {run_id} not found")
    return record
@router.get("/{run_id}/messages")
@require_permission("runs", "read")
 async def run_messages(
    run_id: str,
    request: Request,
    limit: int = Query(default=50, le=200, ge=1),
    before_seq: int | None = Query(default=None),
    after_seq: int | None = Query(default=None),
 ) -> dict:
    """Return paginated messages for a run (cursor-based).
    Pagination:
    - after_seq: messages with seq > after_seq (forward)
    - before_seq: messages with seq < before_seq (backward)
    - neither: latest messages
    Response: { data: [...], has_more: bool }
    """
    run = await _resolve_run(run_id, request)
    event_store = get_run_event_store(request)
    rows = await event_store.list_messages_by_run(
        run["thread_id"],
        run_id,
        limit=limit + 1,
        before_seq=before_seq,
        after_seq=after_seq,
    )
    has_more = len(rows) > limit
    data = rows[:limit] if has_more else rows
    return {"data": data, "has_more": has_more}
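`run_messages` asks the store for `limit + 1` rows and uses the extra row only to decide `has_more`. A minimal sketch of that probe for the forward (`after_seq`) direction, assuming rows are dicts with a `seq` key. `fetch_after` is a hypothetical stand-in for `list_messages_by_run`, and the backward case depends on how the real store orders its rows:

```python
from __future__ import annotations

from typing import Any


def fetch_after(rows: list[dict[str, Any]], *, after_seq: int, limit: int) -> list[dict[str, Any]]:
    """Hypothetical store query: rows with seq > after_seq, ascending, capped at limit."""
    ordered = sorted(rows, key=lambda r: r["seq"])
    return [r for r in ordered if r["seq"] > after_seq][:limit]


def page_forward(rows: list[dict[str, Any]], *, after_seq: int, limit: int) -> dict[str, Any]:
    # Fetch one extra row; its presence means another page exists.
    fetched = fetch_after(rows, after_seq=after_seq, limit=limit + 1)
    has_more = len(fetched) > limit
    data = fetched[:limit] if has_more else fetched
    return {"data": data, "has_more": has_more}
```

A client can then pass the last returned `seq` as the next `after_seq` until `has_more` is false.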
@router.get("/{run_id}/feedback")
@require_permission("runs", "read")
 async def run_feedback(run_id: str, request: Request) -> list[dict]:
    """Return all feedback for a run."""
    run = await _resolve_run(run_id, request)
    feedback_repo = get_feedback_repo(request)
    return await feedback_repo.list_by_run(run["thread_id"], run_id)
@@ -1,10 +1,11 @@
 import json
 import logging
-from fastapi import APIRouter
+from fastapi import APIRouter, Request
 from langchain_core.messages import HumanMessage, SystemMessage
 from pydantic import BaseModel, Field
 from app.gateway.authz import require_permission
 from deerflow.models import create_chat_model
 logger = logging.getLogger(__name__)
@@ -98,12 +99,13 @@ def _format_conversation(messages: list[SuggestionMessage]) -> str:
    summary="Generate Follow-up Questions",
    description="Generate short follow-up questions a user might ask next, based on recent conversation context.",
 )
-async def generate_suggestions(thread_id: str, request: SuggestionsRequest) -> SuggestionsResponse:
-    if not request.messages:
+@require_permission("threads", "read", owner_check=True)
+async def generate_suggestions(thread_id: str, body: SuggestionsRequest, request: Request) -> SuggestionsResponse:
+    if not body.messages:
        return SuggestionsResponse(suggestions=[])
-    n = request.n
-    conversation = _format_conversation(request.messages)
+    n = body.n
+    conversation = _format_conversation(body.messages)
    if not conversation:
        return SuggestionsResponse(suggestions=[])
@@ -120,7 +122,7 @@ async def generate_suggestions(thread_id: str, request: SuggestionsRequest) -> S
    user_content = f"Conversation Context:\n{conversation}\n\nGenerate {n} follow-up questions"
    try:
-        model = create_chat_model(name=request.model_name, thinking_enabled=False)
+        model = create_chat_model(name=body.model_name, thinking_enabled=False)
        response = await model.ainvoke([SystemMessage(content=system_instruction), HumanMessage(content=user_content)])
        raw = _extract_response_text(response.content)
        suggestions = _parse_json_string_list(raw) or []
@@ -19,7 +19,8 @@ from fastapi import APIRouter, HTTPException, Query, Request
 from fastapi.responses import Response, StreamingResponse
 from pydantic import BaseModel, Field
-from app.gateway.deps import get_checkpointer, get_run_manager, get_stream_bridge
+from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer, get_current_user, get_feedback_repo, get_run_event_store, get_run_manager, get_run_store, get_stream_bridge
 from app.gateway.services import sse_consumer, start_run
 from deerflow.runtime import RunRecord, serialize_channel_values
@@ -92,6 +93,7 @@ def _record_to_response(record: RunRecord) -> RunResponse:
@router.post("/{thread_id}/runs", response_model=RunResponse)
@require_permission("runs", "create", owner_check=True, require_existing=True)
 async def create_run(thread_id: str, body: RunCreateRequest, request: Request) -> RunResponse:
    """Create a background run (returns immediately)."""
    record = await start_run(body, thread_id, request)
@@ -99,6 +101,7 @@ async def create_run(thread_id: str, body: RunCreateRequest, request: Request) -
@router.post("/{thread_id}/runs/stream")
@require_permission("runs", "create", owner_check=True, require_existing=True)
 async def stream_run(thread_id: str, body: RunCreateRequest, request: Request) -> StreamingResponse:
    """Create a run and stream events via SSE.
@@ -126,6 +129,7 @@ async def stream_run(thread_id: str, body: RunCreateRequest, request: Request) -
@router.post("/{thread_id}/runs/wait", response_model=dict)
@require_permission("runs", "create", owner_check=True, require_existing=True)
 async def wait_run(thread_id: str, body: RunCreateRequest, request: Request) -> dict:
    """Create a run and block until it completes, returning the final state."""
    record = await start_run(body, thread_id, request)
@@ -151,6 +155,7 @@ async def wait_run(thread_id: str, body: RunCreateRequest, request: Request) ->
@router.get("/{thread_id}/runs", response_model=list[RunResponse])
@require_permission("runs", "read", owner_check=True)
 async def list_runs(thread_id: str, request: Request) -> list[RunResponse]:
    """List all runs for a thread."""
    run_mgr = get_run_manager(request)
@@ -159,6 +164,7 @@ async def list_runs(thread_id: str, request: Request) -> list[RunResponse]:
@router.get("/{thread_id}/runs/{run_id}", response_model=RunResponse)
@require_permission("runs", "read", owner_check=True)
 async def get_run(thread_id: str, run_id: str, request: Request) -> RunResponse:
    """Get details of a specific run."""
    run_mgr = get_run_manager(request)
@@ -169,6 +175,7 @@ async def get_run(thread_id: str, run_id: str, request: Request) -> RunResponse:
@router.post("/{thread_id}/runs/{run_id}/cancel")
@require_permission("runs", "cancel", owner_check=True, require_existing=True)
 async def cancel_run(
    thread_id: str,
    run_id: str,
@@ -206,6 +213,7 @@ async def cancel_run(
@router.get("/{thread_id}/runs/{run_id}/join")
@require_permission("runs", "read", owner_check=True)
 async def join_run(thread_id: str, run_id: str, request: Request) -> StreamingResponse:
    """Join an existing run's SSE stream."""
    bridge = get_stream_bridge(request)
@@ -226,6 +234,7 @@ async def join_run(thread_id: str, run_id: str, request: Request) -> StreamingRe
@router.api_route("/{thread_id}/runs/{run_id}/stream", methods=["GET", "POST"], response_model=None)
@require_permission("runs", "read", owner_check=True)
 async def stream_existing_run(
    thread_id: str,
    run_id: str,
@@ -265,3 +274,104 @@ async def stream_existing_run(
            "X-Accel-Buffering": "no",
        },
    )
 # ---------------------------------------------------------------------------
 # Messages / Events / Token usage endpoints
 # ---------------------------------------------------------------------------
@router.get("/{thread_id}/messages")
@require_permission("runs", "read", owner_check=True)
 async def list_thread_messages(
    thread_id: str,
    request: Request,
    limit: int = Query(default=50, le=200, ge=1),
    before_seq: int | None = Query(default=None),
    after_seq: int | None = Query(default=None),
 ) -> list[dict]:
    """Return displayable messages for a thread (across all runs), with feedback attached."""
    event_store = get_run_event_store(request)
    messages = await event_store.list_messages(thread_id, limit=limit, before_seq=before_seq, after_seq=after_seq)
    # Attach feedback to the last AI message of each run
    feedback_repo = get_feedback_repo(request)
    user_id = await get_current_user(request)
    feedback_map = await feedback_repo.list_by_thread_grouped(thread_id, user_id=user_id)
    # Find the last ai_message per run_id
    last_ai_per_run: dict[str, int] = {}  # run_id -> index in messages list
    for i, msg in enumerate(messages):
        if msg.get("event_type") == "ai_message":
            last_ai_per_run[msg["run_id"]] = i
    # Attach feedback field
    last_ai_indices = set(last_ai_per_run.values())
    for i, msg in enumerate(messages):
        if i in last_ai_indices:
            run_id = msg["run_id"]
            fb = feedback_map.get(run_id)
            msg["feedback"] = (
                {
                    "feedback_id": fb["feedback_id"],
                    "rating": fb["rating"],
                    "comment": fb.get("comment"),
                }
                if fb
                else None
            )
        else:
            msg["feedback"] = None
    return messages
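The feedback-attachment logic in `list_thread_messages` can be exercised without a request or a store. The sketch below restates it as a pure function, assuming messages are dicts with `event_type` and `run_id` keys, as in the endpoint. `attach_feedback` is a hypothetical name, and unlike the endpoint it returns copies rather than mutating the input:

```python
from __future__ import annotations

from typing import Any


def attach_feedback(messages: list[dict[str, Any]],
                    feedback_map: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Attach feedback to the last ai_message of each run; None everywhere else."""
    # Later ai_messages overwrite earlier ones, leaving the last index per run.
    last_ai_per_run: dict[str, int] = {}
    for i, msg in enumerate(messages):
        if msg.get("event_type") == "ai_message":
            last_ai_per_run[msg["run_id"]] = i
    last_ai_indices = set(last_ai_per_run.values())

    result = []
    for i, msg in enumerate(messages):
        fb = feedback_map.get(msg["run_id"]) if i in last_ai_indices else None
        summary = (
            {"feedback_id": fb["feedback_id"], "rating": fb["rating"], "comment": fb.get("comment")}
            if fb
            else None
        )
        result.append({**msg, "feedback": summary})
    return result
```

Every returned message carries a `feedback` key, so a client can read it without checking for its presence.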
@router.get("/{thread_id}/runs/{run_id}/messages")
@require_permission("runs", "read", owner_check=True)
 async def list_run_messages(
    thread_id: str,
    run_id: str,
    request: Request,
    limit: int = Query(default=50, le=200, ge=1),
    before_seq: int | None = Query(default=None),
    after_seq: int | None = Query(default=None),
 ) -> dict:
    """Return paginated messages for a specific run.
    Response: { data: [...], has_more: bool }
    """
    event_store = get_run_event_store(request)
    rows = await event_store.list_messages_by_run(
        thread_id,
        run_id,
        limit=limit + 1,
        before_seq=before_seq,
        after_seq=after_seq,
    )
    has_more = len(rows) > limit
    data = rows[:limit] if has_more else rows
    return {"data": data, "has_more": has_more}
@router.get("/{thread_id}/runs/{run_id}/events")
@require_permission("runs", "read", owner_check=True)
 async def list_run_events(
    thread_id: str,
    run_id: str,
    request: Request,
    event_types: str | None = Query(default=None),
    limit: int = Query(default=500, le=2000, ge=1),
 ) -> list[dict]:
    """Return the full event stream for a run (debug/audit)."""
    event_store = get_run_event_store(request)
    types = event_types.split(",") if event_types else None
    return await event_store.list_events(thread_id, run_id, event_types=types, limit=limit)
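`list_run_events` splits `event_types` on commas and passes the pieces through unchanged, so a value such as `"ai_message, tool_call,"` yields entries with stray whitespace and an empty string. A slightly stricter parser might look like the sketch below. `parse_event_types` is a hypothetical helper and is not part of this diff:

```python
from __future__ import annotations


def parse_event_types(raw: str | None) -> list[str] | None:
    """Split a comma-separated filter, dropping blanks and surrounding whitespace."""
    if not raw:
        return None
    types = [part.strip() for part in raw.split(",") if part.strip()]
    # An input made only of separators is treated as "no filter".
    return types or None
```

Whether empty segments should be dropped or rejected with a 400 is a design choice; this sketch drops them.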
@router.get("/{thread_id}/token-usage")
@require_permission("threads", "read", owner_check=True)
 async def thread_token_usage(thread_id: str, request: Request) -> dict:
    """Thread-level token usage aggregation."""
    run_store = get_run_store(request)
    agg = await run_store.aggregate_tokens_by_thread(thread_id)
    return {"thread_id": thread_id, **agg}
@@ -13,28 +13,41 @@ matching the LangGraph Platform wire format expected by the
 from __future__ import annotations
 import logging
 import re
 import time
 import uuid
 from typing import Any
 from fastapi import APIRouter, HTTPException, Request
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator
-from app.gateway.deps import get_checkpointer, get_store
+from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer
 from app.gateway.utils import sanitize_log_param
 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime import serialize_channel_values
-
+from deerflow.runtime.user_context import get_effective_user_id
 # ---------------------------------------------------------------------------
 # Store namespace
 # ---------------------------------------------------------------------------
 THREADS_NS: tuple[str, ...] = ("threads",)
 """Namespace used by the Store for thread metadata records."""
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/threads", tags=["threads"])
 # Metadata keys that the server controls; clients are not allowed to set
 # them. Pydantic ``@field_validator("metadata")`` strips them on every
 # inbound model below so a malicious client cannot reflect a forged
 # owner identity through the API surface. Defense-in-depth — the
 # row-level invariant is still ``threads_meta.user_id`` populated from
 # the auth contextvar; this list closes the metadata-blob echo gap.
 _SERVER_RESERVED_METADATA_KEYS: frozenset[str] = frozenset({"owner_id", "user_id"})
 def _strip_reserved_metadata(metadata: dict[str, Any] | None) -> dict[str, Any]:
    """Return ``metadata`` with server-controlled keys removed."""
    if not metadata:
        return metadata or {}
    return {k: v for k, v in metadata.items() if k not in _SERVER_RESERVED_METADATA_KEYS}
 # ---------------------------------------------------------------------------
 # Response / request models
 # ---------------------------------------------------------------------------
@@ -63,8 +76,11 @@ class ThreadCreateRequest(BaseModel):
    """Request body for creating a thread."""
    thread_id: str | None = Field(default=None, description="Optional thread ID (auto-generated if omitted)")
    assistant_id: str | None = Field(default=None, description="Associate thread with an assistant")
    metadata: dict[str, Any] = Field(default_factory=dict, description="Initial metadata")
    _strip_reserved = field_validator("metadata")(classmethod(lambda cls, v: _strip_reserved_metadata(v)))
 class ThreadSearchRequest(BaseModel):
    """Request body for searching threads."""
@@ -93,6 +109,8 @@ class ThreadPatchRequest(BaseModel):
    metadata: dict[str, Any] = Field(default_factory=dict, description="Metadata to merge")
    _strip_reserved = field_validator("metadata")(classmethod(lambda cls, v: _strip_reserved_metadata(v)))
 class ThreadStateUpdateRequest(BaseModel):
    """Request body for updating thread state (human-in-the-loop resume)."""
@@ -126,70 +144,25 @@ class ThreadHistoryRequest(BaseModel):
 # ---------------------------------------------------------------------------
-def _delete_thread_data(thread_id: str, paths: Paths | None = None) -> ThreadDeleteResponse:
+def _delete_thread_data(thread_id: str, paths: Paths | None = None, *, user_id: str | None = None) -> ThreadDeleteResponse:
    """Delete local persisted filesystem data for a thread."""
    path_manager = paths or get_paths()
    try:
-        path_manager.delete_thread_dir(thread_id)
+        path_manager.delete_thread_dir(thread_id, user_id=user_id)
    except ValueError as exc:
        raise HTTPException(status_code=422, detail=str(exc)) from exc
    except FileNotFoundError:
        # Not critical — thread data may not exist on disk
-        logger.debug("No local thread data to delete for %s", thread_id)
+        logger.debug("No local thread data to delete for %s", sanitize_log_param(thread_id))
        return ThreadDeleteResponse(success=True, message=f"No local data for {thread_id}")
    except Exception as exc:
-        logger.exception("Failed to delete thread data for %s", thread_id)
+        logger.exception("Failed to delete thread data for %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to delete local thread data.") from exc
-    logger.info("Deleted local thread data for %s", thread_id)
+    logger.info("Deleted local thread data for %s", sanitize_log_param(thread_id))
    return ThreadDeleteResponse(success=True, message=f"Deleted local thread data for {thread_id}")
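The log calls above now wrap `thread_id` in `sanitize_log_param`, whose implementation is not shown in this diff. As an illustration of why path parameters are usually sanitised before logging, here is a hypothetical helper that neutralises control characters (so a crafted id cannot inject a fake log line) and caps the length. `sanitize_for_log` and its defaults are assumptions, not the real `app.gateway.utils` function:

```python
import re

# Matches ASCII control characters, including CR and LF.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_for_log(value: object, max_len: int = 128) -> str:
    """Hypothetical sanitiser: replace control chars and truncate long values."""
    text = _CONTROL_CHARS.sub("_", str(value))
    return text if len(text) <= max_len else text[:max_len] + "..."
```

Passing the sanitised value as a `%s` argument, as the diff does, also keeps log formatting lazy.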
 async def _store_get(store, thread_id: str) -> dict | None:
    """Fetch a thread record from the Store; returns ``None`` if absent."""
    item = await store.aget(THREADS_NS, thread_id)
    return item.value if item is not None else None
 async def _store_put(store, record: dict) -> None:
    """Write a thread record to the Store."""
    await store.aput(THREADS_NS, record["thread_id"], record)
 async def _store_upsert(store, thread_id: str, *, metadata: dict | None = None, values: dict | None = None) -> None:
    """Create or refresh a thread record in the Store.
    On creation the record is written with ``status="idle"``.  On update only
    ``updated_at`` (and optionally ``metadata`` / ``values``) are changed so
    that existing fields are preserved.
    ``values`` carries the agent-state snapshot exposed to the frontend
    (currently just ``{"title": "..."}``).
    """
    now = time.time()
    existing = await _store_get(store, thread_id)
    if existing is None:
        await _store_put(
            store,
            {
                "thread_id": thread_id,
                "status": "idle",
                "created_at": now,
                "updated_at": now,
                "metadata": metadata or {},
                "values": values or {},
            },
        )
    else:
        val = dict(existing)
        val["updated_at"] = now
        if metadata:
            val.setdefault("metadata", {}).update(metadata)
        if values:
            val.setdefault("values", {}).update(values)
        await _store_put(store, val)
 def _derive_thread_status(checkpoint_tuple) -> str:
    """Derive thread status from checkpoint metadata."""
    if checkpoint_tuple is None:
@@ -215,22 +188,18 @@ def _derive_thread_status(checkpoint_tuple) -> str:
@router.delete("/{thread_id}", response_model=ThreadDeleteResponse)
@require_permission("threads", "delete", owner_check=True, require_existing=True)
 async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteResponse:
    """Delete local persisted filesystem data for a thread.
    Cleans DeerFlow-managed thread directories, removes checkpoint data,
-    and removes the thread record from the Store.
+    and removes the thread_meta row from the configured ThreadMetaStore
+    (sqlite or memory).
    """
-    # Clean local filesystem
-    response = _delete_thread_data(thread_id)
-    # Remove from Store (best-effort)
-    store = get_store(request)
-    if store is not None:
-        try:
-            await store.adelete(THREADS_NS, thread_id)
-        except Exception:
-            logger.debug("Could not delete store record for thread %s (not critical)", thread_id)
+    from app.gateway.deps import get_thread_store
+    # Clean local filesystem
+    response = _delete_thread_data(thread_id, user_id=get_effective_user_id())
    # Remove checkpoints (best-effort)
    checkpointer = getattr(request.app.state, "checkpointer", None)
@@ -239,7 +208,15 @@ async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteRe
            if hasattr(checkpointer, "adelete_thread"):
                await checkpointer.adelete_thread(thread_id)
        except Exception:
-            logger.debug("Could not delete checkpoints for thread %s (not critical)", thread_id)
+            logger.debug("Could not delete checkpoints for thread %s (not critical)", sanitize_log_param(thread_id))
    # Remove thread_meta row (best-effort) — required for sqlite backend
    # so the deleted thread no longer appears in /threads/search.
    try:
        thread_store = get_thread_store(request)
        await thread_store.delete(thread_id)
    except Exception:
        logger.debug("Could not delete thread_meta for %s (not critical)", sanitize_log_param(thread_id))
    return response
@@ -248,43 +225,40 @@ async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteRe
 async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadResponse:
    """Create a new thread.
-    The thread record is written to the Store (for fast listing) and an
-    empty checkpoint is written to the checkpointer (for state reads).
+    Writes a thread_meta record (so the thread appears in /threads/search)
+    and an empty checkpoint (so state endpoints work immediately).
    Idempotent: returns the existing record when ``thread_id`` already exists.
    """
-    store = get_store(request)
+    from app.gateway.deps import get_thread_store
    checkpointer = get_checkpointer(request)
+    thread_store = get_thread_store(request)
    thread_id = body.thread_id or str(uuid.uuid4())
    now = time.time()
+    # ``body.metadata`` is already stripped of server-reserved keys by
+    # ``ThreadCreateRequest._strip_reserved`` — see the model definition.
-    # Idempotency: return existing record from Store when already present
-    if store is not None:
-        existing_record = await _store_get(store, thread_id)
-        if existing_record is not None:
-            return ThreadResponse(
-                thread_id=thread_id,
-                status=existing_record.get("status", "idle"),
-                created_at=str(existing_record.get("created_at", "")),
-                updated_at=str(existing_record.get("updated_at", "")),
-                metadata=existing_record.get("metadata", {}),
-            )
+    # Idempotency: return existing record when already present
+    existing_record = await thread_store.get(thread_id)
+    if existing_record is not None:
+        return ThreadResponse(
+            thread_id=thread_id,
+            status=existing_record.get("status", "idle"),
+            created_at=str(existing_record.get("created_at", "")),
+            updated_at=str(existing_record.get("updated_at", "")),
+            metadata=existing_record.get("metadata", {}),
+        )
-    # Write thread record to Store
-    if store is not None:
-        try:
-            await _store_put(
-                store,
-                {
-                    "thread_id": thread_id,
-                    "status": "idle",
-                    "created_at": now,
-                    "updated_at": now,
-                    "metadata": body.metadata,
-                },
-            )
-        except Exception:
-            logger.exception("Failed to write thread %s to store", thread_id)
-            raise HTTPException(status_code=500, detail="Failed to create thread")
+    # Write thread_meta so the thread appears in /threads/search immediately
+    try:
+        await thread_store.create(
+            thread_id,
+            assistant_id=getattr(body, "assistant_id", None),
+            metadata=body.metadata,
+        )
+    except Exception:
+        logger.exception("Failed to write thread_meta for %s", sanitize_log_param(thread_id))
+        raise HTTPException(status_code=500, detail="Failed to create thread")
    # Write an empty checkpoint so state endpoints work immediately
    config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
@@ -301,10 +275,10 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
        }
        await checkpointer.aput(config, empty_checkpoint(), ckpt_metadata, {})
    except Exception:
-        logger.exception("Failed to create checkpoint for thread %s", thread_id)
+        logger.exception("Failed to create checkpoint for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to create thread")
-    logger.info("Thread created: %s", thread_id)
+    logger.info("Thread created: %s", sanitize_log_param(thread_id))
    return ThreadResponse(
        thread_id=thread_id,
        status="idle",
@@ -318,166 +292,91 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
 async def search_threads(body: ThreadSearchRequest, request: Request) -> list[ThreadResponse]:
    """Search and list threads.
-    Two-phase approach:
+    Delegates to the configured ThreadMetaStore implementation
-
+    (SQL-backed for sqlite/postgres, Store-backed for memory mode).
    **Phase 1 — Store (fast path, O(threads))**: returns threads that were
    created or run through this Gateway.  Store records are tiny metadata
    dicts so fetching all of them at once is cheap.
    **Phase 2 — Checkpointer supplement (lazy migration)**: threads that
    were created directly by LangGraph Server (and therefore absent from the
    Store) are discovered here by iterating the shared checkpointer.  Any
    newly found thread is immediately written to the Store so that the next
    search skips Phase 2 for that thread — the Store converges to a full
    index over time without a one-shot migration job.
    """
-    store = get_store(request)
+    from app.gateway.deps import get_thread_store
    checkpointer = get_checkpointer(request)
-    # -----------------------------------------------------------------------
+    repo = get_thread_store(request)
-    # Phase 1: Store
+    rows = await repo.search(
-    # -----------------------------------------------------------------------
+        metadata=body.metadata or None,
-    merged: dict[str, ThreadResponse] = {}
+        status=body.status,
-
+        limit=body.limit,
-    if store is not None:
+        offset=body.offset,
-        try:
+    )
-            items = await store.asearch(THREADS_NS, limit=10_000)
+    return [
-        except Exception:
+        ThreadResponse(
-            logger.warning("Store search failed — falling back to checkpointer only", exc_info=True)
+            thread_id=r["thread_id"],
-            items = []
+            status=r.get("status", "idle"),
-
+            created_at=r.get("created_at", ""),
-        for item in items:
+            updated_at=r.get("updated_at", ""),
-            val = item.value
+            metadata=r.get("metadata", {}),
-            merged[val["thread_id"]] = ThreadResponse(
+            values={"title": r["display_name"]} if r.get("display_name") else {},
-                thread_id=val["thread_id"],
+            interrupts={},
-                status=val.get("status", "idle"),
+        )
-                created_at=str(val.get("created_at", "")),
+        for r in rows
-                updated_at=str(val.get("updated_at", "")),
+    ]
                metadata=val.get("metadata", {}),
                values=val.get("values", {}),
            )
    # -----------------------------------------------------------------------
    # Phase 2: Checkpointer supplement
    # Discovers threads not yet in the Store (e.g. created by LangGraph
    # Server) and lazily migrates them so future searches skip this phase.
    # -----------------------------------------------------------------------
    try:
        async for checkpoint_tuple in checkpointer.alist(None):
            cfg = getattr(checkpoint_tuple, "config", {})
            thread_id = cfg.get("configurable", {}).get("thread_id")
            if not thread_id or thread_id in merged:
                continue
            # Skip sub-graph checkpoints (checkpoint_ns is non-empty for those)
            if cfg.get("configurable", {}).get("checkpoint_ns", ""):
                continue
            ckpt_meta = getattr(checkpoint_tuple, "metadata", {}) or {}
            # Strip LangGraph internal keys from the user-visible metadata dict
            user_meta = {k: v for k, v in ckpt_meta.items() if k not in ("created_at", "updated_at", "step", "source", "writes", "parents")}
            # Extract state values (title) from the checkpoint's channel_values
            checkpoint_data = getattr(checkpoint_tuple, "checkpoint", {}) or {}
            channel_values = checkpoint_data.get("channel_values", {})
            ckpt_values = {}
            if title := channel_values.get("title"):
                ckpt_values["title"] = title
            thread_resp = ThreadResponse(
                thread_id=thread_id,
                status=_derive_thread_status(checkpoint_tuple),
                created_at=str(ckpt_meta.get("created_at", "")),
                updated_at=str(ckpt_meta.get("updated_at", ckpt_meta.get("created_at", ""))),
                metadata=user_meta,
                values=ckpt_values,
            )
            merged[thread_id] = thread_resp
            # Lazy migration — write to Store so the next search finds it there
            if store is not None:
                try:
                    await _store_upsert(store, thread_id, metadata=user_meta, values=ckpt_values or None)
                except Exception:
                    logger.debug("Failed to migrate thread %s to store (non-fatal)", thread_id)
    except Exception:
        logger.exception("Checkpointer scan failed during thread search")
        # Don't raise — return whatever was collected from Store + partial scan
    # -----------------------------------------------------------------------
    # Phase 3: Filter → sort → paginate
    # -----------------------------------------------------------------------
    results = list(merged.values())
    if body.metadata:
        results = [r for r in results if all(r.metadata.get(k) == v for k, v in body.metadata.items())]
    if body.status:
        results = [r for r in results if r.status == body.status]
    results.sort(key=lambda r: r.updated_at, reverse=True)
    return results[body.offset : body.offset + body.limit]
@router.patch("/{thread_id}", response_model=ThreadResponse)
@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def patch_thread(thread_id: str, body: ThreadPatchRequest, request: Request) -> ThreadResponse:
    """Merge metadata into a thread record."""
-    store = get_store(request)
+    from app.gateway.deps import get_thread_store
    if store is None:
        raise HTTPException(status_code=503, detail="Store not available")
-    record = await _store_get(store, thread_id)
+    thread_store = get_thread_store(request)
    record = await thread_store.get(thread_id)
    if record is None:
        raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")
-    now = time.time()
+    # ``body.metadata`` already stripped by ``ThreadPatchRequest._strip_reserved``.
    updated = dict(record)
    updated.setdefault("metadata", {}).update(body.metadata)
    updated["updated_at"] = now
    try:
-        await _store_put(store, updated)
+        await thread_store.update_metadata(thread_id, body.metadata)
    except Exception:
-        logger.exception("Failed to patch thread %s", thread_id)
+        logger.exception("Failed to patch thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to update thread")
    # Re-read to get the merged metadata + refreshed updated_at
    record = await thread_store.get(thread_id) or record
    return ThreadResponse(
        thread_id=thread_id,
-        status=updated.get("status", "idle"),
+        status=record.get("status", "idle"),
-        created_at=str(updated.get("created_at", "")),
+        created_at=str(record.get("created_at", "")),
-        updated_at=str(now),
+        updated_at=str(record.get("updated_at", "")),
-        metadata=updated.get("metadata", {}),
+        metadata=record.get("metadata", {}),
    )
@router.get("/{thread_id}", response_model=ThreadResponse)
@require_permission("threads", "read", owner_check=True)
 async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
    """Get thread info.
-    Reads metadata from the Store and derives the accurate execution
+    Reads metadata from the ThreadMetaStore and derives the accurate
-    status from the checkpointer.  Falls back to the checkpointer alone
+    execution status from the checkpointer.  Falls back to the checkpointer
-    for threads that pre-date Store adoption (backward compat).
+    alone for threads that pre-date ThreadMetaStore adoption (backward compat).
    """
-    store = get_store(request)
+    from app.gateway.deps import get_thread_store
    thread_store = get_thread_store(request)
    checkpointer = get_checkpointer(request)
-    record: dict | None = None
+    record: dict | None = await thread_store.get(thread_id)
    if store is not None:
        record = await _store_get(store, thread_id)
    # Derive accurate status from the checkpointer
    config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
    try:
        checkpoint_tuple = await checkpointer.aget_tuple(config)
    except Exception:
-        logger.exception("Failed to get checkpoint for thread %s", thread_id)
+        logger.exception("Failed to get checkpoint for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread")
    if record is None and checkpoint_tuple is None:
        raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")
-    # If the thread exists in the checkpointer but not the store (e.g. legacy
+    # If the thread exists in the checkpointer but not in thread_meta (e.g.
-    # data), synthesize a minimal store record from the checkpoint metadata.
+    # legacy data created before thread_meta adoption), synthesize a minimal
    # record from the checkpoint metadata.
    if record is None and checkpoint_tuple is not None:
        ckpt_meta = getattr(checkpoint_tuple, "metadata", {}) or {}
        record = {
@@ -505,7 +404,9 @@ async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
    )
 # ---------------------------------------------------------------------------
@router.get("/{thread_id}/state", response_model=ThreadStateResponse)
@require_permission("threads", "read", owner_check=True)
 async def get_thread_state(thread_id: str, request: Request) -> ThreadStateResponse:
    """Get the latest state snapshot for a thread.
@@ -518,7 +419,7 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo
    try:
        checkpoint_tuple = await checkpointer.aget_tuple(config)
    except Exception:
-        logger.exception("Failed to get state for thread %s", thread_id)
+        logger.exception("Failed to get state for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread state")
    if checkpoint_tuple is None:
@@ -542,8 +443,10 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo
    next_tasks = [t.name for t in tasks_raw if hasattr(t, "name")]
    tasks = [{"id": getattr(t, "id", ""), "name": getattr(t, "name", "")} for t in tasks_raw]
    values = serialize_channel_values(channel_values)
    return ThreadStateResponse(
-        values=serialize_channel_values(channel_values),
+        values=values,
        next=next_tasks,
        metadata=metadata,
        checkpoint={"id": checkpoint_id, "ts": str(metadata.get("created_at", ""))},
@@ -555,15 +458,19 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo
@router.post("/{thread_id}/state", response_model=ThreadStateResponse)
@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, request: Request) -> ThreadStateResponse:
    """Update thread state (e.g. for human-in-the-loop resume or title rename).
    Writes a new checkpoint that merges *body.values* into the latest
-    channel values, then syncs any updated ``title`` field back to the Store
+    channel values, then syncs any updated ``title`` field through the
-    so that ``/threads/search`` reflects the change immediately.
+    ThreadMetaStore abstraction so that ``/threads/search`` reflects the
    change immediately in both sqlite and memory backends.
    """
    from app.gateway.deps import get_thread_store
    checkpointer = get_checkpointer(request)
-    store = get_store(request)
+    thread_store = get_thread_store(request)
    # checkpoint_ns must be present in the config for aput — default to ""
    # (the root graph namespace).  checkpoint_id is optional; omitting it
@@ -580,7 +487,7 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
    try:
        checkpoint_tuple = await checkpointer.aget_tuple(read_config)
    except Exception:
-        logger.exception("Failed to get state for thread %s", thread_id)
+        logger.exception("Failed to get state for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread state")
    if checkpoint_tuple is None:
@@ -614,19 +521,22 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
    try:
        new_config = await checkpointer.aput(write_config, checkpoint, metadata, {})
    except Exception:
-        logger.exception("Failed to update state for thread %s", thread_id)
+        logger.exception("Failed to update state for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to update thread state")
    new_checkpoint_id: str | None = None
    if isinstance(new_config, dict):
        new_checkpoint_id = new_config.get("configurable", {}).get("checkpoint_id")
-    # Sync title changes to the Store so /threads/search reflects them immediately.
+    # Sync title changes through the ThreadMetaStore abstraction so /threads/search
-    if store is not None and body.values and "title" in body.values:
+    # reflects them immediately in both sqlite and memory backends.
-        try:
+    if body.values and "title" in body.values:
-            await _store_upsert(store, thread_id, values={"title": body.values["title"]})
+        new_title = body.values["title"]
-        except Exception:
+        if new_title:  # Skip empty strings and None
-            logger.debug("Failed to sync title to store for thread %s (non-fatal)", thread_id)
+            try:
                await thread_store.update_display_name(thread_id, new_title)
            except Exception:
                logger.debug("Failed to sync title to thread_meta for %s (non-fatal)", sanitize_log_param(thread_id))
    return ThreadStateResponse(
        values=serialize_channel_values(channel_values),
@@ -638,8 +548,16 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
@router.post("/{thread_id}/history", response_model=list[HistoryEntry])
@require_permission("threads", "read", owner_check=True)
 async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request: Request) -> list[HistoryEntry]:
-    """Get checkpoint history for a thread."""
+    """Get checkpoint history for a thread.
    Messages are read from the checkpointer's channel values (the
    authoritative source) and serialized via
    :func:`~deerflow.runtime.serialization.serialize_channel_values`.
    Only the latest (first) checkpoint carries the ``messages`` key to
    avoid duplicating them across every entry.
    """
    checkpointer = get_checkpointer(request)
    config: dict[str, Any] = {"configurable": {"thread_id": thread_id}}
@@ -647,6 +565,7 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
        config["configurable"]["checkpoint_id"] = body.before
    entries: list[HistoryEntry] = []
    is_latest_checkpoint = True
    try:
        async for checkpoint_tuple in checkpointer.alist(config, limit=body.limit):
            ckpt_config = getattr(checkpoint_tuple, "config", {})
@@ -661,22 +580,42 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
            channel_values = checkpoint.get("channel_values", {})
            # Build values from checkpoint channel_values
            values: dict[str, Any] = {}
            if title := channel_values.get("title"):
                values["title"] = title
            if thread_data := channel_values.get("thread_data"):
                values["thread_data"] = thread_data
            # Attach messages only to the latest checkpoint entry.
            if is_latest_checkpoint:
                messages = channel_values.get("messages")
                if messages:
                    values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
            is_latest_checkpoint = False
            # Derive next tasks
            tasks_raw = getattr(checkpoint_tuple, "tasks", []) or []
            next_tasks = [t.name for t in tasks_raw if hasattr(t, "name")]
            # Strip LangGraph internal keys from metadata
            user_meta = {k: v for k, v in metadata.items() if k not in ("created_at", "updated_at", "step", "source", "writes", "parents")}
            # Keep step for ordering context
            if "step" in metadata:
                user_meta["step"] = metadata["step"]
            entries.append(
                HistoryEntry(
                    checkpoint_id=checkpoint_id,
                    parent_checkpoint_id=parent_id,
-                    metadata=metadata,
+                    metadata=user_meta,
-                    values=serialize_channel_values(channel_values),
+                    values=values,
                    created_at=str(metadata.get("created_at", "")),
                    next=next_tasks,
                )
            )
    except Exception:
-        logger.exception("Failed to get history for thread %s", thread_id)
+        logger.exception("Failed to get history for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread history")
    return entries
@@ -4,10 +4,12 @@ import logging
 import os
 import stat
-from fastapi import APIRouter, File, HTTPException, UploadFile
+from fastapi import APIRouter, File, HTTPException, Request, UploadFile
 from pydantic import BaseModel
 from app.gateway.authz import require_permission
 from deerflow.config.paths import get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.sandbox.sandbox_provider import get_sandbox_provider
 from deerflow.uploads.manager import (
    PathTraversalError,
@@ -54,8 +56,10 @@ def _make_file_sandbox_writable(file_path: os.PathLike[str] | str) -> None:
@router.post("", response_model=UploadResponse)
@require_permission("threads", "write", owner_check=True, require_existing=False)
 async def upload_files(
    thread_id: str,
    request: Request,
    files: list[UploadFile] = File(...),
 ) -> UploadResponse:
    """Upload multiple files to a thread's uploads directory."""
@@ -66,7 +70,7 @@ async def upload_files(
        uploads_dir = ensure_uploads_dir(thread_id)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
-    sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id)
+    sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id, user_id=get_effective_user_id())
    uploaded_files = []
    sandbox_provider = get_sandbox_provider()
@@ -133,7 +137,8 @@ async def upload_files(
@router.get("/list", response_model=dict)
-async def list_uploaded_files(thread_id: str) -> dict:
+@require_permission("threads", "read", owner_check=True)
 async def list_uploaded_files(thread_id: str, request: Request) -> dict:
    """List all files in a thread's uploads directory."""
    try:
        uploads_dir = get_uploads_dir(thread_id)
@@ -143,7 +148,7 @@ async def list_uploaded_files(thread_id: str) -> dict:
    enrich_file_listing(result, thread_id)
    # Gateway additionally includes the sandbox-relative path.
-    sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id)
+    sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id, user_id=get_effective_user_id())
    for f in result["files"]:
        f["path"] = str(sandbox_uploads / f["filename"])
@@ -151,7 +156,8 @@ async def list_uploaded_files(thread_id: str) -> dict:
@router.delete("/{filename}")
-async def delete_uploaded_file(thread_id: str, filename: str) -> dict:
+@require_permission("threads", "delete", owner_check=True, require_existing=True)
 async def delete_uploaded_file(thread_id: str, filename: str, request: Request) -> dict:
    """Delete a file from a thread's uploads directory."""
    try:
        uploads_dir = get_uploads_dir(thread_id)
@@ -8,16 +8,17 @@ frames, and consuming stream bridge events.  Router modules
 from __future__ import annotations
 import asyncio
 import dataclasses
 import json
 import logging
 import re
 import time
 from typing import Any
 from fastapi import HTTPException, Request
 from langchain_core.messages import HumanMessage
-from app.gateway.deps import get_checkpointer, get_run_manager, get_store, get_stream_bridge
+from app.gateway.deps import get_run_context, get_run_manager, get_run_store, get_stream_bridge
 from app.gateway.utils import sanitize_log_param
 from deerflow.runtime import (
    END_SENTINEL,
    HEARTBEAT_SENTINEL,
@@ -171,71 +172,6 @@ def build_run_config(
 # ---------------------------------------------------------------------------
 async def _upsert_thread_in_store(store, thread_id: str, metadata: dict | None) -> None:
    """Create or refresh the thread record in the Store.
    Called from :func:`start_run` so that threads created via the stateless
    ``/runs/stream`` endpoint (which never calls ``POST /threads``) still
    appear in ``/threads/search`` results.
    """
    # Deferred import to avoid circular import with the threads router module.
    from app.gateway.routers.threads import _store_upsert
    try:
        await _store_upsert(store, thread_id, metadata=metadata)
    except Exception:
        logger.warning("Failed to upsert thread %s in store (non-fatal)", thread_id)
 async def _sync_thread_title_after_run(
    run_task: asyncio.Task,
    thread_id: str,
    checkpointer: Any,
    store: Any,
 ) -> None:
    """Wait for *run_task* to finish, then persist the generated title to the Store.
    TitleMiddleware writes the generated title to the LangGraph agent state
    (checkpointer) but the Gateway's Store record is not updated automatically.
    This coroutine closes that gap by reading the final checkpoint after the
    run completes and syncing ``values.title`` into the Store record so that
    subsequent ``/threads/search`` responses include the correct title.
    Runs as a fire-and-forget :func:`asyncio.create_task`; failures are
    logged at DEBUG level and never propagate.
    """
    # Wait for the background run task to complete (any outcome).
    # asyncio.wait does not propagate task exceptions — it just returns
    # when the task is done, cancelled, or failed.
    await asyncio.wait({run_task})
    # Deferred import to avoid circular import with the threads router module.
    from app.gateway.routers.threads import _store_get, _store_put
    try:
        ckpt_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
        ckpt_tuple = await checkpointer.aget_tuple(ckpt_config)
        if ckpt_tuple is None:
            return
        channel_values = ckpt_tuple.checkpoint.get("channel_values", {})
        title = channel_values.get("title")
        if not title:
            return
        existing = await _store_get(store, thread_id)
        if existing is None:
            return
        updated = dict(existing)
        updated.setdefault("values", {})["title"] = title
        updated["updated_at"] = time.time()
        await _store_put(store, updated)
        logger.debug("Synced title %r for thread %s", title, thread_id)
    except Exception:
        logger.debug("Failed to sync title for thread %s (non-fatal)", thread_id, exc_info=True)
 async def start_run(
    body: Any,
    thread_id: str,
@@ -255,8 +191,7 @@ async def start_run(
    """
    bridge = get_stream_bridge(request)
    run_mgr = get_run_manager(request)
-    checkpointer = get_checkpointer(request)
+    run_ctx = get_run_context(request)
    store = get_store(request)
    disconnect = DisconnectMode.cancel if body.on_disconnect == "cancel" else DisconnectMode.continue_
@@ -274,11 +209,21 @@ async def start_run(
    except UnsupportedStrategyError as exc:
        raise HTTPException(status_code=501, detail=str(exc)) from exc
-    # Ensure the thread is visible in /threads/search, even for threads that
+    # Upsert thread metadata so the thread appears in /threads/search,
-    # were never explicitly created via POST /threads (e.g. stateless runs).
+    # even for threads that were never explicitly created via POST /threads
-    store = get_store(request)
+    # (e.g. stateless runs).
-    if store is not None:
+    try:
-        await _upsert_thread_in_store(store, thread_id, body.metadata)
+        existing = await run_ctx.thread_store.get(thread_id)
        if existing is None:
            await run_ctx.thread_store.create(
                thread_id,
                assistant_id=body.assistant_id,
                metadata=body.metadata,
            )
        else:
            await run_ctx.thread_store.update_status(thread_id, "running")
    except Exception:
        logger.warning("Failed to upsert thread_meta for %s (non-fatal)", sanitize_log_param(thread_id))
    agent_factory = resolve_agent_factory(body.assistant_id)
    graph_input = normalize_input(body.input)
@@ -311,8 +256,7 @@ async def start_run(
            bridge,
            run_mgr,
            record,
-            checkpointer=checkpointer,
+            ctx=run_ctx,
            store=store,
            agent_factory=agent_factory,
            graph_input=graph_input,
            config=config,
@@ -324,11 +268,9 @@ async def start_run(
    )
    record.task = task
-    # After the run completes, sync the title generated by TitleMiddleware from
+    # Title sync is handled by worker.py's finally block which reads the
-    # the checkpointer into the Store record so that /threads/search returns the
+    # title from the checkpoint and calls thread_store.update_display_name
-    # correct title instead of an empty values dict.
+    # after the run completes.
    if store is not None:
        asyncio.create_task(_sync_thread_title_after_run(task, thread_id, checkpointer, store))
    return record
@@ -0,0 +1,6 @@
 """Shared utility helpers for the Gateway layer."""
 def sanitize_log_param(value: str) -> str:
    """Strip control characters to prevent log injection."""
    return value.replace("\n", "").replace("\r", "").replace("\x00", "")
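A short illustration of what this helper guards against, assuming a `thread_id` that arrives with an embedded newline. The function body is copied from the diff; the forged input is hypothetical. Note that only `\n`, `\r`, and `\x00` are removed, so other control characters (a tab, for example) pass through unchanged.

```python
def sanitize_log_param(value: str) -> str:
    """Remove newline, carriage return, and NUL characters (same body as the helper in this diff)."""
    return value.replace("\n", "").replace("\r", "").replace("\x00", "")


# Hypothetical attacker-controlled thread_id that tries to forge a second log record.
forged_thread_id = "abc123\nINFO Thread deleted: victim-thread"

cleaned = sanitize_log_param(forged_thread_id)

# The newline is gone, so the value can no longer start a new log line.
assert "\n" not in cleaned
print(cleaned)
```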
@@ -0,0 +1,77 @@
 # Docker Test Gap (Section 七 7.4)
 This file documents the only **un-executed** test cases from
 `backend/docs/AUTH_TEST_PLAN.md` after the full release validation pass.
 ## Why this gap exists
 The release validation environment (sg_dev: `10.251.229.92`) **does not have
 a Docker daemon installed**. The TC-DOCKER cases are container-runtime
 behavior tests that need an actual Docker engine to spin up
 `docker/docker-compose.yaml` services.
 ```bash
 $ ssh sg_dev "which docker; docker --version"
 # (empty)
 # bash: docker: command not found
 ```
 All other test plan sections were executed against either:
 - The local dev box (Mac, all services running locally), or
 - The deployed sg_dev instance (gateway + frontend + nginx via SSH tunnel)
 ## Cases not executed
 | Case | Title | What it covers | Why not run |
 |---|---|---|---|
 | TC-DOCKER-01 | `users.db` volume persistence | Verify the `DEER_FLOW_HOME` bind mount survives container restart | needs `docker compose up` |
 | TC-DOCKER-02 | Session persistence across container restart | `AUTH_JWT_SECRET` env var keeps cookies valid after `docker compose down && up` | needs `docker compose down/up` |
 | TC-DOCKER-03 | Per-worker rate limiter divergence | Confirms in-process `_login_attempts` dict doesn't share state across `gunicorn` workers (4 by default in the compose file); known limitation, documented | needs multi-worker container |
 | TC-DOCKER-04 | IM channels skip AuthMiddleware | Verify Feishu/Slack/Telegram dispatchers run in-container against `http://langgraph:2024` without going through nginx | needs `docker logs` |
 | TC-DOCKER-05 | Admin credentials surfacing | **Updated post-simplify** — was "log scrape", now "0600 credential file in `DEER_FLOW_HOME`". The file-based behavior is already validated by TC-1.1 + TC-UPG-13 on sg_dev (non-Docker), so the only Docker-specific gap is verifying the volume mount carries the file out to the host | needs container + host volume |
 | TC-DOCKER-06 | Gateway-mode Docker deploy | `./scripts/deploy.sh --gateway` produces a 3-container topology (no `langgraph` container); same auth flow as standard mode | needs `docker compose --profile gateway` |
 ## Coverage already provided by non-Docker tests
 The **auth-relevant** behavior in each Docker case is already exercised by
 the test cases that ran on sg_dev or local:
 | Docker case | Auth behavior covered by |
 |---|---|
 | TC-DOCKER-01 (volume persistence) | TC-REENT-01 on sg_dev (admin row survives gateway restart) — same SQLite file, just no container layer between |
 | TC-DOCKER-02 (session persistence) | TC-API-02/03/06 (cookie roundtrip), plus TC-REENT-04 (multi-cookie) — JWT verification is process-state-free, container restart is equivalent to `pkill uvicorn && uv run uvicorn` |
 | TC-DOCKER-03 (per-worker rate limit) | TC-GW-04 + TC-REENT-09 (single-worker rate limit + 5min expiry). The cross-worker divergence is an architectural property of the in-memory dict; no auth code path differs |
 | TC-DOCKER-04 (IM channels skip auth) | Code-level only: `app/channels/manager.py` uses `langgraph_sdk` directly with no cookie handling. The langgraph_auth handler is bypassed by going through SDK, not HTTP |
 | TC-DOCKER-05 (credential surfacing) | TC-1.1 on sg_dev (file at `~/deer-flow/backend/.deer-flow/admin_initial_credentials.txt`, mode 0600, password 22 chars) — the only Docker-unique step is whether the bind mount projects this path onto the host, which is a `docker compose` config check, not a runtime behavior change |
 | TC-DOCKER-06 (gateway-mode container) | Section 七 7.2 covered by TC-GW-01..05 + Section 二 (gateway-mode auth flow on sg_dev) — same Gateway code, container is just a packaging change |
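The cross-worker divergence described for TC-DOCKER-03 can be sketched as follows. This is a minimal model, not the Gateway's actual limiter: the `WorkerRateLimiter` class, the `MAX_ATTEMPTS` value, and the round-robin dispatch are all assumptions made for illustration, and the expiry window is omitted.

```python
from collections import defaultdict

MAX_ATTEMPTS = 5  # hypothetical limit, chosen only for illustration


class WorkerRateLimiter:
    """Stands in for one gunicorn worker holding its own in-process dict."""

    def __init__(self) -> None:
        # Each instance has private state, like a per-process `_login_attempts`.
        self._login_attempts = defaultdict(int)

    def record_failure(self, client_ip: str) -> bool:
        """Count a failed login; return True while the client is still allowed."""
        self._login_attempts[client_ip] += 1
        return self._login_attempts[client_ip] <= MAX_ATTEMPTS


# Four workers, matching the default mentioned for the compose file.
workers = [WorkerRateLimiter() for _ in range(4)]

# Twenty failed logins from one client, spread round-robin over the workers.
results = [workers[i % 4].record_failure("203.0.113.7") for i in range(20)]

# Each worker saw only 5 failures, so none of the 20 attempts was blocked.
print(all(results))
```

A shared store (for example a database row or an external cache) would be needed for the limit to hold across workers; the in-memory dict keeps the single-worker path simple at the cost of this divergence.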
 ## Reproduction steps when Docker becomes available
 Anyone with `docker` + `docker compose` installed can reproduce the gap by
 running the test plan section verbatim. Pre-flight:
 ```bash
 # Required on the host
 docker --version           # >=24.x
 docker compose version     # plugin >=2.x
 # Required env var (otherwise sessions reset on every container restart)
 echo "AUTH_JWT_SECRET=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')" \
  >> .env
 # Optional: pin DEER_FLOW_HOME to a stable host path
 echo "DEER_FLOW_HOME=$HOME/deer-flow-data" >> .env
 ```
 Then run TC-DOCKER-01..06 from the test plan as written.
 ## Decision log
 - **Not blocking the release.** The auth-relevant behavior in every Docker
  case has an already-validated equivalent on bare metal. The gap is purely
  about *container packaging* details (bind mounts, multi-worker, log
  collection), not about whether the auth code paths work.
 - **TC-DOCKER-05 was updated in place** in `AUTH_TEST_PLAN.md` to reflect
  the post-simplify reality (credentials file → 0600 file, no log leak).
  The old "grep 'Password:' in docker logs" expectation would have failed
  silently and given a false sense of coverage.
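For whoever eventually runs TC-DOCKER-05, a check of the file mode on the host side of the bind mount might look like the sketch below. It assumes a POSIX host and uses a temporary directory in place of `DEER_FLOW_HOME`; the helper name `is_owner_only` is hypothetical.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """Return True when the file's permission bits are exactly 0600."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


with tempfile.TemporaryDirectory() as home:  # stands in for DEER_FLOW_HOME
    cred_path = os.path.join(home, "admin_initial_credentials.txt")

    # Create the file with owner read/write only, as the credential file is expected to be.
    fd = os.open(cred_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write("placeholder\n")
    ok = is_owner_only(cred_path)

    # Loosening the mode makes the check fail, which is what the test should catch.
    os.chmod(cred_path, 0o644)
    loosened = is_owner_only(cred_path)

print(ok, loosened)
```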
@@ -0,0 +1,129 @@
 # Authentication Upgrade Guide
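 DeerFlow ships with a built-in authentication module. This guide is for users upgrading from a version without authentication.
 ## Core Concepts
 The authentication module follows an **always-enforced** policy:
 - On first boot, an admin account is created automatically and its random password is printed to the console log
 - Authentication is enforced from the very start, so there is no race window
 - Historical conversations (threads created before the upgrade) are migrated automatically to the admin account
 ## Upgrade Steps
 ### 1. Update the code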
 ```bash
 git pull origin main
 cd backend && make install
 ```
 ### 2. First boot
 ```bash
 make dev
 ```
 The console prints:
 ```
 ============================================================
  Admin account created on first boot
  Email:    admin@deerflow.dev
  Password: aB3xK9mN_pQ7rT2w
  Change it after login: Settings → Account
 ============================================================
 ```
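 If you restart the service before logging in, don't worry: as long as setup has not been completed, every startup resets the password and prints it to the console again.
 ### 3. Log in
 Open `http://localhost:2026/login` and sign in with the email and password printed to the console.
 ### 4. Change the password
 After logging in, go to Settings → Account → Change Password.
 ### 5. Add users (optional)
 Other users register through the `/login` page and automatically receive the **user** role. Each user can see only their own conversations.
 ## Security mechanisms
 | Mechanism | Description |
 |------|------|
 | JWT HttpOnly Cookie | The token is not exposed to JavaScript, which prevents theft via XSS |
 | CSRF Double Submit Cookie | All POST/PUT/DELETE requests must carry `X-CSRF-Token` |
 | bcrypt password hashing | Passwords are never stored in plaintext |
 | Multi-tenant isolation | Users can access only their own threads |
 | Adaptive HTTPS | Detects `x-forwarded-proto` and sets the `Secure` cookie flag automatically |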
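The double-submit check listed in the table can be sketched as follows. This is a minimal illustration, not DeerFlow's actual implementation: the cookie name `csrf_token` and the helper names `issue_csrf_token` and `csrf_ok` are hypothetical. The server hands the browser a random token in a cookie, and a POST/PUT/DELETE request is accepted only when the `X-CSRF-Token` header carries the same value.

```python
import hmac
import secrets

# Methods that must carry the token, per the table above.
PROTECTED_METHODS = {"POST", "PUT", "DELETE"}
CSRF_COOKIE = "csrf_token"    # hypothetical cookie name
CSRF_HEADER = "x-csrf-token"  # header names are compared case-insensitively


def issue_csrf_token() -> str:
    """Generate a random token to place in the CSRF cookie."""
    return secrets.token_urlsafe(32)


def csrf_ok(method: str, cookies: dict[str, str], headers: dict[str, str]) -> bool:
    """Return True when the request passes the double-submit check."""
    if method.upper() not in PROTECTED_METHODS:
        return True
    cookie_token = cookies.get(CSRF_COOKIE, "")
    header_token = {k.lower(): v for k, v in headers.items()}.get(CSRF_HEADER, "")
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(cookie_token.encode(), header_token.encode())
```

A page on another origin can make the browser send the cookie, but it cannot read the cookie's value to copy it into the header, which is why the comparison is sufficient.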
 ## Common operations
 ### Forgot password
 ```bash
 cd backend
 # Reset the admin password
 python -m app.gateway.auth.reset_admin
 # Reset a specific user's password
 python -m app.gateway.auth.reset_admin --email user@example.com
 ```
 A new random password is printed.
 ### Full reset
 Delete the user database; a new admin account is created automatically on the next restart:
 ```bash
 rm -f backend/.deer-flow/users.db
 # Restart the service; the new password is printed to the console
 ```
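 ## Data storage
 | File | Contents |
 |------|------|
 | `.deer-flow/users.db` | SQLite user database (password hashes, roles) |
 | `AUTH_JWT_SECRET` in `.env` | JWT signing key (if unset, a temporary key is generated automatically and sessions become invalid after a restart) |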
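The fallback behaviour of `AUTH_JWT_SECRET` can be sketched roughly as below. This is an assumption-laden illustration rather than the actual Gateway code: the function name `resolve_jwt_secret` and its return shape are hypothetical. It shows why sessions do not survive a restart when the variable is unset: every process start yields a different signing key, so previously issued tokens no longer verify.

```python
import os
import secrets
from collections.abc import Mapping


def resolve_jwt_secret(env: Mapping[str, str] = os.environ) -> tuple[str, bool]:
    """Return (secret, persistent).

    persistent is False when the secret was generated for this process only,
    meaning tokens signed with it stop validating after a restart.
    """
    configured = env.get("AUTH_JWT_SECRET", "").strip()
    if configured:
        return configured, True
    # Hypothetical fallback: an ephemeral key that is never written to disk.
    return secrets.token_urlsafe(32), False
```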
 ### Production recommendations
 ```bash
 # Generate a persistent JWT secret so users are not forced to log in again after a restart
 python -c "import secrets; print(secrets.token_urlsafe(32))"
 # Add the output to .env:
 # AUTH_JWT_SECRET=<generated secret>
 ```
 ## API endpoints
 | Endpoint | Method | Description |
 |------|------|------|
 | `/api/v1/auth/login/local` | POST | Log in with email and password (OAuth2 form) |
 | `/api/v1/auth/register` | POST | Register a new user (user role) |
 | `/api/v1/auth/logout` | POST | Log out (clears the cookie) |
 | `/api/v1/auth/me` | GET | Get the current user's information |
 | `/api/v1/auth/change-password` | POST | Change the password |
 | `/api/v1/auth/setup-status` | GET | Check whether an admin account exists |
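As a rough sketch of what "OAuth2 form" means for the login endpoint, the helper below builds (but does not send) a form-encoded POST request. The field names `username` and `password` are an assumption based on the usual OAuth2 password-form convention, and `build_login_request` is a hypothetical helper, not part of DeerFlow.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(base_url: str, email: str, password: str) -> Request:
    """Build a form-encoded login request for /api/v1/auth/login/local."""
    # Assumed field names: OAuth2 password forms conventionally use
    # "username" even when the identifier is an email address.
    body = urlencode({"username": email, "password": password}).encode("ascii")
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/auth/login/local",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen` would return the session cookie in the response headers; later POST/PUT/DELETE calls additionally need the `X-CSRF-Token` header described under the security mechanisms.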
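 ## Compatibility
 - **Standard mode** (`make dev`): fully compatible; the admin account is created automatically
 - **Gateway mode** (`make dev-pro`): fully compatible
 - **Docker deployment**: fully compatible; `.deer-flow/users.db` must live on a persistent volume mount
 - **IM channels** (Feishu/Slack/Telegram): communicate through the LangGraph SDK and do not pass through the authentication layer
 - **DeerFlowClient** (embedded): does not go through HTTP and is unaffected by authentication
 ## Troubleshooting
 | Symptom | Cause | Fix |
 |------|------|------|
 | No password shown after startup | The admin account already exists (not the first startup) | Reset it with `reset_admin`, or delete `users.db` |
 | POST returns 403 after login | The CSRF token is missing | Make sure the frontend has been updated |
 | Login required again after a restart | `AUTH_JWT_SECRET` is not persisted | Set a fixed secret in `.env` |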
@@ -124,7 +124,7 @@ title:
 # checkpointer.py
 from langgraph.checkpoint.sqlite import SqliteSaver
-checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
+checkpointer = SqliteSaver.from_conn_string("deerflow.db")
 ```
 ```json
@@ -8,6 +8,9 @@
  "graphs": {
    "lead_agent": "deerflow.agents:make_lead_agent"
  },
  "auth": {
    "path": "./app/gateway/langgraph_auth.py:auth"
  },
  "checkpointer": {
    "path": "./packages/harness/deerflow/agents/checkpointer/async_provider.py:make_checkpointer"
  }
@@ -1,4 +1,3 @@
 from .checkpointer import get_checkpointer, make_checkpointer, reset_checkpointer
 from .factory import create_deerflow_agent
 from .features import Next, Prev, RuntimeFeatures
 from .lead_agent import make_lead_agent
@@ -18,7 +17,4 @@ __all__ = [
    "make_lead_agent",
    "SandboxState",
    "ThreadState",
    "get_checkpointer",
    "reset_checkpointer",
    "make_checkpointer",
 ]
@@ -1,7 +1,7 @@
 import logging
 from langchain.agents import create_agent
-from langchain.agents.middleware import AgentMiddleware, SummarizationMiddleware
+from langchain.agents.middleware import AgentMiddleware
 from langchain_core.runnables import RunnableConfig
 from deerflow.agents.lead_agent.prompt import apply_prompt_template
@@ -9,6 +9,7 @@ from deerflow.agents.middlewares.clarification_middleware import ClarificationMi
 from deerflow.agents.middlewares.loop_detection_middleware import LoopDetectionMiddleware
 from deerflow.agents.middlewares.memory_middleware import MemoryMiddleware
 from deerflow.agents.middlewares.subagent_limit_middleware import SubagentLimitMiddleware
 from deerflow.agents.middlewares.summarization_middleware import SummarizationMiddleware
 from deerflow.agents.middlewares.title_middleware import TitleMiddleware
 from deerflow.agents.middlewares.todo_middleware import TodoMiddleware
 from deerflow.agents.middlewares.token_usage_middleware import TokenUsageMiddleware
@@ -56,13 +57,15 @@ def _create_summarization_middleware() -> SummarizationMiddleware | None:
    # Prepare keep parameter
    keep = config.keep.to_tuple()
-    # Prepare model parameter
+    # Prepare model parameter.
    # Bind "middleware:summarize" tag so RunJournal identifies these LLM calls
    # as middleware rather than lead_agent (SummarizationMiddleware is a
    # LangChain built-in, so we tag the model at creation time).
    if config.model_name:
        model = create_chat_model(name=config.model_name, thinking_enabled=False)
    else:
        # Use a lightweight model for summarization to save costs
        # Falls back to default model if not explicitly specified
        model = create_chat_model(thinking_enabled=False)
    model = model.with_config(tags=["middleware:summarize"])
    # Prepare kwargs
    kwargs = {
@@ -519,12 +519,13 @@ def _get_memory_context(agent_name: str | None = None) -> str:
    try:
        from deerflow.agents.memory import format_memory_for_injection, get_memory_data
        from deerflow.config.memory_config import get_memory_config
        from deerflow.runtime.user_context import get_effective_user_id
        config = get_memory_config()
        if not config.enabled or not config.injection_enabled:
            return ""
-        memory_data = get_memory_data(agent_name)
+        memory_data = get_memory_data(agent_name, user_id=get_effective_user_id())
        memory_content = format_memory_for_injection(memory_data, max_tokens=config.max_injection_tokens)
        if not memory_content.strip():
@@ -20,6 +20,7 @@ class ConversationContext:
    messages: list[Any]
    timestamp: datetime = field(default_factory=lambda: datetime.now(UTC))
    agent_name: str | None = None
    user_id: str | None = None
    correction_detected: bool = False
    reinforcement_detected: bool = False
@@ -44,6 +45,7 @@ class MemoryUpdateQueue:
        thread_id: str,
        messages: list[Any],
        agent_name: str | None = None,
        user_id: str | None = None,
        correction_detected: bool = False,
        reinforcement_detected: bool = False,
    ) -> None:
@@ -53,6 +55,9 @@ class MemoryUpdateQueue:
            thread_id: The thread ID.
            messages: The conversation messages.
            agent_name: If provided, memory is stored per-agent. If None, uses global memory.
            user_id: The user ID captured at enqueue time. Stored in ConversationContext so it
                survives the threading.Timer boundary (ContextVar does not propagate across
                raw threads).
            correction_detected: Whether recent turns include an explicit correction signal.
            reinforcement_detected: Whether recent turns include a positive reinforcement signal.
        """
@@ -71,6 +76,7 @@ class MemoryUpdateQueue:
                thread_id=thread_id,
                messages=messages,
                agent_name=agent_name,
                user_id=user_id,
                correction_detected=merged_correction_detected,
                reinforcement_detected=merged_reinforcement_detected,
            )
@@ -136,6 +142,7 @@ class MemoryUpdateQueue:
                        agent_name=context.agent_name,
                        correction_detected=context.correction_detected,
                        reinforcement_detected=context.reinforcement_detected,
                        user_id=context.user_id,
                    )
                    if success:
                        logger.info("Memory updated successfully for thread %s", context.thread_id)
@@ -43,17 +43,17 @@ class MemoryStorage(abc.ABC):
    """Abstract base class for memory storage providers."""
    @abc.abstractmethod
-    def load(self, agent_name: str | None = None) -> dict[str, Any]:
+    def load(self, agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
        """Load memory data for the given agent."""
        pass
    @abc.abstractmethod
-    def reload(self, agent_name: str | None = None) -> dict[str, Any]:
+    def reload(self, agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
        """Force reload memory data for the given agent."""
        pass
    @abc.abstractmethod
-    def save(self, memory_data: dict[str, Any], agent_name: str | None = None) -> bool:
+    def save(self, memory_data: dict[str, Any], agent_name: str | None = None, *, user_id: str | None = None) -> bool:
        """Save memory data for the given agent."""
        pass
@@ -63,9 +63,9 @@ class FileMemoryStorage(MemoryStorage):
    def __init__(self):
        """Initialize the file memory storage."""
-        # Per-agent memory cache: keyed by agent_name (None = global)
+        # Per-user/agent memory cache: keyed by (user_id, agent_name) tuple (None = global)
        # Value: (memory_data, file_mtime)
-        self._memory_cache: dict[str | None, tuple[dict[str, Any], float | None]] = {}
+        self._memory_cache: dict[tuple[str | None, str | None], tuple[dict[str, Any], float | None]] = {}
    def _validate_agent_name(self, agent_name: str) -> None:
        """Validate that the agent name is safe to use in filesystem paths.
@@ -78,21 +78,29 @@ class FileMemoryStorage(MemoryStorage):
        if not AGENT_NAME_PATTERN.match(agent_name):
            raise ValueError(f"Invalid agent name {agent_name!r}: names must match {AGENT_NAME_PATTERN.pattern}")
-    def _get_memory_file_path(self, agent_name: str | None = None) -> Path:
+    def _get_memory_file_path(self, agent_name: str | None = None, *, user_id: str | None = None) -> Path:
        """Get the path to the memory file."""
        if user_id is not None:
            if agent_name is not None:
                self._validate_agent_name(agent_name)
                return get_paths().user_agent_memory_file(user_id, agent_name)
            config = get_memory_config()
            if config.storage_path and Path(config.storage_path).is_absolute():
                return Path(config.storage_path)
            return get_paths().user_memory_file(user_id)
        # Legacy: no user_id
        if agent_name is not None:
            self._validate_agent_name(agent_name)
            return get_paths().agent_memory_file(agent_name)
        config = get_memory_config()
        if config.storage_path:
            p = Path(config.storage_path)
            return p if p.is_absolute() else get_paths().base_dir / p
        return get_paths().memory_file
-    def _load_memory_from_file(self, agent_name: str | None = None) -> dict[str, Any]:
+    def _load_memory_from_file(self, agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
        """Load memory data from file."""
-        file_path = self._get_memory_file_path(agent_name)
+        file_path = self._get_memory_file_path(agent_name, user_id=user_id)
        if not file_path.exists():
            return create_empty_memory()
@@ -105,40 +113,42 @@ class FileMemoryStorage(MemoryStorage):
            logger.warning("Failed to load memory file: %s", e)
            return create_empty_memory()
-    def load(self, agent_name: str | None = None) -> dict[str, Any]:
+    def load(self, agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
        """Load memory data (cached with file modification time check)."""
-        file_path = self._get_memory_file_path(agent_name)
+        file_path = self._get_memory_file_path(agent_name, user_id=user_id)
        try:
            current_mtime = file_path.stat().st_mtime if file_path.exists() else None
        except OSError:
            current_mtime = None
-        cached = self._memory_cache.get(agent_name)
+        cache_key = (user_id, agent_name)
        cached = self._memory_cache.get(cache_key)
        if cached is None or cached[1] != current_mtime:
-            memory_data = self._load_memory_from_file(agent_name)
+            memory_data = self._load_memory_from_file(agent_name, user_id=user_id)
-            self._memory_cache[agent_name] = (memory_data, current_mtime)
+            self._memory_cache[cache_key] = (memory_data, current_mtime)
            return memory_data
        return cached[0]
-    def reload(self, agent_name: str | None = None) -> dict[str, Any]:
+    def reload(self, agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
        """Reload memory data from file, forcing cache invalidation."""
-        file_path = self._get_memory_file_path(agent_name)
+        file_path = self._get_memory_file_path(agent_name, user_id=user_id)
-        memory_data = self._load_memory_from_file(agent_name)
+        memory_data = self._load_memory_from_file(agent_name, user_id=user_id)
        try:
            mtime = file_path.stat().st_mtime if file_path.exists() else None
        except OSError:
            mtime = None
-        self._memory_cache[agent_name] = (memory_data, mtime)
+        cache_key = (user_id, agent_name)
        self._memory_cache[cache_key] = (memory_data, mtime)
        return memory_data
-    def save(self, memory_data: dict[str, Any], agent_name: str | None = None) -> bool:
+    def save(self, memory_data: dict[str, Any], agent_name: str | None = None, *, user_id: str | None = None) -> bool:
        """Save memory data to file and update cache."""
-        file_path = self._get_memory_file_path(agent_name)
+        file_path = self._get_memory_file_path(agent_name, user_id=user_id)
        try:
            file_path.parent.mkdir(parents=True, exist_ok=True)
@@ -155,7 +165,8 @@ class FileMemoryStorage(MemoryStorage):
            except OSError:
                mtime = None
-            self._memory_cache[agent_name] = (memory_data, mtime)
+            cache_key = (user_id, agent_name)
            self._memory_cache[cache_key] = (memory_data, mtime)
            logger.info("Memory saved to %s", file_path)
            return True
        except OSError as e:
@@ -27,27 +27,28 @@ def _create_empty_memory() -> dict[str, Any]:
    return create_empty_memory()
-def _save_memory_to_file(memory_data: dict[str, Any], agent_name: str | None = None) -> bool:
+def _save_memory_to_file(memory_data: dict[str, Any], agent_name: str | None = None, *, user_id: str | None = None) -> bool:
    """Backward-compatible wrapper around the configured memory storage save path."""
-    return get_memory_storage().save(memory_data, agent_name)
+    return get_memory_storage().save(memory_data, agent_name, user_id=user_id)
-def get_memory_data(agent_name: str | None = None) -> dict[str, Any]:
+def get_memory_data(agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
    """Get the current memory data via storage provider."""
-    return get_memory_storage().load(agent_name)
+    return get_memory_storage().load(agent_name, user_id=user_id)
-def reload_memory_data(agent_name: str | None = None) -> dict[str, Any]:
+def reload_memory_data(agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
    """Reload memory data via storage provider."""
-    return get_memory_storage().reload(agent_name)
+    return get_memory_storage().reload(agent_name, user_id=user_id)
-def import_memory_data(memory_data: dict[str, Any], agent_name: str | None = None) -> dict[str, Any]:
+def import_memory_data(memory_data: dict[str, Any], agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
    """Persist imported memory data via storage provider.
    Args:
        memory_data: Full memory payload to persist.
        agent_name: If provided, imports into per-agent memory.
        user_id: If provided, scopes memory to a specific user.
    Returns:
        The saved memory data after storage normalization.
@@ -56,15 +57,15 @@ def import_memory_data(memory_data: dict[str, Any], agent_name: str | None = Non
        OSError: If persisting the imported memory fails.
    """
    storage = get_memory_storage()
-    if not storage.save(memory_data, agent_name):
+    if not storage.save(memory_data, agent_name, user_id=user_id):
        raise OSError("Failed to save imported memory data")
-    return storage.load(agent_name)
+    return storage.load(agent_name, user_id=user_id)
-def clear_memory_data(agent_name: str | None = None) -> dict[str, Any]:
+def clear_memory_data(agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
    """Clear all stored memory data and persist an empty structure."""
    cleared_memory = create_empty_memory()
-    if not _save_memory_to_file(cleared_memory, agent_name):
+    if not _save_memory_to_file(cleared_memory, agent_name, user_id=user_id):
        raise OSError("Failed to save cleared memory data")
    return cleared_memory
@@ -81,6 +82,8 @@ def create_memory_fact(
    category: str = "context",
    confidence: float = 0.5,
    agent_name: str | None = None,
    *,
    user_id: str | None = None,
 ) -> dict[str, Any]:
    """Create a new fact and persist the updated memory data."""
    normalized_content = content.strip()
@@ -90,7 +93,7 @@ def create_memory_fact(
    normalized_category = category.strip() or "context"
    validated_confidence = _validate_confidence(confidence)
    now = utc_now_iso_z()
-    memory_data = get_memory_data(agent_name)
+    memory_data = get_memory_data(agent_name, user_id=user_id)
    updated_memory = dict(memory_data)
    facts = list(memory_data.get("facts", []))
    facts.append(
@@ -105,15 +108,15 @@ def create_memory_fact(
    )
    updated_memory["facts"] = facts
-    if not _save_memory_to_file(updated_memory, agent_name):
+    if not _save_memory_to_file(updated_memory, agent_name, user_id=user_id):
        raise OSError("Failed to save memory data after creating fact")
    return updated_memory
-def delete_memory_fact(fact_id: str, agent_name: str | None = None) -> dict[str, Any]:
+def delete_memory_fact(fact_id: str, agent_name: str | None = None, *, user_id: str | None = None) -> dict[str, Any]:
    """Delete a fact by its id and persist the updated memory data."""
-    memory_data = get_memory_data(agent_name)
+    memory_data = get_memory_data(agent_name, user_id=user_id)
    facts = memory_data.get("facts", [])
    updated_facts = [fact for fact in facts if fact.get("id") != fact_id]
    if len(updated_facts) == len(facts):
@@ -122,7 +125,7 @@ def delete_memory_fact(fact_id: str, agent_name: str | None = None) -> dict[str,
    updated_memory = dict(memory_data)
    updated_memory["facts"] = updated_facts
-    if not _save_memory_to_file(updated_memory, agent_name):
+    if not _save_memory_to_file(updated_memory, agent_name, user_id=user_id):
        raise OSError(f"Failed to save memory data after deleting fact '{fact_id}'")
    return updated_memory
@@ -134,9 +137,11 @@ def update_memory_fact(
    category: str | None = None,
    confidence: float | None = None,
    agent_name: str | None = None,
    *,
    user_id: str | None = None,
 ) -> dict[str, Any]:
    """Update an existing fact and persist the updated memory data."""
-    memory_data = get_memory_data(agent_name)
+    memory_data = get_memory_data(agent_name, user_id=user_id)
    updated_memory = dict(memory_data)
    updated_facts: list[dict[str, Any]] = []
    found = False
@@ -163,7 +168,7 @@ def update_memory_fact(
    updated_memory["facts"] = updated_facts
-    if not _save_memory_to_file(updated_memory, agent_name):
+    if not _save_memory_to_file(updated_memory, agent_name, user_id=user_id):
        raise OSError(f"Failed to save memory data after updating fact '{fact_id}'")
    return updated_memory
@@ -276,6 +281,7 @@ class MemoryUpdater:
        agent_name: str | None = None,
        correction_detected: bool = False,
        reinforcement_detected: bool = False,
        user_id: str | None = None,
    ) -> bool:
        """Update memory based on conversation messages.
@@ -285,6 +291,7 @@ class MemoryUpdater:
            agent_name: If provided, updates per-agent memory. If None, updates global memory.
            correction_detected: Whether recent turns include an explicit correction signal.
            reinforcement_detected: Whether recent turns include a positive reinforcement signal.
            user_id: If provided, scopes memory to a specific user.
        Returns:
            True if update was successful, False otherwise.
@@ -298,7 +305,7 @@ class MemoryUpdater:
        try:
            # Get current memory
-            current_memory = get_memory_data(agent_name)
+            current_memory = get_memory_data(agent_name, user_id=user_id)
            # Format conversation for prompt
            conversation_text = format_conversation_for_update(messages)
@@ -353,7 +360,7 @@ class MemoryUpdater:
            updated_memory = _strip_upload_mentions_from_memory(updated_memory)
            # Save
-            return get_memory_storage().save(updated_memory, agent_name)
+            return get_memory_storage().save(updated_memory, agent_name, user_id=user_id)
        except json.JSONDecodeError as e:
            logger.warning("Failed to parse LLM response for memory update: %s", e)
@@ -455,6 +462,7 @@ def update_memory_from_conversation(
    agent_name: str | None = None,
    correction_detected: bool = False,
    reinforcement_detected: bool = False,
    user_id: str | None = None,
 ) -> bool:
    """Convenience function to update memory from a conversation.
@@ -464,9 +472,10 @@ def update_memory_from_conversation(
        agent_name: If provided, updates per-agent memory. If None, updates global memory.
        correction_detected: Whether recent turns include an explicit correction signal.
        reinforcement_detected: Whether recent turns include a positive reinforcement signal.
        user_id: If provided, scopes memory to a specific user.
    Returns:
        True if successful, False otherwise.
    """
    updater = MemoryUpdater()
-    return updater.update_memory(messages, thread_id, agent_name, correction_detected, reinforcement_detected)
+    return updater.update_memory(messages, thread_id, agent_name, correction_detected, reinforcement_detected, user_id=user_id)
@@ -283,7 +283,7 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
            # the conversation; injecting one mid-conversation crashes
            # langchain_anthropic's _format_messages(). HumanMessage works
            # with all providers. See #1299.
-            return {"messages": [HumanMessage(content=warning)]}
+            return {"messages": [HumanMessage(content=warning, name="loop_warning")]}
        return None
@@ -11,6 +11,7 @@ from langgraph.runtime import Runtime
 from deerflow.agents.memory.queue import get_memory_queue
 from deerflow.config.memory_config import get_memory_config
 from deerflow.runtime.user_context import get_effective_user_id
 logger = logging.getLogger(__name__)
@@ -236,11 +237,16 @@ class MemoryMiddleware(AgentMiddleware[MemoryMiddlewareState]):
        # Queue the filtered conversation for memory update
        correction_detected = detect_correction(filtered_messages)
        reinforcement_detected = not correction_detected and detect_reinforcement(filtered_messages)
        # Capture user_id at enqueue time while the request context is still alive.
        # threading.Timer fires on a different thread where ContextVar values are not
        # propagated, so we must store user_id explicitly in ConversationContext.
        user_id = get_effective_user_id()
        queue = get_memory_queue()
        queue.add(
            thread_id=thread_id,
            messages=filtered_messages,
            agent_name=self._agent_name,
            user_id=user_id,
            correction_detected=correction_detected,
            reinforcement_detected=reinforcement_detected,
        )
@@ -0,0 +1,13 @@
 from typing import override
 from langchain.agents.middleware import SummarizationMiddleware as BaseSummarizationMiddleware
 from langchain_core.messages.human import HumanMessage
 class SummarizationMiddleware(BaseSummarizationMiddleware):
    @override
    def _build_new_messages(self, summary: str) -> list[HumanMessage]:
        """Override the base implementation to return the summary as a HumanMessage named 'summary'.
        The frontend does not display messages with this name, but the model still receives them as context.
        """
        return [HumanMessage(content=f"Here is a summary of the conversation to date:\n\n{summary}", name="summary")]
@@ -1,13 +1,16 @@
 import logging
 from datetime import UTC, datetime
 from typing import NotRequired, override
 from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
 from langchain_core.messages import HumanMessage
 from langgraph.config import get_config
 from langgraph.runtime import Runtime
 from deerflow.agents.thread_state import ThreadDataState
 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 logger = logging.getLogger(__name__)
@@ -46,32 +49,34 @@ class ThreadDataMiddleware(AgentMiddleware[ThreadDataMiddlewareState]):
        self._paths = Paths(base_dir) if base_dir else get_paths()
        self._lazy_init = lazy_init
-    def _get_thread_paths(self, thread_id: str) -> dict[str, str]:
+    def _get_thread_paths(self, thread_id: str, user_id: str | None = None) -> dict[str, str]:
        """Get the paths for a thread's data directories.
        Args:
            thread_id: The thread ID.
            user_id: Optional user ID for per-user path isolation.
        Returns:
            Dictionary with workspace_path, uploads_path, and outputs_path.
        """
        return {
-            "workspace_path": str(self._paths.sandbox_work_dir(thread_id)),
+            "workspace_path": str(self._paths.sandbox_work_dir(thread_id, user_id=user_id)),
-            "uploads_path": str(self._paths.sandbox_uploads_dir(thread_id)),
+            "uploads_path": str(self._paths.sandbox_uploads_dir(thread_id, user_id=user_id)),
-            "outputs_path": str(self._paths.sandbox_outputs_dir(thread_id)),
+            "outputs_path": str(self._paths.sandbox_outputs_dir(thread_id, user_id=user_id)),
        }
-    def _create_thread_directories(self, thread_id: str) -> dict[str, str]:
+    def _create_thread_directories(self, thread_id: str, user_id: str | None = None) -> dict[str, str]:
        """Create the thread data directories.
        Args:
            thread_id: The thread ID.
            user_id: Optional user ID for per-user path isolation.
        Returns:
            Dictionary with the created directory paths.
        """
-        self._paths.ensure_thread_dirs(thread_id)
+        self._paths.ensure_thread_dirs(thread_id, user_id=user_id)
-        return self._get_thread_paths(thread_id)
+        return self._get_thread_paths(thread_id, user_id=user_id)
    @override
    def before_agent(self, state: ThreadDataMiddlewareState, runtime: Runtime) -> dict | None:
@@ -84,16 +89,30 @@ class ThreadDataMiddleware(AgentMiddleware[ThreadDataMiddlewareState]):
        if thread_id is None:
            raise ValueError("Thread ID is required in runtime context or config.configurable")
        user_id = get_effective_user_id()
        if self._lazy_init:
            # Lazy initialization: only compute paths, don't create directories
-            paths = self._get_thread_paths(thread_id)
+            paths = self._get_thread_paths(thread_id, user_id=user_id)
        else:
            # Eager initialization: create directories immediately
-            paths = self._create_thread_directories(thread_id)
+            paths = self._create_thread_directories(thread_id, user_id=user_id)
            logger.debug("Created thread data directories for thread %s", thread_id)
        messages = list(state.get("messages", []))
        last_message = messages[-1] if messages else None
        if last_message and isinstance(last_message, HumanMessage):
            messages[-1] = HumanMessage(
                content=last_message.content,
                id=last_message.id,
                name=last_message.name or "user-input",
                additional_kwargs={**last_message.additional_kwargs, "run_id": runtime.context.get("run_id"), "timestamp": datetime.now(UTC).isoformat()},
            )
        return {
            "thread_data": {
                **paths,
-            }
+            },
            "messages": messages,
        }
@@ -1,10 +1,11 @@
 """Middleware for automatic thread title generation."""
 import logging
-from typing import NotRequired, override
+from typing import Any, NotRequired, override
 from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
 from langgraph.config import get_config
 from langgraph.runtime import Runtime
 from deerflow.config.title_config import get_title_config
@@ -100,6 +101,20 @@ class TitleMiddleware(AgentMiddleware[TitleMiddlewareState]):
            return user_msg[:fallback_chars].rstrip() + "..."
        return user_msg if user_msg else "New Conversation"
    def _get_runnable_config(self) -> dict[str, Any]:
        """Inherit the parent RunnableConfig and add middleware tag.
        This ensures RunJournal identifies LLM calls from this middleware
        as ``middleware:title`` instead of ``lead_agent``.
        """
        try:
            parent = get_config()
        except Exception:
            parent = {}
        config = {**parent}
        config["tags"] = [*(config.get("tags") or []), "middleware:title"]
        return config
    def _generate_title_result(self, state: TitleMiddlewareState) -> dict | None:
        """Generate a local fallback title without blocking on an LLM call."""
        if not self._should_generate_title(state):
@@ -121,7 +136,7 @@ class TitleMiddleware(AgentMiddleware[TitleMiddlewareState]):
                model = create_chat_model(name=config.model_name, thinking_enabled=False)
            else:
                model = create_chat_model(thinking_enabled=False)
-            response = await model.ainvoke(prompt)
+            response = await model.ainvoke(prompt, config=self._get_runnable_config())
            title = self._parse_title(response.content)
            if title:
                return {"title": title}
@@ -10,6 +10,7 @@ from langchain_core.messages import HumanMessage
 from langgraph.runtime import Runtime
 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.utils.file_conversion import extract_outline
 logger = logging.getLogger(__name__)
@@ -221,7 +222,7 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
                thread_id = get_config().get("configurable", {}).get("thread_id")
            except RuntimeError:
                pass  # get_config() raises outside a runnable context (e.g. unit tests)
-        uploads_dir = self._paths.sandbox_uploads_dir(thread_id) if thread_id else None
+        uploads_dir = self._paths.sandbox_uploads_dir(thread_id, user_id=get_effective_user_id()) if thread_id else None
        # Get newly uploaded files from the current message's additional_kwargs.files
        new_files = self._files_from_kwargs(last_message, uploads_dir) or []
@@ -278,6 +279,7 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
        updated_message = HumanMessage(
            content=f"{files_message}\n\n{original_content}",
            id=last_message.id,
            name=last_message.name,
            additional_kwargs=last_message.additional_kwargs,
        )
@@ -40,6 +40,7 @@ from deerflow.config.app_config import get_app_config, reload_app_config
 from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
 from deerflow.config.paths import get_paths
 from deerflow.models import create_chat_model
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.skills.installer import install_skill_from_archive
 from deerflow.uploads.manager import (
    claim_unique_filename,
@@ -240,7 +241,7 @@ class DeerFlowClient:
        }
        checkpointer = self._checkpointer
        if checkpointer is None:
-            from deerflow.agents.checkpointer import get_checkpointer
+            from deerflow.runtime.checkpointer import get_checkpointer
            checkpointer = get_checkpointer()
        if checkpointer is not None:
@@ -374,7 +375,7 @@ class DeerFlowClient:
        """
        checkpointer = self._checkpointer
        if checkpointer is None:
-            from deerflow.agents.checkpointer.provider import get_checkpointer
+            from deerflow.runtime.checkpointer.provider import get_checkpointer
            checkpointer = get_checkpointer()
@@ -429,7 +430,7 @@ class DeerFlowClient:
        """
        checkpointer = self._checkpointer
        if checkpointer is None:
-            from deerflow.agents.checkpointer.provider import get_checkpointer
+            from deerflow.runtime.checkpointer.provider import get_checkpointer
            checkpointer = get_checkpointer()
@@ -769,19 +770,19 @@ class DeerFlowClient:
        """
        from deerflow.agents.memory.updater import get_memory_data
-        return get_memory_data()
+        return get_memory_data(user_id=get_effective_user_id())
    def export_memory(self) -> dict:
        """Export current memory data for backup or transfer."""
        from deerflow.agents.memory.updater import get_memory_data
-        return get_memory_data()
+        return get_memory_data(user_id=get_effective_user_id())
    def import_memory(self, memory_data: dict) -> dict:
        """Import and persist full memory data."""
        from deerflow.agents.memory.updater import import_memory_data
-        return import_memory_data(memory_data)
+        return import_memory_data(memory_data, user_id=get_effective_user_id())
    def get_model(self, name: str) -> dict | None:
        """Get a specific model's configuration by name.
@@ -956,13 +957,13 @@ class DeerFlowClient:
        """
        from deerflow.agents.memory.updater import reload_memory_data
-        return reload_memory_data()
+        return reload_memory_data(user_id=get_effective_user_id())
    def clear_memory(self) -> dict:
        """Clear all persisted memory data."""
        from deerflow.agents.memory.updater import clear_memory_data
-        return clear_memory_data()
+        return clear_memory_data(user_id=get_effective_user_id())
    def create_memory_fact(self, content: str, category: str = "context", confidence: float = 0.5) -> dict:
        """Create a single fact manually."""
@@ -1179,7 +1180,7 @@ class DeerFlowClient:
            ValueError: If the path is invalid.
        """
        try:
-            actual = get_paths().resolve_virtual_path(thread_id, path)
+            actual = get_paths().resolve_virtual_path(thread_id, path, user_id=get_effective_user_id())
        except ValueError as exc:
            if "traversal" in str(exc):
                from deerflow.uploads.manager import PathTraversalError
@@ -27,6 +27,7 @@ except ImportError:  # pragma: no cover - Windows fallback
 from deerflow.config import get_app_config
 from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.sandbox.sandbox import Sandbox
 from deerflow.sandbox.sandbox_provider import SandboxProvider
@@ -260,15 +261,16 @@ class AioSandboxProvider(SandboxProvider):
        mounted Docker socket (DooD), the host Docker daemon can resolve the paths.
        """
        paths = get_paths()
-        paths.ensure_thread_dirs(thread_id)
+        user_id = get_effective_user_id()
+        paths.ensure_thread_dirs(thread_id, user_id=user_id)
        return [
-            (paths.host_sandbox_work_dir(thread_id), f"{VIRTUAL_PATH_PREFIX}/workspace", False),
+            (paths.host_sandbox_work_dir(thread_id, user_id=user_id), f"{VIRTUAL_PATH_PREFIX}/workspace", False),
-            (paths.host_sandbox_uploads_dir(thread_id), f"{VIRTUAL_PATH_PREFIX}/uploads", False),
+            (paths.host_sandbox_uploads_dir(thread_id, user_id=user_id), f"{VIRTUAL_PATH_PREFIX}/uploads", False),
-            (paths.host_sandbox_outputs_dir(thread_id), f"{VIRTUAL_PATH_PREFIX}/outputs", False),
+            (paths.host_sandbox_outputs_dir(thread_id, user_id=user_id), f"{VIRTUAL_PATH_PREFIX}/outputs", False),
            # ACP workspace: read-only inside the sandbox (lead agent reads results;
            # the ACP subprocess writes from the host side, not from within the container).
-            (paths.host_acp_workspace_dir(thread_id), "/mnt/acp-workspace", True),
+            (paths.host_acp_workspace_dir(thread_id, user_id=user_id), "/mnt/acp-workspace", True),
        ]
    @staticmethod
@@ -480,8 +482,9 @@ class AioSandboxProvider(SandboxProvider):
        across multiple processes, preventing container-name conflicts.
        """
        paths = get_paths()
-        paths.ensure_thread_dirs(thread_id)
-        lock_path = paths.thread_dir(thread_id) / f"{sandbox_id}.lock"
+        user_id = get_effective_user_id()
+        paths.ensure_thread_dirs(thread_id, user_id=user_id)
+        lock_path = paths.thread_dir(thread_id, user_id=user_id) / f"{sandbox_id}.lock"
        with open(lock_path, "a", encoding="utf-8") as lock_file:
            locked = False
@@ -10,10 +10,12 @@ from pydantic import BaseModel, ConfigDict, Field
 from deerflow.config.acp_config import load_acp_config_from_dict
 from deerflow.config.checkpointer_config import CheckpointerConfig, load_checkpointer_config_from_dict
 from deerflow.config.database_config import DatabaseConfig
 from deerflow.config.extensions_config import ExtensionsConfig
 from deerflow.config.guardrails_config import GuardrailsConfig, load_guardrails_config_from_dict
 from deerflow.config.memory_config import MemoryConfig, load_memory_config_from_dict
 from deerflow.config.model_config import ModelConfig
 from deerflow.config.run_events_config import RunEventsConfig
 from deerflow.config.sandbox_config import SandboxConfig
 from deerflow.config.skill_evolution_config import SkillEvolutionConfig
 from deerflow.config.skills_config import SkillsConfig
@@ -56,6 +58,8 @@ class AppConfig(BaseModel):
    subagents: SubagentsAppConfig = Field(default_factory=SubagentsAppConfig, description="Subagent runtime configuration")
    guardrails: GuardrailsConfig = Field(default_factory=GuardrailsConfig, description="Guardrail middleware configuration")
    model_config = ConfigDict(extra="allow", frozen=False)
    database: DatabaseConfig = Field(default_factory=DatabaseConfig, description="Unified database backend configuration")
    run_events: RunEventsConfig = Field(default_factory=RunEventsConfig, description="Run event storage configuration")
    checkpointer: CheckpointerConfig | None = Field(default=None, description="Checkpointer configuration")
    stream_bridge: StreamBridgeConfig | None = Field(default=None, description="Stream bridge configuration")
@@ -0,0 +1,102 @@
 """Unified database backend configuration.
 Controls BOTH the LangGraph checkpointer and the DeerFlow application
 persistence layer (runs, threads metadata, users, etc.). The user
 configures one backend; the system handles physical separation details.
 SQLite mode: checkpointer and app share a single .db file
 ({sqlite_dir}/deerflow.db) with WAL journal mode enabled on every
 connection. WAL allows concurrent readers and a single writer without
 blocking, making a unified file safe for both workloads.  Writers
 that contend for the lock wait via the default 5-second sqlite3
 busy timeout rather than failing immediately.
 Postgres mode: both use the same database URL but maintain independent
 connection pools with different lifecycles.
 Memory mode: checkpointer uses MemorySaver, app uses in-memory stores.
 No database is initialized.
 Sensitive values (postgres_url) should use $VAR syntax in config.yaml
 to reference environment variables from .env:
    database:
      backend: postgres
      postgres_url: $DATABASE_URL
 The $VAR resolution is handled by AppConfig.resolve_env_variables()
 before this config is instantiated -- DatabaseConfig itself does not
 need to do any environment variable processing.
 """
 from __future__ import annotations
 import os
 from typing import Literal
 from pydantic import BaseModel, Field
 class DatabaseConfig(BaseModel):
    backend: Literal["memory", "sqlite", "postgres"] = Field(
        default="memory",
        description=("Storage backend for both checkpointer and application data. 'memory' for development (no persistence across restarts), 'sqlite' for single-node deployment, 'postgres' for production multi-node deployment."),
    )
    sqlite_dir: str = Field(
        default=".deer-flow/data",
        description=("Directory for the SQLite database file. Both checkpointer and application data share {sqlite_dir}/deerflow.db."),
    )
    postgres_url: str = Field(
        default="",
        description=(
            "PostgreSQL connection URL, shared by checkpointer and app. "
            "Use $DATABASE_URL in config.yaml to reference .env. "
            "Example: postgresql://user:pass@host:5432/deerflow "
            "(the +asyncpg driver suffix is added automatically where needed)."
        ),
    )
    echo_sql: bool = Field(
        default=False,
        description="Echo all SQL statements to log (debug only).",
    )
    pool_size: int = Field(
        default=5,
        description="Connection pool size for the app ORM engine (postgres only).",
    )
    # -- Derived helpers (not user-configured) --
    @property
    def _resolved_sqlite_dir(self) -> str:
        """Resolve sqlite_dir to an absolute path (relative to CWD)."""
        from pathlib import Path
        return str(Path(self.sqlite_dir).resolve())
    @property
    def sqlite_path(self) -> str:
        """Unified SQLite file path shared by checkpointer and app."""
        return os.path.join(self._resolved_sqlite_dir, "deerflow.db")
    # Backward-compatible aliases
    @property
    def checkpointer_sqlite_path(self) -> str:
        """SQLite file path for the LangGraph checkpointer (alias for sqlite_path)."""
        return self.sqlite_path
    @property
    def app_sqlite_path(self) -> str:
        """SQLite file path for application ORM data (alias for sqlite_path)."""
        return self.sqlite_path
    @property
    def app_sqlalchemy_url(self) -> str:
        """SQLAlchemy async URL for the application ORM engine."""
        if self.backend == "sqlite":
            return f"sqlite+aiosqlite:///{self.sqlite_path}"
        if self.backend == "postgres":
            url = self.postgres_url
            if url.startswith("postgresql://"):
                url = url.replace("postgresql://", "postgresql+asyncpg://", 1)
            return url
        raise ValueError(f"No SQLAlchemy URL for backend={self.backend!r}")
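The URL derivation in `app_sqlalchemy_url` can be tried in isolation. The following is a minimal sketch, not the project's class: `DatabaseConfigSketch` is a hypothetical stdlib-only stand-in (a dataclass instead of the pydantic model) that mirrors only the three properties involved, and the example connection string is made up.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatabaseConfigSketch:
    """Hypothetical stand-in for DatabaseConfig; only the URL logic is mirrored."""

    backend: str = "memory"
    sqlite_dir: str = ".deer-flow/data"
    postgres_url: str = ""

    @property
    def sqlite_path(self) -> str:
        # Resolve against the current working directory, then append the file name.
        return os.path.join(str(Path(self.sqlite_dir).resolve()), "deerflow.db")

    @property
    def app_sqlalchemy_url(self) -> str:
        if self.backend == "sqlite":
            return f"sqlite+aiosqlite:///{self.sqlite_path}"
        if self.backend == "postgres":
            url = self.postgres_url
            # Only the bare scheme is rewritten; a URL that already names a
            # driver (e.g. postgresql+asyncpg://) passes through unchanged.
            if url.startswith("postgresql://"):
                url = url.replace("postgresql://", "postgresql+asyncpg://", 1)
            return url
        raise ValueError(f"No SQLAlchemy URL for backend={self.backend!r}")


pg = DatabaseConfigSketch(backend="postgres", postgres_url="postgresql://user:pass@host:5432/deerflow")
print(pg.app_sqlalchemy_url)

lite = DatabaseConfigSketch(backend="sqlite")
print(lite.app_sqlalchemy_url)
```

In this sketch, as in the diff, the `memory` backend has no URL at all and raises `ValueError`, which is why `init_engine_from_config` returns early for that backend before reading `app_sqlalchemy_url`.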
@@ -14,8 +14,9 @@ class MemoryConfig(BaseModel):
        default="",
        description=(
            "Path to store memory data. "
-            "If empty, defaults to `{base_dir}/memory.json` (see Paths.memory_file). "
+            "If empty, defaults to per-user memory at `{base_dir}/users/{user_id}/memory.json`. "
-            "Absolute paths are used as-is. "
+            "Absolute paths are used as-is and opt out of per-user isolation "
+            "(all users share the same file). "
            "Relative paths are resolved against `Paths.base_dir` "
            "(not the backend working directory). "
            "Note: if you previously set this to `.deer-flow/memory.json`, "
@@ -7,6 +7,7 @@ from pathlib import Path, PureWindowsPath
 VIRTUAL_PATH_PREFIX = "/mnt/user-data"
 _SAFE_THREAD_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
 _SAFE_USER_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
 def _default_local_base_dir() -> Path:
@@ -22,6 +23,13 @@ def _validate_thread_id(thread_id: str) -> str:
    return thread_id
 def _validate_user_id(user_id: str) -> str:
    """Validate a user ID before using it in filesystem paths."""
    if not _SAFE_USER_ID_RE.match(user_id):
        raise ValueError(f"Invalid user_id {user_id!r}: only alphanumeric characters, hyphens, and underscores are allowed.")
    return user_id
 def _join_host_path(base: str, *parts: str) -> str:
    """Join host filesystem path segments while preserving native style.
@@ -134,44 +142,63 @@ class Paths:
        """Per-agent memory file: `{base_dir}/agents/{name}/memory.json`."""
        return self.agent_dir(name) / "memory.json"
-    def thread_dir(self, thread_id: str) -> Path:
+    def user_dir(self, user_id: str) -> Path:
+        """Directory for a specific user: `{base_dir}/users/{user_id}/`."""
+        return self.base_dir / "users" / _validate_user_id(user_id)
+    def user_memory_file(self, user_id: str) -> Path:
+        """Per-user memory file: `{base_dir}/users/{user_id}/memory.json`."""
+        return self.user_dir(user_id) / "memory.json"
+    def user_agent_memory_file(self, user_id: str, agent_name: str) -> Path:
+        """Per-user per-agent memory: `{base_dir}/users/{user_id}/agents/{name}/memory.json`."""
+        return self.user_dir(user_id) / "agents" / agent_name.lower() / "memory.json"
+    def thread_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
         """
-        Host path for a thread's data: `{base_dir}/threads/{thread_id}/`
+        Host path for a thread's data.
+        When *user_id* is provided:
+            `{base_dir}/users/{user_id}/threads/{thread_id}/`
+        Otherwise (legacy layout):
+            `{base_dir}/threads/{thread_id}/`
         This directory contains a `user-data/` subdirectory that is mounted
         as `/mnt/user-data/` inside the sandbox.
         Raises:
-            ValueError: If `thread_id` contains unsafe characters (path separators
-                        or `..`) that could cause directory traversal.
+            ValueError: If `thread_id` or `user_id` contains unsafe characters (path
+                        separators or `..`) that could cause directory traversal.
         """
+        if user_id is not None:
+            return self.user_dir(user_id) / "threads" / _validate_thread_id(thread_id)
         return self.base_dir / "threads" / _validate_thread_id(thread_id)
-    def sandbox_work_dir(self, thread_id: str) -> Path:
+    def sandbox_work_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
        """
        Host path for the agent's workspace directory.
        Host: `{base_dir}/threads/{thread_id}/user-data/workspace/`
        Sandbox: `/mnt/user-data/workspace/`
        """
-        return self.thread_dir(thread_id) / "user-data" / "workspace"
+        return self.thread_dir(thread_id, user_id=user_id) / "user-data" / "workspace"
-    def sandbox_uploads_dir(self, thread_id: str) -> Path:
+    def sandbox_uploads_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
        """
        Host path for user-uploaded files.
        Host: `{base_dir}/threads/{thread_id}/user-data/uploads/`
        Sandbox: `/mnt/user-data/uploads/`
        """
-        return self.thread_dir(thread_id) / "user-data" / "uploads"
+        return self.thread_dir(thread_id, user_id=user_id) / "user-data" / "uploads"
-    def sandbox_outputs_dir(self, thread_id: str) -> Path:
+    def sandbox_outputs_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
        """
        Host path for agent-generated artifacts.
        Host: `{base_dir}/threads/{thread_id}/user-data/outputs/`
        Sandbox: `/mnt/user-data/outputs/`
        """
-        return self.thread_dir(thread_id) / "user-data" / "outputs"
+        return self.thread_dir(thread_id, user_id=user_id) / "user-data" / "outputs"
-    def acp_workspace_dir(self, thread_id: str) -> Path:
+    def acp_workspace_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
        """
        Host path for the ACP workspace of a specific thread.
        Host: `{base_dir}/threads/{thread_id}/acp-workspace/`
@@ -180,41 +207,43 @@ class Paths:
        Each thread gets its own isolated ACP workspace so that concurrent
        sessions cannot read each other's ACP agent outputs.
        """
-        return self.thread_dir(thread_id) / "acp-workspace"
+        return self.thread_dir(thread_id, user_id=user_id) / "acp-workspace"
-    def sandbox_user_data_dir(self, thread_id: str) -> Path:
+    def sandbox_user_data_dir(self, thread_id: str, *, user_id: str | None = None) -> Path:
        """
        Host path for the user-data root.
        Host: `{base_dir}/threads/{thread_id}/user-data/`
        Sandbox: `/mnt/user-data/`
        """
-        return self.thread_dir(thread_id) / "user-data"
+        return self.thread_dir(thread_id, user_id=user_id) / "user-data"
-    def host_thread_dir(self, thread_id: str) -> str:
+    def host_thread_dir(self, thread_id: str, *, user_id: str | None = None) -> str:
        """Host path for a thread directory, preserving Windows path syntax."""
+        if user_id is not None:
+            return _join_host_path(self._host_base_dir_str(), "users", _validate_user_id(user_id), "threads", _validate_thread_id(thread_id))
        return _join_host_path(self._host_base_dir_str(), "threads", _validate_thread_id(thread_id))
-    def host_sandbox_user_data_dir(self, thread_id: str) -> str:
+    def host_sandbox_user_data_dir(self, thread_id: str, *, user_id: str | None = None) -> str:
        """Host path for a thread's user-data root."""
-        return _join_host_path(self.host_thread_dir(thread_id), "user-data")
+        return _join_host_path(self.host_thread_dir(thread_id, user_id=user_id), "user-data")
-    def host_sandbox_work_dir(self, thread_id: str) -> str:
+    def host_sandbox_work_dir(self, thread_id: str, *, user_id: str | None = None) -> str:
        """Host path for the workspace mount source."""
-        return _join_host_path(self.host_sandbox_user_data_dir(thread_id), "workspace")
+        return _join_host_path(self.host_sandbox_user_data_dir(thread_id, user_id=user_id), "workspace")
-    def host_sandbox_uploads_dir(self, thread_id: str) -> str:
+    def host_sandbox_uploads_dir(self, thread_id: str, *, user_id: str | None = None) -> str:
        """Host path for the uploads mount source."""
-        return _join_host_path(self.host_sandbox_user_data_dir(thread_id), "uploads")
+        return _join_host_path(self.host_sandbox_user_data_dir(thread_id, user_id=user_id), "uploads")
-    def host_sandbox_outputs_dir(self, thread_id: str) -> str:
+    def host_sandbox_outputs_dir(self, thread_id: str, *, user_id: str | None = None) -> str:
        """Host path for the outputs mount source."""
-        return _join_host_path(self.host_sandbox_user_data_dir(thread_id), "outputs")
+        return _join_host_path(self.host_sandbox_user_data_dir(thread_id, user_id=user_id), "outputs")
-    def host_acp_workspace_dir(self, thread_id: str) -> str:
+    def host_acp_workspace_dir(self, thread_id: str, *, user_id: str | None = None) -> str:
        """Host path for the ACP workspace mount source."""
-        return _join_host_path(self.host_thread_dir(thread_id), "acp-workspace")
+        return _join_host_path(self.host_thread_dir(thread_id, user_id=user_id), "acp-workspace")
-    def ensure_thread_dirs(self, thread_id: str) -> None:
+    def ensure_thread_dirs(self, thread_id: str, *, user_id: str | None = None) -> None:
        """Create all standard sandbox directories for a thread.
        Directories are created with mode 0o777 so that sandbox containers
@@ -228,24 +257,24 @@ class Paths:
        ACP agent invocation.
        """
        for d in [
-            self.sandbox_work_dir(thread_id),
+            self.sandbox_work_dir(thread_id, user_id=user_id),
-            self.sandbox_uploads_dir(thread_id),
+            self.sandbox_uploads_dir(thread_id, user_id=user_id),
-            self.sandbox_outputs_dir(thread_id),
+            self.sandbox_outputs_dir(thread_id, user_id=user_id),
-            self.acp_workspace_dir(thread_id),
+            self.acp_workspace_dir(thread_id, user_id=user_id),
        ]:
            d.mkdir(parents=True, exist_ok=True)
            d.chmod(0o777)
-    def delete_thread_dir(self, thread_id: str) -> None:
+    def delete_thread_dir(self, thread_id: str, *, user_id: str | None = None) -> None:
        """Delete all persisted data for a thread.
        The operation is idempotent: missing thread directories are ignored.
        """
-        thread_dir = self.thread_dir(thread_id)
+        thread_dir = self.thread_dir(thread_id, user_id=user_id)
        if thread_dir.exists():
            shutil.rmtree(thread_dir)
-    def resolve_virtual_path(self, thread_id: str, virtual_path: str) -> Path:
+    def resolve_virtual_path(self, thread_id: str, virtual_path: str, *, user_id: str | None = None) -> Path:
        """Resolve a sandbox virtual path to the actual host filesystem path.
        Args:
@@ -253,6 +282,7 @@ class Paths:
            virtual_path: Virtual path as seen inside the sandbox, e.g.
                          ``/mnt/user-data/outputs/report.pdf``.
                          Leading slashes are stripped before matching.
            user_id: Optional user ID for user-scoped path resolution.
        Returns:
            The resolved absolute host filesystem path.
@@ -270,7 +300,7 @@ class Paths:
            raise ValueError(f"Path must start with /{prefix}")
        relative = stripped[len(prefix) :].lstrip("/")
-        base = self.sandbox_user_data_dir(thread_id).resolve()
+        base = self.sandbox_user_data_dir(thread_id, user_id=user_id).resolve()
        actual = (base / relative).resolve()
        try:
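The user-scoped layout introduced in the `Paths` hunks above can be summarized in a few lines. This is a rough sketch under stated assumptions: `thread_dir` here is a free function taking an explicit `base_dir`, and `_validate` is a hypothetical helper that merges the two validators; only the regex and the two directory layouts are taken from the diff.

```python
from __future__ import annotations

import re
from pathlib import Path

# Same character class the diff uses for both thread IDs and user IDs.
_SAFE_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")


def _validate(kind: str, value: str) -> str:
    """Reject IDs containing path separators, dots, or other unsafe characters."""
    if not _SAFE_ID_RE.match(value):
        raise ValueError(f"Invalid {kind} {value!r}")
    return value


def thread_dir(base_dir: Path, thread_id: str, *, user_id: str | None = None) -> Path:
    """Return the user-scoped thread directory, or the legacy one when user_id is None."""
    if user_id is not None:
        return base_dir / "users" / _validate("user_id", user_id) / "threads" / _validate("thread_id", thread_id)
    return base_dir / "threads" / _validate("thread_id", thread_id)


base = Path("/srv/deer-flow")
print(thread_dir(base, "t-123", user_id="alice"))  # user-scoped layout
print(thread_dir(base, "t-123"))                   # legacy layout

try:
    thread_dir(base, "t-123", user_id="../etc")
except ValueError as exc:
    print("rejected:", exc)
```

Because `user_id` is keyword-only and defaults to `None`, call sites that have not been migrated keep resolving to the legacy `{base_dir}/threads/{thread_id}/` layout rather than failing.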
@@ -0,0 +1,33 @@
 """Run event storage configuration.
 Controls where run events (messages + execution traces) are persisted.
 Backends:
 - memory: In-memory storage, data lost on restart. Suitable for
  development and testing.
 - db: SQL database via SQLAlchemy ORM. Provides full query capability.
  Suitable for production deployments.
 - jsonl: Append-only JSONL files. Lightweight alternative for
  single-node deployments that need persistence without a database.
 """
 from __future__ import annotations
 from typing import Literal
 from pydantic import BaseModel, Field
 class RunEventsConfig(BaseModel):
    backend: Literal["memory", "db", "jsonl"] = Field(
        default="memory",
        description="Storage backend for run events. 'memory' for development (no persistence), 'db' for production (SQL queries), 'jsonl' for lightweight single-node persistence.",
    )
    max_trace_content: int = Field(
        default=10240,
        description="Maximum trace content size in bytes before truncation (db backend only).",
    )
    track_token_usage: bool = Field(
        default=True,
        description="Whether RunJournal should accumulate token counts to RunRow.",
    )
@@ -113,7 +113,16 @@ def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *
        elif "reasoning_effort" not in model_settings_from_config:
            model_settings_from_config["reasoning_effort"] = "medium"
-    model_instance = model_class(**{**model_settings_from_config, **kwargs})
+    # Ensure stream_usage is enabled so that token usage metadata is available
+    # in streaming responses.  LangChain's BaseChatOpenAI only defaults
+    # stream_usage=True when no custom base_url/api_base is set, so models
+    # hitting third-party endpoints (e.g. doubao, deepseek) silently lose
+    # usage data.  We default it to True unless explicitly configured.
+    if "stream_usage" not in model_settings_from_config and "stream_usage" not in kwargs:
+        if "stream_usage" in getattr(model_class, "model_fields", {}):
+            model_settings_from_config["stream_usage"] = True
+    model_instance = model_class(**kwargs, **model_settings_from_config)
    callbacks = build_tracing_callbacks()
    if callbacks:
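Two behaviours in this hunk are worth seeing side by side: the `stream_usage` default is only applied when the model class declares that field, and the two call styles for passing settings differ when a key appears in both mappings. The sketch below uses hypothetical stand-in classes (`FakeOpenAIStyleModel`, `FakeOtherModel`) and a hypothetical helper `apply_stream_usage_default`; it does not import LangChain or DeerFlow.

```python
class FakeOpenAIStyleModel:
    """Hypothetical model class that declares a stream_usage field."""

    model_fields = {"stream_usage": None, "temperature": None}

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class FakeOtherModel:
    """Hypothetical model class without model_fields at all."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def apply_stream_usage_default(model_class, settings: dict, kwargs: dict) -> dict:
    """Return a copy of settings with stream_usage=True when nothing set it explicitly."""
    settings = dict(settings)
    if "stream_usage" not in settings and "stream_usage" not in kwargs:
        if "stream_usage" in getattr(model_class, "model_fields", {}):
            settings["stream_usage"] = True
    return settings


# The default is added only for classes that know the field.
print(apply_stream_usage_default(FakeOpenAIStyleModel, {"temperature": 0.2}, {}))
print(apply_stream_usage_default(FakeOtherModel, {"temperature": 0.2}, {}))

# An explicit value in either mapping is left alone.
print(apply_stream_usage_default(FakeOpenAIStyleModel, {"stream_usage": False}, {}))

settings = {"temperature": 0.2}
overrides = {"temperature": 0.9}

# Merging into one dict first lets the later mapping win.
merged = FakeOpenAIStyleModel(**{**settings, **overrides})
print(merged.kwargs)

# Unpacking both mappings directly raises if any key is shared.
try:
    FakeOpenAIStyleModel(**overrides, **settings)
except TypeError as exc:
    print("TypeError:", exc)
```

The last case is plain Python call semantics: two `**` unpackings that share a key fail at call time, whereas the merged-dict form resolves the overlap in favour of the mapping listed last.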
@@ -0,0 +1,13 @@
 """DeerFlow application persistence layer (SQLAlchemy 2.0 async ORM).
 This module manages DeerFlow's own application data -- runs metadata,
 thread ownership, cron jobs, users. It is completely separate from
 LangGraph's checkpointer, which manages graph execution state.
 Usage:
    from deerflow.persistence import init_engine, close_engine, get_session_factory
 """
 from deerflow.persistence.engine import close_engine, get_engine, get_session_factory, init_engine
 __all__ = ["close_engine", "get_engine", "get_session_factory", "init_engine"]
@@ -0,0 +1,40 @@
 """SQLAlchemy declarative base with automatic to_dict support.
 All DeerFlow ORM models inherit from this Base. It provides a generic
 to_dict() method via SQLAlchemy's inspect() so individual models don't
 need to write their own serialization logic.
 LangGraph's checkpointer tables are NOT managed by this Base.
 """
 from __future__ import annotations
 from sqlalchemy import inspect as sa_inspect
 from sqlalchemy.orm import DeclarativeBase
 class Base(DeclarativeBase):
    """Base class for all DeerFlow ORM models.
    Provides:
    - Automatic to_dict() via SQLAlchemy column inspection.
    - Standard __repr__() showing all column values.
    """
    def to_dict(self, *, exclude: set[str] | None = None) -> dict:
        """Convert ORM instance to plain dict.
        Uses SQLAlchemy's inspect() to iterate mapped column attributes.
        Args:
            exclude: Optional set of column keys to omit.
        Returns:
            Dict of {column_key: value} for all mapped columns.
        """
        exclude = exclude or set()
        return {c.key: getattr(self, c.key) for c in sa_inspect(type(self)).mapper.column_attrs if c.key not in exclude}
    def __repr__(self) -> str:
        cols = ", ".join(f"{c.key}={getattr(self, c.key)!r}" for c in sa_inspect(type(self)).mapper.column_attrs)
        return f"{type(self).__name__}({cols})"
@@ -0,0 +1,190 @@
 """Async SQLAlchemy engine lifecycle management.
 Initializes at Gateway startup, provides session factory for
 repositories, disposes at shutdown.
 When database.backend="memory", init_engine is a no-op and
 get_session_factory() returns None. Repositories must check for
 None and fall back to in-memory implementations.
 """
 from __future__ import annotations
 import json
 import logging
 from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession, async_sessionmaker, create_async_engine
 def _json_serializer(obj: object) -> str:
    """JSON serializer with ensure_ascii=False for Chinese character support."""
    return json.dumps(obj, ensure_ascii=False)
 logger = logging.getLogger(__name__)
 _engine: AsyncEngine | None = None
 _session_factory: async_sessionmaker[AsyncSession] | None = None
 async def _auto_create_postgres_db(url: str) -> None:
    """Connect to the ``postgres`` maintenance DB and CREATE DATABASE.
    The target database name is extracted from *url*.  The connection is
    made to the default ``postgres`` database on the same server using
    ``AUTOCOMMIT`` isolation (CREATE DATABASE cannot run inside a
    transaction).
    """
    from sqlalchemy import text
    from sqlalchemy.engine.url import make_url
    parsed = make_url(url)
    db_name = parsed.database
    if not db_name:
        raise ValueError("Cannot auto-create database: no database name in URL")
    # Connect to the default 'postgres' database to issue CREATE DATABASE
    maint_url = parsed.set(database="postgres")
    maint_engine = create_async_engine(maint_url, isolation_level="AUTOCOMMIT")
    try:
        async with maint_engine.connect() as conn:
            await conn.execute(text(f'CREATE DATABASE "{db_name}"'))
        logger.info("Auto-created PostgreSQL database: %s", db_name)
    finally:
        await maint_engine.dispose()
 async def init_engine(
    backend: str,
    *,
    url: str = "",
    echo: bool = False,
    pool_size: int = 5,
    sqlite_dir: str = "",
 ) -> None:
    """Create the async engine and session factory, then auto-create tables.
    Args:
        backend: "memory", "sqlite", or "postgres".
        url: SQLAlchemy async URL (for sqlite/postgres).
        echo: Echo SQL to log.
        pool_size: Postgres connection pool size.
        sqlite_dir: Directory to create for SQLite (ensured to exist).
    """
    global _engine, _session_factory
    if backend == "memory":
        logger.info("Persistence backend=memory -- ORM engine not initialized")
        return
    if backend == "postgres":
        try:
            import asyncpg  # noqa: F401
        except ImportError:
            raise ImportError("database.backend is set to 'postgres' but asyncpg is not installed.\nInstall it with:\n    uv sync --extra postgres\nOr switch to backend: sqlite in config.yaml for single-node deployment.") from None
    if backend == "sqlite":
        import os
        from sqlalchemy import event
        os.makedirs(sqlite_dir or ".", exist_ok=True)
        _engine = create_async_engine(url, echo=echo, json_serializer=_json_serializer)
        # Enable WAL on every new connection. SQLite PRAGMA settings are
        # per-connection, so we wire the listener instead of running PRAGMA
         # once at startup. WAL lets readers run concurrently with a single
         # writer without blocking and is the standard recommendation for any production
        # SQLite deployment (TC-UPG-06 in AUTH_TEST_PLAN.md). The companion
        # ``synchronous=NORMAL`` is the safe-and-fast pairing — fsync only
        # at WAL checkpoint boundaries instead of every commit.
        # Note: we do not set PRAGMA busy_timeout here — Python's sqlite3
        # driver already defaults to a 5-second busy timeout (see the
        # ``timeout`` kwarg of ``sqlite3.connect``), and aiosqlite /
        # SQLAlchemy's aiosqlite dialect inherit that default.  Setting
        # it again would be a no-op.
        @event.listens_for(_engine.sync_engine, "connect")
        def _enable_sqlite_wal(dbapi_conn, _record):  # noqa: ARG001 — SQLAlchemy contract
            cursor = dbapi_conn.cursor()
            try:
                cursor.execute("PRAGMA journal_mode=WAL;")
                cursor.execute("PRAGMA synchronous=NORMAL;")
                cursor.execute("PRAGMA foreign_keys=ON;")
            finally:
                cursor.close()
    elif backend == "postgres":
        _engine = create_async_engine(
            url,
            echo=echo,
            pool_size=pool_size,
            pool_pre_ping=True,
            json_serializer=_json_serializer,
        )
    else:
        raise ValueError(f"Unknown persistence backend: {backend!r}")
    _session_factory = async_sessionmaker(_engine, expire_on_commit=False)
    # Auto-create tables (dev convenience). Production should use Alembic.
    from deerflow.persistence.base import Base
    # Import all models so Base.metadata discovers them.
    # When no models exist yet (scaffolding phase), this is a no-op.
    try:
        import deerflow.persistence.models  # noqa: F401
    except ImportError:
        # Models package not yet available — tables won't be auto-created.
        # This is expected during initial scaffolding or minimal installs.
        logger.debug("deerflow.persistence.models not found; skipping auto-create tables")
    try:
        async with _engine.begin() as conn:
            await conn.run_sync(Base.metadata.create_all)
    except Exception as exc:
        if backend == "postgres" and "does not exist" in str(exc):
            # Database not yet created — attempt to auto-create it, then retry.
            await _auto_create_postgres_db(url)
            # Rebuild engine against the now-existing database
            await _engine.dispose()
            _engine = create_async_engine(url, echo=echo, pool_size=pool_size, pool_pre_ping=True, json_serializer=_json_serializer)
            _session_factory = async_sessionmaker(_engine, expire_on_commit=False)
            async with _engine.begin() as conn:
                await conn.run_sync(Base.metadata.create_all)
        else:
            raise
    logger.info("Persistence engine initialized: backend=%s", backend)
 async def init_engine_from_config(config) -> None:
    """Convenience: init engine from a DatabaseConfig object."""
    if config.backend == "memory":
        await init_engine("memory")
        return
    await init_engine(
        backend=config.backend,
        url=config.app_sqlalchemy_url,
        echo=config.echo_sql,
        pool_size=config.pool_size,
        sqlite_dir=config.sqlite_dir if config.backend == "sqlite" else "",
    )
 def get_session_factory() -> async_sessionmaker[AsyncSession] | None:
    """Return the async session factory, or None if backend=memory or the engine is not initialized."""
    return _session_factory
 def get_engine() -> AsyncEngine | None:
    """Return the async engine, or None if not initialized."""
    return _engine
 async def close_engine() -> None:
    """Dispose the engine and release all pooled connections."""
    global _engine, _session_factory
    if _engine is not None:
        await _engine.dispose()
        logger.info("Persistence engine closed")
    _engine = None
    _session_factory = None
@@ -0,0 +1,6 @@
 """Feedback persistence — ORM and SQL repository."""
 from deerflow.persistence.feedback.model import FeedbackRow
 from deerflow.persistence.feedback.sql import FeedbackRepository
 __all__ = ["FeedbackRepository", "FeedbackRow"]
@@ -0,0 +1,34 @@
 """ORM model for user feedback on runs."""
 from __future__ import annotations
 from datetime import UTC, datetime
 from sqlalchemy import DateTime, String, Text, UniqueConstraint
 from sqlalchemy.orm import Mapped, mapped_column
 from deerflow.persistence.base import Base
 class FeedbackRow(Base):
    __tablename__ = "feedback"
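    # Note: SQLite and PostgreSQL (by default) treat NULLs as distinct in unique
    # constraints, so rows with user_id=NULL are not deduplicated by this constraint.
    __table_args__ = (
        UniqueConstraint("thread_id", "run_id", "user_id", name="uq_feedback_thread_run_user"),
    )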
    feedback_id: Mapped[str] = mapped_column(String(64), primary_key=True)
    run_id: Mapped[str] = mapped_column(String(64), nullable=False, index=True)
    thread_id: Mapped[str] = mapped_column(String(64), nullable=False, index=True)
    user_id: Mapped[str | None] = mapped_column(String(64), index=True)
    message_id: Mapped[str | None] = mapped_column(String(64))
    # message_id is an optional RunEventStore event identifier —
    # allows feedback to target a specific message or the entire run
    rating: Mapped[int] = mapped_column(nullable=False)
    # +1 (thumbs-up) or -1 (thumbs-down)
    comment: Mapped[str | None] = mapped_column(Text)
    # Optional text feedback from the user
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
@@ -0,0 +1,217 @@
 """SQLAlchemy-backed feedback storage.
 Each method acquires its own short-lived session.
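 Example (a hedged sketch, not taken from a call site: it assumes a SQL
 backend is configured so ``get_session_factory()`` returns a factory, and
 the ids ``"t1"``, ``"r1"``, ``"u1"`` are made up)::

     repo = FeedbackRepository(get_session_factory())
     await repo.upsert(thread_id="t1", run_id="r1", rating=1, user_id="u1")
     stats = await repo.aggregate_by_run("t1", "r1")  # keys: run_id, total, positive, negative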
 """
 from __future__ import annotations
 import uuid
 from datetime import UTC, datetime
 from sqlalchemy import case, func, select
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from deerflow.persistence.feedback.model import FeedbackRow
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
 class FeedbackRepository:
    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
        self._sf = session_factory
    @staticmethod
    def _row_to_dict(row: FeedbackRow) -> dict:
        d = row.to_dict()
        val = d.get("created_at")
        if isinstance(val, datetime):
            d["created_at"] = val.isoformat()
        return d
    async def create(
        self,
        *,
        run_id: str,
        thread_id: str,
        rating: int,
        user_id: str | None | _AutoSentinel = AUTO,
        message_id: str | None = None,
        comment: str | None = None,
    ) -> dict:
        """Create a feedback record. rating must be +1 or -1."""
        if rating not in (1, -1):
            raise ValueError(f"rating must be +1 or -1, got {rating}")
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.create")
        row = FeedbackRow(
            feedback_id=str(uuid.uuid4()),
            run_id=run_id,
            thread_id=thread_id,
            user_id=resolved_user_id,
            message_id=message_id,
            rating=rating,
            comment=comment,
            created_at=datetime.now(UTC),
        )
        async with self._sf() as session:
            session.add(row)
            await session.commit()
            await session.refresh(row)
            return self._row_to_dict(row)
    async def get(
        self,
        feedback_id: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> dict | None:
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.get")
        async with self._sf() as session:
            row = await session.get(FeedbackRow, feedback_id)
            if row is None:
                return None
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return None
            return self._row_to_dict(row)
    async def list_by_run(
        self,
        thread_id: str,
        run_id: str,
        *,
        limit: int = 100,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> list[dict]:
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.list_by_run")
        stmt = select(FeedbackRow).where(FeedbackRow.thread_id == thread_id, FeedbackRow.run_id == run_id)
        if resolved_user_id is not None:
            stmt = stmt.where(FeedbackRow.user_id == resolved_user_id)
        stmt = stmt.order_by(FeedbackRow.created_at.asc()).limit(limit)
        async with self._sf() as session:
            result = await session.execute(stmt)
            return [self._row_to_dict(r) for r in result.scalars()]
    async def list_by_thread(
        self,
        thread_id: str,
        *,
        limit: int = 100,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> list[dict]:
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.list_by_thread")
        stmt = select(FeedbackRow).where(FeedbackRow.thread_id == thread_id)
        if resolved_user_id is not None:
            stmt = stmt.where(FeedbackRow.user_id == resolved_user_id)
        stmt = stmt.order_by(FeedbackRow.created_at.asc()).limit(limit)
        async with self._sf() as session:
            result = await session.execute(stmt)
            return [self._row_to_dict(r) for r in result.scalars()]
    async def delete(
        self,
        feedback_id: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> bool:
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.delete")
        async with self._sf() as session:
            row = await session.get(FeedbackRow, feedback_id)
            if row is None:
                return False
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return False
            await session.delete(row)
            await session.commit()
            return True
    async def upsert(
        self,
        *,
        run_id: str,
        thread_id: str,
        rating: int,
        user_id: str | None | _AutoSentinel = AUTO,
        comment: str | None = None,
    ) -> dict:
        """Create or update feedback for (thread_id, run_id, user_id). rating must be +1 or -1."""
        if rating not in (1, -1):
            raise ValueError(f"rating must be +1 or -1, got {rating}")
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.upsert")
        async with self._sf() as session:
            stmt = select(FeedbackRow).where(
                FeedbackRow.thread_id == thread_id,
                FeedbackRow.run_id == run_id,
                FeedbackRow.user_id == resolved_user_id,
            )
            result = await session.execute(stmt)
            row = result.scalar_one_or_none()
            if row is not None:
                row.rating = rating
                row.comment = comment
                row.created_at = datetime.now(UTC)
            else:
                row = FeedbackRow(
                    feedback_id=str(uuid.uuid4()),
                    run_id=run_id,
                    thread_id=thread_id,
                    user_id=resolved_user_id,
                    rating=rating,
                    comment=comment,
                    created_at=datetime.now(UTC),
                )
                session.add(row)
            await session.commit()
            await session.refresh(row)
            return self._row_to_dict(row)
    async def delete_by_run(
        self,
        *,
        thread_id: str,
        run_id: str,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> bool:
        """Delete the current user's feedback for a run. Returns True if a record was deleted."""
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.delete_by_run")
        async with self._sf() as session:
            stmt = select(FeedbackRow).where(
                FeedbackRow.thread_id == thread_id,
                FeedbackRow.run_id == run_id,
                FeedbackRow.user_id == resolved_user_id,
            )
            result = await session.execute(stmt)
            row = result.scalar_one_or_none()
            if row is None:
                return False
            await session.delete(row)
            await session.commit()
            return True
    async def list_by_thread_grouped(
        self,
        thread_id: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
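    ) -> dict[str, dict]:
        """Return feedback grouped by run_id for a thread: {run_id: feedback_dict}.
        Each run maps to a single record. If several rows share a run_id (possible
        when ``user_id=None`` bypasses owner filtering), the most recent one wins.
        """
        resolved_user_id = resolve_user_id(user_id, method_name="FeedbackRepository.list_by_thread_grouped")
        stmt = select(FeedbackRow).where(FeedbackRow.thread_id == thread_id)
        if resolved_user_id is not None:
            stmt = stmt.where(FeedbackRow.user_id == resolved_user_id)
        # Order by creation time so the "last row wins" dict build is deterministic.
        stmt = stmt.order_by(FeedbackRow.created_at.asc())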
        async with self._sf() as session:
            result = await session.execute(stmt)
            return {row.run_id: self._row_to_dict(row) for row in result.scalars()}
    async def aggregate_by_run(self, thread_id: str, run_id: str) -> dict:
        """Aggregate feedback stats for a run using database-side counting."""
        stmt = select(
            func.count().label("total"),
            func.coalesce(func.sum(case((FeedbackRow.rating == 1, 1), else_=0)), 0).label("positive"),
            func.coalesce(func.sum(case((FeedbackRow.rating == -1, 1), else_=0)), 0).label("negative"),
        ).where(FeedbackRow.thread_id == thread_id, FeedbackRow.run_id == run_id)
        async with self._sf() as session:
            row = (await session.execute(stmt)).one()
            return {
                "run_id": run_id,
                "total": row.total,
                "positive": row.positive,
                "negative": row.negative,
            }
@@ -0,0 +1,38 @@
 [alembic]
 script_location = %(here)s
 # Default URL for offline mode / autogenerate.
 # Runtime uses engine from DeerFlow config.
 sqlalchemy.url = sqlite+aiosqlite:///./data/deerflow.db
 [loggers]
 keys = root,sqlalchemy,alembic
 [handlers]
 keys = console
 [formatters]
 keys = generic
 [logger_root]
 level = WARN
 handlers = console
 [logger_sqlalchemy]
 level = WARN
 handlers =
 qualname = sqlalchemy.engine
 [logger_alembic]
 level = INFO
 handlers =
 qualname = alembic
 [handler_console]
 class = StreamHandler
 args = (sys.stderr,)
 level = NOTSET
 formatter = generic
 [formatter_generic]
 format = %(levelname)-5.5s [%(name)s] %(message)s
 datefmt = %H:%M:%S
@@ -0,0 +1,65 @@
 """Alembic environment for DeerFlow application tables.
 ONLY manages DeerFlow's own tables, i.e. whatever is registered on
 ``Base.metadata`` (runs, run_events, feedback, thread metadata, users).
 LangGraph's checkpointer tables are managed by LangGraph itself -- they
 have their own schema lifecycle and must not be touched by Alembic.
 """
 from __future__ import annotations
 import asyncio
 import logging
 from logging.config import fileConfig
 from alembic import context
 from sqlalchemy.ext.asyncio import create_async_engine
 from deerflow.persistence.base import Base
 # Import all models so metadata is populated.
 try:
    import deerflow.persistence.models  # noqa: F401 — register ORM models with Base.metadata
 except ImportError:
    # Models not available — migration will work with existing metadata only.
    logging.getLogger(__name__).warning("Could not import deerflow.persistence.models; Alembic may not detect all tables")
 config = context.config
 if config.config_file_name is not None:
    fileConfig(config.config_file_name)
 target_metadata = Base.metadata
 def run_migrations_offline() -> None:
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        render_as_batch=True,
    )
    with context.begin_transaction():
        context.run_migrations()
 def do_run_migrations(connection):
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        render_as_batch=True,  # Required for SQLite ALTER TABLE support
    )
    with context.begin_transaction():
        context.run_migrations()
 async def run_migrations_online() -> None:
    connectable = create_async_engine(config.get_main_option("sqlalchemy.url"))
    try:
        async with connectable.connect() as connection:
            await connection.run_sync(do_run_migrations)
    finally:
        # Release pooled connections even if a migration fails.
        await connectable.dispose()
 if context.is_offline_mode():
    run_migrations_offline()
 else:
    asyncio.run(run_migrations_online())
@@ -0,0 +1,23 @@
 """ORM model registration entry point.
 Importing this module ensures all ORM models are registered with
 ``Base.metadata`` so Alembic autogenerate detects every table.
 The actual ORM classes have moved to entity-specific subpackages:
 - ``deerflow.persistence.thread_meta``
 - ``deerflow.persistence.run``
 - ``deerflow.persistence.feedback``
 - ``deerflow.persistence.user``
 ``RunEventRow`` remains in ``deerflow.persistence.models.run_event`` because
 its storage implementation lives in ``deerflow.runtime.events.store.db`` and
 there is no matching entity directory.
 """
 from deerflow.persistence.feedback.model import FeedbackRow
 from deerflow.persistence.models.run_event import RunEventRow
 from deerflow.persistence.run.model import RunRow
 from deerflow.persistence.thread_meta.model import ThreadMetaRow
 from deerflow.persistence.user.model import UserRow
 __all__ = ["FeedbackRow", "RunEventRow", "RunRow", "ThreadMetaRow", "UserRow"]
@@ -0,0 +1,35 @@
 """ORM model for run events."""
 from __future__ import annotations
 from datetime import UTC, datetime
 from sqlalchemy import JSON, DateTime, Index, String, Text, UniqueConstraint
 from sqlalchemy.orm import Mapped, mapped_column
 from deerflow.persistence.base import Base
 class RunEventRow(Base):
    __tablename__ = "run_events"
    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    thread_id: Mapped[str] = mapped_column(String(64), nullable=False)
    run_id: Mapped[str] = mapped_column(String(64), nullable=False)
    # Owner of the conversation this event belongs to. Nullable for data
    # created before auth was introduced; populated by auth middleware on
    # new writes and by the boot-time orphan migration on existing rows.
    user_id: Mapped[str | None] = mapped_column(String(64), nullable=True, index=True)
    event_type: Mapped[str] = mapped_column(String(32), nullable=False)
    category: Mapped[str] = mapped_column(String(16), nullable=False)
    # "message" | "trace" | "lifecycle"
    content: Mapped[str] = mapped_column(Text, default="")
    event_metadata: Mapped[dict] = mapped_column(JSON, default=dict)
    seq: Mapped[int] = mapped_column(nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
    __table_args__ = (
        UniqueConstraint("thread_id", "seq", name="uq_events_thread_seq"),
        Index("ix_events_thread_cat_seq", "thread_id", "category", "seq"),
        Index("ix_events_run", "thread_id", "run_id", "seq"),
    )
@@ -0,0 +1,6 @@
 """Run metadata persistence — ORM and SQL repository."""
 from deerflow.persistence.run.model import RunRow
 from deerflow.persistence.run.sql import RunRepository
 __all__ = ["RunRepository", "RunRow"]
@@ -0,0 +1,49 @@
 """ORM model for run metadata."""
 from __future__ import annotations
 from datetime import UTC, datetime
 from sqlalchemy import JSON, DateTime, Index, String, Text
 from sqlalchemy.orm import Mapped, mapped_column
 from deerflow.persistence.base import Base
 class RunRow(Base):
    __tablename__ = "runs"
    run_id: Mapped[str] = mapped_column(String(64), primary_key=True)
    thread_id: Mapped[str] = mapped_column(String(64), nullable=False, index=True)
    assistant_id: Mapped[str | None] = mapped_column(String(128))
    user_id: Mapped[str | None] = mapped_column(String(64), index=True)
    status: Mapped[str] = mapped_column(String(20), default="pending")
    # "pending" | "running" | "success" | "error" | "timeout" | "interrupted"
    model_name: Mapped[str | None] = mapped_column(String(128))
    multitask_strategy: Mapped[str] = mapped_column(String(20), default="reject")
    metadata_json: Mapped[dict] = mapped_column(JSON, default=dict)
    kwargs_json: Mapped[dict] = mapped_column(JSON, default=dict)
    error: Mapped[str | None] = mapped_column(Text)
    # Convenience fields (for listing pages without querying RunEventStore)
    message_count: Mapped[int] = mapped_column(default=0)
    first_human_message: Mapped[str | None] = mapped_column(Text)
    last_ai_message: Mapped[str | None] = mapped_column(Text)
    # Token usage (accumulated in-memory by RunJournal, written on run completion)
    total_input_tokens: Mapped[int] = mapped_column(default=0)
    total_output_tokens: Mapped[int] = mapped_column(default=0)
    total_tokens: Mapped[int] = mapped_column(default=0)
    llm_call_count: Mapped[int] = mapped_column(default=0)
    lead_agent_tokens: Mapped[int] = mapped_column(default=0)
    subagent_tokens: Mapped[int] = mapped_column(default=0)
    middleware_tokens: Mapped[int] = mapped_column(default=0)
    # Follow-up association
    follow_up_to_run_id: Mapped[str | None] = mapped_column(String(64))
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC), onupdate=lambda: datetime.now(UTC))
    __table_args__ = (Index("ix_runs_thread_status", "thread_id", "status"),)
@@ -0,0 +1,255 @@
 """SQLAlchemy-backed RunStore implementation.
 Each method acquires and releases its own short-lived session.
 Run status updates come from background workers that may run for minutes,
 so no connection is held across a long execution.
 """
 from __future__ import annotations
 import json
 from datetime import UTC, datetime
 from typing import Any
 from sqlalchemy import func, select, update
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from deerflow.persistence.run.model import RunRow
 from deerflow.runtime.runs.store.base import RunStore
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
 class RunRepository(RunStore):
    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
        self._sf = session_factory
    @staticmethod
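    def _safe_json(obj: Any) -> Any:
        """Best-effort conversion of obj to a JSON-serializable value.
        Containers are converted recursively; other objects fall back to
        model_dump(), dict(), and finally str(). The result of model_dump()
        or dict() is returned as-is, without a further check.
        """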
        if obj is None:
            return None
        if isinstance(obj, (str, int, float, bool)):
            return obj
        if isinstance(obj, dict):
            return {k: RunRepository._safe_json(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [RunRepository._safe_json(v) for v in obj]
        if hasattr(obj, "model_dump"):
            try:
                return obj.model_dump()
            except Exception:
                pass
        if hasattr(obj, "dict"):
            try:
                return obj.dict()
            except Exception:
                pass
        try:
            json.dumps(obj)
            return obj
        except (TypeError, ValueError):
            return str(obj)
    @staticmethod
    def _row_to_dict(row: RunRow) -> dict[str, Any]:
        d = row.to_dict()
        # Remap JSON columns to match RunStore interface
        d["metadata"] = d.pop("metadata_json", {})
        d["kwargs"] = d.pop("kwargs_json", {})
        # Convert datetime to ISO string for consistency with MemoryRunStore
        for key in ("created_at", "updated_at"):
            val = d.get(key)
            if isinstance(val, datetime):
                d[key] = val.isoformat()
        return d
    async def put(
        self,
        run_id,
        *,
        thread_id,
        assistant_id=None,
        user_id: str | None | _AutoSentinel = AUTO,
        status="pending",
        multitask_strategy="reject",
        metadata=None,
        kwargs=None,
        error=None,
        created_at=None,
        follow_up_to_run_id=None,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="RunRepository.put")
        now = datetime.now(UTC)
        # Accept an ISO-8601 string or a datetime, mirroring list_pending().
        if isinstance(created_at, str):
            created_at = datetime.fromisoformat(created_at)
        row = RunRow(
            run_id=run_id,
            thread_id=thread_id,
            assistant_id=assistant_id,
            user_id=resolved_user_id,
            status=status,
            multitask_strategy=multitask_strategy,
            metadata_json=self._safe_json(metadata) or {},
            kwargs_json=self._safe_json(kwargs) or {},
            error=error,
            follow_up_to_run_id=follow_up_to_run_id,
            created_at=created_at or now,
            updated_at=now,
        )
        async with self._sf() as session:
            session.add(row)
            await session.commit()
    async def get(
        self,
        run_id,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="RunRepository.get")
        async with self._sf() as session:
            row = await session.get(RunRow, run_id)
            if row is None:
                return None
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return None
            return self._row_to_dict(row)
    async def list_by_thread(
        self,
        thread_id,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
        limit=100,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="RunRepository.list_by_thread")
        stmt = select(RunRow).where(RunRow.thread_id == thread_id)
        if resolved_user_id is not None:
            stmt = stmt.where(RunRow.user_id == resolved_user_id)
        stmt = stmt.order_by(RunRow.created_at.desc()).limit(limit)
        async with self._sf() as session:
            result = await session.execute(stmt)
            return [self._row_to_dict(r) for r in result.scalars()]
    async def update_status(self, run_id, status, *, error=None):
        values: dict[str, Any] = {"status": status, "updated_at": datetime.now(UTC)}
        if error is not None:
            values["error"] = error
        async with self._sf() as session:
            await session.execute(update(RunRow).where(RunRow.run_id == run_id).values(**values))
            await session.commit()
    async def delete(
        self,
        run_id,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="RunRepository.delete")
        async with self._sf() as session:
            row = await session.get(RunRow, run_id)
            if row is None:
                return
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return
            await session.delete(row)
            await session.commit()
    async def list_pending(self, *, before=None):
        if before is None:
            before_dt = datetime.now(UTC)
        elif isinstance(before, datetime):
            before_dt = before
        else:
            before_dt = datetime.fromisoformat(before)
        stmt = select(RunRow).where(RunRow.status == "pending", RunRow.created_at <= before_dt).order_by(RunRow.created_at.asc())
        async with self._sf() as session:
            result = await session.execute(stmt)
            return [self._row_to_dict(r) for r in result.scalars()]
    async def update_run_completion(
        self,
        run_id: str,
        *,
        status: str,
        total_input_tokens: int = 0,
        total_output_tokens: int = 0,
        total_tokens: int = 0,
        llm_call_count: int = 0,
        lead_agent_tokens: int = 0,
        subagent_tokens: int = 0,
        middleware_tokens: int = 0,
        message_count: int = 0,
        last_ai_message: str | None = None,
        first_human_message: str | None = None,
        error: str | None = None,
    ) -> None:
        """Update status + token usage + convenience fields on run completion."""
        values: dict[str, Any] = {
            "status": status,
            "total_input_tokens": total_input_tokens,
            "total_output_tokens": total_output_tokens,
            "total_tokens": total_tokens,
            "llm_call_count": llm_call_count,
            "lead_agent_tokens": lead_agent_tokens,
            "subagent_tokens": subagent_tokens,
            "middleware_tokens": middleware_tokens,
            "message_count": message_count,
            "updated_at": datetime.now(UTC),
        }
        if last_ai_message is not None:
            values["last_ai_message"] = last_ai_message[:2000]
        if first_human_message is not None:
            values["first_human_message"] = first_human_message[:2000]
        if error is not None:
            values["error"] = error
        async with self._sf() as session:
            await session.execute(update(RunRow).where(RunRow.run_id == run_id).values(**values))
            await session.commit()
    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
        """Aggregate token usage for finished runs (status "success" or "error") via a single SQL GROUP BY query."""
        _completed = RunRow.status.in_(("success", "error"))
        _thread = RunRow.thread_id == thread_id
        stmt = (
            select(
                func.coalesce(RunRow.model_name, "unknown").label("model"),
                func.count().label("runs"),
                func.coalesce(func.sum(RunRow.total_tokens), 0).label("total_tokens"),
                func.coalesce(func.sum(RunRow.total_input_tokens), 0).label("total_input_tokens"),
                func.coalesce(func.sum(RunRow.total_output_tokens), 0).label("total_output_tokens"),
                func.coalesce(func.sum(RunRow.lead_agent_tokens), 0).label("lead_agent"),
                func.coalesce(func.sum(RunRow.subagent_tokens), 0).label("subagent"),
                func.coalesce(func.sum(RunRow.middleware_tokens), 0).label("middleware"),
            )
            .where(_thread, _completed)
            .group_by(func.coalesce(RunRow.model_name, "unknown"))
        )
        async with self._sf() as session:
            rows = (await session.execute(stmt)).all()
        total_tokens = total_input = total_output = total_runs = 0
        lead_agent = subagent = middleware = 0
        by_model: dict[str, dict] = {}
        for r in rows:
            by_model[r.model] = {"tokens": r.total_tokens, "runs": r.runs}
            total_tokens += r.total_tokens
            total_input += r.total_input_tokens
            total_output += r.total_output_tokens
            total_runs += r.runs
            lead_agent += r.lead_agent
            subagent += r.subagent
            middleware += r.middleware
        return {
            "total_tokens": total_tokens,
            "total_input_tokens": total_input,
            "total_output_tokens": total_output,
            "total_runs": total_runs,
            "by_model": by_model,
            "by_caller": {
                "lead_agent": lead_agent,
                "subagent": subagent,
                "middleware": middleware,
            },
        }
@@ -0,0 +1,38 @@
 """Thread metadata persistence — ORM, abstract store, and concrete implementations."""
 from __future__ import annotations
 from typing import TYPE_CHECKING
 from deerflow.persistence.thread_meta.base import ThreadMetaStore
 from deerflow.persistence.thread_meta.memory import MemoryThreadMetaStore
 from deerflow.persistence.thread_meta.model import ThreadMetaRow
 from deerflow.persistence.thread_meta.sql import ThreadMetaRepository
 if TYPE_CHECKING:
    from langgraph.store.base import BaseStore
    from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 __all__ = [
    "MemoryThreadMetaStore",
    "ThreadMetaRepository",
    "ThreadMetaRow",
    "ThreadMetaStore",
    "make_thread_store",
 ]
 def make_thread_store(
    session_factory: async_sessionmaker[AsyncSession] | None,
    store: BaseStore | None = None,
 ) -> ThreadMetaStore:
    """Create the appropriate ThreadMetaStore based on available backends.
    Returns a SQL-backed repository when a session factory is available,
    otherwise falls back to the in-memory LangGraph Store implementation.
    """
    if session_factory is not None:
        return ThreadMetaRepository(session_factory)
    if store is None:
        raise ValueError("make_thread_store requires either a session_factory (SQL) or a store (memory)")
    return MemoryThreadMetaStore(store)
@@ -0,0 +1,76 @@
 """Abstract interface for thread metadata storage.
 Implementations:
 - ThreadMetaRepository: SQL-backed (sqlite / postgres via SQLAlchemy)
 - MemoryThreadMetaStore: wraps LangGraph BaseStore (memory mode)
 All mutating and querying methods accept a ``user_id`` parameter with
 three-state semantics (see :mod:`deerflow.runtime.user_context`):
 - ``AUTO`` (default): resolve from the request-scoped contextvar.
 - Explicit ``str``: use the provided value verbatim.
 - Explicit ``None``: bypass owner filtering (migration/CLI only).
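 Example (a sketch of the three states; ``store`` stands for any concrete
 implementation and ``"u1"`` is a made-up user id)::

     record = await store.get(thread_id)                # AUTO: owner taken from the contextvar
     record = await store.get(thread_id, user_id="u1")  # explicit owner, used verbatim
     record = await store.get(thread_id, user_id=None)  # no owner filter (migration/CLI only)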
 """
 from __future__ import annotations
 import abc
 from deerflow.runtime.user_context import AUTO, _AutoSentinel
 class ThreadMetaStore(abc.ABC):
    @abc.abstractmethod
    async def create(
        self,
        thread_id: str,
        *,
        assistant_id: str | None = None,
        user_id: str | None | _AutoSentinel = AUTO,
        display_name: str | None = None,
        metadata: dict | None = None,
    ) -> dict:
        pass
    @abc.abstractmethod
    async def get(self, thread_id: str, *, user_id: str | None | _AutoSentinel = AUTO) -> dict | None:
        pass
    @abc.abstractmethod
    async def search(
        self,
        *,
        metadata: dict | None = None,
        status: str | None = None,
        limit: int = 100,
        offset: int = 0,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> list[dict]:
        pass
    @abc.abstractmethod
    async def update_display_name(self, thread_id: str, display_name: str, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        pass
    @abc.abstractmethod
    async def update_status(self, thread_id: str, status: str, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        pass
    @abc.abstractmethod
    async def update_metadata(self, thread_id: str, metadata: dict, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        """Merge ``metadata`` into the thread's metadata field.
        Existing keys are overwritten by the new values; keys absent from
        ``metadata`` are preserved. No-op if the thread does not exist
        or the owner check fails.
        """
        pass
    @abc.abstractmethod
    async def check_access(self, thread_id: str, user_id: str, *, require_existing: bool = False) -> bool:
        """Check if ``user_id`` has access to ``thread_id``."""
        pass
    @abc.abstractmethod
    async def delete(self, thread_id: str, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        pass
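The three-state ``user_id`` contract from the module docstring can be sketched as below. This is only an illustration: the real `deerflow.runtime.user_context` module is not part of this hunk, so `_Auto`, `AUTO_SKETCH`, `current_user`, and `resolve` are hypothetical names, and raising when no user is in context is an assumption about how the real helper might behave.

```python
from __future__ import annotations

import contextvars


class _Auto:
    """Hypothetical sentinel type: tells 'argument omitted' apart from an explicit None."""


AUTO_SKETCH = _Auto()

# Hypothetical request-scoped variable holding the authenticated user's id.
current_user: contextvars.ContextVar[str | None] = contextvars.ContextVar("current_user", default=None)


def resolve(user_id: str | None | _Auto = AUTO_SKETCH) -> str | None:
    """Map the three input states to the value used for owner filtering."""
    if isinstance(user_id, _Auto):
        resolved = current_user.get()
        if resolved is None:
            # Assumption: the real helper may handle a missing context differently.
            raise RuntimeError("no user in context")
        return resolved
    # Explicit str is used verbatim; explicit None means "skip the owner filter".
    return user_id


token = current_user.set("alice")
auto_result = resolve()            # AUTO: taken from the contextvar
explicit_result = resolve("bob")   # explicit str: used verbatim
bypass_result = resolve(None)      # explicit None: owner filter bypassed
current_user.reset(token)
```

A dedicated sentinel is what lets ``None`` carry its own meaning (bypass) instead of doubling as "not provided".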
@@ -0,0 +1,149 @@
 """In-memory ThreadMetaStore backed by LangGraph BaseStore.
 Used when database.backend=memory. Delegates to the LangGraph Store's
 ``("threads",)`` namespace — the same namespace used by the Gateway
 router for thread records.
 """
 from __future__ import annotations
 import time
 from typing import Any
 from langgraph.store.base import BaseStore
 from deerflow.persistence.thread_meta.base import ThreadMetaStore
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
 THREADS_NS: tuple[str, ...] = ("threads",)
 class MemoryThreadMetaStore(ThreadMetaStore):
    def __init__(self, store: BaseStore) -> None:
        self._store = store
    async def _get_owned_record(
        self,
        thread_id: str,
        user_id: str | None | _AutoSentinel,
        method_name: str,
    ) -> dict | None:
        """Fetch a record and verify ownership. Returns a mutable copy, or None."""
        resolved = resolve_user_id(user_id, method_name=method_name)
        item = await self._store.aget(THREADS_NS, thread_id)
        if item is None:
            return None
        record = dict(item.value)
        if resolved is not None and record.get("user_id") != resolved:
            return None
        return record
    async def create(
        self,
        thread_id: str,
        *,
        assistant_id: str | None = None,
        user_id: str | None | _AutoSentinel = AUTO,
        display_name: str | None = None,
        metadata: dict | None = None,
    ) -> dict:
        resolved_user_id = resolve_user_id(user_id, method_name="MemoryThreadMetaStore.create")
        now = time.time()
        record: dict[str, Any] = {
            "thread_id": thread_id,
            "assistant_id": assistant_id,
            "user_id": resolved_user_id,
            "display_name": display_name,
            "status": "idle",
            "metadata": metadata or {},
            "values": {},
            "created_at": now,
            "updated_at": now,
        }
        await self._store.aput(THREADS_NS, thread_id, record)
        return record
    async def get(self, thread_id: str, *, user_id: str | None | _AutoSentinel = AUTO) -> dict | None:
        return await self._get_owned_record(thread_id, user_id, "MemoryThreadMetaStore.get")
    async def search(
        self,
        *,
        metadata: dict | None = None,
        status: str | None = None,
        limit: int = 100,
        offset: int = 0,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> list[dict]:
        resolved_user_id = resolve_user_id(user_id, method_name="MemoryThreadMetaStore.search")
        filter_dict: dict[str, Any] = {}
        if metadata:
            filter_dict.update(metadata)
        if status:
            filter_dict["status"] = status
        if resolved_user_id is not None:
            filter_dict["user_id"] = resolved_user_id
        items = await self._store.asearch(
            THREADS_NS,
            filter=filter_dict or None,
            limit=limit,
            offset=offset,
        )
        return [self._item_to_dict(item) for item in items]
    async def check_access(self, thread_id: str, user_id: str, *, require_existing: bool = False) -> bool:
        item = await self._store.aget(THREADS_NS, thread_id)
        if item is None:
            return not require_existing
        record_user_id = item.value.get("user_id")
        if record_user_id is None:
            return True
        return record_user_id == user_id
    async def update_display_name(self, thread_id: str, display_name: str, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        record = await self._get_owned_record(thread_id, user_id, "MemoryThreadMetaStore.update_display_name")
        if record is None:
            return
        record["display_name"] = display_name
        record["updated_at"] = time.time()
        await self._store.aput(THREADS_NS, thread_id, record)
    async def update_status(self, thread_id: str, status: str, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        record = await self._get_owned_record(thread_id, user_id, "MemoryThreadMetaStore.update_status")
        if record is None:
            return
        record["status"] = status
        record["updated_at"] = time.time()
        await self._store.aput(THREADS_NS, thread_id, record)
    async def update_metadata(self, thread_id: str, metadata: dict, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        record = await self._get_owned_record(thread_id, user_id, "MemoryThreadMetaStore.update_metadata")
        if record is None:
            return
        merged = dict(record.get("metadata") or {})
        merged.update(metadata)
        record["metadata"] = merged
        record["updated_at"] = time.time()
        await self._store.aput(THREADS_NS, thread_id, record)
    async def delete(self, thread_id: str, *, user_id: str | None | _AutoSentinel = AUTO) -> None:
        record = await self._get_owned_record(thread_id, user_id, "MemoryThreadMetaStore.delete")
        if record is None:
            return
        await self._store.adelete(THREADS_NS, thread_id)
    @staticmethod
    def _item_to_dict(item) -> dict[str, Any]:
        """Convert a Store SearchItem to the dict format expected by callers."""
        val = item.value
        return {
            "thread_id": item.key,
            "assistant_id": val.get("assistant_id"),
            "user_id": val.get("user_id"),
            "display_name": val.get("display_name"),
            "status": val.get("status", "idle"),
            "metadata": val.get("metadata", {}),
            "created_at": str(val.get("created_at", "")),
            "updated_at": str(val.get("updated_at", "")),
        }
@@ -0,0 +1,23 @@
 """ORM model for thread metadata."""
 from __future__ import annotations
 from datetime import UTC, datetime
 from sqlalchemy import JSON, DateTime, String
 from sqlalchemy.orm import Mapped, mapped_column
 from deerflow.persistence.base import Base
 class ThreadMetaRow(Base):
    __tablename__ = "threads_meta"
    thread_id: Mapped[str] = mapped_column(String(64), primary_key=True)
    assistant_id: Mapped[str | None] = mapped_column(String(128), index=True)
    user_id: Mapped[str | None] = mapped_column(String(64), index=True)
    display_name: Mapped[str | None] = mapped_column(String(256))
    status: Mapped[str] = mapped_column(String(20), default="idle")
    metadata_json: Mapped[dict] = mapped_column(JSON, default=dict)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC), onupdate=lambda: datetime.now(UTC))
@@ -0,0 +1,217 @@
 """SQLAlchemy-backed thread metadata repository."""
 from __future__ import annotations
 from datetime import UTC, datetime
 from typing import Any
 from sqlalchemy import select, update
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from deerflow.persistence.thread_meta.base import ThreadMetaStore
 from deerflow.persistence.thread_meta.model import ThreadMetaRow
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, resolve_user_id
 class ThreadMetaRepository(ThreadMetaStore):
    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
        self._sf = session_factory
    @staticmethod
    def _row_to_dict(row: ThreadMetaRow) -> dict[str, Any]:
        d = row.to_dict()
        d["metadata"] = d.pop("metadata_json", {})
        for key in ("created_at", "updated_at"):
            val = d.get(key)
            if isinstance(val, datetime):
                d[key] = val.isoformat()
        return d
    async def create(
        self,
        thread_id: str,
        *,
        assistant_id: str | None = None,
        user_id: str | None | _AutoSentinel = AUTO,
        display_name: str | None = None,
        metadata: dict | None = None,
    ) -> dict:
        # Auto-resolve user_id from contextvar when AUTO; explicit None
        # creates an orphan row (used by migration scripts).
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.create")
        now = datetime.now(UTC)
        row = ThreadMetaRow(
            thread_id=thread_id,
            assistant_id=assistant_id,
            user_id=resolved_user_id,
            display_name=display_name,
            metadata_json=metadata or {},
            created_at=now,
            updated_at=now,
        )
        async with self._sf() as session:
            session.add(row)
            await session.commit()
            await session.refresh(row)
            return self._row_to_dict(row)
    async def get(
        self,
        thread_id: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> dict | None:
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.get")
        async with self._sf() as session:
            row = await session.get(ThreadMetaRow, thread_id)
            if row is None:
                return None
            # Enforce owner filter unless explicitly bypassed (user_id=None).
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return None
            return self._row_to_dict(row)
    async def check_access(self, thread_id: str, user_id: str, *, require_existing: bool = False) -> bool:
        """Check if ``user_id`` has access to ``thread_id``.
        Two modes — one row, two distinct semantics depending on what
        the caller is about to do:
        - ``require_existing=False`` (default, permissive):
          Returns True for: row missing (untracked legacy thread),
          ``row.user_id`` is None (shared / pre-auth data),
          or ``row.user_id == user_id``. Use for **read-style**
          decorators where treating an untracked thread as accessible
          preserves backward-compat.
        - ``require_existing=True`` (strict):
          Returns True **only** when the row exists AND
          (``row.user_id == user_id`` OR ``row.user_id is None``).
          Use for **destructive / mutating** decorators (DELETE, PATCH,
          state-update) so a thread that has *already been deleted*
          cannot be re-targeted by any caller — closing the
          delete-idempotence cross-user gap where the row vanishing
          made every other user appear to "own" it.
        """
        async with self._sf() as session:
            row = await session.get(ThreadMetaRow, thread_id)
            if row is None:
                return not require_existing
            if row.user_id is None:
                return True
            return row.user_id == user_id
    async def search(
        self,
        *,
        metadata: dict | None = None,
        status: str | None = None,
        limit: int = 100,
        offset: int = 0,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> list[dict]:
        """Search threads with optional metadata and status filters.
        Owner filter is enforced by default: caller must be in a user
        context. Pass ``user_id=None`` to bypass (migration/CLI).
        """
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.search")
        stmt = select(ThreadMetaRow).order_by(ThreadMetaRow.updated_at.desc())
        if resolved_user_id is not None:
            stmt = stmt.where(ThreadMetaRow.user_id == resolved_user_id)
        if status:
            stmt = stmt.where(ThreadMetaRow.status == status)
        if metadata:
            # When a metadata filter is active, fetch a larger window
            # (limit * 5 + offset rows) and filter in Python. Matches that
            # fall outside this window are not returned.
            # TODO(Phase 2): use JSON DB operators (Postgres @>,
            # SQLite json_extract) for server-side filtering.
            stmt = stmt.limit(limit * 5 + offset)
            async with self._sf() as session:
                result = await session.execute(stmt)
                rows = [self._row_to_dict(r) for r in result.scalars()]
            rows = [r for r in rows if all(r.get("metadata", {}).get(k) == v for k, v in metadata.items())]
            return rows[offset : offset + limit]
        else:
            stmt = stmt.limit(limit).offset(offset)
            async with self._sf() as session:
                result = await session.execute(stmt)
                return [self._row_to_dict(r) for r in result.scalars()]
    async def _check_ownership(self, session: AsyncSession, thread_id: str, resolved_user_id: str | None) -> bool:
        """Return True if the row exists and is owned (or filter bypassed)."""
        if resolved_user_id is None:
            return True  # explicit bypass
        row = await session.get(ThreadMetaRow, thread_id)
        return row is not None and row.user_id == resolved_user_id
    async def update_display_name(
        self,
        thread_id: str,
        display_name: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> None:
        """Update the display_name (title) for a thread."""
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.update_display_name")
        async with self._sf() as session:
            if not await self._check_ownership(session, thread_id, resolved_user_id):
                return
            await session.execute(update(ThreadMetaRow).where(ThreadMetaRow.thread_id == thread_id).values(display_name=display_name, updated_at=datetime.now(UTC)))
            await session.commit()
    async def update_status(
        self,
        thread_id: str,
        status: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> None:
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.update_status")
        async with self._sf() as session:
            if not await self._check_ownership(session, thread_id, resolved_user_id):
                return
            await session.execute(update(ThreadMetaRow).where(ThreadMetaRow.thread_id == thread_id).values(status=status, updated_at=datetime.now(UTC)))
            await session.commit()
    async def update_metadata(
        self,
        thread_id: str,
        metadata: dict,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> None:
        """Merge ``metadata`` into ``metadata_json``.
        Read-modify-write inside a single session/transaction. No row
        lock is taken, so concurrent merges may still overwrite each
        other depending on the backend's isolation level. No-op if the
        row does not exist or the user_id check fails.
        """
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.update_metadata")
        async with self._sf() as session:
            row = await session.get(ThreadMetaRow, thread_id)
            if row is None:
                return
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return
            merged = dict(row.metadata_json or {})
            merged.update(metadata)
            row.metadata_json = merged
            row.updated_at = datetime.now(UTC)
            await session.commit()
    async def delete(
        self,
        thread_id: str,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ) -> None:
        resolved_user_id = resolve_user_id(user_id, method_name="ThreadMetaRepository.delete")
        async with self._sf() as session:
            row = await session.get(ThreadMetaRow, thread_id)
            if row is None:
                return
            if resolved_user_id is not None and row.user_id != resolved_user_id:
                return
            await session.delete(row)
            await session.commit()
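The two `check_access` modes documented above reduce to a small truth table. The sketch below restates that logic as a pure function so each case can be checked without a database. `can_access` and its `row_exists` / `row_owner` parameters are illustrative names and not part of the repository API.

```python
from __future__ import annotations


def can_access(row_exists: bool, row_owner: str | None, user_id: str, *, require_existing: bool = False) -> bool:
    """Mirror the decision order of check_access for one (row, user) pair."""
    if not row_exists:
        # Permissive mode treats an untracked thread as accessible; strict mode does not.
        return not require_existing
    if row_owner is None:
        # Shared / pre-auth rows are accessible in both modes.
        return True
    return row_owner == user_id


permissive_missing = can_access(False, None, "alice")
strict_missing = can_access(False, None, "alice", require_existing=True)
shared_row = can_access(True, None, "alice", require_existing=True)
foreign_row = can_access(True, "bob", "alice")
own_row = can_access(True, "alice", "alice", require_existing=True)
```

Only the missing-row case differs between the modes, which is why destructive endpoints are expected to pass `require_existing=True`.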
@@ -0,0 +1,12 @@
 """User storage subpackage.
 Holds the ORM model for the ``users`` table. The concrete repository
 implementation (``SQLiteUserRepository``) lives in the app layer
 (``app.gateway.auth.repositories.sqlite``) because it converts
 between the ORM row and the auth module's pydantic ``User`` class.
 This keeps the harness package free of any dependency on app code.
 """
 from deerflow.persistence.user.model import UserRow
 __all__ = ["UserRow"]
@@ -0,0 +1,59 @@
 """ORM model for the users table.
 Lives in the harness persistence package so it is picked up by
 ``Base.metadata.create_all()`` alongside ``threads_meta``, ``runs``,
 ``run_events``, and ``feedback``. Using the shared engine means:
 - One SQLite/Postgres database, one connection pool
 - One schema initialisation codepath
 - Consistent async sessions across auth and persistence reads
 """
 from __future__ import annotations
 from datetime import UTC, datetime
 from sqlalchemy import Boolean, DateTime, Index, String, text
 from sqlalchemy.orm import Mapped, mapped_column
 from deerflow.persistence.base import Base
 class UserRow(Base):
    __tablename__ = "users"
    # UUIDs are stored as 36-char strings for cross-backend portability.
    id: Mapped[str] = mapped_column(String(36), primary_key=True)
    email: Mapped[str] = mapped_column(String(320), unique=True, nullable=False, index=True)
    password_hash: Mapped[str | None] = mapped_column(String(128), nullable=True)
    # "admin" | "user" — kept as plain string to avoid ALTER TABLE pain
    # when new roles are introduced.
    system_role: Mapped[str] = mapped_column(String(16), nullable=False, default="user")
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True),
        nullable=False,
        default=lambda: datetime.now(UTC),
    )
    # OAuth linkage (optional). A partial unique index enforces one
    # account per (provider, oauth_id) pair, leaving NULL/NULL rows
    # unconstrained so plain password accounts can coexist.
    oauth_provider: Mapped[str | None] = mapped_column(String(32), nullable=True)
    oauth_id: Mapped[str | None] = mapped_column(String(128), nullable=True)
    # Auth lifecycle flags
    needs_setup: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    token_version: Mapped[int] = mapped_column(nullable=False, default=0)
    __table_args__ = (
        Index(
            "idx_users_oauth_identity",
            "oauth_provider",
            "oauth_id",
            unique=True,
            sqlite_where=text("oauth_provider IS NOT NULL AND oauth_id IS NOT NULL"),
        ),
    )
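The effect of the partial unique index can be checked with the standard-library `sqlite3` module. This is a sketch with a simplified `users` table (only the columns the index touches); the real schema is created by SQLAlchemy from `UserRow`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, oauth_provider TEXT, oauth_id TEXT)")
# Same predicate as the sqlite_where clause on idx_users_oauth_identity.
conn.execute(
    "CREATE UNIQUE INDEX idx_users_oauth_identity ON users (oauth_provider, oauth_id) "
    "WHERE oauth_provider IS NOT NULL AND oauth_id IS NOT NULL"
)

# Password-only accounts: NULL/NULL rows are outside the index and can coexist.
conn.execute("INSERT INTO users VALUES ('u1', NULL, NULL)")
conn.execute("INSERT INTO users VALUES ('u2', NULL, NULL)")
# First OAuth identity is accepted.
conn.execute("INSERT INTO users VALUES ('u3', 'github', '42')")

try:
    # Second account with the same (provider, oauth_id) pair violates the index.
    conn.execute("INSERT INTO users VALUES ('u4', 'github', '42')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()
```

The `Index` definition in this hunk passes only `sqlite_where`. SQLAlchemy also offers a `postgresql_where` option for the same purpose, which may be worth considering if the index should be partial on Postgres as well.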
@@ -5,15 +5,22 @@ Re-exports the public API of :mod:`~deerflow.runtime.runs` and
 directly from ``deerflow.runtime``.
 """
-from .runs import ConflictError, DisconnectMode, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
+from .checkpointer import checkpointer_context, get_checkpointer, make_checkpointer, reset_checkpointer
 from .runs import ConflictError, DisconnectMode, RunContext, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
 from .serialization import serialize, serialize_channel_values, serialize_lc_object, serialize_messages_tuple
 from .store import get_store, make_store, reset_store, store_context
 from .stream_bridge import END_SENTINEL, HEARTBEAT_SENTINEL, MemoryStreamBridge, StreamBridge, StreamEvent, make_stream_bridge
 __all__ = [
    # checkpointer
    "checkpointer_context",
    "get_checkpointer",
    "make_checkpointer",
    "reset_checkpointer",
    # runs
    "ConflictError",
    "DisconnectMode",
    "RunContext",
    "RunManager",
    "RunRecord",
    "RunStatus",
@@ -7,12 +7,12 @@ Supported backends: memory, sqlite, postgres.
 Usage (e.g. FastAPI lifespan)::
-    from deerflow.agents.checkpointer.async_provider import make_checkpointer
+    from deerflow.runtime.checkpointer.async_provider import make_checkpointer
    async with make_checkpointer() as checkpointer:
        app.state.checkpointer = checkpointer  # InMemorySaver if not configured
-For sync usage see :mod:`deerflow.agents.checkpointer.provider`.
+For sync usage see :mod:`deerflow.runtime.checkpointer.provider`.
 """
 from __future__ import annotations
@@ -24,12 +24,12 @@ from collections.abc import AsyncIterator
 from langgraph.types import Checkpointer
-from deerflow.agents.checkpointer.provider import (
+from deerflow.config.app_config import get_app_config
 from deerflow.runtime.checkpointer.provider import (
    POSTGRES_CONN_REQUIRED,
    POSTGRES_INSTALL,
    SQLITE_INSTALL,
 )
 from deerflow.config.app_config import get_app_config
 from deerflow.runtime.store._sqlite_utils import ensure_sqlite_parent_dir, resolve_sqlite_conn_str
 logger = logging.getLogger(__name__)
@@ -84,23 +84,76 @@ async def _async_checkpointer(config) -> AsyncIterator[Checkpointer]:
@contextlib.asynccontextmanager
-async def make_checkpointer() -> AsyncIterator[Checkpointer]:
+async def _async_checkpointer_from_database(db_config) -> AsyncIterator[Checkpointer]:
-    """Async context manager that yields a checkpointer for the caller's lifetime.
+    """Async context manager that constructs a checkpointer from unified DatabaseConfig."""
-    Resources are opened on enter and closed on exit — no global state::
+    if db_config.backend == "memory":
        async with make_checkpointer() as checkpointer:
            app.state.checkpointer = checkpointer
    Yields an ``InMemorySaver`` when no checkpointer is configured in *config.yaml*.
    """
    config = get_app_config()
    if config.checkpointer is None:
        from langgraph.checkpoint.memory import InMemorySaver
        yield InMemorySaver()
        return
-    async with _async_checkpointer(config.checkpointer) as saver:
+    if db_config.backend == "sqlite":
-        yield saver
+        try:
            from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
        except ImportError as exc:
            raise ImportError(SQLITE_INSTALL) from exc
        conn_str = db_config.checkpointer_sqlite_path
        ensure_sqlite_parent_dir(conn_str)
        async with AsyncSqliteSaver.from_conn_string(conn_str) as saver:
            await saver.setup()
            yield saver
        return
    if db_config.backend == "postgres":
        try:
            from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
        except ImportError as exc:
            raise ImportError(POSTGRES_INSTALL) from exc
        if not db_config.postgres_url:
            raise ValueError("database.postgres_url is required for the postgres backend")
        async with AsyncPostgresSaver.from_conn_string(db_config.postgres_url) as saver:
            await saver.setup()
            yield saver
        return
    raise ValueError(f"Unknown database backend: {db_config.backend!r}")
@contextlib.asynccontextmanager
 async def make_checkpointer() -> AsyncIterator[Checkpointer]:
    """Async context manager that yields a checkpointer for the caller's lifetime.
    Resources are opened on enter and closed on exit -- no global state::
        async with make_checkpointer() as checkpointer:
            app.state.checkpointer = checkpointer
    Yields an ``InMemorySaver`` when neither a legacy ``checkpointer:`` section
    nor a non-memory ``database:`` backend is configured in *config.yaml*.
    Priority:
    1. Legacy ``checkpointer:`` config section (backward compatible)
    2. Unified ``database:`` config section
    3. Default InMemorySaver
    """
    config = get_app_config()
    # Legacy: standalone checkpointer config takes precedence
    if config.checkpointer is not None:
        async with _async_checkpointer(config.checkpointer) as saver:
            yield saver
            return
    # Unified database config
    db_config = getattr(config, "database", None)
    if db_config is not None and db_config.backend != "memory":
        async with _async_checkpointer_from_database(db_config) as saver:
            yield saver
            return
    # Default: in-memory
    from langgraph.checkpoint.memory import InMemorySaver
    yield InMemorySaver()
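The resolution order in `make_checkpointer` can be sketched as a pure function over a config object. `pick_checkpointer_source` is a hypothetical helper for illustration only, and the `SimpleNamespace` objects stand in for the real app config, whose full shape is not shown in this diff.

```python
from __future__ import annotations

from types import SimpleNamespace


def pick_checkpointer_source(config) -> str:
    """Return which config section would supply the checkpointer."""
    # 1. Legacy standalone section wins when present.
    if getattr(config, "checkpointer", None) is not None:
        return "legacy"
    # 2. Unified database section, unless it is the memory backend.
    db_config = getattr(config, "database", None)
    if db_config is not None and db_config.backend != "memory":
        return f"database:{db_config.backend}"
    # 3. Fallback: in-memory saver.
    return "memory"


legacy = pick_checkpointer_source(
    SimpleNamespace(checkpointer=object(), database=SimpleNamespace(backend="sqlite"))
)
unified = pick_checkpointer_source(
    SimpleNamespace(checkpointer=None, database=SimpleNamespace(backend="postgres"))
)
memory_backend = pick_checkpointer_source(
    SimpleNamespace(checkpointer=None, database=SimpleNamespace(backend="memory"))
)
no_database = pick_checkpointer_source(SimpleNamespace(checkpointer=None))
```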
@@ -7,7 +7,7 @@ Supported backends: memory, sqlite, postgres.
 Usage::
-    from deerflow.agents.checkpointer.provider import get_checkpointer, checkpointer_context
+    from deerflow.runtime.checkpointer.provider import get_checkpointer, checkpointer_context
    # Singleton — reused across calls, closed on process exit
    cp = get_checkpointer()
@@ -0,0 +1,134 @@
 """Pure functions to convert LangChain message objects to OpenAI Chat Completions format.
 Used by RunJournal to build content dicts for event storage.
 """
 from __future__ import annotations
 import json
 from typing import Any
 _ROLE_MAP = {
    "human": "user",
    "ai": "assistant",
    "system": "system",
    "tool": "tool",
 }
 def langchain_to_openai_message(message: Any) -> dict:
    """Convert a single LangChain BaseMessage to an OpenAI message dict.
    Handles:
    - HumanMessage → {"role": "user", "content": "..."}
    - AIMessage (text only) → {"role": "assistant", "content": "..."}
    - AIMessage (with tool_calls) → {"role": "assistant", "content": null, "tool_calls": [...]}
    - AIMessage (text + tool_calls) → both content and tool_calls present
    - AIMessage (list content / multimodal) → content preserved as list
    - SystemMessage → {"role": "system", "content": "..."}
    - ToolMessage → {"role": "tool", "tool_call_id": "...", "content": "..."}
    """
    msg_type = getattr(message, "type", "")
    role = _ROLE_MAP.get(msg_type, msg_type)
    content = getattr(message, "content", "")
    if role == "tool":
        return {
            "role": "tool",
            "tool_call_id": getattr(message, "tool_call_id", ""),
            "content": content,
        }
    if role == "assistant":
        tool_calls = getattr(message, "tool_calls", None) or []
        result: dict = {"role": "assistant"}
        if tool_calls:
            openai_tool_calls = []
            for tc in tool_calls:
                args = tc.get("args", {})
                openai_tool_calls.append(
                    {
                        "id": tc.get("id", ""),
                        "type": "function",
                        "function": {
                            "name": tc.get("name", ""),
                            "arguments": json.dumps(args) if not isinstance(args, str) else args,
                        },
                    }
                )
            # If no text content, set content to null per OpenAI spec
            result["content"] = content if isinstance(content, (str, list)) and content else None
            result["tool_calls"] = openai_tool_calls
        else:
            result["content"] = content
        return result
    # user / system / unknown
    return {"role": role, "content": content}
 def _infer_finish_reason(message: Any) -> str:
    """Infer OpenAI finish_reason from an AIMessage.
    Returns "tool_calls" if tool_calls present, else looks in
    response_metadata.finish_reason, else returns "stop".
    """
    tool_calls = getattr(message, "tool_calls", None) or []
    if tool_calls:
        return "tool_calls"
    resp_meta = getattr(message, "response_metadata", None) or {}
    if isinstance(resp_meta, dict):
        finish = resp_meta.get("finish_reason")
        if finish:
            return finish
    return "stop"
 def langchain_to_openai_completion(message: Any) -> dict:
    """Convert an AIMessage and its metadata to an OpenAI completion response dict.
    Returns:
        {
            "id": message.id,
            "model": message.response_metadata.get("model_name"),
            "choices": [{"index": 0, "message": <openai_message>, "finish_reason": <inferred>}],
            "usage": {"prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ...} or None,
        }
    """
    resp_meta = getattr(message, "response_metadata", None) or {}
    model_name = resp_meta.get("model_name") if isinstance(resp_meta, dict) else None
    openai_msg = langchain_to_openai_message(message)
    finish_reason = _infer_finish_reason(message)
    usage_metadata = getattr(message, "usage_metadata", None)
    if usage_metadata is not None:
        input_tokens = usage_metadata.get("input_tokens", 0) or 0
        output_tokens = usage_metadata.get("output_tokens", 0) or 0
        usage: dict | None = {
            "prompt_tokens": input_tokens,
            "completion_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        }
    else:
        usage = None
    return {
        "id": getattr(message, "id", None),
        "model": model_name,
        "choices": [
            {
                "index": 0,
                "message": openai_msg,
                "finish_reason": finish_reason,
            }
        ],
        "usage": usage,
    }
 def langchain_messages_to_openai(messages: list) -> list[dict]:
    """Convert a list of LangChain BaseMessages to OpenAI message dicts."""
    return [langchain_to_openai_message(m) for m in messages]
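To make the assistant-with-tool-calls mapping concrete, here is a simplified restatement of that branch applied to a stand-in message. It does not import LangChain or this module: `SimpleNamespace` plays the role of an `AIMessage`, `sketch_assistant_message` is a hypothetical name, and `content or None` is a simplification of the original content check.

```python
from __future__ import annotations

import json
from types import SimpleNamespace


def sketch_assistant_message(message) -> dict:
    """Simplified version of the assistant branch of langchain_to_openai_message."""
    content = getattr(message, "content", "")
    tool_calls = getattr(message, "tool_calls", None) or []
    result: dict = {"role": "assistant", "content": content}
    if tool_calls:
        # Empty text becomes null when tool calls are present.
        result["content"] = content or None
        result["tool_calls"] = [
            {
                "id": tc.get("id", ""),
                "type": "function",
                "function": {
                    "name": tc.get("name", ""),
                    # OpenAI expects arguments as a JSON string, not a dict.
                    "arguments": tc["args"] if isinstance(tc.get("args"), str) else json.dumps(tc.get("args", {})),
                },
            }
            for tc in tool_calls
        ]
    return result


stand_in = SimpleNamespace(
    type="ai",
    content="",
    tool_calls=[{"id": "call_1", "name": "search", "args": {"q": "deer"}}],
)
converted = sketch_assistant_message(stand_in)
```

The key detail is that LangChain keeps tool-call `args` as a dict, while the OpenAI format carries them as a JSON-encoded string under `function.arguments`.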
@@ -0,0 +1,4 @@
 from deerflow.runtime.events.store.base import RunEventStore
 from deerflow.runtime.events.store.memory import MemoryRunEventStore
 __all__ = ["MemoryRunEventStore", "RunEventStore"]
@@ -0,0 +1,26 @@
 from deerflow.runtime.events.store.base import RunEventStore
 from deerflow.runtime.events.store.memory import MemoryRunEventStore
 def make_run_event_store(config=None) -> RunEventStore:
    """Create a RunEventStore based on run_events.backend configuration."""
    if config is None or config.backend == "memory":
        return MemoryRunEventStore()
    if config.backend == "db":
        from deerflow.persistence.engine import get_session_factory
        sf = get_session_factory()
        if sf is None:
            # database.backend=memory but run_events.backend=db -> fallback
            return MemoryRunEventStore()
        from deerflow.runtime.events.store.db import DbRunEventStore
        return DbRunEventStore(sf, max_trace_content=config.max_trace_content)
    if config.backend == "jsonl":
        from deerflow.runtime.events.store.jsonl import JsonlRunEventStore
        return JsonlRunEventStore()
    raise ValueError(f"Unknown run_events backend: {config.backend!r}")
 __all__ = ["MemoryRunEventStore", "RunEventStore", "make_run_event_store"]
@@ -0,0 +1,109 @@
 """Abstract interface for run event storage.
 RunEventStore is the unified storage interface for run event streams.
 Messages (frontend display) and execution traces (debugging/audit) go
 through the same interface, distinguished by the ``category`` field.
 Implementations:
 - MemoryRunEventStore: in-memory dict (development, tests)
 - DbRunEventStore: DB-backed store (SQLAlchemy ORM)
 - JsonlRunEventStore: JSONL file store
 """
 from __future__ import annotations
 import abc
 class RunEventStore(abc.ABC):
    """Run event stream storage interface.
    All implementations must guarantee:
    1. put() events are retrievable in subsequent queries
    2. seq is strictly increasing within the same thread
    3. list_messages() only returns category="message" events
    4. list_events() returns all events for the specified run
    5. Returned dicts match the RunEvent field structure
    """
    @abc.abstractmethod
    async def put(
        self,
        *,
        thread_id: str,
        run_id: str,
        event_type: str,
        category: str,
        content: str | dict = "",
        metadata: dict | None = None,
        created_at: str | None = None,
    ) -> dict:
        """Write an event, auto-assign seq, return the complete record."""
    @abc.abstractmethod
    async def put_batch(self, events: list[dict]) -> list[dict]:
        """Batch-write events. Used by RunJournal flush buffer.
        Each dict's keys match put()'s keyword arguments.
        Returns complete records with seq assigned.
        """
    @abc.abstractmethod
    async def list_messages(
        self,
        thread_id: str,
        *,
        limit: int = 50,
        before_seq: int | None = None,
        after_seq: int | None = None,
    ) -> list[dict]:
        """Return displayable messages (category=message) for a thread, ordered by seq ascending.
        Supports bidirectional cursor pagination:
        - before_seq: return the last ``limit`` records with seq < before_seq (ascending)
        - after_seq: return the first ``limit`` records with seq > after_seq (ascending)
        - neither: return the latest ``limit`` records (ascending)
        """
    @abc.abstractmethod
    async def list_events(
        self,
        thread_id: str,
        run_id: str,
        *,
        event_types: list[str] | None = None,
        limit: int = 500,
    ) -> list[dict]:
        """Return the full event stream for a run, ordered by seq ascending.
        Optionally filter by event_types.
        """
    @abc.abstractmethod
    async def list_messages_by_run(
        self,
        thread_id: str,
        run_id: str,
        *,
        limit: int = 50,
        before_seq: int | None = None,
        after_seq: int | None = None,
    ) -> list[dict]:
        """Return displayable messages (category=message) for a specific run, ordered by seq ascending.
        Supports bidirectional cursor pagination:
        - after_seq: return the first ``limit`` records with seq > after_seq (ascending)
        - before_seq: return the last ``limit`` records with seq < before_seq (ascending)
        - neither: return the latest ``limit`` records (ascending)
        """
    @abc.abstractmethod
    async def count_messages(self, thread_id: str) -> int:
        """Count displayable messages (category=message) in a thread."""
    @abc.abstractmethod
    async def delete_by_thread(self, thread_id: str) -> int:
        """Delete all events for a thread. Return the number of deleted events."""
    @abc.abstractmethod
    async def delete_by_run(self, thread_id: str, run_id: str) -> int:
        """Delete all events for a specific run. Return the number of deleted events."""
@@ -0,0 +1,286 @@
 """SQLAlchemy-backed RunEventStore implementation.
 Persists events to the ``run_events`` table. Trace content is truncated
 at ``max_trace_content`` bytes to avoid bloating the database.
 """
 from __future__ import annotations
 import json
 import logging
 from datetime import UTC, datetime
 from sqlalchemy import delete, func, select
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
 from deerflow.persistence.models.run_event import RunEventRow
 from deerflow.runtime.events.store.base import RunEventStore
 from deerflow.runtime.user_context import AUTO, _AutoSentinel, get_current_user, resolve_user_id
 logger = logging.getLogger(__name__)
 class DbRunEventStore(RunEventStore):
    def __init__(self, session_factory: async_sessionmaker[AsyncSession], *, max_trace_content: int = 10240):
        self._sf = session_factory
        self._max_trace_content = max_trace_content
    @staticmethod
    def _row_to_dict(row: RunEventRow) -> dict:
        d = row.to_dict()
        # Fall back to an empty dict so the .get() below is safe even if the column is NULL.
        d["metadata"] = d.pop("event_metadata", None) or {}
        val = d.get("created_at")
        if isinstance(val, datetime):
            d["created_at"] = val.isoformat()
        d.pop("id", None)
        # Restore dict content that was JSON-serialized on write
        raw = d.get("content", "")
        if isinstance(raw, str) and d.get("metadata", {}).get("content_is_dict"):
            try:
                d["content"] = json.loads(raw)
            except (json.JSONDecodeError, ValueError):
                # Content looked like JSON (content_is_dict flag) but failed to parse;
                # keep the raw string as-is.
                logger.debug("Failed to deserialize content as JSON for event seq=%s", d.get("seq"))
        return d
    def _truncate_trace(self, category: str, content: str | dict, metadata: dict | None) -> tuple[str | dict, dict]:
        if category == "trace":
            text = json.dumps(content, default=str, ensure_ascii=False) if isinstance(content, dict) else content
            encoded = text.encode("utf-8")
            if len(encoded) > self._max_trace_content:
                # Truncate by bytes, then decode back (may cut a multi-byte char, so use errors="ignore")
                content = encoded[: self._max_trace_content].decode("utf-8", errors="ignore")
                metadata = {**(metadata or {}), "content_truncated": True, "original_byte_length": len(encoded)}
        return content, metadata or {}
    @staticmethod
    def _user_id_from_context() -> str | None:
        """Soft read of user_id from contextvar for write paths.
        Returns ``None`` (no filter / no stamp) if contextvar is unset,
        which is the expected case for background worker writes. HTTP
        request writes will have the contextvar set by auth middleware
        and get their user_id stamped automatically.
        Coerces ``user.id`` to ``str`` at the boundary: ``User.id`` is
        typed as ``UUID`` by the auth layer, but ``run_events.user_id``
        is ``VARCHAR(64)`` and aiosqlite cannot bind a raw UUID object
        to a VARCHAR column ("type 'UUID' is not supported") — the
        INSERT would silently roll back and the worker would hang.
        """
        user = get_current_user()
        return str(user.id) if user is not None else None
    async def put(self, *, thread_id, run_id, event_type, category, content="", metadata=None, created_at=None):  # noqa: D401
        """Write a single event — low-frequency path only.
        This opens a dedicated transaction with a FOR UPDATE lock to
        assign a monotonic *seq*.  For high-throughput writes use
        :meth:`put_batch`, which acquires the lock once for the whole
        batch.  Currently the only caller is ``worker.run_agent`` for
        the initial ``human_message`` event (once per run).
        """
        content, metadata = self._truncate_trace(category, content, metadata)
        if isinstance(content, dict):
            db_content = json.dumps(content, default=str, ensure_ascii=False)
            metadata = {**(metadata or {}), "content_is_dict": True}
        else:
            db_content = content
        user_id = self._user_id_from_context()
        async with self._sf() as session:
            async with session.begin():
                # Use FOR UPDATE to serialize seq assignment within a thread.
                # NOTE: with_for_update() on aggregates is a no-op on SQLite;
                # the UNIQUE(thread_id, seq) constraint catches races there.
                max_seq = await session.scalar(select(func.max(RunEventRow.seq)).where(RunEventRow.thread_id == thread_id).with_for_update())
                seq = (max_seq or 0) + 1
                row = RunEventRow(
                    thread_id=thread_id,
                    run_id=run_id,
                    user_id=user_id,
                    event_type=event_type,
                    category=category,
                    content=db_content,
                    event_metadata=metadata,
                    seq=seq,
                    created_at=datetime.fromisoformat(created_at) if created_at else datetime.now(UTC),
                )
                session.add(row)
            return self._row_to_dict(row)
    async def put_batch(self, events):
        if not events:
            return []
        user_id = self._user_id_from_context()
        async with self._sf() as session:
            async with session.begin():
                # Get max seq for the thread (assume all events in batch belong to same thread).
                # NOTE: with_for_update() on aggregates is a no-op on SQLite;
                # the UNIQUE(thread_id, seq) constraint catches races there.
                thread_id = events[0]["thread_id"]
                max_seq = await session.scalar(select(func.max(RunEventRow.seq)).where(RunEventRow.thread_id == thread_id).with_for_update())
                seq = max_seq or 0
                rows = []
                for e in events:
                    seq += 1
                    content = e.get("content", "")
                    category = e.get("category", "trace")
                    metadata = e.get("metadata")
                    content, metadata = self._truncate_trace(category, content, metadata)
                    if isinstance(content, dict):
                        db_content = json.dumps(content, default=str, ensure_ascii=False)
                        metadata = {**(metadata or {}), "content_is_dict": True}
                    else:
                        db_content = content
                    row = RunEventRow(
                        thread_id=e["thread_id"],
                        run_id=e["run_id"],
                        user_id=e.get("user_id", user_id),
                        event_type=e["event_type"],
                        category=category,
                        content=db_content,
                        event_metadata=metadata,
                        seq=seq,
                        created_at=datetime.fromisoformat(e["created_at"]) if e.get("created_at") else datetime.now(UTC),
                    )
                    session.add(row)
                    rows.append(row)
            return [self._row_to_dict(r) for r in rows]
    async def list_messages(
        self,
        thread_id,
        *,
        limit=50,
        before_seq=None,
        after_seq=None,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="DbRunEventStore.list_messages")
        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.category == "message")
        if resolved_user_id is not None:
            stmt = stmt.where(RunEventRow.user_id == resolved_user_id)
        if before_seq is not None:
            stmt = stmt.where(RunEventRow.seq < before_seq)
        if after_seq is not None:
            stmt = stmt.where(RunEventRow.seq > after_seq)
        if after_seq is not None:
            # Forward pagination: first `limit` records after cursor
            stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
            async with self._sf() as session:
                result = await session.execute(stmt)
                return [self._row_to_dict(r) for r in result.scalars()]
        else:
            # before_seq or default (latest): take last `limit` records, return ascending
            stmt = stmt.order_by(RunEventRow.seq.desc()).limit(limit)
            async with self._sf() as session:
                result = await session.execute(stmt)
                rows = list(result.scalars())
                return [self._row_to_dict(r) for r in reversed(rows)]
    async def list_events(
        self,
        thread_id,
        run_id,
        *,
        event_types=None,
        limit=500,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="DbRunEventStore.list_events")
        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id)
        if resolved_user_id is not None:
            stmt = stmt.where(RunEventRow.user_id == resolved_user_id)
        if event_types:
            stmt = stmt.where(RunEventRow.event_type.in_(event_types))
        stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
        async with self._sf() as session:
            result = await session.execute(stmt)
            return [self._row_to_dict(r) for r in result.scalars()]
    async def list_messages_by_run(
        self,
        thread_id,
        run_id,
        *,
        limit=50,
        before_seq=None,
        after_seq=None,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="DbRunEventStore.list_messages_by_run")
        stmt = select(RunEventRow).where(
            RunEventRow.thread_id == thread_id,
            RunEventRow.run_id == run_id,
            RunEventRow.category == "message",
        )
        if resolved_user_id is not None:
            stmt = stmt.where(RunEventRow.user_id == resolved_user_id)
        if before_seq is not None:
            stmt = stmt.where(RunEventRow.seq < before_seq)
        if after_seq is not None:
            stmt = stmt.where(RunEventRow.seq > after_seq)
        if after_seq is not None:
            stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
            async with self._sf() as session:
                result = await session.execute(stmt)
                return [self._row_to_dict(r) for r in result.scalars()]
        else:
            stmt = stmt.order_by(RunEventRow.seq.desc()).limit(limit)
            async with self._sf() as session:
                result = await session.execute(stmt)
                rows = list(result.scalars())
                return [self._row_to_dict(r) for r in reversed(rows)]
    async def count_messages(
        self,
        thread_id,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="DbRunEventStore.count_messages")
        stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.category == "message")
        if resolved_user_id is not None:
            stmt = stmt.where(RunEventRow.user_id == resolved_user_id)
        async with self._sf() as session:
            return await session.scalar(stmt) or 0
    async def delete_by_thread(
        self,
        thread_id,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="DbRunEventStore.delete_by_thread")
        async with self._sf() as session:
            count_conditions = [RunEventRow.thread_id == thread_id]
            if resolved_user_id is not None:
                count_conditions.append(RunEventRow.user_id == resolved_user_id)
            count_stmt = select(func.count()).select_from(RunEventRow).where(*count_conditions)
            count = await session.scalar(count_stmt) or 0
            if count > 0:
                await session.execute(delete(RunEventRow).where(*count_conditions))
                await session.commit()
            return count
    async def delete_by_run(
        self,
        thread_id,
        run_id,
        *,
        user_id: str | None | _AutoSentinel = AUTO,
    ):
        resolved_user_id = resolve_user_id(user_id, method_name="DbRunEventStore.delete_by_run")
        async with self._sf() as session:
            count_conditions = [RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id]
            if resolved_user_id is not None:
                count_conditions.append(RunEventRow.user_id == resolved_user_id)
            count_stmt = select(func.count()).select_from(RunEventRow).where(*count_conditions)
            count = await session.scalar(count_stmt) or 0
            if count > 0:
                await session.execute(delete(RunEventRow).where(*count_conditions))
                await session.commit()
            return count
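
The byte-based truncation used by `_truncate_trace` above can be sketched in isolation. This is a minimal illustration, not the store's code: `truncate_utf8` is a hypothetical helper name, and it returns a flag in place of the metadata keys the store sets.

```python
def truncate_utf8(text: str, max_bytes: int) -> tuple[str, bool]:
    """Cut text to at most max_bytes of UTF-8 and report whether it was cut.

    Hypothetical helper for illustration only.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text, False
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore"), True


short, cut_short = truncate_utf8("abc", 10)
accented, cut_accented = truncate_utf8("h\u00e9llo", 2)  # the accented e takes 2 bytes
cjk, cut_cjk = truncate_utf8("\u65e5\u672c\u8a9e", 4)  # each character takes 3 bytes
```

Because the limit is in bytes, the result can hold fewer characters than `max_bytes` whenever the text contains non-ASCII characters.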
@@ -0,0 +1,187 @@
 """JSONL file-backed RunEventStore implementation.
 Each run's events are stored in a single file:
 ``.deer-flow/threads/{thread_id}/runs/{run_id}.jsonl``
 All categories (message, trace, lifecycle) are in the same file.
 This backend is suitable for lightweight single-node deployments.
 Known trade-off: ``list_messages()`` must scan all run files for a
 thread since messages from multiple runs need unified seq ordering.
 ``list_events()`` reads only one file -- the fast path.
 """
 from __future__ import annotations
 import json
 import logging
 import re
 from datetime import UTC, datetime
 from pathlib import Path
 from deerflow.runtime.events.store.base import RunEventStore
 logger = logging.getLogger(__name__)
 _SAFE_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")
 class JsonlRunEventStore(RunEventStore):
    def __init__(self, base_dir: str | Path | None = None):
        self._base_dir = Path(base_dir) if base_dir else Path(".deer-flow")
        self._seq_counters: dict[str, int] = {}  # thread_id -> current max seq
    @staticmethod
    def _validate_id(value: str, label: str) -> str:
        """Validate that an ID is safe for use in filesystem paths."""
        # fullmatch() rejects a trailing newline, which match() with "$" would accept.
        if not value or not _SAFE_ID_PATTERN.fullmatch(value):
            raise ValueError(f"Invalid {label}: must be alphanumeric/dash/underscore, got {value!r}")
        return value
    def _thread_dir(self, thread_id: str) -> Path:
        self._validate_id(thread_id, "thread_id")
        return self._base_dir / "threads" / thread_id / "runs"
    def _run_file(self, thread_id: str, run_id: str) -> Path:
        self._validate_id(run_id, "run_id")
        return self._thread_dir(thread_id) / f"{run_id}.jsonl"
    def _next_seq(self, thread_id: str) -> int:
        self._seq_counters[thread_id] = self._seq_counters.get(thread_id, 0) + 1
        return self._seq_counters[thread_id]
    def _ensure_seq_loaded(self, thread_id: str) -> None:
        """Load max seq from existing files if not yet cached."""
        if thread_id in self._seq_counters:
            return
        max_seq = 0
        thread_dir = self._thread_dir(thread_id)
        if thread_dir.exists():
            for f in thread_dir.glob("*.jsonl"):
                for line in f.read_text(encoding="utf-8").strip().splitlines():
                    try:
                        record = json.loads(line)
                        max_seq = max(max_seq, record.get("seq", 0))
                    except json.JSONDecodeError:
                        logger.debug("Skipping malformed JSONL line in %s", f)
                        continue
        self._seq_counters[thread_id] = max_seq
    def _write_record(self, record: dict) -> None:
        path = self._run_file(record["thread_id"], record["run_id"])
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, default=str, ensure_ascii=False) + "\n")
    def _read_thread_events(self, thread_id: str) -> list[dict]:
        """Read all events for a thread, sorted by seq."""
        events = []
        thread_dir = self._thread_dir(thread_id)
        if not thread_dir.exists():
            return events
        for f in sorted(thread_dir.glob("*.jsonl")):
            for line in f.read_text(encoding="utf-8").strip().splitlines():
                if not line:
                    continue
                try:
                    events.append(json.loads(line))
                except json.JSONDecodeError:
                    logger.debug("Skipping malformed JSONL line in %s", f)
                    continue
        events.sort(key=lambda e: e.get("seq", 0))
        return events
    def _read_run_events(self, thread_id: str, run_id: str) -> list[dict]:
        """Read events for a specific run file."""
        path = self._run_file(thread_id, run_id)
        if not path.exists():
            return []
        events = []
        for line in path.read_text(encoding="utf-8").strip().splitlines():
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                logger.debug("Skipping malformed JSONL line in %s", path)
                continue
        events.sort(key=lambda e: e.get("seq", 0))
        return events
    async def put(self, *, thread_id, run_id, event_type, category, content="", metadata=None, created_at=None):
        self._ensure_seq_loaded(thread_id)
        seq = self._next_seq(thread_id)
        record = {
            "thread_id": thread_id,
            "run_id": run_id,
            "event_type": event_type,
            "category": category,
            "content": content,
            "metadata": metadata or {},
            "seq": seq,
            "created_at": created_at or datetime.now(UTC).isoformat(),
        }
        self._write_record(record)
        return record
    async def put_batch(self, events):
        if not events:
            return []
        results = []
        for ev in events:
            record = await self.put(**ev)
            results.append(record)
        return results
    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
        all_events = self._read_thread_events(thread_id)
        messages = [e for e in all_events if e.get("category") == "message"]
        if before_seq is not None:
            messages = [e for e in messages if e["seq"] < before_seq]
            return messages[-limit:]
        elif after_seq is not None:
            messages = [e for e in messages if e["seq"] > after_seq]
            return messages[:limit]
        else:
            return messages[-limit:]
    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
        events = self._read_run_events(thread_id, run_id)
        if event_types is not None:
            events = [e for e in events if e.get("event_type") in event_types]
        return events[:limit]
    async def list_messages_by_run(self, thread_id, run_id, *, limit=50, before_seq=None, after_seq=None):
        events = self._read_run_events(thread_id, run_id)
        filtered = [e for e in events if e.get("category") == "message"]
        if before_seq is not None:
            filtered = [e for e in filtered if e.get("seq", 0) < before_seq]
        if after_seq is not None:
            filtered = [e for e in filtered if e.get("seq", 0) > after_seq]
        if after_seq is not None:
            return filtered[:limit]
        else:
            return filtered[-limit:] if len(filtered) > limit else filtered
    async def count_messages(self, thread_id):
        all_events = self._read_thread_events(thread_id)
        return sum(1 for e in all_events if e.get("category") == "message")
    async def delete_by_thread(self, thread_id):
        all_events = self._read_thread_events(thread_id)
        count = len(all_events)
        thread_dir = self._thread_dir(thread_id)
        if thread_dir.exists():
            for f in thread_dir.glob("*.jsonl"):
                f.unlink()
        self._seq_counters.pop(thread_id, None)
        return count
    async def delete_by_run(self, thread_id, run_id):
        events = self._read_run_events(thread_id, run_id)
        count = len(events)
        path = self._run_file(thread_id, run_id)
        if path.exists():
            path.unlink()
        return count
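
The seq recovery done by `_ensure_seq_loaded` above amounts to a tolerant scan over JSONL lines. The sketch below shows that scan on plain strings. `max_seq_from_lines` is a hypothetical name, and the `isinstance` guard is an extra precaution that the store itself does not take.

```python
import json


def max_seq_from_lines(lines: list[str]) -> int:
    """Return the highest "seq" found in JSONL lines, or 0 if there is none.

    Hypothetical helper for illustration only.
    """
    best = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines carry no record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the whole scan
        if isinstance(record, dict):
            best = max(best, record.get("seq", 0))
    return best


recovered = max_seq_from_lines(['{"seq": 3}', "not json", "", '{"seq": 7}', '{"x": 1}'])
empty = max_seq_from_lines([])
```

Starting from 0 means the first event written to a new thread gets `seq` 1, which matches the counter logic in `_next_seq`.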
@@ -0,0 +1,128 @@
 """In-memory RunEventStore. Used when run_events.backend=memory (default) and in tests.
 Thread-safe for single-process async usage (no threading locks needed
 since all mutations happen within the same event loop).
 """
 from __future__ import annotations
 from datetime import UTC, datetime
 from deerflow.runtime.events.store.base import RunEventStore
 class MemoryRunEventStore(RunEventStore):
    def __init__(self) -> None:
        self._events: dict[str, list[dict]] = {}  # thread_id -> sorted event list
        self._seq_counters: dict[str, int] = {}  # thread_id -> last assigned seq
    def _next_seq(self, thread_id: str) -> int:
        current = self._seq_counters.get(thread_id, 0)
        next_val = current + 1
        self._seq_counters[thread_id] = next_val
        return next_val
    def _put_one(
        self,
        *,
        thread_id: str,
        run_id: str,
        event_type: str,
        category: str,
        content: str | dict = "",
        metadata: dict | None = None,
        created_at: str | None = None,
    ) -> dict:
        seq = self._next_seq(thread_id)
        record = {
            "thread_id": thread_id,
            "run_id": run_id,
            "event_type": event_type,
            "category": category,
            "content": content,
            "metadata": metadata or {},
            "seq": seq,
            "created_at": created_at or datetime.now(UTC).isoformat(),
        }
        self._events.setdefault(thread_id, []).append(record)
        return record
    async def put(
        self,
        *,
        thread_id,
        run_id,
        event_type,
        category,
        content="",
        metadata=None,
        created_at=None,
    ):
        return self._put_one(
            thread_id=thread_id,
            run_id=run_id,
            event_type=event_type,
            category=category,
            content=content,
            metadata=metadata,
            created_at=created_at,
        )
    async def put_batch(self, events):
        results = []
        for ev in events:
            record = self._put_one(**ev)
            results.append(record)
        return results
    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
        all_events = self._events.get(thread_id, [])
        messages = [e for e in all_events if e["category"] == "message"]
        if before_seq is not None:
            messages = [e for e in messages if e["seq"] < before_seq]
            # Take the last `limit` records
            return messages[-limit:]
        elif after_seq is not None:
            messages = [e for e in messages if e["seq"] > after_seq]
            return messages[:limit]
        else:
            # Return the latest `limit` records, ascending
            return messages[-limit:]
    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
        all_events = self._events.get(thread_id, [])
        filtered = [e for e in all_events if e["run_id"] == run_id]
        if event_types is not None:
            filtered = [e for e in filtered if e["event_type"] in event_types]
        return filtered[:limit]
    async def list_messages_by_run(self, thread_id, run_id, *, limit=50, before_seq=None, after_seq=None):
        all_events = self._events.get(thread_id, [])
        filtered = [e for e in all_events if e["run_id"] == run_id and e["category"] == "message"]
        if before_seq is not None:
            filtered = [e for e in filtered if e["seq"] < before_seq]
        if after_seq is not None:
            filtered = [e for e in filtered if e["seq"] > after_seq]
        if after_seq is not None:
            return filtered[:limit]
        else:
            return filtered[-limit:] if len(filtered) > limit else filtered
    async def count_messages(self, thread_id):
        all_events = self._events.get(thread_id, [])
        return sum(1 for e in all_events if e["category"] == "message")
    async def delete_by_thread(self, thread_id):
        events = self._events.pop(thread_id, [])
        self._seq_counters.pop(thread_id, None)
        return len(events)
    async def delete_by_run(self, thread_id, run_id):
        all_events = self._events.get(thread_id, [])
        if not all_events:
            return 0
        remaining = [e for e in all_events if e["run_id"] != run_id]
        removed = len(all_events) - len(remaining)
        self._events[thread_id] = remaining
        return removed
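
The cursor semantics documented on `RunEventStore.list_messages` can be summarized as a pure function over a list already sorted by `seq`. This is a sketch under assumptions: `paginate` is a hypothetical name, `before_seq` wins when both cursors are passed (the interface does not specify that case), and a non-positive `limit` returns an empty list.

```python
def paginate(messages, *, limit=50, before_seq=None, after_seq=None):
    """Select a page from messages sorted by seq ascending.

    Hypothetical helper for illustration only.
    """
    if limit <= 0:
        return []  # guard: messages[-0:] would otherwise return everything
    if before_seq is not None:
        # Backward pagination: the last `limit` records before the cursor.
        older = [m for m in messages if m["seq"] < before_seq]
        return older[-limit:]
    if after_seq is not None:
        # Forward pagination: the first `limit` records after the cursor.
        newer = [m for m in messages if m["seq"] > after_seq]
        return newer[:limit]
    # No cursor: the latest `limit` records, still in ascending order.
    return messages[-limit:]


history = [{"seq": i} for i in range(1, 11)]
latest = [m["seq"] for m in paginate(history, limit=3)]
older_page = [m["seq"] for m in paginate(history, limit=3, before_seq=5)]
newer_page = [m["seq"] for m in paginate(history, limit=3, after_seq=5)]
```

Every branch returns records in ascending order, so a client can prepend an older page or append a newer page without re-sorting.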
@@ -0,0 +1,374 @@
 """Run event capture via LangChain callbacks.
 RunJournal sits between LangChain's callback mechanism and the pluggable
 RunEventStore. It standardizes callback data into RunEvent records and
 handles token usage accumulation.
 Key design decisions:
 - on_llm_new_token is NOT implemented -- only complete messages via on_llm_end
 - on_chat_model_start captures structured prompts as llm_request (OpenAI format) and
  extracts the first human message for run.input. It is more reliable than
  on_chain_start, which fires on every node; messages here are fully structured.
 - on_chain_start with parent_run_id=None emits a run.start trace marking root invocation.
 - on_llm_end emits llm_response in OpenAI Chat Completions format
 - Token usage accumulated in memory, written to RunRow on run completion
 - Caller identification via tags injection (lead_agent / subagent:{name} / middleware:{name})
 """
 from __future__ import annotations
 import asyncio
 import logging
 import time
 from datetime import UTC, datetime
 from typing import TYPE_CHECKING, Any, cast
 from uuid import UUID
 from langchain_core.callbacks import BaseCallbackHandler
 from langchain_core.messages import AnyMessage, BaseMessage, HumanMessage, ToolMessage
 from langgraph.types import Command
 if TYPE_CHECKING:
    from deerflow.runtime.events.store.base import RunEventStore
 logger = logging.getLogger(__name__)
 class RunJournal(BaseCallbackHandler):
    """LangChain callback handler that captures events to RunEventStore."""
    def __init__(
        self,
        run_id: str,
        thread_id: str,
        event_store: RunEventStore,
        *,
        track_token_usage: bool = True,
        flush_threshold: int = 20,
    ):
        super().__init__()
        self.run_id = run_id
        self.thread_id = thread_id
        self._store = event_store
        self._track_tokens = track_token_usage
        self._flush_threshold = flush_threshold
        # Write buffer
        self._buffer: list[dict] = []
        self._pending_flush_tasks: set[asyncio.Task[None]] = set()
        # Token accumulators
        self._total_input_tokens = 0
        self._total_output_tokens = 0
        self._total_tokens = 0
        self._llm_call_count = 0
        self._lead_agent_tokens = 0
        self._subagent_tokens = 0
        self._middleware_tokens = 0
        # Convenience fields
        self._last_ai_msg: str | None = None
        self._first_human_msg: str | None = None
        self._msg_count = 0
        # Latency tracking
        self._llm_start_times: dict[str, float] = {}  # langchain run_id -> start time
        # LLM request/response tracking
        self._llm_call_index = 0
        self._cached_prompts: dict[str, list[dict]] = {}  # langchain run_id -> OpenAI messages
    # -- Lifecycle callbacks --
    def on_chain_start(
        self,
        serialized: dict[str, Any],
        inputs: dict[str, Any],
        *,
        run_id: UUID,
        parent_run_id: UUID | None = None,
        tags: list[str] | None = None,
        metadata: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        caller = self._identify_caller(tags)
        if parent_run_id is None:
            # Root graph invocation — emit a single trace event for the run start.
            chain_name = (serialized or {}).get("name", "unknown")
            self._put(
                event_type="run.start",
                category="trace",
                content={"chain": chain_name},
                metadata={"caller": caller, **(metadata or {})},
            )
    def on_chain_end(self, outputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
        self._put(event_type="run.end", category="outputs", content=outputs, metadata={"status": "success"})
        self._flush_sync()
    def on_chain_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
        self._put(
            event_type="run.error",
            category="error",
            content=str(error),
            metadata={"error_type": type(error).__name__},
        )
        self._flush_sync()
    # -- LLM callbacks --
    def on_chat_model_start(
        self,
        serialized: dict,
        messages: list[list[BaseMessage]],
        *,
        run_id: UUID,
        tags: list[str] | None = None,
        **kwargs: Any,
    ) -> None:
        """Capture structured prompt messages for llm_request event.
        This is also the canonical place to extract the first human message:
        messages are fully structured here, it fires only on real LLM calls,
        and the content is never compressed by checkpoint trimming.
        """
        rid = str(run_id)
        self._llm_start_times[rid] = time.monotonic()
        self._llm_call_index += 1
        # Mark this run_id as seen so on_llm_end knows not to increment again.
        self._cached_prompts[rid] = []
        logger.info(f"on_chat_model_start {run_id}: tags={tags} serialized={serialized} messages={messages}")
        # Capture the first human message sent to any LLM in this run.
        if not self._first_human_msg:
            for batch in reversed(messages):
                for m in reversed(batch):
                    if isinstance(m, HumanMessage) and m.name != "summary":
                        caller = self._identify_caller(tags)
                        self.set_first_human_message(m.text)
                        self._put(
                            event_type="llm.human.input",
                            category="message",
                            content=m.model_dump(),
                            metadata={"caller": caller},
                        )
                        break
                if self._first_human_msg:
                    break
    def on_llm_start(self, serialized: dict, prompts: list[str], *, run_id: UUID, parent_run_id: UUID | None = None, tags: list[str] | None = None, metadata: dict[str, Any] | None = None, **kwargs: Any) -> None:
        # Fallback: on_chat_model_start is preferred. This just tracks latency.
        self._llm_start_times[str(run_id)] = time.monotonic()
    def on_llm_end(self, response: Any, *, run_id: UUID, parent_run_id: UUID | None = None, tags: list[str] | None = None, **kwargs: Any) -> None:
        messages: list[AnyMessage] = []
        logger.info(f"on_llm_end {run_id}: tags={tags} kwargs={kwargs}")
        for generation in response.generations:
            for gen in generation:
                if hasattr(gen, "message"):
                    messages.append(gen.message)
                else:
                    logger.warning(f"on_llm_end {run_id}: generation has no message attribute: {gen}")
        for message in messages:
            caller = self._identify_caller(tags)
            # Latency
            rid = str(run_id)
            start = self._llm_start_times.pop(rid, None)
            latency_ms = int((time.monotonic() - start) * 1000) if start is not None else None
            # Token usage from message
            usage = getattr(message, "usage_metadata", None)
            usage_dict = dict(usage) if usage else {}
            # Resolve call index
            call_index = self._llm_call_index
            if rid not in self._cached_prompts:
                # Fallback: on_chat_model_start was not called
                self._llm_call_index += 1
                call_index = self._llm_call_index
            # Message event: llm.ai.response (serialized with message.model_dump())
            self._put(
                event_type="llm.ai.response",
                category="message",
                content=message.model_dump(),
                metadata={
                    "caller": caller,
                    "usage": usage_dict,
                    "latency_ms": latency_ms,
                    "llm_call_index": call_index,
                },
            )
            # Token accumulation
            if self._track_tokens:
                input_tk = usage_dict.get("input_tokens", 0) or 0
                output_tk = usage_dict.get("output_tokens", 0) or 0
                total_tk = usage_dict.get("total_tokens", 0) or 0
                if total_tk == 0:
                    total_tk = input_tk + output_tk
                if total_tk > 0:
                    self._total_input_tokens += input_tk
                    self._total_output_tokens += output_tk
                    self._total_tokens += total_tk
                    self._llm_call_count += 1
    def on_llm_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
        self._llm_start_times.pop(str(run_id), None)
        self._put(event_type="llm.error", category="trace", content=str(error))
    def on_tool_start(self, serialized, input_str, *, run_id, parent_run_id=None, tags=None, metadata=None, inputs=None, **kwargs):
        """Log the tool start; the LangChain run_id is used as the tool call ID."""
        tool_call_id = str(run_id)
        logger.info(f"Tool start for node {run_id}, tool_call_id={tool_call_id}, tags={tags}, metadata={metadata}")
    def on_tool_end(self, output, *, run_id, parent_run_id=None, **kwargs):
        """Record tool results (a ToolMessage, or the messages in a Command update) as events."""
        try:
            if isinstance(output, ToolMessage):
                msg = cast(ToolMessage, output)
                self._put(event_type="llm.tool.result", category="message", content=msg.model_dump())
            elif isinstance(output, Command):
                cmd = cast(Command, output)
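                # Command.update may be None or a non-dict; only dict updates carry "messages".
                update = cmd.update if isinstance(cmd.update, dict) else {}
                messages = update.get("messages", [])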
                for message in messages:
                    if isinstance(message, BaseMessage):
                        self._put(event_type="llm.tool.result", category="message", content=message.model_dump())
                    else:
                        logger.warning(f"on_tool_end {run_id}: command update message is not BaseMessage: {type(message)}")
            else:
                logger.warning(f"on_tool_end {run_id}: output is not ToolMessage: {type(output)}")
        finally:
            logger.info(f"Tool end for node {run_id}")
    # -- Internal methods --
    def _put(self, *, event_type: str, category: str, content: str | dict = "", metadata: dict | None = None) -> None:
        self._buffer.append(
            {
                "thread_id": self.thread_id,
                "run_id": self.run_id,
                "event_type": event_type,
                "category": category,
                "content": content,
                "metadata": metadata or {},
                "created_at": datetime.now(UTC).isoformat(),
            }
        )
        if len(self._buffer) >= self._flush_threshold:
            self._flush_sync()
    def _flush_sync(self) -> None:
        """Best-effort flush of buffer to RunEventStore.
        BaseCallbackHandler methods are synchronous.  If an event loop is
        running we schedule an async ``put_batch``; otherwise the events
        stay in the buffer and are flushed later by the async ``flush()``
        call in the worker's ``finally`` block.
        """
        if not self._buffer:
            return
        # Skip if a flush is already in flight — avoids concurrent writes
        # to the same SQLite file from multiple fire-and-forget tasks.
        if self._pending_flush_tasks:
            return
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No event loop — keep events in buffer for later async flush.
            return
        batch = self._buffer.copy()
        self._buffer.clear()
        task = loop.create_task(self._flush_async(batch))
        self._pending_flush_tasks.add(task)
        task.add_done_callback(self._on_flush_done)
    async def _flush_async(self, batch: list[dict]) -> None:
        try:
            await self._store.put_batch(batch)
        except Exception:
            logger.warning(
                "Failed to flush %d events for run %s — returning to buffer",
                len(batch),
                self.run_id,
                exc_info=True,
            )
            # Return failed events to buffer for retry on next flush
            self._buffer = batch + self._buffer
    def _on_flush_done(self, task: asyncio.Task) -> None:
        self._pending_flush_tasks.discard(task)
        if task.cancelled():
            return
        exc = task.exception()
        if exc:
            logger.warning("Journal flush task failed: %s", exc)
    def _identify_caller(self, tags: list[str] | None, **kwargs) -> str:
        _tags = tags or kwargs.get("tags", [])
        for tag in _tags:
            if isinstance(tag, str) and (tag.startswith("subagent:") or tag.startswith("middleware:") or tag == "lead_agent"):
                return tag
        # Default to lead_agent: the main agent graph does not inject
        # callback tags, while subagents and middleware explicitly tag
        # themselves.
        return "lead_agent"
    # -- Public methods (called by worker) --
    def set_first_human_message(self, content: str) -> None:
        """Record the first human message for convenience fields."""
        self._first_human_msg = content[:2000] if content else None
    def record_middleware(self, tag: str, *, name: str, hook: str, action: str, changes: dict) -> None:
        """Record a middleware state-change event.
        Called by middleware implementations when they perform a meaningful
        state change (e.g., title generation, summarization, HITL approval).
        Pure-observation middleware should not call this.
        Args:
            tag: Short identifier for the middleware (e.g., "title", "summarize",
                 "guardrail"). Used to form event_type="middleware:{tag}".
            name: Full middleware class name.
            hook: Lifecycle hook that triggered the action (e.g., "after_model").
            action: Specific action performed (e.g., "generate_title").
            changes: Dict describing the state changes made.
        """
        self._put(
            event_type=f"middleware:{tag}",
            category="middleware",
            content={"name": name, "hook": hook, "action": action, "changes": changes},
        )
    async def flush(self) -> None:
        """Force flush remaining buffer. Called in worker's finally block."""
        if self._pending_flush_tasks:
            await asyncio.gather(*tuple(self._pending_flush_tasks), return_exceptions=True)
        while self._buffer:
            batch = self._buffer[: self._flush_threshold]
            del self._buffer[: self._flush_threshold]
            try:
                await self._store.put_batch(batch)
            except Exception:
                self._buffer = batch + self._buffer
                raise
    def get_completion_data(self) -> dict:
        """Return accumulated token and message data for run completion."""
        return {
            "total_input_tokens": self._total_input_tokens,
            "total_output_tokens": self._total_output_tokens,
            "total_tokens": self._total_tokens,
            "llm_call_count": self._llm_call_count,
            "lead_agent_tokens": self._lead_agent_tokens,
            "subagent_tokens": self._subagent_tokens,
            "middleware_tokens": self._middleware_tokens,
            "message_count": self._msg_count,
            "last_ai_message": self._last_ai_msg,
            "first_human_message": self._first_human_msg,
        }
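The buffering scheme in `_put`, `_flush_sync`, and `flush()` can be illustrated in isolation. The following is a minimal sketch, not the RunJournal implementation: `FakeStore` and `BufferedJournal` are hypothetical stand-ins, and only standard `asyncio` calls are used. It shows a synchronous callback scheduling an async batch write when a loop is running, and the final `flush()` draining whatever is left.

```python
import asyncio


class FakeStore:
    """Hypothetical stand-in for an event store with an async put_batch()."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    async def put_batch(self, batch: list[dict]) -> None:
        self.rows.extend(batch)


class BufferedJournal:
    """Illustrative only: buffers events and flushes them in batches."""

    def __init__(self, store: FakeStore, flush_threshold: int = 2) -> None:
        self._store = store
        self._buffer: list[dict] = []
        self._flush_threshold = flush_threshold
        self._pending: set[asyncio.Task] = set()

    def put(self, event: dict) -> None:
        # Synchronous entry point, as callback handlers require.
        self._buffer.append(event)
        if len(self._buffer) >= self._flush_threshold:
            self._flush_sync()

    def _flush_sync(self) -> None:
        if not self._buffer or self._pending:
            return  # nothing to do, or a flush is already in flight
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            return  # no loop: keep events for the final async flush
        batch, self._buffer = self._buffer, []
        task = loop.create_task(self._store.put_batch(batch))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def flush(self) -> None:
        # Wait for in-flight writes, then drain the remainder in order.
        if self._pending:
            await asyncio.gather(*tuple(self._pending), return_exceptions=True)
        while self._buffer:
            batch = self._buffer[: self._flush_threshold]
            del self._buffer[: self._flush_threshold]
            await self._store.put_batch(batch)


async def demo() -> list[int]:
    store = FakeStore()
    journal = BufferedJournal(store)
    for i in range(5):
        journal.put({"seq": i})
    await journal.flush()
    return [row["seq"] for row in store.rows]


result = asyncio.run(demo())
```

Skipping a new flush while one is in flight keeps at most one write outstanding, so batches reach the store in the order they were buffered.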
@@ -2,11 +2,12 @@
 from .manager import ConflictError, RunManager, RunRecord, UnsupportedStrategyError
 from .schemas import DisconnectMode, RunStatus
-from .worker import run_agent
+from .worker import RunContext, run_agent
 __all__ = [
    "ConflictError",
    "DisconnectMode",
    "RunContext",
    "RunManager",
    "RunRecord",
    "RunStatus",
@@ -1,4 +1,4 @@
-"""In-memory run registry."""
+"""In-memory run registry with optional persistent RunStore backing."""
 from __future__ import annotations
@@ -7,9 +7,13 @@ import logging
 import uuid
 from dataclasses import dataclass, field
 from datetime import UTC, datetime
 from typing import TYPE_CHECKING
 from .schemas import DisconnectMode, RunStatus
 if TYPE_CHECKING:
    from deerflow.runtime.runs.store.base import RunStore
 logger = logging.getLogger(__name__)
@@ -38,11 +42,43 @@ class RunRecord:
 class RunManager:
-    """In-memory run registry.  All mutations are protected by an asyncio lock."""
+    """In-memory run registry with optional persistent RunStore backing.
-    def __init__(self) -> None:
+    All mutations are protected by an asyncio lock. When a ``store`` is
    provided, serializable metadata is also persisted to the store so
    that run history survives process restarts.
    """
    def __init__(self, store: RunStore | None = None) -> None:
        self._runs: dict[str, RunRecord] = {}
        self._lock = asyncio.Lock()
        self._store = store
    async def _persist_to_store(self, record: RunRecord) -> None:
        """Best-effort persist run record to backing store."""
        if self._store is None:
            return
        try:
            await self._store.put(
                record.run_id,
                thread_id=record.thread_id,
                assistant_id=record.assistant_id,
                status=record.status.value,
                multitask_strategy=record.multitask_strategy,
                metadata=record.metadata or {},
                kwargs=record.kwargs or {},
                created_at=record.created_at,
            )
        except Exception:
            logger.warning("Failed to persist run %s to store", record.run_id, exc_info=True)
    async def update_run_completion(self, run_id: str, **kwargs) -> None:
        """Persist token usage and completion data to the backing store."""
        if self._store is not None:
            try:
                await self._store.update_run_completion(run_id, **kwargs)
            except Exception:
                logger.warning("Failed to persist run completion for %s", run_id, exc_info=True)
    async def create(
        self,
@@ -71,6 +107,7 @@ class RunManager:
        )
        async with self._lock:
            self._runs[run_id] = record
        await self._persist_to_store(record)
        logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
        return record
@@ -83,7 +120,7 @@ class RunManager:
        async with self._lock:
            # Dict insertion order matches creation order, so reversing it gives
            # us deterministic newest-first results even when timestamps tie.
-            return [r for r in reversed(self._runs.values()) if r.thread_id == thread_id]
+            return [r for r in self._runs.values() if r.thread_id == thread_id]
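Why insertion order gives a stable result when timestamps tie can be shown with plain dictionaries. This is a minimal sketch, assuming simplified run records (the `runs` dict and its fields are illustrative, not the `RunRecord` class): filtering `dict.values()` preserves creation order, and Python's stable sort keeps tied records in that order as well, even with `reverse=True`.

```python
# Illustrative records: two runs on thread "t1" share the same timestamp.
runs: dict[str, dict] = {}
runs["run-a"] = {"run_id": "run-a", "thread_id": "t1", "created_at": "2026-04-12T00:00:00+00:00"}
runs["run-b"] = {"run_id": "run-b", "thread_id": "t2", "created_at": "2026-04-12T00:00:00+00:00"}
runs["run-c"] = {"run_id": "run-c", "thread_id": "t1", "created_at": "2026-04-12T00:00:00+00:00"}
runs["run-d"] = {"run_id": "run-d", "thread_id": "t1", "created_at": "2026-04-12T00:00:01+00:00"}

# Dicts iterate in insertion order, so this is creation order (oldest first).
oldest_first = [r["run_id"] for r in runs.values() if r["thread_id"] == "t1"]

# Reversing the same view yields newest first, still deterministic on ties.
newest_first = [r["run_id"] for r in reversed(runs.values()) if r["thread_id"] == "t1"]

# Sorting by timestamp is stable: tied records keep their insertion order,
# and reverse=True does not flip the order within a tie.
by_timestamp_desc = [
    r["run_id"]
    for r in sorted(
        (r for r in runs.values() if r["thread_id"] == "t1"),
        key=lambda r: r["created_at"],
        reverse=True,
    )
]
```

The last two results differ for tied records: a reversed dict view flips them, while a descending sort leaves them in insertion order. Tests that assert on ordering need to match whichever strategy the code under test uses.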
    async def set_status(self, run_id: str, status: RunStatus, *, error: str | None = None) -> None:
        """Transition a run to a new status."""
@@ -96,6 +133,11 @@ class RunManager:
            record.updated_at = _now_iso()
            if error is not None:
                record.error = error
        if self._store is not None:
            try:
                await self._store.update_status(run_id, status.value, error=error)
            except Exception:
                logger.warning("Failed to persist status update for run %s", run_id, exc_info=True)
        logger.info("Run %s -> %s", run_id, status.value)
    async def cancel(self, run_id: str, *, action: str = "interrupt") -> bool:
@@ -185,6 +227,7 @@ class RunManager:
            )
            self._runs[run_id] = record
        await self._persist_to_store(record)
        logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
        return record
@@ -0,0 +1,4 @@
 from deerflow.runtime.runs.store.base import RunStore
 from deerflow.runtime.runs.store.memory import MemoryRunStore
 __all__ = ["MemoryRunStore", "RunStore"]
@@ -0,0 +1,95 @@
 """Abstract interface for run metadata storage.
 RunManager depends on this interface. Implementations:
 - MemoryRunStore: in-memory dict (development, tests)
 - Future: RunRepository backed by SQLAlchemy ORM
 All methods accept an optional user_id for user isolation.
 When user_id is None, no user filtering is applied (single-user mode).
 """
 from __future__ import annotations
 import abc
 from typing import Any
 class RunStore(abc.ABC):
    @abc.abstractmethod
    async def put(
        self,
        run_id: str,
        *,
        thread_id: str,
        assistant_id: str | None = None,
        user_id: str | None = None,
        status: str = "pending",
        multitask_strategy: str = "reject",
        metadata: dict[str, Any] | None = None,
        kwargs: dict[str, Any] | None = None,
        error: str | None = None,
        created_at: str | None = None,
    ) -> None:
        pass
    @abc.abstractmethod
    async def get(self, run_id: str) -> dict[str, Any] | None:
        pass
    @abc.abstractmethod
    async def list_by_thread(
        self,
        thread_id: str,
        *,
        user_id: str | None = None,
        limit: int = 100,
    ) -> list[dict[str, Any]]:
        pass
    @abc.abstractmethod
    async def update_status(
        self,
        run_id: str,
        status: str,
        *,
        error: str | None = None,
    ) -> None:
        pass
    @abc.abstractmethod
    async def delete(self, run_id: str) -> None:
        pass
    @abc.abstractmethod
    async def update_run_completion(
        self,
        run_id: str,
        *,
        status: str,
        total_input_tokens: int = 0,
        total_output_tokens: int = 0,
        total_tokens: int = 0,
        llm_call_count: int = 0,
        lead_agent_tokens: int = 0,
        subagent_tokens: int = 0,
        middleware_tokens: int = 0,
        message_count: int = 0,
        last_ai_message: str | None = None,
        first_human_message: str | None = None,
        error: str | None = None,
    ) -> None:
        pass
    @abc.abstractmethod
    async def list_pending(self, *, before: str | None = None) -> list[dict[str, Any]]:
        pass
    @abc.abstractmethod
    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
        """Aggregate token usage for completed runs in a thread.
        Returns a dict with keys: total_tokens, total_input_tokens,
        total_output_tokens, total_runs, by_model (model_name → {tokens, runs}),
        by_caller ({lead_agent, subagent, middleware}).
        """
        pass
@@ -0,0 +1,98 @@
 """In-memory RunStore. Used when database.backend=memory (default) and in tests.
 Equivalent to the original RunManager._runs dict behavior.
 """
 from __future__ import annotations
 from datetime import UTC, datetime
 from typing import Any
 from deerflow.runtime.runs.store.base import RunStore
 class MemoryRunStore(RunStore):
    def __init__(self) -> None:
        self._runs: dict[str, dict[str, Any]] = {}
    async def put(
        self,
        run_id,
        *,
        thread_id,
        assistant_id=None,
        user_id=None,
        status="pending",
        multitask_strategy="reject",
        metadata=None,
        kwargs=None,
        error=None,
        created_at=None,
    ):
        now = datetime.now(UTC).isoformat()
        self._runs[run_id] = {
            "run_id": run_id,
            "thread_id": thread_id,
            "assistant_id": assistant_id,
            "user_id": user_id,
            "status": status,
            "multitask_strategy": multitask_strategy,
            "metadata": metadata or {},
            "kwargs": kwargs or {},
            "error": error,
            "created_at": created_at or now,
            "updated_at": now,
        }
    async def get(self, run_id):
        return self._runs.get(run_id)
    async def list_by_thread(self, thread_id, *, user_id=None, limit=100):
        results = [r for r in self._runs.values() if r["thread_id"] == thread_id and (user_id is None or r.get("user_id") == user_id)]
        results.sort(key=lambda r: r["created_at"], reverse=True)
        return results[:limit]
    async def update_status(self, run_id, status, *, error=None):
        if run_id in self._runs:
            self._runs[run_id]["status"] = status
            if error is not None:
                self._runs[run_id]["error"] = error
            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()
    async def delete(self, run_id):
        self._runs.pop(run_id, None)
    async def update_run_completion(self, run_id, *, status, **kwargs):
        if run_id in self._runs:
            self._runs[run_id]["status"] = status
            for key, value in kwargs.items():
                if value is not None:
                    self._runs[run_id][key] = value
            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()
    async def list_pending(self, *, before=None):
        now = before or datetime.now(UTC).isoformat()
        results = [r for r in self._runs.values() if r["status"] == "pending" and r["created_at"] <= now]
        results.sort(key=lambda r: r["created_at"])
        return results
    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
        completed = [r for r in self._runs.values() if r["thread_id"] == thread_id and r.get("status") in ("success", "error")]
        by_model: dict[str, dict] = {}
        for r in completed:
            model = r.get("model_name") or "unknown"
            entry = by_model.setdefault(model, {"tokens": 0, "runs": 0})
            entry["tokens"] += r.get("total_tokens", 0)
            entry["runs"] += 1
        return {
            "total_tokens": sum(r.get("total_tokens", 0) for r in completed),
            "total_input_tokens": sum(r.get("total_input_tokens", 0) for r in completed),
            "total_output_tokens": sum(r.get("total_output_tokens", 0) for r in completed),
            "total_runs": len(completed),
            "by_model": by_model,
            "by_caller": {
                "lead_agent": sum(r.get("lead_agent_tokens", 0) for r in completed),
                "subagent": sum(r.get("subagent_tokens", 0) for r in completed),
                "middleware": sum(r.get("middleware_tokens", 0) for r in completed),
            },
        }
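To make the return shape of `aggregate_tokens_by_thread` concrete, here is a small sketch that applies the same kind of aggregation to hand-written records. The `aggregate` helper and the sample values (including the model name) are illustrative assumptions, not part of the store API, and the `by_caller` breakdown is omitted for brevity.

```python
from typing import Any


def aggregate(runs: list[dict[str, Any]], thread_id: str) -> dict[str, Any]:
    """Sum token counters over finished runs of one thread (illustrative)."""
    done = [r for r in runs if r["thread_id"] == thread_id and r.get("status") in ("success", "error")]
    by_model: dict[str, dict[str, int]] = {}
    for r in done:
        entry = by_model.setdefault(r.get("model_name") or "unknown", {"tokens": 0, "runs": 0})
        entry["tokens"] += r.get("total_tokens", 0)
        entry["runs"] += 1
    return {
        "total_tokens": sum(r.get("total_tokens", 0) for r in done),
        "total_runs": len(done),
        "by_model": by_model,
    }


sample = [
    {"thread_id": "t1", "status": "success", "model_name": "example-model", "total_tokens": 120},
    {"thread_id": "t1", "status": "error", "total_tokens": 30},
    {"thread_id": "t1", "status": "running", "total_tokens": 999},  # not finished: ignored
    {"thread_id": "t2", "status": "success", "total_tokens": 50},  # other thread: ignored
]

summary = aggregate(sample, "t1")
```

Runs without a `model_name` are grouped under `"unknown"`, and runs that are still pending or running do not contribute to the totals.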
@@ -19,7 +19,11 @@ import asyncio
 import copy
 import inspect
 import logging
-from typing import Any, Literal
+from dataclasses import dataclass, field
 from typing import TYPE_CHECKING, Any, Literal
 if TYPE_CHECKING:
    from langchain_core.messages import HumanMessage
 from deerflow.runtime.serialization import serialize
 from deerflow.runtime.stream_bridge import StreamBridge
@@ -33,13 +37,28 @@ logger = logging.getLogger(__name__)
 _VALID_LG_MODES = {"values", "updates", "checkpoints", "tasks", "debug", "messages", "custom"}
@dataclass(frozen=True)
 class RunContext:
    """Infrastructure dependencies for a single agent run.
    Groups checkpointer, store, and persistence-related singletons so that
    ``run_agent`` (and any future callers) receive one object instead of a
    growing list of keyword arguments.
    """
    checkpointer: Any
    store: Any | None = field(default=None)
    event_store: Any | None = field(default=None)
    run_events_config: Any | None = field(default=None)
    thread_store: Any | None = field(default=None)
 async def run_agent(
    bridge: StreamBridge,
    run_manager: RunManager,
    record: RunRecord,
    *,
-    checkpointer: Any,
+    ctx: RunContext,
    store: Any | None = None,
    agent_factory: Any,
    graph_input: dict,
    config: dict,
@@ -50,6 +69,13 @@ async def run_agent(
 ) -> None:
    """Execute an agent in the background, publishing events to *bridge*."""
    # Unpack infrastructure dependencies from RunContext.
    checkpointer = ctx.checkpointer
    store = ctx.store
    event_store = ctx.event_store
    run_events_config = ctx.run_events_config
    thread_store = ctx.thread_store
    run_id = record.run_id
    thread_id = record.thread_id
    requested_modes: set[str] = set(stream_modes or ["values"])
@@ -57,6 +83,10 @@ async def run_agent(
    pre_run_snapshot: dict[str, Any] | None = None
    snapshot_capture_failed = False
    journal = None
    # Track whether "events" was requested but skipped
    if "events" in requested_modes:
        logger.info(
@@ -65,6 +95,22 @@ async def run_agent(
        )
    try:
        # Initialize RunJournal + write human_message event.
        # These are inside the try block so any exception (e.g. a DB
        # error writing the event) flows through the except/finally
        # path that publishes an "end" event to the SSE bridge —
        # otherwise a failure here would leave the stream hanging
        # with no terminator.
        if event_store is not None:
            from deerflow.runtime.journal import RunJournal
            journal = RunJournal(
                run_id=run_id,
                thread_id=thread_id,
                event_store=event_store,
                track_token_usage=getattr(run_events_config, "track_token_usage", True),
            )
        # 1. Mark running
        await run_manager.set_status(run_id, RunStatus.running)
@@ -102,14 +148,20 @@ async def run_agent(
        # Inject runtime context so middlewares can access thread_id
        # (langgraph-cli does this automatically; we must do it manually)
-        runtime = Runtime(context={"thread_id": thread_id}, store=store)
+        runtime = Runtime(context={"thread_id": thread_id, "run_id": run_id}, store=store)
        # If the caller already set a ``context`` key (LangGraph >= 0.6.0
        # prefers it over ``configurable`` for thread-level data), make
        # sure ``thread_id`` and ``run_id`` are available there too.
        if "context" in config and isinstance(config["context"], dict):
            config["context"].setdefault("thread_id", thread_id)
            config["context"].setdefault("run_id", run_id)
        config.setdefault("configurable", {})["__pregel_runtime"] = runtime
        # Inject RunJournal as a LangChain callback handler.
        # on_llm_end captures token usage; on_chain_start/end captures lifecycle.
        if journal is not None:
            config.setdefault("callbacks", []).append(journal)
        runnable_config = RunnableConfig(**config)
        agent = agent_factory(config=runnable_config)
@@ -236,6 +288,41 @@ async def run_agent(
        )
    finally:
        # Flush any buffered journal events and persist completion data
        if journal is not None:
            try:
                await journal.flush()
            except Exception:
                logger.warning("Failed to flush journal for run %s", run_id, exc_info=True)
            try:
                # Persist token usage + convenience fields to RunStore
                completion = journal.get_completion_data()
                await run_manager.update_run_completion(run_id, status=record.status.value, **completion)
            except Exception:
                logger.warning("Failed to persist run completion for %s (non-fatal)", run_id, exc_info=True)
        # Sync title from checkpoint to threads_meta.display_name
        if checkpointer is not None and thread_store is not None:
            try:
                ckpt_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
                ckpt_tuple = await checkpointer.aget_tuple(ckpt_config)
                if ckpt_tuple is not None:
                    ckpt = getattr(ckpt_tuple, "checkpoint", {}) or {}
                    title = ckpt.get("channel_values", {}).get("title")
                    if title:
                        await thread_store.update_display_name(thread_id, title)
            except Exception:
                logger.debug("Failed to sync title for thread %s (non-fatal)", thread_id)
        # Update threads_meta status based on run outcome
        if thread_store is not None:
            try:
                final_status = "idle" if record.status == RunStatus.success else record.status.value
                await thread_store.update_status(thread_id, final_status)
            except Exception:
                logger.debug("Failed to update thread_meta status for %s (non-fatal)", thread_id)
        await bridge.publish_end(run_id)
        asyncio.create_task(bridge.cleanup(run_id, delay=60))
@@ -355,6 +442,31 @@ def _lg_mode_to_sse_event(mode: str) -> str:
    return mode
 def _extract_human_message(graph_input: dict) -> HumanMessage | None:
    """Extract or construct a HumanMessage from graph_input for event recording.
    Returns a LangChain HumanMessage so callers can use .model_dump() to get
    the checkpoint-aligned serialization format.
    """
    from langchain_core.messages import HumanMessage
    messages = graph_input.get("messages")
    if not messages:
        return None
    last = messages[-1] if isinstance(messages, list) else messages
    if isinstance(last, HumanMessage):
        return last
    if isinstance(last, str):
        return HumanMessage(content=last) if last else None
    if hasattr(last, "content"):
        content = last.content
        return HumanMessage(content=content)
    if isinstance(last, dict):
        content = last.get("content", "")
        return HumanMessage(content=content) if content else None
    return None
 def _unpack_stream_item(
    item: Any,
    lg_modes: list[str],