Merge remote-tracking branch 'origin/rayhpeng/persistence-scaffold' into rayhpeng/persistence-scaffold

refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct LangGraph Store reads/writes in the threads router and services to the ThreadMetaStore abstraction so that the sqlite and memory backends behave identically and the legacy dual-write paths can be removed.

Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete already covers it.

Removed dead code (services.py):
- _upsert_thread_in_store: redundant with the immediately following thread_meta_repo.create() call.
- _sync_thread_title_after_run: worker.py's finally block already syncs the title via thread_meta_repo.update_display_name() after each run.

Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).

New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into the thread's metadata field. Implemented in both ThreadMetaRepository (SQL, read-modify-write inside one session) and MemoryThreadMetaStore. Three new unit tests cover merge / empty / nonexistent behaviour.

Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.

Verified end-to-end with curl in gateway mode against sqlite backend (create / patch / get / rename / search / delete).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
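
The merge behaviour described for update_metadata can be illustrated with a minimal sketch. This is not the project's code: `ThreadMeta` and `InMemoryMetaStore` are hypothetical stand-ins, the methods are written synchronously for brevity, the merge is assumed to be shallow, and treating an unknown thread id as a no-op is an assumption (the commit message only says a "nonexistent" case is tested, not what it does).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ThreadMeta:
    """Hypothetical record; the real row carries more fields."""

    thread_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


class InMemoryMetaStore:
    """Illustrative stand-in, not the project's MemoryThreadMetaStore."""

    def __init__(self) -> None:
        self._rows: dict[str, ThreadMeta] = {}

    def create(self, thread_id: str, metadata: dict[str, Any] | None = None) -> ThreadMeta:
        # Copy the caller's dict so later mutations do not leak into the store.
        row = ThreadMeta(thread_id, dict(metadata or {}))
        self._rows[thread_id] = row
        return row

    def get(self, thread_id: str) -> ThreadMeta | None:
        return self._rows.get(thread_id)

    def update_metadata(self, thread_id: str, metadata: dict[str, Any]) -> None:
        row = self._rows.get(thread_id)
        if row is None:
            return  # assumption: an unknown thread id is silently ignored
        # Shallow merge: incoming keys win, untouched keys are preserved.
        row.metadata = {**row.metadata, **metadata}


store = InMemoryMetaStore()
store.create("t1", {"title": "draft", "owner": "alice"})
store.update_metadata("t1", {"title": "final"})   # merge
store.update_metadata("t1", {})                   # empty update changes nothing
store.update_metadata("missing", {"x": 1})        # nonexistent thread
```

A SQL-backed variant would perform the same read, merge and write inside a single session, as the commit message notes for ThreadMetaRepository.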
2026-04-07 10:58:31 +08:00 · 2026-04-07 10:56:03 +08:00 · 2026-04-07 10:42:26 +08:00 · 2026-04-07 10:32:40 +08:00 · 2026-04-07 09:52:56 +08:00 · 2026-04-07 09:44:38 +08:00
270 changed files with 22193 additions and 1805 deletions
@@ -17,6 +17,7 @@ INFOQUEST_API_KEY=your-infoquest-api-key
 # DEEPSEEK_API_KEY=your-deepseek-api-key
 # NOVITA_API_KEY=your-novita-api-key  # OpenAI-compatible, see https://novita.ai
 # MINIMAX_API_KEY=your-minimax-api-key  # OpenAI-compatible, see https://platform.minimax.io
+# VLLM_API_KEY=your-vllm-api-key  # OpenAI-compatible
 # FEISHU_APP_ID=your-feishu-app-id
 # FEISHU_APP_SECRET=your-feishu-app-secret

@@ -32,3 +33,9 @@ INFOQUEST_API_KEY=your-infoquest-api-key

 # GitHub API Token
 # GITHUB_TOKEN=your-github-token
+
+# Database (only needed when config.yaml has database.backend: postgres)
+# DATABASE_URL=postgresql://deerflow:password@localhost:5432/deerflow
+#
+# WECOM_BOT_ID=your-wecom-bot-id
+# WECOM_BOT_SECRET=your-wecom-bot-secret
@@ -54,3 +54,6 @@ web/
 # Deployment artifacts
 backend/Dockerfile.langgraph
 config.yaml.bak
+.playwright-mcp
+.gstack/
+.worktrees
@@ -310,7 +310,7 @@ Every pull request runs the backend regression workflow at [.github/workflows/ba

 - [Configuration Guide](backend/docs/CONFIGURATION.md) - Setup and configuration
 - [Architecture Overview](backend/CLAUDE.md) - Technical architecture
- [MCP Setup Guide](MCP_SETUP.md) - Model Context Protocol configuration
+- [MCP Setup Guide](backend/docs/MCP_SERVER.md) - Model Context Protocol configuration

 ## Need Help?

@@ -1,13 +1,15 @@
 # DeerFlow - Unified Development Environment

-.PHONY: help config config-upgrade check install dev dev-daemon start stop up down clean docker-init docker-start docker-stop docker-logs docker-logs-frontend docker-logs-gateway
+.PHONY: help config config-upgrade check install dev dev-pro dev-daemon dev-daemon-pro start start-pro start-daemon start-daemon-pro stop up up-pro down clean docker-init docker-start docker-start-pro docker-stop docker-logs docker-logs-frontend docker-logs-gateway

-PYTHON ?= python
 BASH ?= bash

 # Detect OS for Windows compatibility
 ifeq ($(OS),Windows_NT)
    SHELL := cmd.exe
+    PYTHON ?= python
+else
+    PYTHON ?= python3
 endif

 help:
@@ -18,18 +20,25 @@ help:
 	@echo "  make install         - Install all dependencies (frontend + backend)"
 	@echo "  make setup-sandbox   - Pre-pull sandbox container image (recommended)"
 	@echo "  make dev             - Start all services in development mode (with hot-reloading)"
-	@echo "  make dev-daemon      - Start all services in background (daemon mode)"
+	@echo "  make dev-pro         - Start in dev + Gateway mode (experimental, no LangGraph server)"
+	@echo "  make dev-daemon      - Start dev services in background (daemon mode)"
+	@echo "  make dev-daemon-pro  - Start dev daemon + Gateway mode (experimental)"
 	@echo "  make start           - Start all services in production mode (optimized, no hot-reloading)"
+	@echo "  make start-pro       - Start in prod + Gateway mode (experimental)"
+	@echo "  make start-daemon    - Start prod services in background (daemon mode)"
+	@echo "  make start-daemon-pro - Start prod daemon + Gateway mode (experimental)"
 	@echo "  make stop            - Stop all running services"
 	@echo "  make clean           - Clean up processes and temporary files"
 	@echo ""
 	@echo "Docker Production Commands:"
 	@echo "  make up              - Build and start production Docker services (localhost:2026)"
+	@echo "  make up-pro          - Build and start production Docker in Gateway mode (experimental)"
 	@echo "  make down            - Stop and remove production Docker containers"
 	@echo ""
 	@echo "Docker Development Commands:"
 	@echo "  make docker-init     - Pull the sandbox image"
 	@echo "  make docker-start    - Start Docker services (mode-aware from config.yaml, localhost:2026)"
+	@echo "  make docker-start-pro - Start Docker in Gateway mode (experimental, no LangGraph container)"
 	@echo "  make docker-stop     - Stop Docker development services"
 	@echo "  make docker-logs     - View Docker development logs"
 	@echo "  make docker-logs-frontend - View Docker frontend logs"
@@ -96,39 +105,79 @@ setup-sandbox:

 # Start all services in development mode (with hot-reloading)
 dev:
+	@$(PYTHON) ./scripts/check.py
 ifeq ($(OS),Windows_NT)
 	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --dev
 else
 	@./scripts/serve.sh --dev
 endif

+# Start all services in dev + Gateway mode (experimental: agent runtime embedded in Gateway)
+dev-pro:
+	@$(PYTHON) ./scripts/check.py
+ifeq ($(OS),Windows_NT)
+	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --dev --gateway
+else
+	@./scripts/serve.sh --dev --gateway
+endif
+
 # Start all services in production mode (with optimizations)
 start:
+	@$(PYTHON) ./scripts/check.py
 ifeq ($(OS),Windows_NT)
 	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --prod
 else
 	@./scripts/serve.sh --prod
 endif

+# Start all services in prod + Gateway mode (experimental)
+start-pro:
+	@$(PYTHON) ./scripts/check.py
+ifeq ($(OS),Windows_NT)
+	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --prod --gateway
+else
+	@./scripts/serve.sh --prod --gateway
+endif
+
 # Start all services in daemon mode (background)
 dev-daemon:
-	@./scripts/start-daemon.sh
+	@$(PYTHON) ./scripts/check.py
+ifeq ($(OS),Windows_NT)
+	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --dev --daemon
+else
+	@./scripts/serve.sh --dev --daemon
+endif
+
+# Start daemon + Gateway mode (experimental)
+dev-daemon-pro:
+	@$(PYTHON) ./scripts/check.py
+ifeq ($(OS),Windows_NT)
+	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --dev --gateway --daemon
+else
+	@./scripts/serve.sh --dev --gateway --daemon
+endif
+
+# Start prod services in daemon mode (background)
+start-daemon:
+	@$(PYTHON) ./scripts/check.py
+ifeq ($(OS),Windows_NT)
+	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --prod --daemon
+else
+	@./scripts/serve.sh --prod --daemon
+endif
+
+# Start prod daemon + Gateway mode (experimental)
+start-daemon-pro:
+	@$(PYTHON) ./scripts/check.py
+ifeq ($(OS),Windows_NT)
+	@call scripts\run-with-git-bash.cmd ./scripts/serve.sh --prod --gateway --daemon
+else
+	@./scripts/serve.sh --prod --gateway --daemon
+endif

 # Stop all services
 stop:
-	@echo "Stopping all services..."
-	@-pkill -f "langgraph dev" 2>/dev/null || true
-	@-pkill -f "uvicorn app.gateway.app:app" 2>/dev/null || true
-	@-pkill -f "next dev" 2>/dev/null || true
-	@-pkill -f "next start" 2>/dev/null || true
-	@-pkill -f "next-server" 2>/dev/null || true
-	@-pkill -f "next-server" 2>/dev/null || true
-	@-nginx -c $(PWD)/docker/nginx/nginx.local.conf -p $(PWD) -s quit 2>/dev/null || true
-	@sleep 1
-	@-pkill -9 nginx 2>/dev/null || true
-	@echo "Cleaning up sandbox containers..."
-	@-./scripts/cleanup-containers.sh deer-flow-sandbox 2>/dev/null || true
-	@echo "✓ All services stopped"
+	@./scripts/serve.sh --stop

 # Clean up
 clean: stop
@@ -150,6 +199,10 @@ docker-init:
 docker-start:
 	@./scripts/docker.sh start

+# Start Docker in Gateway mode (experimental)
+docker-start-pro:
+	@./scripts/docker.sh start --gateway
+
 # Stop Docker development environment
 docker-stop:
 	@./scripts/docker.sh stop
@@ -172,6 +225,10 @@ docker-logs-gateway:
 up:
 	@./scripts/deploy.sh

+# Build and start production services in Gateway mode
+up-pro:
+	@./scripts/deploy.sh --gateway
+
 # Stop and remove production containers
 down:
 	@./scripts/deploy.sh down
@@ -46,6 +46,7 @@ DeerFlow has newly integrated the intelligent search and crawling toolset indepe

 - [🦌 DeerFlow - 2.0](#-deerflow---20)
  - [Official Website](#official-website)
+  - [Coding Plan from ByteDance Volcengine](#coding-plan-from-bytedance-volcengine)
  - [InfoQuest](#infoquest)
  - [Table of Contents](#table-of-contents)
  - [One-Line Agent Setup](#one-line-agent-setup)
@@ -59,6 +60,8 @@ DeerFlow has newly integrated the intelligent search and crawling toolset indepe
      - [MCP Server](#mcp-server)
      - [IM Channels](#im-channels)
      - [LangSmith Tracing](#langsmith-tracing)
+      - [Langfuse Tracing](#langfuse-tracing)
+      - [Using Both Providers](#using-both-providers)
  - [From Deep Research to Super Agent Harness](#from-deep-research-to-super-agent-harness)
  - [Core Features](#core-features)
    - [Skills \& Tools](#skills--tools)
@@ -71,6 +74,8 @@ DeerFlow has newly integrated the intelligent search and crawling toolset indepe
  - [Embedded Python Client](#embedded-python-client)
  - [Documentation](#documentation)
  - [⚠️ Security Notice](#️-security-notice)
+    - [Improper Deployment May Introduce Security Risks](#improper-deployment-may-introduce-security-risks)
+    - [Security Recommendations](#security-recommendations)
  - [Contributing](#contributing)
  - [License](#license)
  - [Acknowledgments](#acknowledgments)
@@ -136,12 +141,26 @@ That prompt is intended for coding agents. It tells the agent to clone the repo
       api_key: $OPENAI_API_KEY
       use_responses_api: true
       output_version: responses/v1
+
+     - name: qwen3-32b-vllm
+       display_name: Qwen3 32B (vLLM)
+       use: deerflow.models.vllm_provider:VllmChatModel
+       model: Qwen/Qwen3-32B
+       api_key: $VLLM_API_KEY
+       base_url: http://localhost:8000/v1
+       supports_thinking: true
+       when_thinking_enabled:
+         extra_body:
+           chat_template_kwargs:
+             enable_thinking: true
   ```

   OpenRouter and similar OpenAI-compatible gateways should be configured with `langchain_openai:ChatOpenAI` plus `base_url`. If you prefer a provider-specific environment variable name, point `api_key` at that variable explicitly (for example `api_key: $OPENROUTER_API_KEY`).

   To route OpenAI models through `/v1/responses`, keep using `langchain_openai:ChatOpenAI` and set `use_responses_api: true` with `output_version: responses/v1`.

+   For vLLM 0.19.0, use `deerflow.models.vllm_provider:VllmChatModel`. For Qwen-style reasoning models, DeerFlow toggles reasoning with `extra_body.chat_template_kwargs.enable_thinking` and preserves vLLM's non-standard `reasoning` field across multi-turn tool-call conversations. Legacy `thinking` configs are normalized automatically for backward compatibility. Reasoning models may also require the server to be started with `--reasoning-parser ...`. If your local vLLM deployment accepts any non-empty API key, you can still set `VLLM_API_KEY` to a placeholder value.
+
   CLI-backed provider examples:

   ```yaml
@@ -243,6 +262,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed Docker development guide.
 If you prefer running services locally:

 Prerequisite: complete the "Configuration" steps above first (`make config` and model API keys). `make dev` requires a valid configuration file (defaults to `config.yaml` in the project root; can be overridden via `DEER_FLOW_CONFIG_PATH`).
+On Windows, run the local development flow from Git Bash. Native `cmd.exe` and PowerShell shells are not supported for the bash-based service scripts, and WSL is not guaranteed because some scripts rely on Git for Windows utilities such as `cygpath`.

 1. **Check prerequisites**:
   ```bash
@@ -274,6 +294,60 @@ Prerequisite: complete the "Configuration" steps above first (`make config` and

 6. **Access**: http://localhost:2026

+#### Startup Modes
+
+DeerFlow supports multiple startup modes across two dimensions:
+
+- **Dev / Prod** — dev enables hot-reload; prod uses pre-built frontend
+- **Standard / Gateway** — standard uses a separate LangGraph server (4 processes); Gateway mode (experimental) embeds the agent runtime in the Gateway API (3 processes)
+
+| | **Local Foreground** | **Local Daemon** | **Docker Dev** | **Docker Prod** |
+|---|---|---|---|---|
+| **Dev** | `./scripts/serve.sh --dev`<br/>`make dev` | `./scripts/serve.sh --dev --daemon`<br/>`make dev-daemon` | `./scripts/docker.sh start`<br/>`make docker-start` | — |
+| **Dev + Gateway** | `./scripts/serve.sh --dev --gateway`<br/>`make dev-pro` | `./scripts/serve.sh --dev --gateway --daemon`<br/>`make dev-daemon-pro` | `./scripts/docker.sh start --gateway`<br/>`make docker-start-pro` | — |
+| **Prod** | `./scripts/serve.sh --prod`<br/>`make start` | `./scripts/serve.sh --prod --daemon`<br/>`make start-daemon` | — | `./scripts/deploy.sh`<br/>`make up` |
+| **Prod + Gateway** | `./scripts/serve.sh --prod --gateway`<br/>`make start-pro` | `./scripts/serve.sh --prod --gateway --daemon`<br/>`make start-daemon-pro` | — | `./scripts/deploy.sh --gateway`<br/>`make up-pro` |
+
+| Action | Local | Docker Dev | Docker Prod |
+|---|---|---|---|
+| **Stop** | `./scripts/serve.sh --stop`<br/>`make stop` | `./scripts/docker.sh stop`<br/>`make docker-stop` | `./scripts/deploy.sh down`<br/>`make down` |
+| **Restart** | `./scripts/serve.sh --restart [flags]` | `./scripts/docker.sh restart` | — |
+
+> **Gateway mode** eliminates the LangGraph server process — the Gateway API handles agent execution directly via async tasks, managing its own concurrency.
+
+#### Why Gateway Mode?
+
+In standard mode, DeerFlow runs a dedicated [LangGraph Platform](https://langchain-ai.github.io/langgraph/) server alongside the Gateway API. This architecture works well but has trade-offs:
+
+| | Standard Mode | Gateway Mode |
+|---|---|---|
+| **Architecture** | Gateway (REST API) + LangGraph (agent runtime) | Gateway embeds agent runtime |
+| **Concurrency** | `--n-jobs-per-worker` per worker (requires license) | `--workers` × async tasks (no per-worker cap) |
+| **Containers / Processes** | 4 (frontend, gateway, langgraph, nginx) | 3 (frontend, gateway, nginx) |
+| **Resource usage** | Higher (two Python runtimes) | Lower (single Python runtime) |
+| **LangGraph Platform license** | Required for production images | Not required |
+| **Cold start** | Slower (two services to initialize) | Faster |
+
+Both modes are functionally equivalent — the same agents, tools, and skills work in either mode.
+
+#### Docker Production Deployment
+
+`deploy.sh` supports building and starting separately. Images are mode-agnostic — runtime mode is selected at start time:
+
+```bash
+# One-step (build + start)
+deploy.sh                    # standard mode (default)
+deploy.sh --gateway          # gateway mode
+
+# Two-step (build once, start with any mode)
+deploy.sh build              # build all images
+deploy.sh start              # start in standard mode
+deploy.sh start --gateway    # start in gateway mode
+
+# Stop
+deploy.sh down
+```
+
 ### Advanced
 #### Sandbox Mode

@@ -301,6 +375,7 @@ DeerFlow supports receiving tasks from messaging apps. Channels auto-start when
 | Telegram | Bot API (long-polling) | Easy |
 | Slack | Socket Mode | Moderate |
 | Feishu / Lark | WebSocket | Moderate |
+| WeCom | WebSocket | Moderate |

 **Configuration in `config.yaml`:**

@@ -328,6 +403,11 @@ channels:
    # domain: https://open.feishu.cn       # China (default)
    # domain: https://open.larksuite.com   # International

+  wecom:
+    enabled: true
+    bot_id: $WECOM_BOT_ID
+    bot_secret: $WECOM_BOT_SECRET
+
  slack:
    enabled: true
    bot_token: $SLACK_BOT_TOKEN     # xoxb-...
@@ -371,6 +451,10 @@ SLACK_APP_TOKEN=xapp-...
 # Feishu / Lark
 FEISHU_APP_ID=cli_xxxx
 FEISHU_APP_SECRET=your_app_secret
+
+# WeCom
+WECOM_BOT_ID=your_bot_id
+WECOM_BOT_SECRET=your_bot_secret
 ```

 **Telegram Setup**
@@ -393,6 +477,14 @@ FEISHU_APP_SECRET=your_app_secret
 3. Under **Events**, subscribe to `im.message.receive_v1` and select **Long Connection** mode.
 4. Copy the App ID and App Secret. Set `FEISHU_APP_ID` and `FEISHU_APP_SECRET` in `.env` and enable the channel in `config.yaml`.

+**WeCom Setup**
+
+1. Create a bot on the WeCom AI Bot platform and obtain the `bot_id` and `bot_secret`.
+2. Enable `channels.wecom` in `config.yaml` and fill in `bot_id` / `bot_secret`.
+3. Set `WECOM_BOT_ID` and `WECOM_BOT_SECRET` in `.env`.
+4. Make sure backend dependencies include `wecom-aibot-python-sdk`. The channel uses a WebSocket long connection and does not require a public callback URL.
+5. The current integration supports inbound text, image, and file messages. Final images/files generated by the agent are also sent back to the WeCom conversation.
+
 When DeerFlow runs in Docker Compose, IM channels execute inside the `gateway` container. In that case, do not point `channels.langgraph_url` or `channels.gateway_url` at `localhost`; use container service names such as `http://langgraph:2024` and `http://gateway:8001`, or set `DEER_FLOW_CHANNELS_LANGGRAPH_URL` and `DEER_FLOW_CHANNELS_GATEWAY_URL`.

 **Commands**
@@ -422,6 +514,27 @@ LANGSMITH_API_KEY=lsv2_pt_xxxxxxxxxxxxxxxx
 LANGSMITH_PROJECT=xxx
 ```

+#### Langfuse Tracing
+
+DeerFlow also supports [Langfuse](https://langfuse.com) observability for LangChain-compatible runs.
+
+Add the following to your `.env` file:
+
+```bash
+LANGFUSE_TRACING=true
+LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
+LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
+LANGFUSE_BASE_URL=https://cloud.langfuse.com
+```
+
+If you are using a self-hosted Langfuse instance, set `LANGFUSE_BASE_URL` to your deployment URL.
+
+#### Using Both Providers
+
+If both LangSmith and Langfuse are enabled, DeerFlow attaches both tracing callbacks and reports the same model activity to both systems.
+
+If a provider is explicitly enabled but missing required credentials, or if its callback fails to initialize, DeerFlow fails fast when tracing is initialized during model creation and the error message names the provider that caused the failure.
+
 For Docker deployments, tracing is disabled by default. Set `LANGSMITH_TRACING=true` and `LANGSMITH_API_KEY` in your `.env` to enable it.

 ## From Deep Research to Super Agent Harness
@@ -180,6 +180,7 @@ make down   # 停止并移除容器
 如果你更希望直接在本地启动各个服务：

 前提：先完成上面的“配置”步骤（`make config` 和模型 API key 配置）。`make dev` 需要有效配置文件，默认读取项目根目录下的 `config.yaml`，也可以通过 `DEER_FLOW_CONFIG_PATH` 覆盖。
+在 Windows 上，请使用 Git Bash 运行本地开发流程。基于 bash 的服务脚本不支持直接在原生 `cmd.exe` 或 PowerShell 中执行，且 WSL 也不保证可用，因为部分脚本依赖 Git for Windows 的 `cygpath` 等工具。

 1. **检查依赖环境**：
   ```bash
@@ -231,6 +232,7 @@ DeerFlow 支持从即时通讯应用接收任务。只要配置完成，对应
 | Telegram | Bot API（long-polling） | 简单 |
 | Slack | Socket Mode | 中等 |
 | Feishu / Lark | WebSocket | 中等 |
+| 企业微信智能机器人 | WebSocket | 中等 |

 **`config.yaml` 中的配置示例：**

@@ -258,6 +260,11 @@ channels:
    # domain: https://open.feishu.cn       # 国内版（默认）
    # domain: https://open.larksuite.com   # 国际版

+  wecom:
+    enabled: true
+    bot_id: $WECOM_BOT_ID
+    bot_secret: $WECOM_BOT_SECRET
+
  slack:
    enabled: true
    bot_token: $SLACK_BOT_TOKEN     # xoxb-...
@@ -301,6 +308,10 @@ SLACK_APP_TOKEN=xapp-...
 # Feishu / Lark
 FEISHU_APP_ID=cli_xxxx
 FEISHU_APP_SECRET=your_app_secret
+
+# 企业微信智能机器人
+WECOM_BOT_ID=your_bot_id
+WECOM_BOT_SECRET=your_bot_secret
 ```

 **Telegram 配置**
@@ -323,6 +334,14 @@ FEISHU_APP_SECRET=your_app_secret
 3. 在 **事件订阅** 中订阅 `im.message.receive_v1`，连接方式选择 **长连接**。
 4. 复制 App ID 和 App Secret，在 `.env` 中设置 `FEISHU_APP_ID` 和 `FEISHU_APP_SECRET`，并在 `config.yaml` 中启用该渠道。

+**企业微信智能机器人配置**
+
+1. 在企业微信智能机器人平台创建机器人，获取 `bot_id` 和 `bot_secret`。
+2. 在 `config.yaml` 中启用 `channels.wecom`，并填入 `bot_id` / `bot_secret`。
+3. 在 `.env` 中设置 `WECOM_BOT_ID` 和 `WECOM_BOT_SECRET`。
+4. 安装后端依赖时确保包含 `wecom-aibot-python-sdk`，渠道会通过 WebSocket 长连接接收消息，无需公网回调地址。
+5. 当前支持文本、图片和文件入站消息；agent 生成的最终图片/文件也会回传到企业微信会话中。
+
 **命令**

 渠道连接完成后，你可以直接在聊天窗口里和 DeerFlow 交互：
@@ -13,6 +13,10 @@ DeerFlow is a LangGraph-based AI super agent system with a full-stack architectu
 - **Nginx** (port 2026): Unified reverse proxy entry point
 - **Provisioner** (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode

+**Runtime Modes**:
+- **Standard mode** (`make dev`): LangGraph Server handles agent execution as a separate process. 4 processes total.
+- **Gateway mode** (`make dev-pro`, experimental): Agent runtime embedded in Gateway via `RunManager` + `run_agent()` + `StreamBridge` (`packages/harness/deerflow/runtime/`). Service manages its own concurrency via async tasks. 3 processes total, no LangGraph Server.
+
 **Project Structure**:
 ```
 deer-flow/
@@ -80,6 +84,8 @@ When making code changes, you MUST update the relevant documentation:
 make check      # Check system requirements
 make install    # Install all dependencies (frontend + backend)
 make dev        # Start all services (LangGraph + Gateway + Frontend + Nginx), with config.yaml preflight
+make dev-pro    # Gateway mode (experimental): skip LangGraph, agent runtime embedded in Gateway
+make start-pro  # Production + Gateway mode (experimental)
 make stop       # Stop all services
 ```

@@ -232,7 +238,7 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →
 - `ls` - Directory listing (tree format, max 2 levels)
 - `read_file` - Read file contents with optional line range
 - `write_file` - Write/append to files, creates directories
- `str_replace` - Substring replacement (single or all occurrences)
+- `str_replace` - Substring replacement (single or all occurrences); same-path serialization is scoped to `(sandbox.id, path)` so isolated sandboxes do not contend on identical virtual paths inside one process

 ### Subagent System (`packages/harness/deerflow/subagents/`)

@@ -287,10 +293,17 @@ Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` →

 - `create_chat_model(name, thinking_enabled)` instantiates LLM from config via reflection
 - Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
+- Supports vLLM-style thinking toggles via `when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking` for Qwen reasoning models, while normalizing legacy `thinking` configs for backward compatibility
 - Supports `supports_vision` flag for image understanding models
 - Config values starting with `$` resolved as environment variables
 - Missing provider modules surface actionable install hints from reflection resolvers (for example `uv add langchain-google-genai`)

+### vLLM Provider (`packages/harness/deerflow/models/vllm_provider.py`)
+
+- `VllmChatModel` subclasses `langchain_openai:ChatOpenAI` for vLLM 0.19.0 OpenAI-compatible endpoints
+- Preserves vLLM's non-standard assistant `reasoning` field on full responses, streaming deltas, and follow-up tool-call turns
+- Designed for configs that enable thinking through `extra_body.chat_template_kwargs.enable_thinking` on vLLM 0.19.0 Qwen reasoning models, while accepting the older `thinking` alias
+
 ### IM Channels System (`app/channels/`)

 Bridges external messaging platforms (Feishu, Slack, Telegram) to the DeerFlow agent via the LangGraph Server.
@@ -359,6 +372,7 @@ Focused regression coverage for the updater lives in `backend/tests/test_memory_

 **`config.yaml`** key sections:
 - `models[]` - LLM configs with `use` class path, `supports_thinking`, `supports_vision`, provider-specific fields
+- vLLM reasoning models should use `deerflow.models.vllm_provider:VllmChatModel`; for Qwen-style parsers prefer `when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking`, and DeerFlow will also normalize the older `thinking` alias
 - `tools[]` - Tool configs with `use` variable path and `group`
 - `tool_groups[]` - Logical groupings for tools
 - `sandbox.use` - Sandbox provider class path
@@ -436,8 +450,25 @@ make dev

 This starts all services and makes the application available at `http://localhost:2026`.

+**All startup modes:**
+
+| | **Local Foreground** | **Local Daemon** | **Docker Dev** | **Docker Prod** |
+|---|---|---|---|---|
+| **Dev** | `./scripts/serve.sh --dev`<br/>`make dev` | `./scripts/serve.sh --dev --daemon`<br/>`make dev-daemon` | `./scripts/docker.sh start`<br/>`make docker-start` | — |
+| **Dev + Gateway** | `./scripts/serve.sh --dev --gateway`<br/>`make dev-pro` | `./scripts/serve.sh --dev --gateway --daemon`<br/>`make dev-daemon-pro` | `./scripts/docker.sh start --gateway`<br/>`make docker-start-pro` | — |
+| **Prod** | `./scripts/serve.sh --prod`<br/>`make start` | `./scripts/serve.sh --prod --daemon`<br/>`make start-daemon` | — | `./scripts/deploy.sh`<br/>`make up` |
+| **Prod + Gateway** | `./scripts/serve.sh --prod --gateway`<br/>`make start-pro` | `./scripts/serve.sh --prod --gateway --daemon`<br/>`make start-daemon-pro` | — | `./scripts/deploy.sh --gateway`<br/>`make up-pro` |
+
+| Action | Local | Docker Dev | Docker Prod |
+|---|---|---|---|
+| **Stop** | `./scripts/serve.sh --stop`<br/>`make stop` | `./scripts/docker.sh stop`<br/>`make docker-stop` | `./scripts/deploy.sh down`<br/>`make down` |
+| **Restart** | `./scripts/serve.sh --restart [flags]` | `./scripts/docker.sh restart` | — |
+
+Gateway mode embeds the agent runtime in Gateway, no LangGraph server.
+
 **Nginx routing**:
- `/api/langgraph/*` → LangGraph Server (2024)
+- Standard mode: `/api/langgraph/*` → LangGraph Server (2024)
+- Gateway mode: `/api/langgraph/*` → Gateway embedded runtime (8001) (via envsubst)
 - `/api/*` (other) → Gateway API (8001)
 - `/` (non-API) → Frontend (3000)

@@ -1,14 +1,21 @@
-# Backend Development Dockerfile
+# Backend Dockerfile — multi-stage build
+# Stage 1 (builder): compiles native Python extensions with build-essential
+# Stage 2 (dev):     retains toolchain for dev containers (uv sync at startup)
+# Stage 3 (runtime): clean image without compiler toolchain for production

 # UV source image (override for restricted networks that cannot reach ghcr.io)
 ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.7.20
 FROM ${UV_IMAGE} AS uv-source

-FROM python:3.12-slim-bookworm
+# ── Stage 1: Builder ──────────────────────────────────────────────────────────
+FROM python:3.12-slim-bookworm AS builder

 ARG NODE_MAJOR=22
 ARG APT_MIRROR
 ARG UV_INDEX_URL
+# Optional extras to install (e.g. "postgres" for PostgreSQL support)
+# Usage: docker build --build-arg UV_EXTRAS=postgres ...
+ARG UV_EXTRAS

 # Optionally override apt mirror for restricted networks (e.g. APT_MIRROR=mirrors.aliyun.com)
 RUN if [ -n "${APT_MIRROR}" ]; then \
@@ -16,7 +23,7 @@ RUN if [ -n "${APT_MIRROR}" ]; then \
      sed -i "s|deb.debian.org|${APT_MIRROR}|g" /etc/apt/sources.list 2>/dev/null || true; \
    fi

-# Install system dependencies + Node.js (provides npx for MCP servers)
+# Install build tools + Node.js (build-essential needed for native Python extensions)
 RUN apt-get update && apt-get install -y \
    curl \
    build-essential \
@@ -29,6 +36,42 @@ RUN apt-get update && apt-get install -y \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*

+# Install uv (source image overridable via UV_IMAGE build arg)
+COPY --from=uv-source /uv /uvx /usr/local/bin/
+
+# Set working directory
+WORKDIR /app
+
+# Copy backend source code
+COPY backend ./backend
+
+# Install dependencies with cache mount
+# When UV_EXTRAS is set (e.g. "postgres"), installs optional dependencies.
+RUN --mount=type=cache,target=/root/.cache/uv \
+    sh -c "cd backend && UV_INDEX_URL=${UV_INDEX_URL:-https://pypi.org/simple} uv sync ${UV_EXTRAS:+--extra $UV_EXTRAS}"
+
+# ── Stage 2: Dev ──────────────────────────────────────────────────────────────
+# Retains compiler toolchain from builder so startup-time `uv sync` can build
+# source distributions in development containers.
+FROM builder AS dev
+
+# Install Docker CLI (for DooD: allows starting sandbox containers via host Docker socket)
+COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker
+
+EXPOSE 8001 2024
+
+CMD ["sh", "-c", "cd backend && PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001"]
+
+# ── Stage 3: Runtime ──────────────────────────────────────────────────────────
+# Clean image without build-essential — reduces size (~200 MB) and attack surface.
+FROM python:3.12-slim-bookworm
+
+# Copy Node.js runtime from builder (provides npx for MCP servers)
+COPY --from=builder /usr/bin/node /usr/bin/node
+COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
+RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
+    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
+
 # Install Docker CLI (for DooD: allows starting sandbox containers via host Docker socket)
 COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker

@@ -38,12 +81,8 @@ COPY --from=uv-source /uv /uvx /usr/local/bin/
 # Set working directory
 WORKDIR /app

-# Copy frontend source code
-COPY backend ./backend
-
-# Install dependencies with cache mount
-RUN --mount=type=cache,target=/root/.cache/uv \
-    sh -c "cd backend && UV_INDEX_URL=${UV_INDEX_URL:-https://pypi.org/simple} uv sync"
+# Copy backend with pre-built virtualenv from builder
+COPY --from=builder /app/backend ./backend

 # Expose ports (gateway: 8001, langgraph: 2024)
 EXPOSE 8001 2024
@@ -2,7 +2,7 @@ install:
 	uv sync

 dev:
-	uv run langgraph dev --no-browser --allow-blocking --no-reload
+	uv run langgraph dev --no-browser --no-reload --n-jobs-per-worker 10

 gateway:
 	PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001
@@ -78,6 +78,7 @@ Per-thread isolated execution with virtual path translation:
 - **Virtual paths**: `/mnt/user-data/{workspace,uploads,outputs}` → thread-specific physical directories
 - **Skills path**: `/mnt/skills` → `deer-flow/skills/` directory
 - **Skills loading**: Recursively discovers nested `SKILL.md` files under `skills/{public,custom}` and preserves nested container paths
+- **File-write safety**: `str_replace` serializes read-modify-write per `(sandbox.id, path)` so isolated sandboxes keep concurrency even when virtual paths match
 - **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace` (`bash` is disabled by default when using `LocalSandboxProvider`; use `AioSandboxProvider` for isolated shell access)

 ### Subagent System
@@ -330,7 +331,28 @@ LANGSMITH_PROJECT=xxx

 **Legacy variables:** The `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`, and `LANGCHAIN_ENDPOINT` variables are also supported for backward compatibility. `LANGSMITH_*` variables take precedence when both are set.

-**Docker:** In `docker-compose.yaml`, tracing is disabled by default (`LANGSMITH_TRACING=false`). Set `LANGSMITH_TRACING=true` and provide `LANGSMITH_API_KEY` in your `.env` to enable it in containerized deployments.
+### Langfuse Tracing
+
+DeerFlow also supports [Langfuse](https://langfuse.com) observability for LangChain-compatible runs.
+
+Add the following to your `.env` file:
+
+```bash
+LANGFUSE_TRACING=true
+LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
+LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
+LANGFUSE_BASE_URL=https://cloud.langfuse.com
+```
+
+If you are using a self-hosted Langfuse deployment, set `LANGFUSE_BASE_URL` to your Langfuse host.
+
+### Dual Provider Behavior
+
+If both LangSmith and Langfuse are enabled, DeerFlow initializes and attaches both callbacks so the same run data is reported to both systems.
+
+If a provider is explicitly enabled but required credentials are missing, or the provider callback cannot be initialized, DeerFlow raises an error when tracing is initialized during model creation instead of silently disabling tracing.
+
+**Docker:** In `docker-compose.yaml`, tracing is disabled by default (`LANGSMITH_TRACING=false`). Set `LANGSMITH_TRACING=true` and/or `LANGFUSE_TRACING=true` in your `.env`, together with the required credentials, to enable tracing in containerized deployments.
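The fail-fast rule described under Dual Provider Behavior could look roughly like the sketch below. It assumes that LangSmith needs `LANGSMITH_API_KEY` and that Langfuse needs the public and secret keys shown above; `enabled_tracing_providers` is a hypothetical helper, not DeerFlow's actual function.

```python
def enabled_tracing_providers(env: dict[str, str]) -> list[str]:
    """Return the enabled providers, raising if an enabled one lacks credentials."""
    # Assumed mapping of provider -> (enable flag, required credential variables).
    providers = {
        "langsmith": ("LANGSMITH_TRACING", ["LANGSMITH_API_KEY"]),
        "langfuse": ("LANGFUSE_TRACING", ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"]),
    }
    enabled: list[str] = []
    for name, (flag, credentials) in providers.items():
        if env.get(flag, "").strip().lower() != "true":
            continue  # provider not explicitly enabled
        missing = [key for key in credentials if not env.get(key)]
        if missing:
            # Fail loudly instead of silently disabling tracing.
            raise ValueError(f"{name} tracing is enabled but missing: {', '.join(missing)}")
        enabled.append(name)
    return enabled
```

With both flags set to `true` and all credentials present, the helper returns both providers, matching the "attach both callbacks" behavior.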

 ---

@@ -106,3 +106,21 @@ class Channel(ABC):
                        logger.warning("[%s] file upload skipped for %s", self.name, attachment.filename)
                except Exception:
                    logger.exception("[%s] failed to upload file %s", self.name, attachment.filename)
+
+    async def receive_file(self, msg: InboundMessage, thread_id: str) -> InboundMessage:
+        """
+        Optionally process and materialize inbound file attachments for this channel.
+
+        By default, this method does nothing and simply returns the original message.
+        Subclasses (e.g. FeishuChannel) may override this to download files (images, documents, etc.)
+        referenced in msg.files, save them to the sandbox, and update msg.text to include
+        the sandbox file paths for downstream model consumption.
+
+        Args:
+            msg: The inbound message, possibly containing file metadata in msg.files.
+            thread_id: The resolved DeerFlow thread ID for sandbox path context.
+
+        Returns:
+            The (possibly modified) InboundMessage, with text and/or files updated as needed.
+        """
+        return msg
@@ -0,0 +1,20 @@
+"""Shared command definitions used by all channel implementations.
+
+Keeping the authoritative command set in one place ensures that channel
+parsers (e.g. Feishu) and the ChannelManager dispatcher stay in sync
+automatically — adding or removing a command here is the single edit
+required.
+"""
+
+from __future__ import annotations
+
+KNOWN_CHANNEL_COMMANDS: frozenset[str] = frozenset(
+    {
+        "/bootstrap",
+        "/new",
+        "/status",
+        "/models",
+        "/memory",
+        "/help",
+    }
+)
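As a quick illustration of how a parser might consume this shared set, the sketch below mirrors the `_is_feishu_command` check added to the Feishu channel in this change. The standalone name `is_known_command` is hypothetical.

```python
KNOWN_CHANNEL_COMMANDS: frozenset[str] = frozenset(
    {"/bootstrap", "/new", "/status", "/models", "/memory", "/help"}
)


def is_known_command(text: str) -> bool:
    """True only when the first token is one of the known slash commands."""
    if not text.startswith("/"):
        return False
    # Compare only the first whitespace-delimited token, case-insensitively.
    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS
```

Slash-prefixed text that is not a known command, such as an absolute path, falls through and is treated as normal chat.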
@@ -5,15 +5,25 @@ from __future__ import annotations
 import asyncio
 import json
 import logging
+import re
 import threading
-from typing import Any
+from typing import Any, Literal

 from app.channels.base import Channel
-from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
+from app.channels.commands import KNOWN_CHANNEL_COMMANDS
+from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
+from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
+from deerflow.sandbox.sandbox_provider import get_sandbox_provider

 logger = logging.getLogger(__name__)


+def _is_feishu_command(text: str) -> bool:
+    if not text.startswith("/"):
+        return False
+    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS
+
+
 class FeishuChannel(Channel):
    """Feishu/Lark IM channel using the ``lark-oapi`` WebSocket client.

@@ -49,6 +59,8 @@ class FeishuChannel(Channel):
        self._CreateFileRequestBody = None
        self._CreateImageRequest = None
        self._CreateImageRequestBody = None
+        self._GetMessageResourceRequest = None
+        self._thread_lock = threading.Lock()

    async def start(self) -> None:
        if self._running:
@@ -66,6 +78,7 @@ class FeishuChannel(Channel):
                CreateMessageRequest,
                CreateMessageRequestBody,
                Emoji,
+                GetMessageResourceRequest,
                PatchMessageRequest,
                PatchMessageRequestBody,
                ReplyMessageRequest,
@@ -89,6 +102,7 @@ class FeishuChannel(Channel):
        self._CreateFileRequestBody = CreateFileRequestBody
        self._CreateImageRequest = CreateImageRequest
        self._CreateImageRequestBody = CreateImageRequestBody
+        self._GetMessageResourceRequest = GetMessageResourceRequest

        app_id = self.config.get("app_id", "")
        app_secret = self.config.get("app_secret", "")
@@ -199,7 +213,9 @@ class FeishuChannel(Channel):
                    await asyncio.sleep(delay)

        logger.error("[Feishu] send failed after %d attempts: %s", _max_retries, last_exc)
-        raise last_exc  # type: ignore[misc]
+        if last_exc is None:
+            raise RuntimeError("Feishu send failed without an exception from any attempt")
+        raise last_exc

    async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
        if not self._api_client:
@@ -266,6 +282,112 @@ class FeishuChannel(Channel):
            raise RuntimeError(f"Feishu file upload failed: code={response.code}, msg={response.msg}")
        return response.data.file_key

+    async def receive_file(self, msg: InboundMessage, thread_id: str) -> InboundMessage:
+        """Download Feishu attachments into the thread uploads directory.
+
+        Replaces each ``[image]`` / ``[file]`` placeholder in ``msg.text`` with the
+        sandbox virtual path of the downloaded resource (or a failure notice) and
+        returns the updated message.
+        """
+        if not msg.thread_ts:
+            logger.warning("[Feishu] received file message without thread_ts, cannot associate with conversation: %s", msg)
+            return msg
+        files = msg.files
+        if not files:
+            logger.warning("[Feishu] received message with no files: %s", msg)
+            return msg
+        text = msg.text
+        for file in files:
+            if file.get("image_key"):
+                virtual_path = await self._receive_single_file(msg.thread_ts, file["image_key"], "image", thread_id)
+                text = text.replace("[image]", virtual_path, 1)
+            elif file.get("file_key"):
+                virtual_path = await self._receive_single_file(msg.thread_ts, file["file_key"], "file", thread_id)
+                text = text.replace("[file]", virtual_path, 1)
+        msg.text = text
+        return msg
+
+    async def _receive_single_file(self, message_id: str, file_key: str, type: Literal["image", "file"], thread_id: str) -> str:
+        request = self._GetMessageResourceRequest.builder().message_id(message_id).file_key(file_key).type(type).build()
+
+        def inner():
+            return self._api_client.im.v1.message_resource.get(request)
+
+        try:
+            response = await asyncio.to_thread(inner)
+        except Exception:
+            logger.exception("[Feishu] resource get request failed for resource_key=%s type=%s", file_key, type)
+            return f"Failed to obtain the [{type}]"
+
+        if not response.success():
+            logger.warning(
+                "[Feishu] resource get failed: resource_key=%s, type=%s, code=%s, msg=%s, log_id=%s",
+                file_key,
+                type,
+                response.code,
+                response.msg,
+                response.get_log_id(),
+            )
+            return f"Failed to obtain the [{type}]"
+
+        image_stream = getattr(response, "file", None)
+        if image_stream is None:
+            logger.warning("[Feishu] resource get returned no file stream: resource_key=%s, type=%s", file_key, type)
+            return f"Failed to obtain the [{type}]"
+
+        try:
+            content: bytes = await asyncio.to_thread(image_stream.read)
+        except Exception:
+            logger.exception("[Feishu] failed to read resource stream: resource_key=%s, type=%s", file_key, type)
+            return f"Failed to obtain the [{type}]"
+
+        if not content:
+            logger.warning("[Feishu] empty resource content: resource_key=%s, type=%s", file_key, type)
+            return f"Failed to obtain the [{type}]"
+
+        paths = get_paths()
+        paths.ensure_thread_dirs(thread_id)
+        uploads_dir = paths.sandbox_uploads_dir(thread_id).resolve()
+
+        ext = "png" if type == "image" else "bin"
+        raw_filename = getattr(response, "file_name", "") or f"feishu_{file_key[-12:]}.{ext}"
+
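+        # Sanitize filename: preserve extension, replace path chars in both parts
+        if "." in raw_filename:
+            name_part, ext = raw_filename.rsplit(".", 1)
+            name_part = re.sub(r"[./\\]", "_", name_part)
+            # The extension can still contain separators (e.g. "a.txt/../x"), so strip them too
+            ext = re.sub(r"[/\\]", "_", ext)
+            filename = f"{name_part}.{ext}"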
+        else:
+            filename = re.sub(r"[./\\]", "_", raw_filename)
+        resolved_target = uploads_dir / filename
+
+        def down_load():
+            # use thread_lock to avoid filename conflicts when writing
+            with self._thread_lock:
+                resolved_target.write_bytes(content)
+
+        try:
+            await asyncio.to_thread(down_load)
+        except Exception:
+            logger.exception("[Feishu] failed to persist downloaded resource: %s, type=%s", resolved_target, type)
+            return f"Failed to obtain the [{type}]"
+
+        virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{resolved_target.name}"
+
+        try:
+            sandbox_provider = get_sandbox_provider()
+            sandbox_id = sandbox_provider.acquire(thread_id)
+            if sandbox_id != "local":
+                sandbox = sandbox_provider.get(sandbox_id)
+                if sandbox is None:
+                    logger.warning("[Feishu] sandbox not found for thread_id=%s", thread_id)
+                    return f"Failed to obtain the [{type}]"
+                sandbox.update_file(virtual_path, content)
+        except Exception:
+            logger.exception("[Feishu] failed to sync resource into non-local sandbox: %s", virtual_path)
+            return f"Failed to obtain the [{type}]"
+
+        logger.info("[Feishu] downloaded resource mapped: file_key=%s -> %s", file_key, virtual_path)
+        return virtual_path
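The filename sanitization step in `_receive_single_file` can be exercised in isolation. The sketch below follows the same split-on-last-dot approach and also strips path separators from the extension as an extra precaution; `sanitize_filename` is a hypothetical standalone name.

```python
import re


def sanitize_filename(raw: str) -> str:
    """Replace path-like characters so the result is a single path component."""
    if "." in raw:
        name, ext = raw.rsplit(".", 1)
        name = re.sub(r"[./\\]", "_", name)  # dots and separators in the stem
        ext = re.sub(r"[/\\]", "_", ext)     # separators that slipped into the extension
        return f"{name}.{ext}"
    return re.sub(r"[./\\]", "_", raw)
```

Ordinary names such as `report.pdf` pass through unchanged, while traversal attempts collapse into a flat filename inside the uploads directory.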
+
    # -- message formatting ------------------------------------------------

    @staticmethod
@@ -470,9 +592,28 @@ class FeishuChannel(Channel):
            # Parse message content
            content = json.loads(message.content)

+            # files_list stores the file/image keys found in the Feishu message so the content can be downloaded later.
+            # In the Feishu channel, image_keys are independent of file_keys.
+            # A file_key covers files, videos, and audio, but not stickers.
+            files_list = []
+
            if "text" in content:
                # Handle plain text messages
                text = content["text"]
+            elif "file_key" in content:
+                file_key = content.get("file_key")
+                if isinstance(file_key, str) and file_key:
+                    files_list.append({"file_key": file_key})
+                    text = "[file]"
+                else:
+                    text = ""
+            elif "image_key" in content:
+                image_key = content.get("image_key")
+                if isinstance(image_key, str) and image_key:
+                    files_list.append({"image_key": image_key})
+                    text = "[image]"
+                else:
+                    text = ""
            elif "content" in content and isinstance(content["content"], list):
                # Handle rich-text messages with a top-level "content" list (e.g., topic groups/posts)
                text_paragraphs: list[str] = []
@@ -486,6 +627,16 @@ class FeishuChannel(Channel):
                                    text_value = element.get("text", "")
                                    if text_value:
                                        paragraph_text_parts.append(text_value)
+                                elif element.get("tag") == "img":
+                                    image_key = element.get("image_key")
+                                    if isinstance(image_key, str) and image_key:
+                                        files_list.append({"image_key": image_key})
+                                        paragraph_text_parts.append("[image]")
+                                elif element.get("tag") in ("file", "media"):
+                                    file_key = element.get("file_key")
+                                    if isinstance(file_key, str) and file_key:
+                                        files_list.append({"file_key": file_key})
+                                        paragraph_text_parts.append("[file]")
                        if paragraph_text_parts:
                            # Join text segments within a paragraph with spaces to avoid "helloworld"
                            text_paragraphs.append(" ".join(paragraph_text_parts))
@@ -505,12 +656,13 @@ class FeishuChannel(Channel):
                text[:100] if text else "",
            )

-            if not text:
+            if not (text or files_list):
                logger.info("[Feishu] empty text, ignoring message")
                return

-            # Check if it's a command
-            if text.startswith("/"):
+            # Only treat known slash commands as commands; absolute paths and
+            # other slash-prefixed text should be handled as normal chat.
+            if _is_feishu_command(text):
                msg_type = InboundMessageType.COMMAND
            else:
                msg_type = InboundMessageType.CHAT
@@ -524,6 +676,7 @@ class FeishuChannel(Channel):
                text=text,
                msg_type=msg_type,
                thread_ts=msg_id,
+                files=files_list,
                metadata={"message_id": msg_id, "root_id": root_id},
            )
            inbound.topic_id = topic_id
@@ -7,11 +7,13 @@ import logging
 import mimetypes
 import re
 import time
-from collections.abc import Mapping
+from collections.abc import Awaitable, Callable, Mapping
 from typing import Any

+import httpx
 from langgraph_sdk.errors import ConflictError

+from app.channels.commands import KNOWN_CHANNEL_COMMANDS
 from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
 from app.channels.store import ChannelStore

@@ -35,8 +37,49 @@ CHANNEL_CAPABILITIES = {
    "feishu": {"supports_streaming": True},
    "slack": {"supports_streaming": False},
    "telegram": {"supports_streaming": False},
+    "wecom": {"supports_streaming": True},
 }

+InboundFileReader = Callable[[dict[str, Any], httpx.AsyncClient], Awaitable[bytes | None]]
+
+
+INBOUND_FILE_READERS: dict[str, InboundFileReader] = {}
+
+
+def register_inbound_file_reader(channel_name: str, reader: InboundFileReader) -> None:
+    INBOUND_FILE_READERS[channel_name] = reader
+
+
+async def _read_http_inbound_file(file_info: dict[str, Any], client: httpx.AsyncClient) -> bytes | None:
+    url = file_info.get("url")
+    if not isinstance(url, str) or not url:
+        return None
+
+    resp = await client.get(url)
+    resp.raise_for_status()
+    return resp.content
+
+
+async def _read_wecom_inbound_file(file_info: dict[str, Any], client: httpx.AsyncClient) -> bytes | None:
+    data = await _read_http_inbound_file(file_info, client)
+    if data is None:
+        return None
+
+    aeskey = file_info.get("aeskey") if isinstance(file_info.get("aeskey"), str) else None
+    if not aeskey:
+        return data
+
+    try:
+        from aibot.crypto_utils import decrypt_file
+    except Exception:
+        logger.exception("[Manager] failed to import WeCom decrypt_file")
+        return None
+
+    return decrypt_file(data, aeskey)
+
+
+register_inbound_file_reader("wecom", _read_wecom_inbound_file)
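The registry-with-fallback pattern used for inbound file readers can be sketched without any network access. In the example below the readers take only a `file_info` dict (the real readers also receive an `httpx.AsyncClient`), and `default_reader`, `reversing_reader`, and `read_for` are hypothetical names chosen for illustration.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any, Optional

Reader = Callable[[dict[str, Any]], Awaitable[Optional[bytes]]]
READERS: dict[str, Reader] = {}


def register_reader(channel_name: str, reader: Reader) -> None:
    READERS[channel_name] = reader


async def default_reader(file_info: dict[str, Any]) -> Optional[bytes]:
    """Stand-in for the plain HTTP reader: return the payload if present."""
    data = file_info.get("data")
    return data if isinstance(data, bytes) else None


async def reversing_reader(file_info: dict[str, Any]) -> Optional[bytes]:
    """Stand-in for a channel-specific reader that post-processes the payload."""
    data = await default_reader(file_info)
    return None if data is None else data[::-1]


register_reader("demo", reversing_reader)


async def read_for(channel_name: str, file_info: dict[str, Any]) -> Optional[bytes]:
    # Channels without a registered reader fall back to the default one.
    reader = READERS.get(channel_name, default_reader)
    return await reader(file_info)
```

A channel-specific reader wraps the default one, much as the WeCom reader downloads first and decrypts afterwards.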
+

 class InvalidChannelSessionConfigError(ValueError):
    """Raised when IM channel session overrides contain invalid agent config."""
@@ -341,6 +384,105 @@ def _prepare_artifact_delivery(
    return response_text, attachments


+async def _ingest_inbound_files(thread_id: str, msg: InboundMessage) -> list[dict[str, Any]]:
+    if not msg.files:
+        return []
+
+    from deerflow.uploads.manager import claim_unique_filename, ensure_uploads_dir, normalize_filename
+
+    uploads_dir = ensure_uploads_dir(thread_id)
+    seen_names = {entry.name for entry in uploads_dir.iterdir() if entry.is_file()}
+
+    created: list[dict[str, Any]] = []
+    file_reader = INBOUND_FILE_READERS.get(msg.channel_name, _read_http_inbound_file)
+    async with httpx.AsyncClient(timeout=httpx.Timeout(20.0)) as client:
+        for idx, f in enumerate(msg.files):
+            if not isinstance(f, dict):
+                continue
+
+            ftype = f.get("type") if isinstance(f.get("type"), str) else "file"
+            filename = f.get("filename") if isinstance(f.get("filename"), str) else ""
+
+            try:
+                data = await file_reader(f, client)
+            except Exception:
+                logger.exception(
+                    "[Manager] failed to read inbound file: channel=%s, file=%s",
+                    msg.channel_name,
+                    f.get("url") or filename or idx,
+                )
+                continue
+
+            if data is None:
+                logger.warning(
+                    "[Manager] inbound file reader returned no data: channel=%s, file=%s",
+                    msg.channel_name,
+                    f.get("url") or filename or idx,
+                )
+                continue
+
+            if not filename:
+                ext = ".bin"
+                if ftype == "image":
+                    ext = ".png"
+                filename = f"{msg.thread_ts or 'msg'}_{idx}{ext}"
+
+            try:
+                safe_name = claim_unique_filename(normalize_filename(filename), seen_names)
+            except ValueError:
+                logger.warning(
+                    "[Manager] skipping inbound file with unsafe filename: channel=%s, file=%r",
+                    msg.channel_name,
+                    filename,
+                )
+                continue
+
+            dest = uploads_dir / safe_name
+            try:
+                dest.write_bytes(data)
+            except Exception:
+                logger.exception("[Manager] failed to write inbound file: %s", dest)
+                continue
+
+            created.append(
+                {
+                    "filename": safe_name,
+                    "size": len(data),
+                    "path": f"/mnt/user-data/uploads/{safe_name}",
+                    "is_image": ftype == "image",
+                }
+            )
+
+    return created
+
+
+def _format_uploaded_files_block(files: list[dict[str, Any]]) -> str:
+    lines = [
+        "<uploaded_files>",
+        "The following files were uploaded in this message:",
+        "",
+    ]
+    if not files:
+        lines.append("(empty)")
+    else:
+        for f in files:
+            filename = f.get("filename", "")
+            size = int(f.get("size") or 0)
+            size_kb = size / 1024 if size else 0
+            size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
+            path = f.get("path", "")
+            is_image = bool(f.get("is_image"))
+            file_kind = "image" if is_image else "file"
+            lines.append(f"- {filename} ({size_str})")
+            lines.append(f"  Type: {file_kind}")
+            lines.append(f"  Path: {path}")
+            lines.append("")
+    lines.append("Use `read_file` for text-based files and documents.")
+    lines.append("Use `view_image` for image files (jpg, jpeg, png, webp) so the model can inspect the image content.")
+    lines.append("</uploaded_files>")
+    return "\n".join(lines)
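The size formatting inside `_format_uploaded_files_block` is easy to check on its own. The sketch below pulls the same arithmetic into a hypothetical `format_size` helper.

```python
def format_size(size: int) -> str:
    """Render a byte count as KB below 1024 KB and as MB otherwise."""
    size_kb = size / 1024 if size else 0
    return f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
```

For example, 1536 bytes is reported as `1.5 KB` and 5 MiB as `5.0 MB`.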
+
+
 class ChannelManager:
    """Core dispatcher that bridges IM channels to the DeerFlow agent.

@@ -533,8 +675,25 @@ class ChannelManager:
            thread_id = await self._create_thread(client, msg)

        assistant_id, run_config, run_context = self._resolve_run_params(msg, thread_id)
+
+        # If the inbound message contains file attachments, let the channel
+        # materialize (download) them and update msg.text to include sandbox file paths.
+        # This enables downstream models to access user-uploaded files by path.
+        # Channels that do not support file download will simply return the original message.
+        if msg.files:
+            from .service import get_channel_service
+
+            service = get_channel_service()
+            channel = service.get_channel(msg.channel_name) if service else None
+            logger.info("[Manager] preparing receive file context for %d attachments", len(msg.files))
+            msg = await channel.receive_file(msg, thread_id) if channel else msg
        if extra_context:
            run_context.update(extra_context)
+
+        uploaded = await _ingest_inbound_files(thread_id, msg)
+        if uploaded:
+            msg.text = f"{_format_uploaded_files_block(uploaded)}\n\n{msg.text}".strip()
+
        if self._channel_supports_streaming(msg.channel_name):
            await self._handle_streaming_chat(
                client,
@@ -735,7 +894,8 @@ class ChannelManager:
                "/help — Show this help"
            )
        else:
-            reply = f"Unknown command: /{command}. Type /help for available commands."
+            available = " | ".join(sorted(KNOWN_CHANNEL_COMMANDS))
+            reply = f"Unknown command: /{command}. Available commands: {available}"

        outbound = OutboundMessage(
            channel_name=msg.channel_name,
@@ -6,6 +6,7 @@ import logging
 import os
 from typing import Any

+from app.channels.base import Channel
 from app.channels.manager import DEFAULT_GATEWAY_URL, DEFAULT_LANGGRAPH_URL, ChannelManager
 from app.channels.message_bus import MessageBus
 from app.channels.store import ChannelStore
@@ -17,6 +18,7 @@ _CHANNEL_REGISTRY: dict[str, str] = {
    "feishu": "app.channels.feishu:FeishuChannel",
    "slack": "app.channels.slack:SlackChannel",
    "telegram": "app.channels.telegram:TelegramChannel",
+    "wecom": "app.channels.wecom:WeComChannel",
 }

 _CHANNELS_LANGGRAPH_URL_ENV = "DEER_FLOW_CHANNELS_LANGGRAPH_URL"
@@ -163,6 +165,10 @@ class ChannelService:
            "channels": channels_status,
        }

+    def get_channel(self, name: str) -> Channel | None:
+        """Return a running channel instance by name when available."""
+        return self._channels.get(name)
+

 # -- singleton access -------------------------------------------------------

@@ -30,7 +30,7 @@ class SlackChannel(Channel):
        self._socket_client = None
        self._web_client = None
        self._loop: asyncio.AbstractEventLoop | None = None
-        self._allowed_users: set[str] = set(config.get("allowed_users", []))
+        self._allowed_users: set[str] = {str(user_id) for user_id in config.get("allowed_users", [])}

    async def start(self) -> None:
        if self._running:
@@ -126,7 +126,9 @@ class SlackChannel(Channel):
                )
            except Exception:
                pass
-        raise last_exc  # type: ignore[misc]
+        if last_exc is None:
+            raise RuntimeError("Slack send failed without an exception from any attempt")
+        raise last_exc

    async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
        if not self._web_client:
@@ -125,7 +125,9 @@ class TelegramChannel(Channel):
                    await asyncio.sleep(delay)

        logger.error("[Telegram] send failed after %d attempts: %s", _max_retries, last_exc)
-        raise last_exc  # type: ignore[misc]
+        if last_exc is None:
+            raise RuntimeError("Telegram send failed without an exception from any attempt")
+        raise last_exc

    async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
        if not self._application:
@@ -0,0 +1,394 @@
+from __future__ import annotations
+
+import asyncio
+import base64
+import hashlib
+import logging
+from collections.abc import Awaitable, Callable
+from typing import Any, cast
+
+from app.channels.base import Channel
+from app.channels.message_bus import (
+    InboundMessageType,
+    MessageBus,
+    OutboundMessage,
+    ResolvedAttachment,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class WeComChannel(Channel):
+    def __init__(self, bus: MessageBus, config: dict[str, Any]) -> None:
+        super().__init__(name="wecom", bus=bus, config=config)
+        self._bot_id: str | None = None
+        self._bot_secret: str | None = None
+        self._ws_client = None
+        self._ws_task: asyncio.Task | None = None
+        self._ws_frames: dict[str, dict[str, Any]] = {}
+        self._ws_stream_ids: dict[str, str] = {}
+        self._working_message = "Working on it..."
+
+    def _clear_ws_context(self, thread_ts: str | None) -> None:
+        if not thread_ts:
+            return
+        self._ws_frames.pop(thread_ts, None)
+        self._ws_stream_ids.pop(thread_ts, None)
+
+    async def _send_ws_upload_command(self, req_id: str, body: dict[str, Any], cmd: str) -> dict[str, Any]:
+        if not self._ws_client:
+            raise RuntimeError("WeCom WebSocket client is not available")
+
+        ws_manager = getattr(self._ws_client, "_ws_manager", None)
+        send_reply = getattr(ws_manager, "send_reply", None)
+        if not callable(send_reply):
+            raise RuntimeError("Installed wecom-aibot-python-sdk does not expose the WebSocket media upload API expected by DeerFlow. Use wecom-aibot-python-sdk==0.1.6 or update the adapter.")
+
+        send_reply_async = cast(Callable[[str, dict[str, Any], str], Awaitable[dict[str, Any]]], send_reply)
+        return await send_reply_async(req_id, body, cmd)
+
+    async def start(self) -> None:
+        if self._running:
+            return
+
+        bot_id = self.config.get("bot_id")
+        bot_secret = self.config.get("bot_secret")
+        working_message = self.config.get("working_message")
+
+        self._bot_id = bot_id if isinstance(bot_id, str) and bot_id else None
+        self._bot_secret = bot_secret if isinstance(bot_secret, str) and bot_secret else None
+        self._working_message = working_message if isinstance(working_message, str) and working_message else "Working on it..."
+
+        if not self._bot_id or not self._bot_secret:
+            logger.error("WeCom channel requires bot_id and bot_secret")
+            return
+
+        try:
+            from aibot import WSClient, WSClientOptions
+        except ImportError:
+            logger.error("wecom-aibot-python-sdk is not installed. Install it with: uv add wecom-aibot-python-sdk")
+            return
+        else:
+            self._ws_client = WSClient(WSClientOptions(bot_id=self._bot_id, secret=self._bot_secret, logger=logger))
+            self._ws_client.on("message.text", self._on_ws_text)
+            self._ws_client.on("message.mixed", self._on_ws_mixed)
+            self._ws_client.on("message.image", self._on_ws_image)
+            self._ws_client.on("message.file", self._on_ws_file)
+            self._ws_task = asyncio.create_task(self._ws_client.connect())
+
+            self._running = True
+            self.bus.subscribe_outbound(self._on_outbound)
+        logger.info("WeCom channel started")
+
+    async def stop(self) -> None:
+        self._running = False
+        self.bus.unsubscribe_outbound(self._on_outbound)
+        if self._ws_task:
+            try:
+                self._ws_task.cancel()
+            except Exception:
+                pass
+            self._ws_task = None
+        if self._ws_client:
+            try:
+                self._ws_client.disconnect()
+            except Exception:
+                pass
+        self._ws_client = None
+        self._ws_frames.clear()
+        self._ws_stream_ids.clear()
+        logger.info("WeCom channel stopped")
+
+    async def send(self, msg: OutboundMessage, *, _max_retries: int = 3) -> None:
+        if self._ws_client:
+            await self._send_ws(msg, _max_retries=_max_retries)
+            return
+        logger.warning("[WeCom] send called but WebSocket client is not available")
+
+    async def _on_outbound(self, msg: OutboundMessage) -> None:
+        if msg.channel_name != self.name:
+            return
+
+        try:
+            await self.send(msg)
+        except Exception:
+            logger.exception("Failed to send outbound message on channel %s", self.name)
+            if msg.is_final:
+                self._clear_ws_context(msg.thread_ts)
+            return
+
+        for attachment in msg.attachments:
+            try:
+                success = await self.send_file(msg, attachment)
+                if not success:
+                    logger.warning("[%s] file upload skipped for %s", self.name, attachment.filename)
+            except Exception:
+                logger.exception("[%s] failed to upload file %s", self.name, attachment.filename)
+
+        if msg.is_final:
+            self._clear_ws_context(msg.thread_ts)
+
+    async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
+        if not msg.is_final:
+            return True
+        if not self._ws_client:
+            return False
+        if not msg.thread_ts:
+            return False
+        frame = self._ws_frames.get(msg.thread_ts)
+        if not frame:
+            return False
+
+        media_type = "image" if attachment.is_image else "file"
+        size_limit = 2 * 1024 * 1024 if attachment.is_image else 20 * 1024 * 1024
+        if attachment.size > size_limit:
+            logger.warning(
+                "[WeCom] %s too large (%d bytes), skipping: %s",
+                media_type,
+                attachment.size,
+                attachment.filename,
+            )
+            return False
+
+        try:
+            media_id = await self._upload_media_ws(
+                media_type=media_type,
+                filename=attachment.filename,
+                path=str(attachment.actual_path),
+                size=attachment.size,
+            )
+            if not media_id:
+                return False
+
+            body = {media_type: {"media_id": media_id}, "msgtype": media_type}
+            await self._ws_client.reply(frame, body)
+            logger.debug("[WeCom] %s sent via ws: %s", media_type, attachment.filename)
+            return True
+        except Exception:
+            logger.exception("[WeCom] failed to upload/send file via ws: %s", attachment.filename)
+            return False
+
+    async def _on_ws_text(self, frame: dict[str, Any]) -> None:
+        body = frame.get("body", {}) or {}
+        text = ((body.get("text") or {}).get("content") or "").strip()
+        quote = (((body.get("quote") or {}).get("text") or {}).get("content") or "").strip()
+        if not text and not quote:
+            return
+        await self._publish_ws_inbound(frame, text + (f"\nQuote message: {quote}" if quote else ""))
+
+    async def _on_ws_mixed(self, frame: dict[str, Any]) -> None:
+        body = frame.get("body", {}) or {}
+        mixed = body.get("mixed") or {}
+        items = mixed.get("msg_item") or []
+        parts: list[str] = []
+        files: list[dict[str, Any]] = []
+        for item in items:
+            item_type = (item or {}).get("msgtype")
+            if item_type == "text":
+                content = (((item or {}).get("text") or {}).get("content") or "").strip()
+                if content:
+                    parts.append(content)
+            elif item_type in ("image", "file"):
+                payload = (item or {}).get(item_type) or {}
+                url = payload.get("url")
+                aeskey = payload.get("aeskey")
+                if isinstance(url, str) and url:
+                    files.append(
+                        {
+                            "type": item_type,
+                            "url": url,
+                            "aeskey": (aeskey if isinstance(aeskey, str) and aeskey else None),
+                        }
+                    )
+        text = "\n\n".join(parts).strip()
+        if not text and not files:
+            return
+        if not text:
+            text = "（receive image/file）"
+        await self._publish_ws_inbound(frame, text, files=files)
+
+    async def _on_ws_image(self, frame: dict[str, Any]) -> None:
+        body = frame.get("body", {}) or {}
+        image = body.get("image") or {}
+        url = image.get("url")
+        aeskey = image.get("aeskey")
+        if not isinstance(url, str) or not url:
+            return
+        await self._publish_ws_inbound(
+            frame,
+            "（receive image）",
+            files=[
+                {
+                    "type": "image",
+                    "url": url,
+                    "aeskey": aeskey if isinstance(aeskey, str) and aeskey else None,
+                }
+            ],
+        )
+
+    async def _on_ws_file(self, frame: dict[str, Any]) -> None:
+        body = frame.get("body", {}) or {}
+        file_obj = body.get("file") or {}
+        url = file_obj.get("url")
+        aeskey = file_obj.get("aeskey")
+        if not isinstance(url, str) or not url:
+            return
+        await self._publish_ws_inbound(
+            frame,
+            "（receive file）",
+            files=[
+                {
+                    "type": "file",
+                    "url": url,
+                    "aeskey": aeskey if isinstance(aeskey, str) and aeskey else None,
+                }
+            ],
+        )
+
+    async def _publish_ws_inbound(
+        self,
+        frame: dict[str, Any],
+        text: str,
+        *,
+        files: list[dict[str, Any]] | None = None,
+    ) -> None:
+        if not self._ws_client:
+            return
+        try:
+            from aibot import generate_req_id
+        except Exception:
+            return
+
+        body = frame.get("body", {}) or {}
+        msg_id = body.get("msgid")
+        if not msg_id:
+            return
+
+        user_id = (body.get("from") or {}).get("userid")
+
+        inbound_type = InboundMessageType.COMMAND if text.startswith("/") else InboundMessageType.CHAT
+        inbound = self._make_inbound(
+            chat_id=user_id,  # keep user's conversation in memory
+            user_id=user_id,
+            text=text,
+            msg_type=inbound_type,
+            thread_ts=msg_id,
+            files=files or [],
+            metadata={"aibotid": body.get("aibotid"), "chattype": body.get("chattype")},
+        )
+        inbound.topic_id = user_id  # keep the same thread
+
+        stream_id = generate_req_id("stream")
+        self._ws_frames[msg_id] = frame
+        self._ws_stream_ids[msg_id] = stream_id
+
+        try:
+            await self._ws_client.reply_stream(frame, stream_id, self._working_message, False)
+        except Exception:
+            logger.debug("[WeCom] failed to send working message via ws", exc_info=True)
+
+        await self.bus.publish_inbound(inbound)
+
+    async def _send_ws(self, msg: OutboundMessage, *, _max_retries: int = 3) -> None:
+        if not self._ws_client:
+            return
+        try:
+            from aibot import generate_req_id
+        except Exception:
+            generate_req_id = None
+
+        if msg.thread_ts and msg.thread_ts in self._ws_frames:
+            frame = self._ws_frames[msg.thread_ts]
+            stream_id = self._ws_stream_ids.get(msg.thread_ts)
+            if not stream_id and generate_req_id:
+                stream_id = generate_req_id("stream")
+                self._ws_stream_ids[msg.thread_ts] = stream_id
+            if not stream_id:
+                return
+
+            last_exc: Exception | None = None
+            for attempt in range(_max_retries):
+                try:
+                    await self._ws_client.reply_stream(frame, stream_id, msg.text, bool(msg.is_final))
+                    return
+                except Exception as exc:
+                    last_exc = exc
+                    if attempt < _max_retries - 1:
+                        await asyncio.sleep(2**attempt)
+            if last_exc:
+                raise last_exc
+
+        body = {"msgtype": "markdown", "markdown": {"content": msg.text}}
+        last_exc = None
+        for attempt in range(_max_retries):
+            try:
+                await self._ws_client.send_message(msg.chat_id, body)
+                return
+            except Exception as exc:
+                last_exc = exc
+                if attempt < _max_retries - 1:
+                    await asyncio.sleep(2**attempt)
+        if last_exc:
+            raise last_exc
+
+    async def _upload_media_ws(
+        self,
+        *,
+        media_type: str,
+        filename: str,
+        path: str,
+        size: int,
+    ) -> str | None:
+        if not self._ws_client:
+            return None
+        try:
+            from aibot import generate_req_id
+        except Exception:
+            return None
+
+        chunk_size = 512 * 1024
+        total_chunks = (size + chunk_size - 1) // chunk_size
+        if total_chunks < 1 or total_chunks > 100:
+            logger.warning("[WeCom] invalid total_chunks=%d for %s", total_chunks, filename)
+            return None
+
+        md5_hasher = hashlib.md5()
+        with open(path, "rb") as f:
+            for chunk in iter(lambda: f.read(1024 * 1024), b""):
+                md5_hasher.update(chunk)
+        md5 = md5_hasher.hexdigest()
+
+        init_req_id = generate_req_id("aibot_upload_media_init")
+        init_body = {
+            "type": media_type,
+            "filename": filename,
+            "total_size": int(size),
+            "total_chunks": int(total_chunks),
+            "md5": md5,
+        }
+        init_ack = await self._send_ws_upload_command(init_req_id, init_body, "aibot_upload_media_init")
+        upload_id = (init_ack.get("body") or {}).get("upload_id")
+        if not upload_id:
+            logger.warning("[WeCom] upload init returned no upload_id: %s", init_ack)
+            return None
+
+        with open(path, "rb") as f:
+            for idx in range(total_chunks):
+                data = f.read(chunk_size)
+                if not data:
+                    break
+                chunk_req_id = generate_req_id("aibot_upload_media_chunk")
+                chunk_body = {
+                    "upload_id": upload_id,
+                    "chunk_index": int(idx),
+                    "base64_data": base64.b64encode(data).decode("utf-8"),
+                }
+                await self._send_ws_upload_command(chunk_req_id, chunk_body, "aibot_upload_media_chunk")
+
+        finish_req_id = generate_req_id("aibot_upload_media_finish")
+        finish_ack = await self._send_ws_upload_command(finish_req_id, {"upload_id": upload_id}, "aibot_upload_media_finish")
+        media_id = (finish_ack.get("body") or {}).get("media_id")
+        if not media_id:
+            logger.warning("[WeCom] upload finish returned no media_id: %s", finish_ack)
+            return None
+        return media_id
@@ -11,6 +11,7 @@ from app.gateway.routers import (
    artifacts,
    assistants_compat,
    channels,
+    feedback,
    mcp,
    memory,
    models,
@@ -199,6 +200,9 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
    # Assistants compatibility API (LangGraph Platform stub)
    app.include_router(assistants_compat.router)

+    # Feedback API is mounted at /api/threads/{thread_id}/runs/{run_id}/feedback
+    app.include_router(feedback.router)
+
    # Thread Runs API (LangGraph Platform-compatible runs lifecycle)
    app.include_router(thread_runs.router)

@@ -1,7 +1,8 @@
 """Centralized accessors for singleton objects stored on ``app.state``.

 **Getters** (used by routers): raise 503 when a required dependency is
-missing, except ``get_store`` which returns ``None``.
+missing, except ``get_store``, which returns ``None`` when no store is
+configured.

 Initialization is handled directly in ``app.py`` via :class:`AsyncExitStack`.
 """
@@ -13,7 +14,7 @@ from contextlib import AsyncExitStack, asynccontextmanager

 from fastapi import FastAPI, HTTPException, Request

-from deerflow.runtime import RunManager, StreamBridge
+from deerflow.runtime import RunContext, RunManager


@asynccontextmanager
@@ -26,45 +27,110 @@ async def langgraph_runtime(app: FastAPI) -> AsyncGenerator[None, None]:
            yield
    """
    from deerflow.agents.checkpointer.async_provider import make_checkpointer
+    from deerflow.config import get_app_config
+    from deerflow.persistence.engine import close_engine, get_session_factory, init_engine_from_config
    from deerflow.runtime import make_store, make_stream_bridge
+    from deerflow.runtime.events.store import make_run_event_store

    async with AsyncExitStack() as stack:
        app.state.stream_bridge = await stack.enter_async_context(make_stream_bridge())
+
+        # Initialize persistence engine BEFORE checkpointer so that
+        # auto-create-database logic runs first (postgres backend).
+        config = get_app_config()
+        await init_engine_from_config(config.database)
+
        app.state.checkpointer = await stack.enter_async_context(make_checkpointer())
        app.state.store = await stack.enter_async_context(make_store())
-        app.state.run_manager = RunManager()
-        yield
+
+        # Initialize repositories — one get_session_factory() call for all.
+        sf = get_session_factory()
+        if sf is not None:
+            from deerflow.persistence.feedback import FeedbackRepository
+            from deerflow.persistence.run import RunRepository
+            from deerflow.persistence.thread_meta import ThreadMetaRepository
+
+            app.state.run_store = RunRepository(sf)
+            app.state.feedback_repo = FeedbackRepository(sf)
+            app.state.thread_meta_repo = ThreadMetaRepository(sf)
+        else:
+            from deerflow.persistence.thread_meta import MemoryThreadMetaStore
+            from deerflow.runtime.runs.store.memory import MemoryRunStore
+
+            app.state.run_store = MemoryRunStore()
+            app.state.feedback_repo = None
+            app.state.thread_meta_repo = MemoryThreadMetaStore(app.state.store)
+
+        # Run event store (has its own factory with config-driven backend selection)
+        run_events_config = getattr(config, "run_events", None)
+        app.state.run_event_store = make_run_event_store(run_events_config)
+
+        # RunManager with store backing for persistence
+        app.state.run_manager = RunManager(store=app.state.run_store)
+
+        try:
+            yield
+        finally:
+            await close_engine()


 # ---------------------------------------------------------------------------
-# Getters – called by routers per-request
+# Getters -- called by routers per-request
 # ---------------------------------------------------------------------------


-def get_stream_bridge(request: Request) -> StreamBridge:
-    """Return the global :class:`StreamBridge`, or 503."""
-    bridge = getattr(request.app.state, "stream_bridge", None)
-    if bridge is None:
-        raise HTTPException(status_code=503, detail="Stream bridge not available")
-    return bridge
+def _require(attr: str, label: str):
+    """Create a FastAPI dependency that returns ``app.state.<attr>`` or 503."""
+
+    def dep(request: Request):
+        val = getattr(request.app.state, attr, None)
+        if val is None:
+            raise HTTPException(status_code=503, detail=f"{label} not available")
+        return val
+
+    dep.__name__ = dep.__qualname__ = f"get_{attr}"
+    return dep


-def get_run_manager(request: Request) -> RunManager:
-    """Return the global :class:`RunManager`, or 503."""
-    mgr = getattr(request.app.state, "run_manager", None)
-    if mgr is None:
-        raise HTTPException(status_code=503, detail="Run manager not available")
-    return mgr
-
-
-def get_checkpointer(request: Request):
-    """Return the global checkpointer, or 503."""
-    cp = getattr(request.app.state, "checkpointer", None)
-    if cp is None:
-        raise HTTPException(status_code=503, detail="Checkpointer not available")
-    return cp
+get_stream_bridge = _require("stream_bridge", "Stream bridge")
+get_run_manager = _require("run_manager", "Run manager")
+get_checkpointer = _require("checkpointer", "Checkpointer")
+get_run_event_store = _require("run_event_store", "Run event store")
+get_feedback_repo = _require("feedback_repo", "Feedback")
+get_run_store = _require("run_store", "Run store")


 def get_store(request: Request):
    """Return the global store (may be ``None`` if not configured)."""
    return getattr(request.app.state, "store", None)
+
+
+get_thread_meta_repo = _require("thread_meta_repo", "Thread metadata store")
+
+
+def get_run_context(request: Request) -> RunContext:
+    """Build a :class:`RunContext` from ``app.state`` singletons.
+
+    Returns a *base* context with infrastructure dependencies.  Callers that
+    need per-run fields (e.g. ``follow_up_to_run_id``) should use
+    ``dataclasses.replace(ctx, follow_up_to_run_id=...)`` before passing it
+    to :func:`run_agent`.
+    """
+    from deerflow.config import get_app_config
+
+    return RunContext(
+        checkpointer=get_checkpointer(request),
+        store=get_store(request),
+        event_store=get_run_event_store(request),
+        run_events_config=getattr(get_app_config(), "run_events", None),
+        thread_meta_repo=get_thread_meta_repo(request),
+    )
+
+
+async def get_current_user(request: Request) -> str | None:
+    """Extract user identity from request.
+
+    Phase 2: always returns None (no authentication).
+    Phase 3: extract user_id from JWT / session / API key header.
+    """
+    return None
@@ -24,7 +24,7 @@ class AgentResponse(BaseModel):
    description: str = Field(default="", description="Agent description")
    model: str | None = Field(default=None, description="Optional model override")
    tool_groups: list[str] | None = Field(default=None, description="Optional tool group whitelist")
-    soul: str | None = Field(default=None, description="SOUL.md content (included on GET /{name})")
+    soul: str | None = Field(default=None, description="SOUL.md content")


 class AgentsListResponse(BaseModel):
@@ -92,17 +92,17 @@ def _agent_config_to_response(agent_cfg: AgentConfig, include_soul: bool = False
    "/agents",
    response_model=AgentsListResponse,
    summary="List Custom Agents",
-    description="List all custom agents available in the agents directory.",
+    description="List all custom agents available in the agents directory, including their soul content.",
 )
 async def list_agents() -> AgentsListResponse:
    """List all custom agents.

    Returns:
-        List of all custom agents with their metadata (without soul content).
+        List of all custom agents with their metadata and soul content.
    """
    try:
        agents = list_custom_agents()
-        return AgentsListResponse(agents=[_agent_config_to_response(a) for a in agents])
+        return AgentsListResponse(agents=[_agent_config_to_response(a, include_soul=True) for a in agents])
    except Exception as e:
        logger.error(f"Failed to list agents: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=f"Failed to list agents: {str(e)}")
@@ -0,0 +1,127 @@
+"""Feedback endpoints — create, list, stats, delete.
+
+Allows users to submit thumbs-up/down feedback on runs,
+optionally scoped to a specific message.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from fastapi import APIRouter, HTTPException, Request
+from pydantic import BaseModel, Field
+
+from app.gateway.deps import get_current_user, get_feedback_repo, get_run_store
+
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/api/threads", tags=["feedback"])
+
+
+# ---------------------------------------------------------------------------
+# Request / response models
+# ---------------------------------------------------------------------------
+
+
+class FeedbackCreateRequest(BaseModel):
+    rating: int = Field(..., description="Feedback rating: +1 (positive) or -1 (negative)")
+    comment: str | None = Field(default=None, description="Optional text feedback")
+    message_id: str | None = Field(default=None, description="Optional: scope feedback to a specific message")
+
+
+class FeedbackResponse(BaseModel):
+    feedback_id: str
+    run_id: str
+    thread_id: str
+    owner_id: str | None = None
+    message_id: str | None = None
+    rating: int
+    comment: str | None = None
+    created_at: str = ""
+
+
+class FeedbackStatsResponse(BaseModel):
+    run_id: str
+    total: int = 0
+    positive: int = 0
+    negative: int = 0
+
+
+# ---------------------------------------------------------------------------
+# Endpoints
+# ---------------------------------------------------------------------------
+
+
+@router.post("/{thread_id}/runs/{run_id}/feedback", response_model=FeedbackResponse)
+async def create_feedback(
+    thread_id: str,
+    run_id: str,
+    body: FeedbackCreateRequest,
+    request: Request,
+) -> dict[str, Any]:
+    """Submit feedback (thumbs-up/down) for a run."""
+    if body.rating not in (1, -1):
+        raise HTTPException(status_code=400, detail="rating must be +1 or -1")
+
+    user_id = await get_current_user(request)
+
+    # Validate run exists and belongs to thread
+    run_store = get_run_store(request)
+    run = await run_store.get(run_id)
+    if run is None:
+        raise HTTPException(status_code=404, detail=f"Run {run_id} not found")
+    if run.get("thread_id") != thread_id:
+        raise HTTPException(status_code=404, detail=f"Run {run_id} not found in thread {thread_id}")
+
+    feedback_repo = get_feedback_repo(request)
+    return await feedback_repo.create(
+        run_id=run_id,
+        thread_id=thread_id,
+        rating=body.rating,
+        owner_id=user_id,
+        message_id=body.message_id,
+        comment=body.comment,
+    )
+
+
+@router.get("/{thread_id}/runs/{run_id}/feedback", response_model=list[FeedbackResponse])
+async def list_feedback(
+    thread_id: str,
+    run_id: str,
+    request: Request,
+) -> list[dict[str, Any]]:
+    """List all feedback for a run."""
+    feedback_repo = get_feedback_repo(request)
+    return await feedback_repo.list_by_run(thread_id, run_id)
+
+
+@router.get("/{thread_id}/runs/{run_id}/feedback/stats", response_model=FeedbackStatsResponse)
+async def feedback_stats(
+    thread_id: str,
+    run_id: str,
+    request: Request,
+) -> dict[str, Any]:
+    """Get aggregated feedback stats (positive/negative counts) for a run."""
+    feedback_repo = get_feedback_repo(request)
+    return await feedback_repo.aggregate_by_run(thread_id, run_id)
+
+
+@router.delete("/{thread_id}/runs/{run_id}/feedback/{feedback_id}")
+async def delete_feedback(
+    thread_id: str,
+    run_id: str,
+    feedback_id: str,
+    request: Request,
+) -> dict[str, bool]:
+    """Delete a feedback record."""
+    feedback_repo = get_feedback_repo(request)
+    # Verify feedback belongs to the specified thread/run before deleting
+    existing = await feedback_repo.get(feedback_id)
+    if existing is None:
+        raise HTTPException(status_code=404, detail=f"Feedback {feedback_id} not found")
+    if existing.get("thread_id") != thread_id or existing.get("run_id") != run_id:
+        raise HTTPException(status_code=404, detail=f"Feedback {feedback_id} not found in run {run_id}")
+    deleted = await feedback_repo.delete(feedback_id)
+    if not deleted:
+        raise HTTPException(status_code=404, detail=f"Feedback {feedback_id} not found")
+    return {"success": True}
@@ -49,6 +49,7 @@ class Fact(BaseModel):
    confidence: float = Field(default=0.5, description="Confidence score (0-1)")
    createdAt: str = Field(default="", description="Creation timestamp")
    source: str = Field(default="unknown", description="Source thread ID")
+    sourceError: str | None = Field(default=None, description="Optional description of the prior mistake or wrong approach")


 class MemoryResponse(BaseModel):
@@ -108,6 +109,7 @@ class MemoryStatusResponse(BaseModel):
@router.get(
    "/memory",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Get Memory Data",
    description="Retrieve the current global memory data including user context, history, and facts.",
 )
@@ -152,6 +154,7 @@ async def get_memory() -> MemoryResponse:
@router.post(
    "/memory/reload",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Reload Memory Data",
    description="Reload memory data from the storage file, refreshing the in-memory cache.",
 )
@@ -171,6 +174,7 @@ async def reload_memory() -> MemoryResponse:
@router.delete(
    "/memory",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Clear All Memory Data",
    description="Delete all saved memory data and reset the memory structure to an empty state.",
 )
@@ -187,6 +191,7 @@ async def clear_memory() -> MemoryResponse:
@router.post(
    "/memory/facts",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Create Memory Fact",
    description="Create a single saved memory fact manually.",
 )
@@ -209,6 +214,7 @@ async def create_memory_fact_endpoint(request: FactCreateRequest) -> MemoryRespo
@router.delete(
    "/memory/facts/{fact_id}",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Delete Memory Fact",
    description="Delete a single saved memory fact by its fact id.",
 )
@@ -227,6 +233,7 @@ async def delete_memory_fact_endpoint(fact_id: str) -> MemoryResponse:
@router.patch(
    "/memory/facts/{fact_id}",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Patch Memory Fact",
    description="Partially update a single saved memory fact by its fact id while preserving omitted fields.",
 )
@@ -252,6 +259,7 @@ async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -
@router.get(
    "/memory/export",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Export Memory Data",
    description="Export the current global memory data as JSON for backup or transfer.",
 )
@@ -264,6 +272,7 @@ async def export_memory() -> MemoryResponse:
@router.post(
    "/memory/import",
    response_model=MemoryResponse,
+    response_model_exclude_none=True,
    summary="Import Memory Data",
    description="Import and overwrite the current global memory data from a JSON payload.",
 )
@@ -317,6 +326,7 @@ async def get_memory_config_endpoint() -> MemoryConfigResponse:
@router.get(
    "/memory/status",
    response_model=MemoryStatusResponse,
+    response_model_exclude_none=True,
    summary="Get Memory Status",
    description="Retrieve both memory configuration and current data in a single request.",
 )
@@ -51,6 +51,7 @@ async def stateless_stream(body: RunCreateRequest, request: Request) -> Streamin
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
+            "Content-Location": f"/api/threads/{thread_id}/runs/{record.run_id}",
        },
    )

@@ -1,14 +1,29 @@
 import json
 import logging
+import shutil
 from pathlib import Path

 from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel, Field

 from app.gateway.path_utils import resolve_thread_virtual_path
+from deerflow.agents.lead_agent.prompt import clear_skills_system_prompt_cache
 from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
 from deerflow.skills import Skill, load_skills
 from deerflow.skills.installer import SkillAlreadyExistsError, install_skill_from_archive
+from deerflow.skills.manager import (
+    append_history,
+    atomic_write,
+    custom_skill_exists,
+    ensure_custom_skill_is_editable,
+    get_custom_skill_dir,
+    get_custom_skill_file,
+    get_skill_history_file,
+    read_custom_skill_content,
+    read_history,
+    validate_skill_markdown_content,
+)
+from deerflow.skills.security_scanner import scan_skill_content

 logger = logging.getLogger(__name__)

@@ -52,6 +67,22 @@ class SkillInstallResponse(BaseModel):
    message: str = Field(..., description="Installation result message")


+class CustomSkillContentResponse(SkillResponse):
+    content: str = Field(..., description="Raw SKILL.md content")
+
+
+class CustomSkillUpdateRequest(BaseModel):
+    content: str = Field(..., description="Replacement SKILL.md content")
+
+
+class CustomSkillHistoryResponse(BaseModel):
+    history: list[dict]
+
+
+class SkillRollbackRequest(BaseModel):
+    history_index: int = Field(default=-1, description="History entry index to restore from, defaulting to the latest change.")
+
+
 def _skill_to_response(skill: Skill) -> SkillResponse:
    """Convert a Skill object to a SkillResponse."""
    return SkillResponse(
@@ -78,6 +109,180 @@ async def list_skills() -> SkillsListResponse:
        raise HTTPException(status_code=500, detail=f"Failed to load skills: {str(e)}")


+@router.post(
+    "/skills/install",
+    response_model=SkillInstallResponse,
+    summary="Install Skill",
+    description="Install a skill from a .skill file (ZIP archive) located in the thread's user-data directory.",
+)
+async def install_skill(request: SkillInstallRequest) -> SkillInstallResponse:
+    try:
+        skill_file_path = resolve_thread_virtual_path(request.thread_id, request.path)
+        result = install_skill_from_archive(skill_file_path)
+        return SkillInstallResponse(**result)
+    except FileNotFoundError as e:
+        raise HTTPException(status_code=404, detail=str(e))
+    except SkillAlreadyExistsError as e:
+        raise HTTPException(status_code=409, detail=str(e))
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Failed to install skill: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to install skill: {str(e)}")
+
+
+@router.get("/skills/custom", response_model=SkillsListResponse, summary="List Custom Skills")
+async def list_custom_skills() -> SkillsListResponse:
+    try:
+        skills = [skill for skill in load_skills(enabled_only=False) if skill.category == "custom"]
+        return SkillsListResponse(skills=[_skill_to_response(skill) for skill in skills])
+    except Exception as e:
+        logger.error("Failed to list custom skills: %s", e, exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to list custom skills: {str(e)}")
+
+
+@router.get("/skills/custom/{skill_name}", response_model=CustomSkillContentResponse, summary="Get Custom Skill Content")
+async def get_custom_skill(skill_name: str) -> CustomSkillContentResponse:
+    try:
+        skills = load_skills(enabled_only=False)
+        skill = next((s for s in skills if s.name == skill_name and s.category == "custom"), None)
+        if skill is None:
+            raise HTTPException(status_code=404, detail=f"Custom skill '{skill_name}' not found")
+        return CustomSkillContentResponse(**_skill_to_response(skill).model_dump(), content=read_custom_skill_content(skill_name))
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error("Failed to get custom skill %s: %s", skill_name, e, exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to get custom skill: {str(e)}")
+
+
+@router.put("/skills/custom/{skill_name}", response_model=CustomSkillContentResponse, summary="Edit Custom Skill")
+async def update_custom_skill(skill_name: str, request: CustomSkillUpdateRequest) -> CustomSkillContentResponse:
+    try:
+        ensure_custom_skill_is_editable(skill_name)
+        validate_skill_markdown_content(skill_name, request.content)
+        scan = await scan_skill_content(request.content, executable=False, location=f"{skill_name}/SKILL.md")
+        if scan.decision == "block":
+            raise HTTPException(status_code=400, detail=f"Security scan blocked the edit: {scan.reason}")
+        skill_file = get_custom_skill_file(skill_name)
+        prev_content = skill_file.read_text(encoding="utf-8")
+        atomic_write(skill_file, request.content)
+        append_history(
+            skill_name,
+            {
+                "action": "human_edit",
+                "author": "human",
+                "thread_id": None,
+                "file_path": "SKILL.md",
+                "prev_content": prev_content,
+                "new_content": request.content,
+                "scanner": {"decision": scan.decision, "reason": scan.reason},
+            },
+        )
+        clear_skills_system_prompt_cache()
+        return await get_custom_skill(skill_name)
+    except HTTPException:
+        raise
+    except FileNotFoundError as e:
+        raise HTTPException(status_code=404, detail=str(e))
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.error("Failed to update custom skill %s: %s", skill_name, e, exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to update custom skill: {str(e)}")
+
+
+@router.delete("/skills/custom/{skill_name}", summary="Delete Custom Skill")
+async def delete_custom_skill(skill_name: str) -> dict[str, bool]:
+    try:
+        ensure_custom_skill_is_editable(skill_name)
+        skill_dir = get_custom_skill_dir(skill_name)
+        prev_content = read_custom_skill_content(skill_name)
+        append_history(
+            skill_name,
+            {
+                "action": "human_delete",
+                "author": "human",
+                "thread_id": None,
+                "file_path": "SKILL.md",
+                "prev_content": prev_content,
+                "new_content": None,
+                "scanner": {"decision": "allow", "reason": "Deletion requested."},
+            },
+        )
+        shutil.rmtree(skill_dir)
+        clear_skills_system_prompt_cache()
+        return {"success": True}
+    except FileNotFoundError as e:
+        raise HTTPException(status_code=404, detail=str(e))
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.error("Failed to delete custom skill %s: %s", skill_name, e, exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to delete custom skill: {str(e)}")
+
+
+@router.get("/skills/custom/{skill_name}/history", response_model=CustomSkillHistoryResponse, summary="Get Custom Skill History")
+async def get_custom_skill_history(skill_name: str) -> CustomSkillHistoryResponse:
+    try:
+        if not custom_skill_exists(skill_name) and not get_skill_history_file(skill_name).exists():
+            raise HTTPException(status_code=404, detail=f"Custom skill '{skill_name}' not found")
+        return CustomSkillHistoryResponse(history=read_history(skill_name))
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error("Failed to read history for %s: %s", skill_name, e, exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to read history: {str(e)}")
+
+
+@router.post("/skills/custom/{skill_name}/rollback", response_model=CustomSkillContentResponse, summary="Rollback Custom Skill")
+async def rollback_custom_skill(skill_name: str, request: SkillRollbackRequest) -> CustomSkillContentResponse:
+    try:
+        if not custom_skill_exists(skill_name) and not get_skill_history_file(skill_name).exists():
+            raise HTTPException(status_code=404, detail=f"Custom skill '{skill_name}' not found")
+        history = read_history(skill_name)
+        if not history:
+            raise HTTPException(status_code=400, detail=f"Custom skill '{skill_name}' has no history")
+        record = history[request.history_index]
+        target_content = record.get("prev_content")
+        if target_content is None:
+            raise HTTPException(status_code=400, detail="Selected history entry has no previous content to roll back to")
+        validate_skill_markdown_content(skill_name, target_content)
+        scan = await scan_skill_content(target_content, executable=False, location=f"{skill_name}/SKILL.md")
+        skill_file = get_custom_skill_file(skill_name)
+        current_content = skill_file.read_text(encoding="utf-8") if skill_file.exists() else None
+        history_entry = {
+            "action": "rollback",
+            "author": "human",
+            "thread_id": None,
+            "file_path": "SKILL.md",
+            "prev_content": current_content,
+            "new_content": target_content,
+            "rollback_from_ts": record.get("ts"),
+            "scanner": {"decision": scan.decision, "reason": scan.reason},
+        }
+        if scan.decision == "block":
+            append_history(skill_name, history_entry)
+            raise HTTPException(status_code=400, detail=f"Rollback blocked by security scanner: {scan.reason}")
+        atomic_write(skill_file, target_content)
+        append_history(skill_name, history_entry)
+        clear_skills_system_prompt_cache()
+        return await get_custom_skill(skill_name)
+    except HTTPException:
+        raise
+    except IndexError:
+        raise HTTPException(status_code=400, detail="history_index is out of range")
+    except FileNotFoundError as e:
+        raise HTTPException(status_code=404, detail=str(e))
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+    except Exception as e:
+        logger.error("Failed to roll back custom skill %s: %s", skill_name, e, exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to roll back custom skill: {str(e)}")
+
+
@router.get(
    "/skills/{skill_name}",
    response_model=SkillResponse,
@@ -147,27 +352,3 @@ async def update_skill(skill_name: str, request: SkillUpdateRequest) -> SkillRes
    except Exception as e:
        logger.error(f"Failed to update skill {skill_name}: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=f"Failed to update skill: {str(e)}")
-
-
-@router.post(
-    "/skills/install",
-    response_model=SkillInstallResponse,
-    summary="Install Skill",
-    description="Install a skill from a .skill file (ZIP archive) located in the thread's user-data directory.",
-)
-async def install_skill(request: SkillInstallRequest) -> SkillInstallResponse:
-    try:
-        skill_file_path = resolve_thread_virtual_path(request.thread_id, request.path)
-        result = install_skill_from_archive(skill_file_path)
-        return SkillInstallResponse(**result)
-    except FileNotFoundError as e:
-        raise HTTPException(status_code=404, detail=str(e))
-    except SkillAlreadyExistsError as e:
-        raise HTTPException(status_code=409, detail=str(e))
-    except ValueError as e:
-        raise HTTPException(status_code=400, detail=str(e))
-    except HTTPException:
-        raise
-    except Exception as e:
-        logger.error(f"Failed to install skill: {e}", exc_info=True)
-        raise HTTPException(status_code=500, detail=f"Failed to install skill: {str(e)}")
@@ -2,6 +2,7 @@ import json
 import logging

 from fastapi import APIRouter
+from langchain_core.messages import HumanMessage, SystemMessage
 from pydantic import BaseModel, Field

 from deerflow.models import create_chat_model
@@ -106,22 +107,21 @@ async def generate_suggestions(thread_id: str, request: SuggestionsRequest) -> S
    if not conversation:
        return SuggestionsResponse(suggestions=[])

-    prompt = (
+    system_instruction = (
        "You are generating follow-up questions to help the user continue the conversation.\n"
        f"Based on the conversation below, produce EXACTLY {n} short questions the user might ask next.\n"
        "Requirements:\n"
-        "- Questions must be relevant to the conversation.\n"
+        "- Questions must be relevant to the preceding conversation.\n"
        "- Questions must be written in the same language as the user.\n"
        "- Keep each question concise (ideally <= 20 words / <= 40 Chinese characters).\n"
        "- Do NOT include numbering, markdown, or any extra text.\n"
-        "- Output MUST be a JSON array of strings only.\n\n"
-        "Conversation:\n"
-        f"{conversation}\n"
+        "- Output MUST be a JSON array of strings only.\n"
    )
+    user_content = f"Conversation Context:\n{conversation}\n\nGenerate {n} follow-up questions"

    try:
        model = create_chat_model(name=request.model_name, thinking_enabled=False)
-        response = model.invoke(prompt)
+        response = await model.ainvoke([SystemMessage(content=system_instruction), HumanMessage(content=user_content)])
        raw = _extract_response_text(response.content)
        suggestions = _parse_json_string_list(raw) or []
        cleaned = [s.replace("\n", " ").strip() for s in suggestions if s.strip()]
@@ -19,7 +19,7 @@ from fastapi import APIRouter, HTTPException, Query, Request
 from fastapi.responses import Response, StreamingResponse
 from pydantic import BaseModel, Field

-from app.gateway.deps import get_checkpointer, get_run_manager, get_stream_bridge
+from app.gateway.deps import get_checkpointer, get_run_event_store, get_run_manager, get_run_store, get_stream_bridge
 from app.gateway.services import sse_consumer, start_run
 from deerflow.runtime import RunRecord, serialize_channel_values

@@ -53,6 +53,7 @@ class RunCreateRequest(BaseModel):
    after_seconds: float | None = Field(default=None, description="Delayed execution")
    if_not_exists: Literal["reject", "create"] = Field(default="create", description="Thread creation policy")
    feedback_keys: list[str] | None = Field(default=None, description="LangSmith feedback keys")
+    follow_up_to_run_id: str | None = Field(default=None, description="Run ID this message follows up on. Auto-detected from latest successful run if not provided.")


 class RunResponse(BaseModel):
@@ -118,8 +119,9 @@ async def stream_run(thread_id: str, body: RunCreateRequest, request: Request) -
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
            # LangGraph Platform includes run metadata in this header.
-            # The SDK's _get_run_metadata_from_response() parses it.
-            "Content-Location": (f"/api/threads/{thread_id}/runs/{record.run_id}/stream?thread_id={thread_id}&run_id={record.run_id}"),
+            # The SDK uses a greedy regex to extract the run id from this path,
+            # so it must point at the canonical run resource without extra suffixes.
+            "Content-Location": f"/api/threads/{thread_id}/runs/{record.run_id}",
        },
    )

@@ -264,3 +266,50 @@ async def stream_existing_run(
            "X-Accel-Buffering": "no",
        },
    )
+
+
+# ---------------------------------------------------------------------------
+# Messages / Events / Token usage endpoints
+# ---------------------------------------------------------------------------
+
+
+@router.get("/{thread_id}/messages")
+async def list_thread_messages(
+    thread_id: str,
+    request: Request,
+    limit: int = Query(default=50, ge=1, le=200),
+    before_seq: int | None = Query(default=None),
+    after_seq: int | None = Query(default=None),
+) -> list[dict]:
+    """Return displayable messages for a thread (across all runs)."""
+    event_store = get_run_event_store(request)
+    return await event_store.list_messages(thread_id, limit=limit, before_seq=before_seq, after_seq=after_seq)
+
+
+@router.get("/{thread_id}/runs/{run_id}/messages")
+async def list_run_messages(thread_id: str, run_id: str, request: Request) -> list[dict]:
+    """Return displayable messages for a specific run."""
+    event_store = get_run_event_store(request)
+    return await event_store.list_messages_by_run(thread_id, run_id)
+
+
+@router.get("/{thread_id}/runs/{run_id}/events")
+async def list_run_events(
+    thread_id: str,
+    run_id: str,
+    request: Request,
+    event_types: str | None = Query(default=None),
+    limit: int = Query(default=500, ge=1, le=2000),
+) -> list[dict]:
+    """Return the full event stream for a run (debug/audit)."""
+    event_store = get_run_event_store(request)
+    types = ([t.strip() for t in event_types.split(",") if t.strip()] or None) if event_types else None
+    return await event_store.list_events(thread_id, run_id, event_types=types, limit=limit)
+
+
+@router.get("/{thread_id}/token-usage")
+async def thread_token_usage(thread_id: str, request: Request) -> dict:
+    """Thread-level token usage aggregation."""
+    run_store = get_run_store(request)
+    agg = await run_store.aggregate_tokens_by_thread(thread_id)
+    return {"thread_id": thread_id, **agg}
@@ -20,17 +20,11 @@ from typing import Any
 from fastapi import APIRouter, HTTPException, Request
 from pydantic import BaseModel, Field

-from app.gateway.deps import get_checkpointer, get_store
+from app.gateway.deps import get_checkpointer
+from app.gateway.utils import sanitize_log_param
 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime import serialize_channel_values

-# ---------------------------------------------------------------------------
-# Store namespace
-# ---------------------------------------------------------------------------
-
-THREADS_NS: tuple[str, ...] = ("threads",)
-"""Namespace used by the Store for thread metadata records."""
-
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/threads", tags=["threads"])

@@ -63,6 +57,7 @@ class ThreadCreateRequest(BaseModel):
    """Request body for creating a thread."""

    thread_id: str | None = Field(default=None, description="Optional thread ID (auto-generated if omitted)")
+    assistant_id: str | None = Field(default=None, description="Associate thread with an assistant")
    metadata: dict[str, Any] = Field(default_factory=dict, description="Initial metadata")


@@ -135,61 +130,16 @@ def _delete_thread_data(thread_id: str, paths: Paths | None = None) -> ThreadDel
        raise HTTPException(status_code=422, detail=str(exc)) from exc
    except FileNotFoundError:
        # Not critical — thread data may not exist on disk
-        logger.debug("No local thread data to delete for %s", thread_id)
+        logger.debug("No local thread data to delete for %s", sanitize_log_param(thread_id))
        return ThreadDeleteResponse(success=True, message=f"No local data for {thread_id}")
    except Exception as exc:
-        logger.exception("Failed to delete thread data for %s", thread_id)
+        logger.exception("Failed to delete thread data for %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to delete local thread data.") from exc

-    logger.info("Deleted local thread data for %s", thread_id)
+    logger.info("Deleted local thread data for %s", sanitize_log_param(thread_id))
    return ThreadDeleteResponse(success=True, message=f"Deleted local thread data for {thread_id}")


-async def _store_get(store, thread_id: str) -> dict | None:
-    """Fetch a thread record from the Store; returns ``None`` if absent."""
-    item = await store.aget(THREADS_NS, thread_id)
-    return item.value if item is not None else None
-
-
-async def _store_put(store, record: dict) -> None:
-    """Write a thread record to the Store."""
-    await store.aput(THREADS_NS, record["thread_id"], record)
-
-
-async def _store_upsert(store, thread_id: str, *, metadata: dict | None = None, values: dict | None = None) -> None:
-    """Create or refresh a thread record in the Store.
-
-    On creation the record is written with ``status="idle"``.  On update only
-    ``updated_at`` (and optionally ``metadata`` / ``values``) are changed so
-    that existing fields are preserved.
-
-    ``values`` carries the agent-state snapshot exposed to the frontend
-    (currently just ``{"title": "..."}``).
-    """
-    now = time.time()
-    existing = await _store_get(store, thread_id)
-    if existing is None:
-        await _store_put(
-            store,
-            {
-                "thread_id": thread_id,
-                "status": "idle",
-                "created_at": now,
-                "updated_at": now,
-                "metadata": metadata or {},
-                "values": values or {},
-            },
-        )
-    else:
-        val = dict(existing)
-        val["updated_at"] = now
-        if metadata:
-            val.setdefault("metadata", {}).update(metadata)
-        if values:
-            val.setdefault("values", {}).update(values)
-        await _store_put(store, val)
-
-
 def _derive_thread_status(checkpoint_tuple) -> str:
    """Derive thread status from checkpoint metadata."""
    if checkpoint_tuple is None:
@@ -219,19 +169,14 @@ async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteRe
    """Delete local persisted filesystem data for a thread.

    Cleans DeerFlow-managed thread directories, removes checkpoint data,
-    and removes the thread record from the Store.
+    and removes the thread_meta row from the configured ThreadMetaStore
+    (sqlite or memory).
    """
+    from app.gateway.deps import get_thread_meta_repo
+
    # Clean local filesystem
    response = _delete_thread_data(thread_id)

-    # Remove from Store (best-effort)
-    store = get_store(request)
-    if store is not None:
-        try:
-            await store.adelete(THREADS_NS, thread_id)
-        except Exception:
-            logger.debug("Could not delete store record for thread %s (not critical)", thread_id)
-
    # Remove checkpoints (best-effort)
    checkpointer = getattr(request.app.state, "checkpointer", None)
    if checkpointer is not None:
@@ -239,7 +184,15 @@ async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteRe
            if hasattr(checkpointer, "adelete_thread"):
                await checkpointer.adelete_thread(thread_id)
        except Exception:
-            logger.debug("Could not delete checkpoints for thread %s (not critical)", thread_id)
+            logger.debug("Could not delete checkpoints for thread %s (not critical)", sanitize_log_param(thread_id))
+
+    # Remove thread_meta row (best-effort) — required for sqlite backend
+    # so the deleted thread no longer appears in /threads/search.
+    try:
+        thread_meta_repo = get_thread_meta_repo(request)
+        await thread_meta_repo.delete(thread_id)
+    except Exception:
+        logger.debug("Could not delete thread_meta for %s (not critical)", sanitize_log_param(thread_id))

    return response

@@ -248,43 +201,38 @@ async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteRe
 async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadResponse:
    """Create a new thread.

-    The thread record is written to the Store (for fast listing) and an
-    empty checkpoint is written to the checkpointer (for state reads).
+    Writes a thread_meta record (so the thread appears in /threads/search)
+    and an empty checkpoint (so state endpoints work immediately).
    Idempotent: returns the existing record when ``thread_id`` already exists.
    """
-    store = get_store(request)
+    from app.gateway.deps import get_thread_meta_repo
+
    checkpointer = get_checkpointer(request)
+    thread_meta_repo = get_thread_meta_repo(request)
    thread_id = body.thread_id or str(uuid.uuid4())
    now = time.time()

-    # Idempotency: return existing record from Store when already present
-    if store is not None:
-        existing_record = await _store_get(store, thread_id)
-        if existing_record is not None:
-            return ThreadResponse(
-                thread_id=thread_id,
-                status=existing_record.get("status", "idle"),
-                created_at=str(existing_record.get("created_at", "")),
-                updated_at=str(existing_record.get("updated_at", "")),
-                metadata=existing_record.get("metadata", {}),
-            )
+    # Idempotency: return existing record when already present
+    existing_record = await thread_meta_repo.get(thread_id)
+    if existing_record is not None:
+        return ThreadResponse(
+            thread_id=thread_id,
+            status=existing_record.get("status", "idle"),
+            created_at=str(existing_record.get("created_at", "")),
+            updated_at=str(existing_record.get("updated_at", "")),
+            metadata=existing_record.get("metadata", {}),
+        )

-    # Write thread record to Store
-    if store is not None:
-        try:
-            await _store_put(
-                store,
-                {
-                    "thread_id": thread_id,
-                    "status": "idle",
-                    "created_at": now,
-                    "updated_at": now,
-                    "metadata": body.metadata,
-                },
-            )
-        except Exception:
-            logger.exception("Failed to write thread %s to store", thread_id)
-            raise HTTPException(status_code=500, detail="Failed to create thread")
+    # Write thread_meta so the thread appears in /threads/search immediately
+    try:
+        await thread_meta_repo.create(
+            thread_id,
+            assistant_id=body.assistant_id,
+            metadata=body.metadata,
+        )
+    except Exception:
+        logger.exception("Failed to write thread_meta for %s", sanitize_log_param(thread_id))
+        raise HTTPException(status_code=500, detail="Failed to create thread")

    # Write an empty checkpoint so state endpoints work immediately
    config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
@@ -301,10 +249,10 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
        }
        await checkpointer.aput(config, empty_checkpoint(), ckpt_metadata, {})
    except Exception:
-        logger.exception("Failed to create checkpoint for thread %s", thread_id)
+        logger.exception("Failed to create checkpoint for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to create thread")

-    logger.info("Thread created: %s", thread_id)
+    logger.info("Thread created: %s", sanitize_log_param(thread_id))
    return ThreadResponse(
        thread_id=thread_id,
        status="idle",
@@ -318,135 +266,56 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
 async def search_threads(body: ThreadSearchRequest, request: Request) -> list[ThreadResponse]:
    """Search and list threads.

-    Two-phase approach:
-
-    **Phase 1 — Store (fast path, O(threads))**: returns threads that were
-    created or run through this Gateway.  Store records are tiny metadata
-    dicts so fetching all of them at once is cheap.
-
-    **Phase 2 — Checkpointer supplement (lazy migration)**: threads that
-    were created directly by LangGraph Server (and therefore absent from the
-    Store) are discovered here by iterating the shared checkpointer.  Any
-    newly found thread is immediately written to the Store so that the next
-    search skips Phase 2 for that thread — the Store converges to a full
-    index over time without a one-shot migration job.
+    Delegates to the configured ThreadMetaStore implementation
+    (SQL-backed for sqlite/postgres, Store-backed for memory mode).
    """
-    store = get_store(request)
-    checkpointer = get_checkpointer(request)
+    from app.gateway.deps import get_thread_meta_repo

-    # -----------------------------------------------------------------------
-    # Phase 1: Store
-    # -----------------------------------------------------------------------
-    merged: dict[str, ThreadResponse] = {}
-
-    if store is not None:
-        try:
-            items = await store.asearch(THREADS_NS, limit=10_000)
-        except Exception:
-            logger.warning("Store search failed — falling back to checkpointer only", exc_info=True)
-            items = []
-
-        for item in items:
-            val = item.value
-            merged[val["thread_id"]] = ThreadResponse(
-                thread_id=val["thread_id"],
-                status=val.get("status", "idle"),
-                created_at=str(val.get("created_at", "")),
-                updated_at=str(val.get("updated_at", "")),
-                metadata=val.get("metadata", {}),
-                values=val.get("values", {}),
-            )
-
-    # -----------------------------------------------------------------------
-    # Phase 2: Checkpointer supplement
-    # Discovers threads not yet in the Store (e.g. created by LangGraph
-    # Server) and lazily migrates them so future searches skip this phase.
-    # -----------------------------------------------------------------------
-    try:
-        async for checkpoint_tuple in checkpointer.alist(None):
-            cfg = getattr(checkpoint_tuple, "config", {})
-            thread_id = cfg.get("configurable", {}).get("thread_id")
-            if not thread_id or thread_id in merged:
-                continue
-
-            # Skip sub-graph checkpoints (checkpoint_ns is non-empty for those)
-            if cfg.get("configurable", {}).get("checkpoint_ns", ""):
-                continue
-
-            ckpt_meta = getattr(checkpoint_tuple, "metadata", {}) or {}
-            # Strip LangGraph internal keys from the user-visible metadata dict
-            user_meta = {k: v for k, v in ckpt_meta.items() if k not in ("created_at", "updated_at", "step", "source", "writes", "parents")}
-
-            # Extract state values (title) from the checkpoint's channel_values
-            checkpoint_data = getattr(checkpoint_tuple, "checkpoint", {}) or {}
-            channel_values = checkpoint_data.get("channel_values", {})
-            ckpt_values = {}
-            if title := channel_values.get("title"):
-                ckpt_values["title"] = title
-
-            thread_resp = ThreadResponse(
-                thread_id=thread_id,
-                status=_derive_thread_status(checkpoint_tuple),
-                created_at=str(ckpt_meta.get("created_at", "")),
-                updated_at=str(ckpt_meta.get("updated_at", ckpt_meta.get("created_at", ""))),
-                metadata=user_meta,
-                values=ckpt_values,
-            )
-            merged[thread_id] = thread_resp
-
-            # Lazy migration — write to Store so the next search finds it there
-            if store is not None:
-                try:
-                    await _store_upsert(store, thread_id, metadata=user_meta, values=ckpt_values or None)
-                except Exception:
-                    logger.debug("Failed to migrate thread %s to store (non-fatal)", thread_id)
-    except Exception:
-        logger.exception("Checkpointer scan failed during thread search")
-        # Don't raise — return whatever was collected from Store + partial scan
-
-    # -----------------------------------------------------------------------
-    # Phase 3: Filter → sort → paginate
-    # -----------------------------------------------------------------------
-    results = list(merged.values())
-
-    if body.metadata:
-        results = [r for r in results if all(r.metadata.get(k) == v for k, v in body.metadata.items())]
-
-    if body.status:
-        results = [r for r in results if r.status == body.status]
-
-    results.sort(key=lambda r: r.updated_at, reverse=True)
-    return results[body.offset : body.offset + body.limit]
+    repo = get_thread_meta_repo(request)
+    rows = await repo.search(
+        metadata=body.metadata or None,
+        status=body.status,
+        limit=body.limit,
+        offset=body.offset,
+    )
+    return [
+        ThreadResponse(
+            thread_id=r["thread_id"],
+            status=r.get("status", "idle"),
+            created_at=str(r.get("created_at", "")),
+            updated_at=str(r.get("updated_at", "")),
+            metadata=r.get("metadata", {}),
+            values={"title": r["display_name"]} if r.get("display_name") else {},
+            interrupts={},
+        )
+        for r in rows
+    ]


@router.patch("/{thread_id}", response_model=ThreadResponse)
 async def patch_thread(thread_id: str, body: ThreadPatchRequest, request: Request) -> ThreadResponse:
    """Merge metadata into a thread record."""
-    store = get_store(request)
-    if store is None:
-        raise HTTPException(status_code=503, detail="Store not available")
+    from app.gateway.deps import get_thread_meta_repo

-    record = await _store_get(store, thread_id)
+    thread_meta_repo = get_thread_meta_repo(request)
+    record = await thread_meta_repo.get(thread_id)
    if record is None:
        raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")

-    now = time.time()
-    updated = dict(record)
-    updated.setdefault("metadata", {}).update(body.metadata)
-    updated["updated_at"] = now
-
    try:
-        await _store_put(store, updated)
+        await thread_meta_repo.update_metadata(thread_id, body.metadata)
    except Exception:
-        logger.exception("Failed to patch thread %s", thread_id)
+        logger.exception("Failed to patch thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to update thread")

+    # Re-read to get the merged metadata + refreshed updated_at
+    record = await thread_meta_repo.get(thread_id) or record
    return ThreadResponse(
        thread_id=thread_id,
-        status=updated.get("status", "idle"),
-        created_at=str(updated.get("created_at", "")),
-        updated_at=str(now),
-        metadata=updated.get("metadata", {}),
+        status=record.get("status", "idle"),
+        created_at=str(record.get("created_at", "")),
+        updated_at=str(record.get("updated_at", "")),
+        metadata=record.get("metadata", {}),
    )


@@ -454,30 +323,31 @@ async def patch_thread(thread_id: str, body: ThreadPatchRequest, request: Reques
 async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
    """Get thread info.

-    Reads metadata from the Store and derives the accurate execution
-    status from the checkpointer.  Falls back to the checkpointer alone
-    for threads that pre-date Store adoption (backward compat).
+    Reads metadata from the ThreadMetaStore and derives the accurate
+    execution status from the checkpointer.  Falls back to the checkpointer
+    alone for threads that pre-date ThreadMetaStore adoption (backward compat).
    """
-    store = get_store(request)
+    from app.gateway.deps import get_thread_meta_repo
+
+    thread_meta_repo = get_thread_meta_repo(request)
    checkpointer = get_checkpointer(request)

-    record: dict | None = None
-    if store is not None:
-        record = await _store_get(store, thread_id)
+    record: dict | None = await thread_meta_repo.get(thread_id)

    # Derive accurate status from the checkpointer
    config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
    try:
        checkpoint_tuple = await checkpointer.aget_tuple(config)
    except Exception:
-        logger.exception("Failed to get checkpoint for thread %s", thread_id)
+        logger.exception("Failed to get checkpoint for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread")

    if record is None and checkpoint_tuple is None:
        raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")

-    # If the thread exists in the checkpointer but not the store (e.g. legacy
-    # data), synthesize a minimal store record from the checkpoint metadata.
+    # If the thread exists in the checkpointer but not in thread_meta (e.g.
+    # legacy data created before thread_meta adoption), synthesize a minimal
+    # record from the checkpoint metadata.
    if record is None and checkpoint_tuple is not None:
        ckpt_meta = getattr(checkpoint_tuple, "metadata", {}) or {}
        record = {
@@ -488,16 +358,19 @@ async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
            "metadata": {k: v for k, v in ckpt_meta.items() if k not in ("created_at", "updated_at", "step", "source", "writes", "parents")},
        }

-    status = _derive_thread_status(checkpoint_tuple) if checkpoint_tuple is not None else record.get("status", "idle")  # type: ignore[union-attr]
+    if record is None:
+        raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")
+
+    status = _derive_thread_status(checkpoint_tuple) if checkpoint_tuple is not None else record.get("status", "idle")
    checkpoint = getattr(checkpoint_tuple, "checkpoint", {}) or {} if checkpoint_tuple is not None else {}
    channel_values = checkpoint.get("channel_values", {})

    return ThreadResponse(
        thread_id=thread_id,
        status=status,
-        created_at=str(record.get("created_at", "")),  # type: ignore[union-attr]
-        updated_at=str(record.get("updated_at", "")),  # type: ignore[union-attr]
-        metadata=record.get("metadata", {}),  # type: ignore[union-attr]
+        created_at=str(record.get("created_at", "")),
+        updated_at=str(record.get("updated_at", "")),
+        metadata=record.get("metadata", {}),
        values=serialize_channel_values(channel_values),
    )

@@ -515,7 +388,7 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo
    try:
        checkpoint_tuple = await checkpointer.aget_tuple(config)
    except Exception:
-        logger.exception("Failed to get state for thread %s", thread_id)
+        logger.exception("Failed to get state for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread state")

    if checkpoint_tuple is None:
@@ -556,11 +429,14 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
    """Update thread state (e.g. for human-in-the-loop resume or title rename).

    Writes a new checkpoint that merges *body.values* into the latest
-    channel values, then syncs any updated ``title`` field back to the Store
-    so that ``/threads/search`` reflects the change immediately.
+    channel values, then syncs any updated ``title`` field through the
+    ThreadMetaStore abstraction so that ``/threads/search`` reflects the
+    change immediately in both sqlite and memory backends.
    """
+    from app.gateway.deps import get_thread_meta_repo
+
    checkpointer = get_checkpointer(request)
-    store = get_store(request)
+    thread_meta_repo = get_thread_meta_repo(request)

    # checkpoint_ns must be present in the config for aput — default to ""
    # (the root graph namespace).  checkpoint_id is optional; omitting it
@@ -577,7 +453,7 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
    try:
        checkpoint_tuple = await checkpointer.aget_tuple(read_config)
    except Exception:
-        logger.exception("Failed to get state for thread %s", thread_id)
+        logger.exception("Failed to get state for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread state")

    if checkpoint_tuple is None:
@@ -611,19 +487,22 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
    try:
        new_config = await checkpointer.aput(write_config, checkpoint, metadata, {})
    except Exception:
-        logger.exception("Failed to update state for thread %s", thread_id)
+        logger.exception("Failed to update state for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to update thread state")

    new_checkpoint_id: str | None = None
    if isinstance(new_config, dict):
        new_checkpoint_id = new_config.get("configurable", {}).get("checkpoint_id")

-    # Sync title changes to the Store so /threads/search reflects them immediately.
-    if store is not None and body.values and "title" in body.values:
-        try:
-            await _store_upsert(store, thread_id, values={"title": body.values["title"]})
-        except Exception:
-            logger.debug("Failed to sync title to store for thread %s (non-fatal)", thread_id)
+    # Sync title changes through the ThreadMetaStore abstraction so /threads/search
+    # reflects them immediately in both sqlite and memory backends.
+    if body.values and "title" in body.values:
+        new_title = body.values["title"]
+        if new_title:  # Skip empty strings and None
+            try:
+                await thread_meta_repo.update_display_name(thread_id, new_title)
+            except Exception:
+                logger.debug("Failed to sync title to thread_meta for %s (non-fatal)", sanitize_log_param(thread_id))

    return ThreadStateResponse(
        values=serialize_channel_values(channel_values),
@@ -636,7 +515,14 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re

@router.post("/{thread_id}/history", response_model=list[HistoryEntry])
 async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request: Request) -> list[HistoryEntry]:
-    """Get checkpoint history for a thread."""
+    """Get checkpoint history for a thread.
+
+    Messages are read from the checkpointer's channel values (the
+    authoritative source) and serialized via
+    :func:`~deerflow.runtime.serialization.serialize_channel_values`.
+    Only the latest (first) checkpoint carries the ``messages`` key to
+    avoid duplicating them across every entry.
+    """
    checkpointer = get_checkpointer(request)

    config: dict[str, Any] = {"configurable": {"thread_id": thread_id}}
@@ -644,6 +530,7 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
        config["configurable"]["checkpoint_id"] = body.before

    entries: list[HistoryEntry] = []
+    is_latest_checkpoint = True
    try:
        async for checkpoint_tuple in checkpointer.alist(config, limit=body.limit):
            ckpt_config = getattr(checkpoint_tuple, "config", {})
@@ -658,22 +545,42 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request

            channel_values = checkpoint.get("channel_values", {})

+            # Build values from checkpoint channel_values
+            values: dict[str, Any] = {}
+            if title := channel_values.get("title"):
+                values["title"] = title
+            if thread_data := channel_values.get("thread_data"):
+                values["thread_data"] = thread_data
+
+            # Attach messages from checkpointer only for the latest checkpoint
+            if is_latest_checkpoint:
+                messages = channel_values.get("messages")
+                if messages:
+                    values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
+            is_latest_checkpoint = False
+
            # Derive next tasks
            tasks_raw = getattr(checkpoint_tuple, "tasks", []) or []
            next_tasks = [t.name for t in tasks_raw if hasattr(t, "name")]

+            # Strip LangGraph internal keys from metadata
+            user_meta = {k: v for k, v in metadata.items() if k not in ("created_at", "updated_at", "step", "source", "writes", "parents")}
+            # Keep step for ordering context
+            if "step" in metadata:
+                user_meta["step"] = metadata["step"]
+
            entries.append(
                HistoryEntry(
                    checkpoint_id=checkpoint_id,
                    parent_checkpoint_id=parent_id,
-                    metadata=metadata,
-                    values=serialize_channel_values(channel_values),
+                    metadata=user_meta,
+                    values=values,
                    created_at=str(metadata.get("created_at", "")),
                    next=next_tasks,
                )
            )
    except Exception:
-        logger.exception("Failed to get history for thread %s", thread_id)
+        logger.exception("Failed to get history for thread %s", sanitize_log_param(thread_id))
        raise HTTPException(status_code=500, detail="Failed to get thread history")

    return entries
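The "messages only on the latest checkpoint" rule in the hunk above can be sketched in isolation. This is a simplified illustration, not the router code: `build_history_values` is a hypothetical name, the input is assumed to be a newest-first list of plain `channel_values` dicts, and serialization via `serialize_channel_values` is left out.

```python
from typing import Any


def build_history_values(checkpoints: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build per-entry ``values`` dicts from newest-first channel values.

    Hypothetical helper: mirrors the selection logic only, with no
    serialization and no checkpointer access.
    """
    results: list[dict[str, Any]] = []
    for index, channel_values in enumerate(checkpoints):
        values: dict[str, Any] = {}
        # Copy the small, cheap keys onto every entry when they are set.
        for key in ("title", "thread_data"):
            if channel_values.get(key):
                values[key] = channel_values[key]
        # Only the first (latest) checkpoint carries the message list.
        if index == 0 and channel_values.get("messages"):
            values["messages"] = list(channel_values["messages"])
        results.append(values)
    return results


history = build_history_values(
    [
        {"title": "Demo", "messages": ["m1", "m2"]},
        {"title": "Demo", "messages": ["m1"]},
    ]
)
```

Older entries keep their title but drop `messages`, which is what keeps the response size roughly constant as the history grows.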
@@ -8,16 +8,17 @@ frames, and consuming stream bridge events.  Router modules
 from __future__ import annotations

 import asyncio
+import dataclasses
 import json
 import logging
 import re
-import time
 from typing import Any

 from fastapi import HTTPException, Request
 from langchain_core.messages import HumanMessage

-from app.gateway.deps import get_checkpointer, get_run_manager, get_store, get_stream_bridge
+from app.gateway.deps import get_run_context, get_run_manager, get_run_store, get_stream_bridge
+from app.gateway.utils import sanitize_log_param
 from deerflow.runtime import (
    END_SENTINEL,
    HEARTBEAT_SENTINEL,
@@ -129,26 +130,38 @@ def build_run_config(
    the LangGraph Platform-compatible HTTP API and the IM channel path behave
    identically.
    """
-    configurable: dict[str, Any] = {"thread_id": thread_id}
+    config: dict[str, Any] = {"recursion_limit": 100}
    if request_config:
-        configurable.update(request_config.get("configurable", {}))
+        # LangGraph >= 0.6.0 introduced ``context`` as the preferred way to
+        # pass thread-level data and rejects requests that include both
+        # ``configurable`` and ``context``.  If the caller already sends
+        # ``context``, honour it and skip our own ``configurable`` dict.
+        if "context" in request_config:
+            if "configurable" in request_config:
+                logger.warning(
+                    "build_run_config: client sent both 'context' and 'configurable'; preferring 'context' (LangGraph >= 0.6.0). thread_id=%s, caller_configurable keys=%s",
+                    thread_id,
+                    list(request_config.get("configurable", {}).keys()),
+                )
+            config["context"] = request_config["context"]
+        else:
+            configurable = {"thread_id": thread_id}
+            configurable.update(request_config.get("configurable", {}))
+            config["configurable"] = configurable
+        for k, v in request_config.items():
+            if k not in ("configurable", "context"):
+                config[k] = v
+    else:
+        config["configurable"] = {"thread_id": thread_id}

    # Inject custom agent name when the caller specified a non-default assistant.
    # Honour an explicit configurable["agent_name"] in the request if already set.
-    if assistant_id and assistant_id != _DEFAULT_ASSISTANT_ID and "agent_name" not in configurable:
-        # Normalize the same way ChannelManager does: strip, lowercase,
-        # replace underscores with hyphens, then validate to prevent path
-        # traversal and invalid agent directory lookups.
-        normalized = assistant_id.strip().lower().replace("_", "-")
-        if not normalized or not re.fullmatch(r"[a-z0-9-]+", normalized):
-            raise ValueError(f"Invalid assistant_id {assistant_id!r}: must contain only letters, digits, and hyphens after normalization.")
-        configurable["agent_name"] = normalized
-
-    config: dict[str, Any] = {"configurable": configurable, "recursion_limit": 100}
-    if request_config:
-        for k, v in request_config.items():
-            if k != "configurable":
-                config[k] = v
+    if assistant_id and assistant_id != _DEFAULT_ASSISTANT_ID and "configurable" in config:
+        if "agent_name" not in config["configurable"]:
+            normalized = assistant_id.strip().lower().replace("_", "-")
+            if not normalized or not re.fullmatch(r"[a-z0-9-]+", normalized):
+                raise ValueError(f"Invalid assistant_id {assistant_id!r}: must contain only letters, digits, and hyphens after normalization.")
+            config["configurable"]["agent_name"] = normalized
    if metadata:
        config.setdefault("metadata", {}).update(metadata)
    return config
@@ -159,71 +172,6 @@ def build_run_config(
 # ---------------------------------------------------------------------------


-async def _upsert_thread_in_store(store, thread_id: str, metadata: dict | None) -> None:
-    """Create or refresh the thread record in the Store.
-
-    Called from :func:`start_run` so that threads created via the stateless
-    ``/runs/stream`` endpoint (which never calls ``POST /threads``) still
-    appear in ``/threads/search`` results.
-    """
-    # Deferred import to avoid circular import with the threads router module.
-    from app.gateway.routers.threads import _store_upsert
-
-    try:
-        await _store_upsert(store, thread_id, metadata=metadata)
-    except Exception:
-        logger.warning("Failed to upsert thread %s in store (non-fatal)", thread_id)
-
-
-async def _sync_thread_title_after_run(
-    run_task: asyncio.Task,
-    thread_id: str,
-    checkpointer: Any,
-    store: Any,
-) -> None:
-    """Wait for *run_task* to finish, then persist the generated title to the Store.
-
-    TitleMiddleware writes the generated title to the LangGraph agent state
-    (checkpointer) but the Gateway's Store record is not updated automatically.
-    This coroutine closes that gap by reading the final checkpoint after the
-    run completes and syncing ``values.title`` into the Store record so that
-    subsequent ``/threads/search`` responses include the correct title.
-
-    Runs as a fire-and-forget :func:`asyncio.create_task`; failures are
-    logged at DEBUG level and never propagate.
-    """
-    # Wait for the background run task to complete (any outcome).
-    # asyncio.wait does not propagate task exceptions — it just returns
-    # when the task is done, cancelled, or failed.
-    await asyncio.wait({run_task})
-
-    # Deferred import to avoid circular import with the threads router module.
-    from app.gateway.routers.threads import _store_get, _store_put
-
-    try:
-        ckpt_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
-        ckpt_tuple = await checkpointer.aget_tuple(ckpt_config)
-        if ckpt_tuple is None:
-            return
-
-        channel_values = ckpt_tuple.checkpoint.get("channel_values", {})
-        title = channel_values.get("title")
-        if not title:
-            return
-
-        existing = await _store_get(store, thread_id)
-        if existing is None:
-            return
-
-        updated = dict(existing)
-        updated.setdefault("values", {})["title"] = title
-        updated["updated_at"] = time.time()
-        await _store_put(store, updated)
-        logger.debug("Synced title %r for thread %s", title, thread_id)
-    except Exception:
-        logger.debug("Failed to sync title for thread %s (non-fatal)", thread_id, exc_info=True)
-
-
 async def start_run(
    body: Any,
    thread_id: str,
@@ -243,11 +191,25 @@ async def start_run(
    """
    bridge = get_stream_bridge(request)
    run_mgr = get_run_manager(request)
-    checkpointer = get_checkpointer(request)
-    store = get_store(request)
+    run_ctx = get_run_context(request)

    disconnect = DisconnectMode.cancel if body.on_disconnect == "cancel" else DisconnectMode.continue_

+    # Resolve follow_up_to_run_id: explicit from request, or auto-detect from latest successful run
+    follow_up_to_run_id = getattr(body, "follow_up_to_run_id", None)
+    if follow_up_to_run_id is None:
+        run_store = get_run_store(request)
+        try:
+            recent_runs = await run_store.list_by_thread(thread_id, limit=1)
+            if recent_runs and recent_runs[0].get("status") == "success":
+                follow_up_to_run_id = recent_runs[0]["run_id"]
+        except Exception:
+            pass  # Don't block run creation
+
+    # Enrich base context with per-run field
+    if follow_up_to_run_id:
+        run_ctx = dataclasses.replace(run_ctx, follow_up_to_run_id=follow_up_to_run_id)
+
    try:
        record = await run_mgr.create_or_reject(
            thread_id,
@@ -256,17 +218,28 @@ async def start_run(
            metadata=body.metadata or {},
            kwargs={"input": body.input, "config": body.config},
            multitask_strategy=body.multitask_strategy,
+            follow_up_to_run_id=follow_up_to_run_id,
        )
    except ConflictError as exc:
        raise HTTPException(status_code=409, detail=str(exc)) from exc
    except UnsupportedStrategyError as exc:
        raise HTTPException(status_code=501, detail=str(exc)) from exc

-    # Ensure the thread is visible in /threads/search, even for threads that
-    # were never explicitly created via POST /threads (e.g. stateless runs).
-    store = get_store(request)
-    if store is not None:
-        await _upsert_thread_in_store(store, thread_id, body.metadata)
+    # Upsert thread metadata so the thread appears in /threads/search,
+    # even for threads that were never explicitly created via POST /threads
+    # (e.g. stateless runs).
+    try:
+        existing = await run_ctx.thread_meta_repo.get(thread_id)
+        if existing is None:
+            await run_ctx.thread_meta_repo.create(
+                thread_id,
+                assistant_id=body.assistant_id,
+                metadata=body.metadata,
+            )
+        else:
+            await run_ctx.thread_meta_repo.update_status(thread_id, "running")
+    except Exception:
+        logger.warning("Failed to upsert thread_meta for %s (non-fatal)", sanitize_log_param(thread_id))

    agent_factory = resolve_agent_factory(body.assistant_id)
    graph_input = normalize_input(body.input)
@@ -299,8 +272,7 @@ async def start_run(
            bridge,
            run_mgr,
            record,
-            checkpointer=checkpointer,
-            store=store,
+            ctx=run_ctx,
            agent_factory=agent_factory,
            graph_input=graph_input,
            config=config,
@@ -312,11 +284,9 @@ async def start_run(
    )
    record.task = task

-    # After the run completes, sync the title generated by TitleMiddleware from
-    # the checkpointer into the Store record so that /threads/search returns the
-    # correct title instead of an empty values dict.
-    if store is not None:
-        asyncio.create_task(_sync_thread_title_after_run(task, thread_id, checkpointer, store))
+    # Title sync is handled by worker.py's finally block which reads the
+    # title from the checkpoint and calls thread_meta_repo.update_display_name
+    # after the run completes.

    return record

@@ -333,8 +303,9 @@ async def sse_consumer(
    - ``cancel``: abort the background task on client disconnect.
    - ``continue``: let the task run; events are discarded.
    """
+    last_event_id = request.headers.get("Last-Event-ID")
    try:
-        async for entry in bridge.subscribe(record.run_id):
+        async for entry in bridge.subscribe(record.run_id, last_event_id=last_event_id):
            if await request.is_disconnected():
                break

@@ -0,0 +1,6 @@
+"""Shared utility helpers for the Gateway layer."""
+
+
+def sanitize_log_param(value: str) -> str:
+    """Strip control characters to prevent log injection."""
+    return value.replace("\n", "").replace("\r", "").replace("\x00", "")
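A quick illustration of why the thread ID is passed through this helper before logging. The function body is copied from the hunk above; the attacker-controlled value is a made-up example.

```python
def sanitize_log_param(value: str) -> str:
    """Strip control characters to prevent log injection."""
    return value.replace("\n", "").replace("\r", "").replace("\x00", "")


# Hypothetical client-supplied path parameter that tries to forge a log line.
malicious_thread_id = "abc123\r\nERROR root: fake entry injected by client"

cleaned = sanitize_log_param(malicious_thread_id)

# With CR/LF removed, the value can no longer start a new log record.
assert "\n" not in cleaned and "\r" not in cleaned
print(cleaned)
```

Note that the helper removes only LF, CR, and NUL; other control characters (for example tabs or escape sequences) pass through unchanged.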
@@ -248,7 +248,7 @@ def after_agent(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | N
 - [`packages/harness/deerflow/agents/thread_state.py`](../packages/harness/deerflow/agents/thread_state.py) - ThreadState definition
 - [`packages/harness/deerflow/agents/middlewares/title_middleware.py`](../packages/harness/deerflow/agents/middlewares/title_middleware.py) - TitleMiddleware implementation
 - [`packages/harness/deerflow/config/title_config.py`](../packages/harness/deerflow/config/title_config.py) - Configuration management
-- [`config.yaml`](../config.yaml) - Configuration file
+- [`config.yaml`](../../config.example.yaml) - Configuration file
 - [`packages/harness/deerflow/agents/lead_agent/agent.py`](../packages/harness/deerflow/agents/lead_agent/agent.py) - Middleware registration

 ## References
@@ -278,6 +278,12 @@ skills:
 - Skills are automatically discovered and loaded
 - Available in both local and Docker sandbox via path mapping

+**Per-Agent Skill Filtering**:
+Custom agents can restrict which skills they load by defining a `skills` field in their `config.yaml` (located at `workspace/agents/<agent_name>/config.yaml`):
+- **Omitted or `null`**: Loads all globally enabled skills (default fallback).
+- **`[]` (empty list)**: Disables all skills for this specific agent.
+- **`["skill-name"]`**: Loads only the explicitly specified skills.
+
 ### Title Generation

 Automatic conversation title generation:
@@ -30,7 +30,7 @@

 ### 2. Configuration file

-#### [`config.yaml`](../config.yaml)
+#### [`config.yaml`](../../config.example.yaml)
 - ✅ Added the title config section:
 ```yaml
 title:
@@ -51,7 +51,7 @@ title:
 - ✅ Troubleshooting guide
 - ✅ State vs Metadata comparison

-#### [`BACKEND_TODO.md`](../BACKEND_TODO.md)
+#### [`TODO.md`](TODO.md)
 - ✅ Added the feature completion record

 ### 4. Tests
@@ -0,0 +1,446 @@
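+# [RFC] Add `grep` and `glob` file search tools to DeerFlow
+
+## Summary
+
+I think this is the right direction, and it is worth doing.
+
+If DeerFlow wants to get closer to the actual workflow of coding agents such as Claude Code, `ls` / `read_file` / `write_file` / `str_replace` alone are not enough. Before it starts editing, the model usually needs two more capabilities:
+
+- `glob`: quickly find files by path pattern
+- `grep`: quickly find candidate locations by content pattern
+
+The value of these tools is not that "bash can do this too". It is that they replace the model's habit of reaching for `bash find` / `bash grep` / `rg` with lower token cost, tighter constraints, and a more stable output format.
+
+The precondition is that they are implemented the right way: **they should be native tools that are read-only, structured, bounded, and auditable, not thin wrappers around shell commands.**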
+
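+## Problem
+
+DeerFlow's file tool layer currently covers:
+
+- `ls`: browse the directory structure
+- `read_file`: read file contents
+- `write_file`: write files
+- `str_replace`: perform local string replacement
+- `bash`: run commands as a fallback
+
+This set can get tasks done, but it is inefficient during codebase exploration.
+
+Typical problems:
+
+1. To find "all `*.tsx` page files", the model has to `ls` through several directory levels repeatedly, or fall back to `bash find`
+2. To find "where a symbol / UI string / config key appears", the model has to `read_file` file by file, or fall back to `bash grep` / `rg`
+3. Once it falls back to `bash`, the tool call loses structured output, and the results become harder to trim, paginate, audit, and keep consistent across sandboxes
+4. In local mode without host bash enabled, `bash` may not be available at all, which leaves no sufficiently strong read-only search capability
+
+Conclusion: what DeerFlow lacks is not "one more shell command" but a **file system search layer**.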
+
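+## Goals
+
+- Give the agent stable path search and content search
+- Reduce reliance on `bash`, especially during repository exploration
+- Stay consistent with the existing sandbox security model
+- Produce structured output that the model can chain into `read_file` / `str_replace`
+- Let the local sandbox, container sandboxes, and future MCP file system tools follow the same semantics
+
+## Non-Goals
+
+- No general-purpose shell compatibility layer
+- No exposure of the full grep/find/rg CLI syntax
+- No heavy features in the first version, such as binary search, complex PCRE features, or highlighted context windows
+- No "search any disk" capability; execution stays limited to paths DeerFlow has already authorized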
+
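+## Why This Is Worth Doing
+
+Following the design approach of agents like Claude Code, the core value of `glob` and `grep` is not the new capability itself. It is moving the common actions of "exploring a codebase" from an open-ended shell down to a controlled tool layer.
+
+This brings several direct benefits:
+
+1. **Lower burden on the model**
+   The model does not have to assemble command details such as `find`, `grep`, `rg`, `xargs`, and quoting.
+
+2. **More stable cross-environment behavior**
+   Local, Docker, and AIO sandboxes do not depend on whether `rg` is installed in the container, and behavior does not drift because of shell differences.
+
+3. **Stronger security and auditing**
+   The call parameters are simply "what to search for, where to search, and how many results to return at most", which is inherently easier to audit and rate-limit than arbitrary commands.
+
+4. **Better token efficiency**
+   `grep` returns match summaries rather than whole files, so the model calls `read_file` only on a small number of candidate paths.
+
+5. **Friendly to `tool_search`**
+   As DeerFlow keeps expanding its tool set, `grep` / `glob` will be very high-frequency basic tools. They are worth keeping as built-ins instead of having the model always fall back to generic bash.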
+
+## Proposal
+
+Add two built-in sandbox tools:
+
+- `glob`
+- `grep`
+
+They should continue to live in:
+
+- `backend/packages/harness/deerflow/sandbox/tools.py`
+
+and be added to the `file:read` group by default in `config.example.yaml`.
+
+### 1. The `glob` tool
+
+Purpose: find files or directories by path pattern.
+
+Suggested schema:
+
+```python
+@tool("glob", parse_docstring=True)
+def glob_tool(
+    runtime: ToolRuntime[ContextT, ThreadState],
+    description: str,
+    pattern: str,
+    path: str,
+    include_dirs: bool = False,
+    max_results: int = 200,
+) -> str:
+    ...
+```
+
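+Parameter semantics:
+
+- `description`: same as in the existing tools
+- `pattern`: glob pattern, for example `**/*.py` or `src/**/test_*.ts`
+- `path`: search root; must be an absolute path
+- `include_dirs`: whether to return directories
+- `max_results`: maximum number of results, to avoid flooding the context in one call
+
+Suggested return format: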
+
+```text
+Found 3 paths under /mnt/user-data/workspace
+1. /mnt/user-data/workspace/backend/app.py
+2. /mnt/user-data/workspace/backend/tests/test_app.py
+3. /mnt/user-data/workspace/scripts/build.py
+```
+
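+If a frontend-friendly format is wanted later, this can become a JSON string; for the first version, readable text is enough and matches the style of the existing tools.
+
+### 2. The `grep` tool
+
+Purpose: search files by content pattern and return a summary of match locations.
+
+Suggested schema: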
+
+```python
+@tool("grep", parse_docstring=True)
+def grep_tool(
+    runtime: ToolRuntime[ContextT, ThreadState],
+    description: str,
+    pattern: str,
+    path: str,
+    glob: str | None = None,
+    literal: bool = False,
+    case_sensitive: bool = False,
+    max_results: int = 100,
+) -> str:
+    ...
+```
+
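+Parameter semantics:
+
+- `pattern`: search term or regular expression
+- `path`: search root; must be an absolute path
+- `glob`: optional path filter, for example `**/*.py`
+- `literal`: when `True`, match as a plain string instead of interpreting the pattern as a regex
+- `case_sensitive`: whether matching is case-sensitive
+- `max_results`: maximum number of matches returned, not the number of files
+
+Suggested return format: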
+
+```text
+Found 4 matches under /mnt/user-data/workspace
+/mnt/user-data/workspace/backend/config.py:12: TOOL_GROUPS = [...]
+/mnt/user-data/workspace/backend/config.py:48: def load_tool_config(...):
+/mnt/user-data/workspace/backend/tools.py:91: "tool_groups"
+/mnt/user-data/workspace/backend/tests/test_config.py:22: assert "tool_groups" in data
+```
+
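+The first version should return only:
+
+- file path
+- line number
+- a summary of the matching line
+
+It should not return context blocks, to keep results small. If the model needs context, it can then call `read_file(path, start_line, end_line)`.
+
+## Design Principles
+
+### A. Not a shell wrapper
+
+`grep` should not be implemented as: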
+
+```python
+subprocess.run("grep ...")
+```
+
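+Nor should `find` / `rg` commands be assembled directly inside the container.
+
+Reasons:
+
+- It introduces shell quoting issues and an injection surface
+- It depends on whether the images of different sandboxes have the same commands installed
+- Behavior differs across Windows / macOS / Linux
+- It is hard to control the number and format of output lines reliably
+
+The right direction is:
+
+- `glob` uses Python standard library path traversal
+- `grep` scans files one by one in Python
+- DeerFlow formats the output itself
+
+If `rg` is preferred later for performance, it should be wrapped inside the provider with unchanged external semantics, rather than exposing the CLI to the model.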
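As a rough sketch of the "standard library path traversal" approach for `glob`, the following uses `pathlib` with an ignore set and a hard result cap. It is an illustration under assumptions, not the proposed implementation: `glob_paths` and `IGNORED_DIRS` are hypothetical names, and the sandbox path validation and virtual-path mapping that the RFC requires are not shown.

```python
from pathlib import Path

# Hypothetical shared ignore set; the RFC suggests aligning it with list_dir.
IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def glob_paths(
    root: str,
    pattern: str,
    include_dirs: bool = False,
    max_results: int = 200,
) -> tuple[list[str], bool]:
    """Return (sorted matches, truncated flag) for *pattern* under *root*."""
    base = Path(root)
    if not base.is_dir():
        raise NotADirectoryError(f"Search root is not a directory: {root}")

    matches: list[str] = []
    # Sorting first keeps the output stable across platforms and runs.
    for candidate in sorted(base.glob(pattern)):
        relative_parts = candidate.relative_to(base).parts
        if any(part in IGNORED_DIRS for part in relative_parts):
            continue  # skip anything inside an ignored directory
        if candidate.is_dir() and not include_dirs:
            continue
        if len(matches) >= max_results:
            return matches, True  # at least one more match exists
        matches.append(str(candidate))
    return matches, False
```

The sketch materializes all candidates before sorting, which is simple but not ideal for very large trees; a real tool would probably prune ignored directories while walking instead of filtering afterwards.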
+
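+### B. Keep using DeerFlow's path permission model
+
+Both tools must reuse the path validation logic that `ls` / `read_file` currently use:
+
+- Local mode goes through `validate_local_tool_path(..., read_only=True)`
+- Support `/mnt/skills/...`
+- Support `/mnt/acp-workspace/...`
+- Support virtual path resolution for thread workspace / uploads / outputs
+- Explicitly reject out-of-scope paths and path traversal
+
+In other words, they belong to **file:read** and are not a privilege-escalating substitute for `bash`.
+
+### C. Results must be hard-limited
+
+Without hard limits, `glob` / `grep` can easily blow up the context.
+
+The first version should limit at least:
+
+- `glob.max_results`: default 200, maximum 1000
+- `grep.max_results`: default 100, maximum 500
+- Maximum length of a single-line summary, for example 200 characters
+- Binary files are skipped
+- Very large files are skipped, for example files over 1 MB, or as controlled by configuration
+
+In addition, when the number of matches exceeds the threshold, the tool should return:
+
+- the number of results shown
+- the fact that the output was truncated
+- a suggestion that the user narrow the search scope
+
+For example: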
+
+```text
+Found more than 100 matches, showing first 100. Narrow the path or add a glob filter.
+```
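A minimal sketch of a pure-Python `grep` with the hard limits described in this section. Everything here is an assumption for illustration: `grep_lines`, `MAX_FILE_BYTES`, and `MAX_LINE_CHARS` are hypothetical names, the NUL-byte check is only a crude binary heuristic, and the sketch prints real paths, whereas the RFC asks for virtual paths in the actual tool.

```python
import re
from pathlib import Path

MAX_FILE_BYTES = 1_000_000  # assumed cap, mirroring the "1 MB" suggestion
MAX_LINE_CHARS = 200        # assumed per-line summary cap


def grep_lines(
    root: str,
    pattern: str,
    *,
    literal: bool = False,
    case_sensitive: bool = False,
    max_results: int = 100,
) -> tuple[list[str], bool]:
    """Return ("path:line: text" summaries, truncated flag)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # literal=True escapes the pattern so it is matched as plain text.
    regex = re.compile(re.escape(pattern) if literal else pattern, flags)

    hits: list[str] = []
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        if path.stat().st_size > MAX_FILE_BYTES:
            continue  # skip oversized files
        data = path.read_bytes()
        if b"\x00" in data:
            continue  # crude binary check: a NUL byte is present
        text = data.decode("utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                if len(hits) >= max_results:
                    return hits, True  # more matches exist than shown
                hits.append(f"{path}:{lineno}: {line.strip()[:MAX_LINE_CHARS]}")
    return hits, False
```

The truncated flag is what a caller would use to decide whether to append a "showing first N" notice like the one above.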
+
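+### D. Tool semantics should complement each other
+
+The recommended model workflow is:
+
+1. `glob` to find candidate files
+2. `grep` to find candidate locations
+3. `read_file` to read local context
+4. `str_replace` / `write_file` to apply the change
+
+This keeps tool boundaries clear and makes it easier to teach the model stable habits in the prompt.
+
+## Implementation Approach
+
+### Option A: implement the first version directly in `sandbox/tools.py`
+
+This is the starting point I recommend.
+
+Approach:
+
+- Add `glob_tool` and `grep_tool` to `sandbox/tools.py`
+- In the local sandbox, use the Python file system API directly
+- In non-local sandboxes, preferably also go through a path access layer that DeerFlow controls
+
+Pros:
+
+- Small change
+- Agent impact can be validated quickly
+- No need to change the `Sandbox` abstraction first
+
+Cons:
+
+- `tools.py` keeps growing
+- If provider-side performance optimization is wanted later, another round of abstraction is needed
+
+### Option B: extend the `Sandbox` abstraction first
+
+For example, add: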
+
+```python
+class Sandbox(ABC):
+    def glob(self, path: str, pattern: str, include_dirs: bool = False, max_results: int = 200) -> list[str]:
+        ...
+
+    def grep(
+        self,
+        path: str,
+        pattern: str,
+        *,
+        glob: str | None = None,
+        literal: bool = False,
+        case_sensitive: bool = False,
+        max_results: int = 100,
+    ) -> list[GrepMatch]:
+        ...
+```
+
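+Pros:
+
+- Cleaner abstraction
+- Container / remote sandboxes can each optimize on their own
+
+Cons:
+
+- Higher up-front cost
+- Every sandbox provider has to be changed in step
+
+Conclusion:
+
+**The first version should go with Option A, and move down into the `Sandbox` abstraction layer once the tools have proven their value.**
+
+## Detailed Behavior
+
+### `glob` behavior
+
+- Input root directory does not exist: return a clear error
+- Root path is not a directory: return a clear error
+- Invalid pattern: return a clear error
+- No results: return `No files matched`
+- Default ignores should match the current `list_dir` as closely as possible, for example:
+  - `.git`
+  - `node_modules`
+  - `__pycache__`
+  - `.venv`
+  - build output directories
+
+A shared ignore set should be extracted here so that `ls` and `glob` results do not diverge in style.
+
+### `grep` behavior
+
+- Scans only text files by default
+- Skips binary files as soon as they are detected
+- Skips very large files, or scans only the first N KB
+- Returns a parameter error when the regex fails to compile
+- Paths in the output keep using virtual paths rather than exposing real host paths
+- Results should be sorted by file path and line number by default, for stable output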
+
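+## Prompting Guidance
+
+If these two tools are introduced, the file operation advice in the system prompt should be updated at the same time:
+
+- Prefer `glob` when looking for file name patterns
+- Prefer `grep` when looking for code symbols, config keys, or UI strings
+- Fall back to `bash` only when the tools cannot accomplish the goal
+
+Otherwise the model will still call `bash` first out of habit.
+
+## Risks
+
+### 1. Overlap with `bash`
+
+This is true, but it is not a problem.
+
+`ls` and `read_file` can both be replaced by `bash` as well, yet we keep them because structured tools suit agents better.
+
+### 2. Performance
+
+On large repositories, a pure-Python `grep` may be slower than `rg`.
+
+Mitigations:
+
+- Start the first version with a result cap and a file size cap
+- Require a root path
+- Offer a `glob` filter to narrow the scan
+- If needed later, optimize with `rg` inside the provider while keeping the same schema
+
+### 3. Inconsistent ignore rules
+
+If `ls` can see a path that `glob` cannot, the model will be confused.
+
+Mitigations:
+
+- Unify the ignore rules
+- State clearly in the docs that common dependency and build directories are skipped by default
+
+### 4. Overly complex regex search
+
+If the first version supports many grep dialects, the boundaries become messy.
+
+Mitigations:
+
+- Support only Python `re` in the first version
+- Provide a simple `literal=True` mode as well
+
+## Alternatives Considered
+
+### A. Add no tools and rely entirely on `bash`
+
+Not recommended.
+
+DeerFlow's code exploration experience would keep lagging behind, and capability would be weaker in scenarios with no bash or restricted bash.
+
+### B. Add only `glob`, without `grep`
+
+Not recommended.
+
+This solves "finding files" but not "finding locations". The model would still end up falling back to `bash grep`.
+
+### C. Add only `grep`, without `glob`
+
+Also not recommended.
+
+Without path pattern filtering, `grep` often has to scan too wide a range; `glob` is its natural preceding step.
+
+### D. Use the search capability of an MCP filesystem server directly
+
+Not recommended as the main path in the short term.
+
+MCP can be a supplement, but as DeerFlow's basic coding tools, `glob` / `grep` are best kept built-in so that they are reliably available in a default installation.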
+
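+## Acceptance Criteria
+
+- `glob` and `grep` can be enabled by default in `config.example.yaml`
+- Both tools belong to the `file:read` group
+- Existing path permissions are strictly enforced in the local sandbox
+- Output does not leak real host paths
+- Large result sets are truncated with an explicit notice
+- The model can complete a typical code change flow via `glob -> grep -> read_file -> str_replace`
+- Repository exploration is clearly better in local mode with host bash disabled
+
+## Rollout Plan
+
+1. Implement `glob_tool` and `grep_tool` in `sandbox/tools.py`
+2. Extract ignore rules consistent with `list_dir` to avoid behavior drift
+3. Add the tool configuration to `config.example.yaml` by default
+4. Add tests for local path validation, virtual path mapping, result truncation, and binary skipping
+5. Update README / backend docs / prompt guidance
+6. Collect real agent call data before deciding whether to move the tools down into the `Sandbox` abstraction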
+
+## Suggested Config
+
+```yaml
+tools:
+  - name: glob
+    group: file:read
+    use: deerflow.sandbox.tools:glob_tool
+
+  - name: grep
+    group: file:read
+    use: deerflow.sandbox.tools:grep_tool
+```
+
+## Final Recommendation
+
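+The conclusion: **this can be added, and it should be.**
+
+I would hold three firm boundaries, though:
+
+1. `grep` / `glob` must be built-in, read-only, structured tools
+2. The first version must not be a shell wrapper and must not expose CLI dialects directly to the model
+3. Validate the value in `sandbox/tools.py` first, then consider moving down into the `Sandbox` provider abstraction
+
+Done this way, it will clearly improve DeerFlow's usability for coding / repo exploration scenarios, with manageable risk.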
@@ -83,23 +83,76 @@ async def _async_checkpointer(config) -> AsyncIterator[Checkpointer]:


@contextlib.asynccontextmanager
-async def make_checkpointer() -> AsyncIterator[Checkpointer]:
-    """Async context manager that yields a checkpointer for the caller's lifetime.
-    Resources are opened on enter and closed on exit — no global state::
-
-        async with make_checkpointer() as checkpointer:
-            app.state.checkpointer = checkpointer
-
-    Yields an ``InMemorySaver`` when no checkpointer is configured in *config.yaml*.
-    """
-
-    config = get_app_config()
-
-    if config.checkpointer is None:
+async def _async_checkpointer_from_database(db_config) -> AsyncIterator[Checkpointer]:
+    """Async context manager that constructs a checkpointer from unified DatabaseConfig."""
+    if db_config.backend == "memory":
        from langgraph.checkpoint.memory import InMemorySaver

        yield InMemorySaver()
        return

-    async with _async_checkpointer(config.checkpointer) as saver:
-        yield saver
+    if db_config.backend == "sqlite":
+        try:
+            from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
+        except ImportError as exc:
+            raise ImportError(SQLITE_INSTALL) from exc
+
+        conn_str = db_config.checkpointer_sqlite_path
+        ensure_sqlite_parent_dir(conn_str)
+        async with AsyncSqliteSaver.from_conn_string(conn_str) as saver:
+            await saver.setup()
+            yield saver
+        return
+
+    if db_config.backend == "postgres":
+        try:
+            from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
+        except ImportError as exc:
+            raise ImportError(POSTGRES_INSTALL) from exc
+
+        if not db_config.postgres_url:
+            raise ValueError("database.postgres_url is required for the postgres backend")
+
+        async with AsyncPostgresSaver.from_conn_string(db_config.postgres_url) as saver:
+            await saver.setup()
+            yield saver
+        return
+
+    raise ValueError(f"Unknown database backend: {db_config.backend!r}")
+
+
+@contextlib.asynccontextmanager
+async def make_checkpointer() -> AsyncIterator[Checkpointer]:
+    """Async context manager that yields a checkpointer for the caller's lifetime.
+    Resources are opened on enter and closed on exit -- no global state::
+
+        async with make_checkpointer() as checkpointer:
+            app.state.checkpointer = checkpointer
+
+    Yields an ``InMemorySaver`` when no checkpointer is configured in *config.yaml*.
+
+    Priority:
+    1. Legacy ``checkpointer:`` config section (backward compatible)
+    2. Unified ``database:`` config section
+    3. Default InMemorySaver
+    """
+
+    config = get_app_config()
+
+    # Legacy: standalone checkpointer config takes precedence
+    if config.checkpointer is not None:
+        async with _async_checkpointer(config.checkpointer) as saver:
+            yield saver
+            return
+
+    # Unified database config
+    db_config = getattr(config, "database", None)
+    if db_config is not None and db_config.backend != "memory":
+        async with _async_checkpointer_from_database(db_config) as saver:
+            yield saver
+            return
+
+    # Default: in-memory
+    from langgraph.checkpoint.memory import InMemorySaver
+
+    yield InMemorySaver()
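The priority order documented in the new `make_checkpointer` docstring can be checked with a small decision function. This is a sketch only: `pick_checkpointer_source` is a hypothetical name, and the config objects are stand-ins built with `SimpleNamespace` rather than the real app config classes.

```python
from types import SimpleNamespace


def pick_checkpointer_source(config) -> str:
    """Return which config section would supply the checkpointer.

    Hypothetical helper mirroring the documented priority:
    legacy ``checkpointer:`` > unified ``database:`` > in-memory default.
    """
    if getattr(config, "checkpointer", None) is not None:
        return "legacy-checkpointer"
    db_config = getattr(config, "database", None)
    if db_config is not None and db_config.backend != "memory":
        return f"database:{db_config.backend}"
    return "in-memory"


# Stand-in config where both sections are present: legacy wins.
both = SimpleNamespace(checkpointer=object(), database=SimpleNamespace(backend="sqlite"))
# Only the unified section is set.
unified = SimpleNamespace(checkpointer=None, database=SimpleNamespace(backend="postgres"))

print(pick_checkpointer_source(both))
print(pick_checkpointer_source(unified))
```

A `database.backend` of `"memory"` and a missing `database` section both end up on the same in-memory default, which matches the fall-through at the end of the hunk.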
@@ -56,13 +56,15 @@ def _create_summarization_middleware() -> SummarizationMiddleware | None:
    # Prepare keep parameter
    keep = config.keep.to_tuple()

-    # Prepare model parameter
+    # Prepare model parameter.
+    # Bind "middleware:summarize" tag so RunJournal identifies these LLM calls
+    # as middleware rather than lead_agent (SummarizationMiddleware is a
+    # LangChain built-in, so we tag the model at creation time).
    if config.model_name:
        model = create_chat_model(name=config.model_name, thinking_enabled=False)
    else:
-        # Use a lightweight model for summarization to save costs
-        # Falls back to default model if not explicitly specified
        model = create_chat_model(thinking_enabled=False)
+    model = model.with_config(tags=["middleware:summarize"])

    # Prepare kwargs
    kwargs = {
@@ -343,6 +345,8 @@ def make_lead_agent(config: RunnableConfig):
        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort),
        tools=get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled),
        middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name),
-        system_prompt=apply_prompt_template(subagent_enabled=subagent_enabled, max_concurrent_subagents=max_concurrent_subagents, agent_name=agent_name),
+        system_prompt=apply_prompt_template(
+            subagent_enabled=subagent_enabled, max_concurrent_subagents=max_concurrent_subagents, agent_name=agent_name, available_skills=set(agent_config.skills) if agent_config and agent_config.skills is not None else None
+        ),
        state_schema=ThreadState,
    )
@@ -1,5 +1,6 @@
 import logging
 from datetime import datetime
+from functools import lru_cache

 from deerflow.config.agents_config import load_agent_soul
 from deerflow.skills import load_skills
@@ -8,6 +9,38 @@ from deerflow.subagents import get_available_subagent_names
 logger = logging.getLogger(__name__)


+def _get_enabled_skills():
+    try:
+        return list(load_skills(enabled_only=True))
+    except Exception:
+        logger.exception("Failed to load enabled skills for prompt injection")
+        return []
+
+
+def _skill_mutability_label(category: str) -> str:
+    return "[custom, editable]" if category == "custom" else "[built-in]"
+
+
+def clear_skills_system_prompt_cache() -> None:
+    _get_cached_skills_prompt_section.cache_clear()
+
+
+def _build_skill_evolution_section(skill_evolution_enabled: bool) -> str:
+    if not skill_evolution_enabled:
+        return ""
+    return """
+## Skill Self-Evolution
+After completing a task, consider creating or updating a skill when:
+- The task required 5+ tool calls to resolve
+- You overcame non-obvious errors or pitfalls
+- The user corrected your approach and the corrected version worked
+- You discovered a non-trivial, recurring workflow
+If you used a skill and encountered issues not covered by it, patch it immediately.
+Prefer patch over edit. Before creating a new skill, confirm with the user first.
+Skip simple one-off tasks.
+"""
+
+
 def _build_subagent_section(max_concurrent: int) -> str:
    """Build the subagent system prompt section with dynamic concurrency limit.

@@ -380,33 +413,21 @@ def _get_memory_context(agent_name: str | None = None) -> str:
        return ""


-def get_skills_prompt_section(available_skills: set[str] | None = None) -> str:
-    """Generate the skills prompt section with available skills list.
-
-    Returns the <skill_system>...</skill_system> block listing all enabled skills,
-    suitable for injection into any agent's system prompt.
-    """
-    skills = load_skills(enabled_only=True)
-
-    try:
-        from deerflow.config import get_app_config
-
-        config = get_app_config()
-        container_base_path = config.skills.container_path
-    except Exception:
-        container_base_path = "/mnt/skills"
-
-    if not skills:
-        return ""
-
-    if available_skills is not None:
-        skills = [skill for skill in skills if skill.name in available_skills]
-
-    skill_items = "\n".join(
-        f"    <skill>\n        <name>{skill.name}</name>\n        <description>{skill.description}</description>\n        <location>{skill.get_container_file_path(container_base_path)}</location>\n    </skill>" for skill in skills
-    )
-    skills_list = f"<available_skills>\n{skill_items}\n</available_skills>"
-
+@lru_cache(maxsize=32)
+def _get_cached_skills_prompt_section(
+    skill_signature: tuple[tuple[str, str, str, str], ...],
+    available_skills_key: tuple[str, ...] | None,
+    container_base_path: str,
+    skill_evolution_section: str,
+) -> str:
+    filtered = [(name, description, category, location) for name, description, category, location in skill_signature if available_skills_key is None or name in available_skills_key]
+    skills_list = ""
+    if filtered:
+        skill_items = "\n".join(
+            f"    <skill>\n        <name>{name}</name>\n        <description>{description} {_skill_mutability_label(category)}</description>\n        <location>{location}</location>\n    </skill>"
+            for name, description, category, location in filtered
+        )
+        skills_list = f"<available_skills>\n{skill_items}\n</available_skills>"
    return f"""<skill_system>
 You have access to skills that provide optimized workflows for specific tasks. Each skill contains best practices, frameworks, and references to additional resources.

@@ -418,12 +439,40 @@ You have access to skills that provide optimized workflows for specific tasks. E
 5. Follow the skill's instructions precisely

 **Skills are located at:** {container_base_path}
-
+{skill_evolution_section}
 {skills_list}

 </skill_system>"""


+def get_skills_prompt_section(available_skills: set[str] | None = None) -> str:
+    """Generate the skills prompt section with available skills list."""
+    skills = _get_enabled_skills()
+
+    try:
+        from deerflow.config import get_app_config
+
+        config = get_app_config()
+        container_base_path = config.skills.container_path
+        skill_evolution_enabled = config.skill_evolution.enabled
+    except Exception:
+        container_base_path = "/mnt/skills"
+        skill_evolution_enabled = False
+
+    if not skills and not skill_evolution_enabled:
+        return ""
+
+    if available_skills is not None and not any(skill.name in available_skills for skill in skills):
+        return ""
+
+    skill_signature = tuple((skill.name, skill.description, skill.category, skill.get_container_file_path(container_base_path)) for skill in skills)
+    available_key = tuple(sorted(available_skills)) if available_skills is not None else None
+    if not skill_signature and available_key is not None:
+        return ""
+    skill_evolution_section = _build_skill_evolution_section(skill_evolution_enabled)
+    return _get_cached_skills_prompt_section(skill_signature, available_key, container_base_path, skill_evolution_section)
+
+
 def get_agent_soul(agent_name: str | None) -> str:
    # Append SOUL.md (agent personality) if present
    soul = load_agent_soul(agent_name)
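
Note on the caching change above: `functools.lru_cache` requires hashable arguments, which is presumably why `get_skills_prompt_section` converts the skill list and the `available_skills` set into tuples before calling the cached helper. The sketch below shows the same pattern in isolation; `render` and `render_for` are hypothetical names used only for illustration and do not appear in the diff.

```python
from __future__ import annotations

from functools import lru_cache


@lru_cache(maxsize=32)
def render(names: tuple[str, ...], allowed: tuple[str, ...] | None) -> str:
    """Cached renderer; every argument must be hashable."""
    kept = [n for n in names if allowed is None or n in allowed]
    return ", ".join(kept)


def render_for(names: list[str], allowed: set[str] | None) -> str:
    """Normalize unhashable inputs into a stable cache key."""
    # Sorting makes the key independent of set iteration order.
    key = tuple(sorted(allowed)) if allowed is not None else None
    return render(tuple(names), key)


first = render_for(["a", "b"], {"b", "a"})
second = render_for(["a", "b"], {"a", "b"})  # same key, served from cache
hits = render.cache_info().hits
```

Because the cache key is derived from skill metadata, callers that change skills on disk would still need something like `clear_skills_system_prompt_cache()` only when inputs outside the key change.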
@@ -446,7 +495,7 @@ def get_deferred_tools_prompt_section() -> str:

        if not get_app_config().tool_search.enabled:
            return ""
-    except FileNotFoundError:
+    except Exception:
        return ""

    registry = get_deferred_registry()
@@ -29,6 +29,17 @@ Instructions:
 2. Extract relevant facts, preferences, and context with specific details (numbers, names, technologies)
 3. Update the memory sections as needed following the detailed length guidelines below

+Before extracting facts, perform a structured reflection on the conversation:
+1. Error/Retry Detection: Did the agent encounter errors, require retries, or produce incorrect results?
+   If yes, record the root cause and correct approach as a high-confidence fact with category "correction".
+2. User Correction Detection: Did the user correct the agent's direction, understanding, or output?
+   If yes, record the correct interpretation or approach as a high-confidence fact with category "correction".
+   Include what went wrong in "sourceError" only when category is "correction" and the mistake is explicit in the conversation.
+3. Project Constraint Discovery: Were any project-specific constraints discovered during the conversation?
+   If yes, record them as facts with the most appropriate category and confidence.
+
+{correction_hint}
+
 Memory Section Guidelines:

 **User Context** (Current state - concise summaries):
@@ -62,6 +73,7 @@ Memory Section Guidelines:
  * context: Background facts (job title, projects, locations, languages)
  * behavior: Working patterns, communication habits, problem-solving approaches
  * goal: Stated objectives, learning targets, project ambitions
+  * correction: Explicit agent mistakes or user corrections, including the correct approach
 - Confidence levels:
  * 0.9-1.0: Explicitly stated facts ("I work on X", "My role is Y")
  * 0.7-0.8: Strongly implied from actions/discussions
@@ -94,7 +106,7 @@ Output Format (JSON):
    "longTermBackground": {{ "summary": "...", "shouldUpdate": true/false }}
  }},
  "newFacts": [
-    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
+    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal|correction", "confidence": 0.0-1.0 }}
  ],
  "factsToRemove": ["fact_id_1", "fact_id_2"]
 }}
@@ -104,6 +116,8 @@ Important Rules:
 - Follow length guidelines: workContext/personalContext are concise (1-3 sentences), topOfMind and history sections are detailed (paragraphs)
 - Include specific metrics, version numbers, and proper nouns in facts
 - Only add facts that are clearly stated (0.9+) or strongly implied (0.7+)
+- Use category "correction" for explicit agent mistakes or user corrections; assign confidence >= 0.95 when the correction is explicit
+- Include "sourceError" only for explicit correction facts when the prior mistake or wrong approach is clearly stated; omit it otherwise
 - Remove facts that are contradicted by new information
 - When updating topOfMind, integrate new focus areas while removing completed/abandoned ones
  Keep 3-5 concurrent focus themes that are still active and relevant
@@ -126,7 +140,7 @@ Message:
 Extract facts in this JSON format:
 {{
  "facts": [
-    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
+    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal|correction", "confidence": 0.0-1.0 }}
  ]
 }}

@@ -136,6 +150,7 @@ Categories:
 - context: Background context (location, job, projects)
 - behavior: Behavioral patterns
 - goal: User's goals or objectives
+- correction: Explicit corrections or mistakes to avoid repeating

 Rules:
 - Only extract clear, specific facts
@@ -231,6 +246,10 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2
        if earlier.get("summary"):
            history_sections.append(f"Earlier: {earlier['summary']}")

+        background = history_data.get("longTermBackground", {})
+        if background.get("summary"):
+            history_sections.append(f"Background: {background['summary']}")
+
        if history_sections:
            sections.append("History:\n" + "\n".join(f"- {s}" for s in history_sections))

@@ -262,7 +281,11 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2
                continue
            category = str(fact.get("category", "context")).strip() or "context"
            confidence = _coerce_confidence(fact.get("confidence"), default=0.0)
-            line = f"- [{category} | {confidence:.2f}] {content}"
+            source_error = fact.get("sourceError")
+            if category == "correction" and isinstance(source_error, str) and source_error.strip():
+                line = f"- [{category} | {confidence:.2f}] {content} (avoid: {source_error.strip()})"
+            else:
+                line = f"- [{category} | {confidence:.2f}] {content}"

            # Each additional line is preceded by a newline (except the first).
            line_text = ("\n" + line) if fact_lines else line
@@ -20,6 +20,8 @@ class ConversationContext:
    messages: list[Any]
    timestamp: datetime = field(default_factory=datetime.utcnow)
    agent_name: str | None = None
+    correction_detected: bool = False
+    reinforcement_detected: bool = False


 class MemoryUpdateQueue:
@@ -37,25 +39,42 @@ class MemoryUpdateQueue:
        self._timer: threading.Timer | None = None
        self._processing = False

-    def add(self, thread_id: str, messages: list[Any], agent_name: str | None = None) -> None:
+    def add(
+        self,
+        thread_id: str,
+        messages: list[Any],
+        agent_name: str | None = None,
+        correction_detected: bool = False,
+        reinforcement_detected: bool = False,
+    ) -> None:
        """Add a conversation to the update queue.

        Args:
            thread_id: The thread ID.
            messages: The conversation messages.
            agent_name: If provided, memory is stored per-agent. If None, uses global memory.
+            correction_detected: Whether recent turns include an explicit correction signal.
+            reinforcement_detected: Whether recent turns include a positive reinforcement signal.
        """
        config = get_memory_config()
        if not config.enabled:
            return

-        context = ConversationContext(
-            thread_id=thread_id,
-            messages=messages,
-            agent_name=agent_name,
-        )
-
        with self._lock:
+            existing_context = next(
+                (context for context in self._queue if context.thread_id == thread_id),
+                None,
+            )
+            merged_correction_detected = correction_detected or (existing_context.correction_detected if existing_context is not None else False)
+            merged_reinforcement_detected = reinforcement_detected or (existing_context.reinforcement_detected if existing_context is not None else False)
+            context = ConversationContext(
+                thread_id=thread_id,
+                messages=messages,
+                agent_name=agent_name,
+                correction_detected=merged_correction_detected,
+                reinforcement_detected=merged_reinforcement_detected,
+            )
+
            # Check if this thread already has a pending update
            # If so, replace it with the newer one
            self._queue = [c for c in self._queue if c.thread_id != thread_id]
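
Note on the flag merging above: since a newer context replaces the pending one for the same thread, a correction signal seen earlier would be lost unless it is OR-ed into the replacement. A minimal sketch of that behaviour follows, assuming a simplified `Pending` record and an `enqueue` helper (both hypothetical, and ignoring the lock and timer used by `MemoryUpdateQueue`).

```python
from dataclasses import dataclass


@dataclass
class Pending:
    thread_id: str
    messages: list
    correction_detected: bool = False


def enqueue(queue: list, thread_id: str, messages: list, correction_detected: bool = False) -> list:
    """Replace any pending entry for the thread, keeping a sticky flag."""
    existing = next((c for c in queue if c.thread_id == thread_id), None)
    # Once a correction was detected for this thread, keep it set.
    merged = correction_detected or (existing.correction_detected if existing is not None else False)
    remaining = [c for c in queue if c.thread_id != thread_id]
    remaining.append(Pending(thread_id, messages, merged))
    return remaining


queue = enqueue([], "t1", ["m1"], correction_detected=True)
queue = enqueue(queue, "t1", ["m1", "m2"])  # newer update without the flag
```

After the second call the queue still holds a single entry for `t1`, carrying the newer messages and the earlier `correction_detected=True`.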
@@ -115,6 +134,8 @@ class MemoryUpdateQueue:
                        messages=context.messages,
                        thread_id=context.thread_id,
                        agent_name=context.agent_name,
+                        correction_detected=context.correction_detected,
+                        reinforcement_detected=context.reinforcement_detected,
                    )
                    if success:
                        logger.info("Memory updated successfully for thread %s", context.thread_id)
@@ -246,7 +246,7 @@ def _fact_content_key(content: Any) -> str | None:
    stripped = content.strip()
    if not stripped:
        return None
-    return stripped
+    return stripped.casefold()


 class MemoryUpdater:
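
Note on the `casefold()` change above: the dedupe key now ignores case, so facts that differ only in capitalization should collapse into one entry. The standalone sketch below illustrates the effect; `fact_content_key` is a hypothetical copy, and the `isinstance` guard is an assumption because the start of `_fact_content_key` is not visible in this hunk.

```python
def fact_content_key(content):
    """Return a case-insensitive dedupe key, or None for blank or non-string input."""
    if not isinstance(content, str):  # assumed guard, not shown in the diff
        return None
    stripped = content.strip()
    if not stripped:
        return None
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


existing_keys = {fact_content_key("User prefers Python")}
is_duplicate = fact_content_key("  user prefers PYTHON ") in existing_keys
```

`casefold()` also normalizes characters that `lower()` leaves alone, for example the German "ß", which folds to "ss".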
@@ -266,13 +266,22 @@ class MemoryUpdater:
        model_name = self._model_name or config.model_name
        return create_chat_model(name=model_name, thinking_enabled=False)

-    def update_memory(self, messages: list[Any], thread_id: str | None = None, agent_name: str | None = None) -> bool:
+    def update_memory(
+        self,
+        messages: list[Any],
+        thread_id: str | None = None,
+        agent_name: str | None = None,
+        correction_detected: bool = False,
+        reinforcement_detected: bool = False,
+    ) -> bool:
        """Update memory based on conversation messages.

        Args:
            messages: List of conversation messages.
            thread_id: Optional thread ID for tracking source.
            agent_name: If provided, updates per-agent memory. If None, updates global memory.
+            correction_detected: Whether recent turns include an explicit correction signal.
+            reinforcement_detected: Whether recent turns include a positive reinforcement signal.

        Returns:
            True if update was successful, False otherwise.
@@ -295,9 +304,27 @@ class MemoryUpdater:
                return False

            # Build prompt
+            correction_hint = ""
+            if correction_detected:
+                correction_hint = (
+                    "IMPORTANT: Explicit correction signals were detected in this conversation. "
+                    "Pay special attention to what the agent got wrong, what the user corrected, "
+                    "and record the correct approach as a fact with category "
+                    '"correction" and confidence >= 0.95 when appropriate.'
+                )
+            if reinforcement_detected:
+                reinforcement_hint = (
+                    "IMPORTANT: Positive reinforcement signals were detected in this conversation. "
+                    "The user explicitly confirmed the agent's approach was correct or helpful. "
+                    "Record the confirmed approach, style, or preference as a fact with category "
+                    '"preference" or "behavior" and confidence >= 0.9 when appropriate.'
+                )
+                correction_hint = (correction_hint + "\n" + reinforcement_hint).strip() if correction_hint else reinforcement_hint
+
            prompt = MEMORY_UPDATE_PROMPT.format(
                current_memory=json.dumps(current_memory, indent=2),
                conversation=conversation_text,
+                correction_hint=correction_hint,
            )

            # Call LLM
@@ -383,6 +410,8 @@ class MemoryUpdater:
            confidence = fact.get("confidence", 0.5)
            if confidence >= config.fact_confidence_threshold:
                raw_content = fact.get("content", "")
+                if not isinstance(raw_content, str):
+                    continue
                normalized_content = raw_content.strip()
                fact_key = _fact_content_key(normalized_content)
                if fact_key is not None and fact_key in existing_fact_keys:
@@ -396,6 +425,11 @@ class MemoryUpdater:
                    "createdAt": now,
                    "source": thread_id or "unknown",
                }
+                source_error = fact.get("sourceError")
+                if isinstance(source_error, str):
+                    normalized_source_error = source_error.strip()
+                    if normalized_source_error:
+                        fact_entry["sourceError"] = normalized_source_error
                current_memory["facts"].append(fact_entry)
                if fact_key is not None:
                    existing_fact_keys.add(fact_key)
@@ -412,16 +446,24 @@ class MemoryUpdater:
        return current_memory


-def update_memory_from_conversation(messages: list[Any], thread_id: str | None = None, agent_name: str | None = None) -> bool:
+def update_memory_from_conversation(
+    messages: list[Any],
+    thread_id: str | None = None,
+    agent_name: str | None = None,
+    correction_detected: bool = False,
+    reinforcement_detected: bool = False,
+) -> bool:
    """Convenience function to update memory from a conversation.

    Args:
        messages: List of conversation messages.
        thread_id: Optional thread ID.
        agent_name: If provided, updates per-agent memory. If None, updates global memory.
+        correction_detected: Whether recent turns include an explicit correction signal.
+        reinforcement_detected: Whether recent turns include a positive reinforcement signal.

    Returns:
        True if successful, False otherwise.
    """
    updater = MemoryUpdater()
-    return updater.update_memory(messages, thread_id, agent_name)
+    return updater.update_memory(messages, thread_id, agent_name, correction_detected, reinforcement_detected)
@@ -0,0 +1,275 @@
+"""LLM error handling middleware with retry/backoff and user-facing fallbacks."""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import time
+from collections.abc import Awaitable, Callable
+from email.utils import parsedate_to_datetime
+from typing import Any, override
+
+from langchain.agents import AgentState
+from langchain.agents.middleware import AgentMiddleware
+from langchain.agents.middleware.types import (
+    ModelCallResult,
+    ModelRequest,
+    ModelResponse,
+)
+from langchain_core.messages import AIMessage
+from langgraph.errors import GraphBubbleUp
+
+logger = logging.getLogger(__name__)
+
+_RETRIABLE_STATUS_CODES = {408, 409, 425, 429, 500, 502, 503, 504}
+_BUSY_PATTERNS = (
+    "server busy",
+    "temporarily unavailable",
+    "try again later",
+    "please retry",
+    "please try again",
+    "overloaded",
+    "high demand",
+    "rate limit",
+    "负载较高",
+    "服务繁忙",
+    "稍后重试",
+    "请稍后重试",
+)
+_QUOTA_PATTERNS = (
+    "insufficient_quota",
+    "quota",
+    "billing",
+    "credit",
+    "payment",
+    "余额不足",
+    "超出限额",
+    "额度不足",
+    "欠费",
+)
+_AUTH_PATTERNS = (
+    "authentication",
+    "unauthorized",
+    "invalid api key",
+    "invalid_api_key",
+    "permission",
+    "forbidden",
+    "access denied",
+    "无权",
+    "未授权",
+)
+
+
+class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
+    """Retry transient LLM errors and surface graceful assistant messages."""
+
+    retry_max_attempts: int = 3
+    retry_base_delay_ms: int = 1000
+    retry_cap_delay_ms: int = 8000
+
+    def _classify_error(self, exc: BaseException) -> tuple[bool, str]:
+        detail = _extract_error_detail(exc)
+        lowered = detail.lower()
+        error_code = _extract_error_code(exc)
+        status_code = _extract_status_code(exc)
+
+        if _matches_any(lowered, _QUOTA_PATTERNS) or _matches_any(str(error_code).lower(), _QUOTA_PATTERNS):
+            return False, "quota"
+        if _matches_any(lowered, _AUTH_PATTERNS):
+            return False, "auth"
+
+        exc_name = exc.__class__.__name__
+        if exc_name in {
+            "APITimeoutError",
+            "APIConnectionError",
+            "InternalServerError",
+        }:
+            return True, "transient"
+        if status_code in _RETRIABLE_STATUS_CODES:
+            return True, "transient"
+        if _matches_any(lowered, _BUSY_PATTERNS):
+            return True, "busy"
+
+        return False, "generic"
+
+    def _build_retry_delay_ms(self, attempt: int, exc: BaseException) -> int:
+        retry_after = _extract_retry_after_ms(exc)
+        if retry_after is not None:
+            return retry_after
+        backoff = self.retry_base_delay_ms * (2 ** max(0, attempt - 1))
+        return min(backoff, self.retry_cap_delay_ms)
+
+    def _build_retry_message(self, attempt: int, wait_ms: int, reason: str) -> str:
+        seconds = max(1, round(wait_ms / 1000))
+        reason_text = "provider is busy" if reason == "busy" else "provider request failed temporarily"
+        return f"LLM request retry {attempt}/{self.retry_max_attempts}: {reason_text}. Retrying in {seconds}s."
+
+    def _build_user_message(self, exc: BaseException, reason: str) -> str:
+        detail = _extract_error_detail(exc)
+        if reason == "quota":
+            return "The configured LLM provider rejected the request because the account is out of quota, billing is unavailable, or usage is restricted. Please fix the provider account and try again."
+        if reason == "auth":
+            return "The configured LLM provider rejected the request because authentication or access is invalid. Please check the provider credentials and try again."
+        if reason in {"busy", "transient"}:
+            return "The configured LLM provider is temporarily unavailable after multiple retries. Please wait a moment and continue the conversation."
+        return f"LLM request failed: {detail}"
+
+    def _emit_retry_event(self, attempt: int, wait_ms: int, reason: str) -> None:
+        try:
+            from langgraph.config import get_stream_writer
+
+            writer = get_stream_writer()
+            writer(
+                {
+                    "type": "llm_retry",
+                    "attempt": attempt,
+                    "max_attempts": self.retry_max_attempts,
+                    "wait_ms": wait_ms,
+                    "reason": reason,
+                    "message": self._build_retry_message(attempt, wait_ms, reason),
+                }
+            )
+        except Exception:
+            logger.debug("Failed to emit llm_retry event", exc_info=True)
+
+    @override
+    def wrap_model_call(
+        self,
+        request: ModelRequest,
+        handler: Callable[[ModelRequest], ModelResponse],
+    ) -> ModelCallResult:
+        attempt = 1
+        while True:
+            try:
+                return handler(request)
+            except GraphBubbleUp:
+                # Preserve LangGraph control-flow signals (interrupt/pause/resume).
+                raise
+            except Exception as exc:
+                retriable, reason = self._classify_error(exc)
+                if retriable and attempt < self.retry_max_attempts:
+                    wait_ms = self._build_retry_delay_ms(attempt, exc)
+                    logger.warning(
+                        "Transient LLM error on attempt %d/%d; retrying in %dms: %s",
+                        attempt,
+                        self.retry_max_attempts,
+                        wait_ms,
+                        _extract_error_detail(exc),
+                    )
+                    self._emit_retry_event(attempt, wait_ms, reason)
+                    time.sleep(wait_ms / 1000)
+                    attempt += 1
+                    continue
+                logger.warning(
+                    "LLM call failed after %d attempt(s): %s",
+                    attempt,
+                    _extract_error_detail(exc),
+                    exc_info=exc,
+                )
+                return AIMessage(content=self._build_user_message(exc, reason))
+
+    @override
+    async def awrap_model_call(
+        self,
+        request: ModelRequest,
+        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
+    ) -> ModelCallResult:
+        attempt = 1
+        while True:
+            try:
+                return await handler(request)
+            except GraphBubbleUp:
+                # Preserve LangGraph control-flow signals (interrupt/pause/resume).
+                raise
+            except Exception as exc:
+                retriable, reason = self._classify_error(exc)
+                if retriable and attempt < self.retry_max_attempts:
+                    wait_ms = self._build_retry_delay_ms(attempt, exc)
+                    logger.warning(
+                        "Transient LLM error on attempt %d/%d; retrying in %dms: %s",
+                        attempt,
+                        self.retry_max_attempts,
+                        wait_ms,
+                        _extract_error_detail(exc),
+                    )
+                    self._emit_retry_event(attempt, wait_ms, reason)
+                    await asyncio.sleep(wait_ms / 1000)
+                    attempt += 1
+                    continue
+                logger.warning(
+                    "LLM call failed after %d attempt(s): %s",
+                    attempt,
+                    _extract_error_detail(exc),
+                    exc_info=exc,
+                )
+                return AIMessage(content=self._build_user_message(exc, reason))
+
+
+def _matches_any(detail: str, patterns: tuple[str, ...]) -> bool:
+    return any(pattern in detail for pattern in patterns)
+
+
+def _extract_error_code(exc: BaseException) -> Any:
+    for attr in ("code", "error_code"):
+        value = getattr(exc, attr, None)
+        if value not in (None, ""):
+            return value
+
+    body = getattr(exc, "body", None)
+    if isinstance(body, dict):
+        error = body.get("error")
+        if isinstance(error, dict):
+            for key in ("code", "type"):
+                value = error.get(key)
+                if value not in (None, ""):
+                    return value
+    return None
+
+
+def _extract_status_code(exc: BaseException) -> int | None:
+    for attr in ("status_code", "status"):
+        value = getattr(exc, attr, None)
+        if isinstance(value, int):
+            return value
+    response = getattr(exc, "response", None)
+    status = getattr(response, "status_code", None)
+    return status if isinstance(status, int) else None
+
+
+def _extract_retry_after_ms(exc: BaseException) -> int | None:
+    response = getattr(exc, "response", None)
+    headers = getattr(response, "headers", None)
+    if headers is None:
+        return None
+
+    raw = None
+    header_name = ""
+    for key in ("retry-after-ms", "Retry-After-Ms", "retry-after", "Retry-After"):
+        header_name = key
+        if hasattr(headers, "get"):
+            raw = headers.get(key)
+        if raw:
+            break
+    if not raw:
+        return None
+
+    try:
+        multiplier = 1 if "ms" in header_name.lower() else 1000
+        return max(0, int(float(raw) * multiplier))
+    except (TypeError, ValueError):
+        try:
+            target = parsedate_to_datetime(str(raw))
+            delta = target.timestamp() - time.time()
+            return max(0, int(delta * 1000))
+        except (TypeError, ValueError, OverflowError):
+            return None
+
+
+def _extract_error_detail(exc: BaseException) -> str:
+    detail = str(exc).strip()
+    if detail:
+        return detail
+    message = getattr(exc, "message", None)
+    if isinstance(message, str) and message.strip():
+        return message.strip()
+    return exc.__class__.__name__
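
Note on the retry schedule above: when no Retry-After header is present, `_build_retry_delay_ms` uses capped exponential backoff. The sketch below reproduces that arithmetic in isolation with the same defaults (1000 ms base, 8000 ms cap); `backoff_delay_ms` is a hypothetical name and the header override is left out.

```python
def backoff_delay_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 8000) -> int:
    """Capped exponential backoff: base * 2**(attempt - 1), never above cap."""
    # attempt is 1-based; max(0, ...) guards against attempt <= 0.
    backoff = base_ms * (2 ** max(0, attempt - 1))
    return min(backoff, cap_ms)


# Delays for attempts 1 through 5 with the defaults.
schedule = [backoff_delay_ms(n) for n in range(1, 6)]
print(schedule)  # [1000, 2000, 4000, 8000, 8000]
```

With `retry_max_attempts = 3`, a retry only happens while `attempt < 3`, so in practice only the first two delays (1 s and 2 s) are ever used unless the class attributes are overridden.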
@@ -182,6 +182,23 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):

        return None, False

+    @staticmethod
+    def _append_text(content: str | list | None, text: str) -> str | list:
+        """Append *text* to AIMessage content, handling str, list, and None.
+
+        When content is a list of content blocks (e.g. Anthropic thinking mode),
+        we append a new ``{"type": "text", ...}`` block instead of concatenating
+        a string to a list, which would raise ``TypeError``.
+        """
+        if content is None:
+            return text
+        if isinstance(content, list):
+            return [*content, {"type": "text", "text": f"\n\n{text}"}]
+        if isinstance(content, str):
+            return content + f"\n\n{text}"
+        # Fallback: coerce unexpected types to str to avoid TypeError
+        return str(content) + f"\n\n{text}"
+
    def _apply(self, state: AgentState, runtime: Runtime) -> dict | None:
        warning, hard_stop = self._track_and_check(state, runtime)

@@ -192,7 +209,7 @@ class LoopDetectionMiddleware(AgentMiddleware[AgentState]):
            stripped_msg = last_msg.model_copy(
                update={
                    "tool_calls": [],
-                    "content": (last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}",
+                    "content": self._append_text(last_msg.content, _HARD_STOP_MSG),
                }
            )
            return {"messages": [stripped_msg]}
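
Note on the `_append_text` fix above: the old expression `(last_msg.content or "") + "..."` raises `TypeError` when `content` is a list of content blocks, because a list cannot be concatenated with a string. The sketch below mirrors the new helper as a free function (`append_text` is a hypothetical name) and exercises the three input shapes; the sample thinking block is made up for illustration.

```python
def append_text(content, text: str):
    """Append text to message content that may be None, a str, or a list of blocks."""
    if content is None:
        return text
    if isinstance(content, list):
        # Content-block form: add a new text block instead of concatenating.
        return [*content, {"type": "text", "text": f"\n\n{text}"}]
    # Strings, plus a str() fallback for unexpected types.
    return str(content) + f"\n\n{text}"


blocks = [{"type": "thinking", "thinking": "..."}]  # illustrative block only
from_none = append_text(None, "stop")
from_str = append_text("partial", "stop")
from_list = append_text(blocks, "stop")
```

The list branch builds a new list, so the original `blocks` object is left unmodified.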
@@ -14,6 +14,37 @@ from deerflow.config.memory_config import get_memory_config

 logger = logging.getLogger(__name__)

+_UPLOAD_BLOCK_RE = re.compile(r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", re.IGNORECASE)
+_CORRECTION_PATTERNS = (
+    re.compile(r"\bthat(?:'s| is) (?:wrong|incorrect)\b", re.IGNORECASE),
+    re.compile(r"\byou misunderstood\b", re.IGNORECASE),
+    re.compile(r"\btry again\b", re.IGNORECASE),
+    re.compile(r"\bredo\b", re.IGNORECASE),
+    re.compile(r"不对"),
+    re.compile(r"你理解错了"),
+    re.compile(r"你理解有误"),
+    re.compile(r"重试"),
+    re.compile(r"重新来"),
+    re.compile(r"换一种"),
+    re.compile(r"改用"),
+)
+
+_REINFORCEMENT_PATTERNS = (
+    re.compile(r"\byes[,.]?\s+(?:exactly|perfect|that(?:'s| is) (?:right|correct|it))\b", re.IGNORECASE),
+    re.compile(r"\bperfect(?:[.!?]|$)", re.IGNORECASE),
+    re.compile(r"\bexactly\s+(?:right|correct)\b", re.IGNORECASE),
+    re.compile(r"\bthat(?:'s| is)\s+(?:exactly\s+)?(?:right|correct|what i (?:wanted|needed|meant))\b", re.IGNORECASE),
+    re.compile(r"\bkeep\s+(?:doing\s+)?that\b", re.IGNORECASE),
+    re.compile(r"\bjust\s+(?:like\s+)?(?:that|this)\b", re.IGNORECASE),
+    re.compile(r"\bthis is (?:great|helpful)\b(?:[.!?]|$)", re.IGNORECASE),
+    re.compile(r"\bthis is what i wanted\b(?:[.!?]|$)", re.IGNORECASE),
+    re.compile(r"对[，,]?\s*就是这样(?:[。！？!?.]|$)"),
+    re.compile(r"完全正确(?:[。！？!?.]|$)"),
+    re.compile(r"(?:对[，,]?\s*)?就是这个意思(?:[。！？!?.]|$)"),
+    re.compile(r"正是我想要的(?:[。！？!?.]|$)"),
+    re.compile(r"继续保持(?:[。！？!?.]|$)"),
+)
+

 class MemoryMiddlewareState(AgentState):
    """Compatible with the `ThreadState` schema."""
@@ -21,6 +52,22 @@ class MemoryMiddlewareState(AgentState):
    pass


+def _extract_message_text(message: Any) -> str:
+    """Extract plain text from message content for filtering and signal detection."""
+    content = getattr(message, "content", "")
+    if isinstance(content, list):
+        text_parts: list[str] = []
+        for part in content:
+            if isinstance(part, str):
+                text_parts.append(part)
+            elif isinstance(part, dict):
+                text_val = part.get("text")
+                if isinstance(text_val, str):
+                    text_parts.append(text_val)
+        return " ".join(text_parts)
+    return str(content)
+
+
 def _filter_messages_for_memory(messages: list[Any]) -> list[Any]:
    """Filter messages to keep only user inputs and final assistant responses.

@@ -44,18 +91,13 @@ def _filter_messages_for_memory(messages: list[Any]) -> list[Any]:
    Returns:
        Filtered list containing only user inputs and final assistant responses.
    """
-    _UPLOAD_BLOCK_RE = re.compile(r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", re.IGNORECASE)
-
    filtered = []
    skip_next_ai = False
    for msg in messages:
        msg_type = getattr(msg, "type", None)

        if msg_type == "human":
-            content = getattr(msg, "content", "")
-            if isinstance(content, list):
-                content = " ".join(p.get("text", "") for p in content if isinstance(p, dict))
-            content_str = str(content)
+            content_str = _extract_message_text(msg)
            if "<uploaded_files>" in content_str:
                # Strip the ephemeral upload block; keep the user's real question.
                stripped = _UPLOAD_BLOCK_RE.sub("", content_str).strip()
@@ -87,6 +129,48 @@ def _filter_messages_for_memory(messages: list[Any]) -> list[Any]:
    return filtered


+def detect_correction(messages: list[Any]) -> bool:
+    """Detect explicit user corrections in recent conversation turns.
+
+    The queue keeps only one pending context per thread, so callers pass the
+    latest filtered message list. Only human turns among the last six messages
+    are checked, so detection stays conservative and ignores stale corrections.
+    """
+    recent_user_msgs = [msg for msg in messages[-6:] if getattr(msg, "type", None) == "human"]
+
+    for msg in recent_user_msgs:
+        content = _extract_message_text(msg).strip()
+        if not content:
+            continue
+        if any(pattern.search(content) for pattern in _CORRECTION_PATTERNS):
+            return True
+
+    return False
+
+
+def detect_reinforcement(messages: list[Any]) -> bool:
+    """Detect explicit positive reinforcement signals in recent conversation turns.
+
+    Complements detect_correction() by identifying when the user confirms the
+    agent's approach was correct. This allows the memory system to record what
+    worked well, not just what went wrong.
+
+    The queue keeps only one pending context per thread, so callers pass the
+    latest filtered message list. Only human turns among the last six messages
+    are checked, so detection stays conservative and ignores stale signals.
+    """
+    recent_user_msgs = [msg for msg in messages[-6:] if getattr(msg, "type", None) == "human"]
+
+    for msg in recent_user_msgs:
+        content = _extract_message_text(msg).strip()
+        if not content:
+            continue
+        if any(pattern.search(content) for pattern in _REINFORCEMENT_PATTERNS):
+            return True
+
+    return False
+
+
 class MemoryMiddleware(AgentMiddleware[MemoryMiddlewareState]):
    """Middleware that queues conversation for memory update after agent execution.

@@ -150,7 +234,15 @@ class MemoryMiddleware(AgentMiddleware[MemoryMiddlewareState]):
            return None

        # Queue the filtered conversation for memory update
+        correction_detected = detect_correction(filtered_messages)
+        reinforcement_detected = not correction_detected and detect_reinforcement(filtered_messages)
        queue = get_memory_queue()
-        queue.add(thread_id=thread_id, messages=filtered_messages, agent_name=self._agent_name)
+        queue.add(
+            thread_id=thread_id,
+            messages=filtered_messages,
+            agent_name=self._agent_name,
+            correction_detected=correction_detected,
+            reinforcement_detected=reinforcement_detected,
+        )

        return None
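
The signal detection added in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the middleware itself: it uses only a subset of the patterns, the helper names `has_signal`, `classify`, `human` and `ai` are hypothetical, and messages are stand-in objects with `type` and `content` attributes.

```python
import re
from types import SimpleNamespace

# Illustrative subset of the patterns introduced in the diff above.
CORRECTION_PATTERNS = (
    re.compile(r"\bthat(?:'s| is) (?:wrong|incorrect)\b", re.IGNORECASE),
    re.compile(r"\btry again\b", re.IGNORECASE),
    re.compile(r"不对"),
)
REINFORCEMENT_PATTERNS = (
    re.compile(r"\bperfect(?:[.!?]|$)", re.IGNORECASE),
    re.compile(r"\bexactly\s+(?:right|correct)\b", re.IGNORECASE),
    re.compile(r"完全正确(?:[。！？!?.]|$)"),
)


def has_signal(messages, patterns, window=6):
    """Return True if any human message among the last `window` messages matches."""
    for msg in messages[-window:]:
        if getattr(msg, "type", None) != "human":
            continue
        content = str(getattr(msg, "content", "")).strip()
        if content and any(p.search(content) for p in patterns):
            return True
    return False


def classify(messages):
    """Correction takes precedence; reinforcement is only checked otherwise."""
    correction = has_signal(messages, CORRECTION_PATTERNS)
    reinforcement = not correction and has_signal(messages, REINFORCEMENT_PATTERNS)
    return correction, reinforcement


def human(text):
    # Hypothetical stand-in for a human message object.
    return SimpleNamespace(type="human", content=text)


def ai(text):
    # Hypothetical stand-in for an assistant message object.
    return SimpleNamespace(type="ai", content=text)
```

Because the window is applied before the human-only filter, a correction followed by six or more later messages no longer counts, which is the "stale signal" behaviour the docstrings describe.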
@@ -105,11 +105,16 @@ class SandboxAuditMiddleware(AgentMiddleware[ThreadState]):
            thread_id = cfg.get("configurable", {}).get("thread_id")
        return thread_id

-    def _write_audit(self, thread_id: str | None, command: str, verdict: str) -> None:
+    _AUDIT_COMMAND_LIMIT = 200
+
+    def _write_audit(self, thread_id: str | None, command: str, verdict: str, *, truncate: bool = False) -> None:
+        audited_command = command
+        if truncate and len(command) > self._AUDIT_COMMAND_LIMIT:
+            audited_command = f"{command[: self._AUDIT_COMMAND_LIMIT]}... ({len(command)} chars)"
        record = {
            "timestamp": datetime.now(UTC).isoformat(),
            "thread_id": thread_id or "unknown",
-            "command": command,
+            "command": audited_command,
            "verdict": verdict,
        }
        logger.info("[SandboxAudit] %s", json.dumps(record, ensure_ascii=False))
@@ -139,23 +144,52 @@ class SandboxAuditMiddleware(AgentMiddleware[ThreadState]):
            status=result.status,
        )

+    # ------------------------------------------------------------------
+    # Input sanitisation
+    # ------------------------------------------------------------------
+
+    # Normal bash commands rarely exceed a few hundred characters.  10 000 is
+    # well above any legitimate use case yet a tiny fraction of Linux ARG_MAX.
+    # Anything longer is almost certainly a payload injection or base64-encoded
+    # attack string.
+    _MAX_COMMAND_LENGTH = 10_000
+
+    def _validate_input(self, command: str) -> str | None:
+        """Return ``None`` if *command* is acceptable, else a rejection reason."""
+        if not command.strip():
+            return "empty command"
+        if len(command) > self._MAX_COMMAND_LENGTH:
+            return "command too long"
+        if "\x00" in command:
+            return "null byte detected"
+        return None
+
    # ------------------------------------------------------------------
    # Core logic (shared between sync and async paths)
    # ------------------------------------------------------------------

-    def _pre_process(self, request: ToolCallRequest) -> tuple[str, str | None, str]:
+    def _pre_process(self, request: ToolCallRequest) -> tuple[str, str | None, str, str | None]:
        """
-        Returns (command, thread_id, verdict).
+        Returns (command, thread_id, verdict, reject_reason).
        verdict is 'block', 'warn', or 'pass'.
+        reject_reason is non-None only for input sanitisation rejections.
        """
        args = request.tool_call.get("args", {})
-        command: str = args.get("command", "")
+        raw_command = args.get("command")
+        command = raw_command if isinstance(raw_command, str) else ""
        thread_id = self._get_thread_id(request)

-        # ① classify command
+        # ① input sanitisation — reject malformed input before regex analysis
+        reject_reason = self._validate_input(command)
+        if reject_reason:
+            self._write_audit(thread_id, command, "block", truncate=True)
+            logger.warning("[SandboxAudit] INVALID INPUT thread=%s reason=%s", thread_id, reject_reason)
+            return command, thread_id, "block", reject_reason
+
+        # ② classify command
        verdict = _classify_command(command)

-        # ② audit log
+        # ③ audit log
        self._write_audit(thread_id, command, verdict)

        if verdict == "block":
@@ -163,7 +197,7 @@ class SandboxAuditMiddleware(AgentMiddleware[ThreadState]):
        elif verdict == "warn":
            logger.warning("[SandboxAudit] WARN (medium-risk) thread=%s cmd=%r", thread_id, command)

-        return command, thread_id, verdict
+        return command, thread_id, verdict, None

    # ------------------------------------------------------------------
    # wrap_tool_call hooks
@@ -178,9 +212,10 @@ class SandboxAuditMiddleware(AgentMiddleware[ThreadState]):
        if request.tool_call.get("name") != "bash":
            return handler(request)

-        command, _, verdict = self._pre_process(request)
+        command, _, verdict, reject_reason = self._pre_process(request)
        if verdict == "block":
-            return self._build_block_message(request, "security violation detected")
+            reason = reject_reason or "security violation detected"
+            return self._build_block_message(request, reason)
        result = handler(request)
        if verdict == "warn":
            result = self._append_warn_to_result(result, command)
@@ -195,9 +230,10 @@ class SandboxAuditMiddleware(AgentMiddleware[ThreadState]):
        if request.tool_call.get("name") != "bash":
            return await handler(request)

-        command, _, verdict = self._pre_process(request)
+        command, _, verdict, reject_reason = self._pre_process(request)
        if verdict == "block":
-            return self._build_block_message(request, "security violation detected")
+            reason = reject_reason or "security violation detected"
+            return self._build_block_message(request, reason)
        result = await handler(request)
        if verdict == "warn":
            result = self._append_warn_to_result(result, command)
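
The input checks and audit truncation introduced above can be exercised on their own. The sketch below mirrors the order of the checks in `_validate_input` and the truncation format in `_write_audit`, but as free functions with hypothetical names (`validate_input`, `audit_form`); it is an illustration under those assumptions, not the middleware's actual interface.

```python
from __future__ import annotations

MAX_COMMAND_LENGTH = 10_000  # same ceiling as _MAX_COMMAND_LENGTH in the diff
AUDIT_COMMAND_LIMIT = 200    # same limit as _AUDIT_COMMAND_LIMIT in the diff


def validate_input(command: str) -> str | None:
    """Return None if the command is acceptable, else a rejection reason."""
    if not command.strip():
        return "empty command"
    if len(command) > MAX_COMMAND_LENGTH:
        return "command too long"
    if "\x00" in command:
        return "null byte detected"
    return None


def audit_form(command: str) -> str:
    """Shorten an over-long command for the audit log, keeping its full length."""
    if len(command) > AUDIT_COMMAND_LIMIT:
        return f"{command[:AUDIT_COMMAND_LIMIT]}... ({len(command)} chars)"
    return command
```

Note that the length check runs before the null-byte check, so an oversized payload that also contains `\x00` is reported as "command too long".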
@@ -1,10 +1,11 @@
 """Middleware for automatic thread title generation."""

 import logging
-from typing import NotRequired, override
+from typing import Any, NotRequired, override

 from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
+from langgraph.config import get_config
 from langgraph.runtime import Runtime

 from deerflow.config.title_config import get_title_config
@@ -100,45 +101,48 @@ class TitleMiddleware(AgentMiddleware[TitleMiddlewareState]):
            return user_msg[:fallback_chars].rstrip() + "..."
        return user_msg if user_msg else "New Conversation"

+    def _get_runnable_config(self) -> dict[str, Any]:
+        """Inherit the parent RunnableConfig and append the ``middleware:title`` tag.
+
+        This ensures RunJournal identifies LLM calls from this middleware
+        as ``middleware:title`` instead of ``lead_agent``.
+        """
+        try:
+            parent = get_config()
+        except Exception:
+            parent = {}
+        config = {**parent}
+        config["tags"] = [*(config.get("tags") or []), "middleware:title"]
+        return config
+
    def _generate_title_result(self, state: TitleMiddlewareState) -> dict | None:
-        """Synchronously generate a title. Returns state update or None."""
+        """Generate a local fallback title without blocking on an LLM call."""
        if not self._should_generate_title(state):
            return None

-        prompt, user_msg = self._build_title_prompt(state)
-        config = get_title_config()
-        model = create_chat_model(name=config.model_name, thinking_enabled=False)
-
-        try:
-            response = model.invoke(prompt)
-            title = self._parse_title(response.content)
-            if not title:
-                title = self._fallback_title(user_msg)
-        except Exception:
-            logger.exception("Failed to generate title (sync)")
-            title = self._fallback_title(user_msg)
-
-        return {"title": title}
+        _, user_msg = self._build_title_prompt(state)
+        return {"title": self._fallback_title(user_msg)}

    async def _agenerate_title_result(self, state: TitleMiddlewareState) -> dict | None:
-        """Asynchronously generate a title. Returns state update or None."""
+        """Generate a title asynchronously and fall back locally on failure."""
        if not self._should_generate_title(state):
            return None

-        prompt, user_msg = self._build_title_prompt(state)
        config = get_title_config()
-        model = create_chat_model(name=config.model_name, thinking_enabled=False)
+        prompt, user_msg = self._build_title_prompt(state)

        try:
-            response = await model.ainvoke(prompt)
+            if config.model_name:
+                model = create_chat_model(name=config.model_name, thinking_enabled=False)
+            else:
+                model = create_chat_model(thinking_enabled=False)
+            response = await model.ainvoke(prompt, config=self._get_runnable_config())
            title = self._parse_title(response.content)
-            if not title:
-                title = self._fallback_title(user_msg)
+            if title:
+                return {"title": title}
        except Exception:
-            logger.exception("Failed to generate title (async)")
-            title = self._fallback_title(user_msg)
-
-        return {"title": title}
+            logger.debug("Failed to generate async title; falling back to local title", exc_info=True)
+        return {"title": self._fallback_title(user_msg)}

    @override
    def after_model(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | None:
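
The tag inheritance performed by `_get_runnable_config` in this hunk boils down to a shallow copy plus a rebuilt `tags` list. A minimal sketch, assuming the parent config is a plain dict; the function name `with_middleware_tag` is hypothetical:

```python
from __future__ import annotations

from typing import Any


def with_middleware_tag(parent: dict[str, Any] | None, tag: str = "middleware:title") -> dict[str, Any]:
    """Copy the parent config and append `tag` without mutating the parent."""
    config = {**(parent or {})}
    # A new list is built, so the parent's own tag list is left untouched.
    config["tags"] = [*(config.get("tags") or []), tag]
    return config
```

The copy is shallow: nested values such as `configurable` are shared with the parent, while `tags` is always a fresh list.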
@@ -72,6 +72,7 @@ def _build_runtime_middlewares(
    lazy_init: bool = True,
 ) -> list[AgentMiddleware]:
    """Build shared base middlewares for agent execution."""
+    from deerflow.agents.middlewares.llm_error_handling_middleware import LLMErrorHandlingMiddleware
    from deerflow.agents.middlewares.thread_data_middleware import ThreadDataMiddleware
    from deerflow.sandbox.middleware import SandboxMiddleware

@@ -90,6 +91,8 @@ def _build_runtime_middlewares(

        middlewares.append(DanglingToolCallMiddleware())

+    middlewares.append(LLMErrorHandlingMiddleware())
+
    # Guardrail middleware (if configured)
    from deerflow.config.guardrails_config import get_guardrails_config

@@ -135,6 +138,6 @@ def build_subagent_runtime_middlewares(*, lazy_init: bool = True) -> list[AgentM
    """Middlewares shared by subagent runtime before subagent-only middlewares."""
    return _build_runtime_middlewares(
        include_uploads=False,
-        include_dangling_tool_call_patch=False,
+        include_dangling_tool_call_patch=True,
        lazy_init=lazy_init,
    )
@@ -10,10 +10,52 @@ from langchain_core.messages import HumanMessage
 from langgraph.runtime import Runtime

 from deerflow.config.paths import Paths, get_paths
+from deerflow.utils.file_conversion import extract_outline

 logger = logging.getLogger(__name__)


+_OUTLINE_PREVIEW_LINES = 5
+
+
+def _extract_outline_for_file(file_path: Path) -> tuple[list[dict], list[str]]:
+    """Return the document outline and fallback preview for *file_path*.
+
+    Looks for a sibling ``<stem>.md`` file produced by the upload conversion
+    pipeline.
+
+    Returns:
+        (outline, preview) where:
+        - outline: list of ``{title, line}`` dicts, optionally followed by a trailing sentinel dict with a truthy ``truncated`` key.
+          Empty when no headings are found or no .md exists.
+        - preview: first few non-empty lines of the .md, used as a content
+          anchor when outline is empty so the agent has some context.
+          Empty when outline is non-empty (no fallback needed).
+    """
+    md_path = file_path.with_suffix(".md")
+    if not md_path.is_file():
+        return [], []
+
+    outline = extract_outline(md_path)
+    if outline:
+        logger.debug("Extracted %d outline entries from %s", len(outline), file_path.name)
+        return outline, []
+
+    # outline is empty — read the first few non-empty lines as a content preview
+    preview: list[str] = []
+    try:
+        with md_path.open(encoding="utf-8") as f:
+            for line in f:
+                stripped = line.strip()
+                if stripped:
+                    preview.append(stripped)
+                if len(preview) >= _OUTLINE_PREVIEW_LINES:
+                    break
+    except Exception:
+        logger.debug("Failed to read preview lines from %s", md_path, exc_info=True)
+    return [], preview
+
+
 class UploadsMiddlewareState(AgentState):
    """State schema for uploads middleware."""

@@ -39,12 +81,38 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
        super().__init__()
        self._paths = Paths(base_dir) if base_dir else get_paths()

+    def _format_file_entry(self, file: dict, lines: list[str]) -> None:
+        """Append a single file entry (name, size, path, optional outline) to lines."""
+        size_kb = file["size"] / 1024
+        size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
+        lines.append(f"- {file['filename']} ({size_str})")
+        lines.append(f"  Path: {file['path']}")
+        outline = file.get("outline") or []
+        if outline:
+            truncated = outline[-1].get("truncated", False)
+            visible = [e for e in outline if not e.get("truncated")]
+            lines.append("  Document outline (use `read_file` with line ranges to read sections):")
+            for entry in visible:
+                lines.append(f"    L{entry['line']}: {entry['title']}")
+            if truncated:
+                lines.append(f"    ... (showing first {len(visible)} headings; use `read_file` to explore further)")
+        else:
+            preview = file.get("outline_preview") or []
+            if preview:
+                lines.append("  No structural headings detected. Document begins with:")
+                for text in preview:
+                    lines.append(f"    > {text}")
+            lines.append("  Use `grep` to search for keywords (e.g. `grep(pattern='keyword', path='/mnt/user-data/uploads/')`).")
+        lines.append("")
+
    def _create_files_message(self, new_files: list[dict], historical_files: list[dict]) -> str:
        """Create a formatted message listing uploaded files.

        Args:
            new_files: Files uploaded in the current message.
            historical_files: Files uploaded in previous messages.
+                Each file dict in either list may carry an optional ``outline`` key: a list of
+                ``{title, line}`` dicts extracted from the converted Markdown file.

        Returns:
            Formatted string inside <uploaded_files> tags.
@@ -55,25 +123,24 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
        lines.append("")
        if new_files:
            for file in new_files:
-                size_kb = file["size"] / 1024
-                size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
-                lines.append(f"- {file['filename']} ({size_str})")
-                lines.append(f"  Path: {file['path']}")
-                lines.append("")
+                self._format_file_entry(file, lines)
        else:
            lines.append("(empty)")
+            lines.append("")

        if historical_files:
            lines.append("The following files were uploaded in previous messages and are still available:")
            lines.append("")
            for file in historical_files:
-                size_kb = file["size"] / 1024
-                size_str = f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"
-                lines.append(f"- {file['filename']} ({size_str})")
-                lines.append(f"  Path: {file['path']}")
-                lines.append("")
+                self._format_file_entry(file, lines)

-        lines.append("You can read these files using the `read_file` tool with the paths shown above.")
+        lines.append("To work with these files:")
+        lines.append("- Read from the file first — use the outline line numbers and `read_file` to locate relevant sections.")
+        lines.append("- Use `grep` to search for keywords when you are not sure which section to look at")
+        lines.append("  (e.g. `grep(pattern='revenue', path='/mnt/user-data/uploads/')`).")
+        lines.append("- Use `glob` to find files by name pattern")
+        lines.append("  (e.g. `glob(pattern='**/*.md', path='/mnt/user-data/uploads/')`).")
+        lines.append("- Only fall back to web search if the file content is clearly insufficient to answer the question.")
        lines.append("</uploaded_files>")

        return "\n".join(lines)
@@ -147,6 +214,13 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):

        # Resolve uploads directory for existence checks
        thread_id = (runtime.context or {}).get("thread_id")
+        if thread_id is None:
+            try:
+                from langgraph.config import get_config
+
+                thread_id = get_config().get("configurable", {}).get("thread_id")
+            except RuntimeError:
+                pass  # get_config() raises outside a runnable context (e.g. unit tests)
        uploads_dir = self._paths.sandbox_uploads_dir(thread_id) if thread_id else None

        # Get newly uploaded files from the current message's additional_kwargs.files
@@ -159,15 +233,26 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
            for file_path in sorted(uploads_dir.iterdir()):
                if file_path.is_file() and file_path.name not in new_filenames:
                    stat = file_path.stat()
+                    outline, preview = _extract_outline_for_file(file_path)
                    historical_files.append(
                        {
                            "filename": file_path.name,
                            "size": stat.st_size,
                            "path": f"/mnt/user-data/uploads/{file_path.name}",
                            "extension": file_path.suffix,
+                            "outline": outline,
+                            "outline_preview": preview,
                        }
                    )

+        # Attach outlines to new files as well
+        if uploads_dir:
+            for file in new_files:
+                phys_path = uploads_dir / file["filename"]
+                outline, preview = _extract_outline_for_file(phys_path)
+                file["outline"] = outline
+                file["outline_preview"] = preview
+
        if not new_files and not historical_files:
            return None
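
The size formatting and outline rendering used by `_format_file_entry` can be checked with a small standalone sketch. The function names `format_size` and `render_outline` are hypothetical, and the sentinel shape `{"truncated": True}` is an assumption inferred from how the diff reads `outline[-1].get("truncated")`.

```python
from __future__ import annotations


def format_size(size_bytes: int) -> str:
    """Render a byte count as KB below 1024 KB, otherwise as MB."""
    size_kb = size_bytes / 1024
    return f"{size_kb:.1f} KB" if size_kb < 1024 else f"{size_kb / 1024:.1f} MB"


def render_outline(outline: list[dict]) -> list[str]:
    """Render heading entries, adding a note when a truncation sentinel is present."""
    if not outline:
        return []
    # Assumed sentinel: a trailing dict with a truthy "truncated" key.
    truncated = outline[-1].get("truncated", False)
    visible = [e for e in outline if not e.get("truncated")]
    lines = [f"    L{e['line']}: {e['title']}" for e in visible]
    if truncated:
        lines.append(f"    ... (showing first {len(visible)} headings; use `read_file` to explore further)")
    return lines
```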

@@ -1,22 +1,19 @@
 """Middleware for injecting image details into conversation before LLM call."""

 import logging
-from typing import NotRequired, override
+from typing import override

-from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
 from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
 from langgraph.runtime import Runtime

-from deerflow.agents.thread_state import ViewedImageData
+from deerflow.agents.thread_state import ThreadState

 logger = logging.getLogger(__name__)


-class ViewImageMiddlewareState(AgentState):
-    """Compatible with the `ThreadState` schema."""
-
-    viewed_images: NotRequired[dict[str, ViewedImageData] | None]
+class ViewImageMiddlewareState(ThreadState):
+    """Reuse the thread state so reducer-backed keys keep their annotations."""


 class ViewImageMiddleware(AgentMiddleware[ViewImageMiddlewareState]):
@@ -117,6 +117,7 @@ class DeerFlowClient:
        subagent_enabled: bool = False,
        plan_mode: bool = False,
        agent_name: str | None = None,
+        available_skills: set[str] | None = None,
        middlewares: Sequence[AgentMiddleware] | None = None,
    ):
        """Initialize the client.
@@ -133,6 +134,7 @@ class DeerFlowClient:
            subagent_enabled: Enable subagent delegation.
            plan_mode: Enable TodoList middleware for plan mode.
            agent_name: Name of the agent to use.
+            available_skills: Optional set of skill names to make available. If None (default), all scanned skills are available.
            middlewares: Optional list of custom middlewares to inject into the agent.
        """
        if config_path is not None:
@@ -148,6 +150,7 @@ class DeerFlowClient:
        self._subagent_enabled = subagent_enabled
        self._plan_mode = plan_mode
        self._agent_name = agent_name
+        self._available_skills = set(available_skills) if available_skills is not None else None
        self._middlewares = list(middlewares) if middlewares else []

        # Lazy agent — created on first call, recreated when config changes.
@@ -208,6 +211,8 @@ class DeerFlowClient:
            cfg.get("thinking_enabled"),
            cfg.get("is_plan_mode"),
            cfg.get("subagent_enabled"),
+            self._agent_name,
+            frozenset(self._available_skills) if self._available_skills is not None else None,
        )

        if self._agent is not None and self._agent_config_key == key:
@@ -226,6 +231,7 @@ class DeerFlowClient:
                subagent_enabled=subagent_enabled,
                max_concurrent_subagents=max_concurrent_subagents,
                agent_name=self._agent_name,
+                available_skills=self._available_skills,
            ),
            "state_schema": ThreadState,
        }
@@ -339,6 +345,7 @@ class DeerFlowClient:
        Yields:
            StreamEvent with one of:
            - type="values"          data={"title": str|None, "messages": [...], "artifacts": [...]}
+            - type="custom"          data={...}
            - type="messages-tuple"  data={"type": "ai", "content": str, "id": str}
            - type="messages-tuple"  data={"type": "ai", "content": str, "id": str, "usage_metadata": {...}}
            - type="messages-tuple"  data={"type": "ai", "content": "", "id": str, "tool_calls": [...]}
@@ -359,7 +366,22 @@ class DeerFlowClient:
        seen_ids: set[str] = set()
        cumulative_usage: dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}

-        for chunk in self._agent.stream(state, config=config, context=context, stream_mode="values"):
+        for item in self._agent.stream(
+            state,
+            config=config,
+            context=context,
+            stream_mode=["values", "custom"],
+        ):
+            if isinstance(item, tuple) and len(item) == 2:
+                mode, chunk = item
+                mode = str(mode)
+            else:
+                mode, chunk = "values", item
+
+            if mode == "custom":
+                yield StreamEvent(type="custom", data=chunk)
+                continue
+
            messages = chunk.get("messages", [])

            for msg in messages:
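
With a list of stream modes, the loop above treats each item as a `(mode, chunk)` pair and falls back to treating a bare item as a `values` chunk. The sketch below isolates that normalisation against a fake stream; `normalize_stream_item` and `split_events` are hypothetical names, and the real client yields `StreamEvent` objects rather than tuples.

```python
def normalize_stream_item(item):
    """Return (mode, chunk), treating bare items as 'values' chunks."""
    if isinstance(item, tuple) and len(item) == 2:
        mode, chunk = item
        return str(mode), chunk
    return "values", item


def split_events(items):
    """Pass custom chunks through untouched; extract messages from values chunks."""
    for item in items:
        mode, chunk = normalize_stream_item(item)
        if mode == "custom":
            yield ("custom", chunk)
            continue
        yield ("values", chunk.get("messages", []))
```

The fallback branch keeps the loop working if the stream ever yields plain state dicts, as it did when only `stream_mode="values"` was requested.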
@@ -7,6 +7,7 @@ import uuid
 from agent_sandbox import Sandbox as AioSandboxClient

 from deerflow.sandbox.sandbox import Sandbox
+from deerflow.sandbox.search import GrepMatch, path_matches, should_ignore_path, truncate_line

 logger = logging.getLogger(__name__)

@@ -124,16 +125,96 @@ class AioSandbox(Sandbox):
            content: The text content to write to the file.
            append: Whether to append the content to the file.
        """
-        try:
-            if append:
-                # Read existing content first and append
-                existing = self.read_file(path)
-                if not existing.startswith("Error:"):
-                    content = existing + content
-            self._client.file.write_file(file=path, content=content)
-        except Exception as e:
-            logger.error(f"Failed to write file in sandbox: {e}")
-            raise
+        with self._lock:
+            try:
+                if append:
+                    existing = self.read_file(path)
+                    if not existing.startswith("Error:"):
+                        content = existing + content
+                self._client.file.write_file(file=path, content=content)
+            except Exception as e:
+                logger.error(f"Failed to write file in sandbox: {e}")
+                raise
+
+    def glob(self, path: str, pattern: str, *, include_dirs: bool = False, max_results: int = 200) -> tuple[list[str], bool]:
+        if not include_dirs:
+            result = self._client.file.find_files(path=path, glob=pattern)
+            files = result.data.files if result.data and result.data.files else []
+            filtered = [file_path for file_path in files if not should_ignore_path(file_path)]
+            truncated = len(filtered) > max_results
+            return filtered[:max_results], truncated
+
+        result = self._client.file.list_path(path=path, recursive=True, show_hidden=False)
+        entries = result.data.files if result.data and result.data.files else []
+        matches: list[str] = []
+        root_path = path.rstrip("/") or "/"
+        root_prefix = root_path if root_path == "/" else f"{root_path}/"
+        for entry in entries:
+            if entry.path != root_path and not entry.path.startswith(root_prefix):
+                continue
+            if should_ignore_path(entry.path):
+                continue
+            rel_path = entry.path[len(root_path) :].lstrip("/")
+            if path_matches(pattern, rel_path):
+                matches.append(entry.path)
+                if len(matches) >= max_results:
+                    return matches, True
+        return matches, False
+
+    def grep(
+        self,
+        path: str,
+        pattern: str,
+        *,
+        glob: str | None = None,
+        literal: bool = False,
+        case_sensitive: bool = False,
+        max_results: int = 100,
+    ) -> tuple[list[GrepMatch], bool]:
+        import re as _re
+
+        regex_source = _re.escape(pattern) if literal else pattern
+        # Validate the pattern locally so an invalid regex raises re.error
+        # (caught by grep_tool's except re.error handler) rather than a
+        # generic remote API error.
+        _re.compile(regex_source, 0 if case_sensitive else _re.IGNORECASE)
+        regex = regex_source if case_sensitive else f"(?i){regex_source}"
+
+        if glob is not None:
+            find_result = self._client.file.find_files(path=path, glob=glob)
+            candidate_paths = find_result.data.files if find_result.data and find_result.data.files else []
+        else:
+            list_result = self._client.file.list_path(path=path, recursive=True, show_hidden=False)
+            entries = list_result.data.files if list_result.data and list_result.data.files else []
+            candidate_paths = [entry.path for entry in entries if not entry.is_directory]
+
+        matches: list[GrepMatch] = []
+        truncated = False
+
+        for file_path in candidate_paths:
+            if should_ignore_path(file_path):
+                continue
+
+            search_result = self._client.file.search_in_file(file=file_path, regex=regex)
+            data = search_result.data
+            if data is None:
+                continue
+
+            line_numbers = data.line_numbers or []
+            matched_lines = data.matches or []
+            for line_number, line in zip(line_numbers, matched_lines):
+                matches.append(
+                    GrepMatch(
+                        path=file_path,
+                        line_number=line_number if isinstance(line_number, int) else 0,
+                        line=truncate_line(line),
+                    )
+                )
+                if len(matches) >= max_results:
+                    truncated = True
+                    return matches, truncated
+
+        return matches, truncated

    def update_file(self, path: str, content: bytes) -> None:
        """Update a file with binary content in the sandbox.
@@ -142,9 +223,10 @@ class AioSandbox(Sandbox):
            path: The absolute path of the file to update.
            content: The binary content to write to the file.
        """
-        try:
-            base64_content = base64.b64encode(content).decode("utf-8")
-            self._client.file.write_file(file=path, content=base64_content, encoding="base64")
-        except Exception as e:
-            logger.error(f"Failed to update file in sandbox: {e}")
-            raise
+        with self._lock:
+            try:
+                base64_content = base64.b64encode(content).decode("utf-8")
+                self._client.file.write_file(file=path, content=base64_content, encoding="base64")
+            except Exception as e:
+                logger.error(f"Failed to update file in sandbox: {e}")
+                raise
@@ -1,13 +1,16 @@
 import logging
 import os

-import requests
+import httpx

 logger = logging.getLogger(__name__)

+_api_key_warned = False
+

 class JinaClient:
-    def crawl(self, url: str, return_format: str = "html", timeout: int = 10) -> str:
+    async def crawl(self, url: str, return_format: str = "html", timeout: int = 10) -> str:
+        global _api_key_warned
        headers = {
            "Content-Type": "application/json",
            "X-Return-Format": return_format,
@@ -15,11 +18,13 @@ class JinaClient:
        }
        if os.getenv("JINA_API_KEY"):
            headers["Authorization"] = f"Bearer {os.getenv('JINA_API_KEY')}"
-        else:
+        elif not _api_key_warned:
+            _api_key_warned = True
            logger.warning("Jina API key is not set. Provide your own key to access a higher rate limit. See https://jina.ai/reader for more information.")
        data = {"url": url}
        try:
-            response = requests.post("https://r.jina.ai/", headers=headers, json=data)
+            async with httpx.AsyncClient() as client:
+                response = await client.post("https://r.jina.ai/", headers=headers, json=data, timeout=timeout)

            if response.status_code != 200:
                error_message = f"Jina API returned status {response.status_code}: {response.text}"
@@ -34,5 +39,5 @@ class JinaClient:
            return response.text
        except Exception as e:
            error_message = f"Request to Jina API failed: {str(e)}"
-            logger.error(error_message)
+            logger.exception(error_message)
            return f"Error: {error_message}"
@@ -8,7 +8,7 @@ readability_extractor = ReadabilityExtractor()


@tool("web_fetch", parse_docstring=True)
-def web_fetch_tool(url: str) -> str:
+async def web_fetch_tool(url: str) -> str:
    """Fetch the contents of a web page at a given URL.
    Only fetch EXACT URLs that have been provided directly by the user or have been returned in results from the web_search and web_fetch tools.
    This tool can NOT access content that requires authentication, such as private Google Docs or pages behind login walls.
@@ -23,6 +23,8 @@ def web_fetch_tool(url: str) -> str:
    config = get_app_config().get_tool_config("web_fetch")
    if config is not None and "timeout" in config.model_extra:
        timeout = config.model_extra.get("timeout")
-    html_content = jina_client.crawl(url, return_format="html", timeout=timeout)
+    html_content = await jina_client.crawl(url, return_format="html", timeout=timeout)
+    if isinstance(html_content, str) and html_content.startswith("Error:"):
+        return html_content
    article = readability_extractor.extract_article(html_content)
    return article.to_markdown()[:4096]
@@ -2,11 +2,19 @@ from .app_config import get_app_config
 from .extensions_config import ExtensionsConfig, get_extensions_config
 from .memory_config import MemoryConfig, get_memory_config
 from .paths import Paths, get_paths
+from .skill_evolution_config import SkillEvolutionConfig
 from .skills_config import SkillsConfig
-from .tracing_config import get_tracing_config, is_tracing_enabled
+from .tracing_config import (
+    get_enabled_tracing_providers,
+    get_explicitly_enabled_tracing_providers,
+    get_tracing_config,
+    is_tracing_enabled,
+    validate_enabled_tracing_providers,
+)

 __all__ = [
    "get_app_config",
+    "SkillEvolutionConfig",
    "Paths",
    "get_paths",
    "SkillsConfig",
@@ -15,5 +23,8 @@ __all__ = [
    "MemoryConfig",
    "get_memory_config",
    "get_tracing_config",
+    "get_explicitly_enabled_tracing_providers",
+    "get_enabled_tracing_providers",
    "is_tracing_enabled",
+    "validate_enabled_tracing_providers",
 ]
@@ -22,6 +22,11 @@ class AgentConfig(BaseModel):
    description: str = ""
    model: str | None = None
    tool_groups: list[str] | None = None
+    # skills controls which skills are loaded into the agent's prompt:
+    # - None (or omitted): load all enabled skills (default fallback behavior)
+    # - [] (explicit empty list): disable all skills
+    # - ["skill1", "skill2"]: load only the specified skills
+    skills: list[str] | None = None


 def load_agent_config(name: str | None) -> AgentConfig | None:
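Note on the `skills` field added above: its comment defines three cases (None, empty list, explicit list). A minimal sketch of how a loader might apply them follows. `select_skills` and its arguments are hypothetical names for illustration, and silently skipping names that are not enabled is an assumption, not something this diff confirms.

```python
from __future__ import annotations


def select_skills(enabled: list[str], requested: list[str] | None) -> list[str]:
    """Apply the None / [] / explicit-list semantics to the enabled skills."""
    if requested is None:
        # None (or omitted): fall back to every enabled skill.
        return list(enabled)
    # [] yields an empty set, so nothing is loaded; otherwise keep only the
    # requested names, preserving the order of the enabled list.
    wanted = set(requested)
    return [name for name in enabled if name in wanted]
```

The point of the tri-state is that an omitted field and an explicit empty list mean opposite things, so callers should test `is None` rather than truthiness.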
@@ -1,5 +1,6 @@
 import logging
 import os
+from contextvars import ContextVar
 from pathlib import Path
 from typing import Any, Self

@@ -9,16 +10,19 @@ from pydantic import BaseModel, ConfigDict, Field

 from deerflow.config.acp_config import load_acp_config_from_dict
 from deerflow.config.checkpointer_config import CheckpointerConfig, load_checkpointer_config_from_dict
+from deerflow.config.database_config import DatabaseConfig
 from deerflow.config.extensions_config import ExtensionsConfig
-from deerflow.config.guardrails_config import load_guardrails_config_from_dict
-from deerflow.config.memory_config import load_memory_config_from_dict
+from deerflow.config.guardrails_config import GuardrailsConfig, load_guardrails_config_from_dict
+from deerflow.config.memory_config import MemoryConfig, load_memory_config_from_dict
 from deerflow.config.model_config import ModelConfig
+from deerflow.config.run_events_config import RunEventsConfig
 from deerflow.config.sandbox_config import SandboxConfig
+from deerflow.config.skill_evolution_config import SkillEvolutionConfig
 from deerflow.config.skills_config import SkillsConfig
 from deerflow.config.stream_bridge_config import StreamBridgeConfig, load_stream_bridge_config_from_dict
-from deerflow.config.subagents_config import load_subagents_config_from_dict
-from deerflow.config.summarization_config import load_summarization_config_from_dict
-from deerflow.config.title_config import load_title_config_from_dict
+from deerflow.config.subagents_config import SubagentsAppConfig, load_subagents_config_from_dict
+from deerflow.config.summarization_config import SummarizationConfig, load_summarization_config_from_dict
+from deerflow.config.title_config import TitleConfig, load_title_config_from_dict
 from deerflow.config.token_usage_config import TokenUsageConfig
 from deerflow.config.tool_config import ToolConfig, ToolGroupConfig
 from deerflow.config.tool_search_config import ToolSearchConfig, load_tool_search_config_from_dict
@@ -28,6 +32,13 @@ load_dotenv()
 logger = logging.getLogger(__name__)


+def _default_config_candidates() -> tuple[Path, ...]:
+    """Return deterministic config.yaml locations without relying on cwd."""
+    backend_dir = Path(__file__).resolve().parents[4]
+    repo_root = backend_dir.parent
+    return (backend_dir / "config.yaml", repo_root / "config.yaml")
+
+
 class AppConfig(BaseModel):
    """Config for the DeerFlow application"""

@@ -38,9 +49,17 @@ class AppConfig(BaseModel):
    tools: list[ToolConfig] = Field(default_factory=list, description="Available tools")
    tool_groups: list[ToolGroupConfig] = Field(default_factory=list, description="Available tool groups")
    skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
+    skill_evolution: SkillEvolutionConfig = Field(default_factory=SkillEvolutionConfig, description="Agent-managed skill evolution configuration")
    extensions: ExtensionsConfig = Field(default_factory=ExtensionsConfig, description="Extensions configuration (MCP servers and skills state)")
    tool_search: ToolSearchConfig = Field(default_factory=ToolSearchConfig, description="Tool search / deferred loading configuration")
+    title: TitleConfig = Field(default_factory=TitleConfig, description="Automatic title generation configuration")
+    summarization: SummarizationConfig = Field(default_factory=SummarizationConfig, description="Conversation summarization configuration")
+    memory: MemoryConfig = Field(default_factory=MemoryConfig, description="Memory subsystem configuration")
+    subagents: SubagentsAppConfig = Field(default_factory=SubagentsAppConfig, description="Subagent runtime configuration")
+    guardrails: GuardrailsConfig = Field(default_factory=GuardrailsConfig, description="Guardrail middleware configuration")
+    database: DatabaseConfig = Field(default_factory=DatabaseConfig, description="Unified database backend configuration")
+    run_events: RunEventsConfig = Field(default_factory=RunEventsConfig, description="Run event storage configuration")
     model_config = ConfigDict(extra="allow", frozen=False)
    checkpointer: CheckpointerConfig | None = Field(default=None, description="Checkpointer configuration")
    stream_bridge: StreamBridgeConfig | None = Field(default=None, description="Stream bridge configuration")

@@ -51,7 +70,7 @@ class AppConfig(BaseModel):
        Priority:
        1. If provided `config_path` argument, use it.
        2. If provided `DEER_FLOW_CONFIG_PATH` environment variable, use it.
-        3. Otherwise, first check the `config.yaml` in the current directory, then fallback to `config.yaml` in the parent directory.
+        3. Otherwise, search deterministic backend/repository-root defaults from `_default_config_candidates()`.
        """
        if config_path:
            path = Path(config_path)
@@ -64,14 +83,10 @@ class AppConfig(BaseModel):
                raise FileNotFoundError(f"Config file specified by environment variable `DEER_FLOW_CONFIG_PATH` not found at {path}")
            return path
        else:
-            # Check if the config.yaml is in the current directory
-            path = Path(os.getcwd()) / "config.yaml"
-            if not path.exists():
-                # Check if the config.yaml is in the parent directory of CWD
-                path = Path(os.getcwd()).parent / "config.yaml"
-                if not path.exists():
-                    raise FileNotFoundError("`config.yaml` file not found at the current directory nor its parent directory")
-            return path
+            for path in _default_config_candidates():
+                if path.exists():
+                    return path
+            raise FileNotFoundError("`config.yaml` file not found at the default backend or repository root locations")

    @classmethod
    def from_file(cls, config_path: str | None = None) -> Self:
@@ -244,6 +259,8 @@ _app_config: AppConfig | None = None
 _app_config_path: Path | None = None
 _app_config_mtime: float | None = None
 _app_config_is_custom = False
+_current_app_config: ContextVar[AppConfig | None] = ContextVar("deerflow_current_app_config", default=None)
+_current_app_config_stack: ContextVar[tuple[AppConfig | None, ...]] = ContextVar("deerflow_current_app_config_stack", default=())


 def _get_config_mtime(config_path: Path) -> float | None:
@@ -276,6 +293,10 @@ def get_app_config() -> AppConfig:
    """
    global _app_config, _app_config_path, _app_config_mtime

+    runtime_override = _current_app_config.get()
+    if runtime_override is not None:
+        return runtime_override
+
    if _app_config is not None and _app_config_is_custom:
        return _app_config

@@ -337,3 +358,26 @@ def set_app_config(config: AppConfig) -> None:
    _app_config_path = None
    _app_config_mtime = None
    _app_config_is_custom = True
+
+
+def peek_current_app_config() -> AppConfig | None:
+    """Return the runtime-scoped AppConfig override, if one is active."""
+    return _current_app_config.get()
+
+
+def push_current_app_config(config: AppConfig) -> None:
+    """Push a runtime-scoped AppConfig override for the current execution context."""
+    stack = _current_app_config_stack.get()
+    _current_app_config_stack.set(stack + (_current_app_config.get(),))
+    _current_app_config.set(config)
+
+
+def pop_current_app_config() -> None:
+    """Pop the latest runtime-scoped AppConfig override for the current execution context."""
+    stack = _current_app_config_stack.get()
+    if not stack:
+        _current_app_config.set(None)
+        return
+    previous = stack[-1]
+    _current_app_config_stack.set(stack[:-1])
+    _current_app_config.set(previous)
@@ -0,0 +1,92 @@
+"""Unified database backend configuration.
+
+Controls BOTH the LangGraph checkpointer and the DeerFlow application
+persistence layer (runs, threads metadata, users, etc.). The user
+configures one backend; the system handles physical separation details.
+
+SQLite mode: checkpointer and app use different .db files in the same
+directory to avoid write-lock contention. This is automatic.
+
+Postgres mode: both use the same database URL but maintain independent
+connection pools with different lifecycles.
+
+Memory mode: checkpointer uses MemorySaver, app uses in-memory stores.
+No database is initialized.
+
+Sensitive values (postgres_url) should use $VAR syntax in config.yaml
+to reference environment variables from .env:
+
+    database:
+      backend: postgres
+      postgres_url: $DATABASE_URL
+
+The $VAR resolution is handled by AppConfig.resolve_env_variables()
+before this config is instantiated -- DatabaseConfig itself does not
+need to do any environment variable processing.
+"""
+
+from __future__ import annotations
+
+import os
+from typing import Literal
+
+from pydantic import BaseModel, Field
+
+
+class DatabaseConfig(BaseModel):
+    backend: Literal["memory", "sqlite", "postgres"] = Field(
+        default="memory",
+        description=("Storage backend for both checkpointer and application data. 'memory' for development (no persistence across restarts), 'sqlite' for single-node deployment, 'postgres' for production multi-node deployment."),
+    )
+    sqlite_dir: str = Field(
+        default=".deer-flow/data",
+        description=("Directory for SQLite database files. Checkpointer uses {sqlite_dir}/checkpoints.db, application data uses {sqlite_dir}/app.db."),
+    )
+    postgres_url: str = Field(
+        default="",
+        description=(
+            "PostgreSQL connection URL, shared by checkpointer and app. "
+            "Use $DATABASE_URL in config.yaml to reference .env. "
+            "Example: postgresql://user:pass@host:5432/deerflow "
+            "(the +asyncpg driver suffix is added automatically where needed)."
+        ),
+    )
+    echo_sql: bool = Field(
+        default=False,
+        description="Echo all SQL statements to log (debug only).",
+    )
+    pool_size: int = Field(
+        default=5,
+        description="Connection pool size for the app ORM engine (postgres only).",
+    )
+
+    # -- Derived helpers (not user-configured) --
+
+    @property
+    def _resolved_sqlite_dir(self) -> str:
+        """Resolve sqlite_dir to an absolute path (relative to CWD)."""
+        from pathlib import Path
+
+        return str(Path(self.sqlite_dir).resolve())
+
+    @property
+    def checkpointer_sqlite_path(self) -> str:
+        """SQLite file path for the LangGraph checkpointer."""
+        return os.path.join(self._resolved_sqlite_dir, "checkpoints.db")
+
+    @property
+    def app_sqlite_path(self) -> str:
+        """SQLite file path for application ORM data."""
+        return os.path.join(self._resolved_sqlite_dir, "app.db")
+
+    @property
+    def app_sqlalchemy_url(self) -> str:
+        """SQLAlchemy async URL for the application ORM engine."""
+        if self.backend == "sqlite":
+            return f"sqlite+aiosqlite:///{self.app_sqlite_path}"
+        if self.backend == "postgres":
+            url = self.postgres_url
+            if url.startswith("postgresql://"):
+                url = url.replace("postgresql://", "postgresql+asyncpg://", 1)
+            return url
+        raise ValueError(f"No SQLAlchemy URL for backend={self.backend!r}")
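Note on `app_sqlalchemy_url` above: the URL derivation can be exercised without pydantic. The two helpers below are a standalone sketch of the same string logic. `to_async_pg_url` and `sqlite_async_url` are hypothetical names, not functions from this codebase.

```python
def to_async_pg_url(url: str) -> str:
    """Add the asyncpg driver suffix to a plain postgresql:// URL."""
    if url.startswith("postgresql://"):
        # Replace only the scheme prefix; URLs that already name a driver
        # (e.g. postgresql+asyncpg://) are returned unchanged.
        return url.replace("postgresql://", "postgresql+asyncpg://", 1)
    return url


def sqlite_async_url(db_path: str) -> str:
    """Build an aiosqlite URL; an absolute path yields four slashes."""
    return f"sqlite+aiosqlite:///{db_path}"
```

For an absolute path such as `/var/data/app.db`, the result has four slashes after the scheme: three from the URL form plus the leading slash of the path.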
@@ -80,6 +80,12 @@ class ExtensionsConfig(BaseModel):
        Args:
            config_path: Optional path to extensions config file.

+        Resolution order:
+            1. If the `config_path` argument is provided, use it.
+            2. If the `DEER_FLOW_EXTENSIONS_CONFIG_PATH` environment variable is set, use it.
+            3. Otherwise, search backend/repository-root defaults for
+               `extensions_config.json`, then legacy `mcp_config.json`.
+
        Returns:
            Path to the extensions config file if found, otherwise None.
        """
@@ -94,24 +100,16 @@ class ExtensionsConfig(BaseModel):
                raise FileNotFoundError(f"Extensions config file specified by environment variable `DEER_FLOW_EXTENSIONS_CONFIG_PATH` not found at {path}")
            return path
        else:
-            # Check if the extensions_config.json is in the current directory
-            path = Path(os.getcwd()) / "extensions_config.json"
-            if path.exists():
-                return path
-
-            # Check if the extensions_config.json is in the parent directory of CWD
-            path = Path(os.getcwd()).parent / "extensions_config.json"
-            if path.exists():
-                return path
-
-            # Backward compatibility: check for mcp_config.json
-            path = Path(os.getcwd()) / "mcp_config.json"
-            if path.exists():
-                return path
-
-            path = Path(os.getcwd()).parent / "mcp_config.json"
-            if path.exists():
-                return path
+            backend_dir = Path(__file__).resolve().parents[4]
+            repo_root = backend_dir.parent
+            for path in (
+                backend_dir / "extensions_config.json",
+                repo_root / "extensions_config.json",
+                backend_dir / "mcp_config.json",
+                repo_root / "mcp_config.json",
+            ):
+                if path.exists():
+                    return path

            # Extensions are optional, so return None if not found
            return None
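Note on the candidate search above: the tuple order means `extensions_config.json` in either directory wins over the legacy `mcp_config.json`, even when the legacy file is closer (in backend/). The sketch below checks that ordering against a temporary directory. `candidate_paths` and `first_existing` are illustrative names, and the directory layout is an assumption made for the demonstration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def candidate_paths(backend_dir: Path) -> tuple[Path, ...]:
    """Same ordering as the diff: new file name first, legacy name second."""
    repo_root = backend_dir.parent
    return (
        backend_dir / "extensions_config.json",
        repo_root / "extensions_config.json",
        backend_dir / "mcp_config.json",
        repo_root / "mcp_config.json",
    )


def first_existing(paths: tuple[Path, ...]) -> Path | None:
    for path in paths:
        if path.exists():
            return path
    return None  # extensions are optional


with tempfile.TemporaryDirectory() as tmp:
    backend = Path(tmp) / "backend"
    backend.mkdir()
    # Legacy file in backend/, new-style file in the repo root.
    (backend / "mcp_config.json").write_text("{}")
    (Path(tmp) / "extensions_config.json").write_text("{}")
    found = first_existing(candidate_paths(backend))
    found_name = found.name if found else None
    found_in_root = found is not None and found.parent == Path(tmp)
```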
@@ -9,6 +9,12 @@ VIRTUAL_PATH_PREFIX = "/mnt/user-data"
 _SAFE_THREAD_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")


+def _default_local_base_dir() -> Path:
+    """Return the repo-local DeerFlow state directory without relying on cwd."""
+    backend_dir = Path(__file__).resolve().parents[4]
+    return backend_dir / ".deer-flow"
+
+
 def _validate_thread_id(thread_id: str) -> str:
    """Validate a thread ID before using it in filesystem paths."""
    if not _SAFE_THREAD_ID_RE.match(thread_id):
@@ -67,8 +73,7 @@ class Paths:
    BaseDir resolution (in priority order):
        1. Constructor argument `base_dir`
        2. DEER_FLOW_HOME environment variable
-        3. Local dev fallback: cwd/.deer-flow  (when cwd is the backend/ dir)
-        4. Default: $HOME/.deer-flow
+        3. Repo-local fallback derived from this module path: `{backend_dir}/.deer-flow`
    """

    def __init__(self, base_dir: str | Path | None = None) -> None:
@@ -104,11 +109,7 @@ class Paths:
        if env_home := os.getenv("DEER_FLOW_HOME"):
            return Path(env_home).resolve()

-        cwd = Path.cwd()
-        if cwd.name == "backend" or (cwd / "pyproject.toml").exists():
-            return cwd / ".deer-flow"
-
-        return Path.home() / ".deer-flow"
+        return _default_local_base_dir()

    @property
    def memory_file(self) -> Path:
@@ -0,0 +1,33 @@
+"""Run event storage configuration.
+
+Controls where run events (messages + execution traces) are persisted.
+
+Backends:
+- memory: In-memory storage, data lost on restart. Suitable for
+  development and testing.
+- db: SQL database via SQLAlchemy ORM. Provides full query capability.
+  Suitable for production deployments.
+- jsonl: Append-only JSONL files. Lightweight alternative for
+  single-node deployments that need persistence without a database.
+"""
+
+from __future__ import annotations
+
+from typing import Literal
+
+from pydantic import BaseModel, Field
+
+
+class RunEventsConfig(BaseModel):
+    backend: Literal["memory", "db", "jsonl"] = Field(
+        default="memory",
+        description="Storage backend for run events. 'memory' for development (no persistence), 'db' for production (SQL queries), 'jsonl' for lightweight single-node persistence.",
+    )
+    max_trace_content: int = Field(
+        default=10240,
+        description="Maximum trace content size in bytes before truncation (db backend only).",
+    )
+    track_token_usage: bool = Field(
+        default=True,
+        description="Whether RunJournal should accumulate token counts to RunRow.",
+    )
@@ -64,4 +64,20 @@ class SandboxConfig(BaseModel):
        description="Environment variables to inject into the sandbox container. Values starting with $ will be resolved from host environment variables.",
    )

+    bash_output_max_chars: int = Field(
+        default=20000,
+        ge=0,
+        description="Maximum characters to keep from bash tool output. Output exceeding this limit is middle-truncated (head + tail), keeping half of the limit from the start and half from the end. Set to 0 to disable truncation.",
+    )
+    read_file_output_max_chars: int = Field(
+        default=50000,
+        ge=0,
+        description="Maximum characters to keep from read_file tool output. Output exceeding this limit is head-truncated. Set to 0 to disable truncation.",
+    )
+    ls_output_max_chars: int = Field(
+        default=20000,
+        ge=0,
+        description="Maximum characters to keep from ls tool output. Output exceeding this limit is head-truncated. Set to 0 to disable truncation.",
+    )
+
    model_config = ConfigDict(extra="allow")
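Note on the truncation limits above: the field descriptions name two strategies, head truncation and middle truncation. The truncation code itself is not part of this hunk, so the sketch below is only one plausible reading of those descriptions. Both function names are hypothetical, and the sketch inserts no marker where text was dropped, which a real implementation may well do.

```python
def head_truncate(text: str, max_chars: int) -> str:
    """Keep only the first max_chars characters; 0 disables truncation."""
    if max_chars == 0 or len(text) <= max_chars:
        return text
    return text[:max_chars]


def middle_truncate(text: str, max_chars: int) -> str:
    """Keep the head and tail of the text, dropping the middle; 0 disables."""
    if max_chars == 0 or len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head  # always >= 1 here, so text[-tail:] is safe
    return text[:head] + text[-tail:]
```

Middle truncation suits command output, where the error often sits at the end, while head truncation suits file reads and listings, where the beginning usually carries the context.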
@@ -0,0 +1,14 @@
+from pydantic import BaseModel, Field
+
+
+class SkillEvolutionConfig(BaseModel):
+    """Configuration for agent-managed skill evolution."""
+
+    enabled: bool = Field(
+        default=False,
+        description="Whether the agent can create and modify skills under skills/custom.",
+    )
+    moderation_model_name: str | None = Field(
+        default=None,
+        description="Optional model name for skill security moderation. Defaults to the primary chat model.",
+    )
@@ -3,6 +3,11 @@ from pathlib import Path
 from pydantic import BaseModel, Field


+def _default_repo_root() -> Path:
+    """Resolve the repo root without relying on the current working directory."""
+    return Path(__file__).resolve().parents[5]
+
+
 class SkillsConfig(BaseModel):
    """Configuration for skills system"""

@@ -26,8 +31,8 @@ class SkillsConfig(BaseModel):
            # Use configured path (can be absolute or relative)
            path = Path(self.path)
            if not path.is_absolute():
-                # If relative, resolve from current working directory
-                path = Path.cwd() / path
+                # If relative, resolve from the repo root for deterministic behavior.
+                path = _default_repo_root() / path
            return path.resolve()
        else:
            # Default: ../skills relative to backend directory
@@ -15,6 +15,11 @@ class SubagentOverrideConfig(BaseModel):
        ge=1,
        description="Timeout in seconds for this subagent (None = use global default)",
    )
+    max_turns: int | None = Field(
+        default=None,
+        ge=1,
+        description="Maximum turns for this subagent (None = use global or builtin default)",
+    )


 class SubagentsAppConfig(BaseModel):
@@ -25,6 +30,11 @@ class SubagentsAppConfig(BaseModel):
        ge=1,
        description="Default timeout in seconds for all subagents (default: 900 = 15 minutes)",
    )
+    max_turns: int | None = Field(
+        default=None,
+        ge=1,
+        description="Optional default max-turn override for all subagents (None = keep builtin defaults)",
+    )
    agents: dict[str, SubagentOverrideConfig] = Field(
        default_factory=dict,
        description="Per-agent configuration overrides keyed by agent name",
@@ -44,6 +54,15 @@ class SubagentsAppConfig(BaseModel):
            return override.timeout_seconds
        return self.timeout_seconds

+    def get_max_turns_for(self, agent_name: str, builtin_default: int) -> int:
+        """Get the effective max_turns for a specific agent."""
+        override = self.agents.get(agent_name)
+        if override is not None and override.max_turns is not None:
+            return override.max_turns
+        if self.max_turns is not None:
+            return self.max_turns
+        return builtin_default
+

 _subagents_config: SubagentsAppConfig = SubagentsAppConfig()

@@ -58,8 +77,26 @@ def load_subagents_config_from_dict(config_dict: dict) -> None:
    global _subagents_config
    _subagents_config = SubagentsAppConfig(**config_dict)

-    overrides_summary = {name: f"{override.timeout_seconds}s" for name, override in _subagents_config.agents.items() if override.timeout_seconds is not None}
+    overrides_summary = {}
+    for name, override in _subagents_config.agents.items():
+        parts = []
+        if override.timeout_seconds is not None:
+            parts.append(f"timeout={override.timeout_seconds}s")
+        if override.max_turns is not None:
+            parts.append(f"max_turns={override.max_turns}")
+        if parts:
+            overrides_summary[name] = ", ".join(parts)
+
    if overrides_summary:
-        logger.info(f"Subagents config loaded: default timeout={_subagents_config.timeout_seconds}s, per-agent overrides={overrides_summary}")
+        logger.info(
+            "Subagents config loaded: default timeout=%ss, default max_turns=%s, per-agent overrides=%s",
+            _subagents_config.timeout_seconds,
+            _subagents_config.max_turns,
+            overrides_summary,
+        )
    else:
-        logger.info(f"Subagents config loaded: default timeout={_subagents_config.timeout_seconds}s, no per-agent overrides")
+        logger.info(
+            "Subagents config loaded: default timeout=%ss, default max_turns=%s, no per-agent overrides",
+            _subagents_config.timeout_seconds,
+            _subagents_config.max_turns,
+        )
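Note on `get_max_turns_for` above: the precedence is per-agent override, then global `max_turns`, then the builtin default. The sketch below mirrors that with plain dataclasses so it runs without pydantic. `Override` and `Subagents` are stand-ins for `SubagentOverrideConfig` and `SubagentsAppConfig`, not the real classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Override:
    max_turns: int | None = None


@dataclass
class Subagents:
    max_turns: int | None = None
    agents: dict[str, Override] = field(default_factory=dict)

    def get_max_turns_for(self, agent_name: str, builtin_default: int) -> int:
        override = self.agents.get(agent_name)
        if override is not None and override.max_turns is not None:
            return override.max_turns  # 1. per-agent override
        if self.max_turns is not None:
            return self.max_turns      # 2. global override
        return builtin_default         # 3. builtin default


cfg = Subagents(max_turns=20, agents={"bash": Override(max_turns=5), "general": Override()})
```

An agent that has an override entry but no `max_turns` in it ("general" here) falls through to the global value, the same as an agent with no entry at all.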
@@ -1,14 +1,12 @@
-import logging
 import os
 import threading

 from pydantic import BaseModel, Field

-logger = logging.getLogger(__name__)
 _config_lock = threading.Lock()


-class TracingConfig(BaseModel):
+class LangSmithTracingConfig(BaseModel):
    """Configuration for LangSmith tracing."""

    enabled: bool = Field(...)
@@ -18,9 +16,69 @@ class TracingConfig(BaseModel):

    @property
    def is_configured(self) -> bool:
-        """Check if tracing is fully configured (enabled and has API key)."""
        return self.enabled and bool(self.api_key)

+    def validate(self) -> None:
+        if self.enabled and not self.api_key:
+            raise ValueError("LangSmith tracing is enabled but LANGSMITH_API_KEY (or LANGCHAIN_API_KEY) is not set.")
+
+
+class LangfuseTracingConfig(BaseModel):
+    """Configuration for Langfuse tracing."""
+
+    enabled: bool = Field(...)
+    public_key: str | None = Field(...)
+    secret_key: str | None = Field(...)
+    host: str = Field(...)
+
+    @property
+    def is_configured(self) -> bool:
+        return self.enabled and bool(self.public_key) and bool(self.secret_key)
+
+    def validate(self) -> None:
+        if not self.enabled:
+            return
+        missing: list[str] = []
+        if not self.public_key:
+            missing.append("LANGFUSE_PUBLIC_KEY")
+        if not self.secret_key:
+            missing.append("LANGFUSE_SECRET_KEY")
+        if missing:
+            raise ValueError(f"Langfuse tracing is enabled but required settings are missing: {', '.join(missing)}")
+
+
+class TracingConfig(BaseModel):
+    """Tracing configuration for supported providers."""
+
+    langsmith: LangSmithTracingConfig = Field(...)
+    langfuse: LangfuseTracingConfig = Field(...)
+
+    @property
+    def is_configured(self) -> bool:
+        return bool(self.enabled_providers)
+
+    @property
+    def explicitly_enabled_providers(self) -> list[str]:
+        enabled: list[str] = []
+        if self.langsmith.enabled:
+            enabled.append("langsmith")
+        if self.langfuse.enabled:
+            enabled.append("langfuse")
+        return enabled
+
+    @property
+    def enabled_providers(self) -> list[str]:
+        enabled: list[str] = []
+        if self.langsmith.is_configured:
+            enabled.append("langsmith")
+        if self.langfuse.is_configured:
+            enabled.append("langfuse")
+        return enabled
+
+    def validate_enabled(self) -> None:
+        self.langsmith.validate()
+        self.langfuse.validate()
+

 _tracing_config: TracingConfig | None = None

@@ -29,12 +87,7 @@ _TRUTHY_VALUES = {"1", "true", "yes", "on"}


 def _env_flag_preferred(*names: str) -> bool:
-    """Return the boolean value of the first env var that is present and non-empty.
-
-    Accepted truthy values (case-insensitive): ``1``, ``true``, ``yes``, ``on``.
-    Any other non-empty value is treated as falsy.  If none of the named
-    variables is set, returns ``False``.
-    """
+    """Return the boolean value of the first env var that is present and non-empty."""
    for name in names:
        value = os.environ.get(name)
        if value is not None and value.strip():
@@ -52,43 +105,45 @@ def _first_env_value(*names: str) -> str | None:


 def get_tracing_config() -> TracingConfig:
-    """Get the current tracing configuration from environment variables.
-
-    ``LANGSMITH_*`` variables take precedence over their legacy ``LANGCHAIN_*``
-    counterparts.  For boolean flags (``enabled``), the *first* variable that is
-    present and non-empty in the priority list is the sole authority – its value
-    is parsed and returned without consulting the remaining candidates.  Accepted
-    truthy values are ``1``, ``true``, ``yes``, and ``on`` (case-insensitive);
-    any other non-empty value is treated as falsy.
-
-    Priority order:
-        enabled  : LANGSMITH_TRACING > LANGCHAIN_TRACING_V2 > LANGCHAIN_TRACING
-        api_key  : LANGSMITH_API_KEY  > LANGCHAIN_API_KEY
-        project  : LANGSMITH_PROJECT  > LANGCHAIN_PROJECT   (default: "deer-flow")
-        endpoint : LANGSMITH_ENDPOINT > LANGCHAIN_ENDPOINT  (default: https://api.smith.langchain.com)
-
-    Returns:
-        TracingConfig with current settings.
-    """
+    """Get the current tracing configuration from environment variables."""
    global _tracing_config
    if _tracing_config is not None:
        return _tracing_config
    with _config_lock:
-        if _tracing_config is not None:  # Double-check after acquiring lock
+        if _tracing_config is not None:
            return _tracing_config
        _tracing_config = TracingConfig(
-            # Keep compatibility with both legacy LANGCHAIN_* and newer LANGSMITH_* variables.
-            enabled=_env_flag_preferred("LANGSMITH_TRACING", "LANGCHAIN_TRACING_V2", "LANGCHAIN_TRACING"),
-            api_key=_first_env_value("LANGSMITH_API_KEY", "LANGCHAIN_API_KEY"),
-            project=_first_env_value("LANGSMITH_PROJECT", "LANGCHAIN_PROJECT") or "deer-flow",
-            endpoint=_first_env_value("LANGSMITH_ENDPOINT", "LANGCHAIN_ENDPOINT") or "https://api.smith.langchain.com",
+            langsmith=LangSmithTracingConfig(
+                enabled=_env_flag_preferred("LANGSMITH_TRACING", "LANGCHAIN_TRACING_V2", "LANGCHAIN_TRACING"),
+                api_key=_first_env_value("LANGSMITH_API_KEY", "LANGCHAIN_API_KEY"),
+                project=_first_env_value("LANGSMITH_PROJECT", "LANGCHAIN_PROJECT") or "deer-flow",
+                endpoint=_first_env_value("LANGSMITH_ENDPOINT", "LANGCHAIN_ENDPOINT") or "https://api.smith.langchain.com",
+            ),
+            langfuse=LangfuseTracingConfig(
+                enabled=_env_flag_preferred("LANGFUSE_TRACING"),
+                public_key=_first_env_value("LANGFUSE_PUBLIC_KEY"),
+                secret_key=_first_env_value("LANGFUSE_SECRET_KEY"),
+                host=_first_env_value("LANGFUSE_BASE_URL") or "https://cloud.langfuse.com",
+            ),
        )
        return _tracing_config


+def get_enabled_tracing_providers() -> list[str]:
+    """Return the configured tracing providers that are enabled and complete."""
+    return get_tracing_config().enabled_providers
+
+
+def get_explicitly_enabled_tracing_providers() -> list[str]:
+    """Return tracing providers explicitly enabled by config, even if incomplete."""
+    return get_tracing_config().explicitly_enabled_providers
+
+
+def validate_enabled_tracing_providers() -> None:
+    """Validate that any explicitly enabled providers are fully configured."""
+    get_tracing_config().validate_enabled()
+
+
 def is_tracing_enabled() -> bool:
-    """Check if LangSmith tracing is enabled and configured.
-    Returns:
-        True if tracing is enabled and has an API key.
-    """
+    """Check if any tracing provider is enabled and fully configured."""
    return get_tracing_config().is_configured
@@ -2,12 +2,34 @@ import logging

 from langchain.chat_models import BaseChatModel

-from deerflow.config import get_app_config, get_tracing_config, is_tracing_enabled
+from deerflow.config import get_app_config
 from deerflow.reflection import resolve_class
+from deerflow.tracing import build_tracing_callbacks

 logger = logging.getLogger(__name__)


+def _deep_merge_dicts(base: dict | None, override: dict) -> dict:
+    """Recursively merge two dictionaries without mutating the inputs."""
+    merged = dict(base or {})
+    for key, value in override.items():
+        if isinstance(value, dict) and isinstance(merged.get(key), dict):
+            merged[key] = _deep_merge_dicts(merged[key], value)
+        else:
+            merged[key] = value
+    return merged
+
+
+def _vllm_disable_chat_template_kwargs(chat_template_kwargs: dict) -> dict:
+    """Build the disable payload for vLLM/Qwen chat template kwargs."""
+    disable_kwargs: dict[str, bool] = {}
+    if "thinking" in chat_template_kwargs:
+        disable_kwargs["thinking"] = False
+    if "enable_thinking" in chat_template_kwargs:
+        disable_kwargs["enable_thinking"] = False
+    return disable_kwargs
+
+
 def create_chat_model(name: str | None = None, thinking_enabled: bool = False, **kwargs) -> BaseChatModel:
    """Create a chat model instance from the config.

@@ -53,13 +75,23 @@ def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *
    if not thinking_enabled and has_thinking_settings:
        if effective_wte.get("extra_body", {}).get("thinking", {}).get("type"):
            # OpenAI-compatible gateway: thinking is nested under extra_body
-            kwargs.update({"extra_body": {"thinking": {"type": "disabled"}}})
-            kwargs.update({"reasoning_effort": "minimal"})
+            model_settings_from_config["extra_body"] = _deep_merge_dicts(
+                model_settings_from_config.get("extra_body"),
+                {"thinking": {"type": "disabled"}},
+            )
+            model_settings_from_config["reasoning_effort"] = "minimal"
+        elif disable_chat_template_kwargs := _vllm_disable_chat_template_kwargs(effective_wte.get("extra_body", {}).get("chat_template_kwargs") or {}):
+            # vLLM uses chat template kwargs to switch thinking on/off.
+            model_settings_from_config["extra_body"] = _deep_merge_dicts(
+                model_settings_from_config.get("extra_body"),
+                {"chat_template_kwargs": disable_chat_template_kwargs},
+            )
        elif effective_wte.get("thinking", {}).get("type"):
            # Native langchain_anthropic: thinking is a direct constructor parameter
-            kwargs.update({"thinking": {"type": "disabled"}})
-    if not model_config.supports_reasoning_effort and "reasoning_effort" in kwargs:
-        del kwargs["reasoning_effort"]
+            model_settings_from_config["thinking"] = {"type": "disabled"}
+    if not model_config.supports_reasoning_effort:
+        kwargs.pop("reasoning_effort", None)
+        model_settings_from_config.pop("reasoning_effort", None)

    # For Codex Responses API models: map thinking mode to reasoning_effort
    from deerflow.models.openai_codex_provider import CodexChatModel
@@ -77,19 +109,20 @@ def create_chat_model(name: str | None = None, thinking_enabled: bool = False, *
        elif "reasoning_effort" not in model_settings_from_config:
            model_settings_from_config["reasoning_effort"] = "medium"

+    # Ensure stream_usage is enabled so that token usage metadata is available
+    # in streaming responses.  LangChain's BaseChatOpenAI only defaults
+    # stream_usage=True when no custom base_url/api_base is set, so models
+    # hitting third-party endpoints (e.g. doubao, deepseek) silently lose
+    # usage data.  We default it to True unless explicitly configured.
+    if "stream_usage" not in model_settings_from_config and "stream_usage" not in kwargs:
+        if "stream_usage" in getattr(model_class, "model_fields", {}):
+            model_settings_from_config["stream_usage"] = True
+
    model_instance = model_class(**kwargs, **model_settings_from_config)

-    if is_tracing_enabled():
-        try:
-            from langchain_core.tracers.langchain import LangChainTracer
-
-            tracing_config = get_tracing_config()
-            tracer = LangChainTracer(
-                project_name=tracing_config.project,
-            )
-            existing_callbacks = model_instance.callbacks or []
-            model_instance.callbacks = [*existing_callbacks, tracer]
-            logger.debug(f"LangSmith tracing attached to model '{name}' (project='{tracing_config.project}')")
-        except Exception as e:
-            logger.warning(f"Failed to attach LangSmith tracing to model '{name}': {e}")
+    callbacks = build_tracing_callbacks()
+    if callbacks:
+        existing_callbacks = model_instance.callbacks or []
+        model_instance.callbacks = [*existing_callbacks, *callbacks]
+        logger.debug(f"Tracing attached to model '{name}' with providers={len(callbacks)}")
    return model_instance
@@ -0,0 +1,258 @@
+"""Custom vLLM provider built on top of LangChain ChatOpenAI.
+
+vLLM 0.19.0 exposes reasoning models through an OpenAI-compatible API, but
+LangChain's default OpenAI adapter drops the non-standard ``reasoning`` field
+from assistant messages and streaming deltas. That breaks interleaved
+thinking/tool-call flows because vLLM expects the assistant's prior reasoning to
+be echoed back on subsequent turns.
+
+This provider preserves ``reasoning`` on:
+- non-streaming responses
+- streaming deltas
+- multi-turn request payloads
+"""
+
+from __future__ import annotations
+
+import json
+from collections.abc import Mapping
+from typing import Any, cast
+
+import openai
+from langchain_core.language_models import LanguageModelInput
+from langchain_core.messages import (
+    AIMessage,
+    AIMessageChunk,
+    BaseMessageChunk,
+    ChatMessageChunk,
+    FunctionMessageChunk,
+    HumanMessageChunk,
+    SystemMessageChunk,
+    ToolMessageChunk,
+)
+from langchain_core.messages.tool import tool_call_chunk
+from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
+from langchain_openai import ChatOpenAI
+from langchain_openai.chat_models.base import _create_usage_metadata
+
+
+def _normalize_vllm_chat_template_kwargs(payload: dict[str, Any]) -> None:
+    """Map DeerFlow's legacy ``thinking`` toggle to vLLM/Qwen's ``enable_thinking``.
+
+    DeerFlow originally documented ``extra_body.chat_template_kwargs.thinking``
+    for vLLM, but vLLM 0.19.0's Qwen reasoning parser reads
+    ``chat_template_kwargs.enable_thinking``. Normalize the payload just before
+    it is sent so existing configs keep working and flash mode can truly
+    disable reasoning.
+    """
+    extra_body = payload.get("extra_body")
+    if not isinstance(extra_body, dict):
+        return
+
+    chat_template_kwargs = extra_body.get("chat_template_kwargs")
+    if not isinstance(chat_template_kwargs, dict):
+        return
+
+    if "thinking" not in chat_template_kwargs:
+        return
+
+    normalized_chat_template_kwargs = dict(chat_template_kwargs)
+    normalized_chat_template_kwargs.setdefault("enable_thinking", normalized_chat_template_kwargs["thinking"])
+    normalized_chat_template_kwargs.pop("thinking", None)
+    extra_body["chat_template_kwargs"] = normalized_chat_template_kwargs
+
+
+def _reasoning_to_text(reasoning: Any) -> str:
+    """Best-effort extraction of readable reasoning text from vLLM payloads."""
+    if isinstance(reasoning, str):
+        return reasoning
+
+    if isinstance(reasoning, list):
+        parts = [_reasoning_to_text(item) for item in reasoning]
+        return "".join(part for part in parts if part)
+
+    if isinstance(reasoning, dict):
+        for key in ("text", "content", "reasoning"):
+            value = reasoning.get(key)
+            if isinstance(value, str):
+                return value
+            if value is not None:
+                text = _reasoning_to_text(value)
+                if text:
+                    return text
+        try:
+            return json.dumps(reasoning, ensure_ascii=False)
+        except TypeError:
+            return str(reasoning)
+
+    try:
+        return json.dumps(reasoning, ensure_ascii=False)
+    except TypeError:
+        return str(reasoning)
+
+
+def _convert_delta_to_message_chunk_with_reasoning(_dict: Mapping[str, Any], default_class: type[BaseMessageChunk]) -> BaseMessageChunk:
+    """Convert a streaming delta to a LangChain message chunk while preserving reasoning."""
+    id_ = _dict.get("id")
+    role = cast(str, _dict.get("role"))
+    content = cast(str, _dict.get("content") or "")
+    additional_kwargs: dict[str, Any] = {}
+
+    if _dict.get("function_call"):
+        function_call = dict(_dict["function_call"])
+        if "name" in function_call and function_call["name"] is None:
+            function_call["name"] = ""
+        additional_kwargs["function_call"] = function_call
+
+    reasoning = _dict.get("reasoning")
+    if reasoning is not None:
+        additional_kwargs["reasoning"] = reasoning
+        reasoning_text = _reasoning_to_text(reasoning)
+        if reasoning_text:
+            additional_kwargs["reasoning_content"] = reasoning_text
+
+    tool_call_chunks = []
+    if raw_tool_calls := _dict.get("tool_calls"):
+        try:
+            tool_call_chunks = [
+                tool_call_chunk(
+                    name=rtc["function"].get("name"),
+                    args=rtc["function"].get("arguments"),
+                    id=rtc.get("id"),
+                    index=rtc["index"],
+                )
+                for rtc in raw_tool_calls
+            ]
+        except KeyError:
+            pass
+
+    if role == "user" or default_class == HumanMessageChunk:
+        return HumanMessageChunk(content=content, id=id_)
+    if role == "assistant" or default_class == AIMessageChunk:
+        return AIMessageChunk(
+            content=content,
+            additional_kwargs=additional_kwargs,
+            id=id_,
+            tool_call_chunks=tool_call_chunks,  # type: ignore[arg-type]
+        )
+    if role in ("system", "developer") or default_class == SystemMessageChunk:
+        role_kwargs = {"__openai_role__": "developer"} if role == "developer" else {}
+        return SystemMessageChunk(content=content, id=id_, additional_kwargs=role_kwargs)
+    if role == "function" or default_class == FunctionMessageChunk:
+        return FunctionMessageChunk(content=content, name=_dict["name"], id=id_)
+    if role == "tool" or default_class == ToolMessageChunk:
+        return ToolMessageChunk(content=content, tool_call_id=_dict["tool_call_id"], id=id_)
+    if role or default_class == ChatMessageChunk:
+        return ChatMessageChunk(content=content, role=role, id=id_)  # type: ignore[arg-type]
+    return default_class(content=content, id=id_)  # type: ignore[call-arg]
+
+
+def _restore_reasoning_field(payload_msg: dict[str, Any], orig_msg: AIMessage) -> None:
+    """Re-inject vLLM reasoning onto outgoing assistant messages."""
+    reasoning = orig_msg.additional_kwargs.get("reasoning")
+    if reasoning is None:
+        reasoning = orig_msg.additional_kwargs.get("reasoning_content")
+    if reasoning is not None:
+        payload_msg["reasoning"] = reasoning
+
+
+class VllmChatModel(ChatOpenAI):
+    """ChatOpenAI variant that preserves vLLM reasoning fields across turns."""
+
+    model_config = {"arbitrary_types_allowed": True}
+
+    @property
+    def _llm_type(self) -> str:
+        return "vllm-openai-compatible"
+
+    def _get_request_payload(
+        self,
+        input_: LanguageModelInput,
+        *,
+        stop: list[str] | None = None,
+        **kwargs: Any,
+    ) -> dict[str, Any]:
+        """Restore assistant reasoning in request payloads for interleaved thinking."""
+        original_messages = self._convert_input(input_).to_messages()
+        payload = super()._get_request_payload(input_, stop=stop, **kwargs)
+        _normalize_vllm_chat_template_kwargs(payload)
+        payload_messages = payload.get("messages", [])
+
+        if len(payload_messages) == len(original_messages):
+            for payload_msg, orig_msg in zip(payload_messages, original_messages):
+                if payload_msg.get("role") == "assistant" and isinstance(orig_msg, AIMessage):
+                    _restore_reasoning_field(payload_msg, orig_msg)
+        else:
+            ai_messages = [message for message in original_messages if isinstance(message, AIMessage)]
+            assistant_payloads = [message for message in payload_messages if message.get("role") == "assistant"]
+            for payload_msg, ai_msg in zip(assistant_payloads, ai_messages):
+                _restore_reasoning_field(payload_msg, ai_msg)
+
+        return payload
+
+    def _create_chat_result(self, response: dict | openai.BaseModel, generation_info: dict | None = None) -> ChatResult:
+        """Preserve vLLM reasoning on non-streaming responses."""
+        result = super()._create_chat_result(response, generation_info=generation_info)
+        response_dict = response if isinstance(response, dict) else response.model_dump()
+
+        for generation, choice in zip(result.generations, response_dict.get("choices", [])):
+            if not isinstance(generation, ChatGeneration):
+                continue
+            message = generation.message
+            if not isinstance(message, AIMessage):
+                continue
+            reasoning = choice.get("message", {}).get("reasoning")
+            if reasoning is None:
+                continue
+            message.additional_kwargs["reasoning"] = reasoning
+            reasoning_text = _reasoning_to_text(reasoning)
+            if reasoning_text:
+                message.additional_kwargs["reasoning_content"] = reasoning_text
+
+        return result
+
+    def _convert_chunk_to_generation_chunk(
+        self,
+        chunk: dict,
+        default_chunk_class: type,
+        base_generation_info: dict | None,
+    ) -> ChatGenerationChunk | None:
+        """Preserve vLLM reasoning on streaming deltas."""
+        if chunk.get("type") == "content.delta":
+            return None
+
+        token_usage = chunk.get("usage")
+        choices = chunk.get("choices", []) or chunk.get("chunk", {}).get("choices", [])
+        usage_metadata = _create_usage_metadata(token_usage, chunk.get("service_tier")) if token_usage else None
+
+        if len(choices) == 0:
+            generation_chunk = ChatGenerationChunk(message=default_chunk_class(content="", usage_metadata=usage_metadata), generation_info=base_generation_info)
+            if self.output_version == "v1":
+                generation_chunk.message.content = []
+                generation_chunk.message.response_metadata["output_version"] = "v1"
+            return generation_chunk
+
+        choice = choices[0]
+        if choice["delta"] is None:
+            return None
+
+        message_chunk = _convert_delta_to_message_chunk_with_reasoning(choice["delta"], default_chunk_class)
+        generation_info = {**base_generation_info} if base_generation_info else {}
+
+        if finish_reason := choice.get("finish_reason"):
+            generation_info["finish_reason"] = finish_reason
+            if model_name := chunk.get("model"):
+                generation_info["model_name"] = model_name
+            if system_fingerprint := chunk.get("system_fingerprint"):
+                generation_info["system_fingerprint"] = system_fingerprint
+            if service_tier := chunk.get("service_tier"):
+                generation_info["service_tier"] = service_tier
+
+        if logprobs := choice.get("logprobs"):
+            generation_info["logprobs"] = logprobs
+
+        if usage_metadata and isinstance(message_chunk, AIMessageChunk):
+            message_chunk.usage_metadata = usage_metadata
+
+        message_chunk.response_metadata["model_provider"] = "openai"
+        return ChatGenerationChunk(message=message_chunk, generation_info=generation_info or None)
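
Reviewer note: the normalization step in `_get_request_payload` is easiest to follow with a concrete payload. The sketch below copies the logic of `_normalize_vllm_chat_template_kwargs` from this file under a local name and runs it on two made-up payloads. The payload shapes are assumptions for illustration only; real payloads are built by `ChatOpenAI`.

```python
from typing import Any


def normalize_chat_template_kwargs(payload: dict[str, Any]) -> None:
    """Rename the legacy ``thinking`` toggle to ``enable_thinking`` in place."""
    extra_body = payload.get("extra_body")
    if not isinstance(extra_body, dict):
        return
    chat_template_kwargs = extra_body.get("chat_template_kwargs")
    if not isinstance(chat_template_kwargs, dict):
        return
    if "thinking" not in chat_template_kwargs:
        return
    normalized = dict(chat_template_kwargs)
    # setdefault: an explicit enable_thinking wins over the legacy key.
    normalized.setdefault("enable_thinking", normalized["thinking"])
    normalized.pop("thinking", None)
    extra_body["chat_template_kwargs"] = normalized


# Legacy config: only ``thinking`` is set.
legacy_payload = {"extra_body": {"chat_template_kwargs": {"thinking": False}}}
normalize_chat_template_kwargs(legacy_payload)

# Mixed config: both keys are set and disagree.
mixed_payload = {
    "extra_body": {"chat_template_kwargs": {"thinking": False, "enable_thinking": True}}
}
normalize_chat_template_kwargs(mixed_payload)

print(legacy_payload)
print(mixed_payload)
```

Note the precedence this implies: when both keys are present, the existing `enable_thinking` value is kept and the legacy `thinking` value is discarded.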
@@ -0,0 +1,13 @@
+"""DeerFlow application persistence layer (SQLAlchemy 2.0 async ORM).
+
+This module manages DeerFlow's own application data -- run metadata,
+thread ownership, cron jobs, users. It is completely separate from
+LangGraph's checkpointer, which manages graph execution state.
+
+Usage:
+    from deerflow.persistence import init_engine, close_engine, get_session_factory
+"""
+
+from deerflow.persistence.engine import close_engine, get_engine, get_session_factory, init_engine
+
+__all__ = ["close_engine", "get_engine", "get_session_factory", "init_engine"]
@@ -0,0 +1,40 @@
+"""SQLAlchemy declarative base with automatic to_dict support.
+
+All DeerFlow ORM models inherit from this Base. It provides a generic
+to_dict() method via SQLAlchemy's inspect() so individual models don't
+need to write their own serialization logic.
+
+LangGraph's checkpointer tables are NOT managed by this Base.
+"""
+
+from __future__ import annotations
+
+from sqlalchemy import inspect as sa_inspect
+from sqlalchemy.orm import DeclarativeBase
+
+
+class Base(DeclarativeBase):
+    """Base class for all DeerFlow ORM models.
+
+    Provides:
+    - Automatic to_dict() via SQLAlchemy column inspection.
+    - Standard __repr__() showing all column values.
+    """
+
+    def to_dict(self, *, exclude: set[str] | None = None) -> dict:
+        """Convert ORM instance to plain dict.
+
+        Uses SQLAlchemy's inspect() to iterate mapped column attributes.
+
+        Args:
+            exclude: Optional set of column keys to omit.
+
+        Returns:
+            Dict of {column_key: value} for all mapped columns.
+        """
+        exclude = exclude or set()
+        return {c.key: getattr(self, c.key) for c in sa_inspect(type(self)).mapper.column_attrs if c.key not in exclude}
+
+    def __repr__(self) -> str:
+        cols = ", ".join(f"{c.key}={getattr(self, c.key)!r}" for c in sa_inspect(type(self)).mapper.column_attrs)
+        return f"{type(self).__name__}({cols})"
@@ -0,0 +1,166 @@
+"""Async SQLAlchemy engine lifecycle management.
+
+Initializes at Gateway startup, provides session factory for
+repositories, disposes at shutdown.
+
+When database.backend="memory", init_engine is a no-op and
+get_session_factory() returns None. Repositories must check for
+None and fall back to in-memory implementations.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+
+from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession, async_sessionmaker, create_async_engine
+
+
+def _json_serializer(obj: object) -> str:
+    """Serialize JSON with ensure_ascii=False so non-ASCII text (e.g. Chinese) is stored unescaped."""
+    return json.dumps(obj, ensure_ascii=False)
+
+
+logger = logging.getLogger(__name__)
+
+_engine: AsyncEngine | None = None
+_session_factory: async_sessionmaker[AsyncSession] | None = None
+
+
+async def _auto_create_postgres_db(url: str) -> None:
+    """Connect to the ``postgres`` maintenance DB and CREATE DATABASE.
+
+    The target database name is extracted from *url*.  The connection is
+    made to the default ``postgres`` database on the same server using
+    ``AUTOCOMMIT`` isolation (CREATE DATABASE cannot run inside a
+    transaction).
+    """
+    from sqlalchemy import text
+    from sqlalchemy.engine.url import make_url
+
+    parsed = make_url(url)
+    db_name = parsed.database
+    if not db_name:
+        raise ValueError("Cannot auto-create database: no database name in URL")
+
+    # Connect to the default 'postgres' database to issue CREATE DATABASE
+    maint_url = parsed.set(database="postgres")
+    maint_engine = create_async_engine(maint_url, isolation_level="AUTOCOMMIT")
+    try:
+        async with maint_engine.connect() as conn:
+            await conn.execute(text('CREATE DATABASE "{}"'.format(db_name.replace('"', '""'))))
+        logger.info("Auto-created PostgreSQL database: %s", db_name)
+    finally:
+        await maint_engine.dispose()
+
+
+async def init_engine(
+    backend: str,
+    *,
+    url: str = "",
+    echo: bool = False,
+    pool_size: int = 5,
+    sqlite_dir: str = "",
+) -> None:
+    """Create the async engine and session factory, then auto-create tables.
+
+    Args:
+        backend: "memory", "sqlite", or "postgres".
+        url: SQLAlchemy async URL (for sqlite/postgres).
+        echo: Echo SQL to log.
+        pool_size: Postgres connection pool size.
+        sqlite_dir: Directory to create for SQLite (ensured to exist).
+    """
+    global _engine, _session_factory
+
+    if backend == "memory":
+        logger.info("Persistence backend=memory -- ORM engine not initialized")
+        return
+
+    if backend == "postgres":
+        try:
+            import asyncpg  # noqa: F401
+        except ImportError:
+            raise ImportError("database.backend is set to 'postgres' but asyncpg is not installed.\nInstall it with:\n    uv sync --extra postgres\nOr switch to backend: sqlite in config.yaml for single-node deployment.") from None
+
+    if backend == "sqlite":
+        import os
+
+        os.makedirs(sqlite_dir or ".", exist_ok=True)
+        _engine = create_async_engine(url, echo=echo, json_serializer=_json_serializer)
+    elif backend == "postgres":
+        _engine = create_async_engine(
+            url,
+            echo=echo,
+            pool_size=pool_size,
+            pool_pre_ping=True,
+            json_serializer=_json_serializer,
+        )
+    else:
+        raise ValueError(f"Unknown persistence backend: {backend!r}")
+
+    _session_factory = async_sessionmaker(_engine, expire_on_commit=False)
+
+    # Auto-create tables (dev convenience). Production should use Alembic.
+    from deerflow.persistence.base import Base
+
+    # Import all models so Base.metadata discovers them.
+    # When no models exist yet (scaffolding phase), this is a no-op.
+    try:
+        import deerflow.persistence.models  # noqa: F401
+    except ImportError:
+        # Models package not yet available — tables won't be auto-created.
+        # This is expected during initial scaffolding or minimal installs.
+        logger.debug("deerflow.persistence.models not found; skipping auto-create tables")
+
+    try:
+        async with _engine.begin() as conn:
+            await conn.run_sync(Base.metadata.create_all)
+    except Exception as exc:
+        if backend == "postgres" and "does not exist" in str(exc):
+            # Database not yet created — attempt to auto-create it, then retry.
+            await _auto_create_postgres_db(url)
+            # Rebuild engine against the now-existing database
+            await _engine.dispose()
+            _engine = create_async_engine(url, echo=echo, pool_size=pool_size, pool_pre_ping=True, json_serializer=_json_serializer)
+            _session_factory = async_sessionmaker(_engine, expire_on_commit=False)
+            async with _engine.begin() as conn:
+                await conn.run_sync(Base.metadata.create_all)
+        else:
+            raise
+
+    logger.info("Persistence engine initialized: backend=%s", backend)
+
+
+async def init_engine_from_config(config) -> None:
+    """Convenience: init engine from a DatabaseConfig object."""
+    if config.backend == "memory":
+        await init_engine("memory")
+        return
+    await init_engine(
+        backend=config.backend,
+        url=config.app_sqlalchemy_url,
+        echo=config.echo_sql,
+        pool_size=config.pool_size,
+        sqlite_dir=config.sqlite_dir if config.backend == "sqlite" else "",
+    )
+
+
+def get_session_factory() -> async_sessionmaker[AsyncSession] | None:
+    """Return the async session factory, or None if backend=memory."""
+    return _session_factory
+
+
+def get_engine() -> AsyncEngine | None:
+    """Return the async engine, or None if not initialized."""
+    return _engine
+
+
+async def close_engine() -> None:
+    """Dispose the engine and release all pooled connections."""
+    global _engine, _session_factory
+    if _engine is not None:
+        await _engine.dispose()
+        logger.info("Persistence engine closed")
+    _engine = None
+    _session_factory = None
@@ -0,0 +1,6 @@
+"""Feedback persistence — ORM and SQL repository."""
+
+from deerflow.persistence.feedback.model import FeedbackRow
+from deerflow.persistence.feedback.sql import FeedbackRepository
+
+__all__ = ["FeedbackRepository", "FeedbackRow"]
@@ -0,0 +1,30 @@
+"""ORM model for user feedback on runs."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+
+from sqlalchemy import DateTime, String, Text
+from sqlalchemy.orm import Mapped, mapped_column
+
+from deerflow.persistence.base import Base
+
+
+class FeedbackRow(Base):
+    __tablename__ = "feedback"
+
+    feedback_id: Mapped[str] = mapped_column(String(64), primary_key=True)
+    run_id: Mapped[str] = mapped_column(String(64), nullable=False, index=True)
+    thread_id: Mapped[str] = mapped_column(String(64), nullable=False, index=True)
+    owner_id: Mapped[str | None] = mapped_column(String(64), index=True)
+    message_id: Mapped[str | None] = mapped_column(String(64))
+    # message_id is an optional RunEventStore event identifier —
+    # allows feedback to target a specific message or the entire run
+
+    rating: Mapped[int] = mapped_column(nullable=False)
+    # +1 (thumbs-up) or -1 (thumbs-down)
+
+    comment: Mapped[str | None] = mapped_column(Text)
+    # Optional text feedback from the user
+
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
@@ -0,0 +1,98 @@
+"""SQLAlchemy-backed feedback storage.
+
+Each method acquires its own short-lived session.
+"""
+
+from __future__ import annotations
+
+import uuid
+from datetime import UTC, datetime
+
+from sqlalchemy import case, func, select
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
+
+from deerflow.persistence.feedback.model import FeedbackRow
+
+
+class FeedbackRepository:
+    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
+        self._sf = session_factory
+
+    @staticmethod
+    def _row_to_dict(row: FeedbackRow) -> dict:
+        d = row.to_dict()
+        val = d.get("created_at")
+        if isinstance(val, datetime):
+            d["created_at"] = val.isoformat()
+        return d
+
+    async def create(
+        self,
+        *,
+        run_id: str,
+        thread_id: str,
+        rating: int,
+        owner_id: str | None = None,
+        message_id: str | None = None,
+        comment: str | None = None,
+    ) -> dict:
+        """Create a feedback record. rating must be +1 or -1."""
+        if rating not in (1, -1):
+            raise ValueError(f"rating must be +1 or -1, got {rating}")
+        row = FeedbackRow(
+            feedback_id=str(uuid.uuid4()),
+            run_id=run_id,
+            thread_id=thread_id,
+            owner_id=owner_id,
+            message_id=message_id,
+            rating=rating,
+            comment=comment,
+            created_at=datetime.now(UTC),
+        )
+        async with self._sf() as session:
+            session.add(row)
+            await session.commit()
+            await session.refresh(row)
+            return self._row_to_dict(row)
+
+    async def get(self, feedback_id: str) -> dict | None:
+        async with self._sf() as session:
+            row = await session.get(FeedbackRow, feedback_id)
+            return self._row_to_dict(row) if row else None
+
+    async def list_by_run(self, thread_id: str, run_id: str, *, limit: int = 100) -> list[dict]:
+        stmt = select(FeedbackRow).where(FeedbackRow.thread_id == thread_id, FeedbackRow.run_id == run_id).order_by(FeedbackRow.created_at.asc()).limit(limit)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def list_by_thread(self, thread_id: str, *, limit: int = 100) -> list[dict]:
+        stmt = select(FeedbackRow).where(FeedbackRow.thread_id == thread_id).order_by(FeedbackRow.created_at.asc()).limit(limit)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def delete(self, feedback_id: str) -> bool:
+        async with self._sf() as session:
+            row = await session.get(FeedbackRow, feedback_id)
+            if row is None:
+                return False
+            await session.delete(row)
+            await session.commit()
+            return True
+
+    async def aggregate_by_run(self, thread_id: str, run_id: str) -> dict:
+        """Aggregate feedback stats for a run using database-side counting."""
+        stmt = select(
+            func.count().label("total"),
+            func.coalesce(func.sum(case((FeedbackRow.rating == 1, 1), else_=0)), 0).label("positive"),
+            func.coalesce(func.sum(case((FeedbackRow.rating == -1, 1), else_=0)), 0).label("negative"),
+        ).where(FeedbackRow.thread_id == thread_id, FeedbackRow.run_id == run_id)
+        async with self._sf() as session:
+            row = (await session.execute(stmt)).one()
+            return {
+                "run_id": run_id,
+                "total": row.total,
+                "positive": row.positive,
+                "negative": row.negative,
+            }
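
Reviewer note: `aggregate_by_run` counts on the database side with a conditional sum. The sketch below shows roughly the SQL that construct corresponds to, run with the standard-library `sqlite3` module against an in-memory table. The exact statement SQLAlchemy emits may differ, and the trimmed-down table and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedback ("
    "feedback_id TEXT PRIMARY KEY, thread_id TEXT NOT NULL, "
    "run_id TEXT NOT NULL, rating INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?, ?, ?)",
    [
        ("f1", "t1", "r1", 1),
        ("f2", "t1", "r1", -1),
        ("f3", "t1", "r1", 1),
        ("f4", "t1", "r2", -1),
    ],
)

# COALESCE covers the empty case: SUM over zero rows yields NULL, not 0.
AGGREGATE_SQL = """
SELECT COUNT(*) AS total,
       COALESCE(SUM(CASE WHEN rating = 1 THEN 1 ELSE 0 END), 0) AS positive,
       COALESCE(SUM(CASE WHEN rating = -1 THEN 1 ELSE 0 END), 0) AS negative
FROM feedback
WHERE thread_id = ? AND run_id = ?
"""


def aggregate(thread_id: str, run_id: str) -> dict:
    total, positive, negative = conn.execute(AGGREGATE_SQL, (thread_id, run_id)).fetchone()
    return {"run_id": run_id, "total": total, "positive": positive, "negative": negative}


stats = aggregate("t1", "r1")
empty = aggregate("t1", "missing")
print(stats)
print(empty)
```

The `COALESCE` wrapper is what keeps `positive` and `negative` at 0 rather than `None` for a run with no feedback rows.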
@@ -0,0 +1,38 @@
+[alembic]
+script_location = %(here)s
+# Default URL for offline mode / autogenerate.
+# Runtime uses engine from DeerFlow config.
+sqlalchemy.url = sqlite+aiosqlite:///./data/app.db
+
+[loggers]
+keys = root,sqlalchemy,alembic
+
+[handlers]
+keys = console
+
+[formatters]
+keys = generic
+
+[logger_root]
+level = WARN
+handlers = console
+
+[logger_sqlalchemy]
+level = WARN
+handlers =
+qualname = sqlalchemy.engine
+
+[logger_alembic]
+level = INFO
+handlers =
+qualname = alembic
+
+[handler_console]
+class = StreamHandler
+args = (sys.stderr,)
+level = NOTSET
+formatter = generic
+
+[formatter_generic]
+format = %(levelname)-5.5s [%(name)s] %(message)s
+datefmt = %H:%M:%S
@@ -0,0 +1,65 @@
+"""Alembic environment for DeerFlow application tables.
+
+ONLY manages DeerFlow's tables (runs, threads_meta, cron_jobs, users).
+LangGraph's checkpointer tables are managed by LangGraph itself -- they
+have their own schema lifecycle and must not be touched by Alembic.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+from logging.config import fileConfig
+
+from alembic import context
+from sqlalchemy.ext.asyncio import create_async_engine
+
+from deerflow.persistence.base import Base
+
+# Import all models so metadata is populated.
+try:
+    import deerflow.persistence.models  # noqa: F401 — register ORM models with Base.metadata
+except ImportError:
+    # Models not available — migration will work with existing metadata only.
+    logging.getLogger(__name__).warning("Could not import deerflow.persistence.models; Alembic may not detect all tables")
+
+config = context.config
+if config.config_file_name is not None:
+    fileConfig(config.config_file_name)
+
+target_metadata = Base.metadata
+
+
+def run_migrations_offline() -> None:
+    url = config.get_main_option("sqlalchemy.url")
+    context.configure(
+        url=url,
+        target_metadata=target_metadata,
+        literal_binds=True,
+        render_as_batch=True,
+    )
+    with context.begin_transaction():
+        context.run_migrations()
+
+
+def do_run_migrations(connection):
+    context.configure(
+        connection=connection,
+        target_metadata=target_metadata,
+        render_as_batch=True,  # Required for SQLite ALTER TABLE support
+    )
+    with context.begin_transaction():
+        context.run_migrations()
+
+
+async def run_migrations_online() -> None:
+    connectable = create_async_engine(config.get_main_option("sqlalchemy.url"))
+    async with connectable.connect() as connection:
+        await connection.run_sync(do_run_migrations)
+    await connectable.dispose()
+
+
+if context.is_offline_mode():
+    run_migrations_offline()
+else:
+    asyncio.run(run_migrations_online())
@@ -0,0 +1,21 @@
+"""ORM model registration entry point.
+
+Importing this module ensures all ORM models are registered with
+``Base.metadata`` so Alembic autogenerate detects every table.
+
+The actual ORM classes have moved to entity-specific subpackages:
+- ``deerflow.persistence.thread_meta``
+- ``deerflow.persistence.run``
+- ``deerflow.persistence.feedback``
+
+``RunEventRow`` remains in ``deerflow.persistence.models.run_event`` because
+its storage implementation lives in ``deerflow.runtime.events.store.db`` and
+there is no matching entity directory.
+"""
+
+from deerflow.persistence.feedback.model import FeedbackRow
+from deerflow.persistence.models.run_event import RunEventRow
+from deerflow.persistence.run.model import RunRow
+from deerflow.persistence.thread_meta.model import ThreadMetaRow
+
+__all__ = ["FeedbackRow", "RunEventRow", "RunRow", "ThreadMetaRow"]
@@ -0,0 +1,31 @@
+"""ORM model for run events."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+
+from sqlalchemy import JSON, DateTime, Index, String, Text, UniqueConstraint
+from sqlalchemy.orm import Mapped, mapped_column
+
+from deerflow.persistence.base import Base
+
+
+class RunEventRow(Base):
+    __tablename__ = "run_events"
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    thread_id: Mapped[str] = mapped_column(String(64), nullable=False)
+    run_id: Mapped[str] = mapped_column(String(64), nullable=False)
+    event_type: Mapped[str] = mapped_column(String(32), nullable=False)
+    category: Mapped[str] = mapped_column(String(16), nullable=False)
+    # "message" | "trace" | "lifecycle"
+    content: Mapped[str] = mapped_column(Text, default="")
+    event_metadata: Mapped[dict] = mapped_column(JSON, default=dict)
+    seq: Mapped[int] = mapped_column(nullable=False)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
+
+    __table_args__ = (
+        UniqueConstraint("thread_id", "seq", name="uq_events_thread_seq"),
+        Index("ix_events_thread_cat_seq", "thread_id", "category", "seq"),
+        Index("ix_events_run", "thread_id", "run_id", "seq"),
+    )
@@ -0,0 +1,6 @@
+"""Run metadata persistence — ORM and SQL repository."""
+
+from deerflow.persistence.run.model import RunRow
+from deerflow.persistence.run.sql import RunRepository
+
+__all__ = ["RunRepository", "RunRow"]
@@ -0,0 +1,49 @@
+"""ORM model for run metadata."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+
+from sqlalchemy import JSON, DateTime, Index, String, Text
+from sqlalchemy.orm import Mapped, mapped_column
+
+from deerflow.persistence.base import Base
+
+
+class RunRow(Base):
+    __tablename__ = "runs"
+
+    run_id: Mapped[str] = mapped_column(String(64), primary_key=True)
+    thread_id: Mapped[str] = mapped_column(String(64), nullable=False, index=True)
+    assistant_id: Mapped[str | None] = mapped_column(String(128))
+    owner_id: Mapped[str | None] = mapped_column(String(64), index=True)
+    status: Mapped[str] = mapped_column(String(20), default="pending")
+    # "pending" | "running" | "success" | "error" | "timeout" | "interrupted"
+
+    model_name: Mapped[str | None] = mapped_column(String(128))
+    multitask_strategy: Mapped[str] = mapped_column(String(20), default="reject")
+    metadata_json: Mapped[dict] = mapped_column(JSON, default=dict)
+    kwargs_json: Mapped[dict] = mapped_column(JSON, default=dict)
+    error: Mapped[str | None] = mapped_column(Text)
+
+    # Convenience fields (for listing pages without querying RunEventStore)
+    message_count: Mapped[int] = mapped_column(default=0)
+    first_human_message: Mapped[str | None] = mapped_column(Text)
+    last_ai_message: Mapped[str | None] = mapped_column(Text)
+
+    # Token usage (accumulated in-memory by RunJournal, written on run completion)
+    total_input_tokens: Mapped[int] = mapped_column(default=0)
+    total_output_tokens: Mapped[int] = mapped_column(default=0)
+    total_tokens: Mapped[int] = mapped_column(default=0)
+    llm_call_count: Mapped[int] = mapped_column(default=0)
+    lead_agent_tokens: Mapped[int] = mapped_column(default=0)
+    subagent_tokens: Mapped[int] = mapped_column(default=0)
+    middleware_tokens: Mapped[int] = mapped_column(default=0)
+
+    # Follow-up association
+    follow_up_to_run_id: Mapped[str | None] = mapped_column(String(64))
+
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC), onupdate=lambda: datetime.now(UTC))
+
+    __table_args__ = (Index("ix_runs_thread_status", "thread_id", "status"),)
@@ -0,0 +1,227 @@
+"""SQLAlchemy-backed RunStore implementation.
+
+Each method acquires and releases its own short-lived session.
+Run status updates come from background workers that may run for
+minutes, so connections are never held across a long execution.
+"""
+
+from __future__ import annotations
+
+import json
+from datetime import UTC, datetime
+from typing import Any
+
+from sqlalchemy import func, select, update
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
+
+from deerflow.persistence.run.model import RunRow
+from deerflow.runtime.runs.store.base import RunStore
+
+
+class RunRepository(RunStore):
+    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
+        self._sf = session_factory
+
+    @staticmethod
+    def _safe_json(obj: Any) -> Any:
+        """Ensure obj is JSON-serializable. Falls back to model_dump(), dict(), or str()."""
+        if obj is None:
+            return None
+        if isinstance(obj, (str, int, float, bool)):
+            return obj
+        if isinstance(obj, dict):
+            return {k: RunRepository._safe_json(v) for k, v in obj.items()}
+        if isinstance(obj, (list, tuple)):
+            return [RunRepository._safe_json(v) for v in obj]
+        if hasattr(obj, "model_dump"):
+            try:
+                return obj.model_dump()
+            except Exception:
+                pass
+        if hasattr(obj, "dict"):
+            try:
+                return obj.dict()
+            except Exception:
+                pass
+        try:
+            json.dumps(obj)
+            return obj
+        except (TypeError, ValueError):
+            return str(obj)
+
+    @staticmethod
+    def _row_to_dict(row: RunRow) -> dict[str, Any]:
+        d = row.to_dict()
+        # Remap JSON columns to match RunStore interface
+        d["metadata"] = d.pop("metadata_json", {})
+        d["kwargs"] = d.pop("kwargs_json", {})
+        # Convert datetime to ISO string for consistency with MemoryRunStore
+        for key in ("created_at", "updated_at"):
+            val = d.get(key)
+            if isinstance(val, datetime):
+                d[key] = val.isoformat()
+        return d
+
+    async def put(
+        self,
+        run_id,
+        *,
+        thread_id,
+        assistant_id=None,
+        owner_id=None,
+        status="pending",
+        multitask_strategy="reject",
+        metadata=None,
+        kwargs=None,
+        error=None,
+        created_at=None,
+        follow_up_to_run_id=None,
+    ):
+        now = datetime.now(UTC)
+        row = RunRow(
+            run_id=run_id,
+            thread_id=thread_id,
+            assistant_id=assistant_id,
+            owner_id=owner_id,
+            status=status,
+            multitask_strategy=multitask_strategy,
+            metadata_json=self._safe_json(metadata) or {},
+            kwargs_json=self._safe_json(kwargs) or {},
+            error=error,
+            follow_up_to_run_id=follow_up_to_run_id,
+            created_at=datetime.fromisoformat(created_at) if created_at else now,
+            updated_at=now,
+        )
+        async with self._sf() as session:
+            session.add(row)
+            await session.commit()
+
+    async def get(self, run_id):
+        async with self._sf() as session:
+            row = await session.get(RunRow, run_id)
+            return self._row_to_dict(row) if row else None
+
+    async def list_by_thread(self, thread_id, *, owner_id=None, limit=100):
+        stmt = select(RunRow).where(RunRow.thread_id == thread_id)
+        if owner_id is not None:
+            stmt = stmt.where(RunRow.owner_id == owner_id)
+        stmt = stmt.order_by(RunRow.created_at.desc()).limit(limit)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def update_status(self, run_id, status, *, error=None):
+        values: dict[str, Any] = {"status": status, "updated_at": datetime.now(UTC)}
+        if error is not None:
+            values["error"] = error
+        async with self._sf() as session:
+            await session.execute(update(RunRow).where(RunRow.run_id == run_id).values(**values))
+            await session.commit()
+
+    async def delete(self, run_id):
+        async with self._sf() as session:
+            row = await session.get(RunRow, run_id)
+            if row is not None:
+                await session.delete(row)
+                await session.commit()
+
+    async def list_pending(self, *, before=None):
+        if before is None:
+            before_dt = datetime.now(UTC)
+        elif isinstance(before, datetime):
+            before_dt = before
+        else:
+            before_dt = datetime.fromisoformat(before)
+        stmt = select(RunRow).where(RunRow.status == "pending", RunRow.created_at <= before_dt).order_by(RunRow.created_at.asc())
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def update_run_completion(
+        self,
+        run_id: str,
+        *,
+        status: str,
+        total_input_tokens: int = 0,
+        total_output_tokens: int = 0,
+        total_tokens: int = 0,
+        llm_call_count: int = 0,
+        lead_agent_tokens: int = 0,
+        subagent_tokens: int = 0,
+        middleware_tokens: int = 0,
+        message_count: int = 0,
+        last_ai_message: str | None = None,
+        first_human_message: str | None = None,
+        error: str | None = None,
+    ) -> None:
+        """Update status + token usage + convenience fields on run completion."""
+        values: dict[str, Any] = {
+            "status": status,
+            "total_input_tokens": total_input_tokens,
+            "total_output_tokens": total_output_tokens,
+            "total_tokens": total_tokens,
+            "llm_call_count": llm_call_count,
+            "lead_agent_tokens": lead_agent_tokens,
+            "subagent_tokens": subagent_tokens,
+            "middleware_tokens": middleware_tokens,
+            "message_count": message_count,
+            "updated_at": datetime.now(UTC),
+        }
+        if last_ai_message is not None:
+            values["last_ai_message"] = last_ai_message[:2000]
+        if first_human_message is not None:
+            values["first_human_message"] = first_human_message[:2000]
+        if error is not None:
+            values["error"] = error
+        async with self._sf() as session:
+            await session.execute(update(RunRow).where(RunRow.run_id == run_id).values(**values))
+            await session.commit()
+
+    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
+        """Aggregate token usage via a single SQL GROUP BY query."""
+        _completed = RunRow.status.in_(("success", "error"))
+        _thread = RunRow.thread_id == thread_id
+
+        stmt = (
+            select(
+                func.coalesce(RunRow.model_name, "unknown").label("model"),
+                func.count().label("runs"),
+                func.coalesce(func.sum(RunRow.total_tokens), 0).label("total_tokens"),
+                func.coalesce(func.sum(RunRow.total_input_tokens), 0).label("total_input_tokens"),
+                func.coalesce(func.sum(RunRow.total_output_tokens), 0).label("total_output_tokens"),
+                func.coalesce(func.sum(RunRow.lead_agent_tokens), 0).label("lead_agent"),
+                func.coalesce(func.sum(RunRow.subagent_tokens), 0).label("subagent"),
+                func.coalesce(func.sum(RunRow.middleware_tokens), 0).label("middleware"),
+            )
+            .where(_thread, _completed)
+            .group_by(func.coalesce(RunRow.model_name, "unknown"))
+        )
+
+        async with self._sf() as session:
+            rows = (await session.execute(stmt)).all()
+
+        total_tokens = total_input = total_output = total_runs = 0
+        lead_agent = subagent = middleware = 0
+        by_model: dict[str, dict] = {}
+        for r in rows:
+            by_model[r.model] = {"tokens": r.total_tokens, "runs": r.runs}
+            total_tokens += r.total_tokens
+            total_input += r.total_input_tokens
+            total_output += r.total_output_tokens
+            total_runs += r.runs
+            lead_agent += r.lead_agent
+            subagent += r.subagent
+            middleware += r.middleware
+
+        return {
+            "total_tokens": total_tokens,
+            "total_input_tokens": total_input,
+            "total_output_tokens": total_output,
+            "total_runs": total_runs,
+            "by_model": by_model,
+            "by_caller": {
+                "lead_agent": lead_agent,
+                "subagent": subagent,
+                "middleware": middleware,
+            },
+        }
@@ -0,0 +1,13 @@
+"""Thread metadata persistence — ORM, abstract store, and concrete implementations."""
+
+from deerflow.persistence.thread_meta.base import ThreadMetaStore
+from deerflow.persistence.thread_meta.memory import MemoryThreadMetaStore
+from deerflow.persistence.thread_meta.model import ThreadMetaRow
+from deerflow.persistence.thread_meta.sql import ThreadMetaRepository
+
+__all__ = [
+    "MemoryThreadMetaStore",
+    "ThreadMetaRepository",
+    "ThreadMetaRow",
+    "ThreadMetaStore",
+]
@@ -0,0 +1,60 @@
+"""Abstract interface for thread metadata storage.
+
+Implementations:
+- ThreadMetaRepository: SQL-backed (sqlite / postgres via SQLAlchemy)
+- MemoryThreadMetaStore: wraps LangGraph BaseStore (memory mode)
+"""
+
+from __future__ import annotations
+
+import abc
+
+
+class ThreadMetaStore(abc.ABC):
+    @abc.abstractmethod
+    async def create(
+        self,
+        thread_id: str,
+        *,
+        assistant_id: str | None = None,
+        owner_id: str | None = None,
+        display_name: str | None = None,
+        metadata: dict | None = None,
+    ) -> dict:
+        pass
+
+    @abc.abstractmethod
+    async def get(self, thread_id: str) -> dict | None:
+        pass
+
+    @abc.abstractmethod
+    async def search(
+        self,
+        *,
+        metadata: dict | None = None,
+        status: str | None = None,
+        limit: int = 100,
+        offset: int = 0,
+    ) -> list[dict]:
+        pass
+
+    @abc.abstractmethod
+    async def update_display_name(self, thread_id: str, display_name: str) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def update_status(self, thread_id: str, status: str) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def update_metadata(self, thread_id: str, metadata: dict) -> None:
+        """Merge ``metadata`` into the thread's metadata field.
+
+        Existing keys are overwritten by the new values; keys absent from
+        ``metadata`` are preserved. No-op if the thread does not exist.
+        """
+        pass
+
+    @abc.abstractmethod
+    async def delete(self, thread_id: str) -> None:
+        pass
@@ -0,0 +1,120 @@
+"""In-memory ThreadMetaStore backed by LangGraph BaseStore.
+
+Used when database.backend=memory. Delegates to the LangGraph Store's
+``("threads",)`` namespace, which the Gateway router used for thread
+records before it was routed through ThreadMetaStore.
+"""
+
+from __future__ import annotations
+
+import time
+from typing import Any
+
+from langgraph.store.base import BaseStore
+
+from deerflow.persistence.thread_meta.base import ThreadMetaStore
+
+THREADS_NS: tuple[str, ...] = ("threads",)
+
+
+class MemoryThreadMetaStore(ThreadMetaStore):
+    def __init__(self, store: BaseStore) -> None:
+        self._store = store
+
+    async def create(
+        self,
+        thread_id: str,
+        *,
+        assistant_id: str | None = None,
+        owner_id: str | None = None,
+        display_name: str | None = None,
+        metadata: dict | None = None,
+    ) -> dict:
+        now = time.time()
+        record: dict[str, Any] = {
+            "thread_id": thread_id,
+            "assistant_id": assistant_id,
+            "owner_id": owner_id,
+            "display_name": display_name,
+            "status": "idle",
+            "metadata": metadata or {},
+            "values": {},
+            "created_at": now,
+            "updated_at": now,
+        }
+        await self._store.aput(THREADS_NS, thread_id, record)
+        return record
+
+    async def get(self, thread_id: str) -> dict | None:
+        item = await self._store.aget(THREADS_NS, thread_id)
+        return item.value if item is not None else None
+
+    async def search(
+        self,
+        *,
+        metadata: dict | None = None,
+        status: str | None = None,
+        limit: int = 100,
+        offset: int = 0,
+    ) -> list[dict]:
+        filter_dict: dict[str, Any] = {}
+        if metadata:
+            filter_dict.update(metadata)
+        if status:
+            filter_dict["status"] = status
+
+        items = await self._store.asearch(
+            THREADS_NS,
+            filter=filter_dict or None,
+            limit=limit,
+            offset=offset,
+        )
+        return [self._item_to_dict(item) for item in items]
+
+    async def update_display_name(self, thread_id: str, display_name: str) -> None:
+        item = await self._store.aget(THREADS_NS, thread_id)
+        if item is None:
+            return
+        record = dict(item.value)
+        record["display_name"] = display_name
+        record["updated_at"] = time.time()
+        await self._store.aput(THREADS_NS, thread_id, record)
+
+    async def update_status(self, thread_id: str, status: str) -> None:
+        item = await self._store.aget(THREADS_NS, thread_id)
+        if item is None:
+            return
+        record = dict(item.value)
+        record["status"] = status
+        record["updated_at"] = time.time()
+        await self._store.aput(THREADS_NS, thread_id, record)
+
+    async def update_metadata(self, thread_id: str, metadata: dict) -> None:
+        """Merge ``metadata`` into the in-memory record. No-op if absent."""
+        item = await self._store.aget(THREADS_NS, thread_id)
+        if item is None:
+            return
+        record = dict(item.value)
+        merged = dict(record.get("metadata") or {})
+        merged.update(metadata)
+        record["metadata"] = merged
+        record["updated_at"] = time.time()
+        await self._store.aput(THREADS_NS, thread_id, record)
+
+    async def delete(self, thread_id: str) -> None:
+        await self._store.adelete(THREADS_NS, thread_id)
+
+    @staticmethod
+    def _item_to_dict(item) -> dict[str, Any]:
+        """Convert a Store SearchItem to the dict format expected by callers."""
+        val = item.value
+        return {
+            "thread_id": item.key,
+            "assistant_id": val.get("assistant_id"),
+            "owner_id": val.get("owner_id"),
+            "display_name": val.get("display_name"),
+            "status": val.get("status", "idle"),
+            "metadata": val.get("metadata", {}),
+            "created_at": str(val.get("created_at", "")),
+            "updated_at": str(val.get("updated_at", "")),
+        }
@@ -0,0 +1,23 @@
+"""ORM model for thread metadata."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+
+from sqlalchemy import JSON, DateTime, String
+from sqlalchemy.orm import Mapped, mapped_column
+
+from deerflow.persistence.base import Base
+
+
+class ThreadMetaRow(Base):
+    __tablename__ = "threads_meta"
+
+    thread_id: Mapped[str] = mapped_column(String(64), primary_key=True)
+    assistant_id: Mapped[str | None] = mapped_column(String(128), index=True)
+    owner_id: Mapped[str | None] = mapped_column(String(64), index=True)
+    display_name: Mapped[str | None] = mapped_column(String(256))
+    status: Mapped[str] = mapped_column(String(20), default="idle")
+    metadata_json: Mapped[dict] = mapped_column(JSON, default=dict)
+    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC))
+    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(UTC), onupdate=lambda: datetime.now(UTC))
@@ -0,0 +1,140 @@
+"""SQLAlchemy-backed thread metadata repository."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+from typing import Any
+
+from sqlalchemy import select, update
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
+
+from deerflow.persistence.thread_meta.base import ThreadMetaStore
+from deerflow.persistence.thread_meta.model import ThreadMetaRow
+
+
+class ThreadMetaRepository(ThreadMetaStore):
+    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
+        self._sf = session_factory
+
+    @staticmethod
+    def _row_to_dict(row: ThreadMetaRow) -> dict[str, Any]:
+        d = row.to_dict()
+        d["metadata"] = d.pop("metadata_json", {})
+        for key in ("created_at", "updated_at"):
+            val = d.get(key)
+            if isinstance(val, datetime):
+                d[key] = val.isoformat()
+        return d
+
+    async def create(
+        self,
+        thread_id: str,
+        *,
+        assistant_id: str | None = None,
+        owner_id: str | None = None,
+        display_name: str | None = None,
+        metadata: dict | None = None,
+    ) -> dict:
+        now = datetime.now(UTC)
+        row = ThreadMetaRow(
+            thread_id=thread_id,
+            assistant_id=assistant_id,
+            owner_id=owner_id,
+            display_name=display_name,
+            metadata_json=metadata or {},
+            created_at=now,
+            updated_at=now,
+        )
+        async with self._sf() as session:
+            session.add(row)
+            await session.commit()
+            await session.refresh(row)
+            return self._row_to_dict(row)
+
+    async def get(self, thread_id: str) -> dict | None:
+        async with self._sf() as session:
+            row = await session.get(ThreadMetaRow, thread_id)
+            return self._row_to_dict(row) if row else None
+
+    async def list_by_owner(self, owner_id: str, *, limit: int = 100, offset: int = 0) -> list[dict]:
+        stmt = select(ThreadMetaRow).where(ThreadMetaRow.owner_id == owner_id).order_by(ThreadMetaRow.updated_at.desc()).limit(limit).offset(offset)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def check_access(self, thread_id: str, owner_id: str) -> bool:
+        """Check if owner_id has access to thread_id.
+
+        Returns True if: row doesn't exist (untracked thread), owner_id
+        is None on the row (shared thread), or owner_id matches.
+        """
+        async with self._sf() as session:
+            row = await session.get(ThreadMetaRow, thread_id)
+            if row is None:
+                return True
+            if row.owner_id is None:
+                return True
+            return row.owner_id == owner_id
+
+    async def search(
+        self,
+        *,
+        metadata: dict | None = None,
+        status: str | None = None,
+        limit: int = 100,
+        offset: int = 0,
+    ) -> list[dict]:
+        """Search threads with optional metadata and status filters."""
+        stmt = select(ThreadMetaRow).order_by(ThreadMetaRow.updated_at.desc())
+        if status:
+            stmt = stmt.where(ThreadMetaRow.status == status)
+
+        if metadata:
+            # With a metadata filter, over-fetch (limit * 5 + offset rows) and
+            # filter in Python; sparse matches can yield a short page. TODO(Phase 2):
+            # use JSON DB operators (Postgres @>, SQLite json_extract) server-side.
+            stmt = stmt.limit(limit * 5 + offset)
+            async with self._sf() as session:
+                result = await session.execute(stmt)
+                rows = [self._row_to_dict(r) for r in result.scalars()]
+            rows = [r for r in rows if all(r.get("metadata", {}).get(k) == v for k, v in metadata.items())]
+            return rows[offset : offset + limit]
+        else:
+            stmt = stmt.limit(limit).offset(offset)
+            async with self._sf() as session:
+                result = await session.execute(stmt)
+                return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def update_display_name(self, thread_id: str, display_name: str) -> None:
+        """Update the display_name (title) for a thread."""
+        async with self._sf() as session:
+            await session.execute(update(ThreadMetaRow).where(ThreadMetaRow.thread_id == thread_id).values(display_name=display_name, updated_at=datetime.now(UTC)))
+            await session.commit()
+
+    async def update_status(self, thread_id: str, status: str) -> None:
+        async with self._sf() as session:
+            await session.execute(update(ThreadMetaRow).where(ThreadMetaRow.thread_id == thread_id).values(status=status, updated_at=datetime.now(UTC)))
+            await session.commit()
+
+    async def update_metadata(self, thread_id: str, metadata: dict) -> None:
+        """Merge ``metadata`` into ``metadata_json``.
+
+        Read-modify-write inside a single session/transaction. No row lock is
+        taken, so concurrent writers may race. No-op if the row does not exist.
+        """
+        async with self._sf() as session:
+            row = await session.get(ThreadMetaRow, thread_id)
+            if row is None:
+                return
+            merged = dict(row.metadata_json or {})
+            merged.update(metadata)
+            row.metadata_json = merged
+            row.updated_at = datetime.now(UTC)
+            await session.commit()
+
+    async def delete(self, thread_id: str) -> None:
+        async with self._sf() as session:
+            row = await session.get(ThreadMetaRow, thread_id)
+            if row is not None:
+                await session.delete(row)
+                await session.commit()
@@ -5,7 +5,7 @@ Re-exports the public API of :mod:`~deerflow.runtime.runs` and
 directly from ``deerflow.runtime``.
 """

-from .runs import ConflictError, DisconnectMode, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
+from .runs import ConflictError, DisconnectMode, RunContext, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
 from .serialization import serialize, serialize_channel_values, serialize_lc_object, serialize_messages_tuple
 from .store import get_store, make_store, reset_store, store_context
 from .stream_bridge import END_SENTINEL, HEARTBEAT_SENTINEL, MemoryStreamBridge, StreamBridge, StreamEvent, make_stream_bridge
@@ -14,6 +14,7 @@ __all__ = [
    # runs
    "ConflictError",
    "DisconnectMode",
+    "RunContext",
    "RunManager",
    "RunRecord",
    "RunStatus",
@@ -0,0 +1,134 @@
+"""Pure functions to convert LangChain message objects to OpenAI Chat Completions format.
+
+Used by RunJournal to build content dicts for event storage.
+"""
+
+from __future__ import annotations
+
+import json
+from typing import Any
+
+_ROLE_MAP = {
+    "human": "user",
+    "ai": "assistant",
+    "system": "system",
+    "tool": "tool",
+}
+
+
+def langchain_to_openai_message(message: Any) -> dict:
+    """Convert a single LangChain BaseMessage to an OpenAI message dict.
+
+    Handles:
+    - HumanMessage → {"role": "user", "content": "..."}
+    - AIMessage (text only) → {"role": "assistant", "content": "..."}
+    - AIMessage (with tool_calls) → {"role": "assistant", "content": null, "tool_calls": [...]}
+    - AIMessage (text + tool_calls) → both content and tool_calls present
+    - AIMessage (list content / multimodal) → content preserved as list
+    - SystemMessage → {"role": "system", "content": "..."}
+    - ToolMessage → {"role": "tool", "tool_call_id": "...", "content": "..."}
+    """
+    msg_type = getattr(message, "type", "")
+    role = _ROLE_MAP.get(msg_type, msg_type)
+    content = getattr(message, "content", "")
+
+    if role == "tool":
+        return {
+            "role": "tool",
+            "tool_call_id": getattr(message, "tool_call_id", ""),
+            "content": content,
+        }
+
+    if role == "assistant":
+        tool_calls = getattr(message, "tool_calls", None) or []
+        result: dict = {"role": "assistant"}
+
+        if tool_calls:
+            openai_tool_calls = []
+            for tc in tool_calls:
+                args = tc.get("args", {})
+                openai_tool_calls.append(
+                    {
+                        "id": tc.get("id", ""),
+                        "type": "function",
+                        "function": {
+                            "name": tc.get("name", ""),
+                            "arguments": json.dumps(args) if not isinstance(args, str) else args,
+                        },
+                    }
+                )
+            # If no text content, set content to null per OpenAI spec
+            result["content"] = content if isinstance(content, (str, list)) and content else None
+            result["tool_calls"] = openai_tool_calls
+        else:
+            result["content"] = content
+
+        return result
+
+    # user / system / unknown
+    return {"role": role, "content": content}
+
+
+def _infer_finish_reason(message: Any) -> str:
+    """Infer OpenAI finish_reason from an AIMessage.
+
+    Returns "tool_calls" if tool_calls present, else looks in
+    response_metadata.finish_reason, else returns "stop".
+    """
+    tool_calls = getattr(message, "tool_calls", None) or []
+    if tool_calls:
+        return "tool_calls"
+    resp_meta = getattr(message, "response_metadata", None) or {}
+    if isinstance(resp_meta, dict):
+        finish = resp_meta.get("finish_reason")
+        if finish:
+            return finish
+    return "stop"
+
+
+def langchain_to_openai_completion(message: Any) -> dict:
+    """Convert an AIMessage and its metadata to an OpenAI completion response dict.
+
+    Returns:
+        {
+            "id": message.id,
+            "model": message.response_metadata.get("model_name"),
+            "choices": [{"index": 0, "message": <openai_message>, "finish_reason": <inferred>}],
+            "usage": {"prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ...} or None,
+        }
+    """
+    resp_meta = getattr(message, "response_metadata", None) or {}
+    model_name = resp_meta.get("model_name") if isinstance(resp_meta, dict) else None
+
+    openai_msg = langchain_to_openai_message(message)
+    finish_reason = _infer_finish_reason(message)
+
+    usage_metadata = getattr(message, "usage_metadata", None)
+    if usage_metadata is not None:
+        input_tokens = usage_metadata.get("input_tokens", 0) or 0
+        output_tokens = usage_metadata.get("output_tokens", 0) or 0
+        usage: dict | None = {
+            "prompt_tokens": input_tokens,
+            "completion_tokens": output_tokens,
+            "total_tokens": input_tokens + output_tokens,
+        }
+    else:
+        usage = None
+
+    return {
+        "id": getattr(message, "id", None),
+        "model": model_name,
+        "choices": [
+            {
+                "index": 0,
+                "message": openai_msg,
+                "finish_reason": finish_reason,
+            }
+        ],
+        "usage": usage,
+    }
+
+
+def langchain_messages_to_openai(messages: list) -> list[dict]:
+    """Convert a list of LangChain BaseMessages to OpenAI message dicts."""
+    return [langchain_to_openai_message(m) for m in messages]
@@ -0,0 +1,4 @@
+from deerflow.runtime.events.store.base import RunEventStore
+from deerflow.runtime.events.store.memory import MemoryRunEventStore
+
+__all__ = ["MemoryRunEventStore", "RunEventStore"]
@@ -0,0 +1,26 @@
+from deerflow.runtime.events.store.base import RunEventStore
+from deerflow.runtime.events.store.memory import MemoryRunEventStore
+
+
+def make_run_event_store(config=None) -> RunEventStore:
+    """Create a RunEventStore based on run_events.backend configuration."""
+    if config is None or config.backend == "memory":
+        return MemoryRunEventStore()
+    if config.backend == "db":
+        from deerflow.persistence.engine import get_session_factory
+
+        sf = get_session_factory()
+        if sf is None:
+            # database.backend=memory but run_events.backend=db -> fallback
+            return MemoryRunEventStore()
+        from deerflow.runtime.events.store.db import DbRunEventStore
+
+        return DbRunEventStore(sf, max_trace_content=config.max_trace_content)
+    if config.backend == "jsonl":
+        from deerflow.runtime.events.store.jsonl import JsonlRunEventStore
+
+        return JsonlRunEventStore()
+    raise ValueError(f"Unknown run_events backend: {config.backend!r}")
+
+
+__all__ = ["MemoryRunEventStore", "RunEventStore", "make_run_event_store"]
@@ -0,0 +1,99 @@
+"""Abstract interface for run event storage.
+
+RunEventStore is the unified storage interface for run event streams.
+Messages (frontend display) and execution traces (debugging/audit) go
+through the same interface, distinguished by the ``category`` field.
+
+Implementations:
+- MemoryRunEventStore: in-memory dict (development, tests)
+- DbRunEventStore (SQLAlchemy ORM) and JsonlRunEventStore (JSONL files)
+"""
+
+from __future__ import annotations
+
+import abc
+
+
+class RunEventStore(abc.ABC):
+    """Run event stream storage interface.
+
+    All implementations must guarantee:
+    1. put() events are retrievable in subsequent queries
+    2. seq is strictly increasing within the same thread
+    3. list_messages() only returns category="message" events
+    4. list_events() returns all events for the specified run
+    5. Returned dicts match the RunEvent field structure
+    """
+
+    @abc.abstractmethod
+    async def put(
+        self,
+        *,
+        thread_id: str,
+        run_id: str,
+        event_type: str,
+        category: str,
+        content: str | dict = "",
+        metadata: dict | None = None,
+        created_at: str | None = None,
+    ) -> dict:
+        """Write an event, auto-assign seq, return the complete record."""
+
+    @abc.abstractmethod
+    async def put_batch(self, events: list[dict]) -> list[dict]:
+        """Batch-write events. Used by RunJournal flush buffer.
+
+        Each dict's keys match put()'s keyword arguments.
+        Returns complete records with seq assigned.
+        """
+
+    @abc.abstractmethod
+    async def list_messages(
+        self,
+        thread_id: str,
+        *,
+        limit: int = 50,
+        before_seq: int | None = None,
+        after_seq: int | None = None,
+    ) -> list[dict]:
+        """Return displayable messages (category=message) for a thread, ordered by seq ascending.
+
+        Supports bidirectional cursor pagination:
+        - before_seq: return the last ``limit`` records with seq < before_seq (ascending)
+        - after_seq: return the first ``limit`` records with seq > after_seq (ascending)
+        - neither: return the latest ``limit`` records (ascending)
+        """
+
+    @abc.abstractmethod
+    async def list_events(
+        self,
+        thread_id: str,
+        run_id: str,
+        *,
+        event_types: list[str] | None = None,
+        limit: int = 500,
+    ) -> list[dict]:
+        """Return the full event stream for a run, ordered by seq ascending.
+
+        Optionally filter by event_types.
+        """
+
+    @abc.abstractmethod
+    async def list_messages_by_run(
+        self,
+        thread_id: str,
+        run_id: str,
+    ) -> list[dict]:
+        """Return displayable messages (category=message) for a specific run, ordered by seq ascending."""
+
+    @abc.abstractmethod
+    async def count_messages(self, thread_id: str) -> int:
+        """Count displayable messages (category=message) in a thread."""
+
+    @abc.abstractmethod
+    async def delete_by_thread(self, thread_id: str) -> int:
+        """Delete all events for a thread. Return the number of deleted events."""
+
+    @abc.abstractmethod
+    async def delete_by_run(self, thread_id: str, run_id: str) -> int:
+        """Delete all events for a specific run. Return the number of deleted events."""
@@ -0,0 +1,185 @@
+"""SQLAlchemy-backed RunEventStore implementation.
+
+Persists events to the ``run_events`` table. Trace content is truncated
+at ``max_trace_content`` bytes to avoid bloating the database.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from datetime import UTC, datetime
+
+from sqlalchemy import delete, func, select
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
+
+from deerflow.persistence.models.run_event import RunEventRow
+from deerflow.runtime.events.store.base import RunEventStore
+
+logger = logging.getLogger(__name__)
+
+
+class DbRunEventStore(RunEventStore):
+    def __init__(self, session_factory: async_sessionmaker[AsyncSession], *, max_trace_content: int = 10240):
+        self._sf = session_factory
+        self._max_trace_content = max_trace_content
+
+    @staticmethod
+    def _row_to_dict(row: RunEventRow) -> dict:
+        d = row.to_dict()
+        d["metadata"] = d.pop("event_metadata", {})
+        val = d.get("created_at")
+        if isinstance(val, datetime):
+            d["created_at"] = val.isoformat()
+        d.pop("id", None)
+        # Restore dict content that was JSON-serialized on write
+        raw = d.get("content", "")
+        if isinstance(raw, str) and d.get("metadata", {}).get("content_is_dict"):
+            try:
+                d["content"] = json.loads(raw)
+            except (json.JSONDecodeError, ValueError):
+                # Content looked like JSON (content_is_dict flag) but failed to parse;
+                # keep the raw string as-is.
+                logger.debug("Failed to deserialize content as JSON for event seq=%s", d.get("seq"))
+        return d
+
+    def _truncate_trace(self, category: str, content: str | dict, metadata: dict | None) -> tuple[str | dict, dict]:
+        if category == "trace":
+            text = json.dumps(content, default=str, ensure_ascii=False) if isinstance(content, dict) else content
+            encoded = text.encode("utf-8")
+            if len(encoded) > self._max_trace_content:
+                # Truncate by bytes, then decode back (may cut a multi-byte char, so use errors="ignore")
+                content = encoded[: self._max_trace_content].decode("utf-8", errors="ignore")
+                metadata = {**(metadata or {}), "content_truncated": True, "original_byte_length": len(encoded)}
+        return content, metadata or {}
+
+    async def put(self, *, thread_id, run_id, event_type, category, content="", metadata=None, created_at=None):  # noqa: D401
+        """Write a single event — low-frequency path only.
+
+        This opens a dedicated transaction with a FOR UPDATE lock to
+        assign a monotonic *seq*.  For high-throughput writes use
+        :meth:`put_batch`, which acquires the lock once for the whole
+        batch.  Currently the only caller is ``worker.run_agent`` for
+        the initial ``human_message`` event (once per run).
+        """
+        content, metadata = self._truncate_trace(category, content, metadata)
+        if isinstance(content, dict):
+            db_content = json.dumps(content, default=str, ensure_ascii=False)
+            metadata = {**(metadata or {}), "content_is_dict": True}
+        else:
+            db_content = content
+        async with self._sf() as session:
+            async with session.begin():
+                # Use FOR UPDATE to serialize seq assignment within a thread.
+                # NOTE: SQLAlchemy omits FOR UPDATE on SQLite, so the lock is a no-op there and the
+                # UNIQUE(thread_id, seq) constraint catches races. PostgreSQL rejects FOR UPDATE on aggregates.
+                max_seq = await session.scalar(select(func.max(RunEventRow.seq)).where(RunEventRow.thread_id == thread_id).with_for_update())
+                seq = (max_seq or 0) + 1
+                row = RunEventRow(
+                    thread_id=thread_id,
+                    run_id=run_id,
+                    event_type=event_type,
+                    category=category,
+                    content=db_content,
+                    event_metadata=metadata,
+                    seq=seq,
+                    created_at=datetime.fromisoformat(created_at) if created_at else datetime.now(UTC),
+                )
+                session.add(row)
+            return self._row_to_dict(row)
+
+    async def put_batch(self, events):
+        if not events:
+            return []
+        async with self._sf() as session:
+            async with session.begin():
+                # Get max seq for the thread (assume all events in batch belong to same thread).
+                # NOTE: SQLAlchemy omits FOR UPDATE on SQLite, so the lock is a no-op there and the
+                # UNIQUE(thread_id, seq) constraint catches races. PostgreSQL rejects FOR UPDATE on aggregates.
+                thread_id = events[0]["thread_id"]
+                max_seq = await session.scalar(select(func.max(RunEventRow.seq)).where(RunEventRow.thread_id == thread_id).with_for_update())
+                seq = max_seq or 0
+                rows = []
+                for e in events:
+                    seq += 1
+                    content = e.get("content", "")
+                    category = e.get("category", "trace")
+                    metadata = e.get("metadata")
+                    content, metadata = self._truncate_trace(category, content, metadata)
+                    if isinstance(content, dict):
+                        db_content = json.dumps(content, default=str, ensure_ascii=False)
+                        metadata = {**(metadata or {}), "content_is_dict": True}
+                    else:
+                        db_content = content
+                    row = RunEventRow(
+                        thread_id=e["thread_id"],
+                        run_id=e["run_id"],
+                        event_type=e["event_type"],
+                        category=category,
+                        content=db_content,
+                        event_metadata=metadata,
+                        seq=seq,
+                        created_at=datetime.fromisoformat(e["created_at"]) if e.get("created_at") else datetime.now(UTC),
+                    )
+                    session.add(row)
+                    rows.append(row)
+            return [self._row_to_dict(r) for r in rows]
+
+    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
+        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.category == "message")
+        if before_seq is not None:
+            stmt = stmt.where(RunEventRow.seq < before_seq)
+        if after_seq is not None:
+            stmt = stmt.where(RunEventRow.seq > after_seq)
+
+        if after_seq is not None:
+            # Forward pagination: first `limit` records after cursor
+            stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
+            async with self._sf() as session:
+                result = await session.execute(stmt)
+                return [self._row_to_dict(r) for r in result.scalars()]
+        else:
+            # before_seq or default (latest): take last `limit` records, return ascending
+            stmt = stmt.order_by(RunEventRow.seq.desc()).limit(limit)
+            async with self._sf() as session:
+                result = await session.execute(stmt)
+                rows = list(result.scalars())
+                return [self._row_to_dict(r) for r in reversed(rows)]
+
+    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
+        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id)
+        if event_types:
+            stmt = stmt.where(RunEventRow.event_type.in_(event_types))
+        stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def list_messages_by_run(self, thread_id, run_id):
+        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id, RunEventRow.category == "message").order_by(RunEventRow.seq.asc())
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def count_messages(self, thread_id):
+        stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.category == "message")
+        async with self._sf() as session:
+            return await session.scalar(stmt) or 0
+
+    async def delete_by_thread(self, thread_id):
+        async with self._sf() as session:
+            count_stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id)
+            count = await session.scalar(count_stmt) or 0
+            if count > 0:
+                await session.execute(delete(RunEventRow).where(RunEventRow.thread_id == thread_id))
+                await session.commit()
+            return count
+
+    async def delete_by_run(self, thread_id, run_id):
+        async with self._sf() as session:
+            count_stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id)
+            count = await session.scalar(count_stmt) or 0
+            if count > 0:
+                await session.execute(delete(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id))
+                await session.commit()
+            return count
@@ -0,0 +1,179 @@
+"""JSONL file-backed RunEventStore implementation.
+
+Each run's events are stored in a single file:
+``.deer-flow/threads/{thread_id}/runs/{run_id}.jsonl``
+
+All categories (message, trace, lifecycle) are in the same file.
+This backend is suitable for lightweight single-node deployments.
+
+Known trade-off: ``list_messages()`` must scan all run files for a
+thread since messages from multiple runs need unified seq ordering.
+``list_events()`` reads only one file -- the fast path.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from datetime import UTC, datetime
+from pathlib import Path
+
+from deerflow.runtime.events.store.base import RunEventStore
+
+logger = logging.getLogger(__name__)
+
+_SAFE_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")
+
+
+class JsonlRunEventStore(RunEventStore):
+    def __init__(self, base_dir: str | Path | None = None):
+        self._base_dir = Path(base_dir) if base_dir else Path(".deer-flow")
+        self._seq_counters: dict[str, int] = {}  # thread_id -> current max seq
+
+    @staticmethod
+    def _validate_id(value: str, label: str) -> str:
+        """Validate that an ID is safe for use in filesystem paths."""
+        if not value or not _SAFE_ID_PATTERN.match(value):
+            raise ValueError(f"Invalid {label}: must be alphanumeric/dash/underscore, got {value!r}")
+        return value
+
+    def _thread_dir(self, thread_id: str) -> Path:
+        self._validate_id(thread_id, "thread_id")
+        return self._base_dir / "threads" / thread_id / "runs"
+
+    def _run_file(self, thread_id: str, run_id: str) -> Path:
+        self._validate_id(run_id, "run_id")
+        return self._thread_dir(thread_id) / f"{run_id}.jsonl"
+
+    def _next_seq(self, thread_id: str) -> int:
+        self._seq_counters[thread_id] = self._seq_counters.get(thread_id, 0) + 1
+        return self._seq_counters[thread_id]
+
+    def _ensure_seq_loaded(self, thread_id: str) -> None:
+        """Load max seq from existing files if not yet cached."""
+        if thread_id in self._seq_counters:
+            return
+        max_seq = 0
+        thread_dir = self._thread_dir(thread_id)
+        if thread_dir.exists():
+            for f in thread_dir.glob("*.jsonl"):
+                for line in f.read_text(encoding="utf-8").strip().splitlines():
+                    try:
+                        record = json.loads(line)
+                        max_seq = max(max_seq, record.get("seq", 0))
+                    except json.JSONDecodeError:
+                        logger.debug("Skipping malformed JSONL line in %s", f)
+                        continue
+        self._seq_counters[thread_id] = max_seq
+
+    def _write_record(self, record: dict) -> None:
+        path = self._run_file(record["thread_id"], record["run_id"])
+        path.parent.mkdir(parents=True, exist_ok=True)
+        with open(path, "a", encoding="utf-8") as f:
+            f.write(json.dumps(record, default=str, ensure_ascii=False) + "\n")
+
+    def _read_thread_events(self, thread_id: str) -> list[dict]:
+        """Read all events for a thread, sorted by seq."""
+        events = []
+        thread_dir = self._thread_dir(thread_id)
+        if not thread_dir.exists():
+            return events
+        for f in sorted(thread_dir.glob("*.jsonl")):
+            for line in f.read_text(encoding="utf-8").strip().splitlines():
+                if not line:
+                    continue
+                try:
+                    events.append(json.loads(line))
+                except json.JSONDecodeError:
+                    logger.debug("Skipping malformed JSONL line in %s", f)
+                    continue
+        events.sort(key=lambda e: e.get("seq", 0))
+        return events
+
+    def _read_run_events(self, thread_id: str, run_id: str) -> list[dict]:
+        """Read events for a specific run file."""
+        path = self._run_file(thread_id, run_id)
+        if not path.exists():
+            return []
+        events = []
+        for line in path.read_text(encoding="utf-8").strip().splitlines():
+            if not line:
+                continue
+            try:
+                events.append(json.loads(line))
+            except json.JSONDecodeError:
+                logger.debug("Skipping malformed JSONL line in %s", path)
+                continue
+        events.sort(key=lambda e: e.get("seq", 0))
+        return events
+
+    async def put(self, *, thread_id, run_id, event_type, category, content="", metadata=None, created_at=None):
+        self._ensure_seq_loaded(thread_id)
+        seq = self._next_seq(thread_id)
+        record = {
+            "thread_id": thread_id,
+            "run_id": run_id,
+            "event_type": event_type,
+            "category": category,
+            "content": content,
+            "metadata": metadata or {},
+            "seq": seq,
+            "created_at": created_at or datetime.now(UTC).isoformat(),
+        }
+        self._write_record(record)
+        return record
+
+    async def put_batch(self, events):
+        if not events:
+            return []
+        results = []
+        for ev in events:
+            record = await self.put(**ev)
+            results.append(record)
+        return results
+
+    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
+        all_events = self._read_thread_events(thread_id)
+        messages = [e for e in all_events if e.get("category") == "message"]
+
+        if before_seq is not None:
+            messages = [e for e in messages if e["seq"] < before_seq]
+            return messages[-limit:]
+        elif after_seq is not None:
+            messages = [e for e in messages if e["seq"] > after_seq]
+            return messages[:limit]
+        else:
+            return messages[-limit:]
+
+    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
+        events = self._read_run_events(thread_id, run_id)
+        if event_types is not None:
+            events = [e for e in events if e.get("event_type") in event_types]
+        return events[:limit]
+
+    async def list_messages_by_run(self, thread_id, run_id):
+        events = self._read_run_events(thread_id, run_id)
+        return [e for e in events if e.get("category") == "message"]
+
+    async def count_messages(self, thread_id):
+        all_events = self._read_thread_events(thread_id)
+        return sum(1 for e in all_events if e.get("category") == "message")
+
+    async def delete_by_thread(self, thread_id):
+        all_events = self._read_thread_events(thread_id)
+        count = len(all_events)
+        thread_dir = self._thread_dir(thread_id)
+        if thread_dir.exists():
+            for f in thread_dir.glob("*.jsonl"):
+                f.unlink()
+        self._seq_counters.pop(thread_id, None)
+        return count
+
+    async def delete_by_run(self, thread_id, run_id):
+        events = self._read_run_events(thread_id, run_id)
+        count = len(events)
+        path = self._run_file(thread_id, run_id)
+        if path.exists():
+            path.unlink()
+        return count
@@ -0,0 +1,120 @@
+"""In-memory RunEventStore. Used when run_events.backend=memory (default) and in tests.
+
+Safe for single-process async usage without locks, since all mutations
+happen within the same event loop. Not safe for access from multiple OS threads.
+"""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+
+from deerflow.runtime.events.store.base import RunEventStore
+
+
+class MemoryRunEventStore(RunEventStore):
+    def __init__(self) -> None:
+        self._events: dict[str, list[dict]] = {}  # thread_id -> sorted event list
+        self._seq_counters: dict[str, int] = {}  # thread_id -> last assigned seq
+
+    def _next_seq(self, thread_id: str) -> int:
+        current = self._seq_counters.get(thread_id, 0)
+        next_val = current + 1
+        self._seq_counters[thread_id] = next_val
+        return next_val
+
+    def _put_one(
+        self,
+        *,
+        thread_id: str,
+        run_id: str,
+        event_type: str,
+        category: str,
+        content: str | dict = "",
+        metadata: dict | None = None,
+        created_at: str | None = None,
+    ) -> dict:
+        seq = self._next_seq(thread_id)
+        record = {
+            "thread_id": thread_id,
+            "run_id": run_id,
+            "event_type": event_type,
+            "category": category,
+            "content": content,
+            "metadata": metadata or {},
+            "seq": seq,
+            "created_at": created_at or datetime.now(UTC).isoformat(),
+        }
+        self._events.setdefault(thread_id, []).append(record)
+        return record
+
+    async def put(
+        self,
+        *,
+        thread_id,
+        run_id,
+        event_type,
+        category,
+        content="",
+        metadata=None,
+        created_at=None,
+    ):
+        return self._put_one(
+            thread_id=thread_id,
+            run_id=run_id,
+            event_type=event_type,
+            category=category,
+            content=content,
+            metadata=metadata,
+            created_at=created_at,
+        )
+
+    async def put_batch(self, events):
+        results = []
+        for ev in events:
+            record = self._put_one(**ev)
+            results.append(record)
+        return results
+
+    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
+        all_events = self._events.get(thread_id, [])
+        messages = [e for e in all_events if e["category"] == "message"]
+
+        if before_seq is not None:
+            messages = [e for e in messages if e["seq"] < before_seq]
+            # Take the last `limit` records
+            return messages[-limit:]
+        elif after_seq is not None:
+            messages = [e for e in messages if e["seq"] > after_seq]
+            return messages[:limit]
+        else:
+            # Return the latest `limit` records, ascending
+            return messages[-limit:]
+
+    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
+        all_events = self._events.get(thread_id, [])
+        filtered = [e for e in all_events if e["run_id"] == run_id]
+        if event_types is not None:
+            filtered = [e for e in filtered if e["event_type"] in event_types]
+        return filtered[:limit]
+
+    async def list_messages_by_run(self, thread_id, run_id):
+        all_events = self._events.get(thread_id, [])
+        return [e for e in all_events if e["run_id"] == run_id and e["category"] == "message"]
+
+    async def count_messages(self, thread_id):
+        all_events = self._events.get(thread_id, [])
+        return sum(1 for e in all_events if e["category"] == "message")
+
+    async def delete_by_thread(self, thread_id):
+        events = self._events.pop(thread_id, [])
+        self._seq_counters.pop(thread_id, None)
+        return len(events)
+
+    async def delete_by_run(self, thread_id, run_id):
+        all_events = self._events.get(thread_id, [])
+        if not all_events:
+            return 0
+        remaining = [e for e in all_events if e["run_id"] != run_id]
+        removed = len(all_events) - len(remaining)
+        self._events[thread_id] = remaining
+        return removed
@@ -0,0 +1,471 @@
+"""Run event capture via LangChain callbacks.
+
+RunJournal sits between LangChain's callback mechanism and the pluggable
+RunEventStore. It standardizes callback data into RunEvent records and
+handles token usage accumulation.
+
+Key design decisions:
+- on_llm_new_token is NOT implemented -- only complete messages via on_llm_end
+- on_chat_model_start captures structured prompts as llm_request (OpenAI format)
+- on_llm_end emits llm_response in OpenAI Chat Completions format
+- Token usage accumulated in memory, written to RunRow on run completion
+- Caller identification via tags injection (lead_agent / subagent:{name} / middleware:{name})
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import time
+from datetime import UTC, datetime
+from typing import TYPE_CHECKING, Any
+from uuid import UUID
+
+from langchain_core.callbacks import BaseCallbackHandler
+
+if TYPE_CHECKING:
+    from deerflow.runtime.events.store.base import RunEventStore
+
+logger = logging.getLogger(__name__)
+
+
+class RunJournal(BaseCallbackHandler):
+    """LangChain callback handler that captures events to RunEventStore."""
+
+    def __init__(
+        self,
+        run_id: str,
+        thread_id: str,
+        event_store: RunEventStore,
+        *,
+        track_token_usage: bool = True,
+        flush_threshold: int = 20,
+    ):
+        super().__init__()
+        self.run_id = run_id
+        self.thread_id = thread_id
+        self._store = event_store
+        self._track_tokens = track_token_usage
+        self._flush_threshold = flush_threshold
+
+        # Write buffer
+        self._buffer: list[dict] = []
+
+        # Token accumulators
+        self._total_input_tokens = 0
+        self._total_output_tokens = 0
+        self._total_tokens = 0
+        self._llm_call_count = 0
+        self._lead_agent_tokens = 0
+        self._subagent_tokens = 0
+        self._middleware_tokens = 0
+
+        # Convenience fields
+        self._last_ai_msg: str | None = None
+        self._first_human_msg: str | None = None
+        self._msg_count = 0
+
+        # Latency tracking
+        self._llm_start_times: dict[str, float] = {}  # langchain run_id -> start time
+
+        # LLM request/response tracking
+        self._llm_call_index = 0
+        self._cached_prompts: dict[str, list[dict]] = {}  # langchain run_id -> OpenAI messages
+        self._cached_models: dict[str, str] = {}  # langchain run_id -> model name
+
+        # Tool call ID cache
+        self._tool_call_ids: dict[str, str] = {}  # langchain run_id -> tool_call_id
+
+    # -- Lifecycle callbacks --
+
+    def on_chain_start(self, serialized: dict, inputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        if kwargs.get("parent_run_id") is not None:
+            return
+        self._put(
+            event_type="run_start",
+            category="lifecycle",
+            metadata={"input_preview": str(inputs)[:500]},
+        )
+
+    def on_chain_end(self, outputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        if kwargs.get("parent_run_id") is not None:
+            return
+        self._put(event_type="run_end", category="lifecycle", metadata={"status": "success"})
+        self._flush_sync()
+
+    def on_chain_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
+        if kwargs.get("parent_run_id") is not None:
+            return
+        self._put(
+            event_type="run_error",
+            category="lifecycle",
+            content=str(error),
+            metadata={"error_type": type(error).__name__},
+        )
+        self._flush_sync()
+
+    # -- LLM callbacks --
+
+    def on_chat_model_start(self, serialized: dict, messages: list[list], *, run_id: UUID, **kwargs: Any) -> None:
+        """Capture structured prompt messages for llm_request event."""
+        from deerflow.runtime.converters import langchain_messages_to_openai
+
+        rid = str(run_id)
+        self._llm_start_times[rid] = time.monotonic()
+        self._llm_call_index += 1
+
+        model_name = serialized.get("name", "")
+        self._cached_models[rid] = model_name
+
+        # Convert the first message list (LangChain passes list-of-lists)
+        prompt_msgs = messages[0] if messages else []
+        openai_msgs = langchain_messages_to_openai(prompt_msgs)
+        self._cached_prompts[rid] = openai_msgs
+
+        caller = self._identify_caller(kwargs)
+        self._put(
+            event_type="llm_request",
+            category="trace",
+            content={"model": model_name, "messages": openai_msgs},
+            metadata={"caller": caller, "llm_call_index": self._llm_call_index},
+        )
+
+    def on_llm_start(self, serialized: dict, prompts: list[str], *, run_id: UUID, **kwargs: Any) -> None:
+        # Fallback: on_chat_model_start is preferred. This just tracks latency.
+        self._llm_start_times[str(run_id)] = time.monotonic()
+
+    def on_llm_end(self, response: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        from deerflow.runtime.converters import langchain_to_openai_completion
+
+        try:
+            message = response.generations[0][0].message
+        except (IndexError, AttributeError):
+            logger.debug("on_llm_end: could not extract message from response")
+            return
+
+        caller = self._identify_caller(kwargs)
+
+        # Latency
+        rid = str(run_id)
+        start = self._llm_start_times.pop(rid, None)
+        latency_ms = int((time.monotonic() - start) * 1000) if start is not None else None
+
+        # Token usage from message
+        usage = getattr(message, "usage_metadata", None)
+        usage_dict = dict(usage) if usage else {}
+
+        # Resolve call index
+        call_index = self._llm_call_index
+        if rid not in self._cached_prompts:
+            # Fallback: on_chat_model_start was not called
+            self._llm_call_index += 1
+            call_index = self._llm_call_index
+
+        # Clean up caches
+        self._cached_prompts.pop(rid, None)
+        self._cached_models.pop(rid, None)
+
+        # Trace event: llm_response (OpenAI completion format)
+        content = getattr(message, "content", "")
+        self._put(
+            event_type="llm_response",
+            category="trace",
+            content=langchain_to_openai_completion(message),
+            metadata={
+                "caller": caller,
+                "usage": usage_dict,
+                "latency_ms": latency_ms,
+                "llm_call_index": call_index,
+            },
+        )
+
+        # Message events: only lead_agent gets message-category events.
+        # Content uses message.model_dump() to align with checkpoint format.
+        tool_calls = getattr(message, "tool_calls", None) or []
+        if caller == "lead_agent":
+            resp_meta = getattr(message, "response_metadata", None) or {}
+            model_name = resp_meta.get("model_name") if isinstance(resp_meta, dict) else None
+            if tool_calls:
+                # ai_tool_call: agent decided to use tools
+                self._put(
+                    event_type="ai_tool_call",
+                    category="message",
+                    content=message.model_dump(),
+                    metadata={"model_name": model_name, "finish_reason": "tool_calls"},
+                )
+            elif isinstance(content, str) and content:
+                # ai_message: final text reply
+                self._put(
+                    event_type="ai_message",
+                    category="message",
+                    content=message.model_dump(),
+                    metadata={"model_name": model_name, "finish_reason": "stop"},
+                )
+                self._last_ai_msg = content
+                self._msg_count += 1
+
+        # Token accumulation
+        if self._track_tokens:
+            input_tk = usage_dict.get("input_tokens", 0) or 0
+            output_tk = usage_dict.get("output_tokens", 0) or 0
+            total_tk = usage_dict.get("total_tokens", 0) or 0
+            if total_tk == 0:
+                total_tk = input_tk + output_tk
+            if total_tk > 0:
+                self._total_input_tokens += input_tk
+                self._total_output_tokens += output_tk
+                self._total_tokens += total_tk
+                self._llm_call_count += 1
+                if caller.startswith("subagent:"):
+                    self._subagent_tokens += total_tk
+                elif caller.startswith("middleware:"):
+                    self._middleware_tokens += total_tk
+                else:
+                    self._lead_agent_tokens += total_tk
+
+    def on_llm_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
+        self._llm_start_times.pop(str(run_id), None)
+        self._put(event_type="llm_error", category="trace", content=str(error))
+
+    # -- Tool callbacks --
+
+    def on_tool_start(self, serialized: dict, input_str: str, *, run_id: UUID, **kwargs: Any) -> None:
+        tool_call_id = kwargs.get("tool_call_id")
+        if tool_call_id:
+            self._tool_call_ids[str(run_id)] = tool_call_id
+        self._put(
+            event_type="tool_start",
+            category="trace",
+            metadata={
+                "tool_name": serialized.get("name", ""),
+                "tool_call_id": tool_call_id,
+                "args": str(input_str)[:2000],
+            },
+        )
+
+    def on_tool_end(self, output: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        from langchain_core.messages import ToolMessage
+
+        # Extract fields from ToolMessage object when LangChain provides one.
+        # LangChain's _format_output wraps tool results into a ToolMessage
+        # with tool_call_id, name, status, and artifact — more complete than
+        # what kwargs alone provides.
+        if isinstance(output, ToolMessage):
+            tool_call_id = output.tool_call_id or kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
+            tool_name = output.name or kwargs.get("name", "")
+            status = getattr(output, "status", "success") or "success"
+            content_str = output.content if isinstance(output.content, str) else str(output.content)
+            # Use model_dump() for checkpoint-aligned message content.
+            # Override tool_call_id if it was resolved from cache.
+            msg_content = output.model_dump()
+            if msg_content.get("tool_call_id") != tool_call_id:
+                msg_content["tool_call_id"] = tool_call_id
+        else:
+            tool_call_id = kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
+            tool_name = kwargs.get("name", "")
+            status = "success"
+            content_str = str(output)
+            # Construct checkpoint-aligned dict when output is a plain string.
+            msg_content = ToolMessage(
+                content=content_str,
+                tool_call_id=tool_call_id or "",
+                name=tool_name,
+                status=status,
+            ).model_dump()
+
+        # Trace event (always)
+        self._put(
+            event_type="tool_end",
+            category="trace",
+            content=content_str,
+            metadata={
+                "tool_name": tool_name,
+                "tool_call_id": tool_call_id,
+                "status": status,
+            },
+        )
+
+        # Message event: tool_result (checkpoint-aligned model_dump format)
+        self._put(
+            event_type="tool_result",
+            category="message",
+            content=msg_content,
+            metadata={"tool_name": tool_name, "status": status},
+        )
+
+    def on_tool_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
+        from langchain_core.messages import ToolMessage
+
+        tool_call_id = kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
+        tool_name = kwargs.get("name", "")
+
+        # Trace event
+        self._put(
+            event_type="tool_error",
+            category="trace",
+            content=str(error),
+            metadata={
+                "tool_name": tool_name,
+                "tool_call_id": tool_call_id,
+            },
+        )
+
+        # Message event: tool_result with error status (checkpoint-aligned)
+        msg_content = ToolMessage(
+            content=str(error),
+            tool_call_id=tool_call_id or "",
+            name=tool_name,
+            status="error",
+        ).model_dump()
+        self._put(
+            event_type="tool_result",
+            category="message",
+            content=msg_content,
+            metadata={"tool_name": tool_name, "status": "error"},
+        )
+
+    # -- Custom event callback --
+
+    def on_custom_event(self, name: str, data: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        from deerflow.runtime.serialization import serialize_lc_object
+
+        if name == "summarization":
+            data_dict = data if isinstance(data, dict) else {}
+            self._put(
+                event_type="summarization",
+                category="trace",
+                content=data_dict.get("summary", ""),
+                metadata={
+                    "replaced_message_ids": data_dict.get("replaced_message_ids", []),
+                    "replaced_count": data_dict.get("replaced_count", 0),
+                },
+            )
+            self._put(
+                event_type="middleware:summarize",
+                category="middleware",
+                content={"role": "system", "content": data_dict.get("summary", "")},
+                metadata={"replaced_count": data_dict.get("replaced_count", 0)},
+            )
+        else:
+            event_data = serialize_lc_object(data) if not isinstance(data, dict) else data
+            self._put(
+                event_type=name,
+                category="trace",
+                metadata=event_data if isinstance(event_data, dict) else {"data": event_data},
+            )
+
+    # -- Internal methods --
+
+    def _put(self, *, event_type: str, category: str, content: str | dict = "", metadata: dict | None = None) -> None:
+        self._buffer.append(
+            {
+                "thread_id": self.thread_id,
+                "run_id": self.run_id,
+                "event_type": event_type,
+                "category": category,
+                "content": content,
+                "metadata": metadata or {},
+                "created_at": datetime.now(UTC).isoformat(),
+            }
+        )
+        if len(self._buffer) >= self._flush_threshold:
+            self._flush_sync()
+
+    def _flush_sync(self) -> None:
+        """Best-effort flush of buffer to RunEventStore.
+
+        BaseCallbackHandler methods are synchronous.  If an event loop is
+        running we schedule an async ``put_batch``; otherwise the events
+        stay in the buffer and are flushed later by the async ``flush()``
+        call in the worker's ``finally`` block.
+        """
+        if not self._buffer:
+            return
+        try:
+            loop = asyncio.get_running_loop()
+        except RuntimeError:
+            # No event loop — keep events in buffer for later async flush.
+            return
+        batch = self._buffer.copy()
+        self._buffer.clear()
+        task = loop.create_task(self._flush_async(batch))
+        task.add_done_callback(self._on_flush_done)
+
+    async def _flush_async(self, batch: list[dict]) -> None:
+        try:
+            await self._store.put_batch(batch)
+        except Exception:
+            logger.warning(
+                "Failed to flush %d events for run %s — returning to buffer",
+                len(batch),
+                self.run_id,
+                exc_info=True,
+            )
+            # Return failed events to buffer for retry on next flush
+            self._buffer = batch + self._buffer
+
+    @staticmethod
+    def _on_flush_done(task: asyncio.Task) -> None:
+        if task.cancelled():
+            return
+        exc = task.exception()
+        if exc:
+            logger.warning("Journal flush task failed: %s", exc)
+
+    def _identify_caller(self, kwargs: dict) -> str:
+        for tag in kwargs.get("tags") or []:
+            if isinstance(tag, str) and (tag.startswith("subagent:") or tag.startswith("middleware:") or tag == "lead_agent"):
+                return tag
+        # Default to lead_agent: the main agent graph does not inject
+        # callback tags, while subagents and middleware explicitly tag
+        # themselves.
+        return "lead_agent"
+
+    # -- Public methods (called by worker) --
+
+    def set_first_human_message(self, content: str) -> None:
+        """Record the first human message for convenience fields."""
+        self._first_human_msg = content[:2000] if content else None
+
+    def record_middleware(self, tag: str, *, name: str, hook: str, action: str, changes: dict) -> None:
+        """Record a middleware state-change event.
+
+        Called by middleware implementations when they perform a meaningful
+        state change (e.g., title generation, summarization, HITL approval).
+        Pure-observation middleware should not call this.
+
+        Args:
+            tag: Short identifier for the middleware (e.g., "title", "summarize",
+                 "guardrail"). Used to form event_type="middleware:{tag}".
+            name: Full middleware class name.
+            hook: Lifecycle hook that triggered the action (e.g., "after_model").
+            action: Specific action performed (e.g., "generate_title").
+            changes: Dict describing the state changes made.
+        """
+        self._put(
+            event_type=f"middleware:{tag}",
+            category="middleware",
+            content={"name": name, "hook": hook, "action": action, "changes": changes},
+        )
+
+    async def flush(self) -> None:
+        """Force flush remaining buffer. Called in worker's finally block."""
+        if self._buffer:
+            batch = self._buffer.copy()
+            self._buffer.clear()
+            await self._store.put_batch(batch)
+
+    def get_completion_data(self) -> dict:
+        """Return accumulated token and message data for run completion."""
+        return {
+            "total_input_tokens": self._total_input_tokens,
+            "total_output_tokens": self._total_output_tokens,
+            "total_tokens": self._total_tokens,
+            "llm_call_count": self._llm_call_count,
+            "lead_agent_tokens": self._lead_agent_tokens,
+            "subagent_tokens": self._subagent_tokens,
+            "middleware_tokens": self._middleware_tokens,
+            "message_count": self._msg_count,
+            "last_ai_message": self._last_ai_msg,
+            "first_human_message": self._first_human_msg,
+        }
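
Review note: the buffer/flush behaviour of `_put`, `_flush_sync` and `flush` above can be sketched in isolation. This is a minimal illustration, not the real `RunJournal`: `FakeEventStore` and `BufferedJournal` are hypothetical names, the store only records batches in memory, and the retry-on-failure path is omitted.

```python
import asyncio


class FakeEventStore:
    """Hypothetical stand-in for RunEventStore; records batches in memory."""

    def __init__(self) -> None:
        self.batches: list[list[dict]] = []

    async def put_batch(self, batch: list[dict]) -> None:
        self.batches.append(batch)


class BufferedJournal:
    """Simplified illustration of the buffer/flush logic (assumed names)."""

    def __init__(self, store: FakeEventStore, flush_threshold: int = 2) -> None:
        self._store = store
        self._buffer: list[dict] = []
        self._flush_threshold = flush_threshold

    def put(self, event: dict) -> None:
        # Synchronous entry point, since callback handler methods are sync.
        self._buffer.append(event)
        if len(self._buffer) >= self._flush_threshold:
            self._flush_sync()

    def _flush_sync(self) -> None:
        if not self._buffer:
            return
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No running loop: keep events buffered for a later flush().
            return
        batch = self._buffer.copy()
        self._buffer.clear()
        loop.create_task(self._store.put_batch(batch))

    async def flush(self) -> None:
        # Final drain, as the worker would do in its finally block.
        if self._buffer:
            batch = self._buffer.copy()
            self._buffer.clear()
            await self._store.put_batch(batch)


async def demo() -> list[list[dict]]:
    store = FakeEventStore()
    journal = BufferedJournal(store)
    for i in range(3):
        journal.put({"seq": i})
    await asyncio.sleep(0)  # yield once so the scheduled flush task runs
    await journal.flush()   # drain the one event still in the buffer
    return store.batches


batches = asyncio.run(demo())
```

With a threshold of 2, the first two events are handed to a background task and the third stays buffered until the explicit `flush()`, which is why the worker still needs the final flush even when a loop is running.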
@@ -2,11 +2,12 @@

 from .manager import ConflictError, RunManager, RunRecord, UnsupportedStrategyError
 from .schemas import DisconnectMode, RunStatus
-from .worker import run_agent
+from .worker import RunContext, run_agent

 __all__ = [
    "ConflictError",
    "DisconnectMode",
+    "RunContext",
    "RunManager",
    "RunRecord",
    "RunStatus",
@@ -1,4 +1,4 @@
-"""In-memory run registry."""
+"""In-memory run registry with optional persistent RunStore backing."""

 from __future__ import annotations

@@ -7,9 +7,13 @@ import logging
 import uuid
 from dataclasses import dataclass, field
 from datetime import UTC, datetime
+from typing import TYPE_CHECKING

 from .schemas import DisconnectMode, RunStatus

+if TYPE_CHECKING:
+    from deerflow.runtime.runs.store.base import RunStore
+
 logger = logging.getLogger(__name__)


@@ -38,11 +42,44 @@ class RunRecord:


 class RunManager:
-    """In-memory run registry.  All mutations are protected by an asyncio lock."""
+    """In-memory run registry with optional persistent RunStore backing.

-    def __init__(self) -> None:
+    All mutations are protected by an asyncio lock. When a ``store`` is
+    provided, serializable metadata is also persisted to the store so
+    that run history survives process restarts.
+    """
+
+    def __init__(self, store: RunStore | None = None) -> None:
        self._runs: dict[str, RunRecord] = {}
        self._lock = asyncio.Lock()
+        self._store = store
+
+    async def _persist_to_store(self, record: RunRecord, *, follow_up_to_run_id: str | None = None) -> None:
+        """Best-effort persist run record to backing store."""
+        if self._store is None:
+            return
+        try:
+            await self._store.put(
+                record.run_id,
+                thread_id=record.thread_id,
+                assistant_id=record.assistant_id,
+                status=record.status.value,
+                multitask_strategy=record.multitask_strategy,
+                metadata=record.metadata or {},
+                kwargs=record.kwargs or {},
+                created_at=record.created_at,
+                follow_up_to_run_id=follow_up_to_run_id,
+            )
+        except Exception:
+            logger.warning("Failed to persist run %s to store", record.run_id, exc_info=True)
+
+    async def update_run_completion(self, run_id: str, **kwargs) -> None:
+        """Persist token usage and completion data to the backing store."""
+        if self._store is not None:
+            try:
+                await self._store.update_run_completion(run_id, **kwargs)
+            except Exception:
+                logger.warning("Failed to persist run completion for %s", run_id, exc_info=True)

    async def create(
        self,
@@ -53,6 +90,7 @@ class RunManager:
        metadata: dict | None = None,
        kwargs: dict | None = None,
        multitask_strategy: str = "reject",
+        follow_up_to_run_id: str | None = None,
    ) -> RunRecord:
        """Create a new pending run and register it."""
        run_id = str(uuid.uuid4())
@@ -71,6 +109,7 @@ class RunManager:
        )
        async with self._lock:
            self._runs[run_id] = record
+        await self._persist_to_store(record, follow_up_to_run_id=follow_up_to_run_id)
        logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
        return record

@@ -96,6 +135,11 @@ class RunManager:
            record.updated_at = _now_iso()
            if error is not None:
                record.error = error
+        if self._store is not None:
+            try:
+                await self._store.update_status(run_id, status.value, error=error)
+            except Exception:
+                logger.warning("Failed to persist status update for run %s", run_id, exc_info=True)
        logger.info("Run %s -> %s", run_id, status.value)

    async def cancel(self, run_id: str, *, action: str = "interrupt") -> bool:
@@ -132,6 +176,7 @@ class RunManager:
        metadata: dict | None = None,
        kwargs: dict | None = None,
        multitask_strategy: str = "reject",
+        follow_up_to_run_id: str | None = None,
    ) -> RunRecord:
        """Atomically check for inflight runs and create a new one.

@@ -185,6 +230,7 @@ class RunManager:
            )
            self._runs[run_id] = record

+        await self._persist_to_store(record, follow_up_to_run_id=follow_up_to_run_id)
        logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
        return record
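
Review note: the best-effort persistence used by `_persist_to_store` can be sketched as follows. The in-memory registry is updated under the lock, and a failing store write is logged rather than raised. `FailingStore` and `MiniRunManager` are hypothetical names used only for this illustration, and the record is reduced to a plain dict.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FailingStore:
    """Hypothetical store whose writes always fail."""

    async def put(self, run_id: str, **fields) -> None:
        raise ConnectionError("store unavailable")


class MiniRunManager:
    """Reduced illustration of the registry-plus-optional-store pattern."""

    def __init__(self, store=None) -> None:
        self._runs: dict[str, dict] = {}
        self._lock = asyncio.Lock()
        self._store = store

    async def create(self, run_id: str, thread_id: str) -> dict:
        record = {"run_id": run_id, "thread_id": thread_id, "status": "pending"}
        async with self._lock:
            self._runs[run_id] = record
        # Persist outside the lock; a store failure must not fail the run.
        if self._store is not None:
            try:
                await self._store.put(run_id, thread_id=thread_id, status="pending")
            except Exception:
                logger.warning("Failed to persist run %s", run_id, exc_info=True)
        return record


async def main():
    manager = MiniRunManager(store=FailingStore())
    record = await manager.create("r1", "t1")
    return manager, record


manager, record = asyncio.run(main())
```

The in-memory record stays authoritative for the live process; the store only adds history that survives restarts, so a write failure degrades persistence without affecting the run itself.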

@@ -0,0 +1,4 @@
+from deerflow.runtime.runs.store.base import RunStore
+from deerflow.runtime.runs.store.memory import MemoryRunStore
+
+__all__ = ["MemoryRunStore", "RunStore"]
@@ -0,0 +1,96 @@
+"""Abstract interface for run metadata storage.
+
+RunManager depends on this interface. Implementations:
+- MemoryRunStore: in-memory dict (development, tests)
+- Future: RunRepository backed by SQLAlchemy ORM
+
+``put`` and ``list_by_thread`` accept an optional owner_id for user isolation.
+When owner_id is None, no user filtering is applied (single-user mode).
+"""
+
+from __future__ import annotations
+
+import abc
+from typing import Any
+
+
+class RunStore(abc.ABC):
+    @abc.abstractmethod
+    async def put(
+        self,
+        run_id: str,
+        *,
+        thread_id: str,
+        assistant_id: str | None = None,
+        owner_id: str | None = None,
+        status: str = "pending",
+        multitask_strategy: str = "reject",
+        metadata: dict[str, Any] | None = None,
+        kwargs: dict[str, Any] | None = None,
+        error: str | None = None,
+        created_at: str | None = None,
+        follow_up_to_run_id: str | None = None,
+    ) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def get(self, run_id: str) -> dict[str, Any] | None:
+        pass
+
+    @abc.abstractmethod
+    async def list_by_thread(
+        self,
+        thread_id: str,
+        *,
+        owner_id: str | None = None,
+        limit: int = 100,
+    ) -> list[dict[str, Any]]:
+        pass
+
+    @abc.abstractmethod
+    async def update_status(
+        self,
+        run_id: str,
+        status: str,
+        *,
+        error: str | None = None,
+    ) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def delete(self, run_id: str) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def update_run_completion(
+        self,
+        run_id: str,
+        *,
+        status: str,
+        total_input_tokens: int = 0,
+        total_output_tokens: int = 0,
+        total_tokens: int = 0,
+        llm_call_count: int = 0,
+        lead_agent_tokens: int = 0,
+        subagent_tokens: int = 0,
+        middleware_tokens: int = 0,
+        message_count: int = 0,
+        last_ai_message: str | None = None,
+        first_human_message: str | None = None,
+        error: str | None = None,
+    ) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def list_pending(self, *, before: str | None = None) -> list[dict[str, Any]]:
+        pass
+
+    @abc.abstractmethod
+    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
+        """Aggregate token usage for completed runs in a thread.
+
+        Returns a dict with keys: total_tokens, total_input_tokens,
+        total_output_tokens, total_runs, by_model (model_name → {tokens, runs}),
+        by_caller ({lead_agent, subagent, middleware}).
+        """
+        pass
@@ -0,0 +1,100 @@
+"""In-memory RunStore. Used when database.backend=memory (default) and in tests.
+
+Equivalent to the original RunManager._runs dict behavior.
+"""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+from typing import Any
+
+from deerflow.runtime.runs.store.base import RunStore
+
+
+class MemoryRunStore(RunStore):
+    def __init__(self) -> None:
+        self._runs: dict[str, dict[str, Any]] = {}
+
+    async def put(
+        self,
+        run_id,
+        *,
+        thread_id,
+        assistant_id=None,
+        owner_id=None,
+        status="pending",
+        multitask_strategy="reject",
+        metadata=None,
+        kwargs=None,
+        error=None,
+        created_at=None,
+        follow_up_to_run_id=None,
+    ):
+        now = datetime.now(UTC).isoformat()
+        self._runs[run_id] = {
+            "run_id": run_id,
+            "thread_id": thread_id,
+            "assistant_id": assistant_id,
+            "owner_id": owner_id,
+            "status": status,
+            "multitask_strategy": multitask_strategy,
+            "metadata": metadata or {},
+            "kwargs": kwargs or {},
+            "error": error,
+            "follow_up_to_run_id": follow_up_to_run_id,
+            "created_at": created_at or now,
+            "updated_at": now,
+        }
+
+    async def get(self, run_id):
+        return self._runs.get(run_id)
+
+    async def list_by_thread(self, thread_id, *, owner_id=None, limit=100):
+        results = [r for r in self._runs.values() if r["thread_id"] == thread_id and (owner_id is None or r.get("owner_id") == owner_id)]
+        results.sort(key=lambda r: r["created_at"], reverse=True)
+        return results[:limit]
+
+    async def update_status(self, run_id, status, *, error=None):
+        if run_id in self._runs:
+            self._runs[run_id]["status"] = status
+            if error is not None:
+                self._runs[run_id]["error"] = error
+            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()
+
+    async def delete(self, run_id):
+        self._runs.pop(run_id, None)
+
+    async def update_run_completion(self, run_id, *, status, **kwargs):
+        if run_id in self._runs:
+            self._runs[run_id]["status"] = status
+            for key, value in kwargs.items():
+                if value is not None:
+                    self._runs[run_id][key] = value
+            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()
+
+    async def list_pending(self, *, before=None):
+        now = before or datetime.now(UTC).isoformat()
+        results = [r for r in self._runs.values() if r["status"] == "pending" and r["created_at"] <= now]
+        results.sort(key=lambda r: r["created_at"])
+        return results
+
+    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
+        completed = [r for r in self._runs.values() if r["thread_id"] == thread_id and r.get("status") in ("success", "error")]
+        by_model: dict[str, dict] = {}
+        for r in completed:
+            model = r.get("model_name") or "unknown"
+            entry = by_model.setdefault(model, {"tokens": 0, "runs": 0})
+            entry["tokens"] += r.get("total_tokens", 0)
+            entry["runs"] += 1
+        return {
+            "total_tokens": sum(r.get("total_tokens", 0) for r in completed),
+            "total_input_tokens": sum(r.get("total_input_tokens", 0) for r in completed),
+            "total_output_tokens": sum(r.get("total_output_tokens", 0) for r in completed),
+            "total_runs": len(completed),
+            "by_model": by_model,
+            "by_caller": {
+                "lead_agent": sum(r.get("lead_agent_tokens", 0) for r in completed),
+                "subagent": sum(r.get("subagent_tokens", 0) for r in completed),
+                "middleware": sum(r.get("middleware_tokens", 0) for r in completed),
+            },
+        }
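
Review note: the shape of the aggregation result can be illustrated with a standalone function over plain run dicts. This is a reduced sketch under assumptions: `aggregate_tokens` is a hypothetical helper, it omits `by_model` and the input/output totals, and the sample rows are made up for the example.

```python
from typing import Any


def aggregate_tokens(runs: list[dict[str, Any]], thread_id: str) -> dict[str, Any]:
    """Sum token usage over finished runs (success or error) of one thread."""
    completed = [
        r for r in runs
        if r["thread_id"] == thread_id and r.get("status") in ("success", "error")
    ]
    return {
        "total_tokens": sum(r.get("total_tokens", 0) for r in completed),
        "total_runs": len(completed),
        "by_caller": {
            "lead_agent": sum(r.get("lead_agent_tokens", 0) for r in completed),
            "subagent": sum(r.get("subagent_tokens", 0) for r in completed),
            "middleware": sum(r.get("middleware_tokens", 0) for r in completed),
        },
    }


runs = [
    {"thread_id": "t1", "status": "success", "total_tokens": 120,
     "lead_agent_tokens": 100, "subagent_tokens": 20},
    {"thread_id": "t1", "status": "pending", "total_tokens": 999},
    {"thread_id": "t2", "status": "success", "total_tokens": 50,
     "lead_agent_tokens": 50},
]
summary = aggregate_tokens(runs, "t1")
```

Pending runs and runs from other threads are excluded, so only the first row contributes to the summary for `t1`.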
@@ -17,7 +17,11 @@ from __future__ import annotations

 import asyncio
 import logging
-from typing import Any, Literal
+from dataclasses import dataclass, field
+from typing import TYPE_CHECKING, Any, Literal
+
+if TYPE_CHECKING:
+    from langchain_core.messages import HumanMessage

 from deerflow.runtime.serialization import serialize
 from deerflow.runtime.stream_bridge import StreamBridge
@@ -31,13 +35,29 @@ logger = logging.getLogger(__name__)
 _VALID_LG_MODES = {"values", "updates", "checkpoints", "tasks", "debug", "messages", "custom"}


+@dataclass(frozen=True)
+class RunContext:
+    """Infrastructure dependencies for a single agent run.
+
+    Groups checkpointer, store, and persistence-related singletons so that
+    ``run_agent`` (and any future callers) receive one object instead of a
+    growing list of keyword arguments.
+    """
+
+    checkpointer: Any
+    store: Any | None = field(default=None)
+    event_store: Any | None = field(default=None)
+    run_events_config: Any | None = field(default=None)
+    thread_meta_repo: Any | None = field(default=None)
+    follow_up_to_run_id: str | None = field(default=None)
+
+
 async def run_agent(
    bridge: StreamBridge,
    run_manager: RunManager,
    record: RunRecord,
    *,
-    checkpointer: Any,
-    store: Any | None = None,
+    ctx: RunContext,
    agent_factory: Any,
    graph_input: dict,
    config: dict,
@@ -48,10 +68,47 @@ async def run_agent(
 ) -> None:
    """Execute an agent in the background, publishing events to *bridge*."""

+    # Unpack infrastructure dependencies from RunContext.
+    checkpointer = ctx.checkpointer
+    store = ctx.store
+    event_store = ctx.event_store
+    run_events_config = ctx.run_events_config
+    thread_meta_repo = ctx.thread_meta_repo
+    follow_up_to_run_id = ctx.follow_up_to_run_id
+
    run_id = record.run_id
    thread_id = record.thread_id
    requested_modes: set[str] = set(stream_modes or ["values"])

+    # Initialize RunJournal for event capture
+    journal = None
+    if event_store is not None:
+        from deerflow.runtime.journal import RunJournal
+
+        journal = RunJournal(
+            run_id=run_id,
+            thread_id=thread_id,
+            event_store=event_store,
+            track_token_usage=getattr(run_events_config, "track_token_usage", True),
+        )
+
+        # Write human_message event (model_dump format, aligned with checkpoint)
+        human_msg = _extract_human_message(graph_input)
+        if human_msg is not None:
+            msg_metadata = {}
+            if follow_up_to_run_id:
+                msg_metadata["follow_up_to_run_id"] = follow_up_to_run_id
+            await event_store.put(
+                thread_id=thread_id,
+                run_id=run_id,
+                event_type="human_message",
+                category="message",
+                content=human_msg.model_dump(),
+                metadata=msg_metadata or None,
+            )
+            content = human_msg.content
+            journal.set_first_human_message(content if isinstance(content, str) else str(content))
+
    # Track whether "events" was requested but skipped
    if "events" in requested_modes:
        logger.info(
@@ -90,8 +147,18 @@ async def run_agent(
        # Inject runtime context so middlewares can access thread_id
        # (langgraph-cli does this automatically; we must do it manually)
        runtime = Runtime(context={"thread_id": thread_id}, store=store)
+        # If the caller already set a ``context`` key (LangGraph >= 0.6.0
+        # prefers it over ``configurable`` for thread-level data), make
+        # sure ``thread_id`` is available there too.
+        if "context" in config and isinstance(config["context"], dict):
+            config["context"].setdefault("thread_id", thread_id)
        config.setdefault("configurable", {})["__pregel_runtime"] = runtime

+        # Inject RunJournal as a LangChain callback handler.
+        # on_llm_end captures token usage; on_chain_start/end capture lifecycle.
+        if journal is not None:
+            config.setdefault("callbacks", []).append(journal)
+
        runnable_config = RunnableConfig(**config)
        agent = agent_factory(config=runnable_config)

@@ -206,6 +273,37 @@ async def run_agent(
        )

    finally:
+        # Flush any buffered journal events and persist completion data
+        if journal is not None:
+            try:
+                await journal.flush()
+            except Exception:
+                logger.warning("Failed to flush journal for run %s", run_id, exc_info=True)
+
+            # Persist token usage + convenience fields to RunStore
+            completion = journal.get_completion_data()
+            await run_manager.update_run_completion(run_id, status=record.status.value, **completion)
+
+        # Sync title from checkpoint to threads_meta.display_name
+        if checkpointer is not None and thread_meta_repo is not None:
+            try:
+                ckpt_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
+                ckpt_tuple = await checkpointer.aget_tuple(ckpt_config)
+                if ckpt_tuple is not None:
+                    ckpt = getattr(ckpt_tuple, "checkpoint", {}) or {}
+                    title = ckpt.get("channel_values", {}).get("title")
+                    if title:
+                        await thread_meta_repo.update_display_name(thread_id, title)
+            except Exception:
+                logger.debug("Failed to sync title for thread %s (non-fatal)", thread_id)
+
+        # Update threads_meta status based on run outcome
+        try:
+            final_status = "idle" if record.status == RunStatus.success else record.status.value
+            await thread_meta_repo.update_status(thread_id, final_status)
+        except Exception:
+            logger.debug("Failed to update thread_meta status for %s (non-fatal)", thread_id)
+
        await bridge.publish_end(run_id)
        asyncio.create_task(bridge.cleanup(run_id, delay=60))

@@ -227,6 +325,31 @@ def _lg_mode_to_sse_event(mode: str) -> str:
    return mode


+def _extract_human_message(graph_input: dict) -> HumanMessage | None:
+    """Extract or construct a HumanMessage from graph_input for event recording.
+
+    Returns a LangChain HumanMessage so callers can use .model_dump() to get
+    the checkpoint-aligned serialization format.
+    """
+    from langchain_core.messages import HumanMessage
+
+    messages = graph_input.get("messages")
+    if not messages:
+        return None
+    last = messages[-1] if isinstance(messages, list) else messages
+    if isinstance(last, HumanMessage):
+        return last
+    if isinstance(last, str):
+        return HumanMessage(content=last) if last else None
+    if hasattr(last, "content"):
+        content = last.content
+        return HumanMessage(content=content)
+    if isinstance(last, dict):
+        content = last.get("content", "")
+        return HumanMessage(content=content) if content else None
+    return None
+
+
 def _unpack_stream_item(
    item: Any,
    lg_modes: list[str],