Merge branch 'main' into fix-2788

fix(sandbox): cleanup dead containers and avoid lock-held liveness checks
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/96707445-0f8b-4901-8ef3-d8e5667f8a05
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
2026-06-11 09:55:59 +00:00 · 2026-05-27 08:29:21 +08:00 · 2026-05-11 00:09:09 +00:00 · 2026-05-10 22:53:58 +08:00
367 changed files with 3251 additions and 36225 deletions
@@ -59,7 +59,7 @@ smoke-test/
 2. **Check pnpm** - Package manager
 3. **Check uv** - Python package manager
 4. **Check nginx** - Reverse proxy
-5. **Check required ports** - Confirm that ports 2026, 3000, and 8001 are not occupied
+5. **Check required ports** - Confirm that ports 2026, 3000, 8001, and 2024 are not occupied

 **Docker mode environment check** (if Docker is selected):
 1. **Check whether Docker is installed** - Run `docker --version`
@@ -93,17 +93,17 @@ smoke-test/
 ### Phase 5: Service Health Check

 **Local mode health check**:
-1. **Check process status** - Confirm that Gateway, Frontend, and Nginx processes are all running
+1. **Check process status** - Confirm that LangGraph, Gateway, Frontend, and Nginx processes are all running
 2. **Check frontend service** - Visit `http://localhost:2026` and verify that the page loads
 3. **Check API Gateway** - Verify the `http://localhost:2026/health` endpoint
-4. **Check LangGraph-compatible API** - Verify the `/api/langgraph/*` route exposed by Gateway
+4. **Check LangGraph service** - Verify the availability of relevant endpoints
 5. **Frontend route smoke check** - Run `bash .agent/skills/smoke-test/scripts/frontend_check.sh` to verify key routes under `/workspace`

 **Docker mode health check** (when using Docker):
 1. **Check container status** - Run `docker ps` and confirm that all containers are running
 2. **Check frontend service** - Visit `http://localhost:2026` and verify that the page loads
 3. **Check API Gateway** - Verify the `http://localhost:2026/health` endpoint
-4. **Check LangGraph-compatible API** - Verify the `/api/langgraph/*` route exposed by Gateway
+4. **Check LangGraph service** - Verify the availability of relevant endpoints
 5. **Frontend route smoke check** - Run `bash .agent/skills/smoke-test/scripts/frontend_check.sh` to verify key routes under `/workspace`

 ### Optional Functional Verification
@@ -135,7 +135,7 @@ smoke-test/

 The following warnings can appear during smoke testing and do not block a successful result:
 - Feishu/Lark SSL errors in Gateway logs (certificate verification failure) can be ignored if that channel is not enabled
-- Warnings in Gateway logs about missing methods in the custom checkpointer, such as `adelete_for_runs` or `aprune`, do not affect the core functionality
+- Warnings in LangGraph logs about missing methods in the custom checkpointer, such as `adelete_for_runs` or `aprune`, do not affect the core functionality

 ## Key Tools

@@ -138,6 +138,7 @@ This document describes the detailed operating steps for each phase of the DeerF
   lsof -i :2026  # Main port
   lsof -i :3000  # Frontend
   lsof -i :8001  # Gateway
+   lsof -i :2024  # LangGraph
   ```

 **Success Criteria**: All ports are free, or they are occupied only by DeerFlow-related processes.
@@ -257,7 +258,7 @@ This document describes the detailed operating steps for each phase of the DeerF
 **Steps**:
 1. Run `make dev-daemon` (background mode)

-**Description**: This command starts all services (Gateway embedded runtime, Frontend, Nginx).
+**Description**: This command starts all services (LangGraph, Gateway, Frontend, Nginx).

 **Notes**:
 - `make dev` runs in the foreground and stops with Ctrl+C
@@ -271,6 +272,7 @@ This document describes the detailed operating steps for each phase of the DeerF
 **Steps**:
 1. Wait 90-120 seconds for all services to start completely
 2. You can monitor startup progress by checking these log files:
+   - `logs/langgraph.log`
   - `logs/gateway.log`
   - `logs/frontend.log`
   - `logs/nginx.log`
@@ -314,10 +316,11 @@ This document describes the detailed operating steps for each phase of the DeerF
 **Steps**:
 1. Run the following command to check processes:
   ```bash
-   ps aux | grep -E "(uvicorn|next|nginx)" | grep -v grep
+   ps aux | grep -E "(langgraph|uvicorn|next|nginx)" | grep -v grep
   ```

 **Success Criteria**: Confirm that the following processes are running:
+- LangGraph (`langgraph dev`)
 - Gateway (`uvicorn app.gateway.app:app`)
 - Frontend (`next dev` or `next start`)
 - Nginx (`nginx`)
@@ -353,11 +356,10 @@ curl http://localhost:2026/health

 ---

-#### 5.1.4 Check LangGraph-compatible API
+#### 5.1.4 Check LangGraph Service

 **Steps**:
-1. Visit `http://localhost:2026/api/langgraph/assistants/lead_agent` to verify Gateway's LangGraph-compatible API route is reachable.
-2. A `401` response is acceptable when authentication is enabled and no session cookie is provided.
+1. Visit relevant LangGraph endpoints to verify availability

 ---

@@ -371,6 +373,7 @@ curl http://localhost:2026/health
   - `deer-flow-nginx`
   - `deer-flow-frontend`
   - `deer-flow-gateway`
+   - `deer-flow-langgraph` (if not in gateway mode)

 ---

@@ -403,11 +406,10 @@ curl http://localhost:2026/health

 ---

-#### 5.2.4 Check LangGraph-compatible API
+#### 5.2.4 Check LangGraph Service

 **Steps**:
-1. Visit `http://localhost:2026/api/langgraph/assistants/lead_agent` to verify Gateway's LangGraph-compatible API route is reachable.
-2. A `401` response is acceptable when authentication is enabled and no session cookie is provided.
+1. Visit relevant LangGraph endpoints to verify availability

 ---

@@ -254,6 +254,7 @@ Processes exit quickly after running `make dev-daemon`.
 **Solutions**:
 1. Check log files:
   ```bash
+   tail -f logs/langgraph.log
   tail -f logs/gateway.log
   tail -f logs/frontend.log
   tail -f logs/nginx.log
@@ -366,7 +367,24 @@ Errors appear in `gateway.log`.
   uv sync
   ```

-4. Confirm that the Gateway process is running normally.
+4. Confirm that the LangGraph service is running normally (if not in gateway mode)
+
+---
+
+### Issue: LangGraph Fails to Start
+
+**Symptoms**:
+Errors appear in `langgraph.log`.
+
+**Solutions**:
+1. Check LangGraph logs:
+   ```bash
+   tail -f logs/langgraph.log
+   ```
+
+2. Check config.yaml
+3. Check whether Python dependencies are complete
+4. Confirm that port 2024 is not occupied

 ---

@@ -501,7 +519,7 @@ Accessing `/health` returns an error or times out.

 2. Confirm that config.yaml exists and has valid formatting
 3. Check whether Python dependencies are complete
-4. Confirm that the Gateway process is running normally.
+4. Confirm that the LangGraph service is running normally

 **Solutions** (Docker mode):
 1. Check gateway container logs:
@@ -511,7 +529,7 @@ Accessing `/health` returns an error or times out.

 2. Confirm that config.yaml is mounted correctly
 3. Check whether Python dependencies are complete
-4. Confirm that the Gateway process is running normally.
+4. Confirm that the LangGraph service is running normally

 ---

@@ -521,7 +539,7 @@ Accessing `/health` returns an error or times out.

 #### View All Service Processes
 ```bash
-ps aux | grep -E "(uvicorn|next|nginx)" | grep -v grep
+ps aux | grep -E "(langgraph|uvicorn|next|nginx)" | grep -v grep
 ```

 #### View Service Logs
@@ -530,6 +548,7 @@ ps aux | grep -E "(uvicorn|next|nginx)" | grep -v grep
 tail -f logs/*.log

 # View specific service logs
+tail -f logs/langgraph.log
 tail -f logs/gateway.log
 tail -f logs/frontend.log
 tail -f logs/nginx.log
@@ -65,7 +65,7 @@ if ! command -v lsof >/dev/null 2>&1; then
    echo "  Install lsof and rerun this check"
    all_passed=false
 else
-    for port in 2026 3000 8001; do
+    for port in 2026 3000 8001 2024; do
        if lsof -i :$port >/dev/null 2>&1; then
            echo "⚠  Port $port is already in use:"
            lsof -i :$port | head -2
@@ -54,6 +54,7 @@ echo "=========================================="
 echo ""
 echo "🌐 Access URL: http://localhost:2026"
 echo "📋 View logs:"
+echo "   - logs/langgraph.log"
 echo "   - logs/gateway.log"
 echo "   - logs/frontend.log"
 echo "   - logs/nginx.log"
@@ -76,11 +76,12 @@ if [ "$mode" = "docker" ]; then
        all_passed=false
    fi
 else
-    summary_hint="logs/{gateway,frontend,nginx}.log"
+    summary_hint="logs/{langgraph,gateway,frontend,nginx}.log"
    print_step "1. Checking local service ports..."
    check_listen_port "Nginx" 2026
    check_listen_port "Frontend" 3000
    check_listen_port "Gateway" 8001
+    check_listen_port "LangGraph" 2024
 fi
 echo ""

@@ -103,8 +104,8 @@ else
 fi
 echo ""

-echo "5. Checking LangGraph-compatible Gateway API..."
-check_http_status "LangGraph-compatible Gateway API" "http://localhost:2026/api/langgraph/assistants/lead_agent" "200|401"
+echo "5. Checking LangGraph service..."
+check_http_status "LangGraph service" "http://localhost:2024/" "200|301|302|307|308|404"
 echo ""

 echo "=========================================="
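Note on step 5 above: the new pattern `200|301|302|307|308|404` accepts any response that shows something is listening on port 2024, rather than requiring a specific route to succeed. The definition of `check_http_status` lies outside this hunk, so the following is only a minimal sketch of how such a helper might match a status code against a `|`-separated pattern. The names `status_matches` and `probe_status` are hypothetical and are not taken from the script.

```shell
#!/bin/sh
# Hypothetical helper (assumption, not the script's real code):
# succeeds when the status code equals one alternative in a
# "200|301|404"-style pattern. grep -x anchors the match to the whole line,
# so "2000" does not match "200".
status_matches() {
    printf '%s\n' "$1" | grep -Eqx "$2"
}

# Hypothetical wrapper showing how curl's status output could feed the matcher.
# curl prints 000 as the status code when the connection fails.
probe_status() {
    probe_name="$1"
    probe_url="$2"
    probe_pattern="$3"
    probe_code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$probe_url" || true)"
    if status_matches "$probe_code" "$probe_pattern"; then
        echo "OK   $probe_name responded with HTTP $probe_code"
    else
        echo "FAIL $probe_name returned HTTP ${probe_code:-000} (expected $probe_pattern)"
        return 1
    fi
}
```

Under these assumptions, `probe_status "LangGraph service" "http://localhost:2024/" "200|301|302|307|308|404"` would pass whenever the server answers at all, including with a `404` for the root path, and would fail only when nothing is listening or the server returns an unexpected status such as `500`.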
@@ -78,7 +78,7 @@
 - [x] Container status - {{status_containers}}
 - [x] Frontend service - {{status_frontend}}
 - [x] API Gateway - {{status_api_gateway}}
-- [x] LangGraph-compatible Gateway API - {{status_langgraph}}
+- [x] LangGraph service - {{status_langgraph}}

 **Phase Status**: {{stage5_status}}

@@ -147,6 +147,7 @@ Commit Message: {{git_commit_message}}
 | deer-flow-nginx | {{nginx_status}} | {{nginx_uptime}} |
 | deer-flow-frontend | {{frontend_status}} | {{frontend_uptime}} |
 | deer-flow-gateway | {{gateway_status}} | {{gateway_uptime}} |
+| deer-flow-langgraph | {{langgraph_status}} | {{langgraph_uptime}} |

 ---

@@ -80,7 +80,7 @@
 - [x] Process status - {{status_processes}}
 - [x] Frontend service - {{status_frontend}}
 - [x] API Gateway - {{status_api_gateway}}
-- [x] LangGraph-compatible Gateway API - {{status_langgraph}}
+- [x] LangGraph service - {{status_langgraph}}

 **Phase Status**: {{stage5_status}}

@@ -152,7 +152,7 @@ Commit Message: {{git_commit_message}}
 | Nginx | {{nginx_status}} | {{nginx_endpoint}} |
 | Frontend | {{frontend_status}} | {{frontend_endpoint}} |
 | Gateway | {{gateway_status}} | {{gateway_endpoint}} |
-| Gateway LangGraph API | {{langgraph_status}} | {{langgraph_endpoint}} |
+| LangGraph | {{langgraph_status}} | {{langgraph_endpoint}} |

 ---

@@ -166,7 +166,7 @@ Commit Message: {{git_commit_message}}

 ### If the Test Fails
 1. [ ] Review references/troubleshooting.md for common solutions
-2. [ ] Check local logs: `logs/{gateway,frontend,nginx}.log`
+2. [ ] Check local logs: `logs/{langgraph,gateway,frontend,nginx}.log`
 3. [ ] Verify configuration file format and content
 4. [ ] If needed, fully reset the environment: `make stop && make clean && make install && make dev-daemon`

@@ -21,7 +21,6 @@ INFOQUEST_API_KEY=your-infoquest-api-key
 # DEEPSEEK_API_KEY=your-deepseek-api-key
 # NOVITA_API_KEY=your-novita-api-key  # OpenAI-compatible, see https://novita.ai
 # MINIMAX_API_KEY=your-minimax-api-key  # OpenAI-compatible, see https://platform.minimax.io
-# STEPFUN_API_KEY=your-stepfun-api-key  # OpenAI-compatible, see https://platform.stepfun.com
 # VLLM_API_KEY=your-vllm-api-key  # OpenAI-compatible
 # FEISHU_APP_ID=your-feishu-app-id
 # FEISHU_APP_SECRET=your-feishu-app-secret
@@ -1,159 +0,0 @@
-name: 🐛 Bug report
-description: Report something that isn't working so maintainers can reproduce and fix it.
-title: "[bug] "
-labels: ["bug"]
-body:
-  - type: markdown
-    attributes:
-      value: |
-        Thanks for taking the time to file a bug. A clear, reproducible report is the
-        single biggest factor in how fast it gets fixed.
-
-        Please fill in every required field — especially **reproduction steps** and **logs**.
-
-  - type: checkboxes
-    id: preflight
-    attributes:
-      label: Before you start
-      options:
-        - label: I searched [existing issues](https://github.com/bytedance/deer-flow/issues?q=is%3Aissue) and this is not a duplicate.
-          required: true
-        - label: I can reproduce this on the latest `main`.
-          required: false
-
-  - type: input
-    id: summary
-    attributes:
-      label: Problem summary
-      description: One sentence describing the bug.
-      placeholder: e.g. make dev fails to start the gateway service
-    validations:
-      required: true
-
-  - type: dropdown
-    id: areas
-    attributes:
-      label: Affected area(s)
-      description: Which part of DeerFlow does this touch? Select all that apply.
-      multiple: true
-      options:
-        - Frontend (UI / Next.js)
-        - Backend API (gateway / endpoints / SSE)
-        - Agents / LangGraph (graph, prompts, langgraph.json)
-        - Sandbox / Docker
-        - Skills
-        - MCP
-        - Config / setup (make, config.yaml, env)
-        - Docs
-        - Not sure
-    validations:
-      required: true
-
-  - type: textarea
-    id: actual
-    attributes:
-      label: What happened?
-      description: The actual behavior. Include the key error lines verbatim.
-      placeholder: When I do X, I expected Y but I got Z.
-    validations:
-      required: true
-
-  - type: textarea
-    id: expected
-    attributes:
-      label: Expected behavior
-      placeholder: What did you expect to happen instead?
-    validations:
-      required: true
-
-  - type: textarea
-    id: reproduce
-    attributes:
-      label: Steps to reproduce
-      description: Exact commands and sequence. Minimal steps that reliably reproduce the problem.
-      placeholder: |
-        1. make check
-        2. make install
-        3. make dev
-        4. ...
-    validations:
-      required: true
-
-  - type: textarea
-    id: logs
-    attributes:
-      label: Relevant logs
-      description: Paste key lines from logs (for example `logs/gateway.log`, `logs/frontend.log`). Redact secrets.
-      render: shell
-    validations:
-      required: true
-
-  - type: dropdown
-    id: run_mode
-    attributes:
-      label: How are you running DeerFlow?
-      options:
-        - Local (make dev)
-        - Docker (make docker-start)
-        - CI
-        - Other
-    validations:
-      required: true
-
-  - type: dropdown
-    id: os
-    attributes:
-      label: Operating system
-      options:
-        - macOS
-        - Linux
-        - Windows
-        - Other
-    validations:
-      required: true
-
-  - type: input
-    id: platform_details
-    attributes:
-      label: Platform details
-      description: Architecture and shell, if relevant.
-      placeholder: e.g. arm64, zsh
-
-  - type: input
-    id: python_version
-    attributes:
-      label: Python version
-      placeholder: e.g. Python 3.12.9
-
-  - type: input
-    id: node_version
-    attributes:
-      label: Node.js version
-      placeholder: e.g. v22.11.0
-
-  - type: input
-    id: pnpm_version
-    attributes:
-      label: pnpm version
-      placeholder: e.g. 10.26.2
-
-  - type: input
-    id: uv_version
-    attributes:
-      label: uv version
-      placeholder: e.g. 0.7.20
-
-  - type: textarea
-    id: git_info
-    attributes:
-      label: Git state
-      description: Output of `git branch --show-current` and the latest commit SHA.
-      placeholder: |
-        branch: feature/my-branch
-        commit: abcdef1
-
-  - type: textarea
-    id: additional
-    attributes:
-      label: Additional context
-      description: Screenshots, related issues, config snippets (redacted), or anything else that helps triage.
@@ -1,11 +0,0 @@
-blank_issues_enabled: false
-contact_links:
-  - name: 💬 Questions & usage help
-    url: https://github.com/bytedance/deer-flow/discussions/categories/q-a
-    about: "How do I use X? Why does Y behave like that? Ask in Discussions — it gets answered faster and stays searchable."
-  - name: 💡 Ideas & proposals
-    url: https://github.com/bytedance/deer-flow/discussions/categories/ideas
-    about: Have a half-formed idea? Float it in Discussions before opening a formal feature request.
-  - name: 🔒 Report a security vulnerability
-    url: https://github.com/bytedance/deer-flow/security/policy
-    about: Do not open a public issue for security problems. Follow the security policy instead.
@@ -1,67 +0,0 @@
-name: 💡 Feature request
-description: Propose a new capability or an improvement to an existing one.
-title: "[feat] "
-labels: ["enhancement"]
-body:
-  - type: markdown
-    attributes:
-      value: |
-        Thanks for the suggestion. For non-trivial features, please open a
-        [Discussion](https://github.com/bytedance/deer-flow/discussions/categories/ideas)
-        first to align on scope before writing code.
-
-  - type: checkboxes
-    id: preflight
-    attributes:
-      label: Before you start
-      options:
-        - label: I searched [existing issues](https://github.com/bytedance/deer-flow/issues?q=is%3Aissue) and this is not a duplicate.
-          required: true
-
-  - type: textarea
-    id: problem
-    attributes:
-      label: Problem / motivation
-      description: What problem does this solve? What is painful today, or what does it unblock?
-      placeholder: "I'm always frustrated when ..."
-    validations:
-      required: true
-
-  - type: textarea
-    id: solution
-    attributes:
-      label: Proposed solution
-      description: Describe the change from a user's / caller's perspective.
-    validations:
-      required: true
-
-  - type: dropdown
-    id: areas
-    attributes:
-      label: Affected area(s)
-      description: Which part of DeerFlow would this touch? Select all that apply.
-      multiple: true
-      options:
-        - Frontend (UI / Next.js)
-        - Backend API (gateway / endpoints / SSE)
-        - Agents / LangGraph (graph, prompts, langgraph.json)
-        - Sandbox / Docker
-        - Skills
-        - MCP
-        - Config / setup
-        - Docs
-        - Not sure
-    validations:
-      required: true
-
-  - type: textarea
-    id: alternatives
-    attributes:
-      label: Alternatives considered
-      description: Other approaches you weighed and why you discarded them.
-
-  - type: textarea
-    id: additional
-    attributes:
-      label: Additional context
-      description: Mockups, links, related issues, or anything else that helps.
@@ -0,0 +1,128 @@
+name: Runtime Information
+description: Report runtime/environment details to help reproduce an issue.
+title: "[runtime] "
+labels:
+  - needs-triage
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for sharing runtime details.
+        Complete this form so maintainers can quickly reproduce and diagnose the problem.
+
+  - type: input
+    id: summary
+    attributes:
+      label: Problem summary
+      description: Short summary of the issue.
+      placeholder: e.g. make dev fails to start gateway service
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+      placeholder: What did you expect to happen?
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual
+    attributes:
+      label: Actual behavior
+      placeholder: What happened instead? Include key error lines.
+    validations:
+      required: true
+
+  - type: dropdown
+    id: os
+    attributes:
+      label: Operating system
+      options:
+        - macOS
+        - Linux
+        - Windows
+        - Other
+    validations:
+      required: true
+
+  - type: input
+    id: platform_details
+    attributes:
+      label: Platform details
+      description: Add architecture and shell if relevant.
+      placeholder: e.g. arm64, zsh
+
+  - type: input
+    id: python_version
+    attributes:
+      label: Python version
+      placeholder: e.g. Python 3.12.9
+
+  - type: input
+    id: node_version
+    attributes:
+      label: Node.js version
+      placeholder: e.g. v23.11.0
+
+  - type: input
+    id: pnpm_version
+    attributes:
+      label: pnpm version
+      placeholder: e.g. 10.26.2
+
+  - type: input
+    id: uv_version
+    attributes:
+      label: uv version
+      placeholder: e.g. 0.7.20
+
+  - type: dropdown
+    id: run_mode
+    attributes:
+      label: How are you running DeerFlow?
+      options:
+        - Local (make dev)
+        - Docker (make docker-dev)
+        - CI
+        - Other
+    validations:
+      required: true
+
+  - type: textarea
+    id: reproduce
+    attributes:
+      label: Reproduction steps
+      description: Provide exact commands and sequence.
+      placeholder: |
+        1. make check
+        2. make install
+        3. make dev
+        4. ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: logs
+    attributes:
+      label: Relevant logs
+      description: Paste key lines from logs (for example logs/gateway.log, logs/frontend.log).
+      render: shell
+    validations:
+      required: true
+
+  - type: textarea
+    id: git_info
+    attributes:
+      label: Git state
+      description: Share output of git branch and latest commit SHA.
+      placeholder: |
+        branch: feature/my-branch
+        commit: abcdef1
+
+  - type: textarea
+    id: additional
+    attributes:
+      label: Additional context
+      description: Add anything else that might help triage.
@@ -1,119 +0,0 @@
-# Declarative label source of truth for DeerFlow.
-#
-# This file is the single source of truth for repository labels used by the
-# auto-labeling workflows (.github/workflows/pr-labeler.yml, pr-triage.yml,
-# issue-triage.yml). Auto-labelers can only apply labels that already exist,
-# so every label referenced by a workflow MUST be declared here.
-#
-# Apply with:  uv run --with pyyaml python scripts/sync_labels.py [--repo OWNER/NAME]
-# CI keeps it in sync via .github/workflows/label-sync.yml (runs on changes here).
-#
-# Sync is additive/update-only: it creates or updates the labels listed below
-# and never deletes labels that are not listed.
-#
-# Color = 6-digit hex without the leading '#'.
-
-labels:
-  # ── Type ─────────────────────────────────────────────────────────────────
-  # Mostly GitHub defaults; declared here so colors/descriptions stay stable
-  # and so issue templates can rely on them existing.
-  - name: bug
-    color: d73a4a
-    description: Something isn't working
-  - name: enhancement
-    color: a2eeef
-    description: New feature or request
-  - name: documentation
-    color: 0075ca
-    description: Improvements or additions to documentation
-  - name: question
-    color: d876e3
-    description: Further information is requested
-
-  # ── Area (auto, by changed paths — see .github/labeler.yml) ───────────────
-  # Mirrors the "Surface area" section of the pull request template.
-  - name: "area:frontend"
-    color: c5def5
-    description: Next.js frontend under frontend/
-  - name: "area:backend"
-    color: c5def5
-    description: Gateway / runtime / core backend under backend/
-  - name: "area:agents"
-    color: c5def5
-    description: Agents, subagents, graph wiring, prompts, langgraph.json
-  - name: "area:sandbox"
-    color: c5def5
-    description: Sandboxed execution and docker/
-  - name: "area:skills"
-    color: c5def5
-    description: Skills under skills/ or the skills harness
-  - name: "area:mcp"
-    color: c5def5
-    description: Model Context Protocol integration
-  - name: "area:ci"
-    color: c5def5
-    description: GitHub Actions, CI config, repo tooling
-  - name: "area:docs"
-    color: c5def5
-    description: Documentation and Markdown only
-  - name: "area:deps"
-    color: c5def5
-    description: Dependency manifests / lockfiles
-
-  # ── Size (auto, by additions + deletions — see pr-triage.yml) ─────────────
-  - name: "size/XS"
-    color: "009900"
-    description: PR changes < 20 lines
-  - name: "size/S"
-    color: 77bb00
-    description: PR changes 20-100 lines
-  - name: "size/M"
-    color: eebb00
-    description: PR changes 100-300 lines
-  - name: "size/L"
-    color: ee9900
-    description: PR changes 300-700 lines
-  - name: "size/XL"
-    color: ee5500
-    description: PR changes 700+ lines
-
-  # ── Risk (auto, by changed paths — see pr-triage.yml) ─────────────────────
-  - name: "risk:low"
-    color: 0e8a16
-    description: "Low risk: docs / i18n / assets only"
-  - name: "risk:medium"
-    color: fbca04
-    description: "Medium risk: regular code changes"
-  - name: "risk:high"
-    color: b60205
-    description: "High risk: backend API, agents, sandbox, auth, deps, CI"
-
-  # ── Priority (manual) ─────────────────────────────────────────────────────
-  - name: P0
-    color: b60205
-    description: Critical priority
-  - name: P1
-    color: d93f0b
-    description: Major priority
-  - name: P2
-    color: e99695
-    description: Normal priority
-
-  # ── Status (auto + manual) ────────────────────────────────────────────────
-  - name: needs-triage
-    color: fef2c0
-    description: Awaiting maintainer triage
-  - name: needs-validation
-    color: d4c5f9
-    description: Touches front/back contract surface; needs real-path validation
-  - name: skip-validation
-    color: cccccc
-    description: "Maintainer override: do not auto-add needs-validation on this PR"
-  - name: reviewing
-    color: 5319e7
-    description: A maintainer is reviewing this PR
-
-  # ── Contributor ───────────────────────────────────────────────────────────
-  - name: first-time-contributor
-    color: c2e0c6
-    description: First contribution to this repository — be welcoming
@@ -59,17 +59,3 @@ Fixes #
       Frontend:  cd frontend && pnpm format && pnpm lint && pnpm typecheck && BETTER_AUTH_SECRET=local-dev-secret pnpm build && make test
       Frontend E2E (if you touched frontend/): cd frontend && make test-e2e -->

-
-## AI assistance
-
-<!-- DeerFlow is an AI project — most PRs here use AI coding tools, and that's
-     welcome. Disclosing it just helps reviewers calibrate how closely to read the
-     diff. Please fill all three; don't delete the section. -->
-
-**Tool(s) used:** <!-- e.g. Claude Code, Cursor, GitHub Copilot, Codex, Windsurf, or "none" -->
-
-**How you used it:** <!-- e.g. "generated the module from a spec", "autocomplete only",
-     "AI wrote tests, I wrote the impl". A prompt or conversation link is great too. -->
-
- [ ] I've read and understand every line of this change and take responsibility for it — it's not unreviewed AI output.
-
@@ -1,38 +0,0 @@
-name: Label Sync
-
-# Keeps repository labels in sync with the declarative source of truth
-# (.github/labels.yml). Runs whenever that file changes on main, and can be
-# triggered manually. Additive/update-only — never deletes labels.
-
-on:
-  push:
-    branches: [main]
-    paths:
-      - ".github/labels.yml"
-      - "scripts/sync_labels.py"
-      - ".github/workflows/label-sync.yml"
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  issues: write
-
-concurrency:
-  group: label-sync
-  cancel-in-progress: false
-
-jobs:
-  sync:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v7
-
-      - name: Sync labels
-        run: uv run --with pyyaml python scripts/sync_labels.py
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          GH_REPO: ${{ github.repository }}
@@ -10,7 +10,7 @@ permissions:
  contents: read

 jobs:
-  lint-backend:
+  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
@@ -1,108 +0,0 @@
-name: Replay E2E (front-back contract)
-
-# Guards the front-back contract via record/replay (no API key in CI):
-#   Layer 1 — backend golden: replay a recorded trace through the real gateway,
-#             assert the SSE event sequence matches the committed golden.
-#   Layer 2 — full-stack render: real Next.js frontend + real gateway (replay
-#             model) + Chromium; assert the replayed turns render in the browser.
-# Triggered by changes on EITHER side of the contract so a backend change can no
-# longer pass without the frontend-facing checks running.
-
-on:
-  push:
-    branches: ["main"]
-    paths:
-      - "frontend/**"
-      - "backend/app/gateway/**"
-      - "backend/packages/harness/**"
-      - "backend/tests/fixtures/replay/**"
-      - "backend/tests/replay_provider.py"
-      - "backend/tests/_replay_fixture.py"
-      - "backend/tests/seed_runs_router.py"
-      - "backend/tests/test_replay_golden.py"
-      - "backend/scripts/run_replay_gateway.py"
-      - ".github/workflows/replay-e2e.yml"
-  pull_request:
-    types: [opened, synchronize, reopened, ready_for_review]
-    paths:
-      - "frontend/**"
-      - "backend/app/gateway/**"
-      - "backend/packages/harness/**"
-      - "backend/tests/fixtures/replay/**"
-      - "backend/tests/replay_provider.py"
-      - "backend/tests/_replay_fixture.py"
-      - "backend/tests/seed_runs_router.py"
-      - "backend/tests/test_replay_golden.py"
-      - "backend/scripts/run_replay_gateway.py"
-      - ".github/workflows/replay-e2e.yml"
-
-concurrency:
-  group: replay-e2e-${{ github.event.pull_request.number || github.ref }}
-  cancel-in-progress: true
-
-permissions:
-  contents: read
-
-jobs:
-  backend-replay-golden:
-    name: Layer 1 — backend golden (no API key)
-    if: github.event_name != 'pull_request' || github.event.pull_request.draft == false
-    runs-on: ubuntu-latest
-    timeout-minutes: 15
-    steps:
-      - uses: actions/checkout@v6
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version: "3.12"
-      - name: Install uv
-        uses: astral-sh/setup-uv@v7
-      - name: Install backend dependencies
-        working-directory: backend
-        run: uv sync --group dev
-      - name: Replay golden (backend SSE contract)
-        working-directory: backend
-        run: PYTHONPATH=. uv run pytest tests/test_replay_golden.py -v
-
-  fullstack-replay-render:
-    name: Layer 2 — full-stack render (no API key)
-    if: github.event_name != 'pull_request' || github.event.pull_request.draft == false
-    runs-on: ubuntu-latest
-    timeout-minutes: 25
-    steps:
-      - uses: actions/checkout@v6
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version: "3.12"
-      - name: Install uv
-        uses: astral-sh/setup-uv@v7
-      - name: Install backend dependencies (replay gateway)
-        working-directory: backend
-        run: uv sync --group dev
-      - name: Setup Node.js
-        uses: actions/setup-node@v4
-        with:
-          node-version: "22"
-      - name: Enable Corepack
-        run: corepack enable
-      - name: Use pinned pnpm version
-        run: corepack prepare pnpm@10.26.2 --activate
-      - name: Install frontend dependencies
-        working-directory: frontend
-        run: pnpm install --frozen-lockfile
-      - name: Install Playwright Chromium
-        working-directory: frontend
-        run: npx playwright install chromium --with-deps
-      - name: Full-stack replay render (DOM assertions are the gate)
-        working-directory: frontend
-        run: pnpm exec playwright test -c playwright.real-backend.config.ts
-      - name: Upload report + render artifact
-        uses: actions/upload-artifact@v4
-        if: ${{ !cancelled() }}
-        with:
-          name: replay-render
-          path: |
-            frontend/playwright-report/
-            frontend/test-results/
-          retention-days: 7
@@ -1,223 +0,0 @@
-name: Triage
-
-# One workflow for all event-driven PR/issue labeling. Replaces the former
-# pr-labeler / pr-triage / issue-triage workflows (and drops actions/labeler).
-#
-# Design notes:
-#   * All jobs are pure-metadata: they read changed-file lists / PR fields / the
-#     review payload via the API and write labels. PR code is NEVER checked out
-#     or executed, so pull_request_target is safe here.
-#   * Each job only reconciles labels in namespaces IT owns
-#     (area:* / size/* / risk:* / needs-validation). It never touches labels
-#     applied by maintainers or other tools (bug, priority, etc.). The
-#     first-time-contributor and reviewing labels are add-only.
-#   * State is read LIVE (listFiles + listLabelsOnIssue) at run time, not from
-#     the (stale) event payload, so rapid synchronize events converge instead
-#     of thrashing.
-
-on:
-  pull_request_target:
-    types: [opened, synchronize, reopened, ready_for_review]
-  pull_request_review:
-    types: [submitted]
-  issues:
-    types: [opened]
-
-permissions:
-  contents: read
-  pull-requests: write
-  issues: write
-
-jobs:
-  # ── PR: area / size / risk / needs-validation / first-time ─────────────────
-  pr-labels:
-    if: github.event_name == 'pull_request_target' && github.event.pull_request.draft == false
-    runs-on: ubuntu-latest
-    concurrency:
-      group: triage-pr-${{ github.event.pull_request.number }}
-      cancel-in-progress: true
-    steps:
-      - name: Apply PR labels from live state
-        uses: actions/github-script@v8
-        with:
-          script: |
-            const pr = context.payload.pull_request;
-            const { owner, repo } = context.repo;
-            const num = pr.number;
-
-            // ---- live changed files ----
-            const files = await github.paginate(github.rest.pulls.listFiles, {
-              owner, repo, pull_number: num, per_page: 100,
-            });
-            const paths = files.map(f => f.filename);
-            const m = (re) => paths.some(p => re.test(p));
-
-            // ---- area: replaces .github/labeler.yml (path -> area) ----
-            const AREA_RULES = [
-              ['area:frontend', [/^frontend\//]],
-              ['area:backend',  [/^backend\/app\//, /^backend\/packages\/harness\/deerflow\/(runtime|persistence|config|tools|guardrails|tracing|models|utils|uploads)\//]],
-              ['area:agents',   [/^backend\/packages\/harness\/deerflow\/(agents|subagents|reflection)\//, /(^|\/)langgraph\.json$/, /^backend\/.*\/prompts\//]],
-              ['area:sandbox',  [/^docker\//, /^backend\/packages\/harness\/deerflow\/sandbox\//, /(^|\/)Dockerfile$/]],
-              ['area:skills',   [/^skills\//, /^backend\/packages\/harness\/deerflow\/skills\//, /^frontend\/src\/core\/skills\//]],
-              ['area:mcp',      [/^backend\/packages\/harness\/deerflow\/mcp\//, /^frontend\/src\/core\/mcp\//]],
-              ['area:ci',       [/^\.github\//, /^scripts\//]],
-              ['area:docs',     [/^docs\//, /\.mdx?$/]],
-              ['area:deps',     [/(^|\/)(pyproject\.toml|uv\.lock|package\.json|pnpm-lock\.yaml)$/]],
-            ];
-            const areaLabels = AREA_RULES
-              .filter(([, res]) => res.some(re => m(re)))
-              .map(([label]) => label);
-
-            // ---- size: additions+deletions, excluding lockfiles/snapshots ----
-            const EXCLUDE_SIZE = /(^|\/)(uv\.lock|pnpm-lock\.yaml|package-lock\.json)$|\.snap$/;
-            const churn = files
-              .filter(f => !EXCLUDE_SIZE.test(f.filename))
-              .reduce((s, f) => s + (f.additions || 0) + (f.deletions || 0), 0);
-            const sizeLabel =
-              churn < 20 ? 'size/XS' :
-              churn < 100 ? 'size/S' :
-              churn < 300 ? 'size/M' :
-              churn < 700 ? 'size/L' : 'size/XL';
-
-            // ---- risk ----
-            const docsOnly = paths.length > 0 && paths.every(p =>
-              /\.(md|mdx|txt)$/i.test(p) || p.startsWith('docs/') ||
-              /\.(png|jpe?g|gif|svg|webp|ico)$/i.test(p));
-            const highRisk =
-              m(/^backend\/app\/gateway\//) ||
-              m(/^backend\/packages\/harness\/deerflow\/(agents|subagents|sandbox)\//) ||
-              m(/(^|\/)langgraph\.json$/) ||
-              m(/(^|\/)(auth|authz|security)/i) ||
-              m(/(pyproject\.toml|uv\.lock|package\.json|pnpm-lock\.yaml)$/) ||
-              m(/^docker\//) ||
-              m(/^\.github\/workflows\//);
-            const riskLabel = docsOnly ? 'risk:low' : (highRisk ? 'risk:high' : 'risk:medium');
-
-            // ---- needs-validation: front/back contract surface ----
-            const contract =
-              m(/^backend\/app\/gateway\//) ||
-              m(/^backend\/packages\/harness\/deerflow\/(agents|subagents)\//) ||
-              m(/(^|\/)langgraph\.json$/) ||
-              m(/^frontend\/src\/core\/(api|threads|messages)\//);
-
-            // ---- live current labels (NOT the stale event payload) ----
-            const current = (await github.paginate(github.rest.issues.listLabelsOnIssue, {
-              owner, repo, issue_number: num, per_page: 100,
-            })).map(l => l.name);
-            const hasSkip = current.includes('skip-validation');
-
-            // Reconcile ONLY namespaces we own; never touch others.
-            const owned = (n) =>
-              n.startsWith('area:') || n.startsWith('size/') ||
-              n.startsWith('risk:') || n === 'needs-validation';
-            const desired = new Set([...areaLabels, sizeLabel, riskLabel]);
-            if (contract && !hasSkip) desired.add('needs-validation');
-
-            const toRemove = current.filter(n => owned(n) && !desired.has(n));
-            const toAdd = [...desired].filter(n => !current.includes(n));
-
-            // first-time-contributor: add-only, on opened, real users only.
-            if (context.payload.action === 'opened' &&
-                pr.user.type === 'User' &&
-                ['FIRST_TIME_CONTRIBUTOR', 'FIRST_TIMER'].includes(pr.author_association) &&
-                !current.includes('first-time-contributor')) {
-              toAdd.push('first-time-contributor');
-            }
-
-            for (const name of toRemove) {
-              try {
-                await github.rest.issues.removeLabel({ owner, repo, issue_number: num, name });
-              } catch (e) {
-                if (e.status !== 404) throw e;
-              }
-            }
-            if (toAdd.length) {
-              await github.rest.issues.addLabels({ owner, repo, issue_number: num, labels: toAdd });
-            }
-            core.info(`area=[${areaLabels.join(',')}] ${sizeLabel} ${riskLabel} churn=${churn} ` +
-              `validation=${desired.has('needs-validation')} ` +
-              `(+${toAdd.join(',') || '-'} / -${toRemove.join(',') || '-'})`);
-
-  # ── PR: reviewing label on a maintainer's human review ─────────────────────
-  reviewing:
-    if: github.event_name == 'pull_request_review'
-    runs-on: ubuntu-latest
-    concurrency:
-      group: triage-review-${{ github.event.pull_request.number }}
-      cancel-in-progress: false
-    steps:
-      - name: Add reviewing label for maintainer reviews
-        uses: actions/github-script@v8
-        with:
-          script: |
-            const { owner, repo } = context.repo;
-            const num = context.payload.pull_request.number;
-            const review = context.payload.review;
-            const assoc = review.author_association;     // payload field; no API call
-            const type = review.user && review.user.type;
-
-            // author_association is NONE for every automated reviewer
-            // (Copilot, CodeRabbit, Codex, Sourcery, ...), so this allowlist
-            // drops them all without a denylist — and never calls the
-            // collaborators API that 404s on "Copilot is not a user".
-            // user.type === 'User' guards the rare bot-added-as-collaborator case.
-            if (!['OWNER', 'MEMBER', 'COLLABORATOR'].includes(assoc) || type !== 'User') {
-              core.info(`reviewer ${review.user && review.user.login} assoc=${assoc} type=${type}; skipping.`);
-              return;
-            }
-
-            const labels = (await github.paginate(github.rest.issues.listLabelsOnIssue, {
-              owner, repo, issue_number: num, per_page: 100,
-            })).map(l => l.name);
-            if (labels.includes('reviewing')) {
-              core.info('Already labeled reviewing; skipping.');
-              return;
-            }
-            try {
-              await github.rest.issues.addLabels({
-                owner, repo, issue_number: num, labels: ['reviewing'],
-              });
-              core.info('Added "reviewing".');
-            } catch (e) {
-              if (e.status === 403) core.info('No permission to label (expected on some fork PRs).');
-              else throw e;
-            }
-
-  # ── Issue: needs-triage on every new issue ────────────────────────────────
-  issue-triage:
-    if: github.event_name == 'issues'
-    runs-on: ubuntu-latest
-    concurrency:
-      group: triage-issue-${{ github.event.issue.number }}
-      cancel-in-progress: false
-    steps:
-      - name: Add needs-triage label
-        uses: actions/github-script@v8
-        with:
-          script: |
-            const { owner, repo } = context.repo;
-            const issue_number = context.payload.issue.number;
-
-            // Read live labels (not the event payload) so labels added at creation
-            // time via the API or by another automation are seen — consistent with
-            // the live-state reads in the PR jobs above.
-            const current = (await github.paginate(github.rest.issues.listLabelsOnIssue, {
-              owner, repo, issue_number, per_page: 100,
-            })).map(l => l.name);
-            if (current.includes('needs-triage')) {
-              core.info('Issue already has needs-triage; nothing to do.');
-              return;
-            }
-            // Self-heal: create the label if it does not exist yet.
-            try {
-              await github.rest.issues.createLabel({
-                owner, repo, name: 'needs-triage', color: 'fef2c0',
-                description: 'Awaiting maintainer triage',
-              });
-            } catch (e) {
-              if (e.status !== 422) throw e; // 422 = already exists
-            }
-            await github.rest.issues.addLabels({
-              owner, repo, issue_number, labels: ['needs-triage'],
-            });
-            core.info(`Added needs-triage to #${issue_number}.`);
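The reconciliation rule used by the removed `pr-labels` job (diff the desired labels against the live ones, and only ever remove labels in owned namespaces) can be sketched in Python. This is an illustrative port, not the workflow's own code: the function names `size_label`, `is_owned`, and `reconcile` are hypothetical, while the thresholds and prefixes are copied from the script above. Unlike the script, `to_add` is sorted here purely to make the output deterministic.

```python
def size_label(churn: int) -> str:
    """Map additions + deletions to a size label using the workflow's thresholds."""
    if churn < 20:
        return "size/XS"
    if churn < 100:
        return "size/S"
    if churn < 300:
        return "size/M"
    if churn < 700:
        return "size/L"
    return "size/XL"


def is_owned(name: str) -> bool:
    """True for labels in the namespaces the triage job manages."""
    return name.startswith(("area:", "size/", "risk:")) or name == "needs-validation"


def reconcile(current: list[str], desired: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove); labels outside owned namespaces are never removed."""
    to_remove = [n for n in current if is_owned(n) and n not in desired]
    to_add = sorted(n for n in desired if n not in current)
    return to_add, to_remove


# Example: a maintainer-applied "bug" label survives, a stale size label is replaced.
to_add, to_remove = reconcile(
    ["bug", "size/S", "area:docs"],
    {"area:docs", size_label(150), "risk:low"},
)
```

Because `is_owned` gates removal, a label such as `bug` is left alone even though it is absent from the desired set.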
@@ -287,21 +287,6 @@ Nginx (port 2026) ← Unified entry point
   git push origin feature/your-feature-name
   ```

-## AI assistance disclosure
-
-DeerFlow is an AI project and we welcome AI-assisted contributions. To help
-reviewers calibrate how closely to read a change, **every pull request must
-complete the "AI assistance" section of the
-[PR template](.github/pull_request_template.md)**:
-
-- which tool(s) you used (or `none`),
-- how you used them, and
-- a confirmation that a human has read, understands, and takes responsibility
-  for the change.
-
-Please don't delete the section. PRs that ignore it may be asked to fill it in
-before review.
-
 ## Testing

 ```bash
@@ -89,7 +89,36 @@ install:

 # Pre-pull sandbox Docker image (optional but recommended)
 setup-sandbox:
-	@$(RUN_WITH_GIT_BASH) ./scripts/setup-sandbox.sh
+	@echo "=========================================="
+	@echo "  Pre-pulling Sandbox Container Image"
+	@echo "=========================================="
+	@echo ""
+	@IMAGE=$$(grep -A 20 "# sandbox:" config.yaml 2>/dev/null | grep "image:" | awk '{print $$2}' | head -1); \
+	if [ -z "$$IMAGE" ]; then \
+		IMAGE="enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest"; \
+		echo "Using default image: $$IMAGE"; \
+	else \
+		echo "Using configured image: $$IMAGE"; \
+	fi; \
+	echo ""; \
+	if command -v container >/dev/null 2>&1 && [ "$$(uname)" = "Darwin" ]; then \
+		echo "Detected Apple Container on macOS, pulling image..."; \
+		container image pull "$$IMAGE" || echo "⚠ Apple Container pull failed, will try Docker"; \
+	fi; \
+	if command -v docker >/dev/null 2>&1; then \
+		echo "Pulling image using Docker..."; \
+		if docker pull "$$IMAGE"; then \
+			echo ""; \
+			echo "✓ Sandbox image pulled successfully"; \
+		else \
+			echo ""; \
+			echo "⚠ Failed to pull sandbox image (this is OK for local sandbox mode)"; \
+		fi; \
+	else \
+		echo "✗ Neither Docker nor Apple Container is available"; \
+		echo "  Please install Docker: https://docs.docker.com/get-docker/"; \
+		exit 1; \
+	fi
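The image lookup at the top of the recipe (`grep -A 20 ... | grep "image:" | awk '{print $2}' | head -1`, with a fallback default) can be mirrored in Python to make its behavior easier to reason about. This is only a sketch: `resolve_sandbox_image` is a hypothetical helper, and the image name in the example config is a placeholder rather than a real registry path.

```python
DEFAULT_IMAGE = "enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest"


def resolve_sandbox_image(config_text: str, default: str = DEFAULT_IMAGE) -> str:
    """Return the second field of the first "image:" line near a "# sandbox:" marker."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        if "# sandbox:" not in line:
            continue
        # grep -A 20 keeps the matching line plus the 20 lines after it.
        for candidate in lines[i : i + 21]:
            if "image:" in candidate:
                fields = candidate.split()  # awk-style whitespace splitting
                image = fields[1] if len(fields) > 1 else ""
                return image or default  # an empty value falls back to the default
    return default


example_config = (
    "# sandbox:\n"
    "sandbox:\n"
    "  image: registry.example/sandbox:1.0\n"  # placeholder image name
)
configured = resolve_sandbox_image(example_config)
fallback = resolve_sandbox_image("models: []\n")
```

Note that the lookup depends on a line containing `# sandbox:`; a config without that marker always resolves to the default image.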

 # Start all services in development mode (with hot-reloading)
 dev:
@@ -119,6 +148,7 @@ stop:
 clean: stop
 	@echo "Cleaning up..."
 	@-rm -rf backend/.deer-flow 2>/dev/null || true
+	@-rm -rf backend/.langgraph_api 2>/dev/null || true
 	@-rm -rf logs/*.log 2>/dev/null || true
 	@echo "✓ Cleanup complete"

@@ -247,9 +247,6 @@ Access: http://localhost:2026

 The unified nginx endpoint is same-origin by default and does not emit browser CORS headers. If you run a split-origin or port-forwarded browser client, set `GATEWAY_CORS_ORIGINS` to comma-separated exact origins such as `http://localhost:3000`; the Gateway then applies the CORS allowlist and matching CSRF origin checks.

-> [!IMPORTANT]
-> The Gateway holds run state (RunManager and the stream bridge) in process, so production defaults to a single Gateway worker (`GATEWAY_WORKERS=1`). Raising the worker count without a shared cross-worker stream bridge — which is not yet available — breaks run cancellation, SSE reconnects, request de-duplication, and IM channels, because nginx uses no sticky sessions and each worker keeps its own run state. Scale a single worker up with more CPU/RAM (or move the database and sandbox onto dedicated tiers) instead of raising `GATEWAY_WORKERS`.
-
 See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed Docker development guide.

 #### Option 2: Local Development
@@ -343,8 +340,6 @@ See the [MCP Server Guide](backend/docs/MCP_SERVER.md) for detailed instructions

 DeerFlow supports receiving tasks from messaging apps. Channels auto-start when configured — no public IP required for any of them.

-DeerFlow can also expose user-owned IM channel connections in the workspace UI. When `channel_connections` is enabled, logged-in users can bind Telegram, Slack, Discord, Feishu/Lark, DingTalk, WeChat, or WeCom from the sidebar / Settings > Channels. It reuses the existing outbound `channels.*` transports, so no public IP or provider callback URL is required. Incoming IM messages then run under the connected DeerFlow user account. See [IM Channel Connections](backend/docs/IM_CHANNEL_CONNECTIONS.md) for setup and security notes.
-
 | Channel | Transport | Difficulty |
 |---------|-----------|------------|
 | Telegram | Bot API (long-polling) | Easy |
@@ -590,8 +585,6 @@ A standard Agent Skill is a structured capability module — a Markdown file tha

 Skills are loaded progressively — only when the task needs them, not all at once. This keeps the context window lean and makes DeerFlow work well even with token-sensitive models.

-Users can explicitly activate an enabled skill for a single turn by starting the request with `/skill-name`, for example `/data-analysis analyze uploads/foo.csv`. DeerFlow loads that skill's `SKILL.md` as hidden current-turn context while leaving the base prompt limited to skill metadata. Slash activation respects disabled skills, custom-agent skill whitelists, and existing channel commands such as `/new` and `/help`.
-
 When you install `.skill` archives through the Gateway, DeerFlow accepts standard optional frontmatter metadata such as `version`, `author`, and `compatibility` instead of rejecting otherwise valid external skills.

 Tools follow the same philosophy. DeerFlow comes with a core toolset — web search, web fetch, file operations, bash execution — and supports custom tools via MCP servers and Python functions. Swap anything. Add anything.
@@ -24,10 +24,5 @@ config.yaml
 # Langgraph
 .langgraph_api

-# Sandbox runtime working dir — pre-created and excluded from uvicorn reload
-# (scripts/serve.sh, docker/dev-entrypoint.sh). Anchored so it does not match
-# the source package backend/packages/harness/deerflow/sandbox/.
-/sandbox/
-
 # Claude Code settings
 .claude/settings.local.json
@@ -122,14 +122,10 @@ Blocking-IO runtime gate (`tests/blocking_io/`):
  `tests/support/detectors/blocking_io_runtime.py`). Any sync blocking IO
  call whose stack passes through DeerFlow business code while running on
  the asyncio event loop raises `BlockingError` and fails the test.
-- Regression anchors live there: `test_skills_load.py` (locks the
+- Two regression anchors live there: `test_skills_load.py` (locks the
  `asyncio.to_thread` offload around `LocalSkillStorage.load_skills`, fix
-  for #1917); `test_sqlite_lifespan.py` (locks the offload around
-  SQLite path resolution plus `ensure_sqlite_parent_dir`, fix for #1912);
-  `test_jsonl_run_event_store.py` (locks `JsonlRunEventStore`'s async
-  API offloading its file IO via `asyncio.to_thread`, fix #3084); and
-  `test_uploads_middleware.py` (locks `UploadsMiddleware.abefore_agent`
-  offloading the uploads-directory scan off the event loop).
+  for #1917) and `test_sqlite_lifespan.py` (locks the offload around
+  SQLite path resolution plus `ensure_sqlite_parent_dir`, fix for #1912).
 - `test_gate_smoke.py` is a meta-test asserting the gate actually catches
  unoffloaded blocking IO and that the `@pytest.mark.allow_blocking_io`
  opt-out works.
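The offload pattern that these regression anchors lock in can be illustrated with a minimal sketch. The loader below is a hypothetical stand-in, not `LocalSkillStorage.load_skills` itself; only `asyncio.to_thread` is taken from the text above.

```python
import asyncio
import time


def load_skills_blocking() -> list[str]:
    """Hypothetical synchronous loader that performs blocking IO."""
    time.sleep(0.01)  # simulates a blocking file-system scan
    return ["data-analysis"]


async def load_skills_async() -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(load_skills_blocking)


result = asyncio.run(load_skills_async())
```

Calling `load_skills_blocking()` directly inside a coroutine would stall the event loop for the duration of the IO, which is the situation the runtime gate is described as catching.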
@@ -192,7 +188,7 @@ from deerflow.config import get_app_config

 ### Middleware Chain

-Lead-agent middlewares are assembled in strict append order across `packages/harness/deerflow/agents/middlewares/tool_error_handling_middleware.py` (`build_lead_runtime_middlewares`) and `packages/harness/deerflow/agents/lead_agent/agent.py` (`build_middlewares`):
+Lead-agent middlewares are assembled in strict append order across `packages/harness/deerflow/agents/middlewares/tool_error_handling_middleware.py` (`build_lead_runtime_middlewares`) and `packages/harness/deerflow/agents/lead_agent/agent.py` (`_build_middlewares`):

 1. **ThreadDataMiddleware** - Creates per-thread directories under the user's isolation scope (`backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/{workspace,uploads,outputs}`); resolves `user_id` via `get_effective_user_id()` (falls back to `"default"` in no-auth mode); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local thread directory
 2. **UploadsMiddleware** - Tracks and injects newly uploaded files into conversation
@@ -202,17 +198,16 @@ Lead-agent middlewares are assembled in strict append order across `packages/har
 6. **GuardrailMiddleware** - Pre-tool-call authorization via pluggable `GuardrailProvider` protocol (optional, if `guardrails.enabled` in config). Evaluates each tool call and returns error ToolMessage on deny. Three provider options: built-in `AllowlistProvider` (zero deps), OAP policy providers (e.g. `aport-agent-guardrails`), or custom providers. See [docs/GUARDRAILS.md](docs/GUARDRAILS.md) for setup, usage, and how to implement a provider.
 7. **SandboxAuditMiddleware** - Audits sandboxed shell/file operations for security logging before tool execution continues
 8. **ToolErrorHandlingMiddleware** - Converts tool exceptions into error `ToolMessage`s so the run can continue instead of aborting
-9. **SkillActivationMiddleware** - Detects strict `/skill-name task` syntax on the latest real user message, resolves only enabled and runtime-allowed skills, reads `SKILL.md` from trusted skill storage, injects the skill body as hidden current-turn model context, and records a `middleware:skill_activation` audit event with skill name, category, path, and content hash
-10. **SummarizationMiddleware** - Context reduction when approaching token limits (optional, if enabled)
-11. **TodoListMiddleware** - Task tracking with `write_todos` tool (optional, if plan_mode)
-12. **TokenUsageMiddleware** - Records token usage metrics when token tracking is enabled (optional); subagent usage is cached by `tool_call_id` only while token usage is enabled and merged back into the dispatching AIMessage by message position rather than message id
-13. **TitleMiddleware** - Auto-generates thread title after first complete exchange and normalizes structured message content before prompting the title model
-14. **MemoryMiddleware** - Queues conversations for async memory update (filters to user + final AI responses)
-15. **ViewImageMiddleware** - Injects base64 image data before LLM call (conditional on vision support)
-16. **DeferredToolFilterMiddleware** - Hides deferred (MCP) tool schemas from the bound model using a build-time deferred-name set + catalog hash, reading per-thread promotions from `ThreadState.promoted` (hash-scoped, no ContextVar); a tool becomes bound on subsequent turns after `tool_search` returns its schema (optional, if `tool_search.enabled`)
-17. **SubagentLimitMiddleware** - Truncates excess `task` tool calls from model response to enforce `MAX_CONCURRENT_SUBAGENTS` limit (optional, if `subagent_enabled`)
-18. **LoopDetectionMiddleware** - Detects repeated tool-call loops; hard-stop responses clear both structured `tool_calls` and raw provider tool-call metadata before forcing a final text answer
-19. **ClarificationMiddleware** - Intercepts `ask_clarification` tool calls, interrupts via `Command(goto=END)` (must be last)
+9. **SummarizationMiddleware** - Context reduction when approaching token limits (optional, if enabled)
+10. **TodoListMiddleware** - Task tracking with `write_todos` tool (optional, if plan_mode)
+11. **TokenUsageMiddleware** - Records token usage metrics when token tracking is enabled (optional); subagent usage is cached by `tool_call_id` only while token usage is enabled and merged back into the dispatching AIMessage by message position rather than message id
+12. **TitleMiddleware** - Auto-generates thread title after first complete exchange and normalizes structured message content before prompting the title model
+13. **MemoryMiddleware** - Queues conversations for async memory update (filters to user + final AI responses)
+14. **ViewImageMiddleware** - Injects base64 image data before LLM call (conditional on vision support)
+15. **DeferredToolFilterMiddleware** - Hides deferred tool schemas from the bound model until tool search is enabled (optional)
+16. **SubagentLimitMiddleware** - Truncates excess `task` tool calls from model response to enforce `MAX_CONCURRENT_SUBAGENTS` limit (optional, if `subagent_enabled`)
+17. **LoopDetectionMiddleware** - Detects repeated tool-call loops; hard-stop responses clear both structured `tool_calls` and raw provider tool-call metadata before forcing a final text answer
+18. **ClarificationMiddleware** - Intercepts `ask_clarification` tool calls, interrupts via `Command(goto=END)` (must be last)
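A minimal sketch of "strict append order with optional entries", using a subset of the middleware names from the list above. The `ChainFlags` dataclass and `build_chain` function are hypothetical and stand in for whatever switches the real builders consult; strings replace the actual middleware objects.

```python
from dataclasses import dataclass


@dataclass
class ChainFlags:
    """Hypothetical feature switches, assumed to come from configuration."""

    summarization_enabled: bool = False
    plan_mode: bool = False
    subagent_enabled: bool = False


def build_chain(flags: ChainFlags) -> list[str]:
    """Append middlewares in a fixed order, skipping disabled optional ones."""
    chain = ["ThreadDataMiddleware", "UploadsMiddleware"]
    if flags.summarization_enabled:
        chain.append("SummarizationMiddleware")
    if flags.plan_mode:
        chain.append("TodoListMiddleware")
    chain.append("TitleMiddleware")
    chain.append("MemoryMiddleware")
    if flags.subagent_enabled:
        chain.append("SubagentLimitMiddleware")
    chain.append("LoopDetectionMiddleware")
    chain.append("ClarificationMiddleware")  # must stay last
    return chain


chain = build_chain(ChainFlags(plan_mode=True))
```

Because every entry is appended unconditionally or behind a single flag, the relative order of any two enabled middlewares is the same in every configuration.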

 ### Configuration System

@@ -224,9 +219,17 @@ Setup: Copy `config.example.yaml` to `config.yaml` in the **project root** direc

 **Config Caching**: `get_app_config()` caches the parsed config, but automatically reloads it when the resolved config path changes or the file's mtime increases. This keeps Gateway and LangGraph reads aligned with `config.yaml` edits without requiring a manual process restart.
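The caching rule described above (reuse the parsed config until the resolved path changes or the file's mtime increases) can be sketched as follows. This is not the `get_app_config()` implementation: `get_cached_config` and the module-level `_cache` are hypothetical, and the file's raw text stands in for a parsed `AppConfig`.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical single-entry cache: last resolved path, its mtime, and the loaded value.
_cache: dict = {"path": None, "mtime": None, "value": None}


def get_cached_config(path: Path) -> str:
    """Return cached content, reloading when the path changes or the mtime increases."""
    resolved = path.resolve()
    mtime = resolved.stat().st_mtime
    stale = (
        _cache["path"] != resolved
        or _cache["mtime"] is None
        or mtime > _cache["mtime"]
    )
    if stale:
        _cache.update(path=resolved, mtime=mtime, value=resolved.read_text())
    return _cache["value"]


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yaml"
    cfg.write_text("log_level: info\n")
    first = get_cached_config(cfg)

    cfg.write_text("log_level: debug\n")
    stat = cfg.stat()
    os.utime(cfg, (stat.st_atime, stat.st_mtime + 10))  # force a newer mtime
    second = get_cached_config(cfg)
```

The explicit `os.utime` call is only there so the example does not depend on file-system timestamp resolution.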

-**Config Hot-Reload Boundary**: Gateway dependencies route through `get_app_config()` on every request, so per-run fields like `models[*].max_tokens`, `summarization.*`, `title.*`, `memory.*`, `subagents.*`, `tools[*]`, and the agent system prompt pick up `config.yaml` edits on the next message. `AppConfig` is intentionally **not** cached on `app.state` — `lifespan()` keeps a local `startup_config` variable for one-shot bootstrap work and passes it to `langgraph_runtime(app, startup_config)`.
+**Config Hot-Reload Boundary**: Gateway dependencies route through `get_app_config()` on every request, so per-run fields like `models[*].max_tokens`, `summarization.*`, `title.*`, `memory.*`, `subagents.*`, `tools[*]`, and the agent system prompt pick up `config.yaml` edits on the next message. `AppConfig` is intentionally **not** cached on `app.state` — `lifespan()` keeps a local `startup_config` variable for one-shot bootstrap work (logging level, channels, `langgraph_runtime` engines) and passes it explicitly to `langgraph_runtime(app, startup_config)`. Infrastructure fields are **restart-required**:

-Infrastructure fields are **restart-required**. The authoritative list lives in `packages/harness/deerflow/config/reload_boundary.py::STARTUP_ONLY_FIELDS` and is mirrored by the standardised `"startup-only:"` prefix on the corresponding `Field(description=...)` in `AppConfig`, so IDE hover on those fields surfaces the reason inline (no need to context-switch into this table). Currently registered: `database`, `checkpointer`, `run_events`, `stream_bridge`, `sandbox`, `log_level`, `channels`. Adding a new restart-required field requires updating the registry; drift is pinned by `tests/test_reload_boundary.py`.
+| Field | Why a restart is required |
+|---|---|
+| `database.*` | `init_engine_from_config()` runs once during `langgraph_runtime()` startup; the SQLAlchemy engine holds the connection pool. |
+| `checkpointer.*` (including SQLite WAL/journal settings) | `make_checkpointer()` binds the persistent checkpointer once at startup. |
+| `run_events.*` | `make_run_event_store()` selects memory- vs. SQL-backed implementation at startup. |
+| `stream_bridge.*` | `make_stream_bridge()` constructs the bridge object once. |
+| `sandbox.use` | `get_sandbox_provider()` caches the provider singleton (`_default_sandbox_provider`); a new class path takes effect only on next process start. |
+| `log_level` | `apply_logging_level()` is called only in `app.py` startup; it mutates the root logger's level, and `get_app_config()` returning a fresh `AppConfig` does not retrigger it. |
+| `channels.*` IM platform credentials | `start_channel_service()` is invoked once during startup; live channels are not rebuilt on config change. |

 Configuration priority:
 1. Explicit `config_path` argument
@@ -264,7 +267,7 @@ CORS is same-origin by default when requests enter through nginx on port 2026. S
 | **Uploads** (`/api/threads/{id}/uploads`) | `POST /` - upload files (auto-converts PDF/PPT/Excel/Word); `GET /list` - list; `DELETE /{filename}` - delete |
 | **Threads** (`/api/threads/{id}`) | `DELETE /` - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
 | **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; active content types (`text/html`, `application/xhtml+xml`, `image/svg+xml`) are always forced as download attachments to reduce XSS risk; `?download=true` still forces download for other file types |
-| **Suggestions** (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized and inline reasoning (`<think>...</think>`, including unclosed/truncated blocks from reasoning models like MiniMax-M3) is stripped before JSON parsing |
+| **Suggestions** (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
 | **Thread Runs** (`/api/threads/{id}/runs`) | `POST /` - create background run; `POST /stream` - create + SSE stream; `POST /wait` - create + block; `GET /` - list runs; `GET /{rid}` - run details; `POST /{rid}/cancel` - cancel; `GET /{rid}/join` - join SSE; `GET /{rid}/messages` - paginated messages `{data, has_more}`; `GET /{rid}/events` - full event stream; `GET /../messages` - thread messages with feedback; `GET /../token-usage` - aggregate tokens |
 | **Feedback** (`/api/threads/{id}/runs/{rid}/feedback`) | `PUT /` - upsert feedback; `DELETE /` - delete user feedback; `POST /` - create feedback; `GET /` - list feedback; `GET /stats` - aggregate stats; `DELETE /{fid}` - delete specific |
 | **Runs** (`/api/runs`) | `POST /stream` - stateless run + SSE; `POST /wait` - stateless run + block; `GET /{rid}/messages` - paginated messages by run_id `{data, has_more}` (cursor: `after_seq`/`before_seq`); `GET /{rid}/feedback` - list feedback by run_id |
@@ -274,7 +277,6 @@ CORS is same-origin by default when requests enter through nginx on port 2026. S
 - When a persistent `RunStore` is configured, `get()` and `list_by_thread()` hydrate historical runs from the store. In-memory records win for the same `run_id` so task, abort, and stream-control state stays attached to active local runs.
 - `cancel()` and `create_or_reject(..., multitask_strategy="interrupt"|"rollback")` persist interrupted status through `RunStore.update_status()`, matching normal `set_status()` transitions.
 - Store-only hydrated runs are readable history. If the current worker has no in-memory task/control state for that run, cancellation APIs can return 409 because this worker cannot stop the task.
-- `POST /wait` (both thread-scoped and `/api/runs/wait`) drains the stream bridge via `wait_for_run_completion()` instead of bare `await record.task`, so it honours the run's `on_disconnect` setting and cancels the background run on real client disconnect rather than returning a stale checkpoint (issue #3265).
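The "in-memory records win" rule for hydrated runs can be sketched as a simple merge keyed by `run_id`. The record shape (plain dicts with `status` and `has_task` keys) and the `merge_runs` helper are assumptions made for illustration; the real `RunManager` types are not shown in this document.

```python
def merge_runs(in_memory: dict[str, dict], stored: list[dict]) -> list[dict]:
    """Combine store-hydrated history with live records; live records win per run_id."""
    merged = {row["run_id"]: row for row in stored}
    merged.update(in_memory)  # in-memory entries overwrite stored ones
    return list(merged.values())


stored_rows = [
    {"run_id": "a", "status": "running"},
    {"run_id": "b", "status": "success"},
]
live_records = {"a": {"run_id": "a", "status": "running", "has_task": True}}
by_id = {r["run_id"]: r for r in merge_runs(live_records, stored_rows)}
```

Here run `a` keeps its live record (and with it the task handle a worker would need to cancel it), while run `b` is readable history only.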

 Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runtime, all other `/api/*` → Gateway REST APIs.

@@ -306,7 +308,6 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti
 **Concurrency**: `MAX_CONCURRENT_SUBAGENTS = 3` enforced by `SubagentLimitMiddleware` (truncates excess tool calls in `after_model`), 15-minute timeout
 **Flow**: `task()` tool → `SubagentExecutor` → background thread → poll 5s → SSE events → result
 **Events**: `task_started`, `task_running`, `task_completed`/`task_failed`/`task_timed_out`
-**Deferred MCP tools** (if `tool_search.enabled`): `SubagentExecutor._build_initial_state` assembles deferral after policy filtering via the shared `assemble_deferred_tools` (fail-closed), appends the `tool_search` tool, injects the `<available-deferred-tools>` section into the subagent's `SystemMessage`, and threads the setup to `_create_agent`, which attaches `DeferredToolFilterMiddleware` through `build_subagent_runtime_middlewares(deferred_setup=...)`. Subagents thus withhold full MCP schemas until promotion, same as the lead agent; each task run gets a fresh `ThreadState` so promotion is isolated per run
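A minimal sketch of the truncation that `SubagentLimitMiddleware` is described as performing. Tool calls are modeled as plain dicts, and keeping the first `limit` calls to `task` (rather than some other selection) is an assumption; `truncate_task_calls` is a hypothetical name.

```python
MAX_CONCURRENT_SUBAGENTS = 3


def truncate_task_calls(
    tool_calls: list[dict], limit: int = MAX_CONCURRENT_SUBAGENTS
) -> list[dict]:
    """Keep every non-task call, but only the first `limit` calls to the `task` tool."""
    kept: list[dict] = []
    seen = 0
    for call in tool_calls:
        if call.get("name") == "task":
            seen += 1
            if seen > limit:
                continue  # drop excess subagent dispatches
        kept.append(call)
    return kept


calls = [{"name": "task", "id": i} for i in range(5)] + [{"name": "bash", "id": 99}]
kept_ids = [c["id"] for c in truncate_task_calls(calls)]
```

Other tool calls in the same model response pass through untouched, so only subagent dispatch is capped.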

 ### Tool System (`packages/harness/deerflow/tools/`)

@@ -341,7 +342,7 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti
 - **Cache invalidation**: Detects config file changes via mtime comparison
 - **Transports**: stdio (command-based), SSE, HTTP
 - **OAuth (HTTP/SSE)**: Supports token endpoint flows (`client_credentials`, `refresh_token`) with automatic token refresh + Authorization header injection
-- **Runtime updates**: Gateway API saves to extensions_config.json; the Gateway-embedded runtime detects changes via mtime
+- **Runtime updates**: Gateway API saves to extensions_config.json; LangGraph detects changes via mtime

 ### Skills System (`packages/harness/deerflow/skills/`)

@@ -349,7 +350,6 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti
 - **Format**: Directory with `SKILL.md` (YAML frontmatter: name, description, license, allowed-tools)
 - **Loading**: `load_skills()` recursively scans `skills/{public,custom}` for `SKILL.md`, parses metadata, and reads enabled state from extensions_config.json
 - **Injection**: Enabled skills listed in agent system prompt with container paths
-- **Slash activation**: `/skill-name task` loads that enabled skill's `SKILL.md` for the current model call only. The resolver rejects leading whitespace, missing separators, reserved channel commands (`/new`, `/help`, `/bootstrap`, `/status`, `/models`, `/memory`), disabled skills, and skills outside a custom agent's whitelist.
 - **Installation**: `POST /api/skills/install` extracts .skill ZIP archive to custom/ directory

 ### Model Factory (`packages/harness/deerflow/models/factory.py`)
@@ -369,7 +369,8 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti

 ### IM Channels System (`app/channels/`)

-Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk) to the DeerFlow agent via Gateway's LangGraph-compatible API.
+Bridges external messaging platforms (Feishu, Slack, Telegram, DingTalk) to the DeerFlow agent via the LangGraph Server.
+

 **Architecture**: Channels communicate with Gateway through the `langgraph-sdk` HTTP client (same as the frontend), ensuring threads are created and managed server-side. The internal SDK client injects process-local internal auth plus a matching CSRF cookie/header pair so Gateway accepts state-changing thread/run requests from channel workers without relying on browser session cookies.

@@ -379,21 +380,18 @@ Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk
 - `manager.py` - Core dispatcher: creates threads via `client.threads.create()`, routes commands, keeps Slack/Telegram on `client.runs.wait()`, and uses `client.runs.stream(["messages-tuple", "values"])` for Feishu incremental outbound updates
 - `base.py` - Abstract `Channel` base class (start/stop/send lifecycle)
 - `service.py` - Manages lifecycle of all configured channels from `config.yaml`
- `slack.py` / `feishu.py` / `telegram.py` / `discord.py` / `dingtalk.py` - Platform-specific implementations (`feishu.py` tracks the running card `message_id` in memory and patches the same card in place; `dingtalk.py` optionally uses AI Card streaming for in-place updates when `card_template_id` is configured)
- `app/gateway/routers/channel_connections.py` - Browser-facing user connection and disconnect APIs
- `deerflow.persistence.channel_connections` - SQL-backed user-owned connection, optional credential, connect state, and conversation store
+- `slack.py` / `feishu.py` / `telegram.py` / `dingtalk.py` - Platform-specific implementations (`feishu.py` tracks the running card `message_id` in memory and patches the same card in place; `dingtalk.py` optionally uses AI Card streaming for in-place updates when `card_template_id` is configured)

 **Message Flow**:
 1. External platform -> Channel impl -> `MessageBus.publish_inbound()`
 2. `ChannelManager._dispatch_loop()` consumes from queue
-3. For user-owned channel connections, incoming messages carry `connection_id`, `owner_user_id`, and `workspace_id`; `owner_user_id` becomes the DeerFlow run `user_id`, while the raw platform user id remains `channel_user_id`
-4. For chat: look up/create thread through Gateway's LangGraph-compatible API
-5. Feishu chat: `runs.stream()` → accumulate AI text → publish multiple outbound updates (`is_final=False`) → publish final outbound (`is_final=True`)
-6. Slack/Telegram chat: `runs.wait()` → extract final response → publish outbound
-7. Feishu channel sends one running reply card up front, then patches the same card for each outbound update (card JSON sets `config.update_multi=true` for Feishu's patch API requirement)
-8. DingTalk AI Card mode (when `card_template_id` configured): `runs.stream()` → create card with initial text → stream updates via `PUT /v1.0/card/streaming` → finalize on `is_final=True`. Falls back to `sampleMarkdown` if card creation or streaming fails
-9. For commands (`/new`, `/status`, `/models`, `/memory`, `/help`): handle locally or query Gateway API
-10. Outbound → channel callbacks → platform reply
+3. For chat: look up/create thread through Gateway's LangGraph-compatible API
+4. Feishu chat: `runs.stream()` → accumulate AI text → publish multiple outbound updates (`is_final=False`) → publish final outbound (`is_final=True`)
+5. Slack/Telegram chat: `runs.wait()` → extract final response → publish outbound
+6. Feishu channel sends one running reply card up front, then patches the same card for each outbound update (card JSON sets `config.update_multi=true` for Feishu's patch API requirement)
+7. DingTalk AI Card mode (when `card_template_id` configured): `runs.stream()` → create card with initial text → stream updates via `PUT /v1.0/card/streaming` → finalize on `is_final=True`. Falls back to `sampleMarkdown` if card creation or streaming fails
+8. For commands (`/new`, `/status`, `/models`, `/memory`, `/help`): handle locally or query Gateway API
+9. Outbound → channel callbacks → platform reply

 **Configuration** (`config.yaml` -> `channels`):
 - `langgraph_url` - LangGraph-compatible Gateway API base URL (default: `http://localhost:8001/api`)
@@ -401,16 +399,6 @@ Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk
 - In Docker Compose, IM channels run inside the `gateway` container, so `localhost` points back to that container. Use `http://gateway:8001/api` for `langgraph_url` and `http://gateway:8001` for `gateway_url`, or set `DEER_FLOW_CHANNELS_LANGGRAPH_URL` / `DEER_FLOW_CHANNELS_GATEWAY_URL`.
 - Per-channel configs: `feishu` (app_id, app_secret), `slack` (bot_token, app_token), `telegram` (bot_token), `dingtalk` (client_id, client_secret, optional `card_template_id` for AI Card streaming)

-**User-owned channel connections** (`config.yaml` -> `channel_connections`):
- Disabled by default. It is a user-binding layer on top of the existing `channels.*` runtime config, not a replacement for provider bot credentials.
- No public IP, OAuth callback URL, or provider webhook route is required by the current implementation.
- Telegram uses a deep-link `/start <code>` flow over the existing long-polling worker. Slack, Discord, Feishu/Lark, DingTalk, WeChat, and WeCom use `/connect <code>` over their existing outbound channel workers.
- Frontend APIs: `GET /api/channels/providers`, `GET /api/channels/connections`, `POST /api/channels/{provider}/connect`, and `DELETE /api/channels/connections/{connection_id}`.
- Browser APIs remain protected by normal Gateway auth/CSRF. Provider messages arrive through the already-configured channel workers.
- Slack replies use the configured operator bot token from `channels.slack` unless a future provider-token flow stores per-connection credentials.
- Telegram, Slack, Discord, Feishu/Lark, DingTalk, WeChat, and WeCom workers resolve incoming platform identities to connection records before reaching `ChannelManager`.
- See `backend/docs/IM_CHANNEL_CONNECTIONS.md` for provider setup and operational notes.
-

 ### Memory System (`packages/harness/deerflow/agents/memory/`)

@@ -441,12 +429,6 @@ Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk
 4. Applies updates atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
 5. Next interaction injects top 15 facts + context into `<memory>` tags in system prompt

-**Token counting** (`packages/harness/deerflow/agents/memory/prompt.py`):
- `_count_tokens` budgets the injection. In default `tiktoken` mode, the encoding is loaded lazily and cached.
- Failed tiktoken loads are cached with a timestamp. During the fixed cooldown (`_TIKTOKEN_RETRY_COOLDOWN_S`, 600s), callers fall back to char estimation immediately instead of re-triggering the blocking BPE download; after the cooldown, transient outages can self-heal without a restart.
- In-flight loads are cached as a LOADING sentinel so concurrent callers fall back instead of spawning more blocking threads.
- Set `memory.token_counting: char` to skip tiktoken entirely and use the network-free CJK-aware char estimate.
-
 Focused regression coverage for the updater lives in `backend/tests/test_memory_updater.py`.

 **Configuration** (`config.yaml` → `memory`):
@@ -456,7 +438,6 @@ Focused regression coverage for the updater lives in `backend/tests/test_memory_
 - `model_name` - LLM for updates (null = default model)
 - `max_facts` / `fact_confidence_threshold` - Fact storage limits (100 / 0.7)
 - `max_injection_tokens` - Token limit for prompt injection (2000)
- `token_counting` - Token counting strategy for the injection budget: `tiktoken` (default, accurate but may download BPE data from a public endpoint on first use — can block for a long time in network-restricted environments, see issues #3402/#3429) or `char` (network-free CJK-aware char estimate, never touches tiktoken)

 ### Reflection System (`packages/harness/deerflow/reflection/`)

@@ -514,7 +495,7 @@ Both can be modified at runtime via Gateway API endpoints or `DeerFlowClient` me
  - `"messages-tuple"` — per-chunk update: for AI text this is a **delta** (concat per `id` to rebuild the full message); tool calls and tool results are emitted once each
  - `"custom"` — forwarded from `StreamWriter`
  - `"end"` — stream finished (carries cumulative `usage` counted once per message id)
- Agent created lazily via `create_agent()` + `build_middlewares()`, same as `make_lead_agent`
+- Agent created lazily via `create_agent()` + `_build_middlewares()`, same as `make_lead_agent`
 - Supports `checkpointer` parameter for state persistence across turns
 - `reset_agent()` forces agent recreation (e.g. after memory or skill changes)
 - See [docs/STREAMING.md](docs/STREAMING.md) for the full design: why Gateway and DeerFlowClient are parallel paths, LangGraph's `stream_mode` semantics, the per-id dedup invariants, and regression testing strategy
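
Review note: the `"messages-tuple"` bullet above says AI text arrives as deltas that must be concatenated per `id` to rebuild the full message. The following is a minimal sketch of that accumulation, not the `DeerFlowClient` implementation. It assumes each chunk has already been reduced to a hypothetical `(message_id, text_delta)` pair; the real chunk shape is not shown in this diff.

```python
from collections import OrderedDict


def rebuild_messages(chunks):
    """Concatenate text deltas per message id, keeping first-seen order.

    `chunks` is an iterable of (message_id, text_delta) pairs. This pair
    shape is an assumption made for illustration only.
    """
    messages = OrderedDict()
    for message_id, delta in chunks:
        # Append each delta to whatever text was already seen for this id.
        messages[message_id] = messages.get(message_id, "") + delta
    return dict(messages)
```

Keying on the message id matters because deltas for different messages can interleave in one stream; appending to a single buffer would merge them.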
@@ -64,7 +64,7 @@ FROM builder AS dev
 # Install Docker CLI (for DooD: allows starting sandbox containers via host Docker socket)
 COPY --from=docker:cli /usr/local/bin/docker /usr/local/bin/docker

-EXPOSE 8001
+EXPOSE 8001 2024

 CMD ["sh", "-c", "cd backend && PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001"]

@@ -94,8 +94,8 @@ WORKDIR /app
 # Copy backend with pre-built virtualenv from builder
 COPY --from=builder /app/backend ./backend

-# Expose Gateway API port.
-EXPOSE 8001
+# Expose ports (gateway: 8001, langgraph: 2024)
+EXPOSE 8001 2024

 # Default command (can be overridden in docker-compose)
 CMD ["sh", "-c", "cd backend && PYTHONPATH=. uv run --no-sync uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001"]
@@ -18,21 +18,3 @@ KNOWN_CHANNEL_COMMANDS: frozenset[str] = frozenset(
        "/help",
    }
 )
-
-
-def extract_connect_code(text: str) -> str | None:
-    """Extract the one-time channel binding code from a connect command."""
-    parts = text.strip().split()
-    if len(parts) < 2:
-        return None
-    command = parts[0].lower()
-    if command in {"/connect", "connect"}:
-        return parts[1]
-    return None
-
-
-def is_known_channel_command(text: str) -> bool:
-    """Return whether text starts with a registered channel control command."""
-    if not text.startswith("/"):
-        return False
-    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS
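
Review note: with `is_known_channel_command` removed here, the later hunks re-inline the same check in `dingtalk.py` and `feishu.py`, while `discord.py` switches to a bare `text.startswith("/")`. The sketch below contrasts the two checks. The contents of `KNOWN_CHANNEL_COMMANDS` are an assumption taken from the command list in the docs hunk (the diff only shows that the set ends with `"/help"`), and both function names are hypothetical.

```python
# Assumed contents: the diff shows only the tail of the real frozenset.
KNOWN_CHANNEL_COMMANDS = frozenset({"/new", "/status", "/models", "/memory", "/help"})


def is_known_command(text: str) -> bool:
    """Check as inlined in dingtalk.py / feishu.py: first token must be registered."""
    if not text.startswith("/"):
        return False
    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS


def is_slash_prefixed(text: str) -> bool:
    """Check as used in discord.py after this change: any leading slash counts."""
    return text.startswith("/")
```

Under these assumptions, a message such as `/skill-name task` is classified as chat by the first check and as a command by the second, so Discord may route slash-prefixed text differently from the other channels after this change.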
@@ -1,44 +0,0 @@
-"""Helpers for attaching persisted channel connection ownership to inbound messages."""
-
-from __future__ import annotations
-
-from typing import Any
-
-from app.channels.message_bus import InboundMessage
-
-
-async def attach_connection_identity(
-    inbound: InboundMessage,
-    *,
-    repo: Any,
-    provider: str,
-    workspace_id: str | None,
-    fallback_without_workspace: bool = False,
-) -> InboundMessage:
-    """Attach connection metadata to an inbound message when a persisted binding exists."""
-    if repo is None:
-        return inbound
-
-    workspace_candidates: list[str | None] = []
-    if workspace_id:
-        workspace_candidates.append(workspace_id)
-    if fallback_without_workspace:
-        workspace_candidates.append(None)
-    if not workspace_candidates:
-        return inbound
-
-    for candidate in workspace_candidates:
-        connection = await repo.find_connection_by_external_identity(
-            provider=provider,
-            external_account_id=inbound.user_id,
-            workspace_id=candidate,
-        )
-        if connection is None:
-            continue
-
-        inbound.connection_id = connection["id"]
-        inbound.owner_user_id = connection["owner_user_id"]
-        inbound.workspace_id = connection.get("workspace_id")
-        return inbound
-
-    return inbound
@@ -14,8 +14,7 @@ from typing import Any
 import httpx

 from app.channels.base import Channel
-from app.channels.commands import extract_connect_code, is_known_channel_command
-from app.channels.connection_identity import attach_connection_identity
+from app.channels.commands import KNOWN_CHANNEL_COMMANDS
 from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment

 logger = logging.getLogger(__name__)
@@ -60,7 +59,9 @@ def _normalize_allowed_users(allowed_users: Any) -> set[str]:


 def _is_dingtalk_command(text: str) -> bool:
-    return is_known_channel_command(text)
+    if not text.startswith("/"):
+        return False
+    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS


 def _extract_text_from_rich_text(rich_text_list: list) -> str:
@@ -137,7 +138,6 @@ class DingTalkChannel(Channel):
        self._incoming_messages: dict[str, Any] = {}
        self._incoming_messages_lock = threading.Lock()
        self._card_repliers: dict[str, Any] = {}
-        self._connection_repo = config.get("connection_repo")

    @property
    def supports_streaming(self) -> bool:
@@ -397,24 +397,6 @@ class DingTalkChannel(Channel):
                text[:100],
            )

-            connect_code = extract_connect_code(text)
-            if connect_code and self._connection_repo is not None:
-                if self._main_loop and self._main_loop.is_running():
-                    fut = asyncio.run_coroutine_threadsafe(
-                        self._bind_connection_from_connect_code(
-                            conversation_type=conversation_type,
-                            sender_staff_id=sender_staff_id,
-                            sender_nick=sender_nick,
-                            conversation_id=conversation_id,
-                            code=connect_code,
-                        ),
-                        self._main_loop,
-                    )
-                    fut.add_done_callback(lambda f, mid=msg_id: self._log_future_error(f, "bind_connection", mid))
-                else:
-                    logger.warning("[DingTalk] main loop not running, cannot bind channel connection")
-                return
-
            if _is_dingtalk_command(text):
                msg_type = InboundMessageType.COMMAND
            else:
@@ -470,95 +452,11 @@ class DingTalkChannel(Channel):
        return ""

    async def _prepare_inbound(self, chat_id: str, inbound: InboundMessage) -> None:
-        inbound = await self._attach_connection_identity(inbound)
        # Running reply must finish before publish_inbound so AI card tracks are
        # registered before the manager emits streaming outbounds.
        await self._send_running_reply(chat_id, inbound)
        await self.bus.publish_inbound(inbound)

-    @staticmethod
-    def _connection_workspace_id(conversation_type: str, conversation_id: str) -> str | None:
-        if conversation_type == _CONVERSATION_TYPE_GROUP and conversation_id:
-            return conversation_id
-        return None
-
-    async def _attach_connection_identity(self, inbound: InboundMessage) -> InboundMessage:
-        conversation_type = str(inbound.metadata.get("conversation_type") or _CONVERSATION_TYPE_P2P)
-        conversation_id = str(inbound.metadata.get("conversation_id") or "")
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="dingtalk",
-            workspace_id=self._connection_workspace_id(conversation_type, conversation_id),
-            fallback_without_workspace=True,
-        )
-
-    async def _bind_connection_from_connect_code(
-        self,
-        *,
-        conversation_type: str,
-        sender_staff_id: str,
-        sender_nick: str,
-        conversation_id: str,
-        code: str,
-    ) -> bool:
-        if self._connection_repo is None or not code:
-            return False
-
-        state = await self._connection_repo.consume_oauth_state(provider="dingtalk", state=code)
-        if state is None:
-            await self._send_connection_reply(
-                conversation_type,
-                sender_staff_id,
-                conversation_id,
-                "DingTalk connection code is invalid or expired.",
-            )
-            return True
-
-        if not sender_staff_id:
-            await self._send_connection_reply(
-                conversation_type,
-                sender_staff_id,
-                conversation_id,
-                "DingTalk connection could not be completed from this message.",
-            )
-            return True
-
-        await self._connection_repo.upsert_connection(
-            owner_user_id=state["owner_user_id"],
-            provider="dingtalk",
-            external_account_id=sender_staff_id,
-            external_account_name=sender_nick or None,
-            workspace_id=self._connection_workspace_id(conversation_type, conversation_id),
-            metadata={
-                "conversation_type": conversation_type,
-                "conversation_id": conversation_id,
-            },
-            status="connected",
-        )
-        await self._send_connection_reply(
-            conversation_type,
-            sender_staff_id,
-            conversation_id,
-            "DingTalk connected to DeerFlow.",
-        )
-        return True
-
-    async def _send_connection_reply(
-        self,
-        conversation_type: str,
-        sender_staff_id: str,
-        conversation_id: str,
-        text: str,
-    ) -> None:
-        robot_code = self._client_id
-        if conversation_type == _CONVERSATION_TYPE_GROUP:
-            if conversation_id:
-                await self._send_text_message_to_group(robot_code, conversation_id, text)
-            return
-        if sender_staff_id:
-            await self._send_text_message_to_user(robot_code, sender_staff_id, text)
-
    async def _send_running_reply(self, chat_id: str, inbound: InboundMessage) -> None:
        conversation_type = inbound.metadata.get("conversation_type", _CONVERSATION_TYPE_P2P)
        sender_staff_id = inbound.metadata.get("sender_staff_id", "")
@@ -10,9 +10,7 @@ from pathlib import Path
 from typing import Any

 from app.channels.base import Channel
-from app.channels.commands import extract_connect_code, is_known_channel_command
-from app.channels.connection_identity import attach_connection_identity
-from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
+from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment

 logger = logging.getLogger(__name__)

@@ -71,7 +69,6 @@ class DiscordChannel(Channel):
        self._discord_loop: asyncio.AbstractEventLoop | None = None
        self._main_loop: asyncio.AbstractEventLoop | None = None
        self._discord_module = None
-        self._connection_repo = config.get("connection_repo")

    async def start(self) -> None:
        if self._running:
@@ -289,10 +286,6 @@ class DiscordChannel(Channel):
            text = text.replace(bot_mention or "", "").replace(alt_mention or "", "").replace(standard_mention or "", "").strip()
            # Don't return early if text is empty — still process the mention (e.g., create thread)

-        connect_code = extract_connect_code(text)
-        if connect_code and await self._bind_connection_from_connect_code(message, connect_code):
-            return
-
        # --- Determine thread/channel routing and typing target ---
        thread_id = None
        chat_id = None
@@ -307,7 +300,7 @@ class DiscordChannel(Channel):

            # If this is a known active thread, process normally
            if thread_id in self._active_thread_ids:
-                msg_type = InboundMessageType.COMMAND if is_known_channel_command(text) else InboundMessageType.CHAT
+                msg_type = InboundMessageType.COMMAND if text.startswith("/") else InboundMessageType.CHAT
                inbound = self._make_inbound(
                    chat_id=chat_id,
                    user_id=str(message.author.id),
@@ -321,7 +314,6 @@ class DiscordChannel(Channel):
                    },
                )
                inbound.topic_id = thread_id
-                inbound = await self._attach_connection_identity(inbound, guild_id=str(guild.id) if guild else None)
                self._publish(inbound)
                # Start typing indicator in the thread
                if typing_target:
@@ -415,7 +407,7 @@ class DiscordChannel(Channel):
            chat_id = channel_id
            typing_target = message.channel  # Type into the channel

-        msg_type = InboundMessageType.COMMAND if is_known_channel_command(text) else InboundMessageType.CHAT
+        msg_type = InboundMessageType.COMMAND if text.startswith("/") else InboundMessageType.CHAT
        inbound = self._make_inbound(
            chat_id=chat_id,
            user_id=str(message.author.id),
@@ -429,7 +421,6 @@ class DiscordChannel(Channel):
            },
        )
        inbound.topic_id = thread_id
-        inbound = await self._attach_connection_identity(inbound, guild_id=str(guild.id) if guild else None)

        # Start typing indicator in the correct target (thread or channel)
        if typing_target:
@@ -444,60 +435,6 @@ class DiscordChannel(Channel):
            future = asyncio.run_coroutine_threadsafe(self.bus.publish_inbound(inbound), self._main_loop)
            future.add_done_callback(lambda f: logger.exception("[Discord] publish_inbound failed", exc_info=f.exception()) if f.exception() else None)

-    async def _attach_connection_identity(self, inbound: InboundMessage, guild_id: str | None = None) -> InboundMessage:
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="discord",
-            workspace_id=guild_id,
-            fallback_without_workspace=True,
-        )
-
-    async def _bind_connection_from_connect_code(self, message, code: str) -> bool:
-        if self._connection_repo is None or not code:
-            return False
-
-        state = await self._connection_repo.consume_oauth_state(provider="discord", state=code)
-        if state is None:
-            await self._send_connection_reply(message, "Discord connection code is invalid or expired.")
-            return True
-
-        guild = getattr(message, "guild", None)
-        channel = getattr(message, "channel", None)
-        author = getattr(message, "author", None)
-        user_id = str(getattr(author, "id", "") or "")
-        if not user_id:
-            await self._send_connection_reply(message, "Discord connection could not be completed from this message.")
-            return True
-
-        guild_id = str(getattr(guild, "id", "") or "") or None
-        await self._connection_repo.upsert_connection(
-            owner_user_id=state["owner_user_id"],
-            provider="discord",
-            external_account_id=user_id,
-            external_account_name=getattr(author, "display_name", None) or getattr(author, "name", None),
-            workspace_id=guild_id,
-            workspace_name=getattr(guild, "name", None) if guild is not None else None,
-            metadata={
-                "guild_id": guild_id,
-                "channel_id": str(getattr(channel, "id", "") or ""),
-            },
-            status="connected",
-        )
-        await self._send_connection_reply(message, "Discord connected to DeerFlow.")
-        return True
-
-    @staticmethod
-    async def _send_connection_reply(message, text: str) -> None:
-        channel = getattr(message, "channel", None)
-        send = getattr(channel, "send", None)
-        if send is None:
-            return
-        try:
-            await send(text)
-        except Exception:
-            logger.exception("[Discord] failed to send connection reply")
-
    def _run_client(self) -> None:
        self._discord_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(self._discord_loop)
@@ -7,31 +7,22 @@ import json
 import logging
 import re
 import threading
-import time
 from typing import Any, Literal

 from app.channels.base import Channel
-from app.channels.commands import extract_connect_code, is_known_channel_command
-from app.channels.connection_identity import attach_connection_identity
-from app.channels.message_bus import (
-    PENDING_CLARIFICATION_METADATA_KEY,
-    RESOLVED_FROM_PENDING_CLARIFICATION_METADATA_KEY,
-    InboundMessage,
-    InboundMessageType,
-    MessageBus,
-    OutboundMessage,
-    ResolvedAttachment,
-)
+from app.channels.commands import KNOWN_CHANNEL_COMMANDS
+from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
 from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.sandbox.sandbox_provider import get_sandbox_provider

 logger = logging.getLogger(__name__)
-PENDING_CLARIFICATION_TTL_SECONDS = 30 * 60


 def _is_feishu_command(text: str) -> bool:
-    return is_known_channel_command(text)
+    if not text.startswith("/"):
+        return False
+    return text.split(maxsplit=1)[0].lower() in KNOWN_CHANNEL_COMMANDS


 class FeishuChannel(Channel):
@@ -65,46 +56,17 @@ class FeishuChannel(Channel):
        self._background_tasks: set[asyncio.Task] = set()
        self._running_card_ids: dict[str, str] = {}
        self._running_card_tasks: dict[str, asyncio.Task] = {}
-        self._pending_clarifications: dict[tuple[str, str], list[dict[str, Any]]] = {}
        self._CreateFileRequest = None
        self._CreateFileRequestBody = None
        self._CreateImageRequest = None
        self._CreateImageRequestBody = None
        self._GetMessageResourceRequest = None
        self._thread_lock = threading.Lock()
-        self._connection_repo = config.get("connection_repo")
-
-    @staticmethod
-    def _non_empty_str(value: Any) -> str | None:
-        if isinstance(value, str) and value.strip():
-            return value.strip()
-        return None
-
-    @staticmethod
-    def _pending_key(chat_id: str, user_id: str) -> tuple[str, str]:
-        return (chat_id, user_id)

    @property
    def supports_streaming(self) -> bool:
        return True

-    @property
-    def is_running(self) -> bool:
-        if not self._running:
-            return False
-        return self._thread is not None and self._thread.is_alive()
-
-    def _build_event_handler(self, lark):
-        return (
-            lark.EventDispatcherHandler.builder("", "")
-            .register_p2_im_message_receive_v1(self._on_message)
-            .register_p2_im_message_message_read_v1(self._on_ignored_message_event)
-            .register_p2_im_message_reaction_created_v1(self._on_ignored_message_event)
-            .register_p2_im_message_reaction_deleted_v1(self._on_ignored_message_event)
-            .register_p2_im_message_recalled_v1(self._on_ignored_message_event)
-            .build()
-        )
-
    async def start(self) -> None:
        if self._running:
            return
@@ -198,7 +160,7 @@ class FeishuChannel(Channel):
            # thread's uvloop.
            _ws_client_mod.loop = loop

-            event_handler = self._build_event_handler(lark)
+            event_handler = lark.EventDispatcherHandler.builder("", "").register_p2_im_message_receive_v1(self._on_message).build()
            ws_client = lark.ws.Client(
                app_id=app_id,
                app_secret=app_secret,
@@ -210,10 +172,6 @@ class FeishuChannel(Channel):
        except Exception:
            if self._running:
                logger.exception("Feishu WebSocket error")
-            self._running = False
-
-    def _on_ignored_message_event(self, event) -> None:
-        logger.debug("[Feishu] ignoring non-content message event: %s", type(event).__name__)

    async def stop(self) -> None:
        self._running = False
@@ -573,25 +531,18 @@ class FeishuChannel(Channel):
                        "[Feishu] failed to patch running card %s, falling back to final reply",
                        running_card_id,
                    )
-                    fallback_card_id = await self._reply_card(source_message_id, msg.text)
-                    self._remember_thread_mapping(msg, source_message_id, fallback_card_id)
-                    self._remember_pending_clarification(msg, fallback_card_id)
+                    await self._reply_card(source_message_id, msg.text)
                else:
-                    self._remember_thread_mapping(msg, source_message_id, running_card_id)
-                    self._remember_pending_clarification(msg, running_card_id)
                    logger.info("[Feishu] running card updated: source=%s card=%s", source_message_id, running_card_id)
            elif msg.is_final:
-                final_card_id = await self._reply_card(source_message_id, msg.text)
-                self._remember_thread_mapping(msg, source_message_id, final_card_id)
-                self._remember_pending_clarification(msg, final_card_id)
+                await self._reply_card(source_message_id, msg.text)
            elif awaited_running_card_task:
                logger.warning(
                    "[Feishu] running card task finished without message_id for source=%s, skipping duplicate non-final creation",
                    source_message_id,
                )
            else:
-                created_card_id = await self._ensure_running_card(source_message_id, msg.text)
-                self._remember_thread_mapping(msg, source_message_id, created_card_id)
+                await self._ensure_running_card(source_message_id, msg.text)

            if msg.is_final:
                self._running_card_ids.pop(source_message_id, None)
@@ -602,129 +553,6 @@ class FeishuChannel(Channel):

    # -- internal ----------------------------------------------------------

-    def _remember_thread_mapping(self, msg: OutboundMessage, *topic_ids: str | None) -> None:
-        store = self.config.get("channel_store")
-        if store is None or not msg.thread_id:
-            return
-
-        metadata_topic_ids = [
-            msg.metadata.get("message_id"),
-            msg.metadata.get("root_id"),
-            msg.metadata.get("parent_id"),
-            msg.metadata.get("thread_id"),
-            msg.metadata.get("topic_id"),
-        ]
-        user_id = ""
-        raw_user_id = msg.metadata.get("user_id")
-        if isinstance(raw_user_id, str):
-            user_id = raw_user_id
-
-        seen: set[str] = set()
-        for topic_id in [*topic_ids, *metadata_topic_ids]:
-            topic_id = self._non_empty_str(topic_id)
-            if not topic_id or topic_id in seen:
-                continue
-            seen.add(topic_id)
-            try:
-                store.set_thread_id(
-                    self.name,
-                    msg.chat_id,
-                    msg.thread_id,
-                    topic_id=topic_id,
-                    user_id=user_id,
-                )
-            except Exception:
-                logger.exception("[Feishu] failed to remember thread mapping for topic_id=%s", topic_id)
-
-    def _remember_pending_clarification(self, msg: OutboundMessage, card_message_id: str | None) -> None:
-        if not msg.is_final or msg.metadata.get(PENDING_CLARIFICATION_METADATA_KEY) is not True:
-            return
-
-        user_id = self._non_empty_str(msg.metadata.get("user_id"))
-        topic_id = self._non_empty_str(msg.metadata.get("topic_id"))
-        source_message_id = self._non_empty_str(msg.thread_ts) or self._non_empty_str(msg.metadata.get("message_id"))
-        if not (user_id and topic_id and msg.thread_id and source_message_id and card_message_id):
-            return
-
-        key = self._pending_key(msg.chat_id, user_id)
-        pending = {
-            "thread_id": msg.thread_id,
-            "topic_id": topic_id,
-            "source_message_id": source_message_id,
-            "card_message_id": card_message_id,
-            "created_at": time.time(),
-        }
-        with self._thread_lock:
-            # Plain-message clarification continuity is a short-lived in-memory
-            # hint; explicit Feishu replies are still covered by persisted
-            # message-id mappings.
-            self._pending_clarifications.setdefault(key, []).append(pending)
-        logger.info(
-            "[Feishu] pending clarification remembered: chat_id=%s user_id=%s topic_id=%s thread_id=%s",
-            msg.chat_id,
-            user_id,
-            topic_id,
-            msg.thread_id,
-        )
-
-    def _consume_pending_clarification(self, chat_id: str, user_id: str) -> dict[str, Any] | None:
-        key = self._pending_key(chat_id, user_id)
-        with self._thread_lock:
-            pending_items = self._pending_clarifications.get(key)
-            if not pending_items:
-                return None
-
-            now = time.time()
-            while pending_items:
-                pending = pending_items.pop(0)
-                created_at = pending.get("created_at")
-                if isinstance(created_at, (int, float)) and now - created_at <= PENDING_CLARIFICATION_TTL_SECONDS:
-                    if pending_items:
-                        self._pending_clarifications[key] = pending_items
-                    else:
-                        self._pending_clarifications.pop(key, None)
-                    return pending
-                logger.info("[Feishu] pending clarification expired: chat_id=%s user_id=%s", chat_id, user_id)
-
-            self._pending_clarifications.pop(key, None)
-            return None
-
-    def _ensure_pending_thread_mapping(self, chat_id: str, user_id: str, pending: dict[str, Any]) -> None:
-        store = self.config.get("channel_store")
-        topic_id = self._non_empty_str(pending.get("topic_id"))
-        thread_id = self._non_empty_str(pending.get("thread_id"))
-        if store is None or not topic_id or not thread_id:
-            return
-        try:
-            store.set_thread_id(self.name, chat_id, thread_id, topic_id=topic_id, user_id=user_id)
-        except Exception:
-            logger.exception("[Feishu] failed to restore pending clarification mapping for topic_id=%s", topic_id)
-
-    def _resolve_topic_id(
-        self,
-        chat_id: str,
-        msg_id: str,
-        *,
-        root_id: str | None,
-        parent_id: str | None,
-        thread_id: str | None,
-    ) -> tuple[str, bool]:
-        store = self.config.get("channel_store")
-        candidates = [root_id, parent_id, thread_id]
-
-        if store is not None:
-            for candidate in candidates:
-                candidate = self._non_empty_str(candidate)
-                if not candidate:
-                    continue
-                try:
-                    if store.get_thread_id(self.name, chat_id, topic_id=candidate):
-                        return candidate, True
-                except Exception:
-                    logger.exception("[Feishu] failed to resolve stored topic mapping for topic_id=%s", candidate)
-
-        return root_id or msg_id, False
-
    @staticmethod
    def _log_future_error(fut, name: str, msg_id: str) -> None:
        """Callback for run_coroutine_threadsafe futures to surface errors."""
@@ -749,47 +577,11 @@ class FeishuChannel(Channel):

    async def _prepare_inbound(self, msg_id: str, inbound) -> None:
        """Kick off Feishu side effects without delaying inbound dispatch."""
-        inbound = await self._attach_connection_identity(inbound)
        reaction_task = asyncio.create_task(self._add_reaction(msg_id, "OK"))
        self._track_background_task(reaction_task, name="add_reaction", msg_id=msg_id)
        self._ensure_running_card_started(msg_id)
        await self.bus.publish_inbound(inbound)

-    async def _attach_connection_identity(self, inbound: InboundMessage) -> InboundMessage:
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="feishu",
-            workspace_id=inbound.chat_id,
-        )
-
-    async def _bind_connection_from_connect_code(self, *, message_id: str, chat_id: str, user_id: str, code: str) -> bool:
-        if self._connection_repo is None or not code:
-            return False
-
-        state = await self._connection_repo.consume_oauth_state(provider="feishu", state=code)
-        if state is None:
-            await self._reply_card(message_id, "Feishu connection code is invalid or expired.")
-            return True
-
-        if not user_id or not chat_id:
-            await self._reply_card(message_id, "Feishu connection could not be completed from this message.")
-            return True
-
-        await self._connection_repo.upsert_connection(
-            owner_user_id=state["owner_user_id"],
-            provider="feishu",
-            external_account_id=user_id,
-            workspace_id=chat_id,
-            metadata={
-                "chat_id": chat_id,
-                "message_id": message_id,
-            },
-            status="connected",
-        )
-        await self._reply_card(message_id, "Feishu connected to DeerFlow.")
-        return True
-
    def _on_message(self, event) -> None:
        """Called by lark-oapi when a message is received (runs in lark thread)."""
        try:
@@ -801,9 +593,7 @@ class FeishuChannel(Channel):

            # root_id is set when the message is a reply within a Feishu thread.
            # Use it as topic_id so all replies share the same DeerFlow thread.
-            root_id = self._non_empty_str(getattr(message, "root_id", None))
-            parent_id = self._non_empty_str(getattr(message, "parent_id", None))
-            feishu_thread_id = self._non_empty_str(getattr(message, "thread_id", None))
+            root_id = getattr(message, "root_id", None) or None

            # Parse message content
            content = json.loads(message.content)
@@ -864,12 +654,10 @@ class FeishuChannel(Channel):
            text = text.strip()

            logger.info(
-                "[Feishu] parsed message: chat_id=%s, msg_id=%s, root_id=%s, parent_id=%s, thread_id=%s, sender=%s, text=%r",
+                "[Feishu] parsed message: chat_id=%s, msg_id=%s, root_id=%s, sender=%s, text=%r",
                chat_id,
                msg_id,
                root_id,
-                parent_id,
-                feishu_thread_id,
                sender_id,
                text[:100] if text else "",
            )
@@ -878,23 +666,6 @@ class FeishuChannel(Channel):
                logger.info("[Feishu] empty text, ignoring message")
                return

-            connect_code = extract_connect_code(text)
-            if connect_code and self._connection_repo is not None:
-                if self._main_loop and self._main_loop.is_running():
-                    fut = asyncio.run_coroutine_threadsafe(
-                        self._bind_connection_from_connect_code(
-                            message_id=msg_id,
-                            chat_id=chat_id,
-                            user_id=sender_id,
-                            code=connect_code,
-                        ),
-                        self._main_loop,
-                    )
-                    fut.add_done_callback(lambda f, mid=msg_id: self._log_future_error(f, "bind_connection", mid))
-                else:
-                    logger.warning("[Feishu] main loop not running, cannot bind channel connection")
-                return
-
            # Only treat known slash commands as commands; absolute paths and
            # other slash-prefixed text should be handled as normal chat.
            if _is_feishu_command(text):
@@ -902,24 +673,8 @@ class FeishuChannel(Channel):
            else:
                msg_type = InboundMessageType.CHAT

-            # Prefer any platform message id that already maps to a DeerFlow
-            # thread. This keeps replies to bot clarification cards in the
-            # original conversation even when Feishu reports the card as root.
-            topic_id, resolved_from_stored_mapping = self._resolve_topic_id(
-                chat_id,
-                msg_id,
-                root_id=root_id,
-                parent_id=parent_id,
-                thread_id=feishu_thread_id,
-            )
-            resolved_from_pending = False
-            if msg_type == InboundMessageType.CHAT and not resolved_from_stored_mapping:
-                pending = self._consume_pending_clarification(chat_id, sender_id)
-                pending_topic_id = self._non_empty_str(pending.get("topic_id")) if pending else None
-                if pending_topic_id:
-                    topic_id = pending_topic_id
-                    self._ensure_pending_thread_mapping(chat_id, sender_id, pending)
-                    resolved_from_pending = True
+            # topic_id: use root_id for replies (same topic), msg_id for new messages (new topic)
+            topic_id = root_id or msg_id

            inbound = self._make_inbound(
                chat_id=chat_id,
@@ -928,15 +683,7 @@ class FeishuChannel(Channel):
                msg_type=msg_type,
                thread_ts=msg_id,
                files=files_list,
-                metadata={
-                    "message_id": msg_id,
-                    "root_id": root_id,
-                    "parent_id": parent_id,
-                    "thread_id": feishu_thread_id,
-                    "topic_id": topic_id,
-                    "user_id": sender_id,
-                    RESOLVED_FROM_PENDING_CLARIFICATION_METADATA_KEY: resolved_from_pending,
-                },
+                metadata={"message_id": msg_id, "root_id": root_id},
            )
            inbound.topic_id = topic_id

@@ -8,7 +8,6 @@ import mimetypes
 import re
 import time
 from collections.abc import Awaitable, Callable, Mapping
-from dataclasses import dataclass
 from pathlib import Path
 from typing import Any

@@ -16,24 +15,11 @@ import httpx
 from langgraph_sdk.errors import ConflictError

 from app.channels.commands import KNOWN_CHANNEL_COMMANDS
-from app.channels.message_bus import (
-    PENDING_CLARIFICATION_METADATA_KEY,
-    InboundMessage,
-    InboundMessageType,
-    MessageBus,
-    OutboundMessage,
-    ResolvedAttachment,
-)
+from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
 from app.channels.store import ChannelStore
 from app.gateway.csrf_middleware import CSRF_COOKIE_NAME, CSRF_HEADER_NAME, generate_csrf_token
 from app.gateway.internal_auth import create_internal_auth_headers
-from deerflow.config.agents_config import load_agent_config
-from deerflow.config.paths import make_safe_user_id
 from deerflow.runtime.user_context import get_effective_user_id
-from deerflow.skills.slash import parse_slash_skill_reference
-from deerflow.skills.storage import get_or_new_skill_storage
-from deerflow.skills.storage.skill_storage import SkillStorage
-from deerflow.utils.messages import ORIGINAL_USER_CONTENT_KEY

 logger = logging.getLogger(__name__)

@@ -130,16 +116,6 @@ class InvalidChannelSessionConfigError(ValueError):
    """Raised when IM channel session overrides contain invalid agent config."""


-class SlashSkillCommandResolutionError(RuntimeError):
-    """Raised when IM slash-skill command resolution cannot complete safely."""
-
-
-@dataclass(frozen=True, slots=True)
-class _SlashSkillCommandResolution:
-    route_to_chat: bool = False
-    failure_message: str | None = None
-
-
 def _is_thread_busy_error(exc: BaseException | None) -> bool:
    if exc is None:
        return False
@@ -197,8 +173,6 @@ def _extract_response_text(result: dict | list) -> str:

        # Stop at the last human message — anything before it is a previous turn
        if msg_type == "human":
-            if _is_hidden_human_control_message(msg):
-                continue
            break

        # Check for tool messages from ask_clarification (interrupt case)
@@ -226,70 +200,6 @@ def _extract_response_text(result: dict | list) -> str:
    return ""


-def _messages_from_result(result: dict | list) -> list[Any]:
-    if isinstance(result, list):
-        return result
-    if isinstance(result, dict):
-        messages = result.get("messages", [])
-        if isinstance(messages, list):
-            return messages
-    return []
-
-
-def _current_turn_messages(result: dict | list) -> list[dict[str, Any]]:
-    messages = _messages_from_result(result)
-    current_turn: list[dict[str, Any]] = []
-    for msg in reversed(messages):
-        if not isinstance(msg, dict):
-            continue
-        if msg.get("type") == "human":
-            break
-        current_turn.append(msg)
-    current_turn.reverse()
-    return current_turn
-
-
-def _has_current_turn_clarification(result: dict | list) -> bool:
-    """Return True only when the current turn's final result is clarification."""
-    for msg in reversed(_current_turn_messages(result)):
-        msg_type = msg.get("type")
-        if msg_type == "tool":
-            return msg.get("name") == "ask_clarification"
-        if msg_type == "ai":
-            content = msg.get("content")
-            if isinstance(content, str):
-                if content:
-                    return False
-            elif content:
-                return False
-            if msg.get("tool_calls"):
-                return False
-    return False
-
-
-def _response_metadata(base_metadata: dict[str, Any], *, pending_clarification: bool = False) -> dict[str, Any]:
-    metadata = _slim_metadata(base_metadata)
-    if pending_clarification:
-        metadata[PENDING_CLARIFICATION_METADATA_KEY] = True
-    return metadata
-
-
-def _thread_channel_metadata(msg: InboundMessage) -> dict[str, Any]:
-    channel_source: dict[str, Any] = {
-        "type": "im_channel",
-        "provider": msg.channel_name,
-        "chat_id": msg.chat_id,
-    }
-    if msg.topic_id:
-        channel_source["topic_id"] = msg.topic_id
-    if msg.thread_ts:
-        channel_source["thread_ts"] = msg.thread_ts
-    if msg.connection_id:
-        channel_source["connection_id"] = msg.connection_id
-
-    return {"channel_source": channel_source}
-
-
 def _extract_text_content(content: Any) -> str:
    """Extract text from a streaming payload content field."""
    if isinstance(content, str):
@@ -403,8 +313,6 @@ def _extract_artifacts(result: dict | list) -> list[str]:
            continue
        # Stop at the last human message — anything before it is a previous turn
        if msg.get("type") == "human":
-            if _is_hidden_human_control_message(msg):
-                continue
            break
        # Look for AI messages with present_files tool calls
        if msg.get("type") == "ai":
@@ -417,18 +325,6 @@ def _extract_artifacts(result: dict | list) -> list[str]:
    return artifacts


-def _is_hidden_human_control_message(msg: Mapping[str, Any]) -> bool:
-    """Return whether a human message is an internal control message hidden from UI."""
-    if msg.get("type") != "human":
-        return False
-
-    additional_kwargs = msg.get("additional_kwargs")
-    if not isinstance(additional_kwargs, Mapping):
-        return False
-
-    return additional_kwargs.get("hide_from_ui") is True
-
-
 def _format_artifact_text(artifacts: list[str]) -> str:
    """Format artifact paths into a human-readable text block listing filenames."""
    import posixpath
@@ -442,73 +338,6 @@ def _format_artifact_text(artifacts: list[str]) -> str:
 _OUTPUTS_VIRTUAL_PREFIX = "/mnt/user-data/outputs/"


-def _unknown_command_reply(command: str | None = None) -> str:
-    available = " | ".join(sorted(KNOWN_CHANNEL_COMMANDS))
-    if command:
-        return f"Unknown command: /{command}. Available commands: {available}"
-    return f"Unknown command. Available commands: {available}"
-
-
-def _human_input_message(content: str, *, original_content: str | None = None) -> dict[str, Any]:
-    message: dict[str, Any] = {"role": "human", "content": content}
-    if original_content is not None and original_content != content:
-        message["additional_kwargs"] = {ORIGINAL_USER_CONTENT_KEY: original_content}
-    return message
-
-
-def _auth_disabled_owner_user_id() -> str | None:
-    try:
-        from app.gateway.auth_disabled import AUTH_DISABLED_USER_ID, is_auth_disabled
-    except Exception:
-        logger.debug("Unable to inspect auth-disabled mode for channel owner fallback", exc_info=True)
-        return None
-    return AUTH_DISABLED_USER_ID if is_auth_disabled() else None
-
-
-def _effective_owner_user_id(msg: InboundMessage) -> str | None:
-    return _auth_disabled_owner_user_id() or msg.owner_user_id
-
-
-def _apply_effective_owner(msg: InboundMessage) -> InboundMessage:
-    owner_user_id = _effective_owner_user_id(msg)
-    if owner_user_id:
-        msg.owner_user_id = owner_user_id
-    return msg
-
-
-def _owner_headers(msg: InboundMessage) -> dict[str, str] | None:
-    owner_user_id = _effective_owner_user_id(msg)
-    if not owner_user_id:
-        return None
-    return create_internal_auth_headers(owner_user_id=owner_user_id)
-
-
-def _resolve_slash_skill_command(
-    text: str,
-    available_skills: set[str] | None = None,
-    storage: SkillStorage | Callable[[], SkillStorage] | None = None,
-) -> _SlashSkillCommandResolution | None:
-    reference = parse_slash_skill_reference(text)
-    if reference is None:
-        return None
-    try:
-        resolved_storage = storage() if callable(storage) else storage or get_or_new_skill_storage()
-        skills = resolved_storage.load_skills(enabled_only=False)
-
-        skill = next((candidate for candidate in skills if candidate.name == reference.name), None)
-        if skill is None:
-            return None
-        if not skill.enabled:
-            return _SlashSkillCommandResolution(failure_message=f"Skill `/{reference.name}` is installed but disabled. Enable it before using slash activation.")
-        if available_skills is not None and reference.name not in available_skills:
-            return _SlashSkillCommandResolution(failure_message=f"Skill `/{reference.name}` is not available for this agent.")
-
-        return _SlashSkillCommandResolution(route_to_chat=True)
-    except Exception as exc:
-        logger.exception("[Manager] failed to resolve slash skill command")
-        raise SlashSkillCommandResolutionError("Failed to resolve slash skill command. Please check the skill configuration.") from exc
-
-
 def _resolve_attachments(thread_id: str, artifacts: list[str]) -> list[ResolvedAttachment]:
    """Resolve virtual artifact paths to host filesystem paths with metadata.

@@ -713,7 +542,6 @@ class ChannelManager:
        assistant_id: str = DEFAULT_ASSISTANT_ID,
        default_session: dict[str, Any] | None = None,
        channel_sessions: dict[str, Any] | None = None,
-        connection_repo: Any | None = None,
    ) -> None:
        self.bus = bus
        self.store = store
@@ -723,9 +551,7 @@ class ChannelManager:
        self._assistant_id = assistant_id
        self._default_session = _as_dict(default_session)
        self._channel_sessions = dict(channel_sessions or {})
-        self._connection_repo = connection_repo
        self._client = None  # lazy init — langgraph_sdk async client
-        self._skill_storage: SkillStorage | None = None
        self._csrf_token = generate_csrf_token()
        self._semaphore: asyncio.Semaphore | None = None
        self._running = False
@@ -773,25 +599,12 @@ class ChannelManager:
        configurable["checkpoint_ns"] = ""
        configurable["thread_id"] = thread_id

-        # ``user_id`` drives DeerFlow-owned memory, files, and thread buckets.
-        # For browser-connected IM channels, prefer the DeerFlow account that
-        # owns the connection. Preserve the raw platform user under
-        # ``channel_user_id`` for platform-facing lookups and audits.
-        run_context_identity: dict[str, Any] = {"thread_id": thread_id}
-        owner_user_id = _effective_owner_user_id(msg)
-        if owner_user_id:
-            run_context_identity["user_id"] = make_safe_user_id(owner_user_id)
-        elif msg.user_id:
-            run_context_identity["user_id"] = make_safe_user_id(msg.user_id)
-        if msg.user_id:
-            run_context_identity["channel_user_id"] = msg.user_id
-
        run_context = _merge_dicts(
            DEFAULT_RUN_CONTEXT,
            self._default_session.get("context"),
            channel_layer.get("context"),
            user_layer.get("context"),
-            run_context_identity,
+            {"thread_id": thread_id},
        )

        # Custom agents are implemented as lead_agent + agent_name context.
@@ -803,21 +616,6 @@ class ChannelManager:

        return assistant_id, run_config, run_context

-    def _resolve_available_skill_names(self, msg: InboundMessage) -> set[str] | None:
-        thread_id = self.store.get_thread_id(msg.channel_name, msg.chat_id, topic_id=msg.topic_id) or ""
-        _, _, run_context = self._resolve_run_params(msg, thread_id)
-        if run_context.get("is_bootstrap"):
-            return {"bootstrap"}
-
-        agent_name = run_context.get("agent_name")
-        if not isinstance(agent_name, str) or not agent_name.strip():
-            return None
-
-        agent_config = load_agent_config(_normalize_custom_agent_name(agent_name))
-        if agent_config and agent_config.skills is not None:
-            return set(agent_config.skills)
-        return None
-
    # -- LangGraph SDK client (lazy) ----------------------------------------

    def _get_client(self):
@@ -835,11 +633,6 @@ class ChannelManager:
            )
        return self._client

-    def _get_skill_storage(self) -> SkillStorage:
-        if self._skill_storage is None:
-            self._skill_storage = get_or_new_skill_storage()
-        return self._skill_storage
-
    # -- lifecycle ---------------------------------------------------------

    async def start(self) -> None:
@@ -895,7 +688,6 @@ class ChannelManager:
            logger.error("[Manager] unhandled error in message task: %s", exc, exc_info=exc)

    async def _handle_message(self, msg: InboundMessage) -> None:
-        msg = _apply_effective_owner(msg)
        async with self._semaphore:
            try:
                if msg.msg_type == InboundMessageType.COMMAND:
@@ -910,14 +702,6 @@ class ChannelManager:
                    exc,
                )
                await self._send_error(msg, str(exc))
-            except SlashSkillCommandResolutionError as exc:
-                logger.warning(
-                    "Slash skill command resolution failed for %s (chat=%s): %s",
-                    msg.channel_name,
-                    msg.chat_id,
-                    exc,
-                )
-                await self._send_error(msg, str(exc))
            except Exception:
                logger.exception(
                    "Error handling message from %s (chat=%s)",
@@ -928,27 +712,10 @@ class ChannelManager:

    # -- chat handling -----------------------------------------------------

-    async def _lookup_thread_id(self, msg: InboundMessage) -> str | None:
-        if msg.connection_id and self._connection_repo is not None:
-            return await self._connection_repo.get_thread_id(
-                msg.connection_id,
-                msg.chat_id,
-                msg.topic_id,
-            )
-        return self.store.get_thread_id(msg.channel_name, msg.chat_id, topic_id=msg.topic_id)
-
-    async def _store_thread_id(self, msg: InboundMessage, thread_id: str) -> None:
-        if msg.connection_id and msg.owner_user_id and self._connection_repo is not None:
-            await self._connection_repo.set_thread_id(
-                connection_id=msg.connection_id,
-                owner_user_id=msg.owner_user_id,
-                provider=msg.channel_name,
-                external_conversation_id=msg.chat_id,
-                external_topic_id=msg.topic_id,
-                thread_id=thread_id,
-            )
-            return
-
+    async def _create_thread(self, client, msg: InboundMessage) -> str:
+        """Create a new thread through Gateway and store the mapping."""
+        thread = await client.threads.create()
+        thread_id = thread["thread_id"]
        self.store.set_thread_id(
            msg.channel_name,
            msg.chat_id,
@@ -956,40 +723,18 @@ class ChannelManager:
            topic_id=msg.topic_id,
            user_id=msg.user_id,
        )
-
-    async def _create_thread(self, client, msg: InboundMessage) -> str:
-        """Create a new thread through Gateway and store the mapping."""
-        metadata = _thread_channel_metadata(msg)
-        owner_headers = _owner_headers(msg)
-        if owner_headers:
-            thread = await client.threads.create(metadata=metadata, headers=owner_headers)
-        else:
-            thread = await client.threads.create(metadata=metadata)
-        thread_id = thread["thread_id"]
-        await self._store_thread_id(msg, thread_id)
        logger.info("[Manager] new thread created through Gateway: thread_id=%s for chat_id=%s topic_id=%s", thread_id, msg.chat_id, msg.topic_id)
        return thread_id

-    async def _update_thread_channel_metadata(self, client, msg: InboundMessage, thread_id: str) -> None:
-        """Best-effort source metadata backfill for existing IM-created threads."""
-        update_kwargs: dict[str, Any] = {"metadata": _thread_channel_metadata(msg)}
-        if owner_headers := _owner_headers(msg):
-            update_kwargs["headers"] = owner_headers
-        try:
-            await client.threads.update(thread_id, **update_kwargs)
-        except Exception:
-            logger.debug("[Manager] failed to update channel metadata for thread_id=%s", thread_id, exc_info=True)
-
    async def _handle_chat(self, msg: InboundMessage, extra_context: dict[str, Any] | None = None) -> None:
        client = self._get_client()

        # Look up existing DeerFlow thread.
        # topic_id may be None (e.g. Telegram private chats) — the store
        # handles this by using the "channel:chat_id" key without a topic suffix.
-        thread_id = await self._lookup_thread_id(msg)
+        thread_id = self.store.get_thread_id(msg.channel_name, msg.chat_id, topic_id=msg.topic_id)
        if thread_id:
            logger.info("[Manager] reusing thread: thread_id=%s for topic_id=%s", thread_id, msg.topic_id)
-            await self._update_thread_channel_metadata(client, msg, thread_id)

        # No existing thread found — create a new one
        if thread_id is None:
@@ -1011,11 +756,9 @@ class ChannelManager:
        if extra_context:
            run_context.update(extra_context)

-        original_text = msg.text
        uploaded = await _ingest_inbound_files(thread_id, msg)
        if uploaded:
            msg.text = f"{_format_uploaded_files_block(uploaded)}\n\n{msg.text}".strip()
-        human_message = _human_input_message(msg.text, original_content=original_text)

        if self._channel_supports_streaming(msg.channel_name):
            await self._handle_streaming_chat(
@@ -1025,24 +768,18 @@ class ChannelManager:
                assistant_id,
                run_config,
                run_context,
-                human_message,
            )
            return

        logger.info("[Manager] invoking runs.wait(thread_id=%s, text=%r)", thread_id, msg.text[:100])
-        run_kwargs: dict[str, Any] = {
-            "input": {"messages": [human_message]},
-            "config": run_config,
-            "context": run_context,
-            "multitask_strategy": "reject",
-        }
-        if owner_headers := _owner_headers(msg):
-            run_kwargs["headers"] = owner_headers
        try:
            result = await client.runs.wait(
                thread_id,
                assistant_id,
-                **run_kwargs,
+                input={"messages": [{"role": "human", "content": msg.text}]},
+                config=run_config,
+                context=run_context,
+                multitask_strategy="reject",
            )
        except Exception as exc:
            if _is_thread_busy_error(exc):
@@ -1053,7 +790,6 @@ class ChannelManager:
                raise

        response_text = _extract_response_text(result)
-        pending_clarification = _has_current_turn_clarification(result)
        artifacts = _extract_artifacts(result)

        logger.info(
@@ -1079,9 +815,7 @@ class ChannelManager:
            artifacts=artifacts,
            attachments=attachments,
            thread_ts=msg.thread_ts,
-            connection_id=msg.connection_id,
-            owner_user_id=msg.owner_user_id,
-            metadata=_response_metadata(msg.metadata, pending_clarification=pending_clarification),
+            metadata=_slim_metadata(msg.metadata),
        )
        logger.info("[Manager] publishing outbound message to bus: channel=%s, chat_id=%s", msg.channel_name, msg.chat_id)
        await self.bus.publish_outbound(outbound)
@@ -1094,7 +828,6 @@ class ChannelManager:
        assistant_id: str,
        run_config: dict[str, Any],
        run_context: dict[str, Any],
-        human_message: dict[str, Any],
    ) -> None:
        logger.info("[Manager] invoking runs.stream(thread_id=%s, text=%r)", thread_id, msg.text[:100])

@@ -1105,21 +838,16 @@ class ChannelManager:
        last_published_text = ""
        last_publish_at = 0.0
        stream_error: BaseException | None = None
-        stream_kwargs: dict[str, Any] = {
-            "input": {"messages": [human_message]},
-            "config": run_config,
-            "context": run_context,
-            "stream_mode": ["messages-tuple", "values"],
-            "multitask_strategy": "reject",
-        }
-        if owner_headers := _owner_headers(msg):
-            stream_kwargs["headers"] = owner_headers

        try:
            async for chunk in client.runs.stream(
                thread_id,
                assistant_id,
-                **stream_kwargs,
+                input={"messages": [{"role": "human", "content": msg.text}]},
+                config=run_config,
+                context=run_context,
+                stream_mode=["messages-tuple", "values"],
+                multitask_strategy="reject",
            ):
                event = getattr(chunk, "event", "")
                data = getattr(chunk, "data", None)
@@ -1149,9 +877,7 @@ class ChannelManager:
                        text=latest_text,
                        is_final=False,
                        thread_ts=msg.thread_ts,
-                        connection_id=msg.connection_id,
-                        owner_user_id=msg.owner_user_id,
-                        metadata=_response_metadata(msg.metadata),
+                        metadata=_slim_metadata(msg.metadata),
                    )
                )
                last_published_text = latest_text
@@ -1165,7 +891,6 @@ class ChannelManager:
        finally:
            result = last_values if last_values is not None else {"messages": [{"type": "ai", "content": latest_text}]}
            response_text = _extract_response_text(result)
-            pending_clarification = _has_current_turn_clarification(result)
            artifacts = _extract_artifacts(result)
            response_text, attachments = _prepare_artifact_delivery(thread_id, response_text, artifacts)

@@ -1197,29 +922,18 @@ class ChannelManager:
                    attachments=attachments,
                    is_final=True,
                    thread_ts=msg.thread_ts,
-                    connection_id=msg.connection_id,
-                    owner_user_id=msg.owner_user_id,
-                    metadata=_response_metadata(msg.metadata, pending_clarification=pending_clarification),
+                    metadata=_slim_metadata(msg.metadata),
                )
            )

    # -- command handling --------------------------------------------------

    async def _handle_command(self, msg: InboundMessage) -> None:
-        raw_text = msg.text
-        text = raw_text.strip()
+        text = msg.text.strip()
        parts = text.split(maxsplit=1)
-        reply: str | None = None
-        if not parts:
-            command = None
-            reply = _unknown_command_reply()
-        else:
-            command = parts[0].lower().removeprefix("/")
+        command = parts[0].lower().removeprefix("/") if parts else ""

-        if reply is None and not raw_text.startswith("/"):
-            reply = _unknown_command_reply(command)
-
-        if reply is None and command == "bootstrap":
+        if command == "bootstrap":
            from dataclasses import replace as _dc_replace

            chat_text = parts[1] if len(parts) > 1 else "Initialize workspace"
@@ -1227,19 +941,27 @@ class ChannelManager:
            await self._handle_chat(chat_msg, extra_context={"is_bootstrap": True})
            return

-        if reply is None and command == "new":
+        if command == "new":
            # Create a new thread through Gateway
            client = self._get_client()
-            await self._create_thread(client, msg)
+            thread = await client.threads.create()
+            new_thread_id = thread["thread_id"]
+            self.store.set_thread_id(
+                msg.channel_name,
+                msg.chat_id,
+                new_thread_id,
+                topic_id=msg.topic_id,
+                user_id=msg.user_id,
+            )
            reply = "New conversation started."
-        elif reply is None and command == "status":
-            thread_id = await self._lookup_thread_id(msg)
+        elif command == "status":
+            thread_id = self.store.get_thread_id(msg.channel_name, msg.chat_id, topic_id=msg.topic_id)
            reply = f"Active thread: {thread_id}" if thread_id else "No active conversation."
-        elif reply is None and command == "models":
+        elif command == "models":
            reply = await self._fetch_gateway("/api/models", "models")
-        elif reply is None and command == "memory":
+        elif command == "memory":
            reply = await self._fetch_gateway("/api/memory", "memory")
-        elif reply is None and command == "help":
+        elif command == "help":
            reply = (
                "Available commands:\n"
                "/bootstrap — Start a bootstrap session (enables agent setup)\n"
@@ -1247,36 +969,18 @@ class ChannelManager:
                "/status — Show current thread info\n"
                "/models — List available models\n"
                "/memory — Show memory status\n"
-                "/<skill-name> <task> — Activate an enabled skill for one turn\n"
                "/help — Show this help"
            )
-        elif reply is None:
-            slash_resolution = await asyncio.to_thread(
-                lambda: _resolve_slash_skill_command(
-                    raw_text,
-                    self._resolve_available_skill_names(msg),
-                    self._get_skill_storage,
-                )
-            )
-            if slash_resolution and slash_resolution.failure_message:
-                reply = slash_resolution.failure_message
-            elif slash_resolution and slash_resolution.route_to_chat:
-                from dataclasses import replace as _dc_replace
-
-                chat_msg = _dc_replace(msg, msg_type=InboundMessageType.CHAT)
-                await self._handle_chat(chat_msg)
-                return
        else:
-                reply = _unknown_command_reply(command)
+            available = " | ".join(sorted(KNOWN_CHANNEL_COMMANDS))
+            reply = f"Unknown command: /{command}. Available commands: {available}"

        outbound = OutboundMessage(
            channel_name=msg.channel_name,
            chat_id=msg.chat_id,
-            thread_id=await self._lookup_thread_id(msg) or "",
+            thread_id=self.store.get_thread_id(msg.channel_name, msg.chat_id, topic_id=msg.topic_id) or "",
            text=reply,
            thread_ts=msg.thread_ts,
-            connection_id=msg.connection_id,
-            owner_user_id=msg.owner_user_id,
            metadata=_slim_metadata(msg.metadata),
        )
        await self.bus.publish_outbound(outbound)
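
Review note: the simplified `_handle_command` indexes `parts[0]` directly, which raises `IndexError` when the message text is empty or whitespace-only, and `lstrip("/")` removes every leading slash rather than a single prefix. The following is a minimal sketch of a more defensive parse, not the project's implementation. `parse_command`, `unknown_command_reply`, and `KNOWN_COMMANDS` are hypothetical names; the contents of the real `KNOWN_CHANNEL_COMMANDS` set are not shown in this diff and are assumed here.

```python
from __future__ import annotations

# Hypothetical stand-in for KNOWN_CHANNEL_COMMANDS; the real contents are an assumption.
KNOWN_COMMANDS = frozenset({"/bootstrap", "/new", "/status", "/models", "/memory", "/help"})


def parse_command(raw_text: str) -> tuple[str, str]:
    """Split a message into (command, argument) without raising on empty input."""
    parts = raw_text.strip().split(maxsplit=1)
    if not parts:
        # Empty or whitespace-only text: no command at all.
        return "", ""
    # removeprefix drops exactly one leading slash; lstrip("/") would drop all of them.
    command = parts[0].lower().removeprefix("/")
    argument = parts[1] if len(parts) > 1 else ""
    return command, argument


def unknown_command_reply(command: str) -> str:
    """Build the fallback reply used for unrecognized commands."""
    available = " | ".join(sorted(KNOWN_COMMANDS))
    return f"Unknown command: /{command}. Available commands: {available}"
```

With this shape, `parse_command("   ")` yields an empty command that falls through to the unknown-command reply instead of raising, and `//new` is reported as unknown rather than silently treated as `/new`.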
@@ -1312,11 +1016,9 @@ class ChannelManager:
        outbound = OutboundMessage(
            channel_name=msg.channel_name,
            chat_id=msg.chat_id,
-            thread_id=await self._lookup_thread_id(msg) or "",
+            thread_id=self.store.get_thread_id(msg.channel_name, msg.chat_id, topic_id=msg.topic_id) or "",
            text=error_text,
            thread_ts=msg.thread_ts,
-            connection_id=msg.connection_id,
-            owner_user_id=msg.owner_user_id,
            metadata=_slim_metadata(msg.metadata),
        )
        await self.bus.publish_outbound(outbound)
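
Review note: the removed `InboundMessage` docstring describes the legacy conversation key as `channel_name:chat_id[:topic_id]`. Assuming `ChannelStore` builds its keys that way (the store itself is not part of this diff), a lookup that omits `topic_id` will not find a thread that was stored with one. The sketch below uses a hypothetical in-memory stand-in, `InMemoryThreadStore`, to illustrate the effect.

```python
from __future__ import annotations


class InMemoryThreadStore:
    """Hypothetical stand-in for ChannelStore, keyed by channel_name:chat_id[:topic_id]."""

    def __init__(self) -> None:
        self._threads: dict[str, str] = {}

    @staticmethod
    def _key(channel_name: str, chat_id: str, topic_id: str | None = None) -> str:
        # The topic suffix is appended only when a topic id is present.
        key = f"{channel_name}:{chat_id}"
        return f"{key}:{topic_id}" if topic_id else key

    def set_thread_id(
        self,
        channel_name: str,
        chat_id: str,
        thread_id: str,
        *,
        topic_id: str | None = None,
    ) -> None:
        self._threads[self._key(channel_name, chat_id, topic_id)] = thread_id

    def get_thread_id(
        self,
        channel_name: str,
        chat_id: str,
        *,
        topic_id: str | None = None,
    ) -> str | None:
        return self._threads.get(self._key(channel_name, chat_id, topic_id))


store = InMemoryThreadStore()
store.set_thread_id("slack", "C1", "thread-1", topic_id="1715.0001")
with_topic = store.get_thread_id("slack", "C1", topic_id="1715.0001")
without_topic = store.get_thread_id("slack", "C1")
```

Under that assumption, `with_topic` resolves to the stored thread while `without_topic` is `None`, so an outbound message built from the second lookup would carry an empty `thread_id`.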
@@ -13,9 +13,6 @@ from typing import Any

 logger = logging.getLogger(__name__)

-PENDING_CLARIFICATION_METADATA_KEY = "pending_clarification"
-RESOLVED_FROM_PENDING_CLARIFICATION_METADATA_KEY = "resolved_from_pending_clarification"
-

 # ---------------------------------------------------------------------------
 # Message types
@@ -44,12 +41,6 @@ class InboundMessage:
            Messages sharing the same ``topic_id`` within a ``chat_id`` will
            reuse the same DeerFlow thread.  When ``None``, each message
            creates a new thread (one-shot Q&A).
-        connection_id: Optional DeerFlow channel connection id. When present,
-            conversation mapping is scoped by the connection instead of the
-            legacy global ``channel_name:chat_id[:topic_id]`` key.
-        owner_user_id: DeerFlow user id that owns the channel connection.
-            Platform user ids stay in ``user_id``.
-        workspace_id: Optional external workspace/guild/team id.
        files: Optional list of file attachments (platform-specific dicts).
        metadata: Arbitrary extra data from the channel.
        created_at: Unix timestamp when the message was created.
@@ -62,9 +53,6 @@ class InboundMessage:
    msg_type: InboundMessageType = InboundMessageType.CHAT
    thread_ts: str | None = None
    topic_id: str | None = None
-    connection_id: str | None = None
-    owner_user_id: str | None = None
-    workspace_id: str | None = None
    files: list[dict[str, Any]] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)
@@ -104,9 +92,6 @@ class OutboundMessage:
        is_final: Whether this is the final message in the response stream.
        thread_ts: Optional platform thread identifier for threaded replies.
        metadata: Arbitrary extra data.
-        connection_id: Optional DeerFlow channel connection id used for
-            connection-specific outbound credentials.
-        owner_user_id: DeerFlow user id that owns the channel connection.
        created_at: Unix timestamp.
    """

@@ -118,8 +103,6 @@ class OutboundMessage:
    attachments: list[ResolvedAttachment] = field(default_factory=list)
    is_final: bool = True
    thread_ts: str | None = None
-    connection_id: str | None = None
-    owner_user_id: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)

@@ -1,137 +0,0 @@
-"""Local persistence for runtime IM channel configuration."""
-
-from __future__ import annotations
-
-import json
-import logging
-import tempfile
-import threading
-from pathlib import Path
-from typing import Any
-
-logger = logging.getLogger(__name__)
-
-
-class ChannelRuntimeConfigStore:
-    """JSON-backed store for channel credentials entered from the UI.
-
-    This intentionally mirrors ``ChannelStore``: local/private deployments get
-    durable runtime configuration without needing a public callback URL or a
-    config.yaml edit.
-    """
-
-    def __init__(self, path: str | Path | None = None) -> None:
-        if path is None:
-            from deerflow.config.paths import get_paths
-
-            path = Path(get_paths().base_dir) / "channels" / "runtime-config.json"
-        self._path = Path(path)
-        self._path.parent.mkdir(parents=True, exist_ok=True)
-        self._data: dict[str, dict[str, Any]] = self._load()
-        self._lock = threading.Lock()
-
-    def _load(self) -> dict[str, dict[str, Any]]:
-        if self._path.exists():
-            try:
-                raw = json.loads(self._path.read_text(encoding="utf-8"))
-            except (json.JSONDecodeError, OSError):
-                logger.warning("Corrupt channel runtime config store at %s, starting fresh", self._path)
-                return {}
-            if isinstance(raw, dict):
-                return {str(name): dict(value) for name, value in raw.items() if isinstance(value, dict)}
-        return {}
-
-    def _save(self) -> None:
-        fd = tempfile.NamedTemporaryFile(
-            mode="w",
-            dir=self._path.parent,
-            suffix=".tmp",
-            delete=False,
-        )
-        try:
-            json.dump(self._data, fd, indent=2, ensure_ascii=False)
-            fd.close()
-            Path(fd.name).replace(self._path)
-            try:
-                self._path.chmod(0o600)
-            except OSError:
-                logger.debug("Unable to chmod channel runtime config store at %s", self._path, exc_info=True)
-        except BaseException:
-            fd.close()
-            Path(fd.name).unlink(missing_ok=True)
-            raise
-
-    def load_all(self) -> dict[str, dict[str, Any]]:
-        with self._lock:
-            return {name: dict(config) for name, config in self._data.items()}
-
-    def get_provider_config(self, provider: str) -> dict[str, Any] | None:
-        with self._lock:
-            config = self._data.get(provider)
-            return dict(config) if isinstance(config, dict) else None
-
-    def set_provider_config(self, provider: str, config: dict[str, Any]) -> None:
-        with self._lock:
-            self._data[provider] = dict(config)
-            self._save()
-
-    def remove_provider_config(self, provider: str) -> bool:
-        with self._lock:
-            if provider not in self._data:
-                return False
-            del self._data[provider]
-            self._save()
-            return True
-
-
-def _provider_enabled(channel_connections_config: Any, provider: str) -> bool:
-    provider_config = getattr(channel_connections_config, provider, None)
-    return bool(getattr(provider_config, "enabled", False))
-
-
-def merge_runtime_channel_configs(
-    channels_config: dict[str, Any],
-    channel_connections_config: Any,
-    *,
-    store: ChannelRuntimeConfigStore | None = None,
-) -> None:
-    """Merge persisted runtime provider config into ``channels_config`` in-place."""
-    if channel_connections_config is None or not getattr(channel_connections_config, "enabled", False):
-        return
-
-    runtime_store = store or ChannelRuntimeConfigStore()
-    for provider, runtime_config in runtime_store.load_all().items():
-        if not _provider_enabled(channel_connections_config, provider):
-            continue
-        existing = channels_config.get(provider)
-        merged = dict(runtime_config)
-        if isinstance(existing, dict):
-            merged.update(existing)
-        channels_config[provider] = merged
-
-
-def apply_runtime_connection_config(
-    channel_connections_config: Any,
-    *,
-    store: ChannelRuntimeConfigStore | None = None,
-) -> Any:
-    """Apply persisted connection metadata that lives outside ``channels``.
-
-    Telegram uses a bot username for deep links; UI-entered values are stored
-    with the runtime channel config so local restarts keep the provider
-    configured.
-    """
-    if channel_connections_config is None or not getattr(channel_connections_config, "enabled", False):
-        return channel_connections_config
-
-    runtime_store = store or ChannelRuntimeConfigStore()
-    telegram_runtime_config = runtime_store.get_provider_config("telegram")
-    bot_username = ""
-    if isinstance(telegram_runtime_config, dict):
-        bot_username = str(telegram_runtime_config.get("bot_username") or "").strip()
-    if not bot_username or not _provider_enabled(channel_connections_config, "telegram"):
-        return channel_connections_config
-
-    config = channel_connections_config.model_copy(deep=True)
-    config.telegram.bot_username = bot_username
-    return config
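
Review note: the removed `_save` method relied on the write-to-temp-then-replace pattern so that readers never observe a half-written JSON file. If that behavior is needed elsewhere after this removal, a minimal standalone sketch could look like the following. `atomic_write_json` is a hypothetical helper name, not something defined in the repository.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: dict[str, Any]) -> None:
    """Write JSON to a temp file in the target directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the file readable and writable only by the current user.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        # os.replace swaps the file in one step when source and target share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        Path(tmp_name).unlink(missing_ok=True)
        raise
```

Creating the temp file in the destination directory keeps both paths on the same filesystem, which is what allows the final replace to be a single rename.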
@@ -9,7 +9,6 @@ from typing import TYPE_CHECKING, Any
 from app.channels.base import Channel
 from app.channels.manager import DEFAULT_GATEWAY_URL, DEFAULT_LANGGRAPH_URL, ChannelManager
 from app.channels.message_bus import MessageBus
-from app.channels.runtime_config_store import merge_runtime_channel_configs
 from app.channels.store import ChannelStore

 logger = logging.getLogger(__name__)
@@ -53,30 +52,6 @@ def _resolve_service_url(config: dict[str, Any], config_key: str, env_key: str,
    return default


-def _merge_channel_connection_runtime_config(channels_config: dict[str, Any], app_config: AppConfig) -> None:
-    connection_config = getattr(app_config, "channel_connections", None)
-    merge_runtime_channel_configs(channels_config, connection_config)
-
-
-def _make_connection_repo(app_config: AppConfig):
-    connection_config = getattr(app_config, "channel_connections", None)
-    if connection_config is None or not getattr(connection_config, "enabled", False):
-        return None
-
-    try:
-        from deerflow.persistence.channel_connections import ChannelConnectionRepository
-        from deerflow.persistence.engine import get_session_factory
-    except Exception:
-        logger.exception("Failed to import channel connection repository")
-        return None
-
-    session_factory = get_session_factory()
-    if session_factory is None:
-        logger.warning("Channel connections are enabled but database persistence is not available")
-        return None
-    return ChannelConnectionRepository(session_factory)
-
-
 class ChannelService:
    """Manages the lifecycle of all configured IM channels.

@@ -84,10 +59,9 @@ class ChannelService:
    instantiates enabled channels, and starts the ChannelManager dispatcher.
    """

-    def __init__(self, channels_config: dict[str, Any] | None = None, *, connection_repo: Any | None = None) -> None:
+    def __init__(self, channels_config: dict[str, Any] | None = None) -> None:
        self.bus = MessageBus()
        self.store = ChannelStore()
-        self._connection_repo = connection_repo
        config = dict(channels_config or {})
        langgraph_url = _resolve_service_url(config, "langgraph_url", _CHANNELS_LANGGRAPH_URL_ENV, DEFAULT_LANGGRAPH_URL)
        gateway_url = _resolve_service_url(config, "gateway_url", _CHANNELS_GATEWAY_URL_ENV, DEFAULT_GATEWAY_URL)
@@ -100,7 +74,6 @@ class ChannelService:
            gateway_url=gateway_url,
            default_session=default_session if isinstance(default_session, dict) else None,
            channel_sessions=channel_sessions,
-            connection_repo=connection_repo,
        )
        self._channels: dict[str, Any] = {}  # name -> Channel instance
        self._config = config
@@ -117,9 +90,8 @@ class ChannelService:
        # extra fields are allowed by AppConfig (extra="allow")
        extra = app_config.model_extra or {}
        if "channels" in extra:
-            channels_config = dict(extra["channels"] or {})
-        _merge_channel_connection_runtime_config(channels_config, app_config)
-        return cls(channels_config=channels_config, connection_repo=_make_connection_repo(app_config))
+            channels_config = extra["channels"]
+        return cls(channels_config=channels_config)

    async def start(self) -> None:
        """Start the manager and all enabled channels."""
@@ -179,27 +151,6 @@ class ChannelService:

        return await self._start_channel(name, config)

-    async def configure_channel(self, name: str, config: dict[str, Any]) -> bool:
-        """Apply runtime config for a channel and restart it if the service is running."""
-        self._config[name] = dict(config)
-        if not self._running:
-            return True
-        return await self.restart_channel(name)
-
-    async def remove_channel(self, name: str) -> bool:
-        """Remove runtime config for a channel and stop it if currently running."""
-        self._config.pop(name, None)
-        channel = self._channels.pop(name, None)
-        if channel is None:
-            return True
-        try:
-            await channel.stop()
-            logger.info("Channel %s stopped and removed", name)
-            return True
-        except Exception:
-            logger.exception("Error stopping channel %s for removal", name)
-            return False
-
    async def _start_channel(self, name: str, config: dict[str, Any]) -> bool:
        """Instantiate and start a single channel."""
        import_path = _CHANNEL_REGISTRY.get(name)
@@ -218,8 +169,6 @@ class ChannelService:
        try:
            config = dict(config)
            config["channel_store"] = self.store
-            if self._connection_repo is not None:
-                config["connection_repo"] = self._connection_repo
            channel = channel_cls(bus=self.bus, config=config)
            self._channels[name] = channel
            await channel.start()
@@ -9,8 +9,6 @@ from typing import Any
 from markdown_to_mrkdwn import SlackMarkdownConverter

 from app.channels.base import Channel
-from app.channels.commands import extract_connect_code, is_known_channel_command
-from app.channels.connection_identity import attach_connection_identity
 from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment

 logger = logging.getLogger(__name__)
@@ -34,20 +32,6 @@ def _normalize_allowed_users(allowed_users: Any) -> set[str]:
    return {str(user_id) for user_id in values if str(user_id)}


-def _strip_leading_slack_bot_mention(text: str, bot_user_id: str | None) -> str:
-    if not bot_user_id:
-        return text
-    if not text.startswith("<@"):
-        return text
-    end = text.find(">")
-    if end <= 2:
-        return text
-    mentioned_user_id = text[2:end].split("|", 1)[0].lstrip("!")
-    if mentioned_user_id != bot_user_id:
-        return text
-    return text[end + 1 :].lstrip()
-
-
 class SlackChannel(Channel):
    """Slack IM channel using Socket Mode (WebSocket, no public IP).

@@ -65,10 +49,6 @@ class SlackChannel(Channel):
        self._web_client = None
        self._loop: asyncio.AbstractEventLoop | None = None
        self._allowed_users = _normalize_allowed_users(config.get("allowed_users", []))
-        self._connection_repo = config.get("connection_repo")
-        self._web_client_factory = config.get("web_client_factory")
-        configured_bot_user_id = config.get("bot_user_id")
-        self._bot_user_id = str(configured_bot_user_id).lstrip("@") if configured_bot_user_id else None

    async def start(self) -> None:
        if self._running:
@@ -83,28 +63,15 @@ class SlackChannel(Channel):
            return

        self._SocketModeResponse = SocketModeResponse
-        if self._web_client_factory is None:
-            self._web_client_factory = WebClient

        bot_token = self.config.get("bot_token", "")
        app_token = self.config.get("app_token", "")

-        if self._connection_repo is not None and self.config.get("event_delivery") == "http":
-            if not bot_token:
-                logger.error("Slack HTTP Events mode requires bot_token")
-                return
-            await self._initialize_operator_web_client(str(bot_token))
-            self._loop = asyncio.get_event_loop()
-            self._running = True
-            self.bus.subscribe_outbound(self._on_outbound)
-            logger.info("Slack channel started in HTTP Events mode")
-            return
-
        if not bot_token or not app_token:
            logger.error("Slack channel requires bot_token and app_token")
            return

-        await self._initialize_operator_web_client(str(bot_token))
+        self._web_client = WebClient(token=bot_token)
        self._socket_client = SocketModeClient(
            app_token=app_token,
            web_client=self._web_client,
@@ -129,8 +96,7 @@ class SlackChannel(Channel):
        logger.info("Slack channel stopped")

    async def send(self, msg: OutboundMessage, *, _max_retries: int = 3) -> None:
-        web_client = await self._get_web_client_for_message(msg)
-        if not web_client:
+        if not self._web_client:
            return

        kwargs: dict[str, Any] = {
@@ -143,12 +109,11 @@ class SlackChannel(Channel):
        last_exc: Exception | None = None
        for attempt in range(_max_retries):
            try:
-                await asyncio.to_thread(web_client.chat_postMessage, **kwargs)
+                await asyncio.to_thread(self._web_client.chat_postMessage, **kwargs)
                # Add a completion reaction to the thread root
                if msg.thread_ts:
                    await asyncio.to_thread(
-                        self._add_reaction_with_client,
-                        web_client,
+                        self._add_reaction,
                        msg.chat_id,
                        msg.thread_ts,
                        "white_check_mark",
@@ -172,8 +137,7 @@ class SlackChannel(Channel):
        if msg.thread_ts:
            try:
                await asyncio.to_thread(
-                    self._add_reaction_with_client,
-                    web_client,
+                    self._add_reaction,
                    msg.chat_id,
                    msg.thread_ts,
                    "x",
@@ -185,8 +149,7 @@ class SlackChannel(Channel):
        raise last_exc

    async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
-        web_client = await self._get_web_client_for_message(msg)
-        if not web_client:
+        if not self._web_client:
            return False

        try:
@@ -199,7 +162,7 @@ class SlackChannel(Channel):
            if msg.thread_ts:
                kwargs["thread_ts"] = msg.thread_ts

-            await asyncio.to_thread(web_client.files_upload_v2, **kwargs)
+            await asyncio.to_thread(self._web_client.files_upload_v2, **kwargs)
            logger.info("[Slack] file uploaded: %s to channel=%s", attachment.filename, msg.chat_id)
            return True
        except Exception:
@@ -208,38 +171,12 @@ class SlackChannel(Channel):

    # -- internal ----------------------------------------------------------

-    async def _initialize_operator_web_client(self, bot_token: str) -> None:
-        self._web_client = self._web_client_factory(token=bot_token)
-        if self._bot_user_id is not None:
+    def _add_reaction(self, channel_id: str, timestamp: str, emoji: str) -> None:
+        """Add an emoji reaction to a message (best-effort, non-blocking)."""
+        if not self._web_client:
            return
        try:
-            auth_info = await asyncio.to_thread(self._web_client.auth_test)
-            user_id = auth_info.get("user_id") if isinstance(auth_info, dict) else None
-            if user_id is None:
-                auth_get = getattr(auth_info, "get", None)
-                user_id = auth_get("user_id") if callable(auth_get) else None
-            if isinstance(user_id, str) and user_id:
-                self._bot_user_id = user_id
-        except Exception:
-            logger.warning("[Slack] failed to resolve bot user id; app mention text may include the bot mention", exc_info=True)
-
-    async def _get_web_client_for_message(self, msg: OutboundMessage):
-        if msg.connection_id and self._connection_repo is not None:
-            credentials = await self._connection_repo.get_credentials(msg.connection_id)
-            access_token = credentials.get("access_token") if credentials else None
-            if not access_token:
-                return self._web_client
-            if self._web_client_factory is None:
-                from slack_sdk import WebClient
-
-                self._web_client_factory = WebClient
-            return self._web_client_factory(token=access_token)
-        return self._web_client
-
-    @staticmethod
-    def _add_reaction_with_client(web_client, channel_id: str, timestamp: str, emoji: str) -> None:
-        try:
-            web_client.reactions_add(
+            self._web_client.reactions_add(
                channel=channel_id,
                timestamp=timestamp,
                name=emoji,
@@ -248,12 +185,6 @@ class SlackChannel(Channel):
            if "already_reacted" not in str(exc):
                logger.warning("[Slack] failed to add reaction %s: %s", emoji, exc)

-    def _add_reaction(self, channel_id: str, timestamp: str, emoji: str) -> None:
-        """Add an emoji reaction to a message (best-effort, non-blocking)."""
-        if not self._web_client:
-            return
-        self._add_reaction_with_client(self._web_client, channel_id, timestamp, emoji)
-
    def _send_running_reply(self, channel_id: str, thread_ts: str) -> None:
        """Send a 'Working on it......' reply in the thread (called from SDK thread)."""
        if not self._web_client:
@@ -279,26 +210,17 @@ class SlackChannel(Channel):
            if event_type != "events_api":
                return

-            if self._bot_user_id is None:
-                authorization = next((item for item in req.payload.get("authorizations", []) if isinstance(item, dict)), None)
-                user_id = authorization.get("user_id") if authorization else None
-                if isinstance(user_id, str) and user_id:
-                    self._bot_user_id = user_id
-
            event = req.payload.get("event", {})
            etype = event.get("type", "")

            # Handle message events (DM or @mention)
            if etype in ("message", "app_mention"):
-                self._handle_message_event(
-                    event,
-                    team_id=req.payload.get("team_id") or req.payload.get("team") or event.get("team"),
-                )
+                self._handle_message_event(event)

        except Exception:
            logger.exception("Error processing Slack event")

-    def _handle_message_event(self, event: dict, *, team_id: str | None = None) -> None:
+    def _handle_message_event(self, event: dict) -> None:
        # Ignore bot messages
        if event.get("bot_id") or event.get("subtype"):
            return
@@ -311,28 +233,13 @@ class SlackChannel(Channel):
            return

        text = event.get("text", "").strip()
-        if event.get("type") == "app_mention":
-            text = _strip_leading_slack_bot_mention(text, self._bot_user_id)
        if not text:
            return

-        connect_code = extract_connect_code(text)
-        if connect_code:
-            if self._loop and self._loop.is_running():
-                asyncio.run_coroutine_threadsafe(
-                    self._bind_connection_from_connect_code(
-                        event=event,
-                        team_id=str(team_id or event.get("team") or ""),
-                        code=connect_code,
-                    ),
-                    self._loop,
-                )
-            return
-
        channel_id = event.get("channel", "")
        thread_ts = event.get("thread_ts") or event.get("ts", "")

-        if is_known_channel_command(text):
+        if text.startswith("/"):
            msg_type = InboundMessageType.COMMAND
        else:
            msg_type = InboundMessageType.CHAT
@@ -354,61 +261,4 @@ class SlackChannel(Channel):
            self._add_reaction(channel_id, event.get("ts", thread_ts), "eyes")
            # Send "running" reply first (fire-and-forget from SDK thread)
            self._send_running_reply(channel_id, thread_ts)
-            if self._connection_repo is None:
            asyncio.run_coroutine_threadsafe(self.bus.publish_inbound(inbound), self._loop)
-            else:
-                asyncio.run_coroutine_threadsafe(self._publish_inbound_with_connection(inbound, team_id=team_id), self._loop)
-
-    async def _publish_inbound_with_connection(self, inbound, *, team_id: str | None = None) -> None:
-        inbound = await self._attach_connection_identity(inbound, team_id=team_id)
-        await self.bus.publish_inbound(inbound)
-
-    async def _attach_connection_identity(self, inbound, *, team_id: str | None = None):
-        workspace_id = str(team_id or inbound.metadata.get("team_id") or "")
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="slack",
-            workspace_id=workspace_id,
-        )
-
-    async def _bind_connection_from_connect_code(self, *, event: dict, team_id: str, code: str) -> bool:
-        if self._connection_repo is None or not code:
-            return False
-
-        channel_id = str(event.get("channel") or "")
-        thread_ts = str(event.get("thread_ts") or event.get("ts") or "")
-        state = await self._connection_repo.consume_oauth_state(provider="slack", state=code)
-        if state is None:
-            self._post_connection_reply(channel_id, "Slack connection code is invalid or expired.", thread_ts)
-            return True
-
-        user_id = str(event.get("user") or "")
-        if not user_id or not team_id:
-            self._post_connection_reply(channel_id, "Slack connection could not be completed from this message.", thread_ts)
-            return True
-
-        await self._connection_repo.upsert_connection(
-            owner_user_id=state["owner_user_id"],
-            provider="slack",
-            external_account_id=user_id,
-            workspace_id=team_id,
-            metadata={
-                "team_id": team_id,
-                "channel_id": channel_id,
-            },
-            status="connected",
-        )
-        self._post_connection_reply(channel_id, "Slack connected to DeerFlow.", thread_ts)
-        return True
-
-    def _post_connection_reply(self, channel_id: str, text: str, thread_ts: str | None = None) -> None:
-        if not self._web_client or not channel_id:
-            return
-        kwargs: dict[str, Any] = {"channel": channel_id, "text": text}
-        if thread_ts:
-            kwargs["thread_ts"] = thread_ts
-        try:
-            self._web_client.chat_postMessage(**kwargs)
-        except Exception:
-            logger.exception("[Slack] failed to send connection reply in channel=%s", channel_id)
@@ -8,7 +8,6 @@ import threading
 from typing import Any

 from app.channels.base import Channel
-from app.channels.connection_identity import attach_connection_identity
 from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment

 logger = logging.getLogger(__name__)
@@ -36,7 +35,6 @@ class TelegramChannel(Channel):
                pass
        # chat_id -> last sent message_id for threaded replies
        self._last_bot_message: dict[str, int] = {}
-        self._connection_repo = config.get("connection_repo")

    async def start(self) -> None:
        if self._running:
@@ -62,17 +60,12 @@ class TelegramChannel(Channel):

        # Command handlers
        app.add_handler(CommandHandler("start", self._cmd_start))
-        app.add_handler(CommandHandler("bootstrap", self._cmd_generic))
        app.add_handler(CommandHandler("new", self._cmd_generic))
        app.add_handler(CommandHandler("status", self._cmd_generic))
        app.add_handler(CommandHandler("models", self._cmd_generic))
        app.add_handler(CommandHandler("memory", self._cmd_generic))
        app.add_handler(CommandHandler("help", self._cmd_generic))

-        # Slash skill commands are dynamic and cannot all be pre-registered
-        # with Telegram, so route unknown slash commands through chat handling.
-        app.add_handler(MessageHandler(filters.TEXT & filters.COMMAND, self._on_text))
-
        # General message handler
        app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, self._on_text))

@@ -178,26 +171,6 @@ class TelegramChannel(Channel):
            logger.exception("[Telegram] failed to send file: %s", attachment.filename)
            return False

-    async def process_webhook_update(self, payload: dict[str, Any]) -> bool:
-        if not self._application:
-            return False
-        try:
-            from telegram import Update
-        except ImportError:
-            logger.error("python-telegram-bot is not installed. Install it with: uv add python-telegram-bot")
-            return False
-
-        update = Update.de_json(payload, self._application.bot)
-        if update is None:
-            return False
-
-        if self._tg_loop and self._tg_loop.is_running():
-            future = asyncio.run_coroutine_threadsafe(self._application.process_update(update), self._tg_loop)
-            await asyncio.wrap_future(future)
-        else:
-            await self._application.process_update(update)
-        return True
-
    # -- helpers -----------------------------------------------------------

    async def _send_running_reply(self, chat_id: str, reply_to_message_id: int) -> None:
@@ -255,90 +228,10 @@ class TelegramChannel(Channel):
            return True
        return user_id in self._allowed_users

-    @staticmethod
-    def _telegram_display_name(user) -> str:
-        full_name = getattr(user, "full_name", None)
-        if isinstance(full_name, str) and full_name:
-            return full_name
-        username = getattr(user, "username", None)
-        if isinstance(username, str) and username:
-            return username
-        return str(getattr(user, "id", ""))
-
-    async def _bind_connection_from_start_token(self, update, state_token: str) -> bool:
-        if self._connection_repo is None or not state_token:
-            return False
-
-        state = await self._connection_repo.consume_oauth_state(provider="telegram", state=state_token)
-        if state is None:
-            await update.message.reply_text("Telegram connection link is invalid or expired.")
-            return True
-
-        owner_user_id = state["owner_user_id"]
-        user_id = str(update.effective_user.id)
-        chat_id = str(update.effective_chat.id)
-        connection = await self._connection_repo.upsert_connection(
-            owner_user_id=owner_user_id,
-            provider="telegram",
-            external_account_id=user_id,
-            external_account_name=self._telegram_display_name(update.effective_user),
-            workspace_id=chat_id,
-            workspace_name=None,
-            metadata={
-                "chat_id": chat_id,
-                "chat_type": update.effective_chat.type,
-                "telegram_username": getattr(update.effective_user, "username", None),
-            },
-            status="connected",
-        )
-        logger.info("[Telegram] bound chat=%s user=%s to DeerFlow user=%s connection=%s", chat_id, user_id, owner_user_id, connection["id"])
-        await update.message.reply_text("Telegram connected to DeerFlow.")
-        return True
-
-    async def _attach_connection_identity(self, inbound: InboundMessage) -> InboundMessage:
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="telegram",
-            workspace_id=inbound.chat_id,
-        )
-
-    def _get_bot_username(self, context) -> str | None:
-        bot = getattr(context, "bot", None)
-        username = getattr(bot, "username", None)
-        if not username and self._application is not None:
-            username = getattr(getattr(self._application, "bot", None), "username", None)
-        return str(username) if username else None
-
-    @staticmethod
-    def _strip_bot_username_from_leading_command(text: str, bot_username: str | None) -> str:
-        username = (bot_username or "").lstrip("@").lower()
-        if not username or not text.startswith("/"):
-            return text
-
-        parts = text.split(maxsplit=1)
-        command_token = parts[0]
-        if "@" not in command_token:
-            return text
-
-        command_name, addressed_username = command_token[1:].rsplit("@", 1)
-        if not command_name or addressed_username.lower() != username:
-            return text
-
-        normalized = f"/{command_name}"
-        if len(parts) > 1:
-            normalized = f"{normalized} {parts[1]}"
-        return normalized
-
    async def _cmd_start(self, update, context) -> None:
        """Handle /start command."""
        if not self._check_user(update.effective_user.id):
            return
-        args = getattr(context, "args", []) if context is not None else []
-        if args:
-            handled = await self._bind_connection_from_start_token(update, str(args[0]))
-            if handled:
-                return
        await update.message.reply_text("Welcome to DeerFlow! Send me a message to start a conversation.\nType /help for available commands.")

    async def _process_incoming_with_reply(self, chat_id: str, msg_id: int, inbound: InboundMessage) -> None:
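Review note: the hunk above removes `_strip_bot_username_from_leading_command`, so after this change a group-chat command such as `/new@SomeBot hello` reaches the handlers with the `@SomeBot` suffix still attached. The removed normalisation can be sketched as a standalone function. This copy follows the deleted lines; the bot name used in the usage note is a made-up example, not taken from the repository.

```python
from __future__ import annotations


def strip_bot_username_from_leading_command(text: str, bot_username: str | None) -> str:
    """Turn '/cmd@ThisBot args' into '/cmd args'; leave everything else untouched."""
    username = (bot_username or "").lstrip("@").lower()
    if not username or not text.startswith("/"):
        return text

    # Only the first whitespace-delimited token can carry the @username suffix.
    parts = text.split(maxsplit=1)
    command_token = parts[0]
    if "@" not in command_token:
        return text

    command_name, addressed_username = command_token[1:].rsplit("@", 1)
    # Commands addressed to a different bot are passed through unchanged.
    if not command_name or addressed_username.lower() != username:
        return text

    normalized = f"/{command_name}"
    if len(parts) > 1:
        normalized = f"{normalized} {parts[1]}"
    return normalized
```

With a hypothetical bot named `DeerBot`, `strip_bot_username_from_leading_command("/new@DeerBot hello", "deerbot")` yields `/new hello`, while a command addressed to another bot is returned as-is.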
@@ -350,7 +243,7 @@ class TelegramChannel(Channel):
        if not self._check_user(update.effective_user.id):
            return

-        text = self._strip_bot_username_from_leading_command(update.message.text.strip(), self._get_bot_username(context))
+        text = update.message.text
        chat_id = str(update.effective_chat.id)
        user_id = str(update.effective_user.id)
        msg_id = str(update.message.message_id)
@@ -374,7 +267,6 @@ class TelegramChannel(Channel):
            thread_ts=msg_id,
        )
        inbound.topic_id = topic_id
-        inbound = await self._attach_connection_identity(inbound)

        if self._main_loop and self._main_loop.is_running():
            fut = asyncio.run_coroutine_threadsafe(self._process_incoming_with_reply(chat_id, update.message.message_id, inbound), self._main_loop)
@@ -387,7 +279,7 @@ class TelegramChannel(Channel):
        if not self._check_user(update.effective_user.id):
            return

-        text = self._strip_bot_username_from_leading_command(update.message.text.strip(), self._get_bot_username(context))
+        text = update.message.text.strip()
        if not text:
            return

@@ -417,7 +309,6 @@ class TelegramChannel(Channel):
            thread_ts=msg_id,
        )
        inbound.topic_id = topic_id
-        inbound = await self._attach_connection_identity(inbound)

        if self._main_loop and self._main_loop.is_running():
            fut = asyncio.run_coroutine_threadsafe(self._process_incoming_with_reply(chat_id, update.message.message_id, inbound), self._main_loop)
@@ -22,9 +22,7 @@ from cryptography.hazmat.primitives import padding
 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

 from app.channels.base import Channel
-from app.channels.commands import extract_connect_code, is_known_channel_command
-from app.channels.connection_identity import attach_connection_identity
-from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
+from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment

 logger = logging.getLogger(__name__)

@@ -254,7 +252,6 @@ class WechatChannel(Channel):
        self._state_dir = self._resolve_state_dir(config.get("state_dir"))
        self._cursor_path = self._state_dir / "wechat-getupdates.json" if self._state_dir else None
        self._auth_path = self._state_dir / "wechat-auth.json" if self._state_dir else None
-        self._connection_repo = config.get("connection_repo")
        self._load_state()

    async def start(self) -> None:
@@ -619,21 +616,11 @@ class WechatChannel(Channel):
            if thread_ts:
                self._context_tokens_by_thread[thread_ts] = context_token

-        connect_code = extract_connect_code(text)
-        if connect_code and self._connection_repo is not None:
-            handled = await self._bind_connection_from_connect_code(
-                chat_id=chat_id,
-                context_token=context_token,
-                code=connect_code,
-            )
-            if handled:
-                return
-
        inbound = self._make_inbound(
            chat_id=chat_id,
            user_id=chat_id,
            text=text,
-            msg_type=InboundMessageType.COMMAND if is_known_channel_command(text) else InboundMessageType.CHAT,
+            msg_type=InboundMessageType.COMMAND if text.startswith("/") else InboundMessageType.CHAT,
            thread_ts=thread_ts,
            files=files,
            metadata={
@@ -644,54 +631,8 @@ class WechatChannel(Channel):
            },
        )
        inbound.topic_id = None
-        inbound = await self._attach_connection_identity(inbound)
        await self.bus.publish_inbound(inbound)

-    async def _attach_connection_identity(self, inbound: InboundMessage) -> InboundMessage:
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="wechat",
-            workspace_id=inbound.chat_id,
-        )
-
-    async def _bind_connection_from_connect_code(self, *, chat_id: str, context_token: str, code: str) -> bool:
-        if self._connection_repo is None or not code:
-            return False
-
-        state = await self._connection_repo.consume_oauth_state(provider="wechat", state=code)
-        if state is None:
-            await self._send_connection_reply(chat_id, context_token, "WeChat connection code is invalid or expired.")
-            return True
-
-        if not chat_id:
-            await self._send_connection_reply(chat_id, context_token, "WeChat connection could not be completed from this message.")
-            return True
-
-        await self._connection_repo.upsert_connection(
-            owner_user_id=state["owner_user_id"],
-            provider="wechat",
-            external_account_id=chat_id,
-            workspace_id=chat_id,
-            metadata={
-                "context_token": context_token,
-            },
-            status="connected",
-        )
-        await self._send_connection_reply(chat_id, context_token, "WeChat connected to DeerFlow.")
-        return True
-
-    async def _send_connection_reply(self, chat_id: str, context_token: str, text: str) -> None:
-        if not context_token:
-            return
-        await self._send_text_message(
-            chat_id=chat_id,
-            context_token=context_token,
-            text=text,
-            client_id_prefix="deerflow-connect",
-            max_retries=1,
-        )
-
    async def _ensure_authenticated(self) -> bool:
        async with self._auth_lock:
            if self._bot_token:
@@ -8,10 +8,7 @@ from collections.abc import Awaitable, Callable
 from typing import Any, cast

 from app.channels.base import Channel
-from app.channels.commands import extract_connect_code, is_known_channel_command
-from app.channels.connection_identity import attach_connection_identity
 from app.channels.message_bus import (
-    InboundMessage,
    InboundMessageType,
    MessageBus,
    OutboundMessage,
@@ -31,7 +28,6 @@ class WeComChannel(Channel):
        self._ws_frames: dict[str, dict[str, Any]] = {}
        self._ws_stream_ids: dict[str, str] = {}
        self._working_message = "Working on it..."
-        self._connection_repo = config.get("connection_repo")

    @property
    def supports_streaming(self) -> bool:
@@ -274,17 +270,7 @@ class WeComChannel(Channel):

        user_id = (body.get("from") or {}).get("userid")

-        connect_code = extract_connect_code(text)
-        if connect_code and self._connection_repo is not None:
-            handled = await self._bind_connection_from_connect_code(
-                frame=frame,
-                user_id=str(user_id or ""),
-                code=connect_code,
-            )
-            if handled:
-                return
-
-        inbound_type = InboundMessageType.COMMAND if is_known_channel_command(text) else InboundMessageType.CHAT
+        inbound_type = InboundMessageType.COMMAND if text.startswith("/") else InboundMessageType.CHAT
        inbound = self._make_inbound(
            chat_id=user_id,  # keep user's conversation in memory
            user_id=user_id,
@@ -305,52 +291,8 @@ class WeComChannel(Channel):
        except Exception:
            pass

-        inbound = await self._attach_connection_identity(inbound)
        await self.bus.publish_inbound(inbound)

-    async def _attach_connection_identity(self, inbound: InboundMessage) -> InboundMessage:
-        return await attach_connection_identity(
-            inbound,
-            repo=self._connection_repo,
-            provider="wecom",
-            workspace_id=str(inbound.metadata.get("aibotid") or "") or None,
-            fallback_without_workspace=True,
-        )
-
-    async def _bind_connection_from_connect_code(self, *, frame: dict[str, Any], user_id: str, code: str) -> bool:
-        if self._connection_repo is None or not code:
-            return False
-
-        state = await self._connection_repo.consume_oauth_state(provider="wecom", state=code)
-        if state is None:
-            await self._send_connection_reply(frame, "WeCom connection code is invalid or expired.")
-            return True
-
-        if not user_id:
-            await self._send_connection_reply(frame, "WeCom connection could not be completed from this message.")
-            return True
-
-        body = frame.get("body", {}) or {}
-        workspace_id = str(body.get("aibotid") or "") or None
-        await self._connection_repo.upsert_connection(
-            owner_user_id=state["owner_user_id"],
-            provider="wecom",
-            external_account_id=user_id,
-            workspace_id=workspace_id,
-            metadata={
-                "aibotid": workspace_id,
-                "chattype": body.get("chattype"),
-            },
-            status="connected",
-        )
-        await self._send_connection_reply(frame, "WeCom connected to DeerFlow.")
-        return True
-
-    async def _send_connection_reply(self, frame: dict[str, Any], text: str) -> None:
-        if not self._ws_client:
-            return
-        await self._ws_client.reply(frame, {"msgtype": "text", "text": {"content": text}})
-
    async def _send_ws(self, msg: OutboundMessage, *, _max_retries: int = 3) -> None:
        if not self._ws_client:
            return
@@ -6,7 +6,6 @@ from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware

-from app.gateway.auth_disabled import warn_if_auth_disabled_enabled
 from app.gateway.auth_middleware import AuthMiddleware
 from app.gateway.config import get_gateway_config
 from app.gateway.csrf_middleware import CSRFMiddleware, get_configured_cors_origins
@@ -16,7 +15,6 @@ from app.gateway.routers import (
    artifacts,
    assistants_compat,
    auth,
-    channel_connections,
    channels,
    feedback,
    mcp,
@@ -174,7 +172,6 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
        startup_config = get_app_config()
        apply_logging_level(startup_config.log_level)
        logger.info("Configuration loaded successfully")
-        warn_if_auth_disabled_enabled()
    except Exception as e:
        error_msg = f"Failed to load configuration during gateway startup: {e}"
        logger.exception(error_msg)
@@ -182,31 +179,6 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    config = get_gateway_config()
    logger.info(f"Starting API Gateway on {config.host}:{config.port}")

-    # Pre-warm tiktoken encoding cache so the first memory-injection request
-    # never blocks on the BPE data download (which hits an OpenAI/Azure URL
-    # that may be unreachable in restricted networks — see issue #3402).
-    # When memory.token_counting is "char", token counting never touches
-    # tiktoken, so skip the warm-up entirely (avoids even the 5s probe in
-    # network-restricted deployments — see issue #3429).
-    if startup_config.memory.token_counting == "char":
-        logger.info("memory.token_counting='char'; skipping tiktoken warm-up (network-free token estimation)")
-    else:
-        try:
-            from deerflow.agents.memory.prompt import warm_tiktoken_cache
-
-            warmed = await asyncio.wait_for(
-                asyncio.to_thread(warm_tiktoken_cache),
-                timeout=5,
-            )
-            if warmed:
-                logger.info("tiktoken encoding cache warmed successfully")
-            else:
-                logger.warning("tiktoken encoding cache warm-up failed; token counting will use character-based fallback until tiktoken loads successfully")
-        except TimeoutError:
-            logger.warning("tiktoken encoding cache warm-up timed out; token counting will use character-based fallback until tiktoken loads successfully")
-        except Exception:
-            logger.warning("tiktoken warm-up skipped", exc_info=True)
-
    # Initialize LangGraph runtime components (StreamBridge, RunManager, checkpointer, store)
    async with langgraph_runtime(app, startup_config):
        logger.info("LangGraph runtime initialised")
@@ -385,9 +357,6 @@ This gateway provides runtime endpoints for agent runs plus custom endpoints for
    # Suggestions API is mounted at /api/threads/{thread_id}/suggestions
    app.include_router(suggestions.router)

-    # User-facing IM channel connection API is mounted at /api/channels
-    app.include_router(channel_connections.router)
-
    # Channels API is mounted at /api/channels
    app.include_router(channels.router)

@@ -1,56 +0,0 @@
-"""Shared helpers for local/E2E auth-disabled mode."""
-
-from __future__ import annotations
-
-import logging
-import os
-from types import SimpleNamespace
-
-from deerflow.runtime.user_context import DEFAULT_USER_ID
-
-AUTH_DISABLED_ENV_VAR = "DEER_FLOW_AUTH_DISABLED"
-AUTH_DISABLED_USER_ID = DEFAULT_USER_ID
-AUTH_DISABLED_USER_EMAIL = "default@test.local"
-
-AUTH_SOURCE_SESSION = "session"
-AUTH_SOURCE_INTERNAL = "internal"
-AUTH_SOURCE_AUTH_DISABLED = "auth_disabled"
-
-_PRODUCTION_ENV_VARS: tuple[str, ...] = ("DEER_FLOW_ENV", "ENVIRONMENT")
-_PRODUCTION_ENV_VALUES: frozenset[str] = frozenset({"prod", "production"})
-
-logger = logging.getLogger(__name__)
-
-
-def is_explicit_production_environment() -> bool:
-    return any(os.environ.get(name, "").strip().lower() in _PRODUCTION_ENV_VALUES for name in _PRODUCTION_ENV_VARS)
-
-
-def is_auth_disabled_requested() -> bool:
-    return os.environ.get(AUTH_DISABLED_ENV_VAR) == "1"
-
-
-def is_auth_disabled() -> bool:
-    return is_auth_disabled_requested() and not is_explicit_production_environment()
-
-
-def warn_if_auth_disabled_enabled() -> None:
-    if not is_auth_disabled():
-        return
-
-    logger.warning(
-        "%s=1 is active: authentication is bypassed and anonymous requests run as synthetic admin user %r. Do not enable this in shared or production deployments.",
-        AUTH_DISABLED_ENV_VAR,
-        AUTH_DISABLED_USER_ID,
-    )
-
-
-def get_auth_disabled_user():
-    return SimpleNamespace(
-        id=AUTH_DISABLED_USER_ID,
-        email=AUTH_DISABLED_USER_EMAIL,
-        password_hash=None,
-        system_role="admin",
-        needs_setup=False,
-        token_version=0,
-    )
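Review note: the deleted module gated the auth bypass on two conditions: `DEER_FLOW_AUTH_DISABLED` had to equal exactly `"1"`, and neither `DEER_FLOW_ENV` nor `ENVIRONMENT` could name a production environment. A minimal sketch of that gate follows. The `environ` parameter is an assumption added here so the logic can be exercised without touching the real process environment; the deleted code read `os.environ` directly.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

AUTH_DISABLED_ENV_VAR = "DEER_FLOW_AUTH_DISABLED"
_PRODUCTION_ENV_VARS = ("DEER_FLOW_ENV", "ENVIRONMENT")
_PRODUCTION_ENV_VALUES = frozenset({"prod", "production"})


def is_explicit_production_environment(environ: Mapping[str, str] | None = None) -> bool:
    # `environ` is a hypothetical injection point for testing.
    env = os.environ if environ is None else environ
    return any(env.get(name, "").strip().lower() in _PRODUCTION_ENV_VALUES for name in _PRODUCTION_ENV_VARS)


def is_auth_disabled(environ: Mapping[str, str] | None = None) -> bool:
    env = os.environ if environ is None else environ
    # Only the literal "1" enables the bypass, and production always wins.
    return env.get(AUTH_DISABLED_ENV_VAR) == "1" and not is_explicit_production_environment(env)
```

Values such as `"true"` do not enable the bypass, and the production check trims whitespace and ignores case before comparing.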
@@ -17,13 +17,6 @@ from starlette.responses import JSONResponse
 from starlette.types import ASGIApp

 from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse
-from app.gateway.auth_disabled import (
-    AUTH_SOURCE_AUTH_DISABLED,
-    AUTH_SOURCE_INTERNAL,
-    AUTH_SOURCE_SESSION,
-    get_auth_disabled_user,
-    is_auth_disabled,
-)
 from app.gateway.authz import _ALL_PERMISSIONS, AuthContext
 from app.gateway.internal_auth import INTERNAL_AUTH_HEADER_NAME, get_internal_user, is_valid_internal_auth_token
 from deerflow.runtime.user_context import reset_current_user, set_current_user
@@ -87,14 +80,18 @@ class AuthMiddleware(BaseHTTPMiddleware):
        if is_valid_internal_auth_token(request.headers.get(INTERNAL_AUTH_HEADER_NAME)):
            internal_user = get_internal_user()

-        auth_source = AUTH_SOURCE_SESSION
-        access_token = request.cookies.get("access_token")
-
        # Non-public path: require session cookie
-        if internal_user is not None:
-            user = internal_user
-            auth_source = AUTH_SOURCE_INTERNAL
-        elif access_token:
+        if internal_user is None and not request.cookies.get("access_token"):
+            return JSONResponse(
+                status_code=401,
+                content={
+                    "detail": AuthErrorResponse(
+                        code=AuthErrorCode.NOT_AUTHENTICATED,
+                        message="Authentication required",
+                    ).model_dump()
+                },
+            )
+
        # Strict JWT validation: reject junk/expired tokens with 401
        # right here instead of silently passing through. This closes
        # the "junk cookie bypass" gap (AUTH_TEST_PLAN test 7.5.8):
@@ -108,33 +105,19 @@ class AuthMiddleware(BaseHTTPMiddleware):
        # bubble up, so we catch and render it as JSONResponse here.
        from app.gateway.deps import get_current_user_from_request

+        if internal_user is not None:
+            user = internal_user
+        else:
            try:
                user = await get_current_user_from_request(request)
            except HTTPException as exc:
-                if not is_auth_disabled():
                return JSONResponse(status_code=exc.status_code, content={"detail": exc.detail})
-                user = get_auth_disabled_user()
-                auth_source = AUTH_SOURCE_AUTH_DISABLED
-        elif is_auth_disabled():
-            user = get_auth_disabled_user()
-            auth_source = AUTH_SOURCE_AUTH_DISABLED
-        else:
-            return JSONResponse(
-                status_code=401,
-                content={
-                    "detail": AuthErrorResponse(
-                        code=AuthErrorCode.NOT_AUTHENTICATED,
-                        message="Authentication required",
-                    ).model_dump()
-                },
-            )

        # Stamp both request.state.user (for the contextvar pattern)
        # and request.state.auth (so @require_permission's "auth is
        # None" branch short-circuits instead of running the entire
        # JWT-decode + DB-lookup pipeline a second time per request).
        request.state.user = user
-        request.state.auth_source = auth_source
        request.state.auth = AuthContext(user=user, permissions=_ALL_PERMISSIONS)
        token = set_current_user(user)
        try:
@@ -276,11 +276,6 @@ def require_permission(
            # strict-deny rather than strict-allow — only an *existing*
            # row with a *different* user_id triggers 404.
            if owner_check:
-                from app.gateway.internal_auth import INTERNAL_SYSTEM_ROLE
-
-                if getattr(auth.user, "system_role", None) == INTERNAL_SYSTEM_ROLE:
-                    return await func(*args, **kwargs)
-
                thread_id = kwargs.get("thread_id")
                if thread_id is None:
                    raise ValueError("require_permission with owner_check=True requires 'thread_id' parameter")
@@ -14,8 +14,6 @@ from starlette.middleware.base import BaseHTTPMiddleware
 from starlette.responses import JSONResponse
 from starlette.types import ASGIApp

-from app.gateway.auth_disabled import is_auth_disabled
-
 CSRF_COOKIE_NAME = "csrf_token"
 CSRF_HEADER_NAME = "X-CSRF-Token"
 CSRF_TOKEN_LENGTH = 64  # bytes
@@ -40,9 +38,6 @@ def should_check_csrf(request: Request) -> bool:
    if request.method not in ("POST", "PUT", "DELETE", "PATCH"):
        return False

-    if is_auth_disabled():
-        return False
-
    path = request.url.path.rstrip("/")
    # Exempt /api/v1/auth/me endpoint
    if path == "/api/v1/auth/me":
@@ -17,7 +17,6 @@ Initialization is handled directly in ``app.py`` via :class:`AsyncExitStack`.

 from __future__ import annotations

-import asyncio
 import logging
 from collections.abc import AsyncGenerator, Callable
 from contextlib import AsyncExitStack, asynccontextmanager
@@ -34,43 +33,6 @@ from deerflow.runtime.runs.store.base import RunStore

 logger = logging.getLogger(__name__)

-# Upper bound (seconds) for draining in-flight runs during shutdown, before the
-# AsyncExitStack tears down the checkpointer (and its connection pool). Kept
-# local to avoid an app -> deps -> app import cycle. This is a *separate* budget
-# from ``app.gateway.app._SHUTDOWN_HOOK_TIMEOUT_SECONDS`` (currently also 5.0s,
-# which bounds channel-service stop): the two govern independent teardown steps
-# and may diverge, but both count toward the lifespan shutdown window — revisit
-# them together if their sum must stay within the server's graceful-shutdown
-# timeout.
-_RUN_DRAIN_TIMEOUT_SECONDS = 5.0
-
-
-async def _drain_inflight_runs(run_manager: RunManager) -> None:
-    """Drain in-flight runs before the checkpointer is torn down (issue #3373).
-
-    Shields the (internally-bounded) drain so that even if the lifespan
-    coroutine is itself cancelled mid-shutdown — a second SIGINT or the server's
-    graceful-shutdown timeout, i.e. the same signal storm behind #3373 — the
-    checkpointer pool is not closed while run tasks are still writing
-    checkpoints. On such a cancellation we let the already-running drain finish
-    (it is bounded by ``RunManager.shutdown``'s own timeout) and then propagate
-    the cancellation.
-    """
-    drain = asyncio.create_task(run_manager.shutdown(timeout=_RUN_DRAIN_TIMEOUT_SECONDS))
-    try:
-        await asyncio.shield(drain)
-    except asyncio.CancelledError:
-        # Re-shield so this second wait does not abandon the in-flight drain;
-        # it is bounded, so this cannot hang. Then re-raise to honour shutdown.
-        try:
-            await asyncio.shield(drain)
-        except Exception:
-            logger.exception("In-flight run drain failed after shutdown cancellation")
-        raise
-    except Exception:
-        logger.exception("Failed to drain in-flight runs during shutdown")
-
-
 if TYPE_CHECKING:
    from app.gateway.auth.local_provider import LocalAuthProvider
    from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
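Review note: the removed `_drain_inflight_runs` relied on `asyncio.shield` so that cancelling the lifespan coroutine would not cancel the drain itself. The pattern can be sketched in isolation as below. `FakeRunManager` is a hypothetical stand-in that models only a `shutdown(timeout=...)` coroutine; the real `RunManager` is not reproduced here, and the logging from the deleted code is omitted.

```python
import asyncio


class FakeRunManager:
    """Hypothetical stand-in: only the shutdown() coroutine is modelled."""

    def __init__(self) -> None:
        self.drained = False

    async def shutdown(self, timeout: float) -> None:
        await asyncio.sleep(0.05)  # pretend to wait for in-flight runs
        self.drained = True


async def drain_inflight_runs(manager: FakeRunManager, timeout: float = 5.0) -> None:
    drain = asyncio.create_task(manager.shutdown(timeout=timeout))
    try:
        await asyncio.shield(drain)
    except asyncio.CancelledError:
        # The waiter was cancelled, but the shielded task is still running.
        # Wait for it once more, then propagate the cancellation.
        try:
            await asyncio.shield(drain)
        except Exception:
            pass
        raise


async def demo() -> bool:
    manager = FakeRunManager()
    waiter = asyncio.create_task(drain_inflight_runs(manager))
    await asyncio.sleep(0.01)  # let the drain start
    waiter.cancel()  # simulate the lifespan coroutine being cancelled
    try:
        await waiter
    except asyncio.CancelledError:
        pass
    return manager.drained
```

In this sketch the drain still completes after the waiter is cancelled once. A second cancellation during the re-shielded wait would propagate immediately, which matches the structure of the deleted code.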
@@ -119,16 +81,6 @@ def get_config() -> AppConfig:
    split-brain where the worker / lead-agent thread saw a stale startup
    snapshot.

-    Hot-reload boundary: fields backed by startup-time singletons
-    (engines, sandbox provider, IM channels, logging handler) require a
-    process restart to change at runtime. The authoritative list lives in
-    :mod:`deerflow.config.reload_boundary` and is mirrored by the
-    standardised ``"startup-only:"`` prefix on the matching
-    ``Field(description=...)`` in :class:`AppConfig` — IDE hover on those
-    fields will surface the boundary inline. See
-    ``backend/CLAUDE.md`` "Config Hot-Reload Boundary" for the operator
-    summary.
-
    Any failure to materialise the config (missing file, permission denied,
    YAML parse error, validation error) is reported as 503 — semantically
    "the gateway cannot serve requests without a usable configuration" — and
@@ -225,14 +177,6 @@ async def langgraph_runtime(app: FastAPI, startup_config: AppConfig) -> AsyncGen
        try:
            yield
        finally:
-            # Drain in-flight run tasks BEFORE the AsyncExitStack tears down the
-            # checkpointer (and its connection pool). A run still mid-graph would
-            # otherwise leak into asyncio.run() shutdown, where langgraph's
-            # _checkpointer_put_after_previous aput races the closed pool and
-            # raises PoolClosed (issue #3373).
-            run_manager = getattr(app.state, "run_manager", None)
-            if run_manager is not None:
-                await _drain_inflight_runs(run_manager)
            await close_engine()


@@ -331,17 +275,6 @@ async def get_current_user_from_request(request: Request):

    Raises HTTPException 401 if not authenticated.
    """
-    state = getattr(request, "state", None)
-    state_user = getattr(state, "user", None)
-    from app.gateway.auth_disabled import AUTH_SOURCE_AUTH_DISABLED, AUTH_SOURCE_INTERNAL, AUTH_SOURCE_SESSION
-
-    if state_user is not None and getattr(state, "auth_source", None) in {
-        AUTH_SOURCE_SESSION,
-        AUTH_SOURCE_AUTH_DISABLED,
-        AUTH_SOURCE_INTERNAL,
-    }:
-        return state_user
-
    from app.gateway.auth import decode_token
    from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse, TokenError, token_error_to_code

@@ -5,14 +5,11 @@ from __future__ import annotations
 import os
 import secrets
 from types import SimpleNamespace
-from typing import Any

 from deerflow.runtime.user_context import DEFAULT_USER_ID

 INTERNAL_AUTH_HEADER_NAME = "X-DeerFlow-Internal-Token"
-INTERNAL_OWNER_USER_ID_HEADER_NAME = "X-DeerFlow-Owner-User-Id"
 INTERNAL_AUTH_ENV_VAR = "DEER_FLOW_INTERNAL_AUTH_TOKEN"
-INTERNAL_SYSTEM_ROLE = "internal"


 def _load_internal_auth_token() -> str:
@@ -25,12 +22,9 @@ def _load_internal_auth_token() -> str:
 _INTERNAL_AUTH_TOKEN = _load_internal_auth_token()


-def create_internal_auth_headers(*, owner_user_id: str | None = None) -> dict[str, str]:
+def create_internal_auth_headers() -> dict[str, str]:
    """Return headers that authenticate trusted Gateway internal calls."""
-    headers = {INTERNAL_AUTH_HEADER_NAME: _INTERNAL_AUTH_TOKEN}
-    if owner_user_id:
-        headers[INTERNAL_OWNER_USER_ID_HEADER_NAME] = owner_user_id
-    return headers
+    return {INTERNAL_AUTH_HEADER_NAME: _INTERNAL_AUTH_TOKEN}


 def is_valid_internal_auth_token(token: str | None) -> bool:
@@ -40,22 +34,4 @@ def is_valid_internal_auth_token(token: str | None) -> bool:

 def get_internal_user():
    """Return the synthetic user used for trusted internal channel calls."""
-    return SimpleNamespace(id=DEFAULT_USER_ID, system_role=INTERNAL_SYSTEM_ROLE)
-
-
-def get_trusted_internal_owner_user_id(request: Any) -> str | None:
-    """Return the owner override for a trusted internal request, if present.
-
-    The header is ignored for normal browser/API callers. It is only honored
-    after ``AuthMiddleware`` has validated the internal auth token and stamped
-    the synthetic internal user onto ``request.state.user``.
-    """
-    user = getattr(getattr(request, "state", None), "user", None)
-    if getattr(user, "system_role", None) != INTERNAL_SYSTEM_ROLE:
-        return None
-
-    owner_user_id = request.headers.get(INTERNAL_OWNER_USER_ID_HEADER_NAME)
-    if not owner_user_id:
-        return None
-    owner_user_id = owner_user_id.strip()
-    return owner_user_id or None
+    return SimpleNamespace(id=DEFAULT_USER_ID, system_role="internal")
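Review note: with `get_trusted_internal_owner_user_id` removed, the `X-DeerFlow-Owner-User-Id` header is no longer consulted. The removed check only honoured that header when the request had already been stamped with the synthetic internal user. A minimal sketch of the rule follows. `fake_request` is a hypothetical test helper, and a plain `dict` stands in for the framework's header object, so header lookup here is case-sensitive, unlike a real request.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any

INTERNAL_OWNER_USER_ID_HEADER_NAME = "X-DeerFlow-Owner-User-Id"
INTERNAL_SYSTEM_ROLE = "internal"


def get_trusted_internal_owner_user_id(request: Any) -> str | None:
    user = getattr(getattr(request, "state", None), "user", None)
    # Ignore the header unless the caller is the synthetic internal user.
    if getattr(user, "system_role", None) != INTERNAL_SYSTEM_ROLE:
        return None

    owner_user_id = request.headers.get(INTERNAL_OWNER_USER_ID_HEADER_NAME)
    if not owner_user_id:
        return None
    return owner_user_id.strip() or None


def fake_request(role: str | None, headers: dict[str, str]) -> SimpleNamespace:
    # Hypothetical helper: mimics only request.state.user and request.headers.
    return SimpleNamespace(state=SimpleNamespace(user=SimpleNamespace(system_role=role)), headers=headers)
```

A browser session that sends the header is ignored because its user does not carry the internal role; only a request authenticated with the internal token could override the owner.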
@@ -20,7 +20,6 @@ from langgraph_sdk import Auth

 from app.gateway.auth.errors import TokenError
 from app.gateway.auth.jwt import decode_token
-from app.gateway.auth_disabled import AUTH_DISABLED_USER_ID, is_auth_disabled
 from app.gateway.deps import get_local_provider

 auth = Auth()
@@ -39,9 +38,6 @@ def _check_csrf(request) -> None:
    if method.upper() not in _CSRF_METHODS:
        return

-    if is_auth_disabled():
-        return
-
    cookie_token = request.cookies.get("csrf_token")
    header_token = request.headers.get("x-csrf-token")

@@ -70,9 +66,6 @@ async def authenticate(request):
    # are rejected early, even if the cookie carries a valid JWT.
    _check_csrf(request)

-    if is_auth_disabled():
-        return AUTH_DISABLED_USER_ID
-
    token = request.cookies.get("access_token")
    if not token:
        raise Auth.exceptions.HTTPException(
@@ -1,15 +0,0 @@
-"""Shared pagination helpers for gateway routers."""
-
-from __future__ import annotations
-
-
-def trim_run_message_page(rows: list[dict], *, limit: int, after_seq: int | None) -> tuple[list[dict], bool]:
-    """Trim a ``limit + 1`` run-message page while preserving page boundaries."""
-    has_more = len(rows) > limit
-    if not has_more:
-        return rows, False
-
-    if after_seq is not None:
-        return rows[:limit], True
-
-    return rows[-limit:], True
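Review note on the removed helper above: it trims a page that was fetched with one extra row (`limit + 1`) and reports whether more rows exist. The sketch below copies the removed function and runs it on sample data. The sample rows and the assumption that rows arrive in ascending `seq` order are illustrative and not taken from the diff.

```python
from __future__ import annotations


def trim_run_message_page(rows: list[dict], *, limit: int, after_seq: int | None) -> tuple[list[dict], bool]:
    """Trim a ``limit + 1`` page (copied from the removed helper)."""
    has_more = len(rows) > limit
    if not has_more:
        return rows, False
    if after_seq is not None:
        # Forward pagination: keep the first `limit` rows after the cursor.
        return rows[:limit], True
    # No forward cursor: keep the last `limit` rows of the fetched page.
    return rows[-limit:], True


# Four rows fetched for limit=3 (illustrative data, ascending seq assumed).
rows = [{"seq": n} for n in range(1, 5)]
forward, more_forward = trim_run_message_page(rows, limit=3, after_seq=0)
latest, more_latest = trim_run_message_page(rows, limit=3, after_seq=None)
```

With a cursor the extra row is dropped from the end of the page; without one it is dropped from the start, so the page boundary stays adjacent to the rows the caller already has.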
@@ -1,6 +1,5 @@
 """CRUD API for custom agents."""

-import asyncio
 import logging
 import re
 import shutil
@@ -214,21 +213,15 @@ async def create_agent_endpoint(request: AgentCreateRequest) -> AgentResponse:
    user_id = get_effective_user_id()
    paths = get_paths()

-    def _create_agent() -> AgentResponse | None:
-        # Worker thread: base-dir resolution, existence checks, directory/file
-        # creation, read-back, and failure cleanup are all blocking filesystem
-        # IO that must stay off the event loop.
    agent_dir = paths.user_agent_dir(user_id, normalized_name)
    legacy_dir = paths.agent_dir(normalized_name)

-        if legacy_dir.exists():
-            return None  # signals 409 to the caller
+    if agent_dir.exists() or legacy_dir.exists():
+        raise HTTPException(status_code=409, detail=f"Agent '{normalized_name}' already exists")

    try:
-            try:
-                agent_dir.mkdir(parents=True, exist_ok=False)
-            except FileExistsError:
-                return None  # signals 409 to the caller
+        agent_dir.mkdir(parents=True, exist_ok=True)
+
        # Write config.yaml
        config_data: dict = {"name": normalized_name}
        if request.description:
@@ -252,23 +245,16 @@ async def create_agent_endpoint(request: AgentCreateRequest) -> AgentResponse:

        agent_cfg = load_agent_config(normalized_name, user_id=user_id)
        return _agent_config_to_response(agent_cfg, include_soul=True, user_id=user_id)
-        except Exception:
-            # Clean up partial state on failure before surfacing the error.
+
+    except HTTPException:
+        raise
+    except Exception as e:
+        # Clean up on failure
        if agent_dir.exists():
            shutil.rmtree(agent_dir)
-            raise
-
-    try:
-        response = await asyncio.to_thread(_create_agent)
-    except Exception as e:
        logger.error(f"Failed to create agent '{request.name}': {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=f"Failed to create agent: {str(e)}")

-    if response is None:
-        raise HTTPException(status_code=409, detail=f"Agent '{normalized_name}' already exists")
-
-    return response
-

 @router.put(
    "/agents/{name}",
@@ -442,30 +428,19 @@ async def delete_agent(name: str) -> None:
    name = _normalize_agent_name(name)
    user_id = get_effective_user_id()
    paths = get_paths()
-
-    def _remove_agent_dir() -> tuple[str, str]:
-        # Runs in a worker thread: resolving the base dir, probing the directory
-        # (`exists`), and removing it (`rmtree`) are all blocking filesystem IO
-        # that must stay off the event loop.
    agent_dir = paths.user_agent_dir(user_id, name)
+
    if not agent_dir.exists():
-            outcome = "legacy" if paths.agent_dir(name).exists() else "missing"
-            return outcome, str(agent_dir)
-        shutil.rmtree(agent_dir)
-        return "deleted", str(agent_dir)
-
-    try:
-        outcome, agent_dir = await asyncio.to_thread(_remove_agent_dir)
-    except Exception as e:
-        logger.error(f"Failed to delete agent '{name}': {e}", exc_info=True)
-        raise HTTPException(status_code=500, detail=f"Failed to delete agent: {str(e)}")
-
-    if outcome == "legacy":
+        if paths.agent_dir(name).exists():
            raise HTTPException(
                status_code=409,
                detail=(f"Agent '{name}' only exists in the legacy shared layout and is not scoped to a user. Run scripts/migrate_user_isolation.py to move legacy agents into the per-user layout before deleting."),
            )
-    if outcome == "missing":
        raise HTTPException(status_code=404, detail=f"Agent '{name}' not found")

+    try:
+        shutil.rmtree(agent_dir)
        logger.info(f"Deleted agent '{name}' from {agent_dir}")
+    except Exception as e:
+        logger.error(f"Failed to delete agent '{name}': {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Failed to delete agent: {str(e)}")
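Review note on the agent hunks above: the removed code ran the blocking filesystem work in a worker thread via `asyncio.to_thread` and used `mkdir(exist_ok=False)` so that a concurrent create surfaced as `FileExistsError`. A minimal sketch of that pattern follows. The helper names `_create_dir_exclusive` and `create_agent_dir` and the temporary directory layout are hypothetical.

```python
import asyncio
import tempfile
from pathlib import Path


def _create_dir_exclusive(agent_dir: Path) -> bool:
    """Blocking helper: return False if the directory already exists."""
    try:
        agent_dir.mkdir(parents=True, exist_ok=False)
    except FileExistsError:
        return False
    return True


async def create_agent_dir(agent_dir: Path) -> bool:
    # Run the blocking filesystem call in a worker thread, off the event loop.
    return await asyncio.to_thread(_create_dir_exclusive, agent_dir)


async def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as base:
        target = Path(base) / "agents" / "example-agent"
        first = await create_agent_dir(target)   # directory is created
        second = await create_agent_dir(target)  # directory already exists
        return first, second


result = asyncio.run(demo())
```

In an endpoint, a `False` result would map to the 409 response, which avoids the gap between a separate `exists()` check and the later `mkdir` call.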
@@ -341,19 +341,9 @@ async def change_password(request: Request, response: Response, body: ChangePass
    - Re-issues session cookie with new token_version
    """
    from app.gateway.auth.password import hash_password_async, verify_password_async
-    from app.gateway.auth_disabled import AUTH_SOURCE_AUTH_DISABLED

    user = await get_current_user_from_request(request)

-    if getattr(request.state, "auth_source", None) == AUTH_SOURCE_AUTH_DISABLED:
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail=AuthErrorResponse(
-                code=AuthErrorCode.INVALID_CREDENTIALS,
-                message="Password changes are not available when DEER_FLOW_AUTH_DISABLED=1.",
-            ).model_dump(),
-        )
-
    if user.password_hash is None:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="OAuth users cannot change password").model_dump())

@@ -1,600 +0,0 @@
-"""Browser-facing APIs for user-owned IM channel bindings."""
-
-from __future__ import annotations
-
-import logging
-import secrets
-from datetime import UTC, datetime, timedelta
-from typing import Any
-
-from fastapi import APIRouter, HTTPException, Request, Response
-from pydantic import BaseModel, Field
-
-from app.channels.runtime_config_store import (
-    ChannelRuntimeConfigStore,
-    apply_runtime_connection_config,
-    merge_runtime_channel_configs,
-)
-from deerflow.config.channel_connections_config import ChannelConnectionsConfig
-from deerflow.persistence.channel_connections import ChannelConnectionRepository
-from deerflow.persistence.engine import get_session_factory
-
-router = APIRouter(prefix="/api/channels", tags=["channel-connections"])
-logger = logging.getLogger(__name__)
-
-_STATE_TTL_SECONDS = 600
-_MASKED_CREDENTIAL_VALUE = "********"
-
-
-class ChannelCredentialFieldResponse(BaseModel):
-    name: str
-    label: str
-    type: str = "text"
-    required: bool = True
-
-
-class ChannelProviderResponse(BaseModel):
-    provider: str
-    display_name: str
-    enabled: bool
-    configured: bool
-    connectable: bool
-    unavailable_reason: str | None = None
-    auth_mode: str
-    connection_status: str
-    credential_fields: list[ChannelCredentialFieldResponse] = Field(default_factory=list)
-    credential_values: dict[str, str] = Field(default_factory=dict)
-
-
-class ChannelProvidersResponse(BaseModel):
-    enabled: bool
-    providers: list[ChannelProviderResponse]
-
-
-class ChannelConnectionResponse(BaseModel):
-    id: str
-    provider: str
-    status: str
-    external_account_id: str | None = None
-    external_account_name: str | None = None
-    workspace_id: str | None = None
-    workspace_name: str | None = None
-    scopes: list[str] = Field(default_factory=list)
-    metadata: dict[str, Any] = Field(default_factory=dict)
-
-
-class ChannelConnectionsResponse(BaseModel):
-    connections: list[ChannelConnectionResponse]
-
-
-class ChannelConnectResponse(BaseModel):
-    provider: str
-    mode: str
-    url: str | None = None
-    code: str
-    instruction: str
-    expires_in: int
-
-
-class ChannelRuntimeConfigRequest(BaseModel):
-    values: dict[str, str] = Field(default_factory=dict)
-
-
-_PROVIDER_META: dict[str, dict[str, str]] = {
-    "telegram": {"display_name": "Telegram", "auth_mode": "deep_link"},
-    "slack": {"display_name": "Slack", "auth_mode": "binding_code"},
-    "discord": {"display_name": "Discord", "auth_mode": "binding_code"},
-    "feishu": {"display_name": "Feishu", "auth_mode": "binding_code"},
-    "dingtalk": {"display_name": "DingTalk", "auth_mode": "binding_code"},
-    "wechat": {"display_name": "WeChat", "auth_mode": "binding_code"},
-    "wecom": {"display_name": "WeCom", "auth_mode": "binding_code"},
-}
-
-_CREDENTIAL_FIELDS: dict[str, tuple[dict[str, str], ...]] = {
-    "telegram": (
-        {"name": "bot_token", "label": "Bot token", "type": "password"},
-        {"name": "bot_username", "label": "Bot username", "type": "text"},
-    ),
-    "slack": (
-        {"name": "bot_token", "label": "Bot token", "type": "password"},
-        {"name": "app_token", "label": "App token", "type": "password"},
-    ),
-    "discord": ({"name": "bot_token", "label": "Bot token", "type": "password"},),
-    "feishu": (
-        {"name": "app_id", "label": "App ID", "type": "text"},
-        {"name": "app_secret", "label": "App secret", "type": "password"},
-    ),
-    "dingtalk": (
-        {"name": "client_id", "label": "Client ID", "type": "text"},
-        {"name": "client_secret", "label": "Client secret", "type": "password"},
-    ),
-    "wechat": ({"name": "bot_token", "label": "Bot token", "type": "password"},),
-    "wecom": (
-        {"name": "bot_id", "label": "Bot ID", "type": "text"},
-        {"name": "bot_secret", "label": "Bot secret", "type": "password"},
-    ),
-}
-
-_RUNTIME_REQUIREMENTS: dict[str, tuple[str, ...]] = {
-    "telegram": ("bot_token",),
-    "slack": ("bot_token", "app_token"),
-    "discord": ("bot_token",),
-    "feishu": ("app_id", "app_secret"),
-    "dingtalk": ("client_id", "client_secret"),
-    "wechat": ("bot_token",),
-    "wecom": ("bot_id", "bot_secret"),
-}
-
-
-def _get_user_id(request: Request) -> str:
-    user = getattr(request.state, "user", None)
-    if user is None:
-        raise HTTPException(status_code=401, detail="Authentication required")
-    return str(user.id)
-
-
-def _get_app_config():
-    from deerflow.config.app_config import get_app_config
-
-    return get_app_config()
-
-
-def _get_runtime_config_store(request: Request) -> ChannelRuntimeConfigStore:
-    store = getattr(request.app.state, "channel_runtime_config_store", None)
-    if isinstance(store, ChannelRuntimeConfigStore):
-        return store
-    store = ChannelRuntimeConfigStore()
-    request.app.state.channel_runtime_config_store = store
-    return store
-
-
-def _get_channel_connections_config(request: Request) -> ChannelConnectionsConfig:
-    config = getattr(request.app.state, "channel_connections_config", None)
-    if not isinstance(config, ChannelConnectionsConfig):
-        config = _get_app_config().channel_connections
-    config = apply_runtime_connection_config(config, store=_get_runtime_config_store(request))
-    request.app.state.channel_connections_config = config
-    return config
-
-
-def _get_channels_config(request: Request) -> dict[str, Any]:
-    state_config = getattr(request.app.state, "channels_config", None)
-    if isinstance(state_config, dict):
-        return state_config
-
-    result = _load_channels_config(request, _get_channel_connections_config(request))
-    request.app.state.channels_config = result
-    return result
-
-
-def _load_channels_config(request: Request, config: ChannelConnectionsConfig) -> dict[str, Any]:
-    app_config = _get_app_config()
-    extra = app_config.model_extra or {}
-    channels_config = extra.get("channels")
-    result = dict(channels_config) if isinstance(channels_config, dict) else {}
-    merge_runtime_channel_configs(
-        result,
-        config,
-        store=_get_runtime_config_store(request),
-    )
-    return result
-
-
-def _get_repository(request: Request, config: ChannelConnectionsConfig) -> ChannelConnectionRepository:
-    repo = getattr(request.app.state, "channel_connection_repo", None)
-    if isinstance(repo, ChannelConnectionRepository):
-        return repo
-
-    sf = get_session_factory()
-    if sf is None:
-        raise HTTPException(status_code=503, detail="Channel connection persistence is not available")
-
-    repo = ChannelConnectionRepository(sf)
-    request.app.state.channel_connection_repo = repo
-    return repo
-
-
-def _provider_config(config: ChannelConnectionsConfig, provider: str):
-    provider_config = getattr(config, provider, None)
-    if provider_config is None:
-        raise HTTPException(status_code=404, detail="Unknown channel provider")
-    return provider_config
-
-
-def _runtime_channel_configured(provider: str, channels_config: dict[str, Any]) -> bool:
-    runtime_config = channels_config.get(provider)
-    if not isinstance(runtime_config, dict) or not runtime_config.get("enabled", False):
-        return False
-    return all(str(runtime_config.get(key) or "").strip() for key in _RUNTIME_REQUIREMENTS[provider])
-
-
-def _runtime_unavailable_reason(provider: str) -> str:
-    meta = _PROVIDER_META.get(provider)
-    display_name = meta["display_name"] if meta else provider
-    return f"Enter the required {display_name} credentials to connect this channel."
-
-
-def _runtime_not_running_reason(provider: str) -> str:
-    meta = _PROVIDER_META.get(provider)
-    display_name = meta["display_name"] if meta else provider
-    return f"{display_name} channel is configured but is not running. Check the credentials and save this channel again."
-
-
-def _runtime_channel_running(provider: str) -> bool | None:
-    try:
-        from app.channels.service import get_channel_service
-    except Exception:
-        logger.debug("Unable to inspect channel service status", exc_info=True)
-        return None
-
-    service = get_channel_service()
-    if service is None:
-        return None
-    try:
-        status = service.get_status()
-    except Exception:
-        logger.debug("Unable to read channel service status", exc_info=True)
-        return None
-
-    if not status.get("service_running"):
-        return False
-    channel_status = status.get("channels", {}).get(provider)
-    if not isinstance(channel_status, dict):
-        return None
-    return bool(channel_status.get("running"))
-
-
-def _provider_unavailable_reason(
-    config: ChannelConnectionsConfig,
-    channels_config: dict[str, Any],
-    provider: str,
-) -> str | None:
-    provider_config = _provider_config(config, provider)
-    if not provider_config.enabled:
-        return None
-    if not provider_config.configured:
-        return _runtime_unavailable_reason(provider)
-    if not _runtime_channel_configured(provider, channels_config):
-        return _runtime_unavailable_reason(provider)
-    if _runtime_channel_running(provider) is False:
-        return _runtime_not_running_reason(provider)
-    return None
-
-
-def _provider_status(
-    config: ChannelConnectionsConfig,
-    channels_config: dict[str, Any],
-    provider: str,
-) -> tuple[dict[str, bool], str | None]:
-    declared = config.provider_status(provider)
-    unavailable_reason = _provider_unavailable_reason(config, channels_config, provider)
-    configured = declared["configured"] and _runtime_channel_configured(provider, channels_config)
-    return {"enabled": declared["enabled"], "configured": configured}, unavailable_reason
-
-
-def _new_binding_code() -> str:
-    return secrets.token_urlsafe(16)
-
-
-async def _create_state(
-    repo: ChannelConnectionRepository,
-    *,
-    owner_user_id: str,
-    provider: str,
-) -> str:
-    state = _new_binding_code()
-    await repo.create_oauth_state(
-        owner_user_id=owner_user_id,
-        provider=provider,
-        state=state,
-        expires_at=datetime.now(UTC) + timedelta(seconds=_STATE_TTL_SECONDS),
-    )
-    return state
-
-
-def _connect_instruction(provider: str, code: str) -> str:
-    if provider == "telegram":
-        return f"Send /start {code} to the DeerFlow Telegram bot."
-    meta = _PROVIDER_META.get(provider)
-    if meta is None:
-        raise HTTPException(status_code=404, detail="Unknown channel provider")
-    return f"Send /connect {code} to the DeerFlow {meta['display_name']} bot."
-
-
-def _connect_url(config: ChannelConnectionsConfig, provider: str, code: str) -> str | None:
-    if provider == "telegram":
-        provider_config = _provider_config(config, provider)
-        return f"https://t.me/{provider_config.bot_username}?start={code}"
-    if _PROVIDER_META.get(provider, {}).get("auth_mode") == "binding_code":
-        return None
-    raise HTTPException(status_code=404, detail="Unknown channel provider")
-
-
-def _connection_updated_at(connection: dict[str, Any]) -> datetime:
-    value = connection.get("updated_at")
-    if isinstance(value, datetime):
-        return value if value.tzinfo is not None else value.replace(tzinfo=UTC)
-    if isinstance(value, str) and value:
-        try:
-            return datetime.fromisoformat(value.replace("Z", "+00:00"))
-        except ValueError:
-            pass
-    return datetime.min.replace(tzinfo=UTC)
-
-
-def _newest_connection_by_provider(connections: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
-    by_provider: dict[str, dict[str, Any]] = {}
-    for item in connections:
-        existing = by_provider.get(item["provider"])
-        if existing is None or _connection_updated_at(item) > _connection_updated_at(existing):
-            by_provider[item["provider"]] = item
-    return by_provider
-
-
-def _credential_fields(provider: str) -> list[ChannelCredentialFieldResponse]:
-    fields = _CREDENTIAL_FIELDS.get(provider)
-    if fields is None:
-        raise HTTPException(status_code=404, detail="Unknown channel provider")
-    return [ChannelCredentialFieldResponse(**field) for field in fields]
-
-
-def _credential_values(provider: str, channels_config: dict[str, Any]) -> dict[str, str]:
-    runtime_config = channels_config.get(provider)
-    if not isinstance(runtime_config, dict):
-        return {}
-
-    values: dict[str, str] = {}
-    for field in _credential_fields(provider):
-        value = str(runtime_config.get(field.name) or "").strip()
-        if not value:
-            continue
-        values[field.name] = _MASKED_CREDENTIAL_VALUE if field.type == "password" else value
-    return values
-
-
-def _provider_response(
-    config: ChannelConnectionsConfig,
-    channels_config: dict[str, Any],
-    provider: str,
-    meta: dict[str, str],
-    connection: dict[str, Any] | None = None,
-) -> ChannelProviderResponse:
-    status, unavailable_reason = _provider_status(config, channels_config, provider)
-    if connection:
-        connection_status = connection["status"]
-    elif status["configured"] and unavailable_reason is None:
-        connection_status = "connected"
-    else:
-        connection_status = "not_connected"
-    credential_values = _credential_values(provider, channels_config)
-    if provider == "telegram" and not credential_values.get("bot_username"):
-        bot_username = str(_provider_config(config, provider).bot_username or "").strip()
-        if bot_username:
-            credential_values["bot_username"] = bot_username
-    return ChannelProviderResponse(
-        provider=provider,
-        display_name=meta["display_name"],
-        enabled=status["enabled"],
-        configured=status["configured"],
-        connectable=status["enabled"] and status["configured"] and unavailable_reason is None,
-        unavailable_reason=unavailable_reason,
-        auth_mode=meta["auth_mode"],
-        connection_status=connection_status,
-        credential_fields=_credential_fields(provider),
-        credential_values=credential_values,
-    )
-
-
-def _required_runtime_values(
-    provider: str,
-    values: dict[str, str],
-    existing_config: dict[str, Any] | None = None,
-) -> dict[str, str]:
-    fields = _credential_fields(provider)
-    cleaned: dict[str, str] = {}
-    missing: list[str] = []
-    existing_config = existing_config or {}
-    for field in fields:
-        raw_value = values.get(field.name, "")
-        if field.type == "password" and raw_value == _MASKED_CREDENTIAL_VALUE:
-            existing_value = str(existing_config.get(field.name) or "").strip()
-            if existing_value:
-                cleaned[field.name] = existing_value
-                continue
-        value = raw_value.strip() if isinstance(raw_value, str) else str(raw_value or "").strip()
-        if field.required and not value:
-            missing.append(field.label)
-        cleaned[field.name] = value
-    if missing:
-        raise HTTPException(status_code=400, detail=f"Missing required channel configuration: {', '.join(missing)}")
-    return cleaned
-
-
-async def _restart_runtime_channel_if_available(provider: str, runtime_config: dict[str, Any]) -> bool | None:
-    try:
-        from app.channels.service import get_channel_service
-    except Exception:
-        logger.exception("Failed to import channel service while configuring %s", provider)
-        return None
-
-    service = get_channel_service()
-    if service is None:
-        return None
-    return await service.configure_channel(provider, runtime_config)
-
-
-async def _sync_runtime_channel_after_removal(provider: str, channels_config: dict[str, Any]) -> bool | None:
-    try:
-        from app.channels.service import get_channel_service
-    except Exception:
-        logger.exception("Failed to import channel service while disconnecting %s", provider)
-        return None
-
-    service = get_channel_service()
-    if service is None:
-        return None
-
-    runtime_config = channels_config.get(provider)
-    if isinstance(runtime_config, dict) and runtime_config.get("enabled", False):
-        return await service.configure_channel(provider, runtime_config)
-    return await service.remove_channel(provider)
-
-
-@router.get("/providers", response_model=ChannelProvidersResponse)
-async def get_channel_providers(request: Request) -> ChannelProvidersResponse:
-    config = _get_channel_connections_config(request)
-    channels_config = _get_channels_config(request)
-    repo = None
-    if config.enabled:
-        try:
-            repo = _get_repository(request, config)
-        except HTTPException as exc:
-            if exc.status_code != 503:
-                raise
-    owner_user_id = _get_user_id(request)
-    connections = await repo.list_connections(owner_user_id) if repo is not None else []
-    by_provider = _newest_connection_by_provider(connections)
-
-    providers: list[ChannelProviderResponse] = []
-    for provider, meta in _PROVIDER_META.items():
-        if not config.provider_status(provider)["enabled"]:
-            continue
-        connection = by_provider.get(provider)
-        providers.append(_provider_response(config, channels_config, provider, meta, connection))
-    return ChannelProvidersResponse(enabled=config.enabled, providers=providers)
-
-
-@router.get("/connections", response_model=ChannelConnectionsResponse)
-async def get_channel_connections(request: Request) -> ChannelConnectionsResponse:
-    config = _get_channel_connections_config(request)
-    if not config.enabled:
-        return ChannelConnectionsResponse(connections=[])
-    repo = _get_repository(request, config)
-    rows = await repo.list_connections(_get_user_id(request))
-    return ChannelConnectionsResponse(connections=[ChannelConnectionResponse(**row) for row in rows])
-
-
-@router.delete("/connections/{connection_id}", status_code=204)
-async def disconnect_channel_connection(connection_id: str, request: Request) -> Response:
-    config = _get_channel_connections_config(request)
-    if not config.enabled:
-        raise HTTPException(status_code=400, detail="Channel connections are disabled")
-
-    repo = _get_repository(request, config)
-    disconnected = await repo.disconnect_connection(
-        connection_id=connection_id,
-        owner_user_id=_get_user_id(request),
-    )
-    if not disconnected:
-        raise HTTPException(status_code=404, detail="Channel connection not found")
-    return Response(status_code=204)
-
-
-@router.delete("/{provider}/runtime-config", response_model=ChannelProviderResponse)
-async def disconnect_channel_provider_runtime(provider: str, request: Request) -> ChannelProviderResponse:
-    config = _get_channel_connections_config(request)
-    if not config.enabled:
-        raise HTTPException(status_code=400, detail="Channel connections are disabled")
-
-    provider_config = _provider_config(config, provider)
-    if not provider_config.enabled:
-        raise HTTPException(status_code=400, detail="Channel provider is not enabled")
-
-    owner_user_id = _get_user_id(request)
-    try:
-        repo = _get_repository(request, config)
-    except HTTPException as exc:
-        if exc.status_code != 503:
-            raise
-        repo = None
-
-    if repo is not None:
-        for connection in await repo.list_connections(owner_user_id):
-            if connection["provider"] == provider and connection["status"] != "revoked":
-                await repo.disconnect_connection(
-                    connection_id=connection["id"],
-                    owner_user_id=owner_user_id,
-                )
-
-    _get_runtime_config_store(request).remove_provider_config(provider)
-    channels_config = _load_channels_config(request, config)
-    request.app.state.channels_config = channels_config
-
-    stopped = await _sync_runtime_channel_after_removal(provider, channels_config)
-    if stopped is False:
-        display_name = _PROVIDER_META[provider]["display_name"]
-        raise HTTPException(status_code=400, detail=f"Failed to stop {display_name} channel. Try again.")
-
-    return _provider_response(config, channels_config, provider, _PROVIDER_META[provider])
-
-
-@router.post("/{provider}/connect", response_model=ChannelConnectResponse)
-async def connect_channel_provider(provider: str, request: Request) -> ChannelConnectResponse:
-    config = _get_channel_connections_config(request)
-    channels_config = _get_channels_config(request)
-    if not config.enabled:
-        raise HTTPException(status_code=400, detail="Channel connections are disabled")
-
-    status, unavailable_reason = _provider_status(config, channels_config, provider)
-    if not status["enabled"]:
-        raise HTTPException(status_code=400, detail="Channel provider is not enabled")
-    if unavailable_reason:
-        raise HTTPException(status_code=400, detail=unavailable_reason)
-    if not status["configured"]:
-        raise HTTPException(status_code=400, detail="Channel provider is not configured")
-
-    repo = _get_repository(request, config)
-    code = await _create_state(
-        repo,
-        owner_user_id=_get_user_id(request),
-        provider=provider,
-    )
-    return ChannelConnectResponse(
-        provider=provider,
-        mode=_PROVIDER_META[provider]["auth_mode"],
-        url=_connect_url(config, provider, code),
-        code=code,
-        instruction=_connect_instruction(provider, code),
-        expires_in=_STATE_TTL_SECONDS,
-    )
-
-
-@router.post("/{provider}/runtime-config", response_model=ChannelProviderResponse)
-async def configure_channel_provider_runtime(
-    provider: str,
-    body: ChannelRuntimeConfigRequest,
-    request: Request,
-) -> ChannelProviderResponse:
-    config = _get_channel_connections_config(request)
-    if not config.enabled:
-        raise HTTPException(status_code=400, detail="Channel connections are disabled")
-
-    provider_config = _provider_config(config, provider)
-    if not provider_config.enabled:
-        raise HTTPException(status_code=400, detail="Channel provider is not enabled")
-
-    channels_config = _get_channels_config(request)
-    existing = channels_config.get(provider)
-    runtime_config = dict(existing) if isinstance(existing, dict) else {}
-    values = _required_runtime_values(provider, body.values, runtime_config)
-    runtime_config["enabled"] = True
-
-    for key in _RUNTIME_REQUIREMENTS[provider]:
-        runtime_config[key] = values[key]
-
-    if provider == "telegram":
-        runtime_config["bot_username"] = values["bot_username"]
-        provider_config.bot_username = values["bot_username"]
-        request.app.state.channel_connections_config = config
-
-    channels_config[provider] = runtime_config
-    request.app.state.channels_config = channels_config
-
-    started = await _restart_runtime_channel_if_available(provider, runtime_config)
-    if started is False:
-        display_name = _PROVIDER_META[provider]["display_name"]
-        raise HTTPException(status_code=400, detail=f"Failed to start {display_name} channel. Check the values and try again.")
-
-    _get_runtime_config_store(request).set_provider_config(provider, runtime_config)
-
-    return _provider_response(config, channels_config, provider, _PROVIDER_META[provider])
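Review note on the removed module above: password-type fields were returned to the browser as `********`, and a submitted value equal to that mask was treated as "keep the stored secret". The sketch below isolates that rule. The function name `resolve_submitted_values` and the sample values are hypothetical; only the mask string and the keep-if-masked behavior come from the removed code.

```python
from __future__ import annotations

MASKED = "********"  # same placeholder as _MASKED_CREDENTIAL_VALUE in the removed module


def resolve_submitted_values(
    submitted: dict[str, str],
    existing: dict[str, str],
    password_fields: set[str],
) -> dict[str, str]:
    """Keep the stored secret when the client echoes back the mask."""
    resolved: dict[str, str] = {}
    for name, raw in submitted.items():
        stored = existing.get(name, "").strip()
        if name in password_fields and raw == MASKED and stored:
            resolved[name] = stored
        else:
            resolved[name] = raw.strip()
    return resolved


existing = {"bot_token": "real-secret", "bot_username": "old_bot"}
submitted = {"bot_token": MASKED, "bot_username": " new_bot "}
resolved = resolve_submitted_values(submitted, existing, {"bot_token"})
```

Here the token is preserved and the username is replaced, so a user can edit non-secret fields without re-entering secrets.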
@@ -1,10 +1,9 @@
 import json
 import logging
-import os
 from pathlib import Path
 from typing import Literal

-from fastapi import APIRouter, HTTPException, Request, status
+from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel, Field

 from deerflow.config.extensions_config import ExtensionsConfig, get_extensions_config, reload_extensions_config
@@ -13,11 +12,6 @@ logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api", tags=["mcp"])


-_MCP_STDIO_COMMAND_ALLOWLIST_ENV = "DEER_FLOW_MCP_STDIO_COMMAND_ALLOWLIST"
-_DEFAULT_MCP_STDIO_COMMAND_ALLOWLIST = frozenset({"npx", "uvx"})
-_SHELL_METACHARS = frozenset(";|&`$<>\n\r")
-
-
 class McpOAuthConfigResponse(BaseModel):
    """OAuth configuration for an MCP server."""

@@ -72,78 +66,6 @@ class McpConfigUpdateRequest(BaseModel):
 _MASKED_VALUE = "***"


-async def _require_admin_user(request: Request) -> None:
-    """Require the authenticated caller to be an admin user.
-
-    ``AuthMiddleware`` normally stamps ``request.state.user`` before the
-    request reaches this router. Falling back to the strict dependency keeps
-    this route safe even in tests or alternative ASGI compositions that mount
-    the router without the global middleware.
-    """
-    user = getattr(request.state, "user", None)
-    if user is None:
-        from app.gateway.deps import get_current_user_from_request
-
-        user = await get_current_user_from_request(request)
-
-    if getattr(user, "system_role", None) != "admin":
-        raise HTTPException(
-            status_code=status.HTTP_403_FORBIDDEN,
-            detail="Admin privileges required to manage MCP configuration.",
-        )
-
-
-def _allowed_stdio_commands() -> set[str]:
-    """Return executable names allowed for API-managed stdio MCP servers."""
-    raw = os.environ.get(_MCP_STDIO_COMMAND_ALLOWLIST_ENV)
-    base = set(_DEFAULT_MCP_STDIO_COMMAND_ALLOWLIST)
-    if raw is None:
-        return base
-    extra = {item.strip() for item in raw.split(",") if item.strip()}
-    return base | extra
-
-
-def _stdio_command_name(command: str | None, *, server_name: str) -> str:
-    """Normalize and validate a stdio command field from the API boundary."""
-    if command is None or not command.strip():
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail=f"MCP server '{server_name}' with stdio transport requires a command.",
-        )
-
-    stripped = command.strip()
-    has_path_separator = "/" in stripped or "\\" in stripped
-    if stripped != command or has_path_separator or any(ch.isspace() for ch in stripped) or any(ch in stripped for ch in _SHELL_METACHARS):
-        raise HTTPException(
-            status_code=status.HTTP_400_BAD_REQUEST,
-            detail=(f"MCP server '{server_name}' command must be a single executable name; put parameters in args instead."),
-        )
-
-    return stripped
-
-
-def _validate_mcp_update_request(request: McpConfigUpdateRequest) -> None:
-    """Validate API-submitted MCP config before it is persisted.
-
-    Local config files can still express arbitrary advanced setups, but the
-    HTTP API is an untrusted boundary. Restricting stdio commands here reduces
-    the blast radius of a compromised authenticated browser session.
-    """
-    allowed_commands = _allowed_stdio_commands()
-    for name, server in request.mcp_servers.items():
-        transport_type = (server.type or "stdio").lower()
-        if transport_type != "stdio":
-            continue
-
-        command_name = _stdio_command_name(server.command, server_name=name)
-        if command_name not in allowed_commands:
-            allowed = ", ".join(sorted(allowed_commands)) or "<none>"
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail=(f"MCP server '{name}' uses disallowed stdio command '{command_name}'. Allowed commands: {allowed}. Configure {_MCP_STDIO_COMMAND_ALLOWLIST_ENV} to extend this list."),
-            )
-
-
 def _mask_server_config(server: McpServerConfigResponse) -> McpServerConfigResponse:
    """Return a copy of server config with sensitive fields masked.

@@ -240,7 +162,7 @@ def _merge_preserving_secrets(
    summary="Get MCP Configuration",
    description="Retrieve the current Model Context Protocol (MCP) server configurations.",
 )
-async def get_mcp_configuration(request: Request) -> McpConfigResponse:
+async def get_mcp_configuration() -> McpConfigResponse:
    """Get the current MCP configuration.

    Returns:
@@ -261,8 +183,6 @@ async def get_mcp_configuration(request: Request) -> McpConfigResponse:
        }
        ```
    """
-    await _require_admin_user(request)
-
    config = get_extensions_config()

    servers = {name: _mask_server_config(McpServerConfigResponse(**server.model_dump())) for name, server in config.mcp_servers.items()}
@@ -275,7 +195,7 @@ async def get_mcp_configuration(request: Request) -> McpConfigResponse:
    summary="Update MCP Configuration",
    description="Update Model Context Protocol (MCP) server configurations and save to file.",
 )
-async def update_mcp_configuration(request: Request, body: McpConfigUpdateRequest) -> McpConfigResponse:
+async def update_mcp_configuration(request: McpConfigUpdateRequest) -> McpConfigResponse:
    """Update the MCP configuration.

    This will:
@@ -308,9 +228,6 @@ async def update_mcp_configuration(request: Request, body: McpConfigUpdateReques
        ```
    """
    try:
-        await _require_admin_user(request)
-        _validate_mcp_update_request(body)
-
        # Get the current config path (or determine where to save it)
        config_path = ExtensionsConfig.resolve_config_path()

@@ -338,7 +255,7 @@ async def update_mcp_configuration(request: Request, body: McpConfigUpdateReques

        # Merge incoming server configs with raw on-disk secrets
        merged_servers: dict[str, McpServerConfigResponse] = {}
-        for name, incoming in body.mcp_servers.items():
+        for name, incoming in request.mcp_servers.items():
            raw_server = raw_servers.get(name)
            if raw_server is not None:
                merged_servers[name] = _merge_preserving_secrets(
@@ -359,15 +276,14 @@ async def update_mcp_configuration(request: Request, body: McpConfigUpdateReques

        logger.info(f"MCP configuration updated and saved to: {config_path}")

-        # Reload the Gateway configuration and update the global cache. The
-        # agent runtime lives in Gateway, so this keeps API reads and tool
-        # execution aligned after extensions_config.json changes.
+        # NOTE: No need to reload/reset cache here - LangGraph Server (separate process)
+        # will detect config file changes via mtime and reinitialize MCP tools automatically
+
+        # Reload the configuration and update the global cache
        reloaded_config = reload_extensions_config()
        servers = {name: _mask_server_config(McpServerConfigResponse(**server.model_dump())) for name, server in reloaded_config.mcp_servers.items()}
        return McpConfigResponse(mcp_servers=servers)

-    except HTTPException:
-        raise
    except Exception as e:
        logger.error(f"Failed to update MCP configuration: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=f"Failed to update MCP configuration: {str(e)}")
@@ -98,7 +98,6 @@ class MemoryConfigResponse(BaseModel):
    fact_confidence_threshold: float = Field(..., description="Minimum confidence threshold for facts")
    injection_enabled: bool = Field(..., description="Whether memory injection is enabled")
    max_injection_tokens: int = Field(..., description="Maximum tokens for memory injection")
-    token_counting: str = Field(..., description="Token counting strategy for memory injection ('tiktoken' or 'char')")


 class MemoryStatusResponse(BaseModel):
@@ -311,8 +310,7 @@ async def get_memory_config_endpoint() -> MemoryConfigResponse:
            "max_facts": 100,
            "fact_confidence_threshold": 0.7,
            "injection_enabled": true,
-            "max_injection_tokens": 2000,
-            "token_counting": "tiktoken"
+            "max_injection_tokens": 2000
        }
        ```
    """
@@ -325,7 +323,6 @@ async def get_memory_config_endpoint() -> MemoryConfigResponse:
        fact_confidence_threshold=config.fact_confidence_threshold,
        injection_enabled=config.injection_enabled,
        max_injection_tokens=config.max_injection_tokens,
-        token_counting=config.token_counting,
    )


@@ -354,7 +351,6 @@ async def get_memory_status() -> MemoryStatusResponse:
            fact_confidence_threshold=config.fact_confidence_threshold,
            injection_enabled=config.injection_enabled,
            max_injection_tokens=config.max_injection_tokens,
-            token_counting=config.token_counting,
        ),
        data=MemoryResponse(**memory_data),
    )
@@ -7,6 +7,7 @@ is reused so that conversation history is preserved across calls.

 from __future__ import annotations

+import asyncio
 import logging
 import uuid

@@ -15,9 +16,8 @@ from fastapi.responses import StreamingResponse

 from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer, get_feedback_repo, get_run_event_store, get_run_manager, get_run_store, get_stream_bridge
-from app.gateway.pagination import trim_run_message_page
 from app.gateway.routers.thread_runs import RunCreateRequest
-from app.gateway.services import sse_consumer, start_run, wait_for_run_completion
+from app.gateway.services import sse_consumer, start_run
 from deerflow.runtime import serialize_channel_values

 logger = logging.getLogger(__name__)
@@ -66,15 +66,14 @@ async def stateless_wait(body: RunCreateRequest, request: Request) -> dict:
    Otherwise a new temporary thread is created.
    """
    thread_id = _resolve_thread_id(body)
-    bridge = get_stream_bridge(request)
-    run_mgr = get_run_manager(request)
    record = await start_run(body, thread_id, request)

-    completed = True
    if record.task is not None:
-        completed = await wait_for_run_completion(bridge, record, request, run_mgr)
+        try:
+            await record.task
+        except asyncio.CancelledError:
+            pass

-    if completed:
    checkpointer = get_checkpointer(request)
    config = {"configurable": {"thread_id": thread_id}}
    try:
@@ -130,7 +129,8 @@ async def run_messages(
        before_seq=before_seq,
        after_seq=after_seq,
    )
-    data, has_more = trim_run_message_page(rows, limit=limit, after_seq=after_seq)
+    has_more = len(rows) > limit
+    data = rows[:limit] if has_more else rows
    return {"data": data, "has_more": has_more}
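Reviewer note: the inlined pagination can only report `has_more` as true when the query returns more than `limit` rows, which in practice means over-fetching by one. The hunk does not show the query, so the sketch below assumes the store was asked for `limit + 1` rows; `paginate` is a hypothetical helper, not a function in this codebase.

```python
from typing import Any


def paginate(rows: list[Any], limit: int) -> tuple[list[Any], bool]:
    """Trim an over-fetched page to ``limit`` rows and report whether more exist.

    Assumption: the caller requested ``limit + 1`` rows from the store.
    """
    has_more = len(rows) > limit
    data = rows[:limit] if has_more else rows
    return data, has_more


# Simulate a store that was asked for limit + 1 = 4 rows and had that many.
page, more = paginate([1, 2, 3, 4], limit=3)
print(page, more)
```

Note that this form always keeps the first `limit` rows. The removed `trim_run_message_page` also received `after_seq`; whether it trimmed differently depending on direction is not visible in this diff.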


@@ -1,6 +1,5 @@
 import json
 import logging
-import re

 from fastapi import APIRouter, Depends, Request
 from langchain_core.messages import HumanMessage, SystemMessage
@@ -31,31 +30,6 @@ class SuggestionsResponse(BaseModel):
    suggestions: list[str] = Field(default_factory=list, description="Suggested follow-up questions")


-# Matches a complete <think>...</think> block (case-insensitive, spans newlines).
-_THINK_BLOCK_RE = re.compile(r"<think\b[^>]*>.*?</think\s*>", re.IGNORECASE | re.DOTALL)
-# Matches a dangling, unclosed <think> (model truncated at max_tokens mid-thought).
-_OPEN_THINK_RE = re.compile(r"<think\b[^>]*>", re.IGNORECASE)
-
-
-def _strip_think_blocks(text: str) -> str:
-    """Remove reasoning-model ``<think>...</think>`` blocks from the response.
-
-    Reasoning models such as MiniMax-M3 inline their chain-of-thought into the
-    message ``content`` wrapped in ``<think>...</think>`` (``reasoning_split``
-    defaults to false), rather than exposing a separate ``reasoning_content``
-    field. The thinking text frequently contains ``[`` / ``]`` characters, which
-    corrupted the downstream ``find('[')`` / ``rfind(']')`` JSON extraction and
-    produced empty suggestions. We strip the reasoning before parsing so only
-    the actual answer remains.
-    """
-    text = _THINK_BLOCK_RE.sub("", text)
-    # Drop any unclosed <think> (and everything after it) left by truncation.
-    open_match = _OPEN_THINK_RE.search(text)
-    if open_match:
-        text = text[: open_match.start()]
-    return text.strip()
-
-
 def _strip_markdown_code_fence(text: str) -> str:
    stripped = text.strip()
    if not stripped.startswith("```"):
@@ -67,8 +41,7 @@ def _strip_markdown_code_fence(text: str) -> str:


 def _parse_json_string_list(text: str) -> list[str] | None:
-    candidate = _strip_think_blocks(text)
-    candidate = _strip_markdown_code_fence(candidate)
+    candidate = _strip_markdown_code_fence(text)
    start = candidate.find("[")
    end = candidate.rfind("]")
    if start == -1 or end == -1 or end <= start:
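Reviewer note: the removed docstring says that brackets inside inlined reasoning text break the `find('[')` / `rfind(']')` extraction. A minimal sketch of that failure mode follows; `extract_list` is a simplified, hypothetical stand-in for the parser and omits the code-fence handling.

```python
from __future__ import annotations

import json


def extract_list(text: str) -> list[str] | None:
    """Parse the outermost [...] span of ``text`` as a JSON list of strings."""
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, list):
        return None
    return [item for item in parsed if isinstance(item, str)]


clean = '["What next?", "Any risks?"]'
# The "[a]" inside the reasoning moves the start index ahead of the real list.
with_reasoning = '<think>try option [a] first</think>["What next?", "Any risks?"]'

print(extract_list(clean))
print(extract_list(with_reasoning))
```

With the stripping removed, a response shaped like `with_reasoning` would fail to parse in this simplified model.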
@@ -21,8 +21,7 @@ from pydantic import BaseModel, Field

 from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer, get_current_user, get_feedback_repo, get_run_event_store, get_run_manager, get_run_store, get_stream_bridge
-from app.gateway.pagination import trim_run_message_page
-from app.gateway.services import sse_consumer, start_run, wait_for_run_completion
+from app.gateway.services import sse_consumer, start_run
 from deerflow.runtime import RunRecord, RunStatus, serialize_channel_values

 logger = logging.getLogger(__name__)
@@ -176,15 +175,14 @@ async def stream_run(thread_id: str, body: RunCreateRequest, request: Request) -
 @require_permission("runs", "create", owner_check=True, require_existing=True)
 async def wait_run(thread_id: str, body: RunCreateRequest, request: Request) -> dict:
    """Create a run and block until it completes, returning the final state."""
-    bridge = get_stream_bridge(request)
-    run_mgr = get_run_manager(request)
    record = await start_run(body, thread_id, request)

-    completed = True
    if record.task is not None:
-        completed = await wait_for_run_completion(bridge, record, request, run_mgr)
+        try:
+            await record.task
+        except asyncio.CancelledError:
+            pass

-    if completed:
    checkpointer = get_checkpointer(request)
    config = {"configurable": {"thread_id": thread_id}}
    try:
@@ -279,12 +277,7 @@ async def join_run(thread_id: str, run_id: str, request: Request) -> StreamingRe
    )


-# Register GET and POST as separate routes so each method gets a unique OpenAPI
-# operationId. ``api_route(methods=["GET", "POST"])`` shares one route registration
-# across both methods, which makes FastAPI emit the same ``operationId`` twice and
-# warn about a duplicate operation id during OpenAPI generation.
-@router.get("/{thread_id}/runs/{run_id}/stream", response_model=None)
-@router.post("/{thread_id}/runs/{run_id}/stream", response_model=None)
+@router.api_route("/{thread_id}/runs/{run_id}/stream", methods=["GET", "POST"], response_model=None)
 @require_permission("runs", "read", owner_check=True)
 async def stream_existing_run(
    thread_id: str,
@@ -403,7 +396,8 @@ async def list_run_messages(
        before_seq=before_seq,
        after_seq=after_seq,
    )
-    data, has_more = trim_run_message_page(rows, limit=limit, after_seq=after_seq)
+    has_more = len(rows) > limit
+    data = rows[:limit] if has_more else rows
    return {"data": data, "has_more": has_more}


@@ -17,12 +17,11 @@ import uuid
 from typing import Any

 from fastapi import APIRouter, HTTPException, Request
-from langgraph.checkpoint.base import empty_checkpoint, uuid6
+from langgraph.checkpoint.base import empty_checkpoint
 from pydantic import BaseModel, Field, field_validator

 from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer
-from app.gateway.internal_auth import get_trusted_internal_owner_user_id
 from app.gateway.utils import sanitize_log_param
 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime import serialize_channel_values
@@ -258,19 +257,11 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
    thread_store = get_thread_store(request)
    thread_id = body.thread_id or str(uuid.uuid4())
    now = now_iso()
-    thread_owner_user_id = get_trusted_internal_owner_user_id(request)
-    thread_owner_kwargs = {"user_id": thread_owner_user_id} if thread_owner_user_id else {}
    # ``body.metadata`` is already stripped of server-reserved keys by
    # ``ThreadCreateRequest._strip_reserved`` — see the model definition.

    # Idempotency: return existing record when already present
-    existing_record = await thread_store.get(thread_id, **thread_owner_kwargs)
-    if existing_record is None and thread_owner_user_id:
-        unscoped_record = await thread_store.get(thread_id, user_id=None)
-        if unscoped_record is not None:
-            if unscoped_record.get("user_id") != thread_owner_user_id:
-                await thread_store.update_owner(thread_id, thread_owner_user_id, user_id=None)
-            existing_record = await thread_store.get(thread_id, **thread_owner_kwargs)
+    existing_record = await thread_store.get(thread_id)
    if existing_record is not None:
        return ThreadResponse(
            thread_id=thread_id,
@@ -285,7 +276,6 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
        await thread_store.create(
            thread_id,
            assistant_id=getattr(body, "assistant_id", None),
-            **thread_owner_kwargs,
            metadata=body.metadata,
        )
    except Exception:
@@ -546,21 +536,9 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re
        metadata["step"] = metadata.get("step", 0) + 1
        metadata["writes"] = {body.as_node: body.values}

-    # Assign a new checkpoint ID so aput performs an INSERT rather than an
-    # in-place REPLACE of the existing row.  Use uuid6 (time-ordered) rather
-    # than uuid4 (random) so the new ID is always lexicographically greater
-    # than the previous one — LangGraph's checkpointers determine the "latest"
-    # checkpoint by max(checkpoint_ids) string order, matching the uuid6 epoch.
-    checkpoint["id"] = str(uuid6())
-
    # aput requires checkpoint_ns in the config — use the same config used for the
-    # read (which always includes checkpoint_ns=""). The fresh checkpoint ID is
-    # assigned above via checkpoint["id"]; keep checkpoint_id out of the config so
-    # the write is keyed by the new checkpoint payload rather than the prior read.
-    # All supported savers (InMemorySaver, AsyncSqliteSaver, AsyncPostgresSaver)
-    # persist and echo back checkpoint["id"] verbatim — none mint their own — so
-    # the new_config below carries the uuid6 we assigned here. (Regression-locked
-    # by test_update_thread_state_inserts_new_checkpoint_each_call.)
+    # read (which always includes checkpoint_ns="").  Do NOT include checkpoint_id
+    # so that aput generates a fresh checkpoint ID for the new snapshot.
    write_config: dict[str, Any] = {
        "configurable": {
            "thread_id": thread_id,
@@ -579,7 +557,7 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re

    # Sync title changes through the ThreadMetaStore abstraction so /threads/search
    # reflects them immediately in both sqlite and memory backends.
-    if thread_store and body.values and "title" in body.values:
+    if body.values and "title" in body.values:
        new_title = body.values["title"]
        if new_title:  # Skip empty strings and None
            try:
@@ -39,39 +39,15 @@ DEFAULT_MAX_FILE_SIZE = 50 * 1024 * 1024
 DEFAULT_MAX_TOTAL_SIZE = 100 * 1024 * 1024


-class UploadedFileInfo(BaseModel):
-    """Uploaded file metadata exposed by upload and list APIs."""
-
-    filename: str
-    size: int
-    path: str
-    virtual_path: str
-    artifact_url: str
-    extension: str | None = None
-    modified: float | None = None
-    original_filename: str | None = None
-    markdown_file: str | None = None
-    markdown_path: str | None = None
-    markdown_virtual_path: str | None = None
-    markdown_artifact_url: str | None = None
-
-
 class UploadResponse(BaseModel):
    """Response model for file upload."""

    success: bool
-    files: list[UploadedFileInfo]
+    files: list[dict[str, str]]
    message: str
    skipped_files: list[str] = Field(default_factory=list)


-class UploadListResponse(BaseModel):
-    """Response model for uploaded file listing."""
-
-    files: list[UploadedFileInfo]
-    count: int
-
-
 class UploadLimits(BaseModel):
    """Application-level upload limits exposed to clients."""

@@ -280,7 +256,7 @@ async def upload_files(

            file_info = {
                "filename": safe_filename,
-                "size": file_size,
+                "size": str(file_size),
                "path": str(sandbox_uploads / safe_filename),
                "virtual_path": virtual_path,
                "artifact_url": upload_artifact_url(thread_id, safe_filename),
@@ -357,9 +333,9 @@ async def get_upload_limits(
    return _get_upload_limits(config)


-@router.get("/list", response_model=UploadListResponse)
+@router.get("/list", response_model=dict)
 @require_permission("threads", "read", owner_check=True)
-async def list_uploaded_files(thread_id: str, request: Request) -> UploadListResponse:
+async def list_uploaded_files(thread_id: str, request: Request) -> dict:
    """List all files in a thread's uploads directory."""
    try:
        uploads_dir = get_uploads_dir(thread_id)
@@ -373,7 +349,7 @@ async def list_uploaded_files(thread_id: str, request: Request) -> UploadListRes
    for f in result["files"]:
        f["path"] = str(sandbox_uploads / f["filename"])

-    return UploadListResponse(**result)
+    return result


 @router.delete("/{filename}")
@@ -12,7 +12,6 @@ import json
 import logging
 import re
 from collections.abc import Mapping
-from types import SimpleNamespace
 from typing import Any

 from fastapi import HTTPException, Request
@@ -20,7 +19,6 @@ from langchain_core.messages import BaseMessage
 from langchain_core.messages.utils import convert_to_messages

 from app.gateway.deps import get_run_context, get_run_manager, get_stream_bridge
-from app.gateway.internal_auth import INTERNAL_SYSTEM_ROLE, get_trusted_internal_owner_user_id
 from app.gateway.utils import sanitize_log_param
 from deerflow.config.app_config import get_app_config
 from deerflow.runtime import (
@@ -36,7 +34,6 @@ from deerflow.runtime import (
    run_agent,
 )
 from deerflow.runtime.runs.naming import resolve_root_run_name
-from deerflow.runtime.user_context import reset_current_user, set_current_user

 logger = logging.getLogger(__name__)

@@ -143,14 +140,7 @@ def merge_run_context_overrides(config: dict[str, Any], context: Mapping[str, An
    """Merge whitelisted keys from ``body.context`` into both ``config['configurable']``
    and ``config['context']`` so they are visible to legacy configurable readers and
    to LangGraph ``ToolRuntime.context`` consumers (e.g. the ``setup_agent`` tool —
-    see issue #2677).
-
-    ``user_id`` is intentionally propagated into ``config['context']`` in addition to
-    the whitelisted keys, so non-web callers (e.g. IM channels) that supply identity in
-    ``body.context`` keep it on ``ToolRuntime.context``. It is merged with
-    ``setdefault`` so a server-authenticated id stamped by
-    :func:`inject_authenticated_user_context` always wins over the client-supplied one.
-    """
+    see issue #2677)."""
    if not context:
        return
    configurable = config.setdefault("configurable", {})
@@ -161,8 +151,6 @@ def merge_run_context_overrides(config: dict[str, Any], context: Mapping[str, An
                configurable.setdefault(key, context[key])
            if isinstance(runtime_context, dict):
                runtime_context.setdefault(key, context[key])
-    if "user_id" in context and isinstance(runtime_context, dict):
-        runtime_context.setdefault("user_id", context["user_id"])


 def inject_authenticated_user_context(config: dict[str, Any], request: Request) -> None:
@@ -178,9 +166,6 @@ def inject_authenticated_user_context(config: dict[str, Any], request: Request)
    if user_id is None:
        return

-    if getattr(user, "system_role", None) == INTERNAL_SYSTEM_ROLE:
-        return
-
    runtime_context = config.setdefault("context", {})
    if isinstance(runtime_context, dict):
        runtime_context["user_id"] = str(user_id)
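Reviewer note: the two helpers touched here use different dict operations. Client-supplied context is merged with `setdefault`, while the authenticated id is written by plain assignment. The sketch below uses plain dicts under stated assumptions: `ALLOWED_KEYS` is a hypothetical whitelist (the real key list is not shown in this diff), and `merge_overrides` / `stamp_user` are simplified stand-ins.

```python
from typing import Any, Mapping

ALLOWED_KEYS = ("model_name", "agent_name")  # hypothetical whitelist


def merge_overrides(config: dict[str, Any], context: Mapping[str, Any]) -> None:
    """Copy whitelisted keys into config without overwriting existing values."""
    configurable = config.setdefault("configurable", {})
    runtime_context = config.setdefault("context", {})
    for key in ALLOWED_KEYS:
        if key in context:
            configurable.setdefault(key, context[key])
            runtime_context.setdefault(key, context[key])


def stamp_user(config: dict[str, Any], user_id: object) -> None:
    """Write the authenticated user id, replacing any existing value."""
    config.setdefault("context", {})["user_id"] = str(user_id)


config: dict[str, Any] = {"configurable": {"model_name": "server-default"}}
merge_overrides(
    config,
    {"model_name": "client-choice", "agent_name": "helper", "user_id": "spoofed"},
)
stamp_user(config, 42)
print(config)
```

In this sketch a client-supplied `user_id` never reaches `config["context"]` because it is not whitelisted, which matches the post-change behaviour of dropping the explicit `user_id` propagation.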
@@ -317,24 +302,6 @@ async def start_run(
                detail=f"Model {model_name!r} is not in the configured model allowlist",
            )

-    owner_user_id = get_trusted_internal_owner_user_id(request)
-    # Stateless run endpoints carry thread_id in the request *body*, so the
-    # @require_permission(owner_check=True) decorator -- which resolves ownership
-    # from the path param -- cannot protect them. Enforce thread ownership here,
-    # before any run is created, so one user cannot start runs on (or read /wait
-    # checkpoint state from) another user's thread. Missing rows (auto-created
-    # temp threads) and NULL-owner rows (shared / pre-auth data) stay accessible
-    # via check_access; only a thread already owned by another user is rejected
-    # with 404, matching thread_runs.py's anti-enumeration behaviour. Internal
-    # channel runs act on behalf of IM users they do not own (see
-    # inject_authenticated_user_context), so the internal system role is exempt.
-    user = getattr(request.state, "user", None)
-    if user is not None and getattr(user, "system_role", None) != INTERNAL_SYSTEM_ROLE:
-        if not await run_ctx.thread_store.check_access(thread_id, str(user.id)):
-            raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")
-
-    owner_context_token = set_current_user(SimpleNamespace(id=owner_user_id)) if owner_user_id else None
-    try:
    try:
        record = await run_mgr.create_or_reject(
            thread_id,
@@ -344,7 +311,6 @@ async def start_run(
            kwargs={"input": body.input, "config": body.config},
            multitask_strategy=body.multitask_strategy,
            model_name=model_name,
-                user_id=owner_user_id,
        )
    except ConflictError as exc:
        raise HTTPException(status_code=409, detail=str(exc)) from exc
@@ -356,12 +322,6 @@ async def start_run(
    # (e.g. stateless runs).
    try:
        existing = await run_ctx.thread_store.get(thread_id)
-            if existing is None and owner_user_id:
-                unscoped_existing = await run_ctx.thread_store.get(thread_id, user_id=None)
-                if unscoped_existing is not None:
-                    if unscoped_existing.get("user_id") != owner_user_id:
-                        await run_ctx.thread_store.update_owner(thread_id, owner_user_id, user_id=None)
-                    existing = await run_ctx.thread_store.get(thread_id)
        if existing is None:
            await run_ctx.thread_store.create(
                thread_id,
@@ -408,9 +368,6 @@ async def start_run(
    # after the run completes.

    return record
-    finally:
-        if owner_context_token is not None:
-            reset_current_user(owner_context_token)


 async def sse_consumer(
@@ -445,51 +402,3 @@ async def sse_consumer(
        if record.status in (RunStatus.pending, RunStatus.running):
            if record.on_disconnect == DisconnectMode.cancel:
                await run_mgr.cancel(record.run_id)
-
-
-async def wait_for_run_completion(
-    bridge: StreamBridge,
-    record: RunRecord,
-    request: Request,
-    run_mgr: RunManager,
-) -> bool:
-    """Block until the run publishes ``END_SENTINEL``, honouring on_disconnect.
-
-    The non-streaming ``/wait`` endpoints used to ``await record.task``
-    directly with no disconnect handling.  When the client (or an
-    intermediate HTTP proxy) timed out during a long tool call such as
-    ``pip install``, the handler would swallow ``CancelledError`` and
-    serialize whatever checkpoint happened to exist — masking a half-finished
-    run as a normal completion (issue #3265).
-
-    This helper consumes the same bridge that ``sse_consumer`` does so the
-    wait path shares its disconnect semantics: each wake-up polls
-    ``request.is_disconnected()``; on a real disconnect it cancels the
-    background run when ``record.on_disconnect`` is ``cancel``.  The bridge's
-    heartbeat sentinels guarantee at least one wake-up per
-    ``heartbeat_interval`` even when the agent emits no events for a while.
-
-    Returns:
-        ``True`` when ``END_SENTINEL`` was observed (run reached a terminal
-        state), ``False`` when the loop exited because the client
-        disconnected.  Callers must skip checkpoint serialization on
-        ``False`` so a partial checkpoint is not returned as a normal
-        response.
-    """
-    completed = False
-    try:
-        async for entry in bridge.subscribe(record.run_id):
-            # END_SENTINEL means the run reached a terminal state; honour it
-            # even if the client just disconnected so the caller still serializes
-            # the real final checkpoint.
-            if entry is END_SENTINEL:
-                completed = True
-                return True
-            if await request.is_disconnected():
-                break
-            # Heartbeats and regular events: keep waiting for END_SENTINEL.
-        return completed
-    finally:
-        if not completed and record.status in (RunStatus.pending, RunStatus.running):
-            if record.on_disconnect == DisconnectMode.cancel:
-                await run_mgr.cancel(record.run_id)
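Reviewer note: the removed docstring describes the pattern the `/wait` handlers return to in this diff, namely awaiting `record.task` directly and swallowing `CancelledError`. A minimal asyncio sketch of that pattern follows; `long_tool_call` and `wait_then_read` are hypothetical names standing in for the run and the handler.

```python
import asyncio


async def long_tool_call(state: dict) -> None:
    """Stand-in for a run: records progress, then blocks on a slow step."""
    state["step"] = "started"
    await asyncio.sleep(3600)
    state["step"] = "finished"


async def wait_then_read(task: asyncio.Task, state: dict) -> dict:
    """Await the task, swallow cancellation, then return whatever state exists."""
    try:
        await task
    except asyncio.CancelledError:
        pass
    return dict(state)


async def main() -> dict:
    state: dict = {}
    task = asyncio.create_task(long_tool_call(state))
    await asyncio.sleep(0)  # let the task run up to its first await
    task.cancel()  # simulate the run being cancelled mid-flight
    return await wait_then_read(task, state)


result = asyncio.run(main())
print(result)
```

The handler returns a partial state as if the run had completed. The removed helper avoided this by returning `False` on disconnect so callers skipped serialization.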
@@ -228,13 +228,10 @@ Get current MCP server configurations.
 GET /api/mcp/config
 ```

-Requires an authenticated admin session. Sensitive env/header/OAuth secret
-values are masked in the response.
-
 **Response:**
 ```json
 {
-  "mcp_servers": {
+  "mcpServers": {
    "github": {
      "enabled": true,
      "type": "stdio",
@@ -258,15 +255,10 @@ PUT /api/mcp/config
 Content-Type: application/json
 ```

-Requires an authenticated admin session. API-managed `stdio` MCP servers may
-only use allowed executable names for `command` (default: `npx`, `uvx`). Set
-`DEER_FLOW_MCP_STDIO_COMMAND_ALLOWLIST` to a comma-separated list when a
-deployment needs additional trusted launchers.
-
 **Request Body:**
 ```json
 {
-  "mcp_servers": {
+  "mcpServers": {
    "github": {
      "enabled": true,
      "type": "stdio",
@@ -284,18 +276,8 @@ deployment needs additional trusted launchers.
 **Response:**
 ```json
 {
-  "mcp_servers": {
-    "github": {
-      "enabled": true,
-      "type": "stdio",
-      "command": "npx",
-      "args": ["-y", "@modelcontextprotocol/server-github"],
-      "env": {
-        "GITHUB_TOKEN": "***"
-      },
-      "description": "GitHub operations"
-    }
-  }
+  "success": true,
+  "message": "MCP configuration updated"
 }
 ```

@@ -29,7 +29,7 @@ All other test plan sections were executed against either:
 | TC-DOCKER-03 | Per-worker rate limiter divergence | Confirms in-process `_login_attempts` dict doesn't share state across `gunicorn` workers (4 by default in the compose file); known limitation, documented | needs multi-worker container |
 | TC-DOCKER-04 | IM channels use internal Gateway auth | Verify Feishu/Slack/Telegram dispatchers attach the process-local internal auth header plus CSRF cookie/header when calling Gateway-compatible LangGraph APIs | needs `docker logs` |
 | TC-DOCKER-05 | Reset credentials surfacing | `reset_admin` writes a 0600 credential file in `DEER_FLOW_HOME` instead of logging plaintext. The file-based behavior is validated by non-Docker reset tests, so the only Docker-specific gap is verifying the volume mount carries the file out to the host | needs container + host volume |
-| TC-DOCKER-06 | Docker deploy uses Gateway embedded runtime | `./scripts/deploy.sh` produces a Gateway + frontend + nginx topology (no `langgraph` container); same auth flow as local `make dev` | needs `docker compose up` |
+| TC-DOCKER-06 | Gateway-mode Docker deploy | `./scripts/deploy.sh --gateway` produces a 3-container topology (no `langgraph` container); same auth flow as standard mode | needs `docker compose --profile gateway` |

 ## Coverage already provided by non-Docker tests

@@ -43,7 +43,7 @@ the test cases that ran on sg_dev or local:
 | TC-DOCKER-03 (per-worker rate limit) | TC-GW-04 + TC-REENT-09 (single-worker rate limit + 5min expiry). The cross-worker divergence is an architectural property of the in-memory dict; no auth code path differs |
 | TC-DOCKER-04 (IM channels use internal auth) | Code-level: `app/channels/manager.py` creates the `langgraph_sdk` client with `create_internal_auth_headers()` plus CSRF cookie/header, so channel workers do not rely on browser cookies |
 | TC-DOCKER-05 (credential surfacing) | `reset_admin` writes `.deer-flow/admin_initial_credentials.txt` with mode 0600 and logs only the path — the only Docker-unique step is whether the bind mount projects this path onto the host, which is a `docker compose` config check, not a runtime behavior change |
-| TC-DOCKER-06 (Gateway embedded runtime container) | Section 7.2 covered by TC-GW-01..05 + Section 2 (Gateway auth flow on sg_dev); same Gateway code, container is just a packaging change |
+| TC-DOCKER-06 (gateway-mode container) | Section 7.2 covered by TC-GW-01..05 + Section 2 (gateway-mode auth flow on sg_dev); same Gateway code, container is just a packaging change |

 ## Reproduction steps when Docker becomes available

@@ -4,12 +4,10 @@

 | Mode | Start command | Auth layer | Port |
 |------|---------|---------|------|
-| Standard mode | `make dev` | Gateway AuthMiddleware (full) | 2026 (nginx) |
+| Standard mode | `make dev` | Gateway AuthMiddleware + LangGraph auth | 2026 (nginx) |
+| Gateway mode | `make dev-pro` | Gateway AuthMiddleware (full) | 2026 (nginx) |
 | Direct Gateway | `cd backend && make gateway` | Gateway AuthMiddleware | 8001 |
-| Direct LangGraph compatibility | Used when running the LangGraph toolchain manually | LangGraph auth | 2024 |
-
-`make dev`, Docker dev, and production deployments all run the Gateway embedded runtime by default.
-`app.gateway.langgraph_auth` is kept only for direct LangGraph toolchain / Studio compatibility testing; it is not the standard service startup path.
+| Direct LangGraph | `cd backend && make dev` | LangGraph auth | 2024 |

 Run the following tests in every mode.

@@ -23,8 +21,10 @@
 # Clear existing data
 rm -f backend/.deer-flow/data/deerflow.db

-# Start standard mode (Gateway embedded runtime)
-make dev
+# Choose a mode to start
+make dev          # standard mode
+# or
+make dev-pro      # Gateway mode
 ```

 **Verification points:**
@@ -57,7 +57,7 @@ make dev

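 ## 二、API flow tests

-> The examples below use `BASE=http://localhost:2026`. Standard mode exposes this address through nginx.
+> The examples below use `BASE=http://localhost:2026`. Both standard mode and Gateway mode use this address.
 > For direct-connection tests, substitute the corresponding port.
 >
 > **CSRF token extraction**: several tests extract the CSRF token from the cookie jar; use the same command everywhere: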
@@ -211,18 +211,20 @@ curl -s -X POST $BASE/api/threads/search \

 **Expected:** returns 0, or only user2's own threads

-### 2.3 LangGraph-compatible Gateway route isolation
+### 2.3 Standard-mode LangGraph Server isolation

-#### TC-API-10: LangGraph-compatible endpoints require a cookie
+> Test in standard mode only. Gateway mode does not run LangGraph Server.
+
+#### TC-API-10: LangGraph endpoints require a cookie

 ```bash
-# Access the LangGraph-compatible API without a cookie
+# Access the LangGraph API without a cookie
 curl -s -w "%{http_code}" $BASE/api/langgraph/threads
 ```

 **Expected:** 401

-#### TC-API-11: LangGraph-compatible routes are accessible with a cookie
+#### TC-API-11: LangGraph is accessible with a cookie

 ```bash
 curl -s $BASE/api/langgraph/threads -b user1.txt | jq length
@@ -230,10 +232,10 @@ curl -s $BASE/api/langgraph/threads -b user1.txt | jq length

 **Expected:** 200, returning user1's thread list

-#### TC-API-12: LangGraph-compatible route isolation: users see only their own threads
+#### TC-API-12: LangGraph isolation: users see only their own threads

 ```bash
-# user2 lists threads
+# user2 lists LangGraph threads
 curl -s $BASE/api/langgraph/threads -b user2.txt | jq length
 ```

@@ -1232,11 +1234,21 @@ P2=$(awk -F': ' '/^password:/ {print $2}' /tmp/deerflow-reset-p2.txt)
 ## 七、Mode difference tests

 > Below, `GW=http://localhost:8001` means direct Gateway access and `BASE=http://localhost:2026` means access through nginx.
-> Standard start command: `make dev` (or `./scripts/serve.sh --dev`).
+> Gateway-mode start command: `make dev-pro` (or `./scripts/serve.sh --dev --gateway`).

-### 7.1 Standard startup mode
+### 7.1 Standard mode only

-#### TC-MODE-01: Gateway AuthMiddleware token_version check
+> Start command: `make dev` (or `./scripts/serve.sh --dev`)
+
+#### TC-MODE-01: LangGraph Server runs standalone and requires a cookie
+
+```bash
+# Access LangGraph without a cookie
+curl -s -w "%{http_code}" -o /dev/null $BASE/api/langgraph/threads/search
+# Expected: 403 (rejected by the LangGraph auth handler)
+```
+
+#### TC-MODE-02: LangGraph auth token_version check

 ```bash
 # Log in to get a cookie
@@ -1249,9 +1261,9 @@ curl -s -X POST $BASE/api/v1/auth/change-password \
  -b cookies.txt -H "Content-Type: application/json" -H "X-CSRF-Token: $CSRF" \
  -d '{"current_password":"正确密码","new_password":"NewPass1!"}' -c new_cookies.txt

-# Access the LangGraph-compatible route with the old cookie
+# Access LangGraph with the old cookie
 curl -s -w "%{http_code}" $BASE/api/langgraph/threads/search -b cookies.txt
-# Expected: 401 (token_version mismatch)
+# Expected: 403 (token_version mismatch)

 # Access with the new cookie
 CSRF2=$(grep csrf_token new_cookies.txt | awk '{print $NF}')
@@ -1260,7 +1272,7 @@ curl -s -w "%{http_code}" -X POST $BASE/api/langgraph/threads/search \
 # Expected: 200
 ```

-#### TC-MODE-02: Gateway owner filter isolation
+#### TC-MODE-03: LangGraph auth owner filter isolation

 ```bash
 # user1 creates a thread
@@ -1285,9 +1297,18 @@ print('OK: user2 sees', len(threads), 'threads, none belong to user1')
 "
 ```

-#### TC-MODE-03: All requests pass through AuthMiddleware
+### 7.2 Gateway mode only
+
+> Start command: `make dev-pro` (or `./scripts/serve.sh --dev --gateway`)
+> No LangGraph Server process; the agent runtime is embedded in Gateway.
+
+#### TC-MODE-04: All requests pass through AuthMiddleware

 ```bash
+# Confirm that LangGraph Server is not running
+curl -s -w "%{http_code}" -o /dev/null http://localhost:2024/ok
+# Expected: 000 (connection refused)
+
 # Gateway API is protected
 curl -s -w "%{http_code}" -o /dev/null $BASE/api/models
 # Expected: 401
@@ -1298,7 +1319,7 @@ curl -s -w "%{http_code}" -o /dev/null -X POST $BASE/api/langgraph/threads/searc
 # Expected: 401
 ```

-#### TC-MODE-04: Full auth flow in standard mode
+#### TC-MODE-05: Full auth flow in Gateway mode

 ```bash
 # Log in
@@ -1313,7 +1334,7 @@ curl -s -X POST $BASE/api/langgraph/threads \
  -d '{"metadata":{}}' | python3 -c "import sys,json; print(json.load(sys.stdin)['thread_id'])"
 # Expected: returns thread_id

-# CSRF protection (CSRFMiddleware covers all Gateway routes)
+# CSRF protection (in Gateway mode, CSRFMiddleware directly covers all routes)
 curl -s -w "%{http_code}" -o /dev/null -X POST $BASE/api/langgraph/threads \
  -b cookies.txt -H "Content-Type: application/json" -d '{"metadata":{}}'
 # Expected: 403 (CSRF token missing)
@@ -1412,7 +1433,7 @@ done

 ### 7.4 Docker deployment

-> Start command: `./scripts/deploy.sh`
+> Start command: `./scripts/deploy.sh` (standard) or `./scripts/deploy.sh --gateway` (Gateway mode)
 > Docker Compose file: `docker/docker-compose.yaml`
 >
 > Prerequisites:
@@ -1521,16 +1542,16 @@ docker logs deer-flow-gateway 2>&1 | grep -iE "Password: .{15,}" && echo "FAIL:
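 - Container logs print the **path** (not the password itself), which complies with the CodeQL `py/clear-text-logging-sensitive-data` rule
 - `grep "Password:"` **should find no match** in the logs (the old behavior is deprecated; the simplify pass removed the log-leak path)

-#### TC-DOCKER-06: Docker deployment
+#### TC-DOCKER-06: Gateway-mode Docker deployment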

 ```bash
-# Standard Docker mode: runtime embedded in the gateway container
-./scripts/deploy.sh
+# Gateway mode: no langgraph container
+./scripts/deploy.sh --gateway
 sleep 15

-# Confirm that the gateway container exists
-docker ps --filter name=deer-flow-gateway --format '{{.Names}}'
-# Expected: deer-flow-gateway
+# Confirm that the langgraph container does not exist
+docker ps --filter name=deer-flow-langgraph --format '{{.Names}}' | wc -l
+# Expected: 0

 # Auth flow works: protected endpoints return 401 when not logged in
 curl -s -w "%{http_code}" -o /dev/null $BASE/api/models
@@ -124,8 +124,8 @@ python -c "import secrets; print(secrets.token_urlsafe(32))"

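 ## Compatibility

-- **Local development** (`make dev`): the Gateway embedded runtime is fully compatible; when no admin exists, visit `/setup` to initialize
-- **Gateway embedded runtime**: standard scripts, Docker dev, and production deployments all provide authentication and the LangGraph-compatible API through Gateway
+- **Standard mode** (`make dev`): fully compatible; when no admin exists, visit `/setup` to initialize
+- **Gateway mode** (`make dev-pro`): fully compatible
 - **Docker deployment**: fully compatible; `.deer-flow/data/deerflow.db` needs a persistent volume mount
 - **IM channels** (Feishu/Slack/Telegram): communicate through Gateway internal auth and use the `default` user bucket
 - **DeerFlowClient** (embedded): does not go through HTTP and is not affected by authentication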
@@ -1,154 +0,0 @@
-# Blocking IO detection usage and maintenance
-
-This document describes how to use and maintain DeerFlow backend blocking-IO
-detection for async event-loop safety.
-
-The goal is narrow: find and prevent synchronous IO from blocking backend
-async event-loop paths. Static and runtime detection are complementary, but
-they have different jobs.
-
-## Static detector
-
-The static detector is the discovery tool. It scans backend source code and
-reports candidate blocking-IO call sites that may need human review.
-
-Run it from the repository root:
-
-```bash
-make detect-blocking-io
-```
-
-Or from `backend/`:
-
-```bash
-make detect-blocking-io
-```
-
-The report is written to:
-
-```text
-.deer-flow/blocking-io-findings.json
-```
-
-Use this output for review and triage. A static finding is a candidate, not
-proof that production blocks the event loop at runtime. The current static
-rules are intentionally broad; prefer triaging existing output before adding
-new static rules.
-
-Add a static rule only when review finds a recurring high-risk blocking
-pattern that is invisible to the current detector.
-
-## Runtime detector
-
-The runtime detector is the CI regression guard. It uses Blockbuster to fail a
-focused test when code under `app.*` or `deerflow.*` performs blocking IO on
-the asyncio event-loop thread.
-
-Run it from `backend/`:
-
-```bash
-make test-blocking-io
-```
-
-The runtime gate starts from confirmed production bugs and protects those
-paths from regressing. It does not prove that the entire backend is free of
-blocking IO; it only covers the production paths exercised by
-`backend/tests/blocking_io/`.
-
-## Maintenance workflow
-
-Use the static detector to find candidates, then use review to decide which
-async production paths are worth protecting in CI.
-
-The normal workflow is:
-
-1. Run the static detector to find backend blocking-IO candidates.
-2. Use human review to pick high-risk production async paths.
-3. Add or update a focused runtime anchor in `backend/tests/blocking_io/`.
-4. Let CI prevent that path from regressing.
-
-Runtime detection has two maintenance paths.
-
-### Add a runtime rule
-
-Add a runtime rule when Blockbuster's default rules do not cover a generic
-blocking primitive used by production code.
-
-Rules belong in:
-
-```text
-backend/tests/support/detectors/blocking_io_runtime.py
-```
-
-Add them to `_PROJECT_BLOCKING_RULES`, not directly inside individual tests.
-Keeping rules centralized makes it clear which extra primitives DeerFlow
-expects Blockbuster to catch.
-
-Example shape:
-
-```python
-import subprocess
-
-from blockbuster import BlockBusterFunction
-
-_PROJECT_BLOCKING_RULES = (
-    (
-        "subprocess.Popen.__init__",
-        BlockBusterFunction(
-            subprocess.Popen,
-            "__init__",
-            scanned_modules=["app", "deerflow"],
-        ),
-    ),
-)
-```
-
-Do not add a runtime rule just because a business path is not tested. A rule
-only expands what Blockbuster can intercept after code runs.
-
-### Add a runtime anchor
-
-Add a runtime anchor when a high-risk async production path should be protected
-by CI but no existing `backend/tests/blocking_io/` test executes it.
-
-Anchors belong in:
-
-```text
-backend/tests/blocking_io/
-```
-
-A good anchor should:
-
-- Call the real production async entry point.
-- Avoid bypassing the blocking surface with test-only `asyncio.to_thread`
-  wrappers.
-- Use real local filesystem inputs when the bug shape is filesystem IO.
-- Mock only the external dependency boundary, such as a network service or
-  third-party saver class.
-- Fail if a future change moves the blocking operation back onto the event
-  loop.
-
-Avoid testing only the low-level helper unless that helper is the production
-async entry point. The runtime gate is most useful when it protects the caller
-that production actually executes.
-
-## Current runtime coverage
-
-The runtime anchors protect confirmed blocking-IO bug shapes:
-
-- SQLite checkpointer setup, including path resolution and parent-directory
-  creation.
-- Subagent skill metadata loading through `SubagentExecutor._load_skills()`.
-- `JsonlRunEventStore` async API (`put` / `list_*` / `delete_*`): the JSONL
-  run-event backend offloads its synchronous file IO via `asyncio.to_thread`
-  (fix #3084); this anchor drives the real async API under the gate so any
-  blocking IO reintroduced on the loop fails, not only removal of one
-  `to_thread` call.
-- `UploadsMiddleware.before_agent` uploads-directory scan: a sync-only middleware
-  hook runs on the event loop under async graph execution, so the scan is
-  offloaded via `abefore_agent` + `run_in_executor`.
-- Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and
-  patches are restored after exceptions.
-  patches are restored after exceptions.
-
-As static detection and review identify more high-risk async paths, add new
-runtime anchors incrementally.
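Reviewer sketch of the pattern the removed document describes: a blocking file read is moved off the event-loop thread with `asyncio.to_thread`. The helper names (`read_skill_sync`, `read_skill_offloaded`) are hypothetical and are not taken from the DeerFlow code base. The sketch does not use Blockbuster; it only checks which thread ran the read.

```python
import asyncio
import tempfile
import threading
from pathlib import Path


def read_skill_sync(path: Path) -> tuple[str, int]:
    """Blocking file read; also returns the id of the thread that ran it."""
    return path.read_text(encoding="utf-8"), threading.get_ident()


async def read_skill_offloaded(path: Path) -> tuple[str, int]:
    """Hypothetical async entry point: offload the blocking read to a worker thread."""
    return await asyncio.to_thread(read_skill_sync, path)


async def demo(path: Path) -> tuple[str, bool]:
    loop_thread = threading.get_ident()
    text, worker_thread = await read_skill_offloaded(path)
    # True when the read ran somewhere other than the event-loop thread.
    return text, worker_thread != loop_thread


def run_demo() -> tuple[str, bool]:
    # File setup happens outside the coroutine so it never touches the loop.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "SKILL.md"
        path.write_text("name: demo", encoding="utf-8")
        return asyncio.run(demo(path))
```

A runtime anchor in the sense of the document would call the production async entry point instead of a stand-in like `read_skill_offloaded`, so that removing the offload makes the gate fail.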
@@ -36,7 +36,6 @@ models:
 - OpenAI (`langchain_openai:ChatOpenAI`)
 - Anthropic (`langchain_anthropic:ChatAnthropic`)
 - DeepSeek (`langchain_deepseek:ChatDeepSeek`)
-- Xiaomi MiMo (`deerflow.models.patched_mimo:PatchedChatMiMo`)
 - Claude Code OAuth (`deerflow.models.claude_provider:ClaudeChatModel`)
 - Codex CLI (`deerflow.models.openai_codex_provider:CodexChatModel`)
 - Any LangChain-compatible provider
@@ -95,35 +94,25 @@ models:
        thinking:
          type: enabled

-  - name: minimax-m3
-    display_name: MiniMax M3
+  - name: minimax-m2.5
+    display_name: MiniMax M2.5
    use: langchain_openai:ChatOpenAI
-    model: MiniMax-M3
+    model: MiniMax-M2.5
    api_key: $MINIMAX_API_KEY
    base_url: https://api.minimax.io/v1
    max_tokens: 4096
    temperature: 1.0  # MiniMax requires temperature in (0.0, 1.0]
    supports_vision: true

-  - name: minimax-m2.7
-    display_name: MiniMax M2.7
+  - name: minimax-m2.5-highspeed
+    display_name: MiniMax M2.5 Highspeed
    use: langchain_openai:ChatOpenAI
-    model: MiniMax-M2.7
+    model: MiniMax-M2.5-highspeed
    api_key: $MINIMAX_API_KEY
    base_url: https://api.minimax.io/v1
    max_tokens: 4096
    temperature: 1.0  # MiniMax requires temperature in (0.0, 1.0]
-    supports_vision: false  # M2.7 is text-only; M3 supports vision
-
-  - name: minimax-m2.7-highspeed
-    display_name: MiniMax M2.7 Highspeed
-    use: langchain_openai:ChatOpenAI
-    model: MiniMax-M2.7-highspeed
-    api_key: $MINIMAX_API_KEY
-    base_url: https://api.minimax.io/v1
-    max_tokens: 4096
-    temperature: 1.0  # MiniMax requires temperature in (0.0, 1.0]
-    supports_vision: false  # M2.7 is text-only; M3 supports vision
+    supports_vision: true
  - name: openrouter-gemini-2.5-flash
    display_name: Gemini 2.5 Flash (OpenRouter)
    use: langchain_openai:ChatOpenAI
@@ -177,37 +166,6 @@ models:

 For Gemini accessed **without** thinking (e.g. via OpenRouter where thinking is not activated), the plain `langchain_openai:ChatOpenAI` with `supports_thinking: false` is sufficient and no patch is needed.

-**MiMo with thinking via OpenAI-compatible API**:
-
-MiMo returns `reasoning_content` on assistant messages in thinking mode. In multi-turn agent conversations with tool calls, subsequent requests must preserve that historical `reasoning_content` on assistant messages or the MiMo API can return HTTP 400. Standard `langchain_openai:ChatOpenAI` drops this provider-specific field, so use `deerflow.models.patched_mimo:PatchedChatMiMo`:
-
-For pay-as-you-go API keys (`sk-...`), use `https://api.xiaomimimo.com/v1`. For Token Plan keys (`tp-...`), use the regional Token Plan Base URL shown in the MiMo console, such as `https://token-plan-cn.xiaomimimo.com/v1`. MiMo documents these key types as separate and non-interchangeable.
-
-`PatchedChatMiMo` is model-id agnostic. Use it for every MiMo thinking model entry you configure, including model entries referenced by `subagents.*.model` overrides (for example `mimo-v2.5-pro`, `mimo-v2.5`, `mimo-v2-pro`, `mimo-v2-omni`, or `mimo-v2-flash`).
-
-```yaml
-models:
-  - name: mimo-v2.5-pro
-    display_name: MiMo V2.5 Pro
-    use: deerflow.models.patched_mimo:PatchedChatMiMo
-    model: mimo-v2.5-pro
-    api_key: $MIMO_API_KEY
-    base_url: https://api.xiaomimimo.com/v1
-    max_tokens: 8192
-    supports_thinking: true
-    supports_vision: false
-    when_thinking_enabled:
-      extra_body:
-        thinking:
-          type: enabled
-    when_thinking_disabled:
-      extra_body:
-        thinking:
-          type: disabled
-```
-
-`PatchedChatMiMo` preserves MiMo's `choices[].message.reasoning_content`, streaming `delta.reasoning_content`, and request-history assistant `reasoning_content` fields. It does not reuse the DeepSeek provider.
-
 ### Tool Groups

 Organize tools into logical groups:
@@ -361,7 +319,6 @@ models:
 - `OPENAI_API_KEY` - OpenAI API key
 - `ANTHROPIC_API_KEY` - Anthropic API key
 - `DEEPSEEK_API_KEY` - DeepSeek API key
-- `MIMO_API_KEY` - Xiaomi MiMo API key
 - `NOVITA_API_KEY` - Novita API key (OpenAI-compatible endpoint)
 - `TAVILY_API_KEY` - Tavily search API key
 - `DEER_FLOW_PROJECT_ROOT` - Project root for relative runtime paths
@@ -1,121 +0,0 @@
-# IM Channel Connections
-
-DeerFlow supports user-owned IM channel bindings for Telegram, Slack, Discord, Feishu/Lark, DingTalk, WeChat, and WeCom. The feature reuses the existing `channels.*` runtime configuration, so it works in local and private deployments with the same outbound transports already supported by DeerFlow.
-
-No public IP, OAuth callback URL, or provider webhook is required in this implementation.
-
-## Configuration
-
-Configure the actual IM bots under the existing `channels` block:
-
-```yaml
-channels:
-  telegram:
-    enabled: true
-    bot_token: $TELEGRAM_BOT_TOKEN
-
-  slack:
-    enabled: true
-    bot_token: $SLACK_BOT_TOKEN
-    app_token: $SLACK_APP_TOKEN
-
-  discord:
-    enabled: true
-    bot_token: $DISCORD_BOT_TOKEN
-
-  feishu:
-    enabled: true
-    app_id: $FEISHU_APP_ID
-    app_secret: $FEISHU_APP_SECRET
-
-  dingtalk:
-    enabled: true
-    client_id: $DINGTALK_CLIENT_ID
-    client_secret: $DINGTALK_CLIENT_SECRET
-
-  wechat:
-    enabled: true
-    bot_token: $WECHAT_BOT_TOKEN
-
-  wecom:
-    enabled: true
-    bot_id: $WECOM_BOT_ID
-    bot_secret: $WECOM_BOT_SECRET
-```
-
-Then enable user bindings in `channel_connections`:
-
-```yaml
-channel_connections:
-  enabled: true
-
-  telegram:
-    enabled: true
-    bot_username: $TELEGRAM_BOT_USERNAME
-
-  slack:
-    enabled: true
-
-  discord:
-    enabled: true
-
-  feishu:
-    enabled: true
-
-  dingtalk:
-    enabled: true
-
-  wechat:
-    enabled: true
-
-  wecom:
-    enabled: true
-```
-
-`channel_connections` does not duplicate provider secrets. It only controls the browser-facing connect UI and stores per-user binding records. Telegram needs `bot_username` only so the frontend can open a deep link.
-
-## Connect Flow
-
-Telegram:
-
-- The frontend creates a short one-time code.
-- The Connect button opens `https://t.me/<bot_username>?start=<code>`.
-- The existing Telegram long-polling worker receives `/start <code>` and binds that Telegram chat/user to the current DeerFlow user.
-
-Slack:
-
-- The frontend creates a short one-time code.
-- The UI shows `Send /connect <code> to the DeerFlow Slack bot.`
-- The existing Slack Socket Mode worker receives the message and binds the Slack user/team to the current DeerFlow user.
-
-Discord:
-
-- The frontend creates a short one-time code.
-- The UI shows `Send /connect <code> to the DeerFlow Discord bot.`
-- The existing Discord Gateway worker receives the message and binds the Discord user/guild to the current DeerFlow user.
-
-Feishu/Lark, DingTalk, WeChat, and WeCom:
-
-- The frontend creates a short one-time code.
-- The UI shows `Send /connect <code> to the DeerFlow <Provider> bot.`
-- The already-running long-connection or polling worker receives the message and binds the platform user/workspace identity to the current DeerFlow user.
-
-Codes expire after 10 minutes and are single-use.
-
-## Runtime Model
-
-Connection records live in SQL tables under `deerflow.persistence.channel_connections`:
-
-- `channel_connections`: owner user, provider identity, workspace/guild/team, status, metadata.
-- `channel_oauth_states`: one-time connect codes and Telegram deep-link state.
-- `channel_conversations`: connection-scoped IM conversation to DeerFlow thread mapping.
-- `channel_credentials`: reserved for future provider-token flows, not used by the local/private binding flow.
-
-Incoming messages that resolve to a connection carry `connection_id`, `owner_user_id`, and `workspace_id`. `ChannelManager` uses `owner_user_id` as the DeerFlow run user id and preserves the raw platform user id as `channel_user_id`.
-
-## Security Notes
-
-- Browser APIs remain authenticated and CSRF-protected.
-- Connect codes are random, short-lived, and single-use.
-- Provider bot tokens remain in `channels.*` and are never returned to the browser.
-- This implementation does not add public provider callback or webhook routes.
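Reviewer sketch of the connect-code semantics in the removed document (random, expires after 10 minutes, single-use). This is an in-memory illustration under stated assumptions; the document says the real records live in SQL tables, and the class name `ConnectCodeStore`, its methods, and the injectable clock are hypothetical.

```python
import secrets
import time
from typing import Callable, Optional

CODE_TTL_SECONDS = 10 * 60  # "Codes expire after 10 minutes"


class ConnectCodeStore:
    """Hypothetical in-memory store for one-time connect codes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._codes: dict[str, tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        """Create a random code bound to the requesting DeerFlow user."""
        code = secrets.token_urlsafe(8)
        self._codes[code] = (user_id, self._clock() + CODE_TTL_SECONDS)
        return code

    def redeem(self, code: str) -> Optional[str]:
        """Return the owning user id once; None if unknown, reused, or expired."""
        entry = self._codes.pop(code, None)  # pop makes the code single-use
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self._clock() < expires_at else None
```

Popping the entry before checking expiry means an expired code is also discarded on its first redemption attempt, so it cannot linger in the store.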
@@ -31,8 +31,7 @@ Current injection format:

 Token counting:
 - Uses `tiktoken` (`cl100k_base`) when available
-- Falls back to a network-free CJK-aware character estimate if tokenizer import or encoding load fails
-  (CJK characters count as ~2 chars/token, other characters as ~4 chars/token)
+- Falls back to `len(text) // 4` if tokenizer import fails
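For review, a sketch contrasting the two fallback estimates named in this hunk: the plain `len(text) // 4` rule and a CJK-aware estimate (about 2 characters per token for CJK, about 4 for everything else). The function names and the Unicode ranges treated as CJK are assumptions made for illustration; the project's actual implementation is not shown in this diff.

```python
def estimate_tokens_simple(text: str) -> int:
    """Fallback from the added line: roughly 4 characters per token."""
    return len(text) // 4


def _is_cjk(ch: str) -> bool:
    """Assumed CJK ranges: CJK ideographs, kana, and Hangul syllables."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF
        or 0x3040 <= cp <= 0x30FF
        or 0xAC00 <= cp <= 0xD7AF
    )


def estimate_tokens_cjk_aware(text: str) -> int:
    """Fallback from the removed lines: ~2 chars/token for CJK, ~4 otherwise."""
    cjk = sum(1 for ch in text if _is_cjk(ch))
    other = len(text) - cjk
    return cjk // 2 + other // 4
```

For ASCII input the two estimates agree; for CJK-heavy input the simple rule reports about half as many tokens as the CJK-aware one.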

 ## Known Gap

@@ -19,7 +19,6 @@ This directory contains detailed documentation for the DeerFlow backend.
 | [STREAMING.md](STREAMING.md) | Token-level streaming design: Gateway vs DeerFlowClient paths, `stream_mode` semantics, per-id dedup |
 | [FILE_UPLOAD.md](FILE_UPLOAD.md) | File upload functionality |
 | [PATH_EXAMPLES.md](PATH_EXAMPLES.md) | Path types and usage examples |
-| [SANDBOX_MEMORY_PROFILING.md](SANDBOX_MEMORY_PROFILING.md) | Sandbox memory baseline and runtime comparison guide |
 | [summarization.md](summarization.md) | Context summarization feature |
 | [plan_mode_usage.md](plan_mode_usage.md) | Plan mode with TodoList |
 | [AUTO_TITLE_GENERATION.md](AUTO_TITLE_GENERATION.md) | Automatic title generation |
@@ -1,120 +0,0 @@
-# Record/Replay E2E — front-back contract verification
-
-Deterministic, **key-free** end-to-end checks that a backend change can't
-silently break the frontend (and vice-versa). Two complementary layers, fed by a
-single recording.
-
-## Why
-
-The mock-based frontend e2e hand-writes the backend's JSON/SSE, so a backend
-schema or SSE change passes green ("fake green"). These layers replay a recorded
-**real** run against the **real** backend (and, for Layer 2, the real frontend),
-so contract drift turns the build red instead.
-
-## The two layers
-
-- **Layer 1 — backend golden** (`tests/test_replay_golden.py`): replays a fixture
-  through the real FastAPI gateway with `ReplayChatModel` and asserts the streamed
-  SSE event sequence equals a committed golden. Fast, no browser. Guards protocol
-  *shape*.
-- **Layer 2 — full-stack render** (`frontend/tests/e2e-real-backend/`): real
-  Next.js + real gateway (replay model) + Chromium; asserts the replayed
-  auto-title and a follow-up suggestion render in the browser. Guards semantic
-  *render*. (Complementary to Layer 1 — neither subsumes the other.)
-
-Layer 2 also hosts **cross-stack contract scenarios** — the dangerous class
-where a backend change silently breaks a frontend assumption and *both sides'
-unit tests stay green*. See below.
-
-## Cross-stack scenario: multi-run render order (`multi-run-order.spec.ts`)
-
-Regression guard for issue **#3352** (after context compression, refreshing a
-thread rendered history out of order). Root cause was a front-back desync:
-backend `RunManager.list_by_thread` returns runs **newest-first** (PR #2932),
-while the frontend (`core/threads/hooks.ts`) iterated runs and **prepended** each
-loaded page — inverting chronological order once the checkpoint no longer held
-the older messages. The backend ordering test was green throughout, and the
-frontend regression unit test hardcodes "backend returns newest-first" in a mock,
-so only a *real frontend against a real backend* catches the desync.
-
-This scenario does **not** record a conversation. It uses a **test-only seeder**
-(`tests/seed_runs_router.py`, mounted on the replay gateway only when
-`DEERFLOW_ENABLE_TEST_SEED=1`) to stand up a thread with ≥2 runs and per-run
-message events — and deliberately **no checkpoint**, which is the #3352
-precondition: it forces the frontend's per-run reload path to be the sole source
-of truth so the ordering bug becomes observable. The seeder writes through the
-gateway's own run/event stores using the request's auth context, so the real
-`list_by_thread` → `/runs/{id}/messages` → prepend path runs live. Reverting the
-#3354 frontend fix turns this spec red.
-
-## How replay works
-
-`tests/replay_provider.py::ReplayChatModel` returns recorded assistant turns keyed
-by a **normalized hash of the model caller + conversation**. The conversation is
-human / ai / tool messages — role, text, tool-call name+args; with
-`<system-reminder>`, dates, UUIDs, tmp paths stripped. The caller is the stable
-source of the model call (`lead_agent`, `middleware:title`, `suggest_agent`,
-`subagent:*`, etc.). A miss raises loudly rather than passing silently.
-
-**The system prompt is excluded from the match key.** The lead-agent system
-prompt is a living, frequently-edited implementation detail — its wording changes
-across PRs (e.g. #3195 added a "File Editing Workflow" section). Hashing it would
-make every fixture go stale and red-fail unrelated PRs the moment anyone edits the
-prompt. The conversation flow (user input → tool calls → results → answer) is the
-stable contract that identifies a recorded turn. The caller still stays in the
-key so two different model users with identical conversation text do not compete
-for the same replay bucket. (This mirrors how open-design's mock picker keys on
-the user prompt, not the system internals.) Combined with pinning skills +
-extensions empty and disabling memory/summarization
-(`tests/_replay_fixture.py::build_config_yaml`), a fixture replays the same across
-machines, days, prompt edits, and CI. Replaying needs **no API key**.
-
-A swallowed hash-miss keeps the SSE *event shapes* identical (the gateway wraps it
-into a normal assistant error message), so the Layer-1 golden can't catch a miss
-by shape alone — it inspects `replay_provider.replay_misses()` and fails loud
-instead. Layer-2 already fails on a miss (the recorded turns never render).
-
-## Record a new scenario (needs a real key — dev machine only)
-
-Recording drives the **real frontend** so captured inputs match exactly what the
-browser sends; fixtures contain no API key.
-
-```bash
-# 1. drive the real frontend against a real-model gateway, capturing model calls
-OPENAI_API_KEY=... OPENAI_API_BASE=<openai-compatible-endpoint>/v1 \
-  DEERFLOW_RECORD_OUT=/tmp/rec/turns.jsonl RECORD_MODEL=<model> \
-  bash -c 'cd frontend && pnpm exec playwright test -c playwright.record.config.ts'
-
-# 2. stitch the capture into a fixture
-cd backend && uv run python scripts/build_fixture_from_jsonl.py \
-  --jsonl /tmp/rec/turns.jsonl --meta /tmp/rec/turns.jsonl.meta.json \
-  --out tests/fixtures/replay/<scenario>.<mode>.json --model <model>
-
-# 3. regenerate the committed golden
-DEERFLOW_WRITE_GOLDEN=1 PYTHONPATH=. uv run pytest tests/test_replay_golden.py
-```
-
-## Run (no key)
-
-```bash
-cd backend  && PYTHONPATH=. uv run pytest tests/test_replay_golden.py          # Layer 1
-cd frontend && pnpm exec playwright test -c playwright.real-backend.config.ts  # Layer 2
-```
-
-## CI
-
-`.github/workflows/replay-e2e.yml` runs both layers on changes to **either** side
-of the contract (`frontend/**`, `backend/app/gateway/**`,
-`backend/packages/harness/**`, fixtures). DOM assertions are the gate; the rendered
-screenshot + Playwright HTML report are uploaded as a CI artifact.
-
-## Known limitations
-
-- Visual regression baselines are OS-specific, so they are a **local dev gate
-  only** (gitignored); CI uploads the render as an artifact for human review
-  instead of hard-asserting a cross-OS baseline.
-- Fixtures are coupled to the recording-time prompt; if new
-  environment-dependent content enters the system prompt, extend the
-  normalization in `replay_provider.py` (or pin it in `build_config_yaml`).
-- Re-record a scenario if the agent graph changes how many model calls it makes
-  — the replay raises loudly on a hash miss pointing at the divergence.
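Reviewer sketch of the match-key idea in the removed document: hash the caller plus a normalized conversation, and leave the system prompt out of the key. The function `replay_key`, the message dictionary shape, and the two regular expressions are assumptions for illustration. The real `ReplayChatModel` also keys on tool-call names and arguments and strips more volatile content, which this sketch omits.

```python
import hashlib
import json
import re

_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _normalize(text: str) -> str:
    """Replace volatile values so the same turn hashes the same on any day."""
    text = _UUID_RE.sub("<uuid>", text)
    return _DATE_RE.sub("<date>", text)


def replay_key(caller: str, messages: list[dict]) -> str:
    """Hash caller + normalized conversation; system messages are excluded."""
    conversation = [
        {"role": m["role"], "text": _normalize(m.get("text", ""))}
        for m in messages
        if m["role"] != "system"
    ]
    payload = json.dumps(
        {"caller": caller, "conversation": conversation},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Keeping the caller in the payload means two model users with identical conversation text still get distinct keys, which matches the rationale given in the document.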
@@ -1,81 +0,0 @@
-# Sandbox Memory Profiling
-
-This guide records a repeatable baseline before changing the sandbox runtime.
-Issue #3213 reports per-sandbox memory near 1 GiB in Kubernetes. Before adding
-or recommending a new provider, capture the current AIO sandbox baseline and
-compare candidates with the same DeerFlow workload.
-
-## What to Measure
-
-Measure at least these samples:
-
-1. Empty sandbox after it becomes ready.
-2. After a simple bash command.
-3. After a Python task that imports common packages.
-4. After a Node task when Node-based workloads are expected.
-5. After generating files under `/mnt/user-data/outputs`.
-6. After release and warm reuse.
-7. At the target concurrency level, for example 10, 50, or 100 sandboxes.
-
-`kubectl top` reports Kubernetes/container working set memory. Treat it as a
-capacity signal, not exclusive RSS/PSS. Pod-level memory includes every
-container in the Pod and may include cache charged to the cgroup. If a result
-looks surprising, inspect the sandbox processes and cgroup metrics on the node
-before drawing conclusions.
-
-## Capture a Snapshot
-
-Run this from the repository root:
-
-```bash
-python scripts/sandbox_memory_profile.py \
-  --namespace deer-flow \
-  --selector app=deer-flow-sandbox \
-  --sample empty \
-  --include-processes \
-  --format markdown
-```
-
-Use a descriptive `--sample` value for each phase:
-
-```bash
-python scripts/sandbox_memory_profile.py --sample after-bash --format json
-python scripts/sandbox_memory_profile.py --sample after-python --format json
-python scripts/sandbox_memory_profile.py --sample after-artifact --format json
-```
-
-`--include-processes` runs `kubectl exec ... ps` in each sandbox Pod and adds
-the highest-RSS processes to the report. This helps distinguish Pod-level cgroup
-memory from process RSS. The two numbers will not match exactly because cgroup
-memory can include cache and other kernel-accounted memory.
-
-Save the raw JSON when comparing backends so totals, pod names, images,
-requests, limits, and timestamps can be audited later.
-
-## Candidate Runtime Matrix
-
-For AIO, CubeSandbox, OpenSandbox, gVisor, Kata, or another candidate, compare
-the same workload and record:
-
-| Area | Required Evidence |
-| --- | --- |
-| Capacity | Pod or instance count, total memory, average memory, max memory |
-| Startup | Ready latency at 1, 10, 50, and 100 concurrent sandboxes |
-| Commands | Bash output, timeout behavior, failure shape |
-| Files | `read_file`, `write_file`, binary `update_file`, `list_dir`, `glob`, `grep` |
-| Uploads | Files uploaded by the gateway are visible inside the sandbox |
-| Artifacts | Files written to `/mnt/user-data/outputs` are readable by the backend artifact API |
-| Paths | `/mnt/user-data/workspace`, `/mnt/user-data/uploads`, `/mnt/user-data/outputs`, `/mnt/acp-workspace`, and skills paths keep their expected semantics |
-| Isolation | Different users and threads cannot read each other's data |
-| Cleanup | Release, idle timeout, process restart, and orphan cleanup free resources |
-| Operations | Deployment prerequisites, privileged components, networking, storage, and upgrade path |
-
-## PR Guidance
-
-Do not claim that a new provider fixes high-concurrency memory usage until the
-same DeerFlow workload has been measured on both the current AIO sandbox and the
-candidate backend.
-
-For an experimental provider PR, prefer `Related to #3213` unless the PR also
-includes reproducible DeerFlow workload data that demonstrates the target memory
-reduction and preserves uploads, outputs, artifacts, and isolation behavior.
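Reviewer sketch of the "Capacity" row in the removed matrix (pod count, total, average, and max memory). It parses text shaped like `kubectl top pods` output; the three-column layout and the `Ki`/`Mi`/`Gi` suffixes are assumptions about that output, and `summarize_top_output` is a hypothetical helper, not the removed `scripts/sandbox_memory_profile.py`.

```python
_UNIT_TO_MIB = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}


def parse_memory_mib(value: str) -> float:
    """Convert a quantity such as '900Mi' or '1Gi' to MiB."""
    for suffix, factor in _UNIT_TO_MIB.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {value!r}")


def summarize_top_output(top_output: str) -> dict:
    """Summarize working-set memory from assumed `NAME CPU MEMORY` rows."""
    lines = [line for line in top_output.strip().splitlines() if line.strip()]
    rows = [line.split() for line in lines[1:]]  # skip the header row
    if not rows:
        raise ValueError("no pod rows found")
    memory = [parse_memory_mib(row[2]) for row in rows]
    return {
        "pods": len(memory),
        "total_mib": sum(memory),
        "avg_mib": sum(memory) / len(memory),
        "max_mib": max(memory),
    }
```

As the document notes, these numbers are working-set memory and should be read as a capacity signal, not as exclusive RSS or PSS.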
@@ -26,7 +26,7 @@
  - Replace sync `requests` with `httpx.AsyncClient` in community tools (tavily, jina_ai, firecrawl, infoquest, image_search)
  - [x] Replace sync `model.invoke()` with async `model.ainvoke()` in title_middleware and memory updater
  - Consider `asyncio.to_thread()` wrapper for remaining blocking file I/O
-  - For production: tune Gateway worker/runtime settings for long-running agent workloads
+  - For production: use `langgraph up` (multi-worker) instead of `langgraph dev` (single-worker)

 ## Resolved Issues

@@ -127,8 +127,8 @@ complex_agent = create_agent_for_task("high")
 ## How It Works

 1. When `make_lead_agent(config)` is called, it extracts `is_plan_mode` from `config.configurable`
-2. The config is passed to `build_middlewares(config)`
-3. `build_middlewares()` reads `is_plan_mode` and calls `_create_todo_list_middleware(is_plan_mode)`
+2. The config is passed to `_build_middlewares(config)`
+3. `_build_middlewares()` reads `is_plan_mode` and calls `_create_todo_list_middleware(is_plan_mode)`
 4. If `is_plan_mode=True`, a `TodoListMiddleware` instance is created and added to the middleware chain
 5. The middleware automatically adds a `write_todos` tool to the agent's toolset
 6. The agent can use this tool to manage tasks during execution
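The steps above can be sketched with stand-ins. This is a minimal illustration, assuming a plain `dict` config and string names for the chain; `sketch_create_todo_list_middleware` and `sketch_build_middlewares` are hypothetical simplifications, not the signatures in `agent.py`, and the `TodoListMiddleware` class here is an empty placeholder.

```python
from typing import Any, Optional


class TodoListMiddleware:
    """Placeholder for the real middleware; only its presence matters here."""


def sketch_create_todo_list_middleware(is_plan_mode: bool) -> Optional[TodoListMiddleware]:
    """Create the middleware only when plan mode is enabled."""
    return TodoListMiddleware() if is_plan_mode else None


def sketch_build_middlewares(config: dict[str, Any]) -> list[str]:
    """Read is_plan_mode from config['configurable'] and build the chain."""
    is_plan_mode = config.get("configurable", {}).get("is_plan_mode", False)
    chain = ["ThreadDataMiddleware", "SandboxMiddleware"]
    todo = sketch_create_todo_list_middleware(is_plan_mode)
    if todo is not None:
        chain.append(type(todo).__name__)
    return chain
```

With `{"configurable": {"is_plan_mode": True}}` the chain gains a `TodoListMiddleware` entry; with an empty config it does not.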
@@ -141,7 +141,7 @@ make_lead_agent(config)
  │
  ├─> Extracts: is_plan_mode = config.configurable.get("is_plan_mode", False)
  │
-  └─> build_middlewares(config)
+  └─> _build_middlewares(config)
        │
        ├─> ThreadDataMiddleware
        ├─> SandboxMiddleware
@@ -156,7 +156,7 @@ make_lead_agent(config)
 ### Agent Module
 - **Location**: `packages/harness/deerflow/agents/lead_agent/agent.py`
 - **Function**: `_create_todo_list_middleware(is_plan_mode: bool)` - Creates TodoListMiddleware if plan mode is enabled
-- **Function**: `build_middlewares(config: RunnableConfig)` - Builds middleware chain based on runtime config
+- **Function**: `_build_middlewares(config: RunnableConfig)` - Builds middleware chain based on runtime config
 - **Function**: `make_lead_agent(config: RunnableConfig)` - Creates agent with appropriate middlewares

 ### Runtime Configuration
@@ -18,8 +18,6 @@ middleware, and the async path inside ``TitleMiddleware``. Any new in-graph
 ``create_chat_model`` call must add to this list and pass the flag.
 """

-from __future__ import annotations
-
 import logging

 from langchain.agents import create_agent
@@ -49,8 +47,6 @@ from deerflow.tracing import build_tracing_callbacks

 logger = logging.getLogger(__name__)

-_BOOTSTRAP_SKILL_NAMES = {"bootstrap"}
-

 def _get_runtime_config(config: RunnableConfig) -> dict:
    """Merge legacy configurable options with LangGraph runtime context."""
@@ -267,31 +263,20 @@ Being proactive with task management demonstrates thoroughness and ensures all r
 # ViewImageMiddleware should be before ClarificationMiddleware to inject image details before LLM
 # ToolErrorHandlingMiddleware should be before ClarificationMiddleware to convert tool exceptions to ToolMessages
 # ClarificationMiddleware should be last to intercept clarification requests after model calls
-def build_middlewares(
+def _build_middlewares(
    config: RunnableConfig,
    model_name: str | None,
    agent_name: str | None = None,
    custom_middlewares: list[AgentMiddleware] | None = None,
    *,
-    available_skills: set[str] | None = None,
    app_config: AppConfig | None = None,
-    deferred_setup=None,
 ):
-    """Build the lead-agent middleware chain based on runtime configuration.
-
-    Public entry point for the lead agent's full middleware composition. Used by
-    ``make_lead_agent`` and by the embedded ``DeerFlowClient`` (a lead-agent variant
-    that needs the identical chain). Keep this name stable: it is imported across a
-    module boundary, so renames/signature changes ripple into ``client.py``.
+    """Build middleware chain based on runtime configuration.

    Args:
        config: Runtime configuration containing configurable options like is_plan_mode.
-        model_name: Resolved runtime model name; gates vision-only middleware.
        agent_name: If provided, MemoryMiddleware will use per-agent memory storage.
        custom_middlewares: Optional list of custom middlewares to inject into the chain.
-        app_config: Explicit AppConfig; falls back to ``get_app_config()`` when omitted.
-        deferred_setup: Optional deferred-MCP-tool setup that attaches
-            ``DeferredToolFilterMiddleware`` when ``tool_search`` is enabled.

    Returns:
        List of middleware instances.
@@ -305,13 +290,6 @@ def build_middlewares(

    middlewares.append(DynamicContextMiddleware(agent_name=agent_name, app_config=resolved_app_config))

-    # Deterministically load a full SKILL.md when the user starts the turn with
-    # /skill-name. This keeps the base system prompt metadata-only while giving
-    # explicit user activation priority over model-side relevance guessing.
-    from deerflow.agents.middlewares.skill_activation_middleware import SkillActivationMiddleware
-
-    middlewares.append(SkillActivationMiddleware(available_skills=available_skills, app_config=resolved_app_config))
-
    # Add summarization middleware if enabled
    summarization_middleware = _create_summarization_middleware(app_config=resolved_app_config)
    if summarization_middleware is not None:
@@ -340,13 +318,11 @@ def build_middlewares(
    if model_config is not None and model_config.supports_vision:
        middlewares.append(ViewImageMiddleware())

-    # Hide deferred tool schemas from model binding until tool_search promotes them.
-    # The deferred set + catalog hash come from the build-time setup (assembled
-    # after tool-policy filtering); promotion is read from graph state.
-    if deferred_setup is not None and deferred_setup.deferred_names:
+    # Add DeferredToolFilterMiddleware to hide deferred tool schemas from model binding
+    if resolved_app_config.tool_search.enabled:
        from deerflow.agents.middlewares.deferred_tool_filter_middleware import DeferredToolFilterMiddleware

-        middlewares.append(DeferredToolFilterMiddleware(deferred_setup.deferred_names, deferred_setup.catalog_hash))
+        middlewares.append(DeferredToolFilterMiddleware())

    # Add SubagentLimitMiddleware to truncate excess parallel task calls
    subagent_enabled = cfg.get("subagent_enabled", False)
@@ -379,7 +355,7 @@ def build_middlewares(

 def _available_skill_names(agent_config, is_bootstrap: bool) -> set[str] | None:
    if is_bootstrap:
-        return set(_BOOTSTRAP_SKILL_NAMES)
+        return {"bootstrap"}
    if agent_config and agent_config.skills is not None:
        return set(agent_config.skills)
    return None
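Review note: a standalone copy of the post-change helper makes the three outcomes easier to see. `SimpleNamespace` stands in for the real agent config object, which is an assumption made for the example. Reading `None` as "no whitelist applies" is likewise an interpretation, not something this hunk states.

```python
from types import SimpleNamespace
from typing import Optional


def available_skill_names(agent_config, is_bootstrap: bool) -> Optional[set]:
    # Same branching as the helper in the hunk above.
    if is_bootstrap:
        return {"bootstrap"}
    if agent_config and agent_config.skills is not None:
        return set(agent_config.skills)
    return None


# An explicit empty list is different from "not set":
empty_whitelist = available_skill_names(SimpleNamespace(skills=[]), False)
no_whitelist = available_skill_names(SimpleNamespace(skills=None), False)
```

Here `empty_whitelist` is an empty set while `no_whitelist` is `None`, so callers have to test with `is None` rather than truthiness.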
@@ -410,7 +386,6 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
    # Lazy import to avoid circular dependency
    from deerflow.tools import get_available_tools
    from deerflow.tools.builtins import setup_agent, update_agent
-    from deerflow.tools.builtins.tool_search import assemble_deferred_tools

    cfg = _get_runtime_config(config)
    resolved_app_config = app_config
@@ -485,27 +460,16 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):

    if is_bootstrap:
        # Special bootstrap agent with minimal prompt for initial custom agent creation flow
-        # Keep the bootstrap skill set intentionally narrow so agent creation
-        # remains deterministic before the custom agent's own config exists.
-        raw_tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
-        filtered = filter_tools_by_skill_allowed_tools(raw_tools, skills_for_tool_policy)
-        final_tools, setup = assemble_deferred_tools(filtered, enabled=resolved_app_config.tool_search.enabled)
+        tools = get_available_tools(model_name=model_name, subagent_enabled=subagent_enabled, app_config=resolved_app_config) + [setup_agent]
        return create_agent(
            model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, app_config=resolved_app_config, attach_tracing=False),
-            tools=final_tools,
-            middleware=build_middlewares(
-                config,
-                model_name=model_name,
-                available_skills=set(_BOOTSTRAP_SKILL_NAMES),
-                app_config=resolved_app_config,
-                deferred_setup=setup,
-            ),
+            tools=filter_tools_by_skill_allowed_tools(tools, skills_for_tool_policy),
+            middleware=_build_middlewares(config, model_name=model_name, app_config=resolved_app_config),
            system_prompt=apply_prompt_template(
                subagent_enabled=subagent_enabled,
                max_concurrent_subagents=max_concurrent_subagents,
-                available_skills=set(_BOOTSTRAP_SKILL_NAMES),
+                available_skills=set(["bootstrap"]),
                app_config=resolved_app_config,
-                deferred_names=setup.deferred_names,
            ),
            state_schema=ThreadState,
        )
@@ -514,27 +478,17 @@ def _make_lead_agent(config: RunnableConfig, *, app_config: AppConfig):
    # The default agent (no agent_name) does not see this tool.
    extra_tools = [update_agent] if agent_name else []
    # Default lead agent (unchanged behavior)
-    raw_tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
-    filtered = filter_tools_by_skill_allowed_tools(raw_tools + extra_tools, skills_for_tool_policy)
-    final_tools, setup = assemble_deferred_tools(filtered, enabled=resolved_app_config.tool_search.enabled)
+    tools = get_available_tools(model_name=model_name, groups=agent_config.tool_groups if agent_config else None, subagent_enabled=subagent_enabled, app_config=resolved_app_config)
    return create_agent(
        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled, reasoning_effort=reasoning_effort, app_config=resolved_app_config, attach_tracing=False),
-        tools=final_tools,
-        middleware=build_middlewares(
-            config,
-            model_name=model_name,
-            agent_name=agent_name,
-            available_skills=available_skills,
-            app_config=resolved_app_config,
-            deferred_setup=setup,
-        ),
+        tools=filter_tools_by_skill_allowed_tools(tools + extra_tools, skills_for_tool_policy),
+        middleware=_build_middlewares(config, model_name=model_name, agent_name=agent_name, app_config=resolved_app_config),
        system_prompt=apply_prompt_template(
            subagent_enabled=subagent_enabled,
            max_concurrent_subagents=max_concurrent_subagents,
            agent_name=agent_name,
-            available_skills=available_skills,
+            available_skills=set(agent_config.skills) if agent_config and agent_config.skills is not None else None,
            app_config=resolved_app_config,
-            deferred_names=setup.deferred_names,
        ),
        state_schema=ThreadState,
    )
@@ -10,7 +10,6 @@ from deerflow.config.agents_config import load_agent_soul
 from deerflow.skills.storage import get_or_new_skill_storage
 from deerflow.skills.types import Skill, SkillCategory
 from deerflow.subagents import get_available_subagent_names
-from deerflow.tools.builtins.tool_search import get_deferred_tools_prompt_section

 if TYPE_CHECKING:
    from deerflow.config.app_config import AppConfig
@@ -543,14 +542,6 @@ combined with a FastAPI gateway for REST API access [citation:FastAPI](https://f
 {subagent_reminder}- Skill First: Always load the relevant skill before starting **complex** tasks.
 - Progressive Loading: Load resources incrementally as referenced in skills
 - Output Files: Final deliverables must be in `/mnt/user-data/outputs`
-- File Editing Workflow: When revising an existing file, prefer
-  `str_replace` over `write_file` — it sends only the diff and avoids
-  re-emitting the whole file (mirrors Claude Code's Edit and Codex's
-  apply_patch). When writing long new content from scratch, split it
-  into sections: the first `write_file` call creates the file, then use
-  `write_file` with append=True to extend it section by section. This
-  keeps each tool call small and avoids mid-stream chunk-gap timeouts
-  on oversized single-shot writes. (See issue #3189.)  
 - Clarity: Be direct and helpful, avoid unnecessary meta-commentary
 - Including Images and Mermaid: Images and Mermaid diagrams are always welcomed in the Markdown format, and you're encouraged to use `![Image Description](image_path)\n\n` or "```mermaid" to display images in response or Markdown files
 - Multi-task: Better utilize parallel tool calling to call multiple tools at one time for better performance
@@ -586,11 +577,7 @@ def _get_memory_context(agent_name: str | None = None, *, app_config: AppConfig
            return ""

        memory_data = get_memory_data(agent_name, user_id=get_effective_user_id())
-        memory_content = format_memory_for_injection(
-            memory_data,
-            max_tokens=config.max_injection_tokens,
-            use_tiktoken=(config.token_counting == "tiktoken"),
-        )
+        memory_content = format_memory_for_injection(memory_data, max_tokens=config.max_injection_tokens)

        if not memory_content.strip():
            return ""
@@ -629,11 +616,6 @@ You have access to skills that provide optimized workflows for specific tasks. E
 4. Load referenced resources only when needed during execution
 5. Follow the skill's instructions precisely

-**Explicit Slash Skill Activation:**
-- If the user starts a request with `/<skill-name>`, that skill was explicitly requested for the current turn.
-- Follow the activated skill before choosing a general workflow.
-- The runtime injects the activated skill content for explicit slash activations; do not call `read_file` for that SKILL.md again unless the injected skill references supporting resources you need.
-
 **Skills are located at:** {container_base_path}
 {skill_evolution_section}
 {skills_list}
@@ -696,13 +678,42 @@ SOUL.md or config.yaml — those write into a temporary sandbox/tool workspace a
 Rules:
 - Always pass the FULL replacement text for `soul` (no patch semantics). Start from your current SOUL above and apply the user's edits.
 - Only pass the fields that should change. Omit the others to preserve them.
-- Never pass literal strings like `"null"`, `"none"`, or `"undefined"` for unchanged fields.
 - Pass `skills=[]` to disable all skills, or omit `skills` to keep the existing whitelist.
 - After `update_agent` returns successfully, tell the user the change is persisted and will take effect on the next turn.
 </self_update>
 """


+def get_deferred_tools_prompt_section(*, app_config: AppConfig | None = None) -> str:
+    """Generate <available-deferred-tools> block for the system prompt.
+
+    Lists only deferred tool names so the agent knows what exists
+    and can use tool_search to load them.
+    Returns empty string when tool_search is disabled or no tools are deferred.
+    """
+    from deerflow.tools.builtins.tool_search import get_deferred_registry
+
+    if app_config is None:
+        try:
+            from deerflow.config import get_app_config
+
+            config = get_app_config()
+        except Exception:
+            return ""
+    else:
+        config = app_config
+
+    if not config.tool_search.enabled:
+        return ""
+
+    registry = get_deferred_registry()
+    if not registry:
+        return ""
+
+    names = "\n".join(e.name for e in registry.entries)
+    return f"<available-deferred-tools>\n{names}\n</available-deferred-tools>"
+
+
 def _build_acp_section(*, app_config: AppConfig | None = None) -> str:
    """Build the ACP agent prompt section, only if ACP agents are configured."""
    if app_config is None:
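Review note: the string that the added `get_deferred_tools_prompt_section` produces can be illustrated without the registry or app config. In the sketch below, `deferred_tools_section`, its `enabled` flag, and the sample tool names are hypothetical. The real function reads both inputs from `get_deferred_registry()` and `config.tool_search.enabled`.

```python
def deferred_tools_section(names: list, *, enabled: bool) -> str:
    # Empty string when tool_search is off or nothing is deferred,
    # mirroring the two early returns in the added function.
    if not enabled or not names:
        return ""
    body = "\n".join(names)
    return f"<available-deferred-tools>\n{body}\n</available-deferred-tools>"


# Hypothetical tool names, for illustration only.
section = deferred_tools_section(["github_search", "jira_lookup"], enabled=True)
```

Only names are listed, so the prompt cost grows with the number of deferred tools and not with the size of their schemas.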
@@ -761,7 +772,6 @@ def apply_prompt_template(
    agent_name: str | None = None,
    available_skills: set[str] | None = None,
    app_config: AppConfig | None = None,
-    deferred_names: frozenset[str] = frozenset(),
 ) -> str:
    # Include subagent section only if enabled (from runtime parameter)
    n = max_concurrent_subagents
@@ -789,7 +799,7 @@ def apply_prompt_template(
    skills_section = get_skills_prompt_section(available_skills, app_config=app_config)

    # Get deferred tools section (tool_search)
-    deferred_tools_section = get_deferred_tools_prompt_section(deferred_names=deferred_names)
+    deferred_tools_section = get_deferred_tools_prompt_section(app_config=app_config)

    # Build ACP agent section only if ACP agents are configured
    acp_section = _build_acp_section(app_config=app_config)
@@ -1,15 +1,8 @@
 """Prompt templates for memory update and injection."""

-from __future__ import annotations
-
-import logging
 import math
 import re
-import threading
-import time
-from typing import Any, cast
-
-logger = logging.getLogger(__name__)
+from typing import Any

 try:
    import tiktoken
@@ -167,137 +160,26 @@ Rules:
 Return ONLY valid JSON."""


-# Module-level tiktoken encoding cache.  Populated lazily on first use;
-# subsequent calls are a dict lookup (no network I/O).  Pre-warming at
-# startup via :func:`warm_tiktoken_cache` avoids blocking a request on the
-# (potentially slow) first ``get_encoding`` call.
-#
-# A *failed* load is cached as a ``(None, monotonic_timestamp)`` tuple so that
-# a network-restricted environment does not re-attempt the blocking BPE
-# download on every subsequent call.  After ``_TIKTOKEN_RETRY_COOLDOWN_S`` the
-# failure is allowed to expire so a transient network outage can self-heal back
-# to accurate tiktoken counting without a process restart.  A load already in
-# progress is cached as ``_TIKTOKEN_ENCODING_LOADING`` so concurrent callers
-# fall back immediately instead of spawning more blocking
-# ``tiktoken.get_encoding`` threads.  Use the ``memory.token_counting: char``
-# config to skip tiktoken entirely.
-_TIKTOKEN_ENCODING_MISSING = object()
-_TIKTOKEN_ENCODING_LOADING = object()
-# Cooldown before a *failed* tiktoken load is re-attempted. This is an internal
-# tuning constant rather than a user-facing config: it only affects how quickly
-# the default ``tiktoken`` mode self-heals after a transient network outage.
-# Deployments that want to avoid tiktoken's network dependency entirely should
-# set ``memory.token_counting: char`` instead of tuning this value.
-_TIKTOKEN_RETRY_COOLDOWN_S = 600.0
-_tiktoken_encoding_cache: dict[str, Any] = {}
-_tiktoken_encoding_cache_lock = threading.Lock()
-
-
-def _get_tiktoken_encoding(encoding_name: str = "cl100k_base") -> tiktoken.Encoding | None:
-    """Return a cached tiktoken encoding, or ``None`` on failure / unavailability.
-
-    On the very first call for a given *encoding_name*, tiktoken may need to
-    download the BPE data from ``openaipublic.blob.core.windows.net``.  In
-    network-restricted environments (e.g. deployments behind the GFW) this
-    download can block for tens of minutes before the OS TCP timeout kicks in.
-    The caller must therefore be prepared for this to block and should run it
-    off the event loop (e.g. via ``asyncio.to_thread``).
-
-    A failed load is remembered (with a timestamp) so subsequent calls fall
-    back immediately to character-based estimation instead of re-triggering the
-    blocking download. The failure expires after ``_TIKTOKEN_RETRY_COOLDOWN_S``
-    so a transient outage can self-heal without a restart. A load already in
-    progress is also remembered so that a timed-out caller does not leave a
-    window where later requests start more blocking ``get_encoding`` calls.
-    """
-    if not TIKTOKEN_AVAILABLE:
-        return None
-
-    with _tiktoken_encoding_cache_lock:
-        cached = _tiktoken_encoding_cache.get(encoding_name, _TIKTOKEN_ENCODING_MISSING)
-        if cached is _TIKTOKEN_ENCODING_LOADING:
-            return None
-        if isinstance(cached, tuple):
-            # Cached failure: (None, failed_at). Retry only after cooldown.
-            _, failed_at = cached
-            if time.monotonic() - failed_at < _TIKTOKEN_RETRY_COOLDOWN_S:
-                return None
-            cached = _TIKTOKEN_ENCODING_MISSING
-        if cached is not _TIKTOKEN_ENCODING_MISSING:
-            return cast("tiktoken.Encoding", cached)
-        _tiktoken_encoding_cache[encoding_name] = _TIKTOKEN_ENCODING_LOADING
-
-    try:
-        encoding = tiktoken.get_encoding(encoding_name)
-    except Exception:
-        logger.warning("Failed to load tiktoken encoding %r; falling back to char-based estimation", encoding_name, exc_info=True)
-        with _tiktoken_encoding_cache_lock:
-            _tiktoken_encoding_cache[encoding_name] = (None, time.monotonic())
-        return None
-
-    with _tiktoken_encoding_cache_lock:
-        _tiktoken_encoding_cache[encoding_name] = encoding
-    return encoding
-
-
-def _char_based_token_estimate(text: str) -> int:
-    """Network-free token estimate that accounts for CJK density.
-
-    The plain ``len(text) // 4`` heuristic is reasonable for English/code
-    (~4 chars per token) but significantly under-estimates token counts for
-    Chinese, Japanese, and Korean text, where the ratio is closer to 1.5-2
-    characters per token. Counting CJK characters separately (~2 chars per
-    token) avoids over-filling the injection budget for CJK-heavy memory
-    content.
-    """
-    cjk = sum(
-        1
-        for ch in text
-        if "\u4e00" <= ch <= "\u9fff"  # CJK Unified Ideographs
-        or "\u3040" <= ch <= "\u30ff"  # Hiragana + Katakana
-        or "\uac00" <= ch <= "\ud7a3"  # Hangul syllables
-    )
-    return (len(text) - cjk) // 4 + cjk // 2
-
-
-def _count_tokens(text: str, encoding_name: str = "cl100k_base", *, use_tiktoken: bool = True) -> int:
+def _count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens in text using tiktoken.

    Args:
        text: The text to count tokens for.
        encoding_name: The encoding to use (default: cl100k_base for GPT-4/3.5).
-        use_tiktoken: When ``False``, skip tiktoken entirely and use the
-            network-free character-based estimate. This guarantees no BPE
-            download is attempted (see ``memory.token_counting`` config).

    Returns:
        The number of tokens in the text.
    """
-    if not use_tiktoken:
-        return _char_based_token_estimate(text)
-
-    encoding = _get_tiktoken_encoding(encoding_name)
-    if encoding is None:
-        # Fallback to CJK-aware character estimation if tiktoken is not
-        # available or the encoding failed to load.
-        return _char_based_token_estimate(text)
+    if not TIKTOKEN_AVAILABLE:
+        # Fallback to character-based estimation if tiktoken is not available
+        return len(text) // 4

    try:
+        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
-        # Fallback to CJK-aware character estimation on error.
-        return _char_based_token_estimate(text)
-
-
-def warm_tiktoken_cache() -> bool:
-    """Pre-warm the tiktoken encoding cache.
-
-    Call at startup (off the event loop) so the first request never blocks
-    on the BPE download.  Returns ``True`` if the encoding was loaded
-    successfully (or was already cached), ``False`` if tiktoken is
-    unavailable or the download failed.
-    """
-    return _get_tiktoken_encoding("cl100k_base") is not None
+        # Fallback to character-based estimation on error
+        return len(text) // 4
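Review note: this hunk replaces the CJK-aware fallback with the plain `len(text) // 4` heuristic. The sketch below puts the two estimates side by side so the difference is concrete. The function names are chosen for the example; the arithmetic is copied from the removed and added lines, and tiktoken is not involved.

```python
def plain_estimate(text: str) -> int:
    # About 4 characters per token; reasonable for English and code.
    return len(text) // 4


def cjk_aware_estimate(text: str) -> int:
    # Count CJK characters separately at about 2 characters per token.
    cjk = sum(
        1
        for ch in text
        if "\u4e00" <= ch <= "\u9fff"  # CJK Unified Ideographs
        or "\u3040" <= ch <= "\u30ff"  # Hiragana + Katakana
        or "\uac00" <= ch <= "\ud7a3"  # Hangul syllables
    )
    return (len(text) - cjk) // 4 + cjk // 2


english = "hello world!"  # 12 characters, no CJK
chinese = "你好世界你好世界"  # 8 characters, all CJK
```

For `english` both functions return 3. For `chinese` the plain estimate returns 2 and the CJK-aware one returns 4, so after this change the fallback path counts fewer tokens for CJK-heavy memory content against `max_tokens`.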


 def _coerce_confidence(value: Any, default: float = 0.0) -> float:
@@ -316,15 +198,12 @@ def _coerce_confidence(value: Any, default: float = 0.0) -> float:
    return max(0.0, min(1.0, confidence))


-def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000, *, use_tiktoken: bool = True) -> str:
+def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
    """Format memory data for injection into system prompt.

    Args:
        memory_data: The memory data dictionary.
        max_tokens: Maximum tokens to use (counted via tiktoken for accuracy).
-        use_tiktoken: When ``False``, all token counting uses the network-free
-            character-based estimate instead of tiktoken (see
-            ``memory.token_counting`` config). Defaults to ``True``.

    Returns:
        Formatted memory string for system prompt injection.
@@ -386,10 +265,10 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2
        # Compute token count for existing sections once, then account
        # incrementally for each fact line to avoid full-string re-tokenization.
        base_text = "\n\n".join(sections)
-        base_tokens = _count_tokens(base_text, use_tiktoken=use_tiktoken) if base_text else 0
+        base_tokens = _count_tokens(base_text) if base_text else 0
        # Account for the separator between existing sections and the facts section.
        facts_header = "Facts:\n"
-        separator_tokens = _count_tokens("\n\n" + facts_header, use_tiktoken=use_tiktoken) if base_text else _count_tokens(facts_header, use_tiktoken=use_tiktoken)
+        separator_tokens = _count_tokens("\n\n" + facts_header) if base_text else _count_tokens(facts_header)
        running_tokens = base_tokens + separator_tokens

        fact_lines: list[str] = []
@@ -410,7 +289,7 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2

            # Each additional line is preceded by a newline (except the first).
            line_text = ("\n" + line) if fact_lines else line
-            line_tokens = _count_tokens(line_text, use_tiktoken=use_tiktoken)
+            line_tokens = _count_tokens(line_text)

            if running_tokens + line_tokens <= max_tokens:
                fact_lines.append(line)
@@ -426,9 +305,8 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2

    result = "\n\n".join(sections)

-    # Use accurate token counting with tiktoken (or the char-based estimate
-    # when use_tiktoken is False).
-    token_count = _count_tokens(result, use_tiktoken=use_tiktoken)
+    # Use accurate token counting with tiktoken
+    token_count = _count_tokens(result)
    if token_count > max_tokens:
        # Truncate to fit within token limit
        # Estimate characters to remove based on token ratio
@@ -227,110 +227,6 @@ def _extract_text(content: Any) -> str:
    return str(content)


-_REQUIRED_MEMORY_UPDATE_TOP_LEVEL_KEYS = frozenset({"user", "history", "newFacts", "factsToRemove"})
-
-
-def _normalize_memory_update_fact(fact: Any) -> dict[str, Any] | None:
-    """Normalize a single fact entry from a model-produced memory update."""
-    if not isinstance(fact, dict):
-        return None
-
-    raw_content = fact.get("content")
-    if not isinstance(raw_content, str):
-        return None
-    content = raw_content.strip()
-    if not content:
-        return None
-
-    raw_category = fact.get("category")
-    category = raw_category.strip() if isinstance(raw_category, str) and raw_category.strip() else "context"
-
-    raw_confidence = fact.get("confidence", 0.5)
-    if isinstance(raw_confidence, bool):
-        return None
-    if isinstance(raw_confidence, str):
-        raw_confidence = raw_confidence.strip()
-        if not raw_confidence:
-            return None
-        try:
-            raw_confidence = float(raw_confidence)
-        except ValueError:
-            return None
-    elif isinstance(raw_confidence, (int, float)):
-        raw_confidence = float(raw_confidence)
-    else:
-        return None
-
-    if not math.isfinite(raw_confidence):
-        return None
-
-    normalized_fact = {
-        "content": content,
-        "category": category,
-        "confidence": raw_confidence,
-    }
-    source_error = fact.get("sourceError")
-    if isinstance(source_error, str):
-        normalized_source_error = source_error.strip()
-        if normalized_source_error:
-            normalized_fact["sourceError"] = normalized_source_error
-
-    return normalized_fact
-
-
-def _normalize_memory_update_data(update_data: dict[str, Any]) -> dict[str, Any]:
-    """Coerce parsed memory update data into the shape consumed by _apply_updates."""
-    user = update_data.get("user")
-    history = update_data.get("history")
-    new_facts = update_data.get("newFacts")
-    facts_to_remove = update_data.get("factsToRemove")
-    normalized_facts_to_remove = [fact_id for fact_id in facts_to_remove if isinstance(fact_id, str)] if isinstance(facts_to_remove, list) else []
-    normalized_new_facts = []
-    dropped_new_fact = not isinstance(new_facts, list)
-    if isinstance(new_facts, list):
-        for fact in new_facts:
-            normalized_fact = _normalize_memory_update_fact(fact)
-            if normalized_fact is not None:
-                normalized_new_facts.append(normalized_fact)
-            else:
-                dropped_new_fact = True
-
-    if normalized_facts_to_remove and dropped_new_fact:
-        raise json.JSONDecodeError(
-            "Unsafe partial memory update: factsToRemove with malformed newFacts",
-            json.dumps(update_data, ensure_ascii=False),
-            0,
-        )
-
-    return {
-        "user": user if isinstance(user, dict) else {},
-        "history": history if isinstance(history, dict) else {},
-        "newFacts": normalized_new_facts,
-        "factsToRemove": normalized_facts_to_remove,
-    }
-
-
-def _parse_memory_update_response(response_content: Any) -> dict[str, Any]:
-    """Parse the first valid memory-update JSON object from an LLM response.
-
-    Some providers may wrap JSON in thinking traces, prose, or markdown fences
-    even when prompted to return JSON only. This parser accepts safely
-    extractable JSON objects but does not repair truncated or malformed JSON.
-    """
-    response_text = _extract_text(response_content).strip()
-    decoder = json.JSONDecoder()
-
-    for match in re.finditer(r"\{", response_text):
-        try:
-            parsed, _end = decoder.raw_decode(response_text[match.start() :])
-        except json.JSONDecodeError:
-            continue
-        if isinstance(parsed, dict) and _REQUIRED_MEMORY_UPDATE_TOP_LEVEL_KEYS.issubset(parsed):
-            return _normalize_memory_update_data(parsed)
-
-    raise json.JSONDecodeError("No valid memory update JSON object found", response_text, 0)
-
-
 # Matches sentences that describe a file-upload *event* rather than general
 # file-related work.  Deliberately narrow to avoid removing legitimate facts
 # such as "User works with CSV files" or "prefers PDF export".
@@ -457,7 +353,13 @@ class MemoryUpdater:
        user_id: str | None = None,
    ) -> bool:
        """Parse the model response, apply updates, and persist memory."""
-        update_data = _parse_memory_update_response(response_content)
+        response_text = _extract_text(response_content).strip()
+
+        if response_text.startswith("```"):
+            lines = response_text.split("\n")
+            response_text = "\n".join(lines[1:-1] if lines[-1] == "```" else lines[1:])
+
+        update_data = json.loads(response_text)
        # Deep-copy before in-place mutation so a subsequent save() failure
        # cannot corrupt the still-cached original object reference.
        updated_memory = self._apply_updates(copy.deepcopy(current_memory), update_data, thread_id)
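Review note: the inlined parsing above handles one wrapper shape, a response that starts with a Markdown fence, and otherwise hands the text straight to `json.loads`. A standalone sketch of that behavior follows. `parse_fenced_json` and the `FENCE` constant are names introduced for the example.

```python
import json
from typing import Any

FENCE = "`" * 3  # three backticks, built this way to keep the example fence-safe


def parse_fenced_json(response_text: str) -> Any:
    text = response_text.strip()
    if text.startswith(FENCE):
        lines = text.split("\n")
        # Drop the opening fence line, and the closing one if it is present.
        text = "\n".join(lines[1:-1] if lines[-1] == FENCE else lines[1:])
    return json.loads(text)


fenced = FENCE + 'json\n{"newFacts": [], "factsToRemove": []}\n' + FENCE
wrapped_in_prose = 'Here is the update: {"newFacts": []}'
```

`parse_fenced_json(fenced)` returns the dict, while `parse_fenced_json(wrapped_in_prose)` raises `json.JSONDecodeError`. The removed `_parse_memory_update_response` scanned for the first decodable object and therefore tolerated leading prose, so callers of the new path need to handle that exception.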
@@ -26,11 +26,6 @@ from langchain_core.messages import ToolMessage

 logger = logging.getLogger(__name__)

-# Workaround for issue #2894: malformed write_file calls can carry huge Markdown
-# payloads in invalid tool-call args. Keep recovery error details short so the
-# synthetic ToolMessage does not echo large or malformed content back to the model.
-_MAX_RECOVERY_ERROR_DETAIL_LEN = 500
-

 class DanglingToolCallMiddleware(AgentMiddleware[AgentState]):
    """Inserts placeholder ToolMessages for dangling tool calls before model invocation.
@@ -103,25 +98,9 @@ class DanglingToolCallMiddleware(AgentMiddleware[AgentState]):
    @staticmethod
    def _synthetic_tool_message_content(tool_call: dict) -> str:
        if tool_call.get("invalid"):
-            name = tool_call.get("name")
            error = tool_call.get("error")
-            error_text = error[:_MAX_RECOVERY_ERROR_DETAIL_LEN] if isinstance(error, str) and error else ""
-            # Workaround for issue #2894: malformed write_file calls can carry huge Markdown
-            # payloads in invalid tool-call args. Keep recovery guidance actionable without
-            # echoing large or malformed content back to the model.
-            if name == "write_file":
-                details = f" Parser error: {error_text}" if error_text else ""
-                return (
-                    "[write_file failed before execution: the tool-call arguments were not valid JSON, "
-                    "so no file was written. This often happens when the model tries to write a very "
-                    "large Markdown file in a single tool call, especially when `content` contains "
-                    "unescaped quotes, inline JSON, backslashes, or code fences. Do not retry the same "
-                    "large `write_file` payload for this artifact; provide the report/content directly "
-                    "as normal assistant text in your next response. If a file write is still needed "
-                    f"later, split the file into smaller sections instead of one large payload.{details}]"
-                )
-            if error_text:
-                return f"[Tool call could not be executed because its arguments were invalid: {error_text}]"
+            if isinstance(error, str) and error:
+                return f"[Tool call could not be executed because its arguments were invalid: {error}]"
            return "[Tool call could not be executed because its arguments were invalid.]"
        return "[Tool call was interrupted and did not return a result.]"
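
Review note: after this hunk the placeholder text has three outcomes only, and the `write_file`-specific guidance is gone. The sketch below restates the post-change logic as a standalone function so the branches are easy to check. The free-function name is an assumption for illustration; in the source it is a static method on `DanglingToolCallMiddleware`. As far as this hunk shows, the parser error is now echoed without the earlier length cap.

```python
def synthetic_tool_message_content(tool_call: dict) -> str:
    """Build placeholder text for a tool call that never produced a result."""
    if tool_call.get("invalid"):
        error = tool_call.get("error")
        # Echo the parser error only when it is a non-empty string.
        if isinstance(error, str) and error:
            return f"[Tool call could not be executed because its arguments were invalid: {error}]"
        return "[Tool call could not be executed because its arguments were invalid.]"
    # Valid arguments, but no ToolMessage was ever recorded for the call.
    return "[Tool call was interrupted and did not return a result.]"
```

A very long `error` string would therefore reach the model verbatim, which may be worth keeping in mind when reviewing the removal of the truncation.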

@@ -1,15 +1,12 @@
 """Middleware to filter deferred tool schemas from model binding.

-When tool_search is enabled, MCP tools are still passed to ToolNode for
-execution, but their schemas must NOT be sent to the LLM via bind_tools until
-the model has discovered them via tool_search. This middleware removes the
-still-deferred tools from request.tools before model binding, and blocks tool
-calls to tools that have not been promoted yet.
+When tool_search is enabled, MCP tools are registered in the DeferredToolRegistry
+and passed to ToolNode for execution, but their schemas should NOT be sent to the
+LLM via bind_tools (that's the whole point of deferral — saving context tokens).

-The deferred name set and the catalog hash are injected at construction time
-(no ContextVar). Promotion state is read from graph state (``state["promoted"]``),
-scoped by catalog hash so a stale persisted promotion cannot expose a renamed
-or drifted tool.
+This middleware intercepts wrap_model_call and removes deferred tools from
+request.tools so that model.bind_tools only receives active tool schemas.
+The agent discovers deferred tools at runtime via the tool_search tool.
 """

 import logging
@@ -27,49 +24,47 @@ logger = logging.getLogger(__name__)


 class DeferredToolFilterMiddleware(AgentMiddleware[AgentState]):
-    """Hide deferred tool schemas from the bound model until promoted.
+    """Remove deferred tools from request.tools before model binding.

    ToolNode still holds all tools (including deferred) for execution routing,
-    but the LLM only sees active tool schemas plus tools that have already been
-    promoted (recorded in ``state["promoted"]`` under the current catalog hash).
+    but the LLM only sees active tool schemas — deferred tools are discoverable
+    via tool_search at runtime.
    """

-    def __init__(self, deferred_names: frozenset[str], catalog_hash: str | None):
-        super().__init__()
-        self._deferred = deferred_names
-        self._catalog_hash = catalog_hash
-
-    def _promoted(self, state) -> set[str]:
-        promoted = (state or {}).get("promoted")
-        if promoted and promoted.get("catalog_hash") == self._catalog_hash:
-            return set(promoted.get("names") or [])
-        return set()
-
-    def _hidden(self, state) -> set[str]:
-        return set(self._deferred) - self._promoted(state)
-
    def _filter_tools(self, request: ModelRequest) -> ModelRequest:
-        if not self._deferred:
+        from deerflow.tools.builtins.tool_search import get_deferred_registry
+
+        registry = get_deferred_registry()
+        if not registry:
            return request
-        hide = self._hidden(request.state)
-        if not hide:
-            return request
-        active = [t for t in request.tools if getattr(t, "name", None) not in hide]
-        if len(active) < len(request.tools):
-            logger.debug("Filtered %d deferred tool schema(s) from model binding", len(request.tools) - len(active))
-        return request.override(tools=active)
+
+        deferred_names = registry.deferred_names
+        active_tools = [t for t in request.tools if getattr(t, "name", None) not in deferred_names]
+
+        if len(active_tools) < len(request.tools):
+            logger.debug(f"Filtered {len(request.tools) - len(active_tools)} deferred tool schema(s) from model binding")
+
+        return request.override(tools=active_tools)

    def _blocked_tool_message(self, request: ToolCallRequest) -> ToolMessage | None:
-        if not self._deferred:
+        from deerflow.tools.builtins.tool_search import get_deferred_registry
+
+        registry = get_deferred_registry()
+        if not registry:
            return None
-        name = str(request.tool_call.get("name") or "")
-        if not name or name not in self._hidden(request.state):
+
+        tool_name = str(request.tool_call.get("name") or "")
+        if not tool_name:
            return None
+
+        if not registry.contains(tool_name):
+            return None
+
        tool_call_id = str(request.tool_call.get("id") or "missing_tool_call_id")
        return ToolMessage(
-            content=(f"Error: Tool '{name}' is deferred and has not been promoted yet. Call tool_search first to expose and promote this tool's schema, then retry."),
+            content=(f"Error: Tool '{tool_name}' is deferred and has not been promoted yet. Call tool_search first to expose and promote this tool's schema, then retry."),
            tool_call_id=tool_call_id,
-            name=name,
+            name=tool_name,
            status="error",
        )
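
Review note: the filtering rule in this hunk reduces to "drop every tool whose name is in the registry's deferred set". A minimal sketch under stated assumptions follows: `FakeTool` and `FakeRegistry` are hypothetical stand-ins, not the real `DeferredToolRegistry`, and only the `deferred_names` and `contains` members seen in the diff are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTool:
    """Hypothetical stand-in for a tool object; only `name` matters here."""
    name: str


class FakeRegistry:
    """Hypothetical stand-in exposing the two members used in the diff."""

    def __init__(self, names):
        self._names = frozenset(names)

    @property
    def deferred_names(self):
        return self._names

    def contains(self, name: str) -> bool:
        return name in self._names


def filter_active_tools(tools, registry):
    """Return the tools whose schemas may be bound to the model."""
    if not registry:
        # No registry configured: nothing is deferred, keep everything.
        return list(tools)
    deferred = registry.deferred_names
    # Objects without a `name` attribute are kept, matching getattr(..., None).
    return [t for t in tools if getattr(t, "name", None) not in deferred]
```

Unlike the removed version, this rule has no notion of promotion: a name stays hidden for as long as the registry lists it.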

@@ -28,7 +28,6 @@ Date-update format:

 from __future__ import annotations

-import asyncio
 import logging
 import re
 import uuid
@@ -44,12 +43,6 @@ if TYPE_CHECKING:

 logger = logging.getLogger(__name__)

-# Upper bound (seconds) for a single _inject() offload.  If the warm-up at
-# gateway startup failed silently, the first request may still hit a cold
-# tiktoken BPE download that blocks until the OS TCP timeout (~26 min).
-# This cap ensures the request degrades gracefully instead of hanging.
-_INJECT_TIMEOUT_SECONDS = 5.0
-
 _DATE_RE = re.compile(r"<current_date>([^<]+)</current_date>")
 _DYNAMIC_CONTEXT_REMINDER_KEY = "dynamic_context_reminder"
 _SUMMARY_MESSAGE_NAME = "summary"
@@ -208,25 +201,4 @@ class DynamicContextMiddleware(AgentMiddleware):

    @override
    async def abefore_agent(self, state, runtime: Runtime) -> dict | None:
-        # _inject() performs synchronous file I/O (memory JSON loading) and
-        # potentially blocking network calls (tiktoken encoding download on
-        # first use).  Offload to a thread so the event loop is never blocked
-        # — a blocking call here starves all concurrent HTTP handlers (auth,
-        # SSE heartbeats, etc.).  See issue #3402.
-        #
-        # Bounded timeout: if startup warm-up failed silently (e.g. network
-        # blip during deploy), the first request's cold tiktoken download can
-        # block for tens of minutes (OS TCP timeout).  Time-box injection so
-        # the request degrades gracefully (no memory context) rather than
-        # hanging.
-        try:
-            return await asyncio.wait_for(
-                asyncio.to_thread(self._inject, state),
-                timeout=_INJECT_TIMEOUT_SECONDS,
-            )
-        except TimeoutError:
-            logger.warning(
-                "DynamicContextMiddleware: injection timed out (%.1fs); skipping memory/date injection for this turn",
-                _INJECT_TIMEOUT_SECONDS,
-            )
-            return None
+        return self._inject(state)
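
Review note: this hunk replaces a time-boxed thread offload with a direct call, so `_inject` now runs on the event loop. For reference, the removed pattern can be sketched in isolation as below. `run_with_deadline` and `slow_inject` are hypothetical names used only for illustration; `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


async def run_with_deadline(func, *args, timeout: float):
    """Run a blocking callable in a worker thread, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(func, *args), timeout=timeout)
    except asyncio.TimeoutError:
        # The caller degrades gracefully. The worker thread is not interrupted;
        # it keeps running in the background until the callable returns.
        return None


def slow_inject(state: dict) -> dict:
    """Hypothetical blocking step standing in for file or network I/O."""
    time.sleep(0.2)
    return {"injected": True}
```

Usage: `asyncio.run(run_with_deadline(slow_inject, {}, timeout=0.05))` yields `None`, while a generous timeout yields the callable's result. Without the offload, a blocking `_inject` would stall every other coroutine on the loop for its full duration.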
@@ -62,41 +62,6 @@ _AUTH_PATTERNS = (
    "未授权",
 )

-# Per-exception retry budget overrides.
-#
-# Some transient errors are retriable in principle but expensive to retry at
-# the default budget. StreamChunkTimeoutError in particular fires after the
-# upstream provider has already stalled for `stream_chunk_timeout` seconds
-# (typically 120-240s); a full 3-attempt loop can therefore stack 6-12 minutes
-# of dead air before surfacing the failure to the user. We keep exactly one
-# retry (cheap reconnect that catches genuine transient TCP blips) and then
-# fail fast — the same buffered payload is overwhelmingly likely to fail
-# again at the upstream provider for the same reason.
-#
-# Keys are exception class *names* (not classes) so we don't introduce
-# import-time coupling on optional dependencies like langchain-openai. The
-# value is the absolute max attempt count, NOT additional retries — so a
-# value of 2 means "1 first attempt + 1 retry" (the CR-requested
-# "keep one retry" behavior).
-_RETRY_BUDGET_OVERRIDES: dict[str, int] = {
-    "StreamChunkTimeoutError": 2,
-}
-
-# Exception class names that indicate the upstream stream-chunk watchdog
-# fired because the model stalled mid-flight. These deserve a more specific
-# user-facing message than the generic "temporarily unavailable" copy,
-# because the typical root cause is a long tool-call serialization stalling
-# the upstream stream — and the most actionable advice we can give the user
-# is "ask for a shorter / split output" rather than "wait and retry".
-# Generic connection drops (httpx RemoteProtocolError / ReadError) are
-# intentionally excluded: they routinely fire on transient network blips
-# with normal payloads, where the "split the work" guidance is misleading.
-_STREAM_DROP_EXCEPTIONS: frozenset[str] = frozenset(
-    {
-        "StreamChunkTimeoutError",
-    }
-)
-

 class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
    """Retry transient LLM errors and surface graceful assistant messages."""
@@ -118,18 +83,6 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
        self._circuit_state = "closed"
        self._circuit_probe_in_flight = False

-    def _max_attempts_for(self, exc: BaseException) -> int:
-        """Return the effective max attempt count for this exception.
-
-        Falls back to `self.retry_max_attempts` unless the exception class name
-        appears in the per-exception override table.
-        """
-        override = _RETRY_BUDGET_OVERRIDES.get(type(exc).__name__)
-        if override is None:
-            return self.retry_max_attempts
-
-        return min(override, self.retry_max_attempts)
-
    def _check_circuit(self) -> bool:
        """Returns True if circuit is OPEN (fast fail), False otherwise."""
        with self._circuit_lock:
@@ -200,7 +153,6 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
            "InternalServerError",
            "ReadError",  # httpx.ReadError: connection dropped mid-stream
            "RemoteProtocolError",  # httpx: server closed connection unexpectedly
-            "StreamChunkTimeoutError",  # langchain-openai: chunk gap exceeded stream_chunk_timeout
        }:
            return True, "transient"
        if status_code in _RETRIABLE_STATUS_CODES:
@@ -225,24 +177,6 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
    def _build_circuit_breaker_message(self) -> str:
        return "The configured LLM provider is currently unavailable due to continuous failures. Circuit breaker is engaged to protect the system. Please wait a moment before trying again."

-    def _build_error_fallback_message(
-        self,
-        content: str,
-        *,
-        error_type: str,
-        reason: str,
-        detail: str,
-    ) -> AIMessage:
-        return AIMessage(
-            content=content,
-            additional_kwargs={
-                "deerflow_error_fallback": True,
-                "error_type": error_type,
-                "error_reason": reason,
-                "error_detail": detail,
-            },
-        )
-
    def _build_user_message(self, exc: BaseException, reason: str) -> str:
        detail = _extract_error_detail(exc)
        if reason == "quota":
@@ -250,31 +184,9 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
        if reason == "auth":
            return "The configured LLM provider rejected the request because authentication or access is invalid. Please check the provider credentials and try again."
        if reason in {"busy", "transient"}:
-            # Stream-drop failures (chunk-gap timeout, peer-closed connection,
-            # raw read error) almost always point at a single oversized
-            # tool-call payload — the model spent so long serializing JSON
-            # arguments that the upstream provider buffered and the stream
-            # gap exceeded `stream_chunk_timeout`. Surfacing this distinct
-            # cause lets the user split or shorten their next request
-            # instead of helplessly retrying the same prompt.
-            if type(exc).__name__ in _STREAM_DROP_EXCEPTIONS:
-                return (
-                    "The model's streaming response was interrupted before it could "
-                    "finish. This usually happens when a single response or tool call "
-                    "is very large — please ask the assistant to split the work into "
-                    "smaller steps, or shorten the requested output, and try again."
-                )
            return "The configured LLM provider is temporarily unavailable after multiple retries. Please wait a moment and continue the conversation."
        return f"LLM request failed: {detail}"

-    def _build_user_fallback_message(self, exc: BaseException, reason: str) -> AIMessage:
-        return self._build_error_fallback_message(
-            self._build_user_message(exc, reason),
-            error_type=type(exc).__name__,
-            reason=reason,
-            detail=_extract_error_detail(exc),
-        )
-
    def _emit_retry_event(self, attempt: int, wait_ms: int, reason: str) -> None:
        try:
            from langgraph.config import get_stream_writer
@@ -300,12 +212,7 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelCallResult:
        if self._check_circuit():
-            return self._build_error_fallback_message(
-                self._build_circuit_breaker_message(),
-                error_type="CircuitBreakerOpen",
-                reason="circuit_open",
-                detail="LLM circuit breaker is open",
-            )
+            return AIMessage(content=self._build_circuit_breaker_message())

        attempt = 1
        while True:
@@ -321,8 +228,7 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
                raise
            except Exception as exc:
                retriable, reason = self._classify_error(exc)
-                max_attempts = self._max_attempts_for(exc)
-                if retriable and attempt < max_attempts:
+                if retriable and attempt < self.retry_max_attempts:
                    wait_ms = self._build_retry_delay_ms(attempt, exc)
                    logger.warning(
                        "Transient LLM error on attempt %d/%d; retrying in %dms: %s",
@@ -343,7 +249,7 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
                )
                if retriable:
                    self._record_failure()
-                return self._build_user_fallback_message(exc, reason)
+                return AIMessage(content=self._build_user_message(exc, reason))

    @override
    async def awrap_model_call(
@@ -352,12 +258,7 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
    ) -> ModelCallResult:
        if self._check_circuit():
-            return self._build_error_fallback_message(
-                self._build_circuit_breaker_message(),
-                error_type="CircuitBreakerOpen",
-                reason="circuit_open",
-                detail="LLM circuit breaker is open",
-            )
+            return AIMessage(content=self._build_circuit_breaker_message())

        attempt = 1
        while True:
@@ -373,8 +274,7 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
                raise
            except Exception as exc:
                retriable, reason = self._classify_error(exc)
-                max_attempts = self._max_attempts_for(exc)
-                if retriable and attempt < max_attempts:
+                if retriable and attempt < self.retry_max_attempts:
                    wait_ms = self._build_retry_delay_ms(attempt, exc)
                    logger.warning(
                        "Transient LLM error on attempt %d/%d; retrying in %dms: %s",
@@ -395,7 +295,7 @@ class LLMErrorHandlingMiddleware(AgentMiddleware[AgentState]):
                )
                if retriable:
                    self._record_failure()
-                return self._build_user_fallback_message(exc, reason)
+                return AIMessage(content=self._build_user_message(exc, reason))


 def _matches_any(detail: str, patterns: tuple[str, ...]) -> bool:
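
Review note: with the per-exception budget removed, both `wrap_model_call` variants fall back to a single attempt cap. The control flow can be sketched as follows. This is a simplification under stated assumptions: the real loop also waits between attempts, emits retry events, and records circuit-breaker failures, none of which is modelled here, and every name in the sketch is hypothetical.

```python
def call_with_retries(handler, classify, max_attempts, fallback):
    """Call `handler` until it succeeds or the attempt budget is spent."""
    attempt = 1
    while True:
        try:
            return handler()
        except Exception as exc:
            retriable, reason = classify(exc)
            if retriable and attempt < max_attempts:
                attempt += 1
                continue
            # Budget exhausted or error not retriable: return a fallback value.
            return fallback(exc, reason)


class FlakyHandler:
    """Hypothetical handler that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient")
        return "ok"


def classify(exc):
    """Treat connection errors as retriable; everything else is final."""
    return isinstance(exc, ConnectionError), "transient"


def fallback(exc, reason):
    return f"failed: {reason}"
```

Note that `max_attempts` counts total calls, not additional retries, so a cap of 3 means one first attempt plus at most two retries.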
@@ -1,289 +0,0 @@
-"""Middleware for explicit slash skill activation."""
-
-from __future__ import annotations
-
-import asyncio
-import hashlib
-import html
-import logging
-import uuid
-from collections.abc import Awaitable, Callable
-from dataclasses import dataclass
-from pathlib import Path
-from typing import TYPE_CHECKING, override
-
-from langchain.agents.middleware import AgentMiddleware
-from langchain.agents.middleware.types import ModelRequest, ModelResponse
-from langchain_core.messages import AIMessage, HumanMessage
-
-from deerflow.skills.slash import parse_slash_skill_reference, resolve_slash_skill
-from deerflow.skills.storage import get_or_new_skill_storage
-from deerflow.skills.storage.skill_storage import SkillStorage
-from deerflow.skills.types import SKILL_MD_FILE
-from deerflow.utils.messages import get_original_user_content_text
-
-if TYPE_CHECKING:
-    from deerflow.config.app_config import AppConfig
-
-logger = logging.getLogger(__name__)
-
-_SLASH_SKILL_ACTIVATION_KEY = "slash_skill_activation"
-_SLASH_SKILL_ACTIVATION_TARGET_ID_KEY = "slash_skill_activation_target_id"
-_SUMMARY_MESSAGE_NAME = "summary"
-
-
-@dataclass(frozen=True, slots=True)
-class _Activation:
-    skill_name: str
-    category: str
-    container_file_path: str
-    skill_content: str
-    content_hash: str
-    remaining_text: str
-
-
-@dataclass(frozen=True, slots=True)
-class _ActivationResolution:
-    activation: _Activation | None = None
-    failure_message: str | None = None
-
-
-def is_slash_skill_activation_reminder(message: object) -> bool:
-    """Return whether a message is hidden slash-skill activation context."""
-    return isinstance(message, HumanMessage) and bool(message.additional_kwargs.get(_SLASH_SKILL_ACTIVATION_KEY))
-
-
-def _is_user_activation_target(message: object) -> bool:
-    if not isinstance(message, HumanMessage):
-        return False
-    if message.name == _SUMMARY_MESSAGE_NAME:
-        return False
-    if message.additional_kwargs.get("hide_from_ui"):
-        return False
-    return True
-
-
-class SkillActivationMiddleware(AgentMiddleware):
-    """Inject full SKILL.md content when the user explicitly types /skill-name."""
-
-    def __init__(
-        self,
-        *,
-        available_skills: set[str] | None = None,
-        app_config: AppConfig | None = None,
-    ) -> None:
-        super().__init__()
-        self._available_skills = set(available_skills) if available_skills is not None else None
-        self._app_config = app_config
-
-    def _storage(self) -> SkillStorage:
-        if self._app_config is not None:
-            return get_or_new_skill_storage(app_config=self._app_config)
-        return get_or_new_skill_storage()
-
-    @staticmethod
-    def _read_skill_content(skill_file: Path, skills_root: Path) -> str:
-        if skill_file.name != SKILL_MD_FILE:
-            raise ValueError(f"Expected {SKILL_MD_FILE}, got {skill_file.name}")
-        resolved_root = skills_root.resolve()
-        resolved_file = skill_file.resolve()
-        try:
-            resolved_file.relative_to(resolved_root)
-        except ValueError as exc:
-            raise ValueError("Resolved skill file must stay within the configured skills root.") from exc
-        if not resolved_file.is_file():
-            raise FileNotFoundError(resolved_file)
-        return resolved_file.read_text(encoding="utf-8")
-
-    def _resolve_activation(self, text: str) -> _ActivationResolution | None:
-        reference = parse_slash_skill_reference(text)
-        if reference is None:
-            return None
-
-        storage = self._storage()
-        skills = storage.load_skills(enabled_only=False)
-        skill = next((candidate for candidate in skills if candidate.name == reference.name), None)
-        if skill is None:
-            return _ActivationResolution(failure_message=f"Skill `/{reference.name}` is not installed.")
-        if not skill.enabled:
-            return _ActivationResolution(failure_message=f"Skill `/{reference.name}` is installed but disabled. Enable it before using slash activation.")
-        if self._available_skills is not None and reference.name not in self._available_skills:
-            return _ActivationResolution(failure_message=f"Skill `/{reference.name}` is not available for this agent.")
-
-        resolved = resolve_slash_skill(
-            text,
-            skills,
-            available_skills=self._available_skills,
-            container_base_path=storage.get_container_root(),
-        )
-        if resolved is None:
-            return _ActivationResolution(failure_message=f"Skill `/{reference.name}` could not be resolved.")
-
-        try:
-            skill_content = self._read_skill_content(resolved.skill.skill_file, storage.get_skills_root_path())
-        except (OSError, ValueError):
-            logger.exception("Failed to read slash-activated skill %s", resolved.skill.name)
-            return _ActivationResolution(failure_message=f"Skill `/{reference.name}` could not be loaded safely. Please check the skill installation.")
-
-        content_hash = hashlib.sha256(skill_content.encode("utf-8")).hexdigest()
-        return _ActivationResolution(
-            activation=_Activation(
-                skill_name=resolved.skill.name,
-                category=str(resolved.skill.category),
-                container_file_path=resolved.container_file_path,
-                skill_content=skill_content,
-                content_hash=content_hash,
-                remaining_text=resolved.remaining_text,
-            )
-        )
-
-    @staticmethod
-    def _build_activation_reminder(activation: _Activation) -> str:
-        user_request = activation.remaining_text or ("No additional task text was provided after the slash skill command. Ask the user what they want to do with this skill if the next step is unclear.")
-        escaped_user_request = html.escape(user_request, quote=False)
-        escaped_skill_content = html.escape(activation.skill_content, quote=False)
-        escaped_skill_name = html.escape(activation.skill_name, quote=True)
-        escaped_category = html.escape(activation.category, quote=True)
-        escaped_path = html.escape(activation.container_file_path, quote=True)
-        escaped_content_hash = html.escape(activation.content_hash, quote=True)
-        return f"""<slash_skill_activation>
-The user explicitly activated the `{activation.skill_name}` skill for this turn.
-Treat the task text as:
-<user_request>
-{escaped_user_request}
-</user_request>
-
-Follow this skill before choosing a general workflow. Load supporting resources from the same skill directory only when needed.
-
-<skill name="{escaped_skill_name}" category="{escaped_category}" path="{escaped_path}" sha256="{escaped_content_hash}">
-<skill_content encoding="xml-escaped">
-{escaped_skill_content}
-</skill_content>
-</skill>
-</slash_skill_activation>"""
-
-    @staticmethod
-    def _has_existing_activation_for_target(messages: list, target_index: int, target: HumanMessage) -> bool:
-        if target_index <= 0:
-            return False
-
-        if target.id:
-            for previous in messages[:target_index]:
-                if not is_slash_skill_activation_reminder(previous):
-                    continue
-                target_id = previous.additional_kwargs.get(_SLASH_SKILL_ACTIVATION_TARGET_ID_KEY)
-                if target_id == target.id or previous.id == f"{target.id}__slash_activation":
-                    return True
-
-        previous = messages[target_index - 1]
-        return is_slash_skill_activation_reminder(previous)
-
-    def _find_activation_target(self, messages: list) -> tuple[int, HumanMessage, _ActivationResolution] | None:
-        if not messages:
-            return None
-
-        target_index = next((idx for idx in range(len(messages) - 1, -1, -1) if _is_user_activation_target(messages[idx])), None)
-        if target_index is None:
-            return None
-
-        target = messages[target_index]
-        if target is None:
-            return None
-        if self._has_existing_activation_for_target(messages, target_index, target):
-            return None
-
-        content = get_original_user_content_text(target.content, target.additional_kwargs)
-        resolution = self._resolve_activation(content)
-        if resolution is None:
-            return None
-        return target_index, target, resolution
-
-    @staticmethod
-    def _record_activation(request: ModelRequest, activation: _Activation, *, hook: str) -> None:
-        runtime = getattr(request, "runtime", None)
-        context = getattr(runtime, "context", None)
-        journal = context.get("__run_journal") if isinstance(context, dict) else None
-        if journal is None:
-            return
-        try:
-            journal.record_middleware(
-                "skill_activation",
-                name="SkillActivationMiddleware",
-                hook=hook,
-                action="activate",
-                changes={
-                    "skill_name": activation.skill_name,
-                    "category": activation.category,
-                    "path": activation.container_file_path,
-                    "content_hash": activation.content_hash,
-                },
-            )
-        except Exception:
-            logger.debug("Failed to record slash skill activation audit event", exc_info=True)
-
-    def _prepare_model_request(self, request: ModelRequest, *, hook: str) -> ModelRequest | AIMessage | None:
-        target_and_resolution = self._find_activation_target(list(request.messages))
-        if target_and_resolution is None:
-            return None
-
-        target_index, target, resolution = target_and_resolution
-        if resolution.failure_message:
-            return AIMessage(content=resolution.failure_message)
-
-        activation = resolution.activation
-        if activation is None:
-            return None
-
-        logger.info(
-            "SkillActivationMiddleware: activating slash skill %s category=%s path=%s hash=%s",
-            activation.skill_name,
-            activation.category,
-            activation.container_file_path,
-            activation.content_hash,
-        )
-        self._record_activation(request, activation, hook=hook)
-        activation_msg = self._make_activation_message(target, self._build_activation_reminder(activation))
-        messages = list(request.messages)
-        messages.insert(target_index, activation_msg)
-        return request.override(messages=messages)
-
-    @staticmethod
-    def _make_activation_message(target: HumanMessage, activation_content: str) -> HumanMessage:
-        stable_id = target.id or str(uuid.uuid4())
-        additional_kwargs = {
-            "hide_from_ui": True,
-            _SLASH_SKILL_ACTIVATION_KEY: True,
-        }
-        if target.id:
-            additional_kwargs[_SLASH_SKILL_ACTIVATION_TARGET_ID_KEY] = target.id
-        return HumanMessage(
-            content=activation_content,
-            id=f"{stable_id}__slash_activation",
-            additional_kwargs=additional_kwargs,
-        )
-
-    @override
-    def wrap_model_call(
-        self,
-        request: ModelRequest,
-        handler: Callable[[ModelRequest], ModelResponse],
-    ) -> ModelResponse | AIMessage:
-        prepared = self._prepare_model_request(request, hook="wrap_model_call")
-        if prepared is None:
-            return handler(request)
-        if isinstance(prepared, AIMessage):
-            return prepared
-        return handler(prepared)
-
-    @override
-    async def awrap_model_call(
-        self,
-        request: ModelRequest,
-        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
-    ) -> ModelResponse | AIMessage:
-        prepared = await asyncio.to_thread(self._prepare_model_request, request, hook="awrap_model_call")
-        if prepared is None:
-            return await handler(request)
-        if isinstance(prepared, AIMessage):
-            return prepared
-        return await handler(prepared)
@@ -9,9 +9,8 @@ from typing import Any, Protocol, override, runtime_checkable

 from langchain.agents import AgentState
 from langchain.agents.middleware import SummarizationMiddleware
-from langchain_core.messages import AIMessage, AnyMessage, HumanMessage, RemoveMessage, ToolMessage, get_buffer_string
+from langchain_core.messages import AIMessage, AnyMessage, HumanMessage, RemoveMessage, ToolMessage
 from langgraph.config import get_config
-from langgraph.constants import TAG_NOSTREAM
 from langgraph.graph.message import REMOVE_ALL_MESSAGES
 from langgraph.runtime import Runtime

@@ -117,74 +116,6 @@ class DeerFlowSummarizationMiddleware(SummarizationMiddleware):
        self._preserve_recent_skill_count = max(0, preserve_recent_skill_count)
        self._preserve_recent_skill_tokens = max(0, preserve_recent_skill_tokens)
        self._preserve_recent_skill_tokens_per_skill = max(0, preserve_recent_skill_tokens_per_skill)
-        # The summary LLM call runs inside a LangGraph middleware hook, so its token
-        # stream would otherwise be captured by the messages-tuple stream callback and
-        # broadcast to the frontend as a phantom AI message. Tag a dedicated model copy
-        # with TAG_NOSTREAM so the streaming handler skips it.
-        # Keep self.model untagged so the parent's profile / ls_params inspection still works.
-        #
-        # Preserve any tags already bound on the model (e.g. "middleware:summarize" set in
-        # lead_agent/agent.py for RunJournal attribution): RunnableBinding.with_config does a
-        # shallow merge that would otherwise overwrite the existing tags list entirely.
-        existing_tags = list((getattr(self.model, "config", None) or {}).get("tags") or [])
-        merged_tags = [*existing_tags, TAG_NOSTREAM] if TAG_NOSTREAM not in existing_tags else existing_tags
-        self._summary_model = self.model.with_config(tags=merged_tags)
-
-    @override
-    def _create_summary(self, messages_to_summarize: list[AnyMessage]) -> str:
-        return self._summarize_with(messages_to_summarize)
-
-    @override
-    async def _acreate_summary(self, messages_to_summarize: list[AnyMessage]) -> str:
-        return await self._asummarize_with(messages_to_summarize)
-
-    def _summarize_with(self, messages_to_summarize: list[AnyMessage]) -> str:
-        """Mirror the parent ``_create_summary`` but invoke the nostream-tagged model.
-
-        We do not swap ``self.model`` at the instance level: the agent/middleware is
-        cached and reused across concurrent runs, so a temporary swap would leak the
-        ``RunnableBinding`` to other coroutines during ``await`` and break parent logic
-        that inspects the raw model (``profile`` / ``_get_ls_params``).
-        """
-        if not messages_to_summarize:
-            return "No previous conversation history."
-        prompt = self._build_summary_prompt(messages_to_summarize)
-        if prompt is None:
-            return "Previous conversation was too long to summarize."
-        try:
-            response = self._summary_model.invoke(
-                prompt,
-                config={"metadata": {"lc_source": "summarization"}},
-            )
-            return response.text.strip()
-        except Exception as e:
-            return f"Error generating summary: {e!s}"
-
-    async def _asummarize_with(self, messages_to_summarize: list[AnyMessage]) -> str:
-        """Async counterpart of :meth:`_summarize_with` using the nostream model."""
-        if not messages_to_summarize:
-            return "No previous conversation history."
-        prompt = self._build_summary_prompt(messages_to_summarize)
-        if prompt is None:
-            return "Previous conversation was too long to summarize."
-        try:
-            response = await self._summary_model.ainvoke(
-                prompt,
-                config={"metadata": {"lc_source": "summarization"}},
-            )
-            return response.text.strip()
-        except Exception as e:
-            return f"Error generating summary: {e!s}"
-
-    def _build_summary_prompt(self, messages_to_summarize: list[AnyMessage]) -> str | None:
-        """Build the summary prompt, returning ``None`` when trimming leaves nothing."""
-        trimmed_messages = self._trim_messages_for_summary(messages_to_summarize)
-        if not trimmed_messages:
-            return None
-        # Format messages to avoid token inflation from metadata when str() is called on
-        # message objects.
-        formatted_messages = get_buffer_string(trimmed_messages)
-        return self.summary_prompt.format(messages=formatted_messages).rstrip()

    def before_model(self, state: AgentState, runtime: Runtime) -> dict | None:
        return self._maybe_summarize(state, runtime)
@@ -2,7 +2,7 @@

 import logging
 from collections.abc import Awaitable, Callable
-from typing import TYPE_CHECKING, override
+from typing import override

 from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
@@ -12,48 +12,10 @@ from langgraph.prebuilt.tool_node import ToolCallRequest
 from langgraph.types import Command

 from deerflow.config.app_config import AppConfig
-from deerflow.subagents.status_contract import (
-    extract_subagent_status,
-    make_subagent_additional_kwargs,
-)
-
-if TYPE_CHECKING:
-    from deerflow.tools.builtins.tool_search import DeferredToolSetup

 logger = logging.getLogger(__name__)

 _MISSING_TOOL_CALL_ID = "missing_tool_call_id"
-_TASK_TOOL_NAME = "task"
-
-
-def _stamp_task_subagent_status(message: ToolMessage, *, tool_name: str, error: str | None = None) -> ToolMessage:
-    """Centralised stamping of ``additional_kwargs.subagent_status``.
-
-    Bytedance/deer-flow issue #3146: the frontend now reads the subagent
-    status from a structured field instead of parsing the leading text of
-    the task tool's return string. That contract is enforced here, in the
-    one place every task tool result flows through, rather than at the 5
-    normal-return + 3 ``Error:`` pre-execution branches inside
-    ``task_tool.py``. Centralisation prevents the "added a new return
-    path, forgot the stamp" drift mode.
-
-    For non-``task`` tools this is a no-op so other tools' additional_kwargs
-    conventions are untouched.
-    """
-    if tool_name != _TASK_TOOL_NAME:
-        return message
-    content = message.content if isinstance(message.content, str) else ""
-    status = extract_subagent_status(content)
-    if status is None:
-        # Non-terminal streaming chunks or unrecognised shapes leave the
-        # field unset so the frontend can keep the card on its in-progress
-        # placeholder until a real terminal frame arrives.
-        return message
-    stamp = make_subagent_additional_kwargs(status, error=error)
-    existing = dict(message.additional_kwargs or {})
-    existing.update(stamp)
-    message.additional_kwargs = existing
-    return message


 class ToolErrorHandlingMiddleware(AgentMiddleware[AgentState]):
@@ -67,31 +29,12 @@ class ToolErrorHandlingMiddleware(AgentMiddleware[AgentState]):
            detail = detail[:497] + "..."

        content = f"Error: Tool '{tool_name}' failed with {exc.__class__.__name__}: {detail}. Continue with available context, or choose an alternative tool."
-        message = ToolMessage(
+        return ToolMessage(
            content=content,
            tool_call_id=tool_call_id,
            name=tool_name,
            status="error",
        )
-        # Stamp the structured subagent status on the wrapper too: the
-        # frontend would otherwise have to fall back to prefix-matching
-        # ``Error: Tool 'task' failed ...`` on the wire. The ``subagent_error``
-        # carries the same ``ExcClass: detail`` shape the wrapper string
-        # uses so debugging artifacts stay aligned.
-        structured_error = f"{exc.__class__.__name__}: {detail}"
-        return _stamp_task_subagent_status(message, tool_name=tool_name, error=structured_error)
-
-    @staticmethod
-    def _maybe_stamp(result: ToolMessage | Command, request: ToolCallRequest) -> ToolMessage | Command:
-        """Apply the subagent stamp to successful task tool returns.
-
-        ``Command`` results bypass the stamp — they encode LangGraph
-        control flow rather than user-facing tool output.
-        """
-        if not isinstance(result, ToolMessage):
-            return result
-        tool_name = str(request.tool_call.get("name") or "")
-        return _stamp_task_subagent_status(result, tool_name=tool_name)

    @override
    def wrap_tool_call(
@@ -100,14 +43,13 @@ class ToolErrorHandlingMiddleware(AgentMiddleware[AgentState]):
        handler: Callable[[ToolCallRequest], ToolMessage | Command],
    ) -> ToolMessage | Command:
        try:
-            result = handler(request)
+            return handler(request)
        except GraphBubbleUp:
            # Preserve LangGraph control-flow signals (interrupt/pause/resume).
            raise
        except Exception as exc:
            logger.exception("Tool execution failed (sync): name=%s id=%s", request.tool_call.get("name"), request.tool_call.get("id"))
            return self._build_error_message(request, exc)
-        return self._maybe_stamp(result, request)

    @override
    async def awrap_tool_call(
@@ -116,14 +58,13 @@ class ToolErrorHandlingMiddleware(AgentMiddleware[AgentState]):
        handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
    ) -> ToolMessage | Command:
        try:
-            result = await handler(request)
+            return await handler(request)
        except GraphBubbleUp:
            # Preserve LangGraph control-flow signals (interrupt/pause/resume).
            raise
        except Exception as exc:
            logger.exception("Tool execution failed (async): name=%s id=%s", request.tool_call.get("name"), request.tool_call.get("id"))
            return self._build_error_message(request, exc)
-        return self._maybe_stamp(result, request)


 def _build_runtime_middlewares(
@@ -136,11 +77,9 @@ def _build_runtime_middlewares(
    """Build shared base middlewares for agent execution."""
    from deerflow.agents.middlewares.llm_error_handling_middleware import LLMErrorHandlingMiddleware
    from deerflow.agents.middlewares.thread_data_middleware import ThreadDataMiddleware
-    from deerflow.agents.middlewares.tool_output_budget_middleware import ToolOutputBudgetMiddleware
    from deerflow.sandbox.middleware import SandboxMiddleware

    middlewares: list[AgentMiddleware] = [
-        ToolOutputBudgetMiddleware.from_app_config(app_config),
        ThreadDataMiddleware(lazy_init=lazy_init),
        SandboxMiddleware(lazy_init=lazy_init),
    ]
@@ -148,7 +87,7 @@ def _build_runtime_middlewares(
    if include_uploads:
        from deerflow.agents.middlewares.uploads_middleware import UploadsMiddleware

-        middlewares.insert(2, UploadsMiddleware())
+        middlewares.insert(1, UploadsMiddleware())

    if include_dangling_tool_call_patch:
        from deerflow.agents.middlewares.dangling_tool_call_middleware import DanglingToolCallMiddleware
@@ -202,7 +141,6 @@ def build_subagent_runtime_middlewares(
    app_config: AppConfig | None = None,
    model_name: str | None = None,
    lazy_init: bool = True,
-    deferred_setup: "DeferredToolSetup | None" = None,
 ) -> list[AgentMiddleware]:
    """Middlewares shared by subagent runtime before subagent-only middlewares."""
    if app_config is None:
@@ -226,16 +164,6 @@ def build_subagent_runtime_middlewares(

        middlewares.append(ViewImageMiddleware())

-    # Hide deferred (MCP) tool schemas from the subagent's model binding until
-    # tool_search promotes them. This is the same wiring the lead agent gets. The deferred
-    # set + catalog hash come from the build-time setup (assembled after
-    # tool-policy filtering); promotion is read from graph state. Empty/None
-    # setup (deferral disabled or no MCP tool survived) is a pure no-op.
-    if deferred_setup is not None and deferred_setup.deferred_names:
-        from deerflow.agents.middlewares.deferred_tool_filter_middleware import DeferredToolFilterMiddleware
-
-        middlewares.append(DeferredToolFilterMiddleware(deferred_setup.deferred_names, deferred_setup.catalog_hash))
-
    # Same provider safety-termination guard the lead agent uses — subagents
    # are equally exposed to truncated tool_calls returned with
    # finish_reason=content_filter (and friends), and the bad call would then
@@ -1,643 +0,0 @@
-"""Middleware that enforces a per-result budget on tool outputs.
-
-Oversized tool results are persisted to disk and replaced with a compact
-preview containing a file reference.  When disk persistence is
-unavailable the middleware falls back to head+tail truncation so the
-model context is never blown by a single large tool return.
-"""
-
-from __future__ import annotations
-
-import asyncio
-import logging
-import os
-import shlex
-import uuid
-from collections.abc import Awaitable, Callable
-from dataclasses import replace as dc_replace
-from typing import TYPE_CHECKING, Any, override
-
-from langchain.agents import AgentState
-from langchain.agents.middleware import AgentMiddleware
-from langchain.agents.middleware.types import ModelCallResult, ModelRequest, ModelResponse
-from langchain_core.messages import ToolMessage
-from langgraph.prebuilt.tool_node import ToolCallRequest
-from langgraph.types import Command
-
-from deerflow.config.tool_output_config import ToolOutputConfig
-from deerflow.sandbox.sandbox_provider import get_sandbox_provider
-
-if TYPE_CHECKING:
-    from deerflow.sandbox.sandbox import Sandbox
-
-logger = logging.getLogger(__name__)
-
-# Virtual outputs root inside the sandbox. Host-mounted sandboxes map this to
-# the thread outputs dir on the host; for non-mounted (remote) sandboxes the
-# same path is written directly into the sandbox filesystem so the model's
-# ``read_file`` tool can read it back (issue #3416).
-_VIRTUAL_OUTPUTS_BASE = "/mnt/user-data/outputs"
-
-
-def _default_config() -> ToolOutputConfig:
-    return ToolOutputConfig()
-
-
-# ---------------------------------------------------------------------------
-# Text helpers
-# ---------------------------------------------------------------------------
-
-
-def _message_text(content: Any) -> str | None:
-    """Extract a plain-text representation from a ToolMessage content field.
-
-    Returns ``None`` for non-string / multimodal content so the caller
-    can skip budget enforcement (images, structured blocks, etc.).
-    """
-    if isinstance(content, str):
-        return content
-    if content is None:
-        return None
-    if isinstance(content, list):
-        pieces: list[str] = []
-        for part in content:
-            if isinstance(part, str):
-                pieces.append(part)
-            elif isinstance(part, dict) and isinstance(part.get("text"), str):
-                pieces.append(part["text"])
-            else:
-                return None
-        return "\n".join(pieces) if pieces else None
-    return None
-
-
-def _snap_to_line_boundary(text: str, pos: int) -> int:
-    """Return *pos* or the nearest preceding newline+1, whichever is closer.
-
-    Used so that previews and truncations end on a complete line when
-    possible.  If no newline exists in the second half of ``text[:pos]``
-    the original *pos* is returned unchanged.
-    """
-    if pos <= 0 or pos >= len(text):
-        return pos
-    half = pos // 2
-    nl = text.rfind("\n", half, pos)
-    if nl >= 0:
-        return nl + 1
-    return pos
-
-
-# ---------------------------------------------------------------------------
-# Disk persistence
-# ---------------------------------------------------------------------------
-
-_EXT_MAP: dict[str, str] = {
-    "bash": "log",
-    "bash_tool": "log",
-    "web_fetch": "log",
-}
-
-
-def _sanitize_tool_name(name: str) -> str:
-    """Strip path separators and traversal components from a tool name."""
-    base = os.path.basename(name)
-    safe = base.replace("..", "").replace("/", "_").replace("\\", "_")
-    return safe or "unknown"
-
-
-def _build_externalized_filename(*, tool_name: str, tool_call_id: str) -> str:
-    """Build the on-disk filename for an externalized tool output.
-
-    Shared by the host-disk and sandbox externalization paths so both
-    produce the identical naming scheme.
-    """
-    safe_name = _sanitize_tool_name(tool_name)
-    ext = _EXT_MAP.get(tool_name, "txt")
-    short_id = uuid.uuid4().hex[:12]
-    return f"{safe_name}-{short_id}.{ext}"
-
-
-def _externalize(
-    content: str,
-    *,
-    tool_name: str,
-    tool_call_id: str,
-    outputs_path: str,
-    storage_subdir: str,
-) -> str | None:
-    """Write *content* to disk and return the virtual path, or ``None`` on failure."""
-    if os.path.isabs(storage_subdir) or ".." in storage_subdir:
-        return None
-    storage_dir = os.path.join(outputs_path, storage_subdir)
-    try:
-        os.makedirs(storage_dir, exist_ok=True)
-    except OSError:
-        return None
-
-    filename = _build_externalized_filename(tool_name=tool_name, tool_call_id=tool_call_id)
-    filepath = os.path.join(storage_dir, filename)
-
-    if not os.path.abspath(filepath).startswith(os.path.abspath(storage_dir)):
-        return None
-
-    try:
-        with open(filepath, "w", encoding="utf-8") as f:
-            f.write(content)
-    except OSError:
-        return None
-
-    return f"{_VIRTUAL_OUTPUTS_BASE}/{storage_subdir}/{filename}"
-
-
-def _externalize_to_sandbox(
-    content: str,
-    *,
-    tool_name: str,
-    tool_call_id: str,
-    storage_subdir: str,
-    sandbox: Sandbox,
-) -> str | None:
-    """Write *content* into the sandbox filesystem and return the virtual path.
-
-    Used when the sandbox does not use thread-data mounts (e.g. a remote AIO
-    sandbox): the host-side :func:`_externalize` virtual path would not exist
-    inside the sandbox, so the model's ``read_file`` tool could not read it
-    back (issue #3416). Returns the same virtual-path contract on success, or
-    ``None`` to signal the caller to fall back to inline truncation.
-    """
-    if os.path.isabs(storage_subdir) or ".." in storage_subdir:
-        return None
-    filename = _build_externalized_filename(tool_name=tool_name, tool_call_id=tool_call_id)
-    virtual_dir = f"{_VIRTUAL_OUTPUTS_BASE}/{storage_subdir}"
-    virtual_path = f"{virtual_dir}/{filename}"
-    try:
-        # AIO sandbox write_file does NOT create parent directories, so create
-        # them explicitly before writing. execute_command returns its stdout
-        # verbatim (including an "Error: ..." string on failure) rather than
-        # raising, so we cannot rely on exception propagation here.
-        sandbox.execute_command(f"mkdir -p {shlex.quote(virtual_dir)}")
-        sandbox.write_file(virtual_path, content)
-        # Validate the file landed: execute_command may have silently failed
-        # to create the directory, and write_file backends differ. Refuse to
-        # hand the model an unreadable read_file path.
-        check = sandbox.execute_command(f"test -s {shlex.quote(virtual_path)} && echo OK || echo MISSING")
-        if not isinstance(check, str) or check.strip() != "OK":
-            logger.warning(
-                "Sandbox externalize validation failed: path=%s, check=%r",
-                virtual_path,
-                check,
-            )
-            return None
-    except Exception:
-        logger.exception(
-            "Failed to externalize %s output to sandbox (call_id=%s)",
-            tool_name,
-            tool_call_id,
-        )
-        return None
-    return virtual_path
-
-
-# ---------------------------------------------------------------------------
-# Preview / fallback builders
-# ---------------------------------------------------------------------------
-
-
-def _build_preview(
-    content: str,
-    *,
-    tool_name: str,
-    virtual_path: str,
-    head_chars: int,
-    tail_chars: int,
-) -> str:
-    """Build a preview with a file reference for externalized output."""
-    total = len(content)
-    head_end = _snap_to_line_boundary(content, min(head_chars, total))
-    tail_start = max(head_end, total - tail_chars)
-    tail_start_snapped = _snap_to_line_boundary(content, tail_start)
-    if tail_start_snapped > head_end:
-        tail_start = tail_start_snapped
-
-    head = content[:head_end]
-    tail = content[tail_start:] if tail_start < total else ""
-
-    omitted = total - len(head) - len(tail)
-    ref = f"\n\n[Full {tool_name} output saved to {virtual_path} ({total} chars, ~{total // 4} tokens). Use read_file with start_line and end_line to access specific sections. {omitted} chars omitted from this preview.]\n\n"
-
-    parts = [head, ref]
-    if tail:
-        parts.append(tail)
-    return "".join(parts)
-
-
-def _build_fallback(
-    content: str,
-    *,
-    tool_name: str,
-    max_chars: int,
-    head_chars: int,
-    tail_chars: int,
-) -> str:
-    """Build a head+tail truncation when disk persistence is unavailable.
-
-    The returned string is guaranteed to be no longer than *max_chars*.
-    """
-    total = len(content)
-    if max_chars <= 0 or total <= max_chars:
-        return content
-
-    marker_template = "\n\n[... {n} chars omitted from {tn} output. Persistent storage unavailable. Consider narrowing the query or using more specific parameters.]\n\n"
-    marker_overhead = len(marker_template.format(n=total, tn=tool_name))
-
-    if marker_overhead >= max_chars:
-        return content[:max_chars]
-
-    budget = max_chars - marker_overhead
-    effective_head = min(head_chars, budget)
-    effective_tail = min(tail_chars, max(0, budget - effective_head))
-
-    head_end = _snap_to_line_boundary(content, min(effective_head, total))
-    tail_start = max(head_end, total - effective_tail)
-    tail_start_snapped = _snap_to_line_boundary(content, tail_start)
-    if tail_start_snapped > head_end:
-        tail_start = tail_start_snapped
-
-    head = content[:head_end]
-    tail = content[tail_start:] if tail_start < total else ""
-    omitted = total - len(head) - len(tail)
-
-    marker = marker_template.format(n=omitted, tn=tool_name)
-
-    parts = [head, marker]
-    if tail:
-        parts.append(tail)
-    return "".join(parts)
-
-
-# ---------------------------------------------------------------------------
-# Core budget logic
-# ---------------------------------------------------------------------------
-
-
-def _resolve_outputs_path(request: ToolCallRequest) -> str | None:
-    """Best-effort extraction of the thread outputs path."""
-    runtime = getattr(request, "runtime", None)
-    if runtime is None:
-        return None
-    state = getattr(runtime, "state", None)
-    if state is None:
-        return None
-    thread_data = state.get("thread_data")
-    if not isinstance(thread_data, dict):
-        return None
-    outputs_path = thread_data.get("outputs_path")
-    return outputs_path if isinstance(outputs_path, str) else None
-
-
-def _resolve_sandbox(request: ToolCallRequest) -> Sandbox | None:
-    """Resolve the active sandbox for the current tool call, or ``None``.
-
-    Reads the sandbox_id that ``SandboxMiddleware`` (and the sandbox tools
-    themselves) write into ``runtime.state["sandbox"]``. We intentionally do
-    NOT call ``provider.acquire`` here: acquiring a sandbox can trigger
-    blocking remote I/O, and this resolver runs on every tool call. Tools
-    that do not use a sandbox (``web_search``, MCP, ...) will return ``None``
-    here, which is fine -- the caller falls back to inline truncation.
-    """
-    runtime = getattr(request, "runtime", None)
-    state = getattr(runtime, "state", None)
-    if not isinstance(state, dict):
-        return None
-    sandbox_state = state.get("sandbox")
-    if not isinstance(sandbox_state, dict):
-        return None
-    sandbox_id = sandbox_state.get("sandbox_id")
-    if not sandbox_id:
-        return None
-    try:
-        return get_sandbox_provider().get(sandbox_id)
-    except Exception:
-        logger.exception("Failed to look up sandbox %s for tool-output externalization", sandbox_id)
-        return None
-
-
-def _budget_content(
-    content: str,
-    *,
-    tool_name: str,
-    tool_call_id: str,
-    outputs_path: str | None,
-    config: ToolOutputConfig,
-    sandbox: Sandbox | None = None,
-) -> str | None:
-    """Apply budget to *content*. Returns ``None`` if no change needed."""
-    threshold = config.tool_overrides.get(tool_name, config.externalize_min_chars)
-    if threshold <= 0 and config.fallback_max_chars <= 0:
-        return None
-    if len(content) <= threshold and len(content) <= config.fallback_max_chars:
-        return None
-
-    if threshold > 0 and len(content) > threshold:
-        virtual_path: str | None = None
-        # Decide persistence target based on what's available, without touching
-        # the sandbox provider unless a sandbox was actually resolved for this
-        # call. This keeps the legacy host-disk path provider-free, so callers
-        # without a configured sandbox (and CI environments without a
-        # config.yaml) continue to externalize to the host as before.
-        if sandbox is not None:
-            provider = None
-            try:
-                provider = get_sandbox_provider()
-            except Exception:
-                logger.exception("Failed to get sandbox provider for tool-output externalization; falling back to inline truncation")
-            if provider is not None and getattr(provider, "uses_thread_data_mounts", False):
-                # Host-mounted sandbox: host outputs path is bind-mounted into
-                # the sandbox at the same virtual path, so writing host-side is
-                # equivalent. Preserve the original behavior to avoid extra
-                # sandbox round-trips.
-                if outputs_path:
-                    virtual_path = _externalize(
-                        content,
-                        tool_name=tool_name,
-                        tool_call_id=tool_call_id,
-                        outputs_path=outputs_path,
-                        storage_subdir=config.storage_subdir,
-                    )
-            else:
-                virtual_path = _externalize_to_sandbox(
-                    content,
-                    tool_name=tool_name,
-                    tool_call_id=tool_call_id,
-                    storage_subdir=config.storage_subdir,
-                    sandbox=sandbox,
-                )
-        elif outputs_path:
-            # No sandbox in this call (legacy / non-sandbox tools): write to
-            # host outputs path directly, no provider needed.
-            virtual_path = _externalize(
-                content,
-                tool_name=tool_name,
-                tool_call_id=tool_call_id,
-                outputs_path=outputs_path,
-                storage_subdir=config.storage_subdir,
-            )
-        if virtual_path is not None:
-            logger.info(
-                "Externalized %s output (%d chars) to %s",
-                tool_name,
-                len(content),
-                virtual_path,
-            )
-            return _build_preview(
-                content,
-                tool_name=tool_name,
-                virtual_path=virtual_path,
-                head_chars=config.preview_head_chars,
-                tail_chars=config.preview_tail_chars,
-            )
-
-    if config.fallback_max_chars > 0 and len(content) > config.fallback_max_chars:
-        logger.warning(
-            "Fallback-truncating %s output: %d chars → %d max",
-            tool_name,
-            len(content),
-            config.fallback_max_chars,
-        )
-        return _build_fallback(
-            content,
-            tool_name=tool_name,
-            max_chars=config.fallback_max_chars,
-            head_chars=config.fallback_head_chars,
-            tail_chars=config.fallback_tail_chars,
-        )
-
-    return None
-
-
-# ---------------------------------------------------------------------------
-# Result patchers
-# ---------------------------------------------------------------------------
-
-
-def _patch_tool_message(
-    msg: ToolMessage,
-    config: ToolOutputConfig,
-    outputs_path: str | None,
-    sandbox: Sandbox | None = None,
-) -> ToolMessage:
-    """Apply budget to a single ToolMessage. Returns the original if unchanged."""
-    tool_name = msg.name or "unknown"
-    if tool_name in config.exempt_tools:
-        return msg
-
-    text = _message_text(msg.content)
-    if text is None:
-        return msg
-
-    replacement = _budget_content(
-        text,
-        tool_name=tool_name,
-        tool_call_id=msg.tool_call_id or "",
-        outputs_path=outputs_path,
-        config=config,
-        sandbox=sandbox,
-    )
-    if replacement is None:
-        return msg
-
-    update: dict[str, Any] = {"content": replacement}
-    if getattr(msg, "response_metadata", None):
-        update["response_metadata"] = dict(msg.response_metadata)
-    if getattr(msg, "additional_kwargs", None):
-        update["additional_kwargs"] = dict(msg.additional_kwargs)
-    return msg.model_copy(update=update)
-
-
-def _effective_trigger(tool_name: str, config: ToolOutputConfig) -> int:
-    """Smallest content length that could trigger budgeting for *tool_name*.
-
-    Mirrors the trigger conditions in :func:`_budget_content` (per-tool
-    externalize threshold OR global fallback), so the pre-scan never produces
-    a false negative. Returns ``-1`` when nothing could ever trigger.
-    """
-    candidates: list[int] = []
-    externalize = config.tool_overrides.get(tool_name, config.externalize_min_chars)
-    if externalize > 0:
-        candidates.append(externalize)
-    if config.fallback_max_chars > 0:
-        candidates.append(config.fallback_max_chars)
-    return min(candidates) if candidates else -1
-
-
-def _tool_message_over_budget(msg: ToolMessage, config: ToolOutputConfig) -> bool:
-    """Cheap, per-tool-aware check: is this ToolMessage non-exempt and over its trigger?"""
-    if (msg.name or "") in config.exempt_tools:
-        return False
-    trigger = _effective_trigger(msg.name or "", config)
-    if trigger < 0:
-        return False
-    text = _message_text(msg.content)
-    return text is not None and len(text) > trigger
-
-
-def _needs_budget(result: ToolMessage | Command, config: ToolOutputConfig) -> bool:
-    """Fast check whether *result* could need budgeting (avoids thread offload for small outputs)."""
-    if isinstance(result, ToolMessage):
-        return _tool_message_over_budget(result, config)
-    update = getattr(result, "update", None)
-    if isinstance(update, dict):
-        for msg in update.get("messages", []):
-            if isinstance(msg, ToolMessage) and _tool_message_over_budget(msg, config):
-                return True
-    return False
-
-
-def _patch_result(
-    result: ToolMessage | Command,
-    config: ToolOutputConfig,
-    outputs_path: str | None,
-    sandbox: Sandbox | None = None,
-) -> ToolMessage | Command:
-    """Apply budget to a tool call result (ToolMessage or Command)."""
-    if isinstance(result, ToolMessage):
-        return _patch_tool_message(result, config, outputs_path, sandbox)
-
-    update = getattr(result, "update", None)
-    if not isinstance(update, dict):
-        return result
-
-    messages = update.get("messages")
-    if not isinstance(messages, list):
-        return result
-
-    new_messages: list[Any] = []
-    changed = False
-    for msg in messages:
-        if isinstance(msg, ToolMessage):
-            patched = _patch_tool_message(msg, config, outputs_path, sandbox)
-            if patched is not msg:
-                changed = True
-            new_messages.append(patched)
-        else:
-            new_messages.append(msg)
-
-    if not changed:
-        return result
-
-    return dc_replace(result, update={**update, "messages": new_messages})
-
-
-def _patch_model_messages(messages: list[Any], config: ToolOutputConfig) -> list[Any] | None:
-    """Apply budget to historical ToolMessages in a model request. Returns ``None`` if unchanged.
-
-    A cheap pre-scan bails out before allocating a new list when no historical
-    ToolMessage exceeds the budget — the common case once every result has
-    already been budgeted at tool-call time, so a long history is not rebuilt
-    on every model call.
-
-    Historical messages do not get a ``sandbox`` argument: any oversized tool
-    message in history was already budgeted (and possibly externalized) at
-    tool-call time, so the only thing left for the history path to do is
-    inline fallback truncation, which needs no sandbox.
-    """
-    if not any(isinstance(msg, ToolMessage) and _tool_message_over_budget(msg, config) for msg in messages):
-        return None
-
-    updated: list[Any] = []
-    changed = False
-    for msg in messages:
-        if isinstance(msg, ToolMessage):
-            patched = _patch_tool_message(msg, config, outputs_path=None)
-            if patched is not msg:
-                changed = True
-            updated.append(patched)
-        else:
-            updated.append(msg)
-    return updated if changed else None
-
-
-# ---------------------------------------------------------------------------
-# Middleware class
-# ---------------------------------------------------------------------------
-
-
-class ToolOutputBudgetMiddleware(AgentMiddleware[AgentState]):
-    """Enforce per-result budget on tool outputs via externalization or truncation."""
-
-    def __init__(self, config: ToolOutputConfig | None = None) -> None:
-        super().__init__()
-        self._config = config if config is not None else _default_config()
-
-    @classmethod
-    def from_app_config(cls, app_config: Any) -> ToolOutputBudgetMiddleware:
-        tool_output = getattr(app_config, "tool_output", None)
-        if isinstance(tool_output, ToolOutputConfig):
-            return cls(config=tool_output)
-        return cls()
-
-    # -- tool call hooks ---------------------------------------------------
-
-    @override
-    def wrap_tool_call(
-        self,
-        request: ToolCallRequest,
-        handler: Callable[[ToolCallRequest], ToolMessage | Command],
-    ) -> ToolMessage | Command:
-        result = handler(request)
-        if not self._config.enabled:
-            return result
-        if not _needs_budget(result, self._config):
-            return result
-        outputs_path = _resolve_outputs_path(request)
-        sandbox = _resolve_sandbox(request)
-        return _patch_result(result, self._config, outputs_path, sandbox)
-
-    @override
-    async def awrap_tool_call(
-        self,
-        request: ToolCallRequest,
-        handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command]],
-    ) -> ToolMessage | Command:
-        result = await handler(request)
-        if not self._config.enabled:
-            return result
-        if not _needs_budget(result, self._config):
-            return result
-        outputs_path = _resolve_outputs_path(request)
-        # _resolve_sandbox only touches runtime.state and the provider's
-        # in-memory sandbox registry, so it is safe to call on the event
-        # loop. The actual sandbox I/O (mkdir/write/test) happens inside
-        # _patch_result, which is offloaded to a worker thread below.
-        sandbox = _resolve_sandbox(request)
-        return await asyncio.to_thread(_patch_result, result, self._config, outputs_path, sandbox)
-
-    # -- model call hooks (historical message truncation) ------------------
-
-    @override
-    def wrap_model_call(
-        self,
-        request: ModelRequest,
-        handler: Callable[[ModelRequest], ModelResponse],
-    ) -> ModelCallResult:
-        if self._config.enabled:
-            messages = getattr(request, "messages", None)
-            if isinstance(messages, list):
-                patched = _patch_model_messages(messages, self._config)
-                if patched is not None:
-                    request = request.override(messages=patched)
-        return handler(request)
-
-    @override
-    async def awrap_model_call(
-        self,
-        request: ModelRequest,
-        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
-    ) -> ModelCallResult:
-        if self._config.enabled:
-            messages = getattr(request, "messages", None)
-            if isinstance(messages, list):
-                patched = _patch_model_messages(messages, self._config)
-                if patched is not None:
-                    request = request.override(messages=patched)
-        return await handler(request)
@@ -7,13 +7,11 @@ from typing import NotRequired, override
 from langchain.agents import AgentState
 from langchain.agents.middleware import AgentMiddleware
 from langchain_core.messages import HumanMessage
-from langchain_core.runnables import run_in_executor
 from langgraph.runtime import Runtime

 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.utils.file_conversion import extract_outline
-from deerflow.utils.messages import ORIGINAL_USER_CONTENT_KEY, message_content_to_text

 logger = logging.getLogger(__name__)

@@ -266,8 +264,6 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):

        # Extract original content - handle both string and list formats
        original_content = last_message.content
-        additional_kwargs = dict(last_message.additional_kwargs or {})
-        additional_kwargs.setdefault(ORIGINAL_USER_CONTENT_KEY, message_content_to_text(original_content))
        if isinstance(original_content, str):
            # Simple case: string content, just prepend files message
            updated_content = f"{files_message}\n\n{original_content}"
@@ -288,7 +284,7 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
            content=updated_content,
            id=last_message.id,
            name=last_message.name,
-            additional_kwargs=additional_kwargs,
+            additional_kwargs=last_message.additional_kwargs,
        )

        messages[last_message_index] = updated_message
@@ -297,16 +293,3 @@ class UploadsMiddleware(AgentMiddleware[UploadsMiddlewareState]):
            "uploaded_files": new_files,
            "messages": messages,
        }
-
-    @override
-    async def abefore_agent(self, state: UploadsMiddlewareState, runtime: Runtime) -> dict | None:
-        """Async hook that offloads the synchronous uploads scan off the event loop.
-
-        ``before_agent`` performs blocking filesystem IO (directory enumeration,
-        ``stat``, reading sibling ``.md`` outlines). When the graph runs async,
-        langgraph would otherwise execute the sync hook directly on the event
-        loop, so it is dispatched to a worker thread via ``run_in_executor``.
-        ``run_in_executor`` copies the current context, so the ``user_id``
-        contextvar read by ``get_effective_user_id()`` is preserved.
-        """
-        return await run_in_executor(None, self.before_agent, state, runtime)
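Reviewer note: the docstring of the removed hook relies on the worker thread seeing the caller's context variables. A minimal standard-library sketch of that behaviour follows, with `asyncio.to_thread` standing in for `run_in_executor`. The names `current_user` and `blocking_scan` are hypothetical and are not DeerFlow APIs.

```python
import asyncio
import contextvars
import threading

# Hypothetical stand-in for the user_id contextvar read by get_effective_user_id().
current_user: contextvars.ContextVar[str] = contextvars.ContextVar("current_user", default="anonymous")


def blocking_scan() -> tuple[str, bool]:
    """Pretend to do blocking filesystem work; report the user and whether we ran on the main thread."""
    on_main_thread = threading.current_thread() is threading.main_thread()
    return current_user.get(), on_main_thread


async def scan_off_loop() -> tuple[str, bool]:
    # asyncio.to_thread copies the current context into the worker thread,
    # so the value set by the caller is still visible inside blocking_scan().
    return await asyncio.to_thread(blocking_scan)


async def main() -> tuple[str, bool]:
    current_user.set("alice")
    return await scan_off_loop()


result = asyncio.run(main())
print(result)
```

With the hook removed, the synchronous `before_agent` is presumably what runs in async graphs again, so the blocking scan is no longer offloaded by this middleware itself.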
@@ -179,10 +179,8 @@ class ViewImageMiddleware(AgentMiddleware[ViewImageMiddlewareState]):
        # Create the image details message with text and image content
        image_content = self._create_image_details_message(state)

-        # Create a new human message with mixed content (text + images). This is
-        # internal context for the model only, so hide it from the chat UI and IM
-        # channels (matches the other middleware-injected context messages).
-        human_msg = HumanMessage(content=image_content, additional_kwargs={"hide_from_ui": True})
+        # Create a new human message with mixed content (text + images)
+        human_msg = HumanMessage(content=image_content)

        logger.debug("Injecting image details message with images before LLM call")

@@ -58,32 +58,6 @@ def merge_todos(existing: list | None, new: list | None) -> list | None:
    return new


-class PromotedTools(TypedDict):
-    catalog_hash: str
-    names: list[str]
-
-
-def merge_promoted(existing: PromotedTools | None, new: PromotedTools | None) -> PromotedTools | None:
-    """Reducer for deferred-tool promotions, scoped by catalog hash.
-
-    - new None/empty -> preserve existing (node didn't touch promotions).
-    - catalog_hash changed -> replace wholesale, dropping stale names (prevents a
-      persisted bare name from exposing a different tool after catalog drift).
-    - same catalog_hash -> union names, dedupe, preserve order.
-    """
-    if not new:
-        return existing
-    if existing is None or existing.get("catalog_hash") != new["catalog_hash"]:
-        return {
-            "catalog_hash": new["catalog_hash"],
-            "names": list(dict.fromkeys(new["names"])),
-        }
-    return {
-        "catalog_hash": existing["catalog_hash"],
-        "names": list(dict.fromkeys(existing["names"] + new["names"])),
-    }
-
-
 class ThreadState(AgentState):
    sandbox: NotRequired[SandboxState | None]
    thread_data: NotRequired[ThreadDataState | None]
@@ -92,4 +66,3 @@ class ThreadState(AgentState):
    todos: Annotated[list | None, merge_todos]
    uploaded_files: NotRequired[list[dict] | None]
    viewed_images: Annotated[dict[str, ViewedImageData], merge_viewed_images]  # image_path -> {base64, mime_type}
-    promoted: Annotated[PromotedTools | None, merge_promoted]
@@ -33,7 +33,7 @@ from langchain.agents.middleware import AgentMiddleware
 from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage
 from langchain_core.runnables import RunnableConfig

-from deerflow.agents.lead_agent.agent import build_middlewares
+from deerflow.agents.lead_agent.agent import _build_middlewares
 from deerflow.agents.lead_agent.prompt import apply_prompt_template
 from deerflow.agents.thread_state import ThreadState
 from deerflow.config.agents_config import AGENT_NAME_PATTERN
@@ -43,7 +43,6 @@ from deerflow.config.paths import get_paths
 from deerflow.models import create_chat_model
 from deerflow.runtime.user_context import get_effective_user_id
 from deerflow.skills.storage import get_or_new_skill_storage
-from deerflow.tools.builtins.tool_search import assemble_deferred_tools
 from deerflow.tracing import build_tracing_callbacks, inject_langfuse_metadata
 from deerflow.uploads.manager import (
    claim_unique_filename,
@@ -238,30 +237,19 @@ class DeerFlowClient:
        subagent_enabled = cfg.get("subagent_enabled", False)
        max_concurrent_subagents = cfg.get("max_concurrent_subagents", 3)

-        tools = self._get_tools(model_name=model_name, subagent_enabled=subagent_enabled)
-        final_tools, deferred_setup = assemble_deferred_tools(tools, enabled=self._app_config.tool_search.enabled)
        kwargs: dict[str, Any] = {
            # attach_tracing=False because ``stream()`` injects tracing
            # callbacks at the graph invocation root so a single embedded run
            # produces one trace with correct session_id / user_id propagation.
            # Attaching them again on the model would emit duplicate spans.
            "model": create_chat_model(name=model_name, thinking_enabled=thinking_enabled, attach_tracing=False),
-            "tools": final_tools,
-            "middleware": build_middlewares(
-                config,
-                model_name=model_name,
-                agent_name=self._agent_name,
-                available_skills=self._available_skills,
-                custom_middlewares=self._middlewares,
-                app_config=self._app_config,
-                deferred_setup=deferred_setup,
-            ),
+            "tools": self._get_tools(model_name=model_name, subagent_enabled=subagent_enabled),
+            "middleware": _build_middlewares(config, model_name=model_name, agent_name=self._agent_name, custom_middlewares=self._middlewares),
            "system_prompt": apply_prompt_template(
                subagent_enabled=subagent_enabled,
                max_concurrent_subagents=max_concurrent_subagents,
                agent_name=self._agent_name,
                available_skills=self._available_skills,
-                deferred_names=deferred_setup.deferred_names,
            ),
            "state_schema": ThreadState,
        }
@@ -1141,7 +1129,6 @@ class DeerFlowClient:
            "fact_confidence_threshold": config.fact_confidence_threshold,
            "injection_enabled": config.injection_enabled,
            "max_injection_tokens": config.max_injection_tokens,
-            "token_counting": config.token_counting,
        }

    def get_memory_status(self) -> dict:
@@ -1219,7 +1206,7 @@ class DeerFlowClient:

                info: dict[str, Any] = {
                    "filename": dest_name,
-                    "size": dest.stat().st_size,
+                    "size": str(dest.stat().st_size),
                    "path": str(dest),
                    "virtual_path": upload_virtual_path(dest_name),
                    "artifact_url": upload_artifact_url(thread_id, dest_name),
@@ -39,63 +39,11 @@ class AioSandbox(Sandbox):
        self._client = AioSandboxClient(base_url=base_url, timeout=600)
        self._home_dir = home_dir
        self._lock = threading.Lock()
-        self._closed = False

    @property
    def base_url(self) -> str:
        return self._base_url

-    def close(self) -> None:
-        """Best-effort close of the host-side HTTP client owned by this sandbox.
-
-        The agent_sandbox SDK is Fern-generated and exposes no ``close()`` /
-        ``__exit__``, so we reach the socket-owning ``httpx.Client`` explicitly
-        through its attribute chain::
-
-            Sandbox._client_wrapper        -> SyncClientWrapper
-                .httpx_client              -> Fern HttpClient (a wrapper, NOT httpx.Client)
-                    .httpx_client          -> httpx.Client     <- the real socket owner
-
-        Closing it releases pooled sockets so long-running provider lifecycles
-        do not accumulate unreclaimed host-side resources (#2872).
-
-        Resolution is most-specific-first with graceful degradation: if a future
-        SDK adds a top-level ``Sandbox.close()`` it is picked up automatically
-        without changing this code. Idempotent, thread-safe, and non-fatal:
-        failures during teardown are logged and swallowed so provider/backend
-        cleanup is never blocked.
-        """
-        with self._lock:
-            if self._closed:
-                return
-            self._closed = True
-            client = self._client
-            # Drop the reference under the lock for use-after-close safety: any
-            # later command on this instance fails loudly instead of reusing a
-            # half-closed client.
-            self._client = None
-
-        if client is None:
-            return
-
-        # Walk from the real httpx.Client up to the top-level client, picking the
-        # first object that actually exposes close().
-        wrapper = getattr(client, "_client_wrapper", None)
-        fern_http = getattr(wrapper, "httpx_client", None)
-        real_httpx = getattr(fern_http, "httpx_client", None)
-        target = next(
-            (c for c in (real_httpx, fern_http, client) if c is not None and hasattr(c, "close")),
-            None,
-        )
-        if target is None:
-            logger.debug("AioSandbox %s: no closable client found, nothing to release", self.id)
-            return
-
-        try:
-            target.close()
-        except Exception as e:
-            logger.warning(f"Error closing AioSandbox client for {self.id}: {e}")
-
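Reviewer note: the removed `close()` combined three ideas: an idempotence flag flipped under a lock, dropping the client reference before closing, and walking an attribute chain to find the first object that exposes `close()`. The sketch below reproduces that shape with hypothetical stand-in classes (`FakeTransport`, `FakeWrapper`, `ClosableSandbox`); it does not use the real agent_sandbox SDK or `httpx`.

```python
import threading


class FakeTransport:
    """Hypothetical innermost client that owns sockets and exposes close()."""

    def __init__(self) -> None:
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1


class FakeWrapper:
    """Hypothetical generated wrapper that has no close() of its own."""

    def __init__(self, inner: FakeTransport) -> None:
        self.httpx_client = inner


class ClosableSandbox:
    def __init__(self, client: FakeWrapper) -> None:
        self._client = client
        self._lock = threading.Lock()
        self._closed = False

    def close(self) -> None:
        with self._lock:
            if self._closed:
                return  # second and later calls are no-ops
            self._closed = True
            # Drop the reference under the lock so later use fails loudly.
            client, self._client = self._client, None

        # Most specific object first: the inner transport, then the wrapper.
        inner = getattr(client, "httpx_client", None)
        target = next((c for c in (inner, client) if c is not None and hasattr(c, "close")), None)
        if target is None:
            return
        try:
            target.close()
        except Exception:
            # Teardown is best-effort; never let it block provider cleanup.
            pass


transport = FakeTransport()
sandbox = ClosableSandbox(FakeWrapper(transport))
sandbox.close()
sandbox.close()
```

The actual socket close happens outside the lock, so a slow teardown cannot stall other callers that only need to read the flag.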
    @property
    def home_dir(self) -> str:
        """Get the home directory inside the sandbox."""
@@ -119,6 +119,7 @@ class AioSandboxProvider(SandboxProvider):
        port: 8080                      # Base port for local containers
        container_prefix: deer-flow-sandbox
        idle_timeout: 600               # Idle timeout in seconds (0 to disable)
+        auto_restart: true              # Evict dead containers on lookup so the next acquire() recreates them
        replicas: 3                     # Max concurrent sandbox containers (LRU eviction when exceeded)
        mounts:                         # Volume mounts for local containers
          - host_path: /path/on/host
@@ -203,12 +204,14 @@ class AioSandboxProvider(SandboxProvider):

        idle_timeout = getattr(sandbox_config, "idle_timeout", None)
        replicas = getattr(sandbox_config, "replicas", None)
+        auto_restart = getattr(sandbox_config, "auto_restart", True)

        return {
            "image": sandbox_config.image or DEFAULT_IMAGE,
            "port": sandbox_config.port or DEFAULT_PORT,
            "container_prefix": sandbox_config.container_prefix or DEFAULT_CONTAINER_PREFIX,
            "idle_timeout": idle_timeout if idle_timeout is not None else DEFAULT_IDLE_TIMEOUT,
+            "auto_restart": auto_restart,
            "replicas": replicas if replicas is not None else DEFAULT_REPLICAS,
            "mounts": sandbox_config.mounts or [],
            "environment": self._resolve_env_vars(sandbox_config.environment or {}),
@@ -771,18 +774,58 @@ class AioSandboxProvider(SandboxProvider):
    def get(self, sandbox_id: str) -> Sandbox | None:
        """Get a sandbox by ID. Updates last activity timestamp.

+        When ``auto_restart`` is enabled (the default), the container's liveness
+        is verified on each lookup.  If the underlying container has crashed, the
+        sandbox is evicted from all caches so that the next ``acquire()`` call will
+        transparently create a fresh container.
+
        Args:
            sandbox_id: The ID of the sandbox.

        Returns:
-            The sandbox instance if found, None otherwise.
+            The sandbox instance if found and alive, None otherwise.
        """
        with self._lock:
            sandbox = self._sandboxes.get(sandbox_id)
-            if sandbox is not None:
+            if sandbox is None:
+                return None
-                self._last_activity[sandbox_id] = time.time()
+            self._last_activity[sandbox_id] = time.time()
+            auto_restart = self._config.get("auto_restart", True)
+            info = self._sandbox_infos.get(sandbox_id) if auto_restart else None
+
+        if not info:
            return sandbox

+        if self._backend.is_alive(info):
+            return sandbox
+
+        info_to_destroy = None
+        with self._lock:
+            current_sandbox = self._sandboxes.get(sandbox_id)
+            current_info = self._sandbox_infos.get(sandbox_id)
+            if current_sandbox is None:
+                return None
+            if current_info is not info:
+                self._last_activity[sandbox_id] = time.time()
+                return current_sandbox
+
+            logger.warning(f"Sandbox {sandbox_id} container is not alive, evicting from cache for auto-restart")
+            self._sandboxes.pop(sandbox_id, None)
+            self._sandbox_infos.pop(sandbox_id, None)
+            self._last_activity.pop(sandbox_id, None)
+            self._warm_pool.pop(sandbox_id, None)
+            thread_ids = [tid for tid, sid in self._thread_sandboxes.items() if sid == sandbox_id]
+            for tid in thread_ids:
+                del self._thread_sandboxes[tid]
+            info_to_destroy = info
+
+        if info_to_destroy:
+            try:
+                self._backend.destroy(info_to_destroy)
+            except Exception as e:
+                logger.warning(f"Failed to cleanup dead sandbox {sandbox_id}: {e}")
+        return None
+
    def release(self, sandbox_id: str) -> None:
        """Release a sandbox from active use into the warm pool.

@@ -790,20 +833,14 @@ class AioSandboxProvider(SandboxProvider):
        thread on its next turn without a cold-start.  The container will only be
        stopped when the replicas limit forces eviction or during shutdown.

-        The host-side HTTP client owned by the cached ``AioSandbox`` instance is
-        closed before the instance is dropped (#2872). The warm-pool entry only
-        stores ``SandboxInfo``, so a fresh ``AioSandbox`` (and a fresh client)
-        is constructed if the container is later reclaimed.
-
        Args:
            sandbox_id: The ID of the sandbox to release.
        """
        info = None
-        sandbox = None
        thread_ids_to_remove: list[str] = []

        with self._lock:
-            sandbox = self._sandboxes.pop(sandbox_id, None)
+            self._sandboxes.pop(sandbox_id, None)
            info = self._sandbox_infos.pop(sandbox_id, None)
            thread_ids_to_remove = [tid for tid, sid in self._thread_sandboxes.items() if sid == sandbox_id]
            for tid in thread_ids_to_remove:
@@ -813,15 +850,6 @@ class AioSandboxProvider(SandboxProvider):
            if info and sandbox_id not in self._warm_pool:
                self._warm_pool[sandbox_id] = (info, time.time())

-        if sandbox is not None:
-            # Defense-in-depth: close() already swallows its own errors; this
-            # guard only protects against a future close() that misbehaves, so
-            # host-side client cleanup can never block parking in the warm pool.
-            try:
-                sandbox.close()
-            except Exception as e:
-                logger.warning(f"Error closing sandbox {sandbox_id} during release: {e}")
-
        logger.info(f"Released sandbox {sandbox_id} to warm pool (container still running)")

    def destroy(self, sandbox_id: str) -> None:
@@ -830,19 +858,14 @@ class AioSandboxProvider(SandboxProvider):
        Unlike release(), this actually stops the container.  Use this for
        explicit cleanup, capacity-driven eviction, or shutdown.

-        The host-side HTTP client owned by the cached ``AioSandbox`` instance is
-        closed alongside backend/container destruction so no client/socket
-        resources leak (#2872).
-
        Args:
            sandbox_id: The ID of the sandbox to destroy.
        """
        info = None
-        sandbox = None
        thread_ids_to_remove: list[str] = []

        with self._lock:
-            sandbox = self._sandboxes.pop(sandbox_id, None)
+            self._sandboxes.pop(sandbox_id, None)
            info = self._sandbox_infos.pop(sandbox_id, None)
            thread_ids_to_remove = [tid for tid, sid in self._thread_sandboxes.items() if sid == sandbox_id]
            for tid in thread_ids_to_remove:
@@ -854,15 +877,6 @@ class AioSandboxProvider(SandboxProvider):
            else:
                self._warm_pool.pop(sandbox_id, None)

-        if sandbox is not None:
-            # Defense-in-depth: close() already swallows its own errors; this
-            # guard only protects against a future close() that misbehaves, so
-            # host-side client cleanup can never block container destruction.
-            try:
-                sandbox.close()
-            except Exception as e:
-                logger.warning(f"Error closing sandbox {sandbox_id} during destroy: {e}")
-
        if info:
            self._backend.destroy(info)
            logger.info(f"Destroyed sandbox {sandbox_id}")
@@ -11,85 +11,12 @@ from deerflow.config import get_app_config

 logger = logging.getLogger(__name__)

-DEFAULT_BACKEND = "auto"
-DEFAULT_REGION = "wt-wt"
-DEFAULT_SAFESEARCH = "moderate"
-DEFAULT_WIKIPEDIA_REGION = "us-en"
-
-WIKIPEDIA_BACKENDS = {"auto", "all", "wikipedia"}
-WIKIPEDIA_LANGUAGE_ALIASES = {
-    "jp": "ja",
-    "kr": "ko",
-    "tzh": "zh",
-    "wt": "en",
-}
-
-
-def _normalize_backend(backend: str | list[str] | tuple[str, ...] | None) -> str:
-    if backend is None:
-        return DEFAULT_BACKEND
-    if isinstance(backend, (list, tuple)):
-        return ",".join(str(part).strip() for part in backend if str(part).strip()) or DEFAULT_BACKEND
-    return str(backend).strip() or DEFAULT_BACKEND
-
-
-def _normalize_setting(value: str | None, default: str) -> str:
-    return str(value).strip() if value else default
-
-
-def _backend_includes_wikipedia(backend: str | list[str] | tuple[str, ...] | None) -> bool:
-    backend = _normalize_backend(backend)
-    return any(part.strip().lower() in WIKIPEDIA_BACKENDS for part in backend.split(","))
-
-
-def _contains_codepoint(query: str, ranges: tuple[tuple[int, int], ...]) -> bool:
-    return any(start <= ord(char) <= end for char in query for start, end in ranges)
-
-
-def _infer_wikipedia_region(query: str) -> str:
-    """Pick a valid Wikipedia language region when DDGS' worldwide region is used."""
-    if _contains_codepoint(query, ((0x3040, 0x30FF), (0x31F0, 0x31FF))):
-        return "jp-ja"
-    if _contains_codepoint(query, ((0xAC00, 0xD7AF), (0x1100, 0x11FF), (0x3130, 0x318F))):
-        return "kr-ko"
-    if _contains_codepoint(query, ((0x3400, 0x9FFF),)):
-        return "cn-zh"
-    if _contains_codepoint(query, ((0x0400, 0x04FF),)):
-        return "ru-ru"
-    if _contains_codepoint(query, ((0x0370, 0x03FF),)):
-        return "gr-el"
-    if _contains_codepoint(query, ((0x0590, 0x05FF),)):
-        return "il-he"
-    if _contains_codepoint(query, ((0x0600, 0x06FF),)):
-        return "xa-ar"
-    return DEFAULT_WIKIPEDIA_REGION
-
-
-def _resolve_ddgs_region(query: str, region: str | None, backend: str | list[str] | tuple[str, ...] | None) -> str:
-    """
-    DDGS' wikipedia engine treats the second part of region as a Wikipedia
-    subdomain. Its default worldwide region, wt-wt, becomes wt.wikipedia.org.
-    """
-    normalized_region = _normalize_setting(region, DEFAULT_REGION).lower()
-    if not _backend_includes_wikipedia(backend):
-        return normalized_region
-
-    if normalized_region == DEFAULT_REGION:
-        return _infer_wikipedia_region(query)
-
-    if "-" not in normalized_region:
-        return DEFAULT_WIKIPEDIA_REGION
-
-    country, language = normalized_region.split("-", 1)
-    return f"{country}-{WIKIPEDIA_LANGUAGE_ALIASES.get(language, language)}"
-

 def _search_text(
    query: str,
    max_results: int = 5,
-    region: str | None = DEFAULT_REGION,
-    safesearch: str | None = DEFAULT_SAFESEARCH,
-    backend: str | list[str] | tuple[str, ...] | None = DEFAULT_BACKEND,
+    region: str = "wt-wt",
+    safesearch: str = "moderate",
 ) -> list[dict]:
    """
    Execute text search using DuckDuckGo.
@@ -99,7 +26,6 @@ def _search_text(
        max_results: Maximum number of results
        region: Search region
        safesearch: Safe search level
-        backend: DDGS backend(s), e.g. "auto", "duckduckgo", or "duckduckgo,brave"

    Returns:
        List of search results
@@ -113,15 +39,11 @@ def _search_text(
    ddgs = DDGS(timeout=30)

    try:
-        backend = _normalize_backend(backend)
-        safesearch = _normalize_setting(safesearch, DEFAULT_SAFESEARCH)
-        effective_region = _resolve_ddgs_region(query, region, backend)
        results = ddgs.text(
            query,
-            region=effective_region,
+            region=region,
            safesearch=safesearch,
            max_results=max_results,
-            backend=backend,
        )
        return list(results) if results else []

@@ -142,23 +64,14 @@ def web_search_tool(
        max_results: Maximum number of results to return. Default is 5.
    """
    config = get_app_config().get_tool_config("web_search")
-    region = DEFAULT_REGION
-    safesearch = DEFAULT_SAFESEARCH
-    backend = DEFAULT_BACKEND

-    if config is not None:
-        # Override tool call defaults from config if set.
+    # Override max_results from config if set
+    if config is not None and "max_results" in config.model_extra:
        max_results = config.model_extra.get("max_results", max_results)
-        region = config.model_extra.get("region", region)
-        safesearch = config.model_extra.get("safesearch", safesearch)
-        backend = config.model_extra.get("backend", backend)

    results = _search_text(
        query=query,
        max_results=max_results,
-        region=region,
-        safesearch=safesearch,
-        backend=backend,
    )

    if not results:
@@ -9,7 +9,7 @@ _api_key_warned = False


 class JinaClient:
-    async def crawl(self, url: str, return_format: str = "html", timeout: int = 10, proxy: str | None = None, trust_env: bool = True) -> str:
+    async def crawl(self, url: str, return_format: str = "html", timeout: int = 10) -> str:
        global _api_key_warned
        headers = {
            "Content-Type": "application/json",
@@ -23,10 +23,7 @@ class JinaClient:
            logger.warning("Jina API key is not set. Provide your own key to access a higher rate limit. See https://jina.ai/reader for more information.")
        data = {"url": url}
        try:
-            client_kwargs: dict[str, object] = {"trust_env": trust_env}
-            if proxy:
-                client_kwargs["proxy"] = proxy
-            async with httpx.AsyncClient(**client_kwargs) as client:
+            async with httpx.AsyncClient() as client:
                response = await client.post("https://r.jina.ai/", headers=headers, json=data, timeout=timeout)

            if response.status_code != 200:
@@ -9,38 +9,6 @@ from deerflow.utils.readability import ReadabilityExtractor
 readability_extractor = ReadabilityExtractor()


-def _coerce_bool(value: object, default: bool) -> bool:
-    if isinstance(value, bool):
-        return value
-    if isinstance(value, str):
-        normalized = value.strip().lower()
-        if normalized in {"1", "true", "yes", "on"}:
-            return True
-        if normalized in {"0", "false", "no", "off"}:
-            return False
-    return default
-
-
-def _coerce_timeout(value: object, default: int) -> int:
-    if isinstance(value, bool):
-        return default
-    if isinstance(value, int):
-        return value
-    if isinstance(value, str):
-        try:
-            return int(value)
-        except ValueError:
-            return default
-    return default
-
-
-def _coerce_proxy(value: object) -> str | None:
-    if not isinstance(value, str):
-        return None
-    proxy = value.strip()
-    return proxy or None
-
-
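Reviewer note: with the coercion helpers removed, whatever value `model_extra` holds for `timeout` is passed to the client as-is. The sketch below restates the removed timeout coercion as a standalone function to show which inputs it used to normalise; `coerce_timeout` is a local name for illustration.

```python
def coerce_timeout(value: object, default: int) -> int:
    # bool is a subclass of int, so it has to be rejected before the int check;
    # otherwise `timeout: true` in the config would silently become 1 second.
    if isinstance(value, bool):
        return default
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)  # int() tolerates surrounding whitespace
        except ValueError:
            return default
    return default


samples = [30, "45", " 15 ", "fast", True, None, 2.5]
coerced = [coerce_timeout(value, 10) for value in samples]
```

A quoted value such as `timeout: "45"` in YAML arrives as a string, which is the case the removed helper handled and the simplified code does not.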
 @tool("web_fetch", parse_docstring=True)
 async def web_fetch_tool(url: str) -> str:
    """Fetch the contents of a web page at a given URL.
@@ -54,14 +22,10 @@ async def web_fetch_tool(url: str) -> str:
    """
    jina_client = JinaClient()
    timeout = 10
-    proxy = None
-    trust_env = True
    config = get_app_config().get_tool_config("web_fetch")
-    if config is not None:
-        timeout = _coerce_timeout(config.model_extra.get("timeout"), timeout)
-        proxy = _coerce_proxy(config.model_extra.get("proxy"))
-        trust_env = _coerce_bool(config.model_extra.get("trust_env"), trust_env)
-    html_content = await jina_client.crawl(url, return_format="html", timeout=timeout, proxy=proxy, trust_env=trust_env)
+    if config is not None and "timeout" in config.model_extra:
+        timeout = config.model_extra.get("timeout")
+    html_content = await jina_client.crawl(url, return_format="html", timeout=timeout)
    if isinstance(html_content, str) and html_content.startswith("Error:"):
        return html_content
    article = await asyncio.to_thread(readability_extractor.extract_article, html_content)
@@ -67,13 +67,11 @@ def resolve_agent_dir(name: str, *, user_id: str | None = None) -> Path:
    paths = get_paths()
    effective_user = user_id or get_effective_user_id()
    user_path = paths.user_agent_dir(effective_user, name)
-    # Require config.yaml to confirm this is a genuine agent directory,
-    # not a leftover from memory/storage writes (see #3390).
-    if user_path.exists() and (user_path / "config.yaml").exists():
+    if user_path.exists():
        return user_path

    legacy_path = paths.agent_dir(name)
-    if legacy_path.exists() and (legacy_path / "config.yaml").exists():
+    if legacy_path.exists():
        return legacy_path

    return user_path
@@ -7,11 +7,10 @@ from typing import Any, Self

 import yaml
 from dotenv import load_dotenv
-from pydantic import BaseModel, ConfigDict, Field, field_validator
+from pydantic import BaseModel, ConfigDict, Field

 from deerflow.config.acp_config import ACPAgentConfig, load_acp_config_from_dict
 from deerflow.config.agents_api_config import AgentsApiConfig, load_agents_api_config_from_dict
-from deerflow.config.channel_connections_config import ChannelConnectionsConfig
 from deerflow.config.checkpointer_config import CheckpointerConfig, load_checkpointer_config_from_dict
 from deerflow.config.database_config import DatabaseConfig
 from deerflow.config.extensions_config import ExtensionsConfig
@@ -19,7 +18,6 @@ from deerflow.config.guardrails_config import GuardrailsConfig, load_guardrails_
 from deerflow.config.loop_detection_config import LoopDetectionConfig
 from deerflow.config.memory_config import MemoryConfig, load_memory_config_from_dict
 from deerflow.config.model_config import ModelConfig
-from deerflow.config.reload_boundary import format_field_description
 from deerflow.config.run_events_config import RunEventsConfig
 from deerflow.config.runtime_paths import existing_project_file
 from deerflow.config.safety_finish_reason_config import SafetyFinishReasonConfig
@@ -32,7 +30,6 @@ from deerflow.config.summarization_config import SummarizationConfig, load_summa
 from deerflow.config.title_config import TitleConfig, load_title_config_from_dict
 from deerflow.config.token_usage_config import TokenUsageConfig
 from deerflow.config.tool_config import ToolConfig, ToolGroupConfig
-from deerflow.config.tool_output_config import ToolOutputConfig
 from deerflow.config.tool_search_config import ToolSearchConfig, load_tool_search_config_from_dict

 load_dotenv()
@@ -87,27 +84,15 @@ def apply_logging_level(name: str | None) -> None:
 class AppConfig(BaseModel):
    """Config for the DeerFlow application"""

-    log_level: str = Field(
-        default="info",
-        description=format_field_description(
-            "log_level",
-            field_doc="Logging level for deerflow and app modules (debug/info/warning/error); third-party libraries are not affected.",
-        ),
-    )
+    log_level: str = Field(default="info", description="Logging level for deerflow and app modules (debug/info/warning/error); third-party libraries are not affected")
    token_usage: TokenUsageConfig = Field(default_factory=TokenUsageConfig, description="Token usage tracking configuration")
    models: list[ModelConfig] = Field(default_factory=list, description="Available models")
-    sandbox: SandboxConfig = Field(
-        description=format_field_description(
-            "sandbox",
-            field_doc="Sandbox provider configuration (local filesystem or Docker-based aio sandbox).",
-        ),
-    )
+    sandbox: SandboxConfig = Field(description="Sandbox configuration")
    tools: list[ToolConfig] = Field(default_factory=list, description="Available tools")
    tool_groups: list[ToolGroupConfig] = Field(default_factory=list, description="Available tool groups")
    skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
    skill_evolution: SkillEvolutionConfig = Field(default_factory=SkillEvolutionConfig, description="Agent-managed skill evolution configuration")
    extensions: ExtensionsConfig = Field(default_factory=ExtensionsConfig, description="Extensions configuration (MCP servers and skills state)")
-    tool_output: ToolOutputConfig = Field(default_factory=ToolOutputConfig, description="Tool output budget protection configuration")
    tool_search: ToolSearchConfig = Field(default_factory=ToolSearchConfig, description="Tool search / deferred loading configuration")
    title: TitleConfig = Field(default_factory=TitleConfig, description="Automatic title generation configuration")
    summarization: SummarizationConfig = Field(default_factory=SummarizationConfig, description="Conversation summarization configuration")
@@ -117,53 +102,13 @@ class AppConfig(BaseModel):
    subagents: SubagentsAppConfig = Field(default_factory=SubagentsAppConfig, description="Subagent runtime configuration")
    guardrails: GuardrailsConfig = Field(default_factory=GuardrailsConfig, description="Guardrail middleware configuration")
    circuit_breaker: CircuitBreakerConfig = Field(default_factory=CircuitBreakerConfig, description="LLM circuit breaker configuration")
-    channel_connections: ChannelConnectionsConfig = Field(default_factory=ChannelConnectionsConfig, description="User-facing IM channel connection configuration")
    loop_detection: LoopDetectionConfig = Field(default_factory=LoopDetectionConfig, description="Loop detection middleware configuration")
    safety_finish_reason: SafetyFinishReasonConfig = Field(default_factory=SafetyFinishReasonConfig, description="Provider safety-filter finish_reason interception middleware configuration")
    model_config = ConfigDict(extra="allow")
-    database: DatabaseConfig = Field(
-        default_factory=DatabaseConfig,
-        description=format_field_description(
-            "database",
-            field_doc="Unified database backend for run/feedback metadata (memory, sqlite, or postgres).",
-        ),
-    )
-    run_events: RunEventsConfig = Field(
-        default_factory=RunEventsConfig,
-        description=format_field_description(
-            "run_events",
-            field_doc="Run-event store backend (memory for dev, db for production queries, jsonl for lightweight single-node persistence).",
-        ),
-    )
-    checkpointer: CheckpointerConfig | None = Field(
-        default=None,
-        description=format_field_description(
-            "checkpointer",
-            field_doc="LangGraph state-persistence checkpointer configuration.",
-        ),
-    )
-    stream_bridge: StreamBridgeConfig | None = Field(
-        default=None,
-        description=format_field_description(
-            "stream_bridge",
-            field_doc="Stream bridge connecting agent workers to SSE endpoints.",
-        ),
-    )
-
-    @field_validator("models", "tools", "tool_groups", mode="before")
-    @classmethod
-    def _coerce_null_list_sections(cls, value: Any) -> Any:
-        """Treat a present-but-empty config section as an empty list.
-
-        Commenting out every entry under a top-level YAML key — e.g. ``models:``
-        with only comments beneath it, exactly as shipped in
-        ``config.example.yaml`` — makes PyYAML parse the value as ``None``.
-        Without this, the documented ``cp config.example.yaml config.yaml``
-        first-run flow crashes with an opaque ``Input should be a valid list``
-        pydantic error. Coercing ``None`` to ``[]`` keeps that flow working and
-        matches the field's own ``default_factory=list``.
-        """
-        return [] if value is None else value
+    database: DatabaseConfig = Field(default_factory=DatabaseConfig, description="Unified database backend configuration")
+    run_events: RunEventsConfig = Field(default_factory=RunEventsConfig, description="Run event storage configuration")
+    checkpointer: CheckpointerConfig | None = Field(default=None, description="Checkpointer configuration")
+    stream_bridge: StreamBridgeConfig | None = Field(default=None, description="Stream bridge configuration")

    @classmethod
    def resolve_config_path(cls, config_path: str | None = None) -> Path:
@@ -226,11 +171,6 @@ class AppConfig(BaseModel):
        config_data["extensions"] = extensions_config.model_dump()

        result = cls.model_validate(config_data)
-        if not result.models:
-            logger.warning(
-                "No models are configured in %s. Add at least one entry under `models:` (see the commented examples in config.example.yaml) or run `make setup`.",
-                resolved_path,
-            )
        acp_agents = cls._validate_acp_agents(config_data.get("acp_agents", {}))
        cls._apply_singleton_configs(result, acp_agents)
        return result
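
The `_coerce_null_list_sections` validator removed above treated a list section that is present but parsed as `None` as an empty list. The sketch below illustrates that behavior on a plain dictionary, without pydantic; the helper name `coerce_null_list_sections` and the sample data are hypothetical and not part of the project.

```python
from typing import Any

# Section names taken from the removed validator's decorator arguments.
LIST_SECTIONS = ("models", "tools", "tool_groups")


def coerce_null_list_sections(config_data: dict[str, Any]) -> dict[str, Any]:
    """Return a copy in which list sections that are present but None become []."""
    fixed = dict(config_data)
    for key in LIST_SECTIONS:
        if key in fixed and fixed[key] is None:
            fixed[key] = []
    return fixed


# A YAML key with only comments beneath it is parsed as None (sample data is made up).
parsed = {"models": None, "tools": [{"name": "web_search"}], "sandbox": {"use": "local"}}
normalized = coerce_null_list_sections(parsed)
```

Only keys that are present are touched, so missing sections still fall back to the field defaults, and the input dictionary is left unmodified.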
@@ -1,61 +0,0 @@
-"""Configuration for user-owned IM channel connections."""
-
-from __future__ import annotations
-
-from pydantic import BaseModel, Field
-
-
-class SlackChannelConnectionConfig(BaseModel):
-    enabled: bool = False
-
-    @property
-    def configured(self) -> bool:
-        return True
-
-
-class TelegramChannelConnectionConfig(BaseModel):
-    enabled: bool = False
-    bot_username: str = ""
-
-    @property
-    def configured(self) -> bool:
-        return bool(self.bot_username)
-
-
-class DiscordChannelConnectionConfig(BaseModel):
-    enabled: bool = False
-
-    @property
-    def configured(self) -> bool:
-        return True
-
-
-class BindingCodeChannelConnectionConfig(BaseModel):
-    enabled: bool = False
-
-    @property
-    def configured(self) -> bool:
-        return True
-
-
-class ChannelConnectionsConfig(BaseModel):
-    """Top-level config for browser-connectable IM channels."""
-
-    enabled: bool = False
-    slack: SlackChannelConnectionConfig = Field(default_factory=SlackChannelConnectionConfig)
-    telegram: TelegramChannelConnectionConfig = Field(default_factory=TelegramChannelConnectionConfig)
-    discord: DiscordChannelConnectionConfig = Field(default_factory=DiscordChannelConnectionConfig)
-    feishu: BindingCodeChannelConnectionConfig = Field(default_factory=BindingCodeChannelConnectionConfig)
-    dingtalk: BindingCodeChannelConnectionConfig = Field(default_factory=BindingCodeChannelConnectionConfig)
-    wechat: BindingCodeChannelConnectionConfig = Field(default_factory=BindingCodeChannelConnectionConfig)
-    wecom: BindingCodeChannelConnectionConfig = Field(default_factory=BindingCodeChannelConnectionConfig)
-
-    def provider_status(self, provider: str) -> dict[str, bool]:
-        config = getattr(self, provider, None)
-        if config is None:
-            return {"enabled": False, "configured": False}
-        enabled = bool(config.enabled)
-        return {
-            "enabled": enabled,
-            "configured": enabled and bool(config.configured),
-        }
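
For reference, the deleted `provider_status` reports `configured` as true only when the provider is also enabled. The following is a minimal sketch of that rule using dataclasses instead of pydantic models; the class names and the `bot_username` value are illustrative assumptions, and only two providers are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SlackConnection:
    enabled: bool = False

    @property
    def configured(self) -> bool:
        # Mirrors the deleted Slack config, which needs no extra settings.
        return True


@dataclass
class TelegramConnection:
    enabled: bool = False
    bot_username: str = ""

    @property
    def configured(self) -> bool:
        # Telegram counts as configured only once a bot username is set.
        return bool(self.bot_username)


@dataclass
class ChannelConnections:
    slack: SlackConnection = field(default_factory=SlackConnection)
    telegram: TelegramConnection = field(default_factory=TelegramConnection)

    def provider_status(self, provider: str) -> dict[str, bool]:
        config = getattr(self, provider, None)
        if config is None:
            return {"enabled": False, "configured": False}
        enabled = bool(config.enabled)
        return {"enabled": enabled, "configured": enabled and bool(config.configured)}


connections = ChannelConnections(
    slack=SlackConnection(enabled=True),
    telegram=TelegramConnection(enabled=False, bot_username="deerflow_bot"),
)
```

Here `connections.provider_status("telegram")` reports both flags as false even though a bot username is set, because the provider is disabled; an unknown provider name yields the same result.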
@@ -41,20 +41,6 @@ def set_checkpointer_config(config: CheckpointerConfig | None) -> None:
    _checkpointer_config = config


-def ensure_config_loaded() -> None:
-    """Lazily load app config when checkpointer config has not been initialized."""
-    from deerflow.config.app_config import _app_config, get_app_config
-
-    config = get_checkpointer_config()
-    if config is not None or _app_config is not None:
-        return
-
-    try:
-        get_app_config()
-    except FileNotFoundError:
-        pass
-
-
 def load_checkpointer_config_from_dict(config_dict: dict | None) -> None:
    """Load checkpointer configuration from a dictionary."""
    global _checkpointer_config
@@ -5,7 +5,7 @@ import os
 from pathlib import Path
 from typing import Any, Literal

-from pydantic import BaseModel, ConfigDict, Field, model_validator
+from pydantic import BaseModel, ConfigDict, Field

 from deerflow.config.runtime_paths import existing_project_file

@@ -47,24 +47,6 @@ class McpServerConfig(BaseModel):
    description: str = Field(default="", description="Human-readable description of what this MCP server provides")
    model_config = ConfigDict(extra="allow")

-    @model_validator(mode="before")
-    @classmethod
-    def _accept_transport_alias(cls, data: Any) -> Any:
-        """Accept the MCP-spec ``transport`` field as an alias for ``type``.
-
-        The official MCP configuration schema uses ``transport`` to indicate
-        the transport mechanism (``stdio``/``sse``/``http``). Earlier versions
-        of this project only honored ``type``, which caused remote SSE/HTTP
-        servers configured with just ``transport`` to be incorrectly treated as
-        ``stdio`` (the default). This validator normalizes the two so either
-        spelling works, with ``type`` taking precedence when both are provided.
-        """
-        if isinstance(data, dict):
-            transport = data.get("transport")
-            if transport and not data.get("type"):
-                data = {**data, "type": transport}
-        return data
-
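
The normalization performed by the removed `_accept_transport_alias` validator can be exercised on its own. The sketch below lifts the same logic into a standalone function (the name `accept_transport_alias` and the example URL are assumptions for illustration) to show the precedence rule described in the docstring.

```python
from typing import Any


def accept_transport_alias(data: Any) -> Any:
    """Copy ``transport`` into ``type`` when ``type`` is missing or empty."""
    if isinstance(data, dict):
        transport = data.get("transport")
        if transport and not data.get("type"):
            # Build a new dict so the caller's input is not mutated.
            return {**data, "type": transport}
    return data


only_transport = accept_transport_alias({"transport": "sse", "url": "https://example.invalid/mcp"})
both_given = accept_transport_alias({"transport": "sse", "type": "http"})
neither_given = accept_transport_alias({"command": "npx"})
```

With only `transport` set, `type` is filled in from it; when both are given, `type` is left as is; when neither is given, the data passes through unchanged and the model's own default for `type` applies.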

 class SkillStateConfig(BaseModel):
    """Configuration for a single skill's state."""
@@ -1,7 +1,5 @@
 """Configuration for memory mechanism."""

-from typing import Literal
-
 from pydantic import BaseModel, Field


@@ -62,17 +60,6 @@ class MemoryConfig(BaseModel):
        le=8000,
        description="Maximum tokens to use for memory injection",
    )
-    token_counting: Literal["tiktoken", "char"] = Field(
-        default="tiktoken",
-        description=(
-            "Token counting strategy for memory-injection budgeting. "
-            "'tiktoken' is accurate but the encoding's BPE data may be "
-            "downloaded from a public network endpoint on first use, which "
-            "can block for a long time in network-restricted environments "
-            "(see issue #3402/#3429). 'char' uses a network-free "
-            "CJK-aware character-based estimate and never touches tiktoken."
-        ),
-    )


 # Global configuration instance
@@ -32,16 +32,6 @@ class ModelConfig(BaseModel):
        description="Extra settings to be passed to the model when thinking is disabled",
    )
    supports_vision: bool = Field(default_factory=lambda: False, description="Whether the model supports vision/image inputs")
-    stream_chunk_timeout: float | None = Field(
-        default=None,
-        description=(
-            "Maximum seconds to wait between successive streaming chunks before "
-            "langchain-openai raises StreamChunkTimeoutError. None means use the "
-            "factory default (240s for OpenAI-compatible clients). Tune higher for "
-            "reasoning models with long thinking pauses; lower for latency-sensitive "
-            "interactive endpoints. Has no effect on non-OpenAI-compatible providers."
-        ),
-    )
    thinking: dict | None = Field(
        default_factory=lambda: None,
        description=(
@@ -1,4 +1,3 @@
-import hashlib
 import os
 import re
 import shutil
@@ -11,8 +10,6 @@ VIRTUAL_PATH_PREFIX = "/mnt/user-data"

 _SAFE_THREAD_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
 _SAFE_USER_ID_RE = re.compile(r"^[A-Za-z0-9_\-]+$")
-_UNSAFE_USER_ID_CHAR_RE = re.compile(r"[^A-Za-z0-9_\-]")
-_SAFE_USER_ID_DIGEST_HEX_LEN = 16


 def _default_local_base_dir() -> Path:
@@ -34,23 +31,6 @@ def _validate_user_id(user_id: str) -> str:
    return user_id


-def make_safe_user_id(raw: str) -> str:
-    """Normalize an external identity into the user-id charset (``[A-Za-z0-9_-]``).
-
-    IM channel ids (Feishu/Slack/Telegram) may contain characters that
-    :func:`_validate_user_id` rejects. Already-safe ids pass through unchanged;
-    lossy ones get a short digest suffix so two distinct inputs never share a
-    storage bucket.
-    """
-    if not raw:
-        raise ValueError("user_id must be a non-empty string.")
-    sanitized = _UNSAFE_USER_ID_CHAR_RE.sub("-", raw)
-    if sanitized == raw:
-        return raw
-    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:_SAFE_USER_ID_DIGEST_HEX_LEN]
-    return f"{sanitized}-{digest}"
-
-
 def _join_host_path(base: str, *parts: str) -> str:
    """Join host filesystem path segments while preserving native style.
