deer-flow/backend/CLAUDE.md
Xinmin Zeng e93f658472 fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107) (#3131)
* fix(task-tool): unwrap callback manager when locating usage recorder

`config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the
`AsyncCallbackManager` LangChain hands to async tool runs), not just a plain
list. The previous `for cb in callbacks` loop raised
`TypeError: 'AsyncCallbackManager' object is not iterable`, which
`ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage
even though the subagent had completed internally — Ultra mode lost subagent
results and the lead agent fell back to redoing the work.

Unwrap `BaseCallbackManager.handlers` before searching for the recorder.

Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): treat any task tool error as a terminal subtask failure

The subtask card status machine matched only three English prefixes (`Task
Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell
through to `in_progress`, so a `task` tool error wrapped by
`ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card
spinning forever even after the run had ended.

Extract the prefix logic into `parseSubtaskResult` and recognise any leading
`Error:` token as a terminal failure. The extracted function is unit-tested
against the legacy prefixes plus the `AsyncCallbackManager` regression
captured in the upstream issue.

Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(frontend): exclude hidden, reasoning, and tool payloads from chat export

`formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without
running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts
therefore included `hide_from_ui` system reminders, memory injections,
provider `reasoning_content`, tool calls, and tool result messages — content
that is intentionally hidden in the chat view.

Filter the export to the user-visible transcript by default and gate
reasoning / tool calls / tool messages / hidden messages behind explicit
`ExportOptions` flags so a future debug export can opt back in without
forking the formatter.

Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(gateway): route get_config through get_app_config for mtime hot reload

`get_config(request)` returned the `app.state.config` snapshot captured at
startup. The worker / lead-agent path then threaded that frozen `AppConfig`
through `RunContext` and `agent_factory`, so per-run fields edited in
`config.yaml` (notably `max_tokens`) were ignored until the gateway process
was restarted — even though `get_app_config()` already does mtime-based
reload at the bottom layer.

Route the request dependency through `get_app_config()` directly. Runtime
`ContextVar` overrides (`push_current_app_config`) and test-injected
singletons (`set_app_config`) keep working; `app.state.config` is now only
read at startup for one-shot bootstrap (logging level, IM channels,
`langgraph_runtime` engines).

`tests/test_gateway_deps_config.py` encoded the old snapshot contract and is
removed; `tests/test_gateway_config_freshness.py` replaces it with mtime,
ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and
`test_uploads_router.py` now inject test configs via FastAPI
`dependency_overrides[get_config]` instead of mutating `app.state.config`.

Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know
which fields are picked up on the next request vs. which still require a
restart (`database`, `checkpointer`, `run_events`, `stream_bridge`,
`sandbox.use`, `log_level`, `channels.*`).

Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(gateway): broaden get_config 503 to any config-load failure

Address review feedback on the previous commit:

1. Narrow exception catch removed. The old contract returned 503 whenever
   `app.state.config is None`. The first cut only mapped
   `FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and
   pydantic `ValidationError` to bubble up as 500. At the request boundary
   we treat any inability to materialise the config as "configuration not
   available" (503) and log the original exception so the operator still
   has the stack.

2. Removed the unused `request: Request` parameter and the matching
   `# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency
   to accept `Request`; the only call site uses the no-arg form.

3. `backend/CLAUDE.md` boundary now lists the *reason* each field is
   restart-required (engine binding, singleton caching, one-shot
   `apply_logging_level`, etc.), not just the field name, so reviewers do
   not have to reverse-engineer the boundary themselves.

Tests parametrise four exception classes (`FileNotFoundError`,
`PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each.

Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(task-tool): defend _find_usage_recorder against non-list callbacks

Address review feedback. The previous commit handled the two common shapes
LangChain hands to async tool runs — a plain `list[BaseCallbackHandler]` and
a `BaseCallbackManager` subclass — but iterated any other shape directly,
which would still raise `TypeError` if e.g. a single handler instance leaked
through without a list wrapper.

Treat any non-list, non-manager `config["callbacks"]` value as "no recorder"
rather than crash. Docstring now lists all four shapes explicitly. New tests
cover the single-handler-object case, `runtime is None`, `callbacks is None`,
and `runtime.config` being a non-dict — all required to be silent no-ops.

Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): drop dead identity ternary and add opt-in export tests

Address review feedback on the previous export commit:

1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content`
   expression in `formatThreadAsJSON`. Both branches returned the same value;
   the message content now flows through unchanged whether it is a string or
   the rich `MessageContent[]` shape (LangChain JSON-serialises the array
   structure correctly already).

2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four
   flags are not currently wired to any UI control — callers wanting a debug
   export must build the options object explicitly. The default behaviour
   continues to match the explicit prescription in
   bytedance/deer-flow#3107 BUG-006.

3. Added opt-in coverage. The previous tests only exercised the
   `options = {}` default path; the new cases verify each flag flips the
   corresponding payload back into the export so a future debug-export
   surface does not silently break the contract.

Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(frontend): export subtask prefix constants and document fallback intent

Address review feedback on the previous BUG-007 commit:

1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the
   `ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly
   pins them as part of the backend↔frontend contract defined in
   `task_tool.py` and `tool_error_handling_middleware.py`, so any future
   structured-status migration (e.g. backend writing
   `additional_kwargs.subagent_status` instead of leading text) can
   reference these from one canonical place rather than redefine them.

2. The `in_progress` fallback now carries a docstring explaining the
   deliberate choice — LangChain only ever emits a `ToolMessage` once the
   tool itself has returned, so unrecognised content means the contract
   has drifted and "still running" is the right operator signal (eagerly
   marking it terminal-failed would mask the drift).

No behaviour change; this is documentation and an API export.

Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(gateway): drop app.state.config snapshot and freeze run_events_config

Address @ShenAC-SAC's BUG-001 review on #3131. The previous cut still
stored an ``AppConfig`` snapshot on ``app.state.config`` for startup
bootstrap. Two follow-on hazards from that:

1. Future code touching the gateway lifespan could accidentally start
   reading ``app.state.config`` again, silently regressing the request
   hot path back to a stale snapshot.
2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the
   startup-bound ``event_store`` and a *live* ``run_events_config``
   field — so an operator who edited ``run_events.backend`` mid-flight
   would have produced a run context whose ``event_store`` and
   ``run_events_config`` referred to different backends.

Clean approach (aligned with the direction in PR #3128):

- ``lifespan()`` keeps a local ``startup_config`` variable and passes it
  explicitly into ``langgraph_runtime(app, startup_config)`` and into
  ``start_channel_service``. No ``app.state.config`` attribute is set at
  any point.
- ``langgraph_runtime`` now accepts ``startup_config`` as a required
  parameter, removing the ``getattr(app.state, "config", None)`` lookup
  and the "config not initialised" runtime error.
- The matching ``run_events_config`` is frozen onto ``app.state`` next
  to ``run_event_store`` so ``get_run_context`` reads the two from the
  same startup-time source. ``app_config`` continues to be resolved
  live via ``get_app_config()``.
- ``backend/CLAUDE.md`` boundary explanation updated to spell out the
  ``startup_config`` / ``get_app_config()`` split.

New regression test ``test_run_context_app_config_reflects_yaml_edit``
exercises the worker-feeding path: it asserts that ``ctx.app_config``
follows a mid-flight ``config.yaml`` edit while
``ctx.run_events_config`` stays frozen to the startup snapshot the
event store was built from.

Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): parse Task cancelled and polling timed out as terminal

Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually
emits five terminal strings:

- `Task Succeeded. Result: …`
- `Task failed. …`
- `Task timed out. …`
- `Task cancelled by user.`               ← previously matched none
- `Task polling timed out after N minutes …` ← previously matched none

The previous cut handled three; the last two fell through to the
"unknown content" branch and pushed the subtask card back to
`in_progress` even though the backend had already reached a terminal
state. Add explicit matches plus regression tests for both. The
`in_progress` fallback is now reserved for genuinely unrecognised
output (i.e. contract drift), as documented.

Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review

* fix(frontend): sanitize JSON export content via the Markdown content path

Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on
#3131. The previous cut filtered hidden/tool messages out of the JSON
export but still serialised `msg.content` verbatim, so:

- inline `<think>…</think>` wrappers stayed in the exported `content`
  even with `includeReasoning: false`,
- content-array thinking blocks leaked the `thinking` field,
- `<uploaded_files>…</uploaded_files>` markers leaked the workspace
  paths a user uploaded files to.

JSON now goes through the same sanitiser the Markdown path uses
(`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and
tool_calls remain gated behind their `ExportOptions` flags. AI / human
rows that sanitise to empty content with no opted-in reasoning or tool
calls are dropped so the JSON matches the Markdown path's `continue`
on empty assistant fragments.

New regression tests cover the three leak shapes the reviewer called
out plus the empty-content-drop case.

Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(gateway): align lifespan stub with langgraph_runtime two-arg signature

Codex round-3 review of c0bc7a06 flagged this: changing
`langgraph_runtime` to require `startup_config` as a second positional
argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in
`test_gateway_lifespan_shutdown.py`, which is patched into
`app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout
regression. Lifespan would then call the stub with two args and raise
`TypeError` before the bounded-shutdown assertion ran.

Update the stub to match the new signature. The shutdown test itself is
unaffected — it only cares about the channel `stop_channel_service` hang
path.

Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): strip every known backend marker in export, not just uploads

Codex round-3 review of 258ca800 and the matching maintainer feedback on
PR #3131 made the same point: the JSON export now ran the
Markdown-side sanitiser, but that sanitiser only stripped
`<uploaded_files>`. The full set of payloads middleware embeds inside
message `content` is larger:

- `<uploaded_files>` — `UploadsMiddleware`
- `<system-reminder>` — `DynamicContextMiddleware`
- `<memory>` — `DynamicContextMiddleware` (nested inside system-reminder)
- `<current_date>` — `DynamicContextMiddleware`

The primary protection is still `isHiddenFromUIMessage`: the
`<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never
reaches the formatter. This commit adds the second line of defence so a
regression that drops the `hide_from_ui` flag — or any future middleware
that injects the same tag vocabulary into a visible HumanMessage —
cannot leak the payload into the export file.

Concrete changes:

- New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)`
  helper in `core/messages/utils.ts`. The constant doubles as
  documentation for the backend↔frontend contract.
- `formatMessageContent` in `export.ts` now calls `stripInternalMarkers`
  instead of `stripUploadedFilesTag`. UI render paths
  (`message-list-item.tsx`) keep using the narrower function so a user
  legitimately typing `<memory>` in a meta-discussion is preserved.
- The "drop empty rows" guard in `buildJSONMessage` switched from
  `=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when
  `extractReasoningContentFromMessage` returned the empty string (which it
  legitimately can), the JSON path emitted `{reasoning: ""}` while the
  Markdown path's `!reasoning` `continue` correctly dropped the row.

New regression tests cover the defence-in-depth strip with a
`<system-reminder><memory><current_date>` payload deliberately *not*
marked `hide_from_ui`; tool-message sanitization under
`includeToolMessages: true`; the mixed-content-array case
(`thinking + text + image_url`); and the opted-in empty-reasoning drop.

Live verification on a real Ultra-mode thread that uploaded a PDF
(`曾鑫民-薪资交易流水.pdf`): backend state's first HumanMessage carries the
`<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part
of a content-array. The Markdown and JSON export blobs both come back
free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`,
`tool_calls`, and reasoning, while preserving the user's `这是什么 ?`
("What is this?") prompt and the assistant's visible answer.

Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(frontend): cover trim, varied N, and pre-execution Error: prefixes

Codex round-3 review of 50e2c257 flagged three coverage gaps in the
subtask-status parser:

1. `Task cancelled by user.` and `Task polling timed out` previously had
   no whitespace-trim coverage — the original trim test only exercised
   the success prefix. Streaming chunks can arrive with leading/trailing
   newlines; the regex needed an explicit assertion.
2. The polling-timeout case was tested only at one `N` (15 minutes). The
   backend interpolates the live `timeout_seconds // 60` value, so the
   matcher must hold for any positive integer. Now we run the case for
   1, 5, and 60 minutes.
3. `task_tool.py` also emits three `Error:` strings for pre-execution
   failures — unknown subagent type, host-bash disabled, and "task
   disappeared from background tasks". They are intentionally handled by
   `ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper
   already produces the right terminal-failed shape) but had no test
   coverage proving that wiring. Codex was right that a refactor splitting
   one of them off into its own prefix would silently break things.

The JSDoc on the constants block now spells the three pre-execution
errors out so the relationship between `task_tool.py` returns and the
prefix vocabulary is explicit.

No production code change beyond the docstring — this commit is pure
coverage hardening for the contract that already exists.

Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review
2026-05-21 21:18:10 +08:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

DeerFlow is a LangGraph-based AI super agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution, persistent memory, subagent delegation, and extensible tool integration - all operating in per-thread isolated environments.

Architecture:

  • Gateway API (port 8001): REST API plus embedded LangGraph-compatible agent runtime
  • Frontend (port 3000): Next.js web interface
  • Nginx (port 2026): Unified reverse proxy entry point
  • Provisioner (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode

Runtime:

  • make dev, Docker dev, and production all run the agent runtime in Gateway via RunManager + run_agent() + StreamBridge (packages/harness/deerflow/runtime/). Nginx exposes that runtime at /api/langgraph/* and rewrites it to Gateway's native /api/* routers.

Project Structure:

deer-flow/
├── Makefile                    # Root commands (check, install, dev, stop)
├── config.yaml                 # Main application configuration
├── extensions_config.json      # MCP servers and skills configuration
├── backend/                    # Backend application (this directory)
│   ├── Makefile               # Backend-only commands (dev, gateway, lint)
│   ├── langgraph.json         # LangGraph Studio graph configuration
│   ├── packages/
│   │   └── harness/           # deerflow-harness package (import: deerflow.*)
│   │       ├── pyproject.toml
│   │       └── deerflow/
│   │           ├── agents/            # LangGraph agent system
│   │           │   ├── lead_agent/    # Main agent (factory + system prompt)
│   │           │   ├── middlewares/   # Middleware components (see Middleware Chain)
│   │           │   ├── memory/        # Memory extraction, queue, prompts
│   │           │   └── thread_state.py # ThreadState schema
│   │           ├── sandbox/           # Sandbox execution system
│   │           │   ├── local/         # Local filesystem provider
│   │           │   ├── sandbox.py     # Abstract Sandbox interface
│   │           │   ├── tools.py       # bash, ls, read/write/str_replace
│   │           │   └── middleware.py  # Sandbox lifecycle management
│   │           ├── subagents/         # Subagent delegation system
│   │           │   ├── builtins/      # general-purpose, bash agents
│   │           │   ├── executor.py    # Background execution engine
│   │           │   └── registry.py    # Agent registry
│   │           ├── tools/builtins/    # Built-in tools (present_files, ask_clarification, view_image)
│   │           ├── mcp/               # MCP integration (tools, cache, client)
│   │           ├── models/            # Model factory with thinking/vision support
│   │           ├── skills/            # Skills discovery, loading, parsing
│   │           ├── config/            # Configuration system (app, model, sandbox, tool, etc.)
│   │           ├── community/         # Community tools (tavily, jina_ai, firecrawl, image_search, aio_sandbox)
│   │           ├── reflection/        # Dynamic module loading (resolve_variable, resolve_class)
│   │           ├── utils/             # Utilities (network, readability)
│   │           └── client.py          # Embedded Python client (DeerFlowClient)
│   ├── app/                   # Application layer (import: app.*)
│   │   ├── gateway/           # FastAPI Gateway API
│   │   │   ├── app.py         # FastAPI application
│   │   │   └── routers/       # FastAPI route modules (models, mcp, memory, skills, uploads, threads, artifacts, agents, suggestions, channels)
│   │   └── channels/          # IM platform integrations
│   ├── tests/                 # Test suite
│   └── docs/                  # Documentation
├── frontend/                   # Next.js frontend application
└── skills/                     # Agent skills directory
    ├── public/                # Public skills (committed)
    └── custom/                # Custom skills (gitignored)

Important Development Guidelines

Documentation Update Policy

CRITICAL: Always update README.md and CLAUDE.md after every code change.

When making code changes, you MUST update the relevant documentation:

  • Update README.md for user-facing changes (features, setup, usage instructions)
  • Update CLAUDE.md for development changes (architecture, commands, workflows, internal systems)
  • Keep documentation synchronized with the codebase at all times
  • Ensure accuracy and timeliness of all documentation

Commands

Root directory (for full application):

make check      # Check system requirements
make install    # Install all dependencies (frontend + backend)
make dev        # Start all services (Gateway + Frontend + Nginx), with config.yaml preflight
make start      # Start production services locally
make stop       # Stop all services

Backend directory (for backend development only):

make install    # Install backend dependencies
make dev        # Run Gateway API with reload (port 8001)
make gateway    # Run Gateway API only (port 8001)
make test       # Run all backend tests
make lint       # Lint with ruff
make format     # Format code with ruff

Regression tests related to Docker/provisioner behavior:

  • tests/test_docker_sandbox_mode_detection.py (mode detection from config.yaml)
  • tests/test_provisioner_kubeconfig.py (kubeconfig file/directory handling)

Boundary check (harness → app import firewall):

  • tests/test_harness_boundary.py — ensures packages/harness/deerflow/ never imports from app.*

CI runs these regression tests for every pull request via .github/workflows/backend-unit-tests.yml.

Architecture

Harness / App Split

The backend is split into two layers with a strict dependency direction:

  • Harness (packages/harness/deerflow/): Publishable agent framework package (deerflow-harness). Import prefix: deerflow.*. Contains agent orchestration, tools, sandbox, models, MCP, skills, config — everything needed to build and run agents.
  • App (app/): Unpublished application code. Import prefix: app.*. Contains the FastAPI Gateway API and IM channel integrations (Feishu, Slack, Telegram, DingTalk).

Dependency rule: App imports deerflow, but deerflow never imports app. This boundary is enforced by tests/test_harness_boundary.py which runs in CI.

Import conventions:

# Harness internal
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model

# App internal
from app.gateway.app import app
from app.channels.service import start_channel_service

# App → Harness (allowed)
from deerflow.config import get_app_config

# Harness → App (FORBIDDEN — enforced by test_harness_boundary.py)
# from app.gateway.routers.uploads import ...  # ← will fail CI
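The dependency rule can be checked statically. The following is a minimal sketch of how a boundary check could scan one source file, assuming an AST-based approach. `find_app_imports` is a hypothetical helper; the actual tests/test_harness_boundary.py may be implemented differently.

```python
import ast


def find_app_imports(source: str) -> list[str]:
    """Return every module path imported from the `app` package in `source`."""
    offenders: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Handles `import app` and `import app.gateway.app`.
            for alias in node.names:
                if alias.name == "app" or alias.name.startswith("app."):
                    offenders.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import; relative imports stay inside the harness.
            module = node.module or ""
            if node.level == 0 and (module == "app" or module.startswith("app.")):
                offenders.append(module)
    return offenders
```

Because the check parses the syntax tree, a commented-out import (as in the FORBIDDEN example above) is not reported, and a module that merely starts with the letters `app` (for example `application`) is not flagged.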

Agent System

Lead Agent (packages/harness/deerflow/agents/lead_agent/agent.py):

  • Entry point: make_lead_agent(config: RunnableConfig) registered in langgraph.json
  • Dynamic model selection via create_chat_model() with thinking/vision support
  • Tools loaded via get_available_tools() - combines sandbox, built-in, MCP, community, and subagent tools
  • System prompt generated by apply_prompt_template() with skills, memory, and subagent instructions

ThreadState (packages/harness/deerflow/agents/thread_state.py):

  • Extends AgentState with: sandbox, thread_data, title, artifacts, todos, uploaded_files, viewed_images
  • Uses custom reducers: merge_artifacts (deduplicate), merge_viewed_images (merge/clear)
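As an illustration of what a deduplicating reducer looks like, here is a minimal sketch. It assumes artifacts are stored as a list of path strings and that first-seen order should be preserved; both are assumptions, and the real `merge_artifacts` in thread_state.py may differ. `SketchState` is a hypothetical stand-in for ThreadState.

```python
from typing import Annotated, Optional, TypedDict


def merge_artifacts(existing: Optional[list[str]], new: Optional[list[str]]) -> list[str]:
    """Combine both lists, keeping only the first occurrence of each entry."""
    combined = (existing or []) + (new or [])
    # dict.fromkeys preserves insertion order, so the result is deterministic.
    return list(dict.fromkeys(combined))


class SketchState(TypedDict, total=False):
    # A reducer is attached to a state field through Annotated metadata.
    artifacts: Annotated[list[str], merge_artifacts]
```

With a reducer in place, a node can return only the artifacts it produced and the state accumulates them without duplicates.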

Runtime Configuration (via config.configurable):

  • thinking_enabled - Enable model's extended thinking
  • model_name - Select specific LLM model
  • is_plan_mode - Enable TodoList middleware
  • subagent_enabled - Enable task delegation tool
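A small sketch of reading these flags from a run config follows. The four key names come from the list above; the helper name `read_runtime_flags` and the default values (everything off, no model override) are assumptions for illustration only.

```python
from typing import Any


def read_runtime_flags(config: dict[str, Any]) -> dict[str, Any]:
    """Extract the per-run flags from config["configurable"], with assumed defaults."""
    configurable = config.get("configurable") or {}
    return {
        "thinking_enabled": bool(configurable.get("thinking_enabled", False)),
        "model_name": configurable.get("model_name"),  # None means "no override" in this sketch
        "is_plan_mode": bool(configurable.get("is_plan_mode", False)),
        "subagent_enabled": bool(configurable.get("subagent_enabled", False)),
    }
```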

Middleware Chain

Lead-agent middlewares are assembled in strict append order across packages/harness/deerflow/agents/middlewares/tool_error_handling_middleware.py (build_lead_runtime_middlewares) and packages/harness/deerflow/agents/lead_agent/agent.py (_build_middlewares):

  1. ThreadDataMiddleware - Creates per-thread directories under the user's isolation scope (backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/{workspace,uploads,outputs}); resolves user_id via get_effective_user_id() (falls back to "default" in no-auth mode); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local thread directory
  2. UploadsMiddleware - Tracks and injects newly uploaded files into conversation
  3. SandboxMiddleware - Acquires sandbox, stores sandbox_id in state
  4. DanglingToolCallMiddleware - Injects placeholder ToolMessages for AIMessage tool_calls that lack responses (e.g., due to user interruption), including raw provider tool-call payloads preserved only in additional_kwargs["tool_calls"]
  5. LLMErrorHandlingMiddleware - Normalizes provider/model invocation failures into recoverable assistant-facing errors before later middleware/tool stages run
  6. GuardrailMiddleware - Pre-tool-call authorization via pluggable GuardrailProvider protocol (optional, if guardrails.enabled in config). Evaluates each tool call and returns error ToolMessage on deny. Three provider options: built-in AllowlistProvider (zero deps), OAP policy providers (e.g. aport-agent-guardrails), or custom providers. See docs/GUARDRAILS.md for setup, usage, and how to implement a provider.
  7. SandboxAuditMiddleware - Audits sandboxed shell/file operations for security logging before tool execution continues
  8. ToolErrorHandlingMiddleware - Converts tool exceptions into error ToolMessages so the run can continue instead of aborting
  9. SummarizationMiddleware - Context reduction when approaching token limits (optional, if enabled)
  10. TodoListMiddleware - Task tracking with write_todos tool (optional, if plan_mode)
  11. TokenUsageMiddleware - Records token usage metrics when token tracking is enabled (optional); subagent usage is cached by tool_call_id only while token usage is enabled and merged back into the dispatching AIMessage by message position rather than message id
  12. TitleMiddleware - Auto-generates thread title after first complete exchange and normalizes structured message content before prompting the title model
  13. MemoryMiddleware - Queues conversations for async memory update (filters to user + final AI responses)
  14. ViewImageMiddleware - Injects base64 image data before LLM call (conditional on vision support)
  15. DeferredToolFilterMiddleware - Hides deferred tool schemas from the bound model until tool search is enabled (optional)
  16. SubagentLimitMiddleware - Truncates excess task tool calls from model response to enforce MAX_CONCURRENT_SUBAGENTS limit (optional, if subagent_enabled)
  17. LoopDetectionMiddleware - Detects repeated tool-call loops; hard-stop responses clear both structured tool_calls and raw provider tool-call metadata before forcing a final text answer
  18. ClarificationMiddleware - Intercepts ask_clarification tool calls, interrupts via Command(goto=END) (must be last)
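To make one of these steps concrete, the sketch below shows the idea behind DanglingToolCallMiddleware using plain dictionaries instead of LangChain message objects. The function name, the message shape, and the placeholder text are all hypothetical, and the handling of raw payloads in additional_kwargs["tool_calls"] is not covered.

```python
PLACEHOLDER = "[Tool call was interrupted and did not return a result.]"


def patch_dangling_tool_calls(messages: list[dict]) -> list[dict]:
    """Insert a placeholder tool message after any AI tool call that has no response."""
    answered = {m.get("tool_call_id") for m in messages if m.get("type") == "tool"}
    patched: list[dict] = []
    for message in messages:
        patched.append(message)
        if message.get("type") != "ai":
            continue
        for call in message.get("tool_calls", []):
            if call["id"] not in answered:
                patched.append(
                    {"type": "tool", "tool_call_id": call["id"], "content": PLACEHOLDER}
                )
    return patched
```

Running the function on its own output changes nothing, since every call is answered after the first pass.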

Configuration System

Main Configuration (config.yaml):

Setup: Copy config.example.yaml to config.yaml in the project root directory.

Config Versioning: config.example.yaml has a config_version field. On startup, AppConfig.from_file() compares the user's version against the example version and emits a warning if the user's config is outdated. A missing config_version is treated as version 0. Run make config-upgrade to auto-merge missing fields. When changing the config schema, bump config_version in config.example.yaml.
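The version comparison can be sketched as follows, assuming both files have already been parsed into dictionaries. `is_config_outdated` is a hypothetical helper, not the code inside AppConfig.from_file().

```python
def is_config_outdated(user_config: dict, example_config: dict) -> bool:
    """Return True when the user's config_version is older than the example's."""
    # A missing (or null) config_version counts as version 0.
    user_version = int(user_config.get("config_version") or 0)
    example_version = int(example_config.get("config_version") or 0)
    return user_version < example_version
```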

Config Caching: get_app_config() caches the parsed config, but automatically reloads it when the resolved config path changes or the file's mtime increases. This keeps Gateway and LangGraph reads aligned with config.yaml edits without requiring a manual process restart.
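The caching rule (reload when the resolved path changes or the mtime increases) can be sketched as below. This is an illustration under stated assumptions: it reads JSON rather than YAML to stay dependency-free, and `load_config` with its module-level `_cache` is a hypothetical stand-in for get_app_config().

```python
import json
import os
from pathlib import Path

# Hypothetical module-level cache: last path, its mtime, and the parsed config.
_cache: dict = {"path": None, "mtime": None, "config": None}


def load_config(path: str) -> dict:
    """Return the cached config, re-reading the file only when it looks stale."""
    resolved = Path(path).resolve()
    mtime = os.stat(resolved).st_mtime
    stale = (
        _cache["config"] is None
        or _cache["path"] != resolved
        or mtime > _cache["mtime"]
    )
    if stale:
        _cache.update(path=resolved, mtime=mtime, config=json.loads(resolved.read_text()))
    return _cache["config"]
```

A design consequence of comparing mtimes is that an edit which does not advance the file's modification time is not picked up.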

Config Hot-Reload Boundary: Gateway dependencies route through get_app_config() on every request, so per-run fields like models[*].max_tokens, summarization.*, title.*, memory.*, subagents.*, tools[*], and the agent system prompt pick up config.yaml edits on the next message. AppConfig is intentionally not cached on app.state; lifespan() keeps a local startup_config variable for one-shot bootstrap work (logging level, channels, langgraph_runtime engines) and passes it explicitly to langgraph_runtime(app, startup_config). Infrastructure fields are restart-required:

| Field | Why a restart is required |
| --- | --- |
| database.* | init_engine_from_config() runs once during langgraph_runtime() startup; the SQLAlchemy engine holds the connection pool. |
| checkpointer.* (including SQLite WAL/journal settings) | make_checkpointer() binds the persistent checkpointer once at startup. |
| run_events.* | make_run_event_store() selects memory- vs. SQL-backed implementation at startup. |
| stream_bridge.* | make_stream_bridge() constructs the bridge object once. |
| sandbox.use | get_sandbox_provider() caches the provider singleton (_default_sandbox_provider); a new class path takes effect only on next process start. |
| log_level | apply_logging_level() is called only in app.py startup; it mutates the root logger's level, and get_app_config() returning a fresh AppConfig does not retrigger it. |
| channels.* IM platform credentials | start_channel_service() is invoked once during startup; live channels are not rebuilt on config change. |
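The split between live and startup-frozen values can be sketched with plain objects. The names `RunContext`, `get_run_context`, `run_events_config`, and `run_event_store` appear in the project; the shapes used here (dictionaries, a string standing in for the store, `build_app_state`) are hypothetical and only illustrate the pairing.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Callable


@dataclass(frozen=True)
class RunContext:
    app_config: dict          # resolved live on every request
    run_events_config: dict   # frozen at startup
    event_store: object       # built once from the startup config


def build_app_state(startup_config: dict) -> SimpleNamespace:
    """Freeze the infrastructure pieces next to each other at startup."""
    run_events = dict(startup_config["run_events"])
    return SimpleNamespace(
        run_events_config=run_events,
        run_event_store=f"store:{run_events['backend']}",  # stand-in for a real store
    )


def get_run_context(state: SimpleNamespace, get_app_config: Callable[[], dict]) -> RunContext:
    """Pair the live AppConfig with the startup-bound store and its matching config."""
    return RunContext(
        app_config=get_app_config(),
        run_events_config=state.run_events_config,
        event_store=state.run_event_store,
    )
```

Reading `run_events_config` from the same startup-time source as the store keeps the two from referring to different backends after a config.yaml edit.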

Configuration priority:

  1. Explicit config_path argument
  2. DEER_FLOW_CONFIG_PATH environment variable
  3. config.yaml in current directory (backend/)
  4. config.yaml in parent directory (project root - recommended location)
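The lookup order above can be sketched as a small resolver. `resolve_config_path` is a hypothetical helper, and raising FileNotFoundError when nothing matches is an assumption of this sketch, not a documented behaviour.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    config_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Path:
    """Apply the four-step priority: argument, env var, cwd, then parent directory."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    if config_path:
        return Path(config_path)
    env_path = env.get("DEER_FLOW_CONFIG_PATH")
    if env_path:
        return Path(env_path)
    for candidate in (cwd / "config.yaml", cwd.parent / "config.yaml"):
        if candidate.exists():
            return candidate
    raise FileNotFoundError("config.yaml not found in the current or parent directory")
```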

Config values starting with $ are resolved as environment variables (e.g., $OPENAI_API_KEY). ModelConfig also declares use_responses_api and output_version so OpenAI /v1/responses can be enabled explicitly while still using langchain_openai:ChatOpenAI.
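A minimal sketch of the `$` substitution follows, assuming it is applied recursively to nested dictionaries and lists. `resolve_env_refs` is a hypothetical helper, and raising ValueError for an unset variable is an assumption; the real loader may behave differently.

```python
import os
from typing import Any, Mapping, Optional


def resolve_env_refs(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Replace string values of the form "$NAME" with the NAME environment variable."""
    env = os.environ if env is None else env
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in env:
            raise ValueError(f"Environment variable {name} is not set")
        return env[name]
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, env) for item in value]
    return value
```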

Extensions Configuration (extensions_config.json):

MCP servers and skills are configured together in extensions_config.json in the project root.

Configuration priority:

  1. Explicit config_path argument
  2. DEER_FLOW_EXTENSIONS_CONFIG_PATH environment variable
  3. extensions_config.json in current directory (backend/)
  4. extensions_config.json in parent directory (project root - recommended location)

Gateway API (app/gateway/)

FastAPI application on port 8001 with health check at GET /health. Set GATEWAY_ENABLE_DOCS=false to disable /docs, /redoc, and /openapi.json in production (default: enabled).

CORS is same-origin by default when requests enter through nginx on port 2026. Split-origin or port-forwarded browser clients must opt in with GATEWAY_CORS_ORIGINS (comma-separated exact origins); Gateway CORSMiddleware and CSRFMiddleware both read that variable so browser CORS and auth-origin checks stay aligned.
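Parsing a comma-separated list of exact origins can be sketched as below. Both function names are hypothetical; the sketch only shows why sharing one parsed list keeps the CORS and CSRF checks aligned.

```python
from typing import Optional


def parse_cors_origins(raw: Optional[str]) -> list[str]:
    """Split GATEWAY_CORS_ORIGINS into exact origins, ignoring blanks and padding."""
    if not raw:
        return []
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Exact string match only: no wildcards or prefix matching in this sketch."""
    return origin in allowed
```

An unset or empty variable yields an empty list, which corresponds to the same-origin default described above.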

Routers:

| Router | Endpoints |
|---|---|
| Models (/api/models) | GET / - list models; GET /{name} - model details |
| MCP (/api/mcp) | GET /config - get config; PUT /config - update config (saves to extensions_config.json) |
| Skills (/api/skills) | GET / - list skills; GET /{name} - details; PUT /{name} - update enabled; POST /install - install from .skill archive (accepts standard optional frontmatter like version, author, compatibility) |
| Memory (/api/memory) | GET / - memory data; POST /reload - force reload; GET /config - config; GET /status - config + data |
| Uploads (/api/threads/{id}/uploads) | POST / - upload files (auto-converts PDF/PPT/Excel/Word); GET /list - list; DELETE /{filename} - delete |
| Threads (/api/threads/{id}) | DELETE / - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
| Artifacts (/api/threads/{id}/artifacts) | GET /{path} - serve artifacts; active content types (text/html, application/xhtml+xml, image/svg+xml) are always forced as download attachments to reduce XSS risk; ?download=true still forces download for other file types |
| Suggestions (/api/threads/{id}/suggestions) | POST / - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
| Thread Runs (/api/threads/{id}/runs) | POST / - create background run; POST /stream - create + SSE stream; POST /wait - create + block; GET / - list runs; GET /{rid} - run details; POST /{rid}/cancel - cancel; GET /{rid}/join - join SSE; GET /{rid}/messages - paginated messages {data, has_more}; GET /{rid}/events - full event stream; GET /../messages - thread messages with feedback; GET /../token-usage - aggregate tokens |
| Feedback (/api/threads/{id}/runs/{rid}/feedback) | PUT / - upsert feedback; DELETE / - delete user feedback; POST / - create feedback; GET / - list feedback; GET /stats - aggregate stats; DELETE /{fid} - delete specific |
| Runs (/api/runs) | POST /stream - stateless run + SSE; POST /wait - stateless run + block; GET /{rid}/messages - paginated messages by run_id {data, has_more} (cursor: after_seq/before_seq); GET /{rid}/feedback - list feedback by run_id |
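The Artifacts rule above (active content types are always downloaded) can be illustrated with a small decision function. This is a sketch with a hypothetical name; returning `inline` for everything else is an assumption about the default.

```python
ACTIVE_CONTENT_TYPES = {"text/html", "application/xhtml+xml", "image/svg+xml"}


def content_disposition(mime_type, download=False):
    """Pick a Content-Disposition kind for an artifact response."""
    # Ignore parameters such as "; charset=utf-8" when classifying the type.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in ACTIVE_CONTENT_TYPES or download:
        return "attachment"
    return "inline"  # assumption: other types are served inline by default
```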

RunManager / RunStore contract:

  • RunManager.get() is async; direct callers must await it.
  • When a persistent RunStore is configured, get() and list_by_thread() hydrate historical runs from the store. In-memory records win for the same run_id so task, abort, and stream-control state stays attached to active local runs.
  • cancel() and create_or_reject(..., multitask_strategy="interrupt"|"rollback") persist interrupted status through RunStore.update_status(), matching normal set_status() transitions.
  • Store-only hydrated runs are readable history. If the current worker has no in-memory task/control state for that run, cancellation APIs can return 409 because this worker cannot stop the task.
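A minimal sketch of the hydration rule above, assuming runs are plain dicts and using a hypothetical `InMemoryStore` in place of a real RunStore. It shows only the merge order (in-memory wins) and why a store-only run cannot be cancelled locally.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for a persistent RunStore."""

    def __init__(self, rows):
        self._rows = rows

    async def list_by_thread(self, thread_id):
        return [r for r in self._rows if r["thread_id"] == thread_id]


async def list_runs(thread_id, local_runs, store):
    """Merge stored history with in-memory runs; in-memory wins per run_id."""
    merged = {r["run_id"]: r for r in await store.list_by_thread(thread_id)}
    for run in local_runs.values():
        if run["thread_id"] == thread_id:
            merged[run["run_id"]] = run  # keeps task/abort state attached
    return list(merged.values())


def can_cancel(run):
    """Only runs with local task state can be stopped by this worker."""
    return run.get("task") is not None
```

In a Gateway handler, a `False` result from a check like `can_cancel` would correspond to the 409 response described above.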

Proxied through nginx: /api/langgraph/* → Gateway LangGraph-compatible runtime, all other /api/* → Gateway REST APIs.

Sandbox System (packages/harness/deerflow/sandbox/)

Interface: Abstract Sandbox with execute_command, read_file, write_file, list_dir

Provider Pattern: SandboxProvider with acquire, acquire_async, get, release lifecycle. Async agent/tool paths call async sandbox lifecycle hooks so Docker sandbox creation, discovery, cross-process locking, readiness polling, and release stay off the event loop.

Implementations:

  • LocalSandboxProvider - Local filesystem execution. acquire(thread_id) returns a per-thread LocalSandbox (id local:{thread_id}) whose path_mappings resolve /mnt/user-data/{workspace,uploads,outputs} and /mnt/acp-workspace to that thread's host directories, so the public Sandbox API honours the /mnt/user-data contract uniformly with AIO. acquire() / acquire(None) keeps the legacy generic singleton (id local) for callers without a thread context. Per-thread sandboxes are held in an LRU cache (default 256 entries) guarded by a threading.Lock.
  • AioSandboxProvider (packages/harness/deerflow/community/) - Docker-based isolation
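The per-thread LRU cache described for LocalSandboxProvider might look roughly like this. `SandboxCache` is a hypothetical class that stores only sandbox ids; the real provider holds LocalSandbox objects with path mappings.

```python
import threading
from collections import OrderedDict


class SandboxCache:
    """LRU cache of per-thread sandbox ids, guarded by a lock."""

    def __init__(self, max_entries=256):
        self._max = max_entries
        self._items = OrderedDict()
        self._lock = threading.Lock()

    def acquire(self, thread_id=None):
        if thread_id is None:
            return "local"  # legacy generic singleton id
        with self._lock:
            if thread_id in self._items:
                self._items.move_to_end(thread_id)  # mark as recently used
            else:
                self._items[thread_id] = f"local:{thread_id}"
                if len(self._items) > self._max:
                    self._items.popitem(last=False)  # evict least recently used
            return self._items[thread_id]

    def thread_ids(self):
        with self._lock:
            return list(self._items)
```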

Virtual Path System:

  • Agent sees: /mnt/user-data/{workspace,uploads,outputs}, /mnt/skills
  • Physical: backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/..., deer-flow/skills/
  • Translation: LocalSandboxProvider builds per-thread PathMappings for the user-data prefixes at acquire time; tools.py keeps replace_virtual_path() / replace_virtual_paths_in_command() as a defense-in-depth layer (and for path validation). AIO has the directories volume-mounted at the same virtual paths inside its container, so both implementations accept /mnt/user-data/... natively.
  • Detection: is_local_sandbox() accepts both sandbox_id == "local" (legacy / no-thread) and sandbox_id.startswith("local:") (per-thread)
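To make the translation and detection rules concrete, here is a sketch under stated assumptions: `to_host_path` is a hypothetical helper, the mapping is a plain dict of virtual prefix to host directory, and `is_local_sandbox` is reimplemented from the description above rather than copied from the codebase.

```python
def is_local_sandbox(sandbox_id):
    """True for the legacy id "local" and for per-thread ids "local:<thread>"."""
    return sandbox_id == "local" or sandbox_id.startswith("local:")


def to_host_path(virtual_path, mappings):
    """Translate a virtual path using the longest matching prefix."""
    for prefix in sorted(mappings, key=len, reverse=True):
        # Match whole path segments only, so /mnt/a does not match /mnt/ab.
        if virtual_path == prefix or virtual_path.startswith(prefix + "/"):
            return mappings[prefix] + virtual_path[len(prefix):]
    return virtual_path  # not a mapped path: leave unchanged
```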

Sandbox Tools (in packages/harness/deerflow/sandbox/tools.py):

  • bash - Execute commands with path translation and error handling
  • ls - Directory listing (tree format, max 2 levels)
  • read_file - Read file contents with optional line range
  • write_file - Write/append to files, creates directories; overwrites by default and exposes the append argument in the model-facing schema for end-of-file writes
  • str_replace - Substring replacement (single or all occurrences); same-path serialization is scoped to (sandbox.id, path) so isolated sandboxes do not contend on identical virtual paths inside one process
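The `(sandbox.id, path)` lock scope for str_replace can be sketched as a small lock registry. The names below are hypothetical, and raising `ValueError` for a missing substring is an assumption about error handling.

```python
import threading
from collections import defaultdict

_registry_lock = threading.Lock()
_path_locks = defaultdict(threading.Lock)


def lock_for(sandbox_id, path):
    """Return the lock shared by all writers of one path in one sandbox."""
    with _registry_lock:
        return _path_locks[(sandbox_id, path)]


def str_replace(text, old, new, replace_all=False):
    """Replace the first occurrence of old, or every occurrence if requested."""
    if old not in text:
        raise ValueError("substring not found")  # assumption: missing text is an error
    return text.replace(old, new) if replace_all else text.replace(old, new, 1)
```

Keying on the sandbox id as well as the path means two threads editing `/mnt/user-data/workspace/a.txt` in different sandboxes never wait on each other.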

Subagent System (packages/harness/deerflow/subagents/)

Built-in Agents: general-purpose (all tools except task) and bash (command specialist)

Execution: Dual thread pool - _scheduler_pool (3 workers) + _execution_pool (3 workers)

Concurrency: MAX_CONCURRENT_SUBAGENTS = 3 enforced by SubagentLimitMiddleware (truncates excess tool calls in after_model), 15-minute timeout

Flow: task() tool → SubagentExecutor → background thread → poll 5s → SSE events → result

Events: task_started, task_running, task_completed/task_failed/task_timed_out
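The concurrency limit described above amounts to dropping `task` tool calls beyond the cap while leaving other tool calls alone. A sketch, assuming tool calls are dicts with a `name` key and using a hypothetical function name:

```python
MAX_CONCURRENT_SUBAGENTS = 3


def truncate_task_calls(tool_calls, limit=MAX_CONCURRENT_SUBAGENTS):
    """Keep at most `limit` task calls; other tool calls pass through."""
    kept, seen = [], 0
    for call in tool_calls:
        if call.get("name") == "task":
            seen += 1
            if seen > limit:
                continue  # excess subagent delegation is dropped
        kept.append(call)
    return kept
```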

Tool System (packages/harness/deerflow/tools/)

get_available_tools(groups, include_mcp, model_name, subagent_enabled) assembles:

  1. Config-defined tools - Resolved from config.yaml via resolve_variable()
  2. MCP tools - From enabled MCP servers (lazy initialized, cached with mtime invalidation)
  3. Built-in tools:
    • present_files - Make output files visible to user (only /mnt/user-data/outputs)
    • ask_clarification - Request clarification (intercepted by ClarificationMiddleware → interrupts)
    • view_image - Read image as base64 (added only if model supports vision)
    • setup_agent - Bootstrap-only: persist a brand-new custom agent's SOUL.md and config.yaml. Bound only when is_bootstrap=True.
    • update_agent - Custom-agent-only: persist self-updates to the current agent's SOUL.md / config.yaml from inside a normal chat (partial update + atomic write). Bound when agent_name is set and is_bootstrap=False.
  4. Subagent tool (if enabled):
    • task - Delegate to subagent (description, prompt, subagent_type)

Community tools (packages/harness/deerflow/community/):

  • tavily/ - Web search (5 results default) and web fetch (4KB limit)
  • jina_ai/ - Web fetch via Jina reader API with readability extraction
  • firecrawl/ - Web scraping via Firecrawl API
  • image_search/ - Image search via DuckDuckGo

ACP agent tools:

  • invoke_acp_agent - Invokes external ACP-compatible agents from config.yaml
  • ACP launchers must be real ACP adapters. The standard codex CLI is not ACP-compatible by itself; configure a wrapper such as npx -y @zed-industries/codex-acp or an installed codex-acp binary
  • Missing ACP executables now return an actionable error message instead of a raw [Errno 2]
  • Each ACP agent uses a per-thread workspace at {base_dir}/users/{user_id}/threads/{thread_id}/acp-workspace/. The workspace is accessible to the lead agent via the virtual path /mnt/acp-workspace/ (read-only). In docker sandbox mode, the directory is volume-mounted into the container at /mnt/acp-workspace (read-only); in local sandbox mode, path translation is handled by tools.py

MCP System (packages/harness/deerflow/mcp/)

  • Uses langchain-mcp-adapters MultiServerMCPClient for multi-server management
  • Lazy initialization: Tools loaded on first use via get_cached_mcp_tools()
  • Cache invalidation: Detects config file changes via mtime comparison
  • Transports: stdio (command-based), SSE, HTTP
  • OAuth (HTTP/SSE): Supports token endpoint flows (client_credentials, refresh_token) with automatic token refresh + Authorization header injection
  • Runtime updates: Gateway API saves to extensions_config.json; LangGraph detects via mtime
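The mtime-based invalidation above follows a common pattern, sketched here with a hypothetical `MtimeCache` class. The `getmtime` parameter exists only so the example can be exercised without touching the filesystem; it is not part of any DeerFlow API.

```python
import os


class MtimeCache:
    """Reload a value whenever the backing file's mtime changes."""

    def __init__(self, path, loader, getmtime=os.path.getmtime):
        self._path = path
        self._loader = loader
        self._getmtime = getmtime
        self._mtime = None
        self._value = None

    def get(self):
        mtime = self._getmtime(self._path)
        if self._mtime is None or mtime != self._mtime:
            self._value = self._loader(self._path)  # first use or file changed
            self._mtime = mtime
        return self._value
```

Because the check happens on read, a process that did not write the file (such as the runtime reading a file saved by the Gateway API) still picks up the change on its next access.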

Skills System (packages/harness/deerflow/skills/)

  • Location: deer-flow/skills/{public,custom}/
  • Format: Directory with SKILL.md (YAML frontmatter: name, description, license, allowed-tools)
  • Loading: load_skills() recursively scans skills/{public,custom} for SKILL.md, parses metadata, and reads enabled state from extensions_config.json
  • Injection: Enabled skills listed in agent system prompt with container paths
  • Installation: POST /api/skills/install extracts .skill ZIP archive to custom/ directory

Model Factory (packages/harness/deerflow/models/factory.py)

  • create_chat_model(name, thinking_enabled) instantiates LLM from config via reflection
  • Supports thinking_enabled flag with per-model when_thinking_enabled overrides
  • Supports vLLM-style thinking toggles via when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking for Qwen reasoning models, while normalizing legacy thinking configs for backward compatibility
  • Supports supports_vision flag for image understanding models
  • Config values starting with $ resolved as environment variables
  • Missing provider modules surface actionable install hints from reflection resolvers (for example uv add langchain-google-genai)

vLLM Provider (packages/harness/deerflow/models/vllm_provider.py)

  • VllmChatModel subclasses langchain_openai:ChatOpenAI for vLLM 0.19.0 OpenAI-compatible endpoints
  • Preserves vLLM's non-standard assistant reasoning field on full responses, streaming deltas, and follow-up tool-call turns
  • Designed for configs that enable thinking through extra_body.chat_template_kwargs.enable_thinking on vLLM 0.19.0 Qwen reasoning models, while accepting the older thinking alias

IM Channels System (app/channels/)

Bridges external messaging platforms (Feishu, Slack, Telegram, DingTalk) to the DeerFlow agent via the LangGraph Server.

Architecture: Channels communicate with Gateway through the langgraph-sdk HTTP client (same as the frontend), ensuring threads are created and managed server-side. The internal SDK client injects process-local internal auth plus a matching CSRF cookie/header pair so Gateway accepts state-changing thread/run requests from channel workers without relying on browser session cookies.

Components:

  • message_bus.py - Async pub/sub hub (InboundMessage → queue → dispatcher; OutboundMessage → callbacks → channels)
  • store.py - JSON-file persistence mapping channel_name:chat_id[:topic_id] → thread_id (keys are channel:chat for root conversations and channel:chat:topic for threaded conversations)
  • manager.py - Core dispatcher: creates threads via client.threads.create(), routes commands, keeps Slack/Telegram on client.runs.wait(), and uses client.runs.stream(["messages-tuple", "values"]) for Feishu incremental outbound updates
  • base.py - Abstract Channel base class (start/stop/send lifecycle)
  • service.py - Manages lifecycle of all configured channels from config.yaml
  • slack.py / feishu.py / telegram.py / dingtalk.py - Platform-specific implementations (feishu.py tracks the running card message_id in memory and patches the same card in place; dingtalk.py optionally uses AI Card streaming for in-place updates when card_template_id is configured)
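The key format used by store.py can be illustrated with a small helper. `store_key` is a hypothetical name, and the sketch assumes the ids themselves contain no `:` characters.

```python
def store_key(channel_name, chat_id, topic_id=None):
    """Build channel:chat for root chats, channel:chat:topic for threads."""
    parts = [channel_name, str(chat_id)]
    if topic_id is not None:
        parts.append(str(topic_id))
    return ":".join(parts)
```

A JSON file keyed this way maps each conversation, or each topic inside one, to its own DeerFlow thread_id.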

Message Flow:

  1. External platform -> Channel impl -> MessageBus.publish_inbound()
  2. ChannelManager._dispatch_loop() consumes from queue
  3. For chat: look up/create thread through Gateway's LangGraph-compatible API
  4. Feishu chat: runs.stream() → accumulate AI text → publish multiple outbound updates (is_final=False) → publish final outbound (is_final=True)
  5. Slack/Telegram chat: runs.wait() → extract final response → publish outbound
  6. Feishu channel sends one running reply card up front, then patches the same card for each outbound update (card JSON sets config.update_multi=true for Feishu's patch API requirement)
  7. DingTalk AI Card mode (when card_template_id configured): runs.stream() → create card with initial text → stream updates via PUT /v1.0/card/streaming → finalize on is_final=True. Falls back to sampleMarkdown if card creation or streaming fails
  8. For commands (/new, /status, /models, /memory, /help): handle locally or query Gateway API
  9. Outbound → channel callbacks → platform reply

Configuration (config.yaml -> channels):

  • langgraph_url - LangGraph-compatible Gateway API base URL (default: http://localhost:8001/api)
  • gateway_url - Gateway API URL for auxiliary commands (default: http://localhost:8001)
  • In Docker Compose, IM channels run inside the gateway container, so localhost points back to that container. Use http://gateway:8001/api for langgraph_url and http://gateway:8001 for gateway_url, or set DEER_FLOW_CHANNELS_LANGGRAPH_URL / DEER_FLOW_CHANNELS_GATEWAY_URL.
  • Per-channel configs: feishu (app_id, app_secret), slack (bot_token, app_token), telegram (bot_token), dingtalk (client_id, client_secret, optional card_template_id for AI Card streaming)

Memory System (packages/harness/deerflow/agents/memory/)

Components:

  • updater.py - LLM-based memory updates with fact extraction, whitespace-normalized fact deduplication (trims leading/trailing whitespace before comparing), and atomic file I/O
  • queue.py - Debounced update queue (per-thread deduplication, configurable wait time); captures user_id at enqueue time so it survives the threading.Timer boundary
  • prompt.py - Prompt templates for memory updates
  • storage.py - File-based storage with per-user isolation; cache keyed by (user_id, agent_name) tuple

Per-User Isolation:

  • Memory is stored per-user at {base_dir}/users/{user_id}/memory.json
  • Per-agent per-user memory at {base_dir}/users/{user_id}/agents/{agent_name}/memory.json
  • Custom agent definitions (SOUL.md + config.yaml) are also per-user at {base_dir}/users/{user_id}/agents/{agent_name}/. The legacy shared layout {base_dir}/agents/{agent_name}/ remains read-only fallback for unmigrated installations
  • user_id is resolved via get_effective_user_id() from deerflow.runtime.user_context
  • In no-auth mode, user_id defaults to "default" (constant DEFAULT_USER_ID)
  • Absolute storage_path in config opts out of per-user isolation
  • Migration: Run PYTHONPATH=. python scripts/migrate_user_isolation.py to move legacy memory.json, threads/, and agents/ into per-user layout. Supports --dry-run (preview changes) and --user-id USER_ID (assign unowned legacy data to a user, defaults to default).

Data Structure (stored in {base_dir}/users/{user_id}/memory.json):

  • User Context: workContext, personalContext, topOfMind (1-3 sentence summaries)
  • History: recentMonths, earlierContext, longTermBackground
  • Facts: Discrete facts with id, content, category (preference/knowledge/context/behavior/goal), confidence (0-1), createdAt, source

Workflow:

  1. MemoryMiddleware filters messages (user inputs + final AI responses), captures user_id via get_effective_user_id(), and queues conversation with the captured user_id
  2. Queue debounces (30s default), batches updates, deduplicates per-thread
  3. Background thread invokes LLM to extract context updates and facts, using the stored user_id (not the contextvar, which is unavailable on timer threads)
  4. Applies updates atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
  5. Next interaction injects top 15 facts + context into <memory> tags in system prompt
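Steps 3 and 4 of the workflow rely on two small techniques: whitespace-normalized deduplication and an atomic temp-file-plus-rename write. The sketch below shows both with hypothetical function names; facts are assumed to be dicts with a `content` key.

```python
import json
import os
import tempfile


def append_new_facts(existing, candidates):
    """Append candidates whose trimmed content is not already stored."""
    seen = {fact["content"].strip() for fact in existing}
    merged = list(existing)
    for fact in candidates:
        key = fact["content"].strip()
        if key and key not in seen:
            seen.add(key)
            merged.append(fact)
    return merged


def atomic_write_json(path, data):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Creating the temp file next to the target keeps both on one filesystem, which is what makes the rename atomic.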

Focused regression coverage for the updater lives in backend/tests/test_memory_updater.py.

Configuration (config.yaml -> memory):

  • enabled / injection_enabled - Master switches
  • storage_path - Path to memory.json (absolute path opts out of per-user isolation)
  • debounce_seconds - Wait time before processing (default: 30)
  • model_name - LLM for updates (null = default model)
  • max_facts / fact_confidence_threshold - Fact storage limits (100 / 0.7)
  • max_injection_tokens - Token limit for prompt injection (2000)

Reflection System (packages/harness/deerflow/reflection/)

  • resolve_variable(path) - Import module and return variable (e.g., module.path:variable_name)
  • resolve_class(path, base_class) - Import and validate class against base class
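A minimal sketch of how such `module.path:variable_name` resolvers can be built on `importlib`. It reuses the function names from the list above for readability, but it is not the DeerFlow implementation: error types and messages are assumptions, and the install hints mentioned elsewhere are omitted.

```python
import importlib


def resolve_variable(path):
    """Import "module.path:variable_name" and return the named attribute."""
    module_path, _, name = path.partition(":")
    if not module_path or not name:
        raise ValueError(f"expected 'module.path:variable_name', got {path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, name)


def resolve_class(path, base_class):
    """Resolve a class and check that it subclasses base_class."""
    cls = resolve_variable(path)
    if not (isinstance(cls, type) and issubclass(cls, base_class)):
        raise TypeError(f"{path!r} is not a subclass of {base_class.__name__}")
    return cls
```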

Tracing System (packages/harness/deerflow/tracing/)

LangSmith and Langfuse are both supported. The wiring lives in two layers:

  • factory.py::build_tracing_callbacks() — returns the LangChain CallbackHandler list for the providers currently enabled via env vars (LANGSMITH_TRACING, LANGFUSE_TRACING, etc.). The handlers are attached at the graph invocation root for in-graph runs (make_lead_agent and DeerFlowClient.stream both append them to config["callbacks"] before invoking the graph) so a single run produces one trace with all node / LLM / tool calls as child spans. Standalone callers — anything that invokes a model outside such a graph (e.g. MemoryUpdater) — keep create_chat_model's default attach_tracing=True, which falls back to model-level callback attachment.
  • metadata.py::build_langfuse_trace_metadata() — builds the Langfuse-reserved trace attributes for RunnableConfig.metadata. The Langfuse v4 langchain.CallbackHandler lifts these onto the root trace (see its _parse_langfuse_trace_attributes), but only when it sees on_chain_start(parent_run_id=None) — which is why the callbacks have to live at the graph root, not the model.

Trace-attribute injection points: both runtime/runs/worker.py::run_agent (gateway path) and client.py::DeerFlowClient.stream (embedded path) merge the metadata into config["metadata"] right before constructing the graph. Caller-supplied keys win via setdefault, so an external session_id override is preserved. Field mapping:

| Langfuse field | Source |
|---|---|
| langfuse_session_id | LangGraph thread_id |
| langfuse_user_id | get_effective_user_id() (default in no-auth) |
| langfuse_trace_name | RunRecord.assistant_id / client agent_name (defaults to lead-agent) |
| langfuse_tags | env:<DEER_FLOW_ENV> + model:<model_name> |

Returns {} when Langfuse is not in the enabled providers — LangSmith-only deployments are unaffected. Set DEER_FLOW_ENV (or ENVIRONMENT) to tag traces by deployment environment. Tests live in tests/test_tracing_factory.py, tests/test_tracing_metadata.py, tests/test_worker_langfuse_metadata.py, and tests/test_client_langfuse_metadata.py.
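The field mapping and the "caller-supplied keys win" rule can be sketched as follows. Both function names and the parameter list are hypothetical; the signature of the real build_langfuse_trace_metadata() is not shown in this document.

```python
def sketch_trace_metadata(thread_id, user_id, trace_name="lead-agent",
                          env=None, model_name=None):
    """Build the Langfuse trace attributes listed in the field mapping."""
    tags = []
    if env:
        tags.append(f"env:{env}")
    if model_name:
        tags.append(f"model:{model_name}")
    return {
        "langfuse_session_id": thread_id,
        "langfuse_user_id": user_id,
        "langfuse_trace_name": trace_name,
        "langfuse_tags": tags,
    }


def merge_trace_metadata(config, metadata):
    """Merge into config["metadata"] without overriding caller-supplied keys."""
    target = config.setdefault("metadata", {})
    for key, value in metadata.items():
        target.setdefault(key, value)
    return config
```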

Config Schema

config.yaml key sections:

  • models[] - LLM configs with use class path, supports_thinking, supports_vision, provider-specific fields
  • vLLM reasoning models should use deerflow.models.vllm_provider:VllmChatModel; for Qwen-style parsers prefer when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking, and DeerFlow will also normalize the older thinking alias
  • tools[] - Tool configs with use variable path and group
  • tool_groups[] - Logical groupings for tools
  • sandbox.use - Sandbox provider class path
  • skills.path / skills.container_path - Host and container paths to skills directory
  • title - Auto-title generation (enabled, max_words, max_chars, prompt_template)
  • summarization - Context summarization (enabled, trigger conditions, keep policy)
  • subagents.enabled - Master switch for subagent delegation
  • memory - Memory system (enabled, storage_path, debounce_seconds, model_name, max_facts, fact_confidence_threshold, injection_enabled, max_injection_tokens)

extensions_config.json:

  • mcpServers - Map of server name → config (enabled, type, command, args, env, url, headers, oauth, description)
  • skills - Map of skill name → state (enabled)

Both can be modified at runtime via Gateway API endpoints or DeerFlowClient methods.

Embedded Client (packages/harness/deerflow/client.py)

DeerFlowClient provides direct in-process access to all DeerFlow capabilities without HTTP services. All return types align with the Gateway API response schemas, so consumer code works identically in HTTP and embedded modes.

Architecture: Imports the same deerflow modules that Gateway API uses. Shares the same config files and data directories. No FastAPI dependency.

Agent Conversation:

  • chat(message, thread_id) — synchronous, accumulates streaming deltas per message-id and returns the final AI text
  • stream(message, thread_id) — subscribes to LangGraph stream_mode=["values", "messages", "custom"] and yields StreamEvent:
    • "values" — full state snapshot (title, messages, artifacts); AI text already delivered via messages mode is not re-synthesized here to avoid duplicate deliveries
    • "messages-tuple" — per-chunk update: for AI text this is a delta (concat per id to rebuild the full message); tool calls and tool results are emitted once each
    • "custom" — forwarded from StreamWriter
    • "end" — stream finished (carries cumulative usage counted once per message id)
  • Agent created lazily via create_agent() + _build_middlewares(), same as make_lead_agent
  • Supports checkpointer parameter for state persistence across turns
  • reset_agent() forces agent recreation (e.g. after memory or skill changes)
  • See docs/STREAMING.md for the full design: why Gateway and DeerFlowClient are parallel paths, LangGraph's stream_mode semantics, the per-id dedup invariants, and regression testing strategy
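The "concat per id" rule for AI text deltas can be sketched in a few lines. The event shape here, `(message_id, delta)` pairs, is a simplification for illustration and not the actual StreamEvent type.

```python
def accumulate_ai_text(events):
    """Rebuild full AI messages by concatenating deltas per message id."""
    buffers = {}
    for message_id, delta in events:
        buffers[message_id] = buffers.get(message_id, "") + delta
    return buffers


def final_text(events):
    """Return the text of the last AI message seen, or "" if there was none."""
    buffers = accumulate_ai_text(events)
    return list(buffers.values())[-1] if buffers else ""
```

Keying on the message id keeps interleaved chunks from different messages from being merged into one string.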

Gateway Equivalent Methods (replaces Gateway API):

| Category | Methods | Return format |
|---|---|---|
| Models | list_models(), get_model(name) | {"models": [...]}, {name, display_name, ...} |
| MCP | get_mcp_config(), update_mcp_config(servers) | {"mcp_servers": {...}} |
| Skills | list_skills(), get_skill(name), update_skill(name, enabled), install_skill(path) | {"skills": [...]} |
| Memory | get_memory(), reload_memory(), get_memory_config(), get_memory_status() | dict |
| Uploads | upload_files(thread_id, files), list_uploads(thread_id), delete_upload(thread_id, filename) | {"success": true, "files": [...]}, {"files": [...], "count": N} |
| Artifacts | get_artifact(thread_id, path) → (bytes, mime_type) | tuple |

Key difference from Gateway: Upload accepts local Path objects instead of HTTP UploadFile, rejects directory paths before copying, and reuses a single worker when document conversion must run inside an active event loop. Artifact returns (bytes, mime_type) instead of HTTP Response. The new Gateway-only thread cleanup route deletes .deer-flow/threads/{thread_id} after LangGraph thread deletion; there is no matching DeerFlowClient method yet. update_mcp_config() and update_skill() automatically invalidate the cached agent.

Tests: tests/test_client.py (77 unit tests including TestGatewayConformance), tests/test_client_live.py (live integration tests, requires config.yaml)

Gateway Conformance Tests (TestGatewayConformance): Validate that every dict-returning client method conforms to the corresponding Gateway Pydantic response model. Each test parses the client output through the Gateway model — if Gateway adds a required field that the client doesn't provide, Pydantic raises ValidationError and CI catches the drift. Covers: ModelsListResponse, ModelResponse, SkillsListResponse, SkillResponse, SkillInstallResponse, McpConfigResponse, UploadResponse, MemoryConfigResponse, MemoryStatusResponse.

Development Workflow

Test-Driven Development (TDD) — MANDATORY

Every new feature or bug fix MUST be accompanied by unit tests. No exceptions.

  • Write tests in backend/tests/ following the existing naming convention test_<feature>.py
  • Run the full suite before and after your change: make test
  • Tests must pass before a feature is considered complete
  • For lightweight config/utility modules, prefer pure unit tests with no external dependencies
  • If a module causes circular import issues in tests, add a sys.modules mock in tests/conftest.py (see existing example for deerflow.subagents.executor)

# Run all tests
make test

# Run a specific test file
PYTHONPATH=. uv run pytest tests/test_<feature>.py -v

Running the Full Application

From the project root directory:

make dev

This starts all services and makes the application available at http://localhost:2026.

All startup modes:

| | Local Foreground | Local Daemon | Docker Dev | Docker Prod |
|---|---|---|---|---|
| Dev | ./scripts/serve.sh --dev or make dev | ./scripts/serve.sh --dev --daemon or make dev-daemon | ./scripts/docker.sh start or make docker-start | - |
| Prod | ./scripts/serve.sh --prod or make start | ./scripts/serve.sh --prod --daemon or make start-daemon | - | ./scripts/deploy.sh or make up |

| Action | Local | Docker Dev | Docker Prod |
|---|---|---|---|
| Stop | ./scripts/serve.sh --stop or make stop | ./scripts/docker.sh stop or make docker-stop | ./scripts/deploy.sh down or make down |
| Restart | ./scripts/serve.sh --restart [flags] | ./scripts/docker.sh restart | - |

Nginx routing:

  • /api/langgraph/* → Gateway embedded runtime (8001), rewritten to /api/*
  • /api/* (other) → Gateway API (8001)
  • / (non-API) → Frontend (3000)

Running Backend Services Separately

From the backend directory:

# Gateway API
make gateway

Direct access (without nginx):

  • Gateway: http://localhost:8001

Frontend Configuration

The frontend uses environment variables to connect to backend services:

  • NEXT_PUBLIC_LANGGRAPH_BASE_URL - Defaults to /api/langgraph (through nginx)
  • NEXT_PUBLIC_BACKEND_BASE_URL - Defaults to empty string (through nginx)

When using make dev from root, the frontend automatically connects through nginx.

Key Features

File Upload

Multi-file upload with automatic document conversion:

  • Endpoint: POST /api/threads/{thread_id}/uploads
  • Supports: PDF, PPT, Excel, Word documents (converted via markitdown)
  • Rejects directory inputs before copying so uploads stay all-or-nothing
  • Reuses one conversion worker per request when called from an active event loop
  • Files stored in thread-isolated directories
  • Duplicate filenames in a single upload request are auto-renamed with _N suffixes so later files do not truncate earlier files
  • Agent receives uploaded file list via UploadsMiddleware
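The duplicate-filename rule above could be implemented along these lines. This is a sketch with a hypothetical function name; placing the `_N` suffix before the extension and counting from 1 are assumptions, since the source only says `_N` suffixes are used.

```python
from pathlib import PurePath


def dedupe_filenames(names):
    """Rename repeated filenames so later files do not overwrite earlier ones."""
    used, result = set(), []
    for name in names:
        path = PurePath(name)
        candidate, n = name, 1
        while candidate in used:
            candidate = f"{path.stem}_{n}{path.suffix}"  # e.g. report_1.pdf
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result
```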

See docs/FILE_UPLOAD.md for details.

Plan Mode

TodoList middleware for complex multi-step tasks:

  • Controlled via runtime config: config.configurable.is_plan_mode = True
  • Provides write_todos tool for task tracking
  • One task in_progress at a time, real-time updates

See docs/plan_mode_usage.md for details.

Context Summarization

Automatic conversation summarization when approaching token limits:

  • Configured in config.yaml under summarization key
  • Trigger types: tokens, messages, or fraction of max input
  • Keeps recent messages while summarizing older ones

See docs/summarization.md for details.

Vision Support

For models with supports_vision: true:

  • ViewImageMiddleware processes images in conversation
  • view_image_tool added to agent's toolset
  • Images automatically converted to base64 and injected into state

Code Style

  • Uses ruff for linting and formatting
  • Line length: 240 characters
  • Python 3.12+ with type hints
  • Double quotes, space indentation

Documentation

See docs/ directory for detailed documentation: