mirror of https://github.com/bytedance/deer-flow.git synced 2026-06-18 05:25:57 +00:00


Nan Gao 525af0da14 fix(channels): scope IM files and helper commands to owner (#3579)

* fix(channels): scope IM files and helper commands to owner

* fix(memory): honor bound IM owner for /memory gateway endpoints

The channel manager already attaches X-DeerFlow-Owner-User-Id for /memory
and /models, but the memory router resolved user_id solely from
get_effective_user_id(), which returns the synthetic internal user
(DEFAULT_USER_ID) for channel workers. A bound IM /memory therefore read
the default/internal memory instead of the connection owner's.

Resolve the owner via _resolve_memory_user_id(request) across all
/api/memory* endpoints: trusted internal callers act for the owner header,
browser/API callers fall back to get_effective_user_id(). Mirrors the
threads router's get_trusted_internal_owner_user_id pattern, completing
acceptance criterion #3 of #3539.

Add end-to-end tests asserting the resolved user_id (not just that the
header is sent) and that a spoofed owner header from a browser user is
ignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): align memory bucket and reuse cached storage owner

Address PR #3579 review feedback:

- Memory router now sanitizes the trusted owner header via make_safe_user_id
  before routing, matching the channel file pipeline
  (_safe_user_id_for_run/prepare_user_dir_for_raw_id). A bound owner id needing
  sanitization now resolves to the same bucket as its files/uploads instead of
  500ing in _validate_user_id.
- _handle_chat reuses the storage_user_id cached at the top of the method for
  artifact delivery instead of re-deriving _channel_storage_user_id(msg), so
  uploads and outputs cannot drift to different buckets if a channel rewrites
  the InboundMessage in receive_file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stage unbound IM files under the run's user bucket

Address PR #3579 review feedback (#5): _channel_storage_user_id now mirrors
_resolve_run_params' identity policy, falling back to safe(msg.user_id) instead
of returning None for unbound auth-enabled channels.

Previously an unbound msg ran under safe(platform_user_id) but staged uploads
under get_effective_user_id() in the dispatcher task (unset contextvar ->
"default"), so files landed in users/default/... while the agent read from
users/{safe_platform_user_id}/.... Bound and unbound channels now write where
the agent reads. Returns None only when no identity is available.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): reuse cached storage owner in streaming artifact delivery

Address PR #3579 review feedback (#6): thread the storage_user_id resolved in
_handle_chat into _handle_streaming_chat instead of re-deriving
_channel_storage_user_id(msg) in the finally block. Avoids re-running
_safe_user_id_for_run (and its possible filesystem touch) on the streaming-error
path and guarantees artifact delivery targets the same bucket as the uploads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): document owner-scoped IM file storage

Address PR #3579 review feedback (#4): the IM Channels and File Upload sections
still described pre-PR default-bucket behaviour. Document that receive_file,
_ingest_inbound_files/ensure_uploads_dir/get_uploads_dir, and
_resolve_attachments/_prepare_artifact_delivery are owner-scoped via the user_id
kwarg, and that the bucket matches the memory bucket from _resolve_memory_user_id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): unify run identity and storage bucket resolution

Address PR #3579 review feedback (#3): _resolve_run_params no longer duplicates
the owner-resolution rule inline. After the #5 fix the inline block and
_channel_storage_user_id computed the identical sanitized-with-platform-fallback
value, so the run identity now calls the same helper, making it the single
source of truth for run_context["user_id"] and the file/artifact storage bucket.

_owner_headers stays deliberately separate: it sends the raw owner id over HTTP
for the gateway to re-resolve (no sanitize, no platform fallback), documented on
both helpers. test_run_identity_matches_storage_bucket pins the two together so
they cannot drift again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-18 11:45:35 +08:00

57 KiB


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

DeerFlow is a LangGraph-based AI super agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution, persistent memory, subagent delegation, and extensible tool integration - all operating in per-thread isolated environments.

Architecture:

Gateway API (port 8001): REST API plus embedded LangGraph-compatible agent runtime
Frontend (port 3000): Next.js web interface
Nginx (port 2026): Unified reverse proxy entry point
Provisioner (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode

Runtime:

make dev, Docker dev, and production all run the agent runtime in Gateway via RunManager + run_agent() + StreamBridge (packages/harness/deerflow/runtime/). Nginx exposes that runtime at /api/langgraph/* and rewrites it to Gateway's native /api/* routers.

Project Structure:

deer-flow/
├── Makefile                    # Root commands (check, install, dev, stop)
├── config.yaml                 # Main application configuration
├── extensions_config.json      # MCP servers and skills configuration
├── backend/                    # Backend application (this directory)
│   ├── Makefile               # Backend-only commands (dev, gateway, lint)
│   ├── langgraph.json         # LangGraph Studio graph configuration
│   ├── packages/
│   │   └── harness/           # deerflow-harness package (import: deerflow.*)
│   │       ├── pyproject.toml
│   │       └── deerflow/
│   │           ├── agents/            # LangGraph agent system
│   │           │   ├── lead_agent/    # Main agent (factory + system prompt)
│   │           │   ├── middlewares/   # Middleware components (see Middleware Chain)
│   │           │   ├── memory/        # Memory extraction, queue, prompts
│   │           │   └── thread_state.py # ThreadState schema
│   │           ├── sandbox/           # Sandbox execution system
│   │           │   ├── local/         # Local filesystem provider
│   │           │   ├── sandbox.py     # Abstract Sandbox interface
│   │           │   ├── tools.py       # bash, ls, read/write/str_replace
│   │           │   └── middleware.py  # Sandbox lifecycle management
│   │           ├── subagents/         # Subagent delegation system
│   │           │   ├── builtins/      # general-purpose, bash agents
│   │           │   ├── executor.py    # Background execution engine
│   │           │   └── registry.py    # Agent registry
│   │           ├── tools/builtins/    # Built-in tools (present_files, ask_clarification, view_image)
│   │           ├── mcp/               # MCP integration (tools, cache, client)
│   │           ├── models/            # Model factory with thinking/vision support
│   │           ├── skills/            # Skills discovery, loading, parsing
│   │           ├── config/            # Configuration system (app, model, sandbox, tool, etc.)
│   │           ├── community/         # Community tools (tavily, jina_ai, firecrawl, image_search, aio_sandbox)
│   │           ├── reflection/        # Dynamic module loading (resolve_variable, resolve_class)
│   │           ├── utils/             # Utilities (network, readability)
│   │           └── client.py          # Embedded Python client (DeerFlowClient)
│   ├── app/                   # Application layer (import: app.*)
│   │   ├── gateway/           # FastAPI Gateway API
│   │   │   ├── app.py         # FastAPI application
│   │   │   └── routers/       # FastAPI route modules (models, mcp, memory, skills, uploads, threads, artifacts, agents, suggestions, channels)
│   │   └── channels/          # IM platform integrations
│   ├── tests/                 # Test suite
│   └── docs/                  # Documentation
├── frontend/                   # Next.js frontend application
└── skills/                     # Agent skills directory
    ├── public/                # Public skills (committed)
    └── custom/                # Custom skills (gitignored)

Important Development Guidelines

Documentation Update Policy

CRITICAL: Always update README.md and CLAUDE.md after every code change

When making code changes, you MUST update the relevant documentation:

Update README.md for user-facing changes (features, setup, usage instructions)
Update CLAUDE.md for development changes (architecture, commands, workflows, internal systems)
Keep documentation synchronized with the codebase at all times
Ensure accuracy and timeliness of all documentation

Commands

Root directory (for full application):

make check      # Check system requirements
make install    # Install all dependencies (frontend + backend)
make dev        # Start all services (Gateway + Frontend + Nginx), with config.yaml preflight
make start      # Start production services locally
make stop       # Stop all services

Backend directory (for backend development only):

make install            # Install backend dependencies
make dev                # Run Gateway API with reload (port 8001)
make gateway            # Run Gateway API only (port 8001)
make test               # Run all backend tests
make test-blocking-io   # Run strict Blockbuster runtime gate on tests/blocking_io/
make lint               # Lint with ruff
make format             # Format code with ruff

The detect-blocking-io target parses app/, packages/harness/deerflow/, and scripts/ with AST. By default it reports only blocking IO candidates that are inside async code, reachable from async code in the same file, or reachable from sync-only AgentMiddleware before/after hooks that LangGraph can execute on the async graph path. It prints a concise summary and writes complete JSON findings to .deer-flow/blocking-io-findings.json at the repository root (both make detect-blocking-io from the repo root and cd backend && make detect-blocking-io resolve to the same repo-root path). JSON findings include priority, location, blocking_call, event_loop_exposure, reason, and code for model-assisted or manual review. priority is a deterministic review ordering from operation type, not proof of a bug. Bare-name same-file calls are resolved by function name, so duplicate helper names in one file can conservatively over-report async reachability. It is intentionally informational and is not run from CI in this round.

For a diff-scoped view of the same findings, scripts/scan_changed_blocking_io.py (repo root) reports findings on the added lines of git diff <base>...HEAD plus findings new versus the merge base (so a new async caller exposing an untouched sync helper in the same file is still reported) — used by the blocking-io-guard skill (.agent/skills/blocking-io-guard/) as the deterministic scope step before routing each candidate to a fix and/or a tests/blocking_io/ runtime anchor.

Regression tests related to Docker/provisioner behavior:

tests/test_docker_sandbox_mode_detection.py (mode detection from config.yaml)
tests/test_provisioner_kubeconfig.py (kubeconfig file/directory handling)

Blocking-IO runtime gate (tests/blocking_io/):

Wraps every item under tests/blocking_io/ with a strict Blockbuster context scoped to app.* and deerflow.* (see tests/support/detectors/blocking_io_runtime.py). Any sync blocking IO call whose stack passes through DeerFlow business code while running on the asyncio event loop raises BlockingError and fails the test.
Regression anchors live there: test_skills_load.py (locks the asyncio.to_thread offload around LocalSkillStorage.load_skills, fix for #1917); test_sqlite_lifespan.py (locks the offload around SQLite path resolution plus ensure_sqlite_parent_dir, fix for #1912); test_jsonl_run_event_store.py (locks JsonlRunEventStore's async API offloading its file IO via asyncio.to_thread, fix for #3084); and test_uploads_middleware.py (locks UploadsMiddleware.abefore_agent moving the uploads-directory scan off the event loop).
test_gate_smoke.py is a meta-test asserting the gate actually catches unoffloaded blocking IO and that the @pytest.mark.allow_blocking_io opt-out works.
Coverage boundary: the gate only sees code that test execution actually touches. Static AST coverage is a separate concern (out of scope for this PR).
CI: runs on every PR via .github/workflows/backend-blocking-io-tests.yml, hard-fail.
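
The regression anchors above all lock the same pattern: synchronous file IO is moved off the event loop with asyncio.to_thread. The following is a minimal sketch of that pattern, not DeerFlow code; read_text_blocking and load_skill_text are hypothetical names used only for illustration.

```python
import asyncio
import tempfile
from pathlib import Path


def read_text_blocking(path: Path) -> str:
    # Plain synchronous file IO. Calling this directly inside a coroutine
    # is the kind of call a Blockbuster-style runtime gate is meant to flag.
    return path.read_text(encoding="utf-8")


async def load_skill_text(path: Path) -> str:
    # Hypothetical helper: run the blocking read in a worker thread so the
    # event loop stays free while the file is read.
    return await asyncio.to_thread(read_text_blocking, path)


def demo() -> str:
    # Write a throwaway file and read it back through the async wrapper.
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "SKILL.md"
        skill.write_text("name: example\n", encoding="utf-8")
        return asyncio.run(load_skill_text(skill))
```

A test under the gate would await load_skill_text directly; calling read_text_blocking from the coroutine instead is the case the gate is expected to reject.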

Boundary check (harness → app import firewall):

tests/test_harness_boundary.py — ensures packages/harness/deerflow/ never imports from app.*

CI runs these regression tests for every pull request via .github/workflows/backend-unit-tests.yml.

Architecture

Harness / App Split

The backend is split into two layers with a strict dependency direction:

Harness (packages/harness/deerflow/): Publishable agent framework package (deerflow-harness). Import prefix: deerflow.*. Contains agent orchestration, tools, sandbox, models, MCP, skills, config — everything needed to build and run agents.
App (app/): Unpublished application code. Import prefix: app.*. Contains the FastAPI Gateway API and IM channel integrations (Feishu, Slack, Telegram, DingTalk).

Dependency rule: App imports deerflow, but deerflow never imports app. This boundary is enforced by tests/test_harness_boundary.py which runs in CI.

Import conventions:

# Harness internal
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model

# App internal
from app.gateway.app import app
from app.channels.service import start_channel_service

# App → Harness (allowed)
from deerflow.config import get_app_config

# Harness → App (FORBIDDEN — enforced by test_harness_boundary.py)
# from app.gateway.routers.uploads import ...  # ← will fail CI
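
One way such a boundary could be checked is by parsing each harness source file and looking for absolute imports rooted at app. This is only a sketch under that assumption; find_forbidden_imports is a hypothetical helper, and the real tests/test_harness_boundary.py may work differently.

```python
import ast


def find_forbidden_imports(source: str, forbidden_root: str = "app") -> list[str]:
    """Return imported module names whose top-level package is forbidden_root."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports stay
            # inside the current package and are ignored here.
            names = [node.module]
        else:
            continue
        hits.extend(name for name in names if name.split(".")[0] == forbidden_root)
    return hits
```

Because the check works on the parsed syntax tree, a commented-out import like the one above is not reported, and a package that merely starts with the same letters (for example application) is not confused with app.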

Agent System

Lead Agent (packages/harness/deerflow/agents/lead_agent/agent.py):

Entry point: make_lead_agent(config: RunnableConfig) registered in langgraph.json
Dynamic model selection via create_chat_model() with thinking/vision support
Tools loaded via get_available_tools() - combines sandbox, built-in, MCP, community, and subagent tools
System prompt generated by apply_prompt_template() with skills, memory, and subagent instructions

ThreadState (packages/harness/deerflow/agents/thread_state.py):

Extends AgentState with: sandbox, thread_data, title, artifacts, todos, uploaded_files, viewed_images
Uses custom reducers: merge_artifacts (deduplicate), merge_viewed_images (merge/clear)
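
A reducer receives the existing state value and the incoming update and returns the merged result. The sketch below shows what a deduplicating reducer could look like, assuming artifacts are plain strings and that first-seen order should be kept; merge_artifacts_sketch is a hypothetical stand-in, not the actual merge_artifacts implementation.

```python
from typing import Optional


def merge_artifacts_sketch(
    existing: Optional[list[str]], new: Optional[list[str]]
) -> list[str]:
    # dict.fromkeys keeps the first occurrence of each key in insertion
    # order, which gives an order-preserving de-duplication.
    return list(dict.fromkeys([*(existing or []), *(new or [])]))
```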

Runtime Configuration (via config.configurable):

thinking_enabled - Enable model's extended thinking
model_name - Select specific LLM model
is_plan_mode - Enable TodoList middleware
subagent_enabled - Enable task delegation tool

Middleware Chain

Lead-agent middlewares are assembled in strict append order across packages/harness/deerflow/agents/middlewares/tool_error_handling_middleware.py (build_lead_runtime_middlewares) and packages/harness/deerflow/agents/lead_agent/agent.py (build_middlewares):

ThreadDataMiddleware - Creates per-thread directories under the user's isolation scope (backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/{workspace,uploads,outputs}); resolves user_id via get_effective_user_id() (falls back to "default" in no-auth mode); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local thread directory
UploadsMiddleware - Tracks and injects newly uploaded files into conversation
SandboxMiddleware - Acquires sandbox, stores sandbox_id in state
DanglingToolCallMiddleware - Injects placeholder ToolMessages for AIMessage tool_calls that lack responses (e.g., due to user interruption), including raw provider tool-call payloads preserved only in additional_kwargs["tool_calls"]
LLMErrorHandlingMiddleware - Normalizes provider/model invocation failures into recoverable assistant-facing errors before later middleware/tool stages run
GuardrailMiddleware - Pre-tool-call authorization via pluggable GuardrailProvider protocol (optional, if guardrails.enabled in config). Evaluates each tool call and returns error ToolMessage on deny. Three provider options: built-in AllowlistProvider (zero deps), OAP policy providers (e.g. aport-agent-guardrails), or custom providers. See docs/GUARDRAILS.md for setup, usage, and how to implement a provider.
SandboxAuditMiddleware - Audits sandboxed shell/file operations for security logging before tool execution continues
ToolErrorHandlingMiddleware - Converts tool exceptions into error ToolMessages so the run can continue instead of aborting
SkillActivationMiddleware - Detects strict /skill-name task syntax on the latest real user message, resolves only enabled and runtime-allowed skills, reads SKILL.md from trusted skill storage, injects the skill body as hidden current-turn model context, and records a middleware:skill_activation audit event with skill name, category, path, and content hash
SummarizationMiddleware - Context reduction when approaching token limits (optional, if enabled)
TodoListMiddleware - Task tracking with write_todos tool (optional, if plan_mode)
TokenUsageMiddleware - Records token usage metrics when token tracking is enabled (optional); subagent usage is cached by tool_call_id only while token usage is enabled and merged back into the dispatching AIMessage by message position rather than message id
TitleMiddleware - Auto-generates thread title after first complete exchange and normalizes structured message content before prompting the title model
MemoryMiddleware - Queues conversations for async memory update (filters to user + final AI responses)
ViewImageMiddleware - Injects base64 image data before LLM call (conditional on vision support)
DeferredToolFilterMiddleware - Hides deferred (MCP) tool schemas from the bound model using a build-time deferred-name set + catalog hash, reading per-thread promotions from ThreadState.promoted (hash-scoped, no ContextVar); a tool becomes bound on subsequent turns after tool_search returns its schema (optional, if tool_search.enabled)
SubagentLimitMiddleware - Truncates excess task tool calls from model response to enforce MAX_CONCURRENT_SUBAGENTS limit (optional, if subagent_enabled)
LoopDetectionMiddleware - Detects repeated tool-call loops; hard-stop responses clear both structured tool_calls and raw provider tool-call metadata before forcing a final text answer
ClarificationMiddleware - Intercepts ask_clarification tool calls, interrupts via Command(goto=END) (must be last)
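
To illustrate one of the simpler entries in the chain, the truncation performed by SubagentLimitMiddleware can be sketched on plain dictionaries. This assumes tool calls are dicts with a "name" key and that excess calls are dropped in order; truncate_task_calls is a hypothetical helper, not the middleware's actual code.

```python
MAX_CONCURRENT_SUBAGENTS = 3  # limit stated in the Subagent System section


def truncate_task_calls(
    tool_calls: list[dict], limit: int = MAX_CONCURRENT_SUBAGENTS
) -> list[dict]:
    """Keep every non-task call and only the first `limit` task calls."""
    kept: list[dict] = []
    task_seen = 0
    for call in tool_calls:
        if call.get("name") == "task":
            task_seen += 1
            if task_seen > limit:
                continue  # drop delegation calls beyond the limit
        kept.append(call)
    return kept
```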

Configuration System

Main Configuration (config.yaml):

Setup: Copy config.example.yaml to config.yaml in the project root directory.

Config Versioning: config.example.yaml has a config_version field. On startup, AppConfig.from_file() compares user version vs example version and emits a warning if outdated. Missing config_version = version 0. Run make config-upgrade to auto-merge missing fields. When changing the config schema, bump config_version in config.example.yaml.
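
The version comparison described above can be sketched as follows, assuming both files have already been parsed into dictionaries. check_config_version is a hypothetical helper and does not reproduce AppConfig.from_file().

```python
import logging

logger = logging.getLogger(__name__)


def check_config_version(user_cfg: dict, example_cfg: dict) -> bool:
    """Return True (and log a warning) when the user config is outdated."""
    # A missing config_version is treated as version 0.
    user_version = int(user_cfg.get("config_version", 0))
    example_version = int(example_cfg.get("config_version", 0))
    outdated = user_version < example_version
    if outdated:
        logger.warning(
            "config.yaml is at version %s but config.example.yaml is at %s; "
            "run `make config-upgrade` to merge missing fields",
            user_version,
            example_version,
        )
    return outdated
```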

Config Caching: get_app_config() caches the parsed config, but automatically reloads it when the resolved config path or file content signature changes. The signature includes file metadata and a content digest, so Gateway and LangGraph reads stay aligned with config.yaml edits even on object-store or network mounts where mtime can remain stale.
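
A signature that combines file metadata with a content digest could be built roughly as below. This is a sketch under stated assumptions: SHA-256 as the digest, size and mtime as the metadata, and the names file_signature and SignatureCache are all illustrative choices, not what get_app_config() is confirmed to use.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Any, Callable


def file_signature(path: Path) -> tuple[str, int, int, str]:
    # Resolved path + metadata + content digest. The digest catches edits
    # that leave mtime and size unchanged.
    stat = path.stat()
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return (str(path.resolve()), stat.st_size, stat.st_mtime_ns, digest)


class SignatureCache:
    def __init__(self, parse: Callable[[str], Any]) -> None:
        self._parse = parse
        self._signature = None
        self._value = None
        self.loads = 0  # number of real parses, for demonstration

    def get(self, path: Path) -> Any:
        signature = file_signature(path)
        if signature != self._signature:
            self._value = self._parse(path.read_text(encoding="utf-8"))
            self._signature = signature
            self.loads += 1
        return self._value


def demo() -> tuple[str, str, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.yaml"
        path.write_text("a: 1\n", encoding="utf-8")
        cache = SignatureCache(lambda text: text.strip())
        first = cache.get(path)
        cache.get(path)  # unchanged file: served from cache
        before = path.stat()
        path.write_text("a: 2\n", encoding="utf-8")
        # Simulate a stale mtime by restoring the old timestamps.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
        second = cache.get(path)
        return first, second, cache.loads
```

The demo restores the old mtime after rewriting the file, so only the digest differs; the cache still reloads, which is the stale-mtime case the paragraph above describes.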

Config Hot-Reload Boundary: Gateway dependencies route through get_app_config() on every request, so per-run fields like models[*].max_tokens, summarization.*, title.*, memory.*, subagents.*, tools[*], and the agent system prompt pick up config.yaml edits on the next message. AppConfig is intentionally not cached on app.state — lifespan() keeps a local startup_config variable for one-shot bootstrap work and passes it to langgraph_runtime(app, startup_config).

Infrastructure fields are restart-required. The authoritative list lives in packages/harness/deerflow/config/reload_boundary.py::STARTUP_ONLY_FIELDS and is mirrored by the standardised "startup-only:" prefix on the corresponding Field(description=...) in AppConfig, so IDE hover on those fields surfaces the reason inline (no need to context-switch into this table). Currently registered: database, checkpointer, run_events, stream_bridge, sandbox, log_level, channels, channel_connections. Adding a new restart-required field requires updating the registry; drift is pinned by tests/test_reload_boundary.py.

Configuration priority:

Explicit config_path argument
DEER_FLOW_CONFIG_PATH environment variable
config.yaml in current directory (backend/)
config.yaml in parent directory (project root - recommended location)
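
The lookup order above can be expressed as a short resolver. This is a sketch only: resolve_config_path is a hypothetical name, and raising FileNotFoundError when nothing is found is an assumption about the failure mode.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    config_path: Optional[str] = None,
    cwd: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    # 1. Explicit argument wins.
    if config_path:
        return Path(config_path)
    # 2. Environment variable.
    if env.get("DEER_FLOW_CONFIG_PATH"):
        return Path(env["DEER_FLOW_CONFIG_PATH"])
    # 3. Current directory, then 4. parent directory.
    for candidate in (cwd / "config.yaml", cwd.parent / "config.yaml"):
        if candidate.exists():
            return candidate
    raise FileNotFoundError("config.yaml not found in current or parent directory")
```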

Config values starting with $ are resolved as environment variables (e.g., $OPENAI_API_KEY). ModelConfig also declares use_responses_api and output_version so OpenAI /v1/responses can be enabled explicitly while still using langchain_openai:ChatOpenAI.
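
The $-prefix rule can be sketched as a recursive walk over the parsed config. resolve_env_values is a hypothetical helper, and raising ValueError for an unset variable is an assumption; the real loader may handle that case differently.

```python
import os
from typing import Any, Mapping, Optional


def resolve_env_values(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    env = os.environ if env is None else env
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in env:
            # Assumed behaviour: fail loudly rather than pass "$NAME" through.
            raise ValueError(f"Environment variable {name} is not set")
        return env[name]
    if isinstance(value, dict):
        return {key: resolve_env_values(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_values(item, env) for item in value]
    return value
```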

Extensions Configuration (extensions_config.json):

MCP servers and skills are configured together in extensions_config.json in project root:

Configuration priority:

Explicit config_path argument
DEER_FLOW_EXTENSIONS_CONFIG_PATH environment variable
extensions_config.json in current directory (backend/)
extensions_config.json in parent directory (project root - recommended location)

Gateway API (`app/gateway/`)

FastAPI application on port 8001 with health check at GET /health. Set GATEWAY_ENABLE_DOCS=false to disable /docs, /redoc, and /openapi.json in production (default: enabled).

CORS is same-origin by default when requests enter through nginx on port 2026. Split-origin or port-forwarded browser clients must opt in with GATEWAY_CORS_ORIGINS (comma-separated exact origins); Gateway CORSMiddleware and CSRFMiddleware both read that variable so browser CORS and auth-origin checks stay aligned.

Routers:

Router	Endpoints
Models (`/api/models`)	`GET /` - list models; `GET /{name}` - model details
MCP (`/api/mcp`)	`GET /config` - get config; `PUT /config` - update config (saves to extensions_config.json)
Skills (`/api/skills`)	`GET /` - list skills; `GET /{name}` - details; `PUT /{name}` - update enabled; `POST /install` - install from .skill archive (accepts standard optional frontmatter like `version`, `author`, `compatibility`)
Memory (`/api/memory`)	`GET /` - memory data; `POST /reload` - force reload; `GET /config` - config; `GET /status` - config + data
Uploads (`/api/threads/{id}/uploads`)	`POST /` - upload files (auto-converts PDF/PPT/Excel/Word); `GET /list` - list; `DELETE /{filename}` - delete
Threads (`/api/threads/{id}`)	`DELETE /` - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail
Artifacts (`/api/threads/{id}/artifacts`)	`GET /{path}` - serve artifacts; active content types (`text/html`, `application/xhtml+xml`, `image/svg+xml`) are always forced as download attachments to reduce XSS risk; `?download=true` still forces download for other file types
Suggestions (`/api/threads/{id}/suggestions`)	`POST /` - generate follow-up questions; rich list/block model content is normalized and inline reasoning (`<think>...</think>`, including unclosed/truncated blocks from reasoning models like MiniMax-M3) is stripped before JSON parsing
Thread Runs (`/api/threads/{id}/runs`)	`POST /` - create background run; `POST /stream` - create + SSE stream; `POST /wait` - create + block; `GET /` - list runs; `GET /{rid}` - run details; `POST /{rid}/cancel` - cancel; `GET /{rid}/join` - join SSE; `GET /{rid}/messages` - paginated messages `{data, has_more}`; `GET /{rid}/events` - full event stream; `GET /../messages` - thread messages with feedback; `GET /../token-usage` - aggregate tokens
Feedback (`/api/threads/{id}/runs/{rid}/feedback`)	`PUT /` - upsert feedback; `DELETE /` - delete user feedback; `POST /` - create feedback; `GET /` - list feedback; `GET /stats` - aggregate stats; `DELETE /{fid}` - delete specific
Runs (`/api/runs`)	`POST /stream` - stateless run + SSE; `POST /wait` - stateless run + block; `GET /{rid}/messages` - paginated messages by run_id `{data, has_more}` (cursor: `after_seq`/`before_seq`); `GET /{rid}/feedback` - list feedback by run_id
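
As an illustration of the Suggestions row, stripping inline reasoning before JSON parsing could look like the sketch below. The regular expression and the name strip_inline_reasoning are assumptions; the router's actual normalization is not shown in this file.

```python
import json
import re

# Match a <think> block up to its closing tag, or up to the end of the
# text when the block was truncated and never closed.
_THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|\Z)", re.DOTALL | re.IGNORECASE)


def strip_inline_reasoning(text: str) -> str:
    return _THINK_BLOCK.sub("", text).strip()


def parse_suggestions(text: str) -> list:
    # Parse what remains after the reasoning has been removed.
    return json.loads(strip_inline_reasoning(text))
```

Note that an unclosed block swallows everything after it, so this approach only recovers JSON that appears before a truncated reasoning block.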

RunManager / RunStore contract:

RunManager.get() is async; direct callers must await it.
When a persistent RunStore is configured, get() and list_by_thread() hydrate historical runs from the store. In-memory records win for the same run_id so task, abort, and stream-control state stays attached to active local runs.
cancel() and create_or_reject(..., multitask_strategy="interrupt"|"rollback") persist interrupted status through RunStore.update_status(), matching normal set_status() transitions.
Store-only hydrated runs are readable history. If the current worker has no in-memory task/control state for that run, cancellation APIs can return 409 because this worker cannot stop the task.
POST /wait (both thread-scoped and /api/runs/wait) drains the stream bridge via wait_for_run_completion() instead of bare await record.task, so it honours the run's on_disconnect setting and cancels the background run on real client disconnect rather than returning a stale checkpoint (issue #3265).
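
The "in-memory records win" rule from the contract can be sketched as a dictionary merge keyed on run_id. This assumes runs are plain dicts; merge_runs is a hypothetical helper, not RunManager code.

```python
def merge_runs(in_memory: list[dict], stored: list[dict]) -> list[dict]:
    """Combine store-hydrated history with live records, preferring live ones."""
    merged = {run["run_id"]: run for run in stored}
    # Later updates overwrite earlier keys, so the in-memory record replaces
    # the stored one for the same run_id and keeps its task/control state.
    merged.update({run["run_id"]: run for run in in_memory})
    return list(merged.values())
```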

Proxied through nginx: /api/langgraph/* → Gateway LangGraph-compatible runtime, all other /api/* → Gateway REST APIs.

Sandbox System (`packages/harness/deerflow/sandbox/`)

Interface: Abstract Sandbox with execute_command, read_file, write_file, list_dir
Provider Pattern: SandboxProvider with acquire, acquire_async, get, release lifecycle. Async agent/tool paths call async sandbox lifecycle hooks so Docker sandbox creation, discovery, cross-process locking, readiness polling, and release stay off the event loop.

Implementations:

LocalSandboxProvider - Local filesystem execution. acquire(thread_id) returns a per-thread LocalSandbox (id local:{thread_id}) whose path_mappings resolve /mnt/user-data/{workspace,uploads,outputs} and /mnt/acp-workspace to that thread's host directories, so the public Sandbox API honours the /mnt/user-data contract uniformly with AIO. acquire() / acquire(None) keeps the legacy generic singleton (id local) for callers without a thread context. Per-thread sandboxes are held in an LRU cache (default 256 entries) guarded by a threading.Lock.
AioSandboxProvider (packages/harness/deerflow/community/) - Docker-based isolation. Active-cache and warm-pool entries are checked with the backend during acquire/reuse; definitively dead containers are dropped from all in-process maps so the thread can discover or create a fresh sandbox instead of reusing a stale client. Backend health-check failures are treated as unknown, not dead; local discovery likewise treats an unverifiable container as not adoptable and falls through to create rather than failing acquire. get() remains an in-memory lookup for event-loop-safe tool paths.
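
The per-thread cache described for LocalSandboxProvider (an LRU guarded by a threading.Lock) can be sketched with an OrderedDict. LockedLRU and its factory argument are hypothetical; only the default size of 256 and the local:{thread_id} id format come from the text above.

```python
import threading
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class LockedLRU(Generic[V]):
    def __init__(self, factory: Callable[[str], V], max_entries: int = 256) -> None:
        self._factory = factory
        self._max = max_entries
        self._items: "OrderedDict[str, V]" = OrderedDict()
        self._lock = threading.Lock()

    def get_or_create(self, key: str) -> V:
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as most recently used
                return self._items[key]
            value = self._factory(key)
            self._items[key] = value
            if len(self._items) > self._max:
                self._items.popitem(last=False)  # evict least recently used
            return value

    def keys(self) -> list[str]:
        with self._lock:
            return list(self._items)
```

For example, LockedLRU(lambda thread_id: f"local:{thread_id}", max_entries=2) keeps only the two most recently used thread ids.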

Virtual Path System:

Agent sees: /mnt/user-data/{workspace,uploads,outputs}, /mnt/skills
Physical: backend/.deer-flow/users/{user_id}/threads/{thread_id}/user-data/..., deer-flow/skills/
Translation: LocalSandboxProvider builds per-thread PathMappings for the user-data prefixes at acquire time; tools.py keeps replace_virtual_path() / replace_virtual_paths_in_command() as a defense-in-depth layer (and for path validation). AIO has the directories volume-mounted at the same virtual paths inside its container, so both implementations accept /mnt/user-data/... natively.
Detection: is_local_sandbox() accepts both sandbox_id == "local" (legacy / no-thread) and sandbox_id.startswith("local:") (per-thread)
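
The detection rule and a prefix-based translation can be sketched as below. The first function restates the rule given above; translate_virtual_path and the example host path are hypothetical and do not reproduce replace_virtual_path() or the provider's PathMapping type.

```python
def is_local_sandbox(sandbox_id: str) -> bool:
    # "local" is the legacy singleton id, "local:{thread_id}" the per-thread id.
    return sandbox_id == "local" or sandbox_id.startswith("local:")


def translate_virtual_path(path: str, mappings: dict[str, str]) -> str:
    """Map a virtual path to a host path using longest-prefix matching."""
    for virtual, physical in sorted(
        mappings.items(), key=lambda item: len(item[0]), reverse=True
    ):
        prefix = virtual.rstrip("/")
        # Match the prefix only on a path-component boundary so that
        # /mnt/user-data/workspace-old is not treated as .../workspace.
        if path == prefix or path.startswith(prefix + "/"):
            return physical.rstrip("/") + path[len(prefix):]
    return path
```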

Sandbox Tools (in packages/harness/deerflow/sandbox/tools.py):

bash - Execute commands with path translation and error handling
ls - Directory listing (tree format, max 2 levels)
read_file - Read file contents with optional line range
write_file - Write/append to files, creates directories; overwrites by default and exposes the append argument in the model-facing schema for end-of-file writes
str_replace - Substring replacement (single or all occurrences); same-path serialization is scoped to (sandbox.id, path) so isolated sandboxes do not contend on identical virtual paths inside one process

Subagent System (`packages/harness/deerflow/subagents/`)

Built-in Agents: general-purpose (all tools except task) and bash (command specialist)
Execution: Dual thread pool - _scheduler_pool (3 workers) + _execution_pool (3 workers)
Concurrency: MAX_CONCURRENT_SUBAGENTS = 3 enforced by SubagentLimitMiddleware (truncates excess tool calls in after_model); default subagent timeout subagents.timeout_seconds=1800 (30 min) and built-in general-purpose max_turns=150 (raised from 100/15-min so deep-research subtasks stop hitting GraphRecursionError out of the box)
Flow: task() tool → SubagentExecutor → background thread → poll 5s → SSE events → result
Events: task_started, task_running, task_completed/task_failed/task_timed_out
Deferred MCP tools (if tool_search.enabled): SubagentExecutor._build_initial_state assembles deferral after policy filtering via the shared assemble_deferred_tools (fail-closed), appends the tool_search tool, injects the <available-deferred-tools> section into the subagent's SystemMessage, and threads the setup to _create_agent, which attaches DeferredToolFilterMiddleware through build_subagent_runtime_middlewares(deferred_setup=...). Subagents thus withhold full MCP schemas until promotion, same as the lead agent; each task run gets a fresh ThreadState so promotion is isolated per run
Checkpointer isolation: Subagent graphs are compiled with checkpointer=False to avoid inheriting the parent run's checkpointer, since subagents are one-shot and never resume.

Tool System (`packages/harness/deerflow/tools/`)

get_available_tools(groups, include_mcp, model_name, subagent_enabled) assembles:

Config-defined tools - Resolved from config.yaml via resolve_variable()
MCP tools - From enabled MCP servers (lazy initialized, cached with mtime invalidation)
Built-in tools:
- present_files - Make output files visible to user (only /mnt/user-data/outputs)
- ask_clarification - Request clarification (intercepted by ClarificationMiddleware → interrupts)
- view_image - Read image as base64 (added only if model supports vision)
- setup_agent - Bootstrap-only: persist a brand-new custom agent's SOUL.md and config.yaml. Bound only when is_bootstrap=True.
- update_agent - Custom-agent-only: persist self-updates to the current agent's SOUL.md / config.yaml from inside a normal chat (partial update + atomic write). Bound when agent_name is set and is_bootstrap=False.
Subagent tool (if enabled):
- task - Delegate to subagent (description, prompt, subagent_type)

Community tools (packages/harness/deerflow/community/):

tavily/ - Web search (5 results default) and web fetch (4KB limit)
jina_ai/ - Web fetch via Jina reader API with readability extraction
firecrawl/ - Web scraping via Firecrawl API
image_search/ - Image search via DuckDuckGo

ACP agent tools:

invoke_acp_agent - Invokes external ACP-compatible agents from config.yaml
ACP launchers must be real ACP adapters. The standard codex CLI is not ACP-compatible by itself; configure a wrapper such as npx -y @zed-industries/codex-acp or an installed codex-acp binary
Missing ACP executables now return an actionable error message instead of a raw [Errno 2]
Each ACP agent uses a per-thread workspace at {base_dir}/users/{user_id}/threads/{thread_id}/acp-workspace/. The workspace is accessible to the lead agent via the virtual path /mnt/acp-workspace/ (read-only). In docker sandbox mode, the directory is volume-mounted into the container at /mnt/acp-workspace (read-only); in local sandbox mode, path translation is handled by tools.py

MCP System (`packages/harness/deerflow/mcp/`)

Uses langchain-mcp-adapters MultiServerMCPClient for multi-server management
Lazy initialization: Tools loaded on first use via get_cached_mcp_tools()
Cache invalidation: Detects config file changes via mtime comparison
Transports: stdio (command-based), SSE, HTTP
OAuth (HTTP/SSE): Supports token endpoint flows (client_credentials, refresh_token) with automatic token refresh + Authorization header injection
Runtime updates: Gateway API saves to extensions_config.json; the Gateway-embedded runtime detects changes via mtime
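
The mtime-based invalidation described above can be sketched as follows. MtimeCache and its loader callback are hypothetical names for illustration; the real get_cached_mcp_tools() may be structured differently.

```python
import os
from typing import Any, Callable, Optional


class MtimeCache:
    """Hypothetical helper: reload a value when the backing file's mtime changes."""

    def __init__(self, path: str, loader: Callable[[str], Any]) -> None:
        self._path = path
        self._loader = loader
        self._mtime: Optional[float] = None
        self._value: Any = None

    def get(self) -> Any:
        mtime = os.path.getmtime(self._path)
        if self._mtime is None or mtime != self._mtime:
            # First use (lazy initialization) or the file changed on disk.
            self._value = self._loader(self._path)
            self._mtime = mtime
        return self._value
```

The loader runs on first use and again only after the file's modification time changes, which is the behavior the list above describes for extensions_config.json.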

Skills System (`packages/harness/deerflow/skills/`)

Location: deer-flow/skills/{public,custom}/
Format: Directory with SKILL.md (YAML frontmatter: name, description, license, allowed-tools)
Loading: load_skills() recursively scans skills/{public,custom} for SKILL.md, parses metadata, and reads enabled state from extensions_config.json
Injection: Enabled skills listed in agent system prompt with container paths
Slash activation: /skill-name task loads that enabled skill's SKILL.md for the current model call only. The resolver rejects leading whitespace, missing separators, reserved channel commands (/new, /help, /bootstrap, /status, /models, /memory), disabled skills, and skills outside a custom agent's whitelist.
Installation: POST /api/skills/install extracts .skill ZIP archive to custom/ directory
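
The slash-activation rules can be illustrated with a minimal sketch. The function name, signature, and return shape are assumptions; only the rejection rules and the reserved command list come from the text above.

```python
from typing import Optional, Set, Tuple

# Reserved channel commands listed in the text above.
RESERVED_COMMANDS = {"new", "help", "bootstrap", "status", "models", "memory"}


def resolve_slash_skill(text: str, enabled_skills: Set[str], whitelist: Optional[Set[str]] = None) -> Optional[Tuple[str, str]]:
    """Return (skill_name, task) for '/skill-name task', or None when rejected."""
    if not text.startswith("/"):  # also rejects leading whitespace
        return None
    name, sep, task = text[1:].partition(" ")
    if not sep or not name or not task.strip():
        return None  # missing separator or empty task
    if name in RESERVED_COMMANDS or name not in enabled_skills:
        return None  # reserved command or disabled/unknown skill
    if whitelist is not None and name not in whitelist:
        return None  # outside the custom agent's whitelist
    return name, task.strip()
```

Passing whitelist=None models an agent without a skill whitelist; how the real resolver represents that case is not stated in this document.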

Model Factory (`packages/harness/deerflow/models/factory.py`)

create_chat_model(name, thinking_enabled) instantiates LLM from config via reflection
Supports thinking_enabled flag with per-model when_thinking_enabled overrides
Supports vLLM-style thinking toggles via when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking for Qwen reasoning models, while normalizing legacy thinking configs for backward compatibility
Supports supports_vision flag for image understanding models
Config values starting with $ are resolved as environment variables
Missing provider modules surface actionable install hints from reflection resolvers (for example uv add langchain-google-genai)
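
A minimal sketch of the $-prefixed environment lookup, assuming values are resolved recursively through nested dicts and lists. The function name and the KeyError on a missing variable are assumptions, not confirmed behavior of the factory.

```python
import os
from typing import Any


def resolve_env_values(value: Any) -> Any:
    """Recursively replace strings like "$NAME" with os.environ["NAME"]."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly rather than pass the literal through.
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    if isinstance(value, dict):
        return {k: resolve_env_values(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_values(v) for v in value]
    return value
```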

vLLM Provider (`packages/harness/deerflow/models/vllm_provider.py`)

VllmChatModel subclasses langchain_openai:ChatOpenAI for vLLM 0.19.0 OpenAI-compatible endpoints
Preserves vLLM's non-standard assistant reasoning field on full responses, streaming deltas, and follow-up tool-call turns
Designed for configs that enable thinking through extra_body.chat_template_kwargs.enable_thinking on vLLM 0.19.0 Qwen reasoning models, while accepting the older thinking alias

IM Channels System (`app/channels/`)

Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk) to the DeerFlow agent via Gateway's LangGraph-compatible API.

Architecture: Channels communicate with Gateway through the langgraph-sdk HTTP client (same as the frontend), ensuring threads are created and managed server-side. The internal SDK client injects process-local internal auth plus a matching CSRF cookie/header pair so Gateway accepts state-changing thread/run requests from channel workers without relying on browser session cookies.

Components:

message_bus.py - Async pub/sub hub (InboundMessage → queue → dispatcher; OutboundMessage → callbacks → channels)
store.py - JSON-file persistence mapping channel_name:chat_id[:topic_id] → thread_id (keys are channel:chat for root conversations and channel:chat:topic for threaded conversations)
manager.py - Core dispatcher: creates threads via client.threads.create(), routes commands, keeps Slack/Discord on client.runs.wait(), and uses client.runs.stream(["messages-tuple", "values"]) for Feishu/Telegram incremental outbound updates
base.py - Abstract Channel base class (start/stop/send lifecycle)
service.py - Manages lifecycle of all configured channels from config.yaml
slack.py / feishu.py / telegram.py / discord.py / dingtalk.py - Platform-specific implementations (feishu.py tracks the running card message_id in memory and patches the same card in place; telegram.py registers the "Working on it..." placeholder as the stream target and edits it in place via editMessageText; dingtalk.py optionally uses AI Card streaming for in-place updates when card_template_id is configured)
app/gateway/routers/channel_connections.py - Browser-facing user connection and disconnect APIs
deerflow.persistence.channel_connections - SQL-backed user-owned connection, optional credential, connect state, and conversation store
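
The store key format described for store.py can be sketched as below. The helper name is hypothetical; only the channel_name:chat_id[:topic_id] layout comes from the list above.

```python
from typing import Optional


def conversation_key(channel_name: str, chat_id: str, topic_id: Optional[str] = None) -> str:
    """Build 'channel:chat' for root conversations, 'channel:chat:topic' for threaded ones."""
    parts = [channel_name, chat_id]
    if topic_id:
        parts.append(topic_id)
    return ":".join(parts)


# Example mapping of conversation keys to thread ids (values are made up).
threads = {
    conversation_key("slack", "C123"): "thread-a",
    conversation_key("slack", "C123", "1700000000.000100"): "thread-b",
}
```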

Message Flow:

External platform -> Channel impl -> MessageBus.publish_inbound()
ChannelManager._dispatch_loop() consumes from queue
For user-owned channel connections, incoming messages carry connection_id, owner_user_id, and workspace_id; owner_user_id becomes the DeerFlow run user_id, while the raw platform user id remains channel_user_id
For chat: look up/create thread through Gateway's LangGraph-compatible API
Feishu/Telegram chat: runs.stream() → accumulate AI text → publish multiple outbound updates (is_final=False) → publish final outbound (is_final=True)
Slack/Discord chat: runs.wait() → extract final response → publish outbound
Feishu channel sends one running reply card up front, then patches the same card for each outbound update (card JSON sets config.update_multi=true for Feishu's patch API requirement)
Telegram streaming: the "Working on it..." placeholder message is registered as the stream target; non-final updates editMessageText it in place (channel-side throttle: 1s in private chats, 3s in groups due to Telegram's 20 msg/min group cap; 4096-char truncation; rate-limited updates dropped); the final update performs the last edit and splits >4096 texts into follow-up messages
DingTalk AI Card mode (when card_template_id configured): runs.stream() → create card with initial text → stream updates via PUT /v1.0/card/streaming → finalize on is_final=True. Falls back to sampleMarkdown if card creation or streaming fails
For commands (/new, /status, /models, /memory, /help): handle locally or query Gateway API
Outbound → channel callbacks → platform reply
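
The Telegram throttling and splitting rules above can be sketched as follows. Class and function names are hypothetical; the intervals (1s private, 3s group) and the 4096-character limit come from the text. The real channel truncates non-final edits and splits only the final text, which this sketch does not model in full.

```python
from typing import List, Optional

TELEGRAM_MAX_CHARS = 4096  # limit stated in the text above


def split_for_telegram(text: str, limit: int = TELEGRAM_MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]


class EditThrottle:
    """Drop non-final edits that arrive sooner than the per-chat interval."""

    def __init__(self, is_group: bool) -> None:
        self.min_interval = 3.0 if is_group else 1.0
        self._last_edit: Optional[float] = None

    def allow(self, now: float, is_final: bool = False) -> bool:
        # The final update always goes through so the last edit is never lost.
        if is_final or self._last_edit is None or now - self._last_edit >= self.min_interval:
            self._last_edit = now
            return True
        return False
```

A caller would pass a monotonic timestamp as `now`, skip the edit when allow() returns False, and send the chunks after the first as follow-up messages on the final update.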

Owner-scoped file storage: inbound files, uploads, and output artifacts are staged under the DeerFlow owner's bucket so they land where the agent run reads/writes (users/{user_id}/threads/{thread_id}/user-data/{uploads,outputs}). ChannelManager._handle_chat resolves the storage owner once via _channel_storage_user_id(msg) (sanitized owner id, falling back to safe(msg.user_id) for unbound auth-enabled channels — mirroring _resolve_run_params's run identity; None only when no identity is available) and threads it as the user_id= kwarg through the file pipeline:

Channel.receive_file(msg, thread_id, user_id=...) - owner-bound channels persist downloaded files under the owner's bucket instead of the default bucket
_ingest_inbound_files(...) and the underlying ensure_uploads_dir / get_uploads_dir - owner-scoped via the same kwarg
_resolve_attachments / _prepare_artifact_delivery - resolve output artifacts from the bound owner's bucket

The cached value is reused for both the blocking (runs.wait) and streaming (_handle_streaming_chat) paths, so uploads and artifact delivery always target the same bucket even if a channel returns a rewritten InboundMessage from receive_file. The bucket id matches the memory bucket resolved by _resolve_memory_user_id (both normalize through make_safe_user_id).

Configuration (config.yaml -> channels):

langgraph_url - LangGraph-compatible Gateway API base URL (default: http://localhost:8001/api)
gateway_url - Gateway API URL for auxiliary commands (default: http://localhost:8001)
In Docker Compose, IM channels run inside the gateway container, so localhost points back to that container. Use http://gateway:8001/api for langgraph_url and http://gateway:8001 for gateway_url, or set DEER_FLOW_CHANNELS_LANGGRAPH_URL / DEER_FLOW_CHANNELS_GATEWAY_URL.
Per-channel configs: feishu (app_id, app_secret), slack (bot_token, app_token), telegram (bot_token), dingtalk (client_id, client_secret, optional card_template_id for AI Card streaming)

User-owned channel connections (config.yaml -> channel_connections):

Disabled by default. It is a user-binding layer on top of the existing channels.* runtime config, not a replacement for provider bot credentials.
No public IP, OAuth callback URL, or provider webhook route is required by the current implementation.
Telegram uses a deep-link /start <code> flow over the existing long-polling worker. Slack, Discord, Feishu/Lark, DingTalk, WeChat, and WeCom use /connect <code> over their existing outbound channel workers.
Frontend APIs: GET /api/channels/providers, GET /api/channels/connections, POST /api/channels/{provider}/connect, and DELETE /api/channels/connections/{connection_id}.
Browser APIs remain protected by normal Gateway auth/CSRF. Provider messages arrive through the already-configured channel workers.
Provider-level connection_status reflects the user's newest connection row. With no binding it is not_connected, except in auth-disabled local mode where a configured running channel reports connected because all channel messages already route to the default user.
Slack replies use the configured operator bot token from channels.slack unless per-connection credentials are present; unreadable or corrupt stored credentials are treated as unavailable.
Telegram, Slack, Discord, Feishu/Lark, DingTalk, WeChat, and WeCom workers resolve incoming platform identities to connection records before reaching ChannelManager.
Connect-code ordering vs allowed_users: inbound workers consume a valid /connect <code> (or Telegram /start <code>) before applying the allowed_users filter, so a newly allowlisted-but-unbound user can bootstrap their first bind via the browser flow. Consequence: allowed_users is not a bind-time defense — any sender who possesses a valid code can consume it (not only allowlisted users). The bind security model rests on the code's confidentiality: secrets.token_urlsafe(16), 600 s TTL, one-time consume_oauth_state, and codes surfaced only in the initiating browser (never echoed to chat). allowed_users still gates ordinary (non-bind) messages.
Single-active-owner transfer semantics: an external identity is keyed by (provider, external_account_id, workspace_id). The latest successful bind wins — upsert_connection revokes other owners' active rows for the same identity (ownership transfer). This invariant is enforced at the DB layer by the partial unique index uq_channel_connection_active_identity (WHERE status != 'revoked'), so concurrent connects from different owners cannot both end connected; the losing writer retries against the now-visible state. find_connection_by_external_identity therefore resolves deterministically.
See backend/docs/IM_CHANNEL_CONNECTIONS.md for provider setup and operational notes.
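
The connect-code properties above (secrets.token_urlsafe(16), 600 s TTL, one-time consumption) can be sketched with an in-memory store. ConnectCodeStore is a hypothetical stand-in for the SQL-backed state, not the real consume_oauth_state implementation.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

CODE_TTL_SECONDS = 600  # TTL stated in the text above


class ConnectCodeStore:
    """Hypothetical in-memory stand-in for the SQL-backed connect state."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, owner_user_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(16)
        self._pending[code] = (owner_user_id, now + CODE_TTL_SECONDS)
        return code

    def consume(self, code: str, now: Optional[float] = None) -> Optional[str]:
        """Return the owner for a valid code exactly once; None otherwise."""
        now = time.time() if now is None else now
        entry = self._pending.pop(code, None)  # pop makes the code one-time
        if entry is None:
            return None
        owner_user_id, expires_at = entry
        return owner_user_id if now < expires_at else None
```

Note that consume() does not check who the sender is, which mirrors the point above: whoever holds a valid code can bind, so the code's confidentiality is what matters.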

Memory System (`packages/harness/deerflow/agents/memory/`)

Components:

updater.py - LLM-based memory updates with fact extraction, whitespace-normalized fact deduplication (trims leading/trailing whitespace before comparing), and atomic file I/O
queue.py - Debounced update queue (per-thread deduplication, configurable wait time); captures user_id at enqueue time so it survives the threading.Timer boundary
prompt.py - Prompt templates for memory updates
storage.py - File-based storage with per-user isolation; cache keyed by (user_id, agent_name) tuple

Per-User Isolation:

Memory is stored per-user at {base_dir}/users/{user_id}/memory.json
Per-agent per-user memory at {base_dir}/users/{user_id}/agents/{agent_name}/memory.json
Custom agent definitions (SOUL.md + config.yaml) are also per-user at {base_dir}/users/{user_id}/agents/{agent_name}/. The legacy shared layout {base_dir}/agents/{agent_name}/ remains a read-only fallback for unmigrated installations
user_id is resolved via get_effective_user_id() from deerflow.runtime.user_context
The /api/memory* endpoints resolve the owner through _resolve_memory_user_id(request): trusted internal callers (IM channel workers carrying the X-DeerFlow-Owner-User-Id header, e.g. a bound /memory command) act for the connection owner; browser/API callers fall back to get_effective_user_id(). The header is only honored after AuthMiddleware validated the internal token, mirroring get_trusted_internal_owner_user_id used by the threads router
In no-auth mode, user_id defaults to "default" (constant DEFAULT_USER_ID)
Absolute storage_path in config opts out of per-user isolation
Migration: Run PYTHONPATH=. python scripts/migrate_user_isolation.py to move legacy memory.json, threads/, and agents/ into per-user layout. Supports --dry-run (preview changes) and --user-id USER_ID (assign unowned legacy data to a user, defaults to default).
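
The per-user layout above maps to paths as in this sketch. The helper name is hypothetical, and the absolute storage_path opt-out is not modeled.

```python
from pathlib import Path
from typing import Optional

DEFAULT_USER_ID = "default"  # no-auth default stated in the text above


def memory_path(base_dir: Path, user_id: Optional[str] = None, agent_name: Optional[str] = None) -> Path:
    """Return the memory.json location for a user, optionally scoped to one agent."""
    user_dir = base_dir / "users" / (user_id or DEFAULT_USER_ID)
    if agent_name:
        return user_dir / "agents" / agent_name / "memory.json"
    return user_dir / "memory.json"
```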

Data Structure (stored in {base_dir}/users/{user_id}/memory.json):

User Context: workContext, personalContext, topOfMind (1-3 sentence summaries)
History: recentMonths, earlierContext, longTermBackground
Facts: Discrete facts with id, content, category (preference/knowledge/context/behavior/goal), confidence (0-1), createdAt, source

Workflow:

MemoryMiddleware filters messages (user inputs + final AI responses), captures user_id via get_effective_user_id(), and queues conversation with the captured user_id
Queue debounces (30s default), batches updates, deduplicates per-thread
Background thread invokes LLM to extract context updates and facts, using the stored user_id (not the contextvar, which is unavailable on timer threads)
Applies updates atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
Next interaction injects top 15 facts + context into <memory> tags in system prompt
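
Two steps of this workflow, whitespace-normalized fact deduplication and the temp file + rename write, can be sketched as below. Function names are hypothetical and the fact dicts are reduced to a content field; the real updater also handles cache invalidation and fact limits.

```python
import json
import os
import tempfile
from pathlib import Path


def append_unique_fact(facts: list, new_fact: dict) -> bool:
    """Append new_fact unless a fact with the same trimmed content already exists."""
    content = new_fact["content"].strip()
    if any(f["content"].strip() == content for f in facts):
        return False
    facts.append(new_fact)
    return True


def write_json_atomically(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Creating the temp file next to the target keeps the rename on one filesystem, so readers see either the old file or the new one and never a partial write.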

Token counting (packages/harness/deerflow/agents/memory/prompt.py):

_count_tokens budgets the injection. In default tiktoken mode, the encoding is loaded lazily and cached.
Failed tiktoken loads are cached with a timestamp. During the fixed cooldown (_TIKTOKEN_RETRY_COOLDOWN_S, 600s), callers fall back to char estimation immediately instead of re-triggering the blocking BPE download; after the cooldown, transient outages can self-heal without a restart.
In-flight loads are cached as a LOADING sentinel so concurrent callers fall back instead of spawning more blocking threads.
Set memory.token_counting: char to skip tiktoken entirely and use the network-free CJK-aware char estimate.
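
The failure cooldown can be sketched as follows. CooldownLoader and count_tokens_sketch are hypothetical names; the 600s constant comes from the text, while the LOADING sentinel for in-flight loads and the CJK-aware estimate are not modeled (the fallback here is a crude length-based estimate).

```python
import time
from typing import Any, Callable, Optional

RETRY_COOLDOWN_S = 600.0  # cooldown stated in the text above


class CooldownLoader:
    """Cache a loaded value; after a failure, skip retries until the cooldown ends."""

    def __init__(self, load: Callable[[], Any], clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._clock = clock
        self._value: Any = None
        self._failed_at: Optional[float] = None

    def get(self) -> Any:
        """Return the loaded value, or None while unavailable."""
        if self._value is not None:
            return self._value
        now = self._clock()
        if self._failed_at is not None and now - self._failed_at < RETRY_COOLDOWN_S:
            return None  # still cooling down: do not re-trigger the blocking load
        try:
            self._value = self._load()
            self._failed_at = None
        except Exception:
            self._failed_at = now
            return None
        return self._value


def count_tokens_sketch(text: str, loader: CooldownLoader) -> int:
    encoding = loader.get()
    if encoding is None:
        return len(text) // 4  # crude stand-in for the char estimate
    return len(encoding.encode(text))
```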

Focused regression coverage for the updater lives in backend/tests/test_memory_updater.py.

Configuration (config.yaml → memory):

enabled / injection_enabled - Master switches
storage_path - Path to memory.json (absolute path opts out of per-user isolation)
debounce_seconds - Wait time before processing (default: 30)
model_name - LLM for updates (null = default model)
max_facts / fact_confidence_threshold - Fact storage limits (100 / 0.7)
max_injection_tokens - Token limit for prompt injection (2000)
token_counting - Token counting strategy for the injection budget: tiktoken (default, accurate but may download BPE data from a public endpoint on first use — can block for a long time in network-restricted environments, see issues #3402/#3429) or char (network-free CJK-aware char estimate, never touches tiktoken)

Reflection System (`packages/harness/deerflow/reflection/`)

resolve_variable(path) - Import module and return variable (e.g., module.path:variable_name)
resolve_class(path, base_class) - Import and validate class against base class
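
A simplified re-implementation of the two resolvers, assuming the module.path:variable_name format shown above. The _sketch names are hypothetical, and the real functions additionally surface install hints for missing provider modules.

```python
from importlib import import_module
from typing import Any


def resolve_variable_sketch(path: str) -> Any:
    """Import 'module.path' and return the attribute named after the colon."""
    module_path, sep, variable_name = path.partition(":")
    if not sep or not module_path or not variable_name:
        raise ValueError(f"expected 'module.path:variable_name', got {path!r}")
    return getattr(import_module(module_path), variable_name)


def resolve_class_sketch(path: str, base_class: type) -> type:
    """Resolve a class and check that it subclasses base_class."""
    cls = resolve_variable_sketch(path)
    if not isinstance(cls, type) or not issubclass(cls, base_class):
        raise TypeError(f"{path!r} is not a subclass of {base_class.__name__}")
    return cls
```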

Tracing System (`packages/harness/deerflow/tracing/`)

LangSmith and Langfuse are both supported. The wiring lives in two layers:

factory.py::build_tracing_callbacks() — returns the LangChain CallbackHandler list for the providers currently enabled via env vars (LANGSMITH_TRACING, LANGFUSE_TRACING, etc.). The handlers are attached at the graph invocation root for in-graph runs (make_lead_agent and DeerFlowClient.stream both append them to config["callbacks"] before invoking the graph) so a single run produces one trace with all node / LLM / tool calls as child spans. Standalone callers — anything that invokes a model outside such a graph (e.g. MemoryUpdater) — keep create_chat_model's default attach_tracing=True, which falls back to model-level callback attachment.
metadata.py::build_langfuse_trace_metadata() — builds the Langfuse-reserved trace attributes for RunnableConfig.metadata. The Langfuse v4 langchain.CallbackHandler lifts these onto the root trace (see its _parse_langfuse_trace_attributes), but only when it sees on_chain_start(parent_run_id=None) — which is why the callbacks have to live at the graph root, not the model.

Trace-attribute injection points: both runtime/runs/worker.py::run_agent (gateway path) and client.py::DeerFlowClient.stream (embedded path) merge the metadata into config["metadata"] right before constructing the graph. subagents/executor.py::_aexecute does the same for every subagent run so subagent traces group under the parent thread's session card (carrying the parent thread_id → langfuse_session_id, the user_id captured at task_tool → langfuse_user_id, and a subagent:<normalized-name> trace name). Caller-supplied keys win via setdefault, so an external session_id override is preserved. Field mapping:

Langfuse field	Source
`langfuse_session_id`	LangGraph `thread_id`
`langfuse_user_id`	`get_effective_user_id()` (`default` in no-auth); for subagents, captured from `runtime.context` at `task_tool` time via `resolve_runtime_user_id()`
`langfuse_trace_name`	`RunRecord.assistant_id` / client `agent_name` (defaults to `lead-agent`); for subagents, `subagent:<name>` (lowercased, `_` → `-`)
`langfuse_tags`	`env:<DEER_FLOW_ENV>` + `model:<model_name>`

Returns {} when Langfuse is not in the enabled providers — LangSmith-only deployments are unaffected. Set DEER_FLOW_ENV (or ENVIRONMENT) to tag traces by deployment environment. Tests live in tests/test_tracing_factory.py, tests/test_tracing_metadata.py, tests/test_worker_langfuse_metadata.py, tests/test_client_langfuse_metadata.py, and tests/test_subagent_executor.py::TestSubagentTracingWiring.
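
The field mapping and the setdefault precedence can be sketched as below. Function names and parameters are hypothetical, and the sketch assumes a caller override uses the same langfuse_* key; it is not the real build_langfuse_trace_metadata().

```python
from typing import List, Optional


def normalize_subagent_trace_name(name: str) -> str:
    """Lowercase the name and replace underscores, as described for subagent traces."""
    return "subagent:" + name.lower().replace("_", "-")


def merge_langfuse_metadata(
    metadata: dict,
    *,
    thread_id: str,
    user_id: str,
    trace_name: str = "lead-agent",
    env: Optional[str] = None,
    model_name: Optional[str] = None,
) -> dict:
    """Fill in trace attributes without overwriting keys the caller already set."""
    tags: List[str] = []
    if env:
        tags.append(f"env:{env}")
    if model_name:
        tags.append(f"model:{model_name}")
    merged = dict(metadata)
    merged.setdefault("langfuse_session_id", thread_id)
    merged.setdefault("langfuse_user_id", user_id)
    merged.setdefault("langfuse_trace_name", trace_name)
    merged.setdefault("langfuse_tags", tags)
    return merged
```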

Config Schema

config.yaml key sections:

models[] - LLM configs with use class path, supports_thinking, supports_vision, provider-specific fields
vLLM reasoning models should use deerflow.models.vllm_provider:VllmChatModel; for Qwen-style parsers prefer when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking, and DeerFlow will also normalize the older thinking alias
tools[] - Tool configs with use variable path and group
tool_groups[] - Logical groupings for tools
sandbox.use - Sandbox provider class path
skills.path / skills.container_path - Host and container paths to skills directory
title - Auto-title generation (enabled, max_words, max_chars, prompt_template)
summarization - Context summarization (enabled, trigger conditions, keep policy)
subagents.enabled - Master switch for subagent delegation
memory - Memory system (enabled, storage_path, debounce_seconds, model_name, max_facts, fact_confidence_threshold, injection_enabled, max_injection_tokens)

extensions_config.json:

mcpServers - Map of server name → config (enabled, type, command, args, env, url, headers, oauth, description)
skills - Map of skill name → state (enabled)

Both can be modified at runtime via Gateway API endpoints or DeerFlowClient methods.

Embedded Client (`packages/harness/deerflow/client.py`)

DeerFlowClient provides direct in-process access to all DeerFlow capabilities without HTTP services. All return types align with the Gateway API response schemas, so consumer code works identically in HTTP and embedded modes.

Architecture: Imports the same deerflow modules that Gateway API uses. Shares the same config files and data directories. No FastAPI dependency.

Agent Conversation:

chat(message, thread_id) — synchronous, accumulates streaming deltas per message-id and returns the final AI text
stream(message, thread_id) — subscribes to LangGraph stream_mode=["values", "messages", "custom"] and yields StreamEvent:
- "values" — full state snapshot (title, messages, artifacts); AI text already delivered via messages mode is not re-synthesized here to avoid duplicate deliveries
- "messages-tuple" — per-chunk update: for AI text this is a delta (concat per id to rebuild the full message); tool calls and tool results are emitted once each
- "custom" — forwarded from StreamWriter
- "end" — stream finished (carries cumulative usage counted once per message id)
Agent created lazily via create_agent() + build_middlewares(), same as make_lead_agent
Supports checkpointer parameter for state persistence across turns
reset_agent() forces agent recreation (e.g. after memory or skill changes)
See docs/STREAMING.md for the full design: why Gateway and DeerFlowClient are parallel paths, LangGraph's stream_mode semantics, the per-id dedup invariants, and regression testing strategy
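
The per-id accumulation that chat() performs can be sketched as follows. The event shape (a type string plus a dict with type, id, and content keys) is an assumption for illustration and does not reflect the actual StreamEvent class.

```python
from typing import Dict, List, Tuple


def accumulate_ai_text(events: List[Tuple[str, dict]]) -> str:
    """Concatenate AI text deltas per message id; return the last AI message's text."""
    texts: Dict[str, str] = {}
    order: List[str] = []
    for event_type, data in events:
        if event_type != "messages-tuple" or data.get("type") != "ai":
            continue  # ignore snapshots, tool results, custom and end events
        msg_id = data["id"]
        if msg_id not in texts:
            texts[msg_id] = ""
            order.append(msg_id)
        texts[msg_id] += data.get("content", "")
    return texts[order[-1]] if order else ""
```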

Gateway Equivalent Methods (replaces Gateway API):

Category	Methods	Return format
Models	`list_models()`, `get_model(name)`	`{"models": [...]}`, `{name, display_name, ...}`
MCP	`get_mcp_config()`, `update_mcp_config(servers)`	`{"mcp_servers": {...}}`
Skills	`list_skills()`, `get_skill(name)`, `update_skill(name, enabled)`, `install_skill(path)`	`{"skills": [...]}`
Memory	`get_memory()`, `reload_memory()`, `get_memory_config()`, `get_memory_status()`	dict
Uploads	`upload_files(thread_id, files)`, `list_uploads(thread_id)`, `delete_upload(thread_id, filename)`	`{"success": true, "files": [...]}`, `{"files": [...], "count": N}`
Artifacts	`get_artifact(thread_id, path)` → `(bytes, mime_type)`	tuple

Key difference from Gateway: Upload accepts local Path objects instead of HTTP UploadFile, rejects directory paths before copying, and reuses a single worker when document conversion must run inside an active event loop. Artifact returns (bytes, mime_type) instead of HTTP Response. The new Gateway-only thread cleanup route deletes .deer-flow/threads/{thread_id} after LangGraph thread deletion; there is no matching DeerFlowClient method yet. update_mcp_config() and update_skill() automatically invalidate the cached agent.

Tests: tests/test_client.py (77 unit tests including TestGatewayConformance), tests/test_client_live.py (live integration tests, requires config.yaml)

Gateway Conformance Tests (TestGatewayConformance): Validate that every dict-returning client method conforms to the corresponding Gateway Pydantic response model. Each test parses the client output through the Gateway model — if Gateway adds a required field that the client doesn't provide, Pydantic raises ValidationError and CI catches the drift. Covers: ModelsListResponse, ModelResponse, SkillsListResponse, SkillResponse, SkillInstallResponse, McpConfigResponse, UploadResponse, MemoryConfigResponse, MemoryStatusResponse.

Development Workflow

Test-Driven Development (TDD) — MANDATORY

Every new feature or bug fix MUST be accompanied by unit tests. No exceptions.

Write tests in backend/tests/ following the existing naming convention test_<feature>.py
Run the full suite before and after your change: make test
Tests must pass before a feature is considered complete
For lightweight config/utility modules, prefer pure unit tests with no external dependencies
If a module causes circular import issues in tests, add a sys.modules mock in tests/conftest.py (see existing example for deerflow.subagents.executor)

# Run all tests
make test

# Run a specific test file
PYTHONPATH=. uv run pytest tests/test_<feature>.py -v

Running the Full Application

From the project root directory:

make dev

This starts all services and makes the application available at http://localhost:2026.

All startup modes:

Mode	Local Foreground	Local Daemon	Docker Dev	Docker Prod
Dev	`./scripts/serve.sh --dev` or `make dev`	`./scripts/serve.sh --dev --daemon` or `make dev-daemon`	`./scripts/docker.sh start` or `make docker-start`	n/a
Prod	`./scripts/serve.sh --prod` or `make start`	`./scripts/serve.sh --prod --daemon` or `make start-daemon`	n/a	`./scripts/deploy.sh` or `make up`

Action	Local	Docker Dev	Docker Prod
Stop	`./scripts/serve.sh --stop` or `make stop`	`./scripts/docker.sh stop` or `make docker-stop`	`./scripts/deploy.sh down` or `make down`
Restart	`./scripts/serve.sh --restart [flags]`	`./scripts/docker.sh restart`	n/a

Nginx routing:

/api/langgraph/* → Gateway embedded runtime (8001), rewritten to /api/*
/api/* (other) → Gateway API (8001)
/ (non-API) → Frontend (3000)

Running Backend Services Separately

From the backend directory:

# Gateway API
make gateway

Direct access (without nginx):

Gateway: http://localhost:8001

Frontend Configuration

The frontend uses environment variables to connect to backend services:

NEXT_PUBLIC_LANGGRAPH_BASE_URL - Defaults to /api/langgraph (through nginx)
NEXT_PUBLIC_BACKEND_BASE_URL - Defaults to empty string (through nginx)

When using make dev from root, the frontend automatically connects through nginx.

Key Features

File Upload

Multi-file upload with automatic document conversion:

Endpoint: POST /api/threads/{thread_id}/uploads
Supports: PDF, PPT, Excel, Word documents (converted via markitdown)
Rejects directory inputs before copying so uploads stay all-or-nothing
Reuses one conversion worker per request when called from an active event loop
Files stored in thread-isolated directories under the resolving user's bucket (users/{user_id}/threads/{thread_id}/user-data/uploads). For IM channels the owner is threaded explicitly via the user_id= kwarg (see IM Channels → Owner-scoped file storage); HTTP/embedded callers resolve it from get_effective_user_id()
Duplicate filenames in a single upload request are auto-renamed with _N suffixes so later files do not truncate earlier files
Agent receives uploaded file list via UploadsMiddleware
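
The duplicate-filename handling can be sketched as below. The function name is hypothetical, and the exact suffix placement and starting index (here _1 before the extension) are assumptions; the text only states that _N suffixes are used.

```python
from pathlib import PurePath
from typing import List, Set


def dedupe_filenames(names: List[str]) -> List[str]:
    """Rename repeats within one request, e.g. a.pdf, a.pdf becomes a.pdf, a_1.pdf."""
    used: Set[str] = set()
    result: List[str] = []
    for name in names:
        stem, suffix = PurePath(name).stem, PurePath(name).suffix
        candidate = name
        n = 0
        while candidate in used:
            n += 1
            candidate = f"{stem}_{n}{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result
```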

See docs/FILE_UPLOAD.md for details.

Plan Mode

TodoList middleware for complex multi-step tasks:

Controlled via runtime config: config.configurable.is_plan_mode = True
Provides write_todos tool for task tracking
One task in_progress at a time, real-time updates

See docs/plan_mode_usage.md for details.

Context Summarization

Automatic conversation summarization when approaching token limits:

Configured in config.yaml under summarization key
Trigger types: tokens, messages, or fraction of max input
Keeps recent messages while summarizing older ones

See docs/summarization.md for details.

Vision Support

For models with supports_vision: true:

ViewImageMiddleware processes images in conversation
view_image_tool added to agent's toolset
Images automatically converted to base64 and injected into state

Code Style

Uses ruff for linting and formatting
Line length: 240 characters
Python 3.12+ with type hints
Double quotes, space indentation

Documentation

See docs/ directory for detailed documentation:

CONFIGURATION.md - Configuration options
ARCHITECTURE.md - Architecture details
API.md - API reference
SETUP.md - Setup guide
FILE_UPLOAD.md - File upload feature
PATH_EXAMPLES.md - Path types and usage
summarization.md - Context summarization
plan_mode_usage.md - Plan mode with TodoList
