* fix(agents): sync agent_name across context/configurable and reject empty soul (#3549)
Two independent issues caused custom agent creation to silently fail:
1. build_run_config only wrote agent_name into one container (configurable
or context), so setup_agent — which reads ToolRuntime.context exclusively
since LangGraph >=1.1.9 — saw agent_name=None and wrote SOUL.md to the
global base_dir instead of users/{user_id}/agents/{name}/. Mirror the
dual-write pattern already used by merge_run_context_overrides and
naming.py so both containers always carry the same value.
2. setup_agent persisted whatever soul string it received, including empty
or whitespace-only content, and still reported success. The frontend
then surfaced an unusable agent and the global default SOUL.md could be
silently overwritten with empty content. Reject empty soul before any
filesystem operation so the model can retry.
Tests:
- test_gateway_services.py: dual-write regressions for both configurable
and context entry paths, explicit-agent-name precedence on both sides,
and a shape-parity test against merge_run_context_overrides.
- test_setup_agent_tool.py: empty/whitespace soul rejection, plus
no-overwrite guarantees for existing global and per-agent SOUL.md.
* Update services.py
Subagent _create_agent() now passes checkpointer=False to prevent
inheriting the parent run's synchronous checkpointer via copy_context(),
which would cause NotImplementedError when aget_tuple() is called on
the async path. Subagents are one-shot delegations that never resume,
so persistence is unnecessary.
send_file opened the attachment with a bare open() and never closed it,
leaking a file descriptor on every Discord file delivery. The handle was
also leaked on the failure path: when target.send raised, the except
branch logged and returned without closing fp. The "# noqa: SIM115"
suppressed the lint warning instead of fixing it.
Wrap open() in a with statement that stays open for the full upload —
the discord client reads fp while target.send runs on _discord_loop, and
once that future resolves the bytes are consumed, so closing here is
safe. This closes the handle on both the success and exception paths and
matches how telegram and feishu already handle their file uploads.
Adds regression tests asserting the handle is closed after send_file on
both the success and failure paths.
Refs #3544
* feat(community): add Brave Search web search tool
Add a community web_search provider backed by the official Brave Search
API (https://api.search.brave.com/res/v1/web/search). API key is read
from the tool config (inline api_key) or the BRAVE_SEARCH_API_KEY env
var. Output schema (title/url/content) matches existing search tools.
No new dependencies (uses the existing httpx). Also wires up the setup
wizard, doctor health check, config example, and EN/ZH docs.
* refactor(community): drop redundant [:count] slice in Brave search
The Brave API already caps results via the `count` request param, so
client-side slicing was redundant. Tests now simulate the API honoring
`count` instead of relying on the slice. Addresses PR review nit.
* style(tests): apply ruff format to test_doctor.py
Collapse multiline write_text calls onto single lines to satisfy the
CI ruff formatter (lint-backend was failing on format --check).
* fix(channels): surface WeCom WebSocket connection failures (#2000)
The WeCom channel started the SDK connection with a fire-and-forget
asyncio task and never inspected its result, so connection failures
(e.g. the gateway WebSocket handshake to wss://openws.work.weixin.qq.com
failing) were silently swallowed: the channel still logged "started",
SDK error/disconnected events went unobserved, and the connect task
produced "Task exception was never retrieved" noise.
Monitor the connect task with a done-callback that logs a clear,
actionable error (and stays silent on cancellation), and subscribe to
the SDK's error/disconnected events so failures become visible in
DeerFlow's own logs.
* style(channels): apply ruff format to wecom.py
Collapse the multiline log message onto a single line to satisfy the
CI ruff formatter (lint-backend was failing on format --check).
* fix(channels): log WeCom disconnect reason when SDK provides one
Address review feedback: _on_ws_disconnected now includes the first
event arg (e.g. reason/context) in the warning instead of discarding
it, so disconnect causes are visible in logs.
* fix(scripts):start with make start-daemon,can not stop next-server with make stop
* fix(scripts):start with make start-daemon,can not stop next-server with make stop
ViewImageMiddleware persists full base64 image payloads in hide_from_ui
human messages inside checkpoints. All REST endpoints that returned
serialize_channel_values(channel_values) sent these multi-megabyte
payloads to the frontend, freezing the UI on threads with images.
Add strip_data_url_image_blocks() to remove data:-scheme image_url
content blocks from hide_from_ui messages, and
serialize_channel_values_for_api() as a convenience wrapper used by all
six affected call sites across threads, runs, and thread_runs routers.
SSE streaming is unaffected (still uses serialize_channel_values).
Fixes#3496
* docs(spec): telegram streaming output design
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* docs(plan): telegram streaming implementation plan
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat(telegram): report streaming support for telegram channel
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test(channels): use slack as the non-streaming sample channel in manager tests
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat(telegram): register running-reply placeholder as stream target
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test(telegram): pin last_edit_at sentinel in placeholder registration test
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* refactor(telegram): extract _send_new_message from send()
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat(telegram): edit streamed message in place for non-final updates
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat(telegram): finalize streamed message with overflow splitting
When is_final=True arrives and stream state exists, pop the state, edit
the streamed placeholder with the final text, split overflow into follow-up
send_message calls, update _last_bot_message, and clear stream state.
Falls back to _send_new_message when no stream state is registered.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* test(telegram): exercise the not-modified handler in final edit path
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* docs: telegram channel now streams replies via message editing
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* fix(telegram): harden final-delivery path with guarded retry and chunk retries
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* fix(channels): accept runtime 'messages' SSE event for streaming text accumulation
The embedded runtime (matching LangGraph Platform semantics) emits SSE
event name 'messages' for the requested 'messages-tuple' stream mode,
so the manager never accumulated token deltas and streaming channels
only updated from end-of-step 'values' snapshots — on Telegram this
looked like 'Working on it...' followed by the full answer in one block.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* feat(telegram): widen stream-edit throttle to 3s in group chats
Telegram caps bots at 20 messages/minute per group, stricter than the
1 msg/s per-chat guideline. Groups have negative chat ids, so pick the
interval by sign.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* fix(telegram): address review findings — thread fallback messages, bound stream registry, share stream-event constants
- Fallback/new stream messages now carry reply_to_message_id parsed from
thread_ts so they stay nested under the user's message (finding 1)
- STREAM_MODES / MESSAGE_STREAM_EVENTS constants link the requested
stream modes to the SSE event names they arrive under (finding 2)
- _register_stream_message bounds the in-flight registry at 256 entries,
evicting oldest, guarding against leaks when a final never arrives (finding 4)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
GET /api/mcp/config returns 403 for non-admin users, but the previous
client returned the error body as MCPConfig, causing MCPServerList to
crash with 'Cannot convert undefined or null to object' on
Object.entries(config.mcp_servers).
- api.ts: introduce MCPConfigRequestError; loadMCPConfig and
updateMCPConfig now throw it (carrying status + isAdminRequired)
instead of letting non-2xx bodies leak through as parsed config
- tool-settings-page.tsx: render a friendly 'admin privileges required'
empty state when the React Query error is an admin-required
MCPConfigRequestError; keep MCPServerList resilient with
Object.entries(servers ?? {}) and an empty-state for no servers
- i18n: add settings.tools.adminRequired and settings.tools.empty in
en-US, zh-CN and the Translations type
- tests: cover 403 / 5xx / instanceof / detail-fallback for both
loadMCPConfig and updateMCPConfig in tests/unit/core/mcp/api.test.ts
Refs: #3527
_ingest_inbound_files ensured the thread uploads dir (mkdir), enumerated it
(iterdir/is_file) to de-duplicate names, and wrote each downloaded attachment
(write_upload_file_no_symlink) directly on the event loop. Offload the directory
prep and every per-file write via asyncio.to_thread; the genuinely async network
read (file_reader) stays on the loop. Externally observable behavior is unchanged.
Found via `make detect-blocking-io` (HIGH: iterdir on an async path).
Add tests/blocking_io/test_channels_ingest.py anchor, verified red->green under
the strict Blockbuster gate.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
list_messages re-scanned every event in the thread on each call (category
filter + seq filter) — O(total events) per paginated request on the default
run-events backend. Maintain a messages-only, seq-sorted projection of _events
(shared dict refs, no copies) and locate the seq window with bisect:
list_messages drops to O(log m + page) and count_messages to O(1). The index is
kept in lockstep at every mutation site (put / put_batch via _put_one,
delete_by_run, delete_by_thread).
Externally observable behavior is unchanged — the full RunEventStore contract
suite passes across memory/db/jsonl.
Add a test covering pagination over non-contiguous message seqs (messages
interleaved with trace events), including in-gap and exact-boundary cursors.
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore(blocking-io): fail-loud repo-root resolution and shared detector CLI shim
The three detectors resolved REPO_ROOT with depth-indexed
Path(__file__).resolve().parents[4]. If a detector file ever moves to a
different directory depth, scan roots resolve under the wrong directory
and the detector reports zero findings with no error — a silent-zero
failure shape for a detection tool.
- Add support/detectors/repo_root.py: resolve the repo root by walking
upward to the .git marker (checked with exists() so git worktrees,
where .git is a file, also resolve), raising RuntimeError when no
marker is found. All three detectors use it at import time, so a
relocated detector fails loudly instead of scanning an empty tree.
- Extract scripts/_detector_cli.py from the three character-identical
CLI shims; the sys.path computation lives in one place and raises
when backend/tests cannot be found.
- tests/test_detector_repo_root.py pins: resolution from an unmarked
location raises instead of returning an empty scan; all three
detectors share the resolved root; each CLI shim delegates to its
detector.
Testing: backend `make test` (4278 passed); smoke-ran
`make detect-blocking-io`, `make detect-thread-boundaries`, and
`scripts/scan_changed_blocking_io.py --base upstream/main`.
Closes#3510 (review follow-up to #3503).
* chore(blocking-io): declare detector modules import-only, drop script-mode residue
Adversarial review caught that blocking_io_static.py and
thread_boundaries.py kept shebangs and __main__ blocks but can no longer
run as plain scripts: the new `from support.detectors.repo_root import`
executes before anything puts backend/tests on sys.path, so direct
invocation dies with ModuleNotFoundError before argparse.
Direct execution was never a documented entry point (Makefile targets,
the scripts/ shims, the blocking-io-guard skill, and tests all go
through the support.detectors package), so converge on import-only
instead of re-adding per-module bootstrap: drop the shebangs and the now
unreachable __main__ blocks (plus the `import sys` they kept alive) and
state the supported entry points in each module docstring. The shim
delegation tests in test_detector_repo_root.py pin the supported CLI
paths.
Testing: backend `make test` (4278 passed); `make detect-blocking-io`
and `make detect-thread-boundaries` smoke-ran.
* perf(runtime): index runs by thread_id to avoid O(n) scans in RunManager
RunManager.list_by_thread, create_or_reject (inflight check), and has_inflight each filtered every in-memory run by thread_id — an O(total in-memory runs) scan that grows with overall gateway traffic rather than the queried thread's depth.
Add a thread_id -> run_ids secondary index (an insertion-ordered dict used as an ordered set) maintained in lockstep with _runs under the existing lock at every add/remove site (create, create_or_reject, both rollbacks, cleanup). The three per-thread queries now run in O(runs-in-thread); insertion order is preserved so list_by_thread keeps stable tie-breaking. Behavior unchanged. Adds 6 regression tests; full RunManager suite 146 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(runtime): cover create_or_reject rollback + clarify thread-index guard docstrings
Address review on #3499 (fancyboi999):
- Reword _thread_records_locked docstring: lockstep under self._lock is the
correctness guarantee; self._runs.get is one-directional defense-in-depth
(drops stale ids, cannot recover index-missing ids), not reconciliation.
- Add test_failed_create_or_reject_unindexes_run covering the create_or_reject
rollback/unindex mutation site (the last untested mutation path).
- Fix _FailingPutRunStore docstring ("initial put" -> "every put").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(frontend): render user messages as plain text and cap blockquote nesting
User messages are typed or pasted plain text, not authored Markdown, but
they were rendered through the full Streamdown pipeline. Pasted source
files got fragmented (indented chunks become code blocks, paragraphs
collapse and lose indentation), "$...$" spans were KaTeX-ified, and a
message with thousands of nested ">" markers overflowed the call stack
in marked's recursive blockquote lexer, permanently crashing the thread
on every load.
Render human message content verbatim with pre-wrap instead, and cap
blockquote nesting at 100 levels at the Streamdown chokepoint so model
output cannot trigger the same recursion either.
Closes#3500
* fix(frontend): absorb marked lexer crashes with a render fallback boundary
Review found two gaps in the nesting cap: marked's list and blockquote
tokenizers are mutually recursive, so a list marker in front of the
quote chain ("- > > > ...") bypassed the blockquote-only regex and
still overflowed the stack; and the line-based rewrite was fence-blind,
silently truncating literal ">" runs inside code blocks.
Add an error boundary around Streamdown that renders the raw content as
plain pre-wrap text when rendering throws (retrying on the next content
change), keep the cap as a fast path for the dominant pure-">" case,
and make it skip fenced and indented code lines.
* Add user-owned IM channel connections
* Fix dev startup and channel connect popup
* Use async channel connect flow
* Harden dev service daemon startup
* Support local IM channel connections
* Align IM connections with local channels
* Fix safe user id digest algorithm
* Address Copilot IM channel feedback
* Address IM channel review comments
* Support all integrated IM channel connections
* Format additional channel connection tests
* Keep unavailable channel connect buttons clickable
* Fix IM channel provider icons
* Add runtime setup for enabled IM channels
* Guard global shortcut key handling
* Keep configured IM channels editable
* Avoid password autofill for channel secrets
* Make channel threads visible to connection owners
* Persist IM runtime config locally
* Allow disconnecting runtime IM channels
* Route no-auth channel sessions to local user
* Use default user for auth-disabled local mode
* Show IM channel source on threads
* Prefill IM channel runtime config
* Reflect IM channel runtime health
* Ignore Feishu message read events
* Ignore Feishu non-content message events
* Let setup wizard enable IM channels
* Fix frontend formatting after merge
* Stabilize backend tests without local config
* Isolate channel runtime config tests
* Address channel connection review comments
* Use sha256 user buckets with legacy migration
* Ensure runtime IM channels are ready after restart
* Persist disconnected IM channel state
* Address channel connection review comments
* Address channel connection review findings
Frontend connect flow:
- Open the runtime-config dialog only when a provider still needs
credentials; configured providers go straight to the connect flow, so
the binding-code/deep-link path is reachable from the UI again.
- After saving credentials, continue into the connect flow when a user
binding is still required (multi-user mode) instead of stopping at a
"Connected" toast.
- Extract shared provider-state helpers to core/channels/provider-state
and add unit + e2e coverage for the direct-connect and
configure-then-connect paths.
Provider status semantics:
- Report connection_status from the user's newest connection row;
with no binding it is not_connected, except in auth-disabled local
mode where a configured running channel is effectively connected.
Concurrency and event-loop correctness:
- Offload ChannelRuntimeConfigStore construction and writes, channel
service construction, and Slack connection replies to threads; add a
tests/blocking_io/ anchor for the runtime-config handlers.
- Consume binding codes with a conditional UPDATE so a code can only be
used once under concurrent workers; retry upsert_connection as an
update when a concurrent insert wins the unique constraint.
- Serialize ensure_channel_ready per channel so concurrent provider
polls cannot double-start a channel worker.
Config and migration hardening:
- Stop mutating the get_app_config()-cached Telegram provider config;
the runtime store now owns the UI-entered bot username.
- Register channel_connections in STARTUP_ONLY_FIELDS with the
standardized startup-only Field description.
- Match the legacy unsafe-id bucket by recomputing its exact SHA-1 name
so another user's same-prefix bucket can never be migrated.
- Remove the unused Telegram process_webhook_update path and document
src/core/channels in the frontend docs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Address PR review comments on authz scoping and channel runtime
Security (review feedback from ShenAC-SAC):
- Scope internal-token callers to the connection owner carried in
X-DeerFlow-Owner-User-Id instead of bypassing owner checks outright,
in both require_permission(owner_check=True) and the stateless run
endpoints. Internal callers keep access to their own and
shared/legacy threads, and may claim a default-owned channel thread
for its real owner, but a leaked internal token no longer grants
cross-user thread access.
- Require admin privileges for POST/DELETE /api/channels/{provider}/
runtime-config: runtime credentials and channel workers are
instance-wide shared state (same model as the MCP config API).
Read-only provider listing stays available to all users.
Performance (review feedback from willem-bd):
- Skip the redundant thread channel-metadata PATCH after the first
successful backfill per thread.
- Reuse the per-connection Slack WebClient until its token changes
instead of constructing one per outbound message.
- Reconcile channel readiness for all providers concurrently in
GET /api/channels/providers.
Also resolve the code-quality unused-import flag in the blocking-io
anchor by pre-importing the channel service via importlib.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Fix prettier formatting in provider-state test
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Reconcile UI runtime channel config with config reload on restart
Main now reloads a channel's config.yaml entry on restart_channel()
(#3514, issue #3497). Adapt the user-owned connection flow to coexist:
- configure_channel() restarts with reload_config=False — the caller
just supplied the authoritative config (browser-entered credentials
that are never written to config.yaml), so a file reload must not
clobber it with the stale on-disk entry.
- _load_channel_config() re-applies the UI runtime-store overlay used
at startup, so an operator-triggered restart keeps browser-entered
credentials for channels without a config.yaml entry and does not
resurrect a channel disconnected from the UI.
- Offload the reload's disk IO (config.yaml + runtime store) with
asyncio.to_thread, matching the blocking-IO policy on this branch.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
* fix(skills): keep skill archive installation off the event loop
ainstall_skill_from_archive — the async entry point awaited by the gateway
POST /skills/install route — ran its entire filesystem pipeline inline on
the event loop: zip extraction, frontmatter validation, rglob enumeration,
per-file read_text, shutil.copytree staging, and tempdir cleanup.
Restructure into offloaded phases: prepare (extract + validate) and commit
(stage + move) run via asyncio.to_thread, the tempdir lifecycle is
offloaded, and the security scanner's file enumeration and reads move off
the loop — only the per-file LLM scan (genuinely async) stays awaited.
Security decision logic and exception contract are unchanged.
Anchor: tests/blocking_io/test_skills_install.py drives the real install
pipeline (real .skill archive, real FS; only scan_skill_content stubbed)
under the strict Blockbuster gate. Verified red on pre-fix code
(BlockingError: os.stat), green with the fix.
* fix(skills): log temp-dir cleanup failures instead of swallowing them
Review follow-up on the install offload: rmtree(ignore_errors=True) kept
the primary install exception but silently leaked the extraction dir on
cleanup failure. Keep the never-mask behaviour, add a warning log.
* fix(skills): bound install tmp cleanup and pass skill_dir explicitly (review)
- Wrap the best-effort temp-dir cleanup in asyncio.wait_for (5s) so a
hung filesystem in the finally block cannot stall or mask the install
outcome; timeout is logged like the existing OSError path.
- Hoist _collect_scannable_files to module level with skill_dir as an
explicit argument instead of a closure capture.
* feat(blocking-io): add changed-lines blocking-IO scanner (L1)
* feat(blocking-io): add scan-changed CLI wrapper
* feat(skill): add blocking-io-guard developer SOP skill
* docs(blocking-io): point contributors at the blocking-io-guard skill
* style(blocking-io): apply ruff format to scanner and tests
* docs(backend): document changed-lines blocking-IO scanner in CLAUDE.md
* feat(skill): add post-fix re-scan check and PR batching policy
* refactor(skill): fix SOP step ordering, align template with repo conventions
- Move re-scan into an explicit 'apply the fix' step (was wedged after
anchor generation while telling you to go back before the anchor)
- Renumber steps 0-6; drop undefined 'L1' jargon
- Mode A: document that the diff is <base>...HEAD (commit first)
- Mode B: prefer make detect-blocking-io + findings JSON file
- anchor template: module-level pytestmark per tests/blocking_io convention
- CLAUDE.md: fix 'git diff --base' phrasing
* fix(skill): catch findings introduced without touching the blocking line
Review follow-up: changed-line intersection alone misses the case where a
new async caller exposes an old sync helper — the static finding sits on
the untouched blocking line, so Mode A returned empty and the SOP stopped
on a false 'no blocking-IO surface'.
Selection is now a union over the changed files:
- findings on added lines of git diff <base>...HEAD (kept: a second
identical symbol in an already-flagged function collides on the stable
key and only this selection sees it);
- findings new versus the merge base, matched by (path, function,
symbol) — never line numbers.
Base sources are materialized via git show <merge-base>:<path>; files
absent at base count every head finding as new. SKILL.md now states the
residual same-file-only blind spot (cross-file async callers) instead of
treating an empty list as proof of zero exposure, and only requires
reading sop-skeleton.md when generalizing to another detector domain.
* docs(skill): examples teach test-writing, the teeth check defines the rule
All examples in the references/template are filesystem-flavored; make
explicit that they are instances, not the SOP's boundary — the same rules
apply to every detector category (FILE_IO, HTTP, SUBPROCESS, SLEEP) and
acceptance is always red/green teeth, never similarity to an example.
Neutralize the template's arrange comment accordingly.
* fix(blocking-io): harden changed-lines scanner per review
- Dedup the union selection by the stable key (path, function, symbol)
instead of dict identity, so a future selector returning copied dicts
cannot silently empty the result.
- parse_changed_lines now handles any unified diff: context lines advance
the new-file counter, \-markers and deletions do not, and the counter
resets at each +++ header. Previously correct only for --unified=0.
- Add blocking_io_static.scan_source (in-memory scan); base-version
comparison no longer round-trips through temp files.
- Empty Mode A report now prints the same-file-only reachability caveat
at the point of use instead of relying on the SOP text alone.
* docs(skill): bound best-effort cleanup when the offload sits in finally
Lesson from the #3505 review: the SOP routinely drives 'offload the
cleanup branch' transformations, and an awaited cleanup in finally can
mask or stall the primary exception. One sentence in Step 2 closes that
gap at the point where the fix is written.
* feat(community): add SearXNG and Browserless web search/fetch tools
- SearXNG web_search: privacy-focused meta search engine integration
with configurable base_url via config.yaml tool settings
- Browserless web_fetch: headless browser page fetching with
readability article extraction
- Both tools are fully configurable through tool config section
- No external API keys required for basic operation
* fix: address PR review feedback and add unit tests
- Guard config.model_extra against None values (review #1, #2)
- Coerce max_results to int when reading from config (review #2)
- Fix web_fetch_tool to use direct HTTP fetch instead of reusing
the web_search client config (review #3)
- Fix misleading docstring for SearxngClient.fetch (review #4)
- Remove unused target_url variable to pass Ruff lint (review #5)
- Normalize bool config values with _normalize_bool helper to
handle env-resolved string values correctly (review #6)
- Add unit tests for both SearXNG and Browserless client classes
and their tool functions with mocked httpx (review #7, #8)
* fix: convert to async httpx to avoid blocking I/O on event loop
- Replace httpx.Client with httpx.AsyncClient in both client classes
- Convert tool functions to async def
- Wrap readability_extractor calls in asyncio.to_thread()
- Update all tests to use pytest.mark.asyncio and async mocks
- Fix import sorting to pass Ruff lint
* fix(browserless): replace deprecated waitUntil with waitForEvent
The Browserless API has deprecated the waitUntil parameter.
Replace with waitForEvent which accepts values like 'networkidle'.
Default is empty (no wait), configurable via config.yaml.
* fix(browserless): remove deprecated gotoTimeout and bestAttempt params
The Browserless /content API does not accept gotoTimeout or bestAttempt
as top-level payload keys. These were being sent unconditionally,
causing 400 Bad Request errors on current Browserless versions.
Changes:
- Remove goto_timeout_ms parameter and 'gotoTimeout' from payload
- Remove best_attempt parameter and 'bestAttempt' from payload
- Remove _normalize_bool helper (no longer needed)
- Remove goto_timeout_ms and best_attempt config reading in tools.py
- Add tests for waitForSelector and reject params
- Verify no deprecated params are sent in test_fetch_html_success
* refactor(searxng): remove web_fetch_tool, decouple from web_search config
SearXNG is a search engine — it should only provide web_search_tool.
The web_fetch responsibility belongs to Browserless (headless Chrome)
or Jina AI, not SearXNG.
Changes:
- Remove web_fetch_tool from SearXNG tools.py and __init__.py
- Remove SearxngClient.fetch() method (no longer needed)
- Remove unused asyncio/readability imports from SearXNG tools.py
- Add test for max_results string-to-int coercion from config
- Add test for search with categories parameter
- Add test for httpx.RequestError handling
- Apply ruff format fixes to browserless_client.py and test files
- Add max-w-full min-w-0 to user message wrapper div to constrain width
- Change bubble width from w-fit to w-full max-w-full for consistent layout
- Add break-words to user message content for long string wrapping
- Add overflow-x-clip as defensive overflow protection
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(frontend): keep workspace interactive when SSR auth probe cannot reach gateway (#3493)
When the SSR auth probe at /api/v1/auth/me times out or fails, the
workspace layout used to render a static fallback page without
AuthProvider or QueryClientProvider, making logout and every other
interaction non-functional until the gateway recovered.
Render the normal WorkspaceContent in 'gateway_unavailable' mode
instead, surfacing a polite offline banner that re-probes the gateway
in the background and hides itself the moment refreshUser() returns
an authenticated user. The probe is reentrancy-guarded so a slow
gateway cannot pile up parallel /auth/me requests.
Closes#3493
* fix(workspace): silent probe in offline banner to avoid /login redirect during gateway recovery (#3493)
The banner previously delegated retry probes to AuthProvider.refreshUser(),
which treats any 401 from /api/v1/auth/me as 'session expired' and
force-redirects to /login. During gateway recovery, the first few requests
may transiently return 401 before the gateway is fully ready, which would
incorrectly kick the user out — defeating the purpose of the offline banner.
Now the banner silently fetches /api/v1/auth/me itself and only delegates
to refreshUser() on 200 OK. Non-200 responses (401 / 5xx / network) are
swallowed and retried on the next interval tick, ensuring the user stays
logged in across short gateway outages.
Verified in Docker:
- docker pause deer-flow-gateway → banner appears, page interactive
- docker unpause deer-flow-gateway → banner auto-disappears within 10s,
user remains on /workspace/chats/new with full session restored
- All 117 unit tests pass
* fix(workspace): fix banner polling leak and persistent 401 handling (#3493)
- Stop polling immediately after user recovery: add user to effect dependencies, cleanup interval when user !== null
- Handle persistent 401: trigger login redirect after 3 consecutive unauthorized responses
- Extract decision logic to pure helper, add 8 unit tests covering all critical paths
* fix(workspace): address CR feedback on gateway offline recovery (#3493)
- gateway-offline-banner-helpers: decrement (not reset) auth-failure
streak on transient outcomes so a flapping gateway (401 alternating
with 5xx) still converges on session-expired
- gateway-offline-banner: reuse probe response body to apply user
directly via new AuthProvider.applyUser, halving the recovery burst
against an already-struggling gateway
- gateway-offline-banner: extract classifyProbe into helpers for unit
testability; log probe failures via console.warn instead of swallowing
- gateway-offline-fallback: new shared component used by both workspace
and (auth) layouts so auth pages recover the same way the workspace
does, fixing the lockup where bare static HTML had no AuthProvider
- AuthProvider.logout: fall back to hard navigation when the gateway
logout fetch fails, matching legacy form-POST behaviour and avoiding
stale client state during outage
- tests: extend gateway-offline-banner-helpers.test with flapping
convergence and classifyProbe branch coverage (19 cases total)
* Fix stale AIO sandbox cache reuse
* Address AIO sandbox review feedback
* Distinguish sandbox health check failures
* Keep local discovery recoverable when the runtime check fails
LocalContainerBackend.discover() shares _is_container_running, which now
raises on transient daemon errors instead of returning False. Discovery has
no exception handling in _discover_or_create_with_lock(_async), so a brief
Docker hiccup turned a recoverable "could not verify, create instead" into a
hard acquire failure. Catch the check failure inside discover() and return
None so an unverifiable container is simply not adopted, restoring the
pre-change fall-through while keeping raise-on-unknown semantics protecting
the destroy path.
Reported by fancy-agent on PR #3494.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* Narrow the not-found match in container inspect error handling
A bare "not found" substring also matches transient failures like "command
not found" or "context not found", which would misclassify a check error as
"container definitely gone" and bypass the raise-on-unknown contract. Keep
Docker's specific "No such object"/"No such container" phrases, and only
trust a generic "not found" (Apple Container) when the message names the
inspected container or refers to a container/object.
Reported by WillemJiang on PR #3494.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
* fix(sandbox): persist lazily-acquired sandbox state via Command
ensure_sandbox_initialized mutates runtime.state in place, which is local
to the current tool invocation and is not picked up by LangGraph's channel
reducer. Subsequent graph steps and downstream consumers (such as
ToolOutputBudgetMiddleware and the sub-agent task_tool) therefore cannot
observe the sandbox id from state.
Wrap tool calls in SandboxMiddleware (wrap_tool_call / awrap_tool_call) to
detect fresh lazy initialization by diffing runtime.state before and after
the handler, and emit a proper state update via Command(update=...):
- ToolMessage results are wrapped into Command(update={sandbox, messages})
- Command results with a dict update are merged on the sandbox key while
preserving messages / goto / graph / resume
- Command results with non-dict updates are left untouched to avoid silent
data loss on unknown update shapes
Tests:
- 7 new unit tests cover lazy-init emit, passthrough, dict-update merge,
non-dict-update passthrough (sync and async)
- Refresh replay golden write_read_file.ultra.events.json: SSE 'values'
events now correctly carry the 'sandbox' key in their keys list, which
is the direct evidence that the fix is effective
Closes#3463
* refactor(sandbox): use dataclasses.replace to preserve Command fields
Address Copilot review on #3464: replace manual field-copy with
dataclasses.replace so any current or future Command fields are
preserved automatically when merging sandbox_update.
Also add a regression test that constructs a Command with non-None
graph/goto/resume to lock this behavior in.
* fix(frontend): paginate workspace chat list beyond 50 threads (#3482)
The sidebar 'Recent chats' and /workspace/chats list were hard-capped
at the first 50 threads returned by threads.search. Replace the
single-shot useThreads() consumers with useInfiniteThreads() and add
an IntersectionObserver sentinel to each list so further pages are
fetched on demand.
In search mode on the chats page, the sentinel is replaced by an
explicit 'Load more' button to prevent the observer from draining the
entire backend list while the filtered view stays empty.
- Add useInfiniteThreads + page-size constant and pure cache helpers
(map/filterInfiniteThreadsCache, getInfiniteThreadsNextPageParam)
- Mirror rename / delete / stream-finish updates into the new
infinite cache so optimistic UI stays consistent
- Extend the e2e mock to honour limit/offset slicing
- Unit tests for the cache helpers and pagination boundary
- Playwright e2e covering chats page + sidebar load-more, and the
search-mode guard against runaway auto-pagination
- Add en/zh i18n entries for the search-mode load-more button
Fixes#3482
* docs(frontend): clarify infinite-threads offset semantics and test post-delete invariant
- Add docstring to getInfiniteThreadsNextPageParam explaining that TanStack
Query freezes the returned offset into pageParams once, so optimistic cache
mutations that shrink page lengths (filterInfiniteThreadsCache on delete)
cannot retroactively move the offset backwards. Delete/rename paths
reconcile against the backend via invalidateQueries in onSettled.
- Add unit test covering the post-delete invariant.
- Fix misleading comment in thread-list-infinite-scroll.spec.ts: the
thread-search mock does not sort by updated_at; it returns the array in
the order provided.
Addresses Copilot CR comments on #3485.
* fix(frontend): mirror onCreated upsert into infinite cache; add sidebar Load-older button
Address review feedback on #3485:
- New upsertThreadInInfiniteCache helper; useThreadStream onCreated now
upserts into both the legacy ['threads','search'] cache and the new
infinite cache, so a freshly created thread appears in the sidebar
immediately during streaming instead of only after the run finishes
and onSettled invalidates the query. Restores parity with main.
- Sidebar Recent Chats now exposes a visible 'Load older chats' button
alongside the IntersectionObserver sentinel, so keyboard-only users
and environments where IO is unavailable can still reach older
conversations.
- Add zh-CN / en-US / types entry for chats.loadOlderChats.
- Cover the new helper with 3 unit tests (no-op on uninitialised cache,
prepend new thread to first page, merge with existing entry without
duplication).
When memory is enabled, the first conversation with a legacy shared agent
creates a per-user agent directory containing only memory.json (no
config.yaml). On the second turn, resolve_agent_dir() returned this
incomplete directory, causing load_agent_config() to fail with
"Agent config not found".
Require config.yaml to exist alongside the directory for both the
per-user and legacy paths, so that memory-only directories fall
through correctly. This aligns resolve_agent_dir with the existing
config.yaml check in list_custom_agents.
Refs: https://github.com/bytedance/deer-flow/issues/3390
* feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429)
Add a `memory.token_counting` option (`tiktoken` | `char`) so deployments in
network-restricted environments can opt out of tiktoken entirely. In `char`
mode the memory-injection budget uses a network-free character-based estimate
and never triggers the BPE download from openaipublic.blob.core.windows.net,
which could otherwise block for tens of minutes (see #3402).
Also harden the default `tiktoken` path:
- cache an in-flight LOADING sentinel so concurrent callers fall back
immediately instead of spawning more blocking get_encoding threads when the
first load is still running (e.g. under the 5s startup warm-up timeout);
- cache failures with a timestamp and retry after a cooldown so a transient
network outage self-heals back to accurate counting without a restart;
- skip startup warm-up entirely in char mode.
The new config is surfaced via the memory config API and config.example.yaml
(config_version bumped). Default remains `tiktoken`, so existing deployments
are unaffected.
* fix(memory): use CJK-aware char token estimate and address review feedback
- Replace the flat len(text)//4 fallback with a CJK-aware estimate so
Chinese/Japanese/Korean memory content does not over-fill the injection budget
- Document the internal tiktoken retry cooldown and char-mode escape hatch
- Sync CLAUDE.md / config.example.yaml / MEMORY_IMPROVEMENTS.md wording
- Fix MemoryConfigResponse mocks/assertions and add CJK estimate tests
POST /api/runs/stream and /api/runs/wait accept thread_id in the request
body but performed no owner authorization, letting any authenticated user
start runs on -- and read /wait checkpoint channel_values from -- another
user's thread (cross-user IDOR, #3472).
The @require_permission(owner_check=True) decorator resolves ownership from
the thread_id *path* param, so it cannot cover these body-param endpoints.
Enforce ownership inside start_run() before create_or_reject via
ThreadMetaStore.check_access: missing rows (auto-created temp threads) and
NULL-owner rows stay accessible, while a thread owned by another user
returns 404 (matching thread_runs.py). The internal system role (IM
channels acting for platform users) is exempt.
Closes#3472
The default `make up` started the Gateway with `--workers 4`, but run state
(RunManager and the stream bridge) is held in-process and nginx uses no sticky
sessions. With the default config, same-run requests scatter across workers that
each keep their own run state, breaking run cancellation (409), SSE reconnect
(hangs on heartbeats), multitask de-duplication, and IM channels (duplicate
replies). The shared cross-worker stream bridge does not exist yet.
Default GATEWAY_WORKERS to 1 so the out-of-the-box deployment is correct,
document the single-worker boundary in the README, and add a regression test
pinning the default while keeping it overridable. This is a stop-gap, not a
multi-worker implementation; the full fix (shared run state + stream bridge) is
tracked in #3191.
Refs #3239, #3260
In Docker production deployments, LocalSandboxProvider runs inside the
deer-flow-gateway container, so any `sandbox.mounts[].host_path` from
config.yaml is resolved against the gateway container's filesystem — not
the host machine. When the path isn't also bind-mounted into the gateway
service, the mount was silently dropped with only a WARNING log, leaving
agents reading an empty directory in production while the same config
worked under `make dev`.
Escalate the missing-host_path branch to logger.error with explicit
guidance about Docker bind mounts and docker-compose, so the failure is
hard to miss in default log configurations. Skip behaviour is preserved
to avoid breaking existing deployments.
Also clarify the misleading `VolumeMountConfig.host_path` field
description so it documents reality for both providers:
- LocalSandboxProvider checks host_path from inside the gateway process
(host in `make dev`, container in `make up`).
- AioSandboxProvider (DooD) passes host_path straight to `docker -v`
for the sandbox container, where the host Docker daemon resolves it
from the host machine's perspective.
config.example.yaml's `sandbox.mounts` comment gets a Note: block
pointing operators at the docker-compose bind-mount requirement so the
Docker-mode gotcha is discoverable from the canonical template.
Adds a regression test that:
- confirms missing host_path is still skipped (no behaviour break);
- asserts an ERROR record is emitted referencing the offending paths;
- asserts the message contains actionable Docker/gateway/docker-compose
keywords so future refactors can't quietly downgrade it.
Refs: https://github.com/bytedance/deer-flow/issues/3244
* fix: Solving the problem of "make dev" failing to start in Windows environment
* fix: revert the change to the startup_config and fix the lint errors
* fix: Address Copilot review feedback
- Validate wait-for-port input and avoid PowerShell port interpolation
- Require Python 3 in serve.sh launcher detection
- Keep Windows event loop policy setup in sitecustomize only
- Clarify sitecustomize process-wide backend behavior
* fix(agents): offload blocking filesystem IO in delete_agent off the event loop
delete_agent is an async route handler but resolved the agent directory (Paths.base_dir -> Path.resolve), probed it (Path.exists), and removed it (shutil.rmtree) directly on the event loop, blocking it for the duration of every delete. Surfaced by 'make detect-blocking-io'.
Move the resolve/exists/rmtree sequence into a sync helper run via asyncio.to_thread, mapping its outcome back to the existing 404/409/500 responses (behavior unchanged). Adds a tests/blocking_io/ regression anchor under the strict Blockbuster gate, mirroring test_skills_load.py (#1917).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): offload blocking filesystem IO in create_agent_endpoint too
Like delete_agent, the async create_agent_endpoint resolved and created the agent directory and wrote config.yaml + SOUL.md (with rmtree cleanup on failure) directly on the event loop. Move the whole create-or-409 sequence into a sync helper run via asyncio.to_thread; behavior is unchanged (201 / 409 / 500). Extends the blocking_io regression anchor to cover create as well as delete and renames it to test_agents_router.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* add caller identity in replay e2e
* make format
* fix(replay-e2e): stabilize title caller replay
* fix(replay-e2e): use captured caller without run manager
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add PatchedChatStepFun adapter for StepFun reasoning models (step-3.7-flash,
step-3.5-flash). Captures reasoning from both streaming and non-streaming
responses and replays it on historical assistant messages for multi-turn
tool-call conversations.
- New: PatchedChatStepFun adapter with streaming/non-streaming reasoning capture
- Support both reasoning and reasoning_content field names
- 17 unit tests covering all response paths
- Updated: config.example.yaml with StepFun configuration example
Copying config.example.yaml to config.yaml and starting DeerFlow crashed with `pydantic ValidationError: models — Input should be a valid list [input_value=None]`, because the example ships every entry under `models:` commented out, so PyYAML parses the key as null. Reported in #1444.
Add a field_validator(mode="before") on AppConfig that coerces null models/tools/tool_groups to [] (matching their default_factory=list), and emit an actionable warning from from_file when no models are configured (pointing to config.example.yaml / make setup). Adds regression tests.
Closes#1444
Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459)
#3426 switched the dev gateway's --reload-exclude patterns to absolute
paths. uvicorn only excludes an absolute path directly when it already
exists as a directory; otherwise it globs the pattern, and Python 3.12's
pathlib raises NotImplementedError("Non-relative patterns are unsupported")
for an absolute glob pattern. serve.sh mkdir'd the .deer-flow excludes but
not backend/sandbox, so `make dev` crashed on startup on a fresh checkout
under Python 3.12 (#3454). docker/dev-entrypoint.sh had the same latent gap.
Create backend/sandbox in both launchers so every absolute exclude stays on
uvicorn's is_dir() short-circuit. Add a regression test that pins the uvicorn
mechanism (crash on missing dir, safe once created) and enforces that every
absolute --reload-exclude is mkdir'd before launch.
Closes#3459
* test(dev): harden reload-exclude invariant parser against false pass/negatives
The launcher invariant test parsed shell with a "mkdir -p" line filter and a
substring membership check. Two latent gaps (sub-threshold for this fix, but
this code guards a user-facing startup path, so close them):
- A `\`-continued multi-line `mkdir` would drop arguments on continuation
lines, silently weakening coverage.
- Substring membership could false-pass when an exclude is a path-prefix of a
different created dir (e.g. `/app/backend/sandbox` "found" inside
`/app/backend/sandbox-other`).
Fold line-continuations, drop comments, and shlex-tokenize each `mkdir`
argument list into an exact set (quotes stripped, `$VAR` literal); assert exact
set membership. Same shlex handling for `--reload-exclude` values. Verified the
parser still flags the pre-fix missing `backend/sandbox` (RED preserved) and no
longer false-passes on a path-prefix.
* fix(dev): gitignore backend/sandbox runtime dir + pin mkdir-before-launch
Address two review findings on the #3459 fix:
- backend/sandbox was described as "gitignored runtime state" but no ignore
rule actually matched it. Add an anchored `/sandbox/` to backend/.gitignore
(anchored so it does NOT shadow the source package
backend/packages/harness/deerflow/sandbox/) so sandbox artifacts created at
runtime can't pollute the working tree or be committed by accident. New test
asserts content under backend/sandbox is ignored, making the claim verifiable.
- The launcher invariant test only proved the sandbox mkdir exists somewhere,
not that it runs before uvicorn starts. Add an order test (sandbox mkdir line
must precede the `uv run uvicorn` launch) so a future edit can't move the
mkdir below the launch and silently reintroduce the crash.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* test(dev): fix reload-exclude parser to handle serve.sh's quoted flag bundle
The previous autofix tokenized each whole line with shlex, but serve.sh packs
every flag into a single double-quoted `GATEWAY_EXTRA_FLAGS="..."` assignment.
shlex collapses that into one token, so no `--reload-exclude` flag is found and
`test_launcher_precreates_every_absolute_reload_exclude[scripts/serve.sh]`
failed CI with "expected at least one absolute reload-exclude".
Parse `--reload-exclude` with a regex that matches a balanced single/double
quoted group or a bare token, so the assignment's surrounding `"` is never
swallowed into the value. This recovers all three serve.sh excludes (the prior
regex also silently dropped the last `$BACKEND_RUNTIME_HOME` because the
adjacent closing quote broke shlex) while still covering dev-entrypoint.sh and
the space-separated `--reload-exclude <value>` form.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
`client.py` imported the private `_build_middlewares` from `agent.py` across a
module boundary and called it as public API. Because the `_` name signals
"module-private, no external callers", any future rename or signature change
silently breaks the embedded `DeerFlowClient` path — and the test suite even
monkeypatched `deerflow.client._build_middlewares`, baking the leak in.
`DeerFlowClient` is a lead-agent variant that genuinely needs the lead agent's
full middleware composition, so make the dependency honest: promote the helper
to a documented public entry point `build_middlewares` and update every in-repo
caller. Found during #3341 review; #3341 already removed one such leak
(`_assemble_deferred` -> public `assemble_deferred_tools`) and left this one out
of scope on purpose.
- agent.py: rename def + both internal call sites; expand the docstring into a
public-entry-point contract and document the previously-undocumented
model_name / app_config / deferred_setup params
- client.py: import + call site now use the public name (removes the last
cross-module private import)
- scripts/tool-error-degradation-detection.sh: update its import + call site
- tests (5 files): update monkeypatch/patch targets and direct calls
- docs (backend/CLAUDE.md, plan_mode_usage.md, middlewares.mdx): sync the live
references that describe the symbol as current API
Pure mechanical rename, no behavior change. Historical design docs (rfc,
superpowers spec) intentionally keep the old name as point-in-time records.
Closes#3431
* fix(ci): consolidate PR/issue labeling into one triage.yml; fix reviewing crash & label thrash
- Replace pr-labeler + pr-triage + issue-triage with a single triage.yml; drop actions/labeler.
Its sync-labels removed labels outside its config (clobbered size/risk/needs-validation and
could clobber maintainer labels). Area is now computed in-script and reconciled only within
owned namespaces (area:/size//risk:/needs-validation); first-time/reviewing are add-only.
- reviewing: gate on author_association in {OWNER,MEMBER,COLLABORATOR} + user.type==='User'
instead of getCollaboratorPermissionLevel, which 404'd on bot reviewers ('Copilot is not a
user') and crashed the job. Excludes all review bots with no denylist and no API call.
- Read live state (listFiles + listLabelsOnIssue) not the stale event payload, so rapid
synchronize events converge instead of thrashing. Size churn excludes lockfiles/snapshots.
* fix(ci): read labels live via paginate in reviewing & issue-triage jobs
Address review feedback on #3455:
- reviewing: listLabelsOnIssue now paginates (per_page:100) instead of the
default 30, matching pr-labels, so a 'reviewing' label is never missed on
PRs with many labels.
- issue-triage: read live labels via the API instead of the event payload,
consistent with the live-state reads documented in the header.