mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-18 13:46:02 +00:00
6044e5c5535d5f7a9aadb471ee0f1a6db279c48a
335 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
525af0da14 |
fix(channels): scope IM files and helper commands to owner (#3579)
* fix(channels): scope IM files and helper commands to owner * fix(memory): honor bound IM owner for /memory gateway endpoints The channel manager already attaches X-DeerFlow-Owner-User-Id for /memory and /models, but the memory router resolved user_id solely from get_effective_user_id(), which returns the synthetic internal user (DEFAULT_USER_ID) for channel workers. A bound IM /memory therefore read the default/internal memory instead of the connection owner's. Resolve the owner via _resolve_memory_user_id(request) across all /api/memory* endpoints: trusted internal callers act for the owner header, browser/API callers fall back to get_effective_user_id(). Mirrors the threads router's get_trusted_internal_owner_user_id pattern, completing acceptance criterion #3 of #3539. Add end-to-end tests asserting the resolved user_id (not just that the header is sent) and that a spoofed owner header from a browser user is ignored. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): align memory bucket and reuse cached storage owner Address PR #3579 review feedback: - Memory router now sanitizes the trusted owner header via make_safe_user_id before routing, matching the channel file pipeline (_safe_user_id_for_run/prepare_user_dir_for_raw_id). A bound owner id needing sanitization now resolves to the same bucket as its files/uploads instead of 500ing in _validate_user_id. - _handle_chat reuses the storage_user_id cached at the top of the method for artifact delivery instead of re-deriving _channel_storage_user_id(msg), so uploads and outputs cannot drift to different buckets if a channel rewrites the InboundMessage in receive_file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): stage unbound IM files under the run's user bucket Address PR #3579 review feedback (#5): _channel_storage_user_id now mirrors _resolve_run_params' identity policy, falling back to safe(msg.user_id) instead of returning None for unbound auth-enabled channels. Previously an unbound msg ran under safe(platform_user_id) but staged uploads under get_effective_user_id() in the dispatcher task (unset contextvar -> "default"), so files landed in users/default/... while the agent read from users/{safe_platform_user_id}/.... Bound and unbound channels now write where the agent reads. Returns None only when no identity is available. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): reuse cached storage owner in streaming artifact delivery Address PR #3579 review feedback (#6): thread the storage_user_id resolved in _handle_chat into _handle_streaming_chat instead of re-deriving _channel_storage_user_id(msg) in the finally block. Avoids re-running _safe_user_id_for_run (and its possible filesystem touch) on the streaming-error path and guarantees artifact delivery targets the same bucket as the uploads. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(channels): document owner-scoped IM file storage Address PR #3579 review feedback (#4): the IM Channels and File Upload sections still described pre-PR default-bucket behaviour. Document that receive_file, _ingest_inbound_files/ensure_uploads_dir/get_uploads_dir, and _resolve_attachments/_prepare_artifact_delivery are owner-scoped via the user_id kwarg, and that the bucket matches the memory bucket from _resolve_memory_user_id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(channels): unify run identity and storage bucket resolution Address PR #3579 review feedback (#3): _resolve_run_params no longer duplicates the owner-resolution rule inline. After the #5 fix the inline block and _channel_storage_user_id computed the identical sanitized-with-platform-fallback value, so the run identity now calls the same helper, making it the single source of truth for run_context["user_id"] and the file/artifact storage bucket. _owner_headers stays deliberately separate: it sends the raw owner id over HTTP for the gateway to re-resolve (no sanitize, no platform fallback), documented on both helpers. test_run_identity_matches_storage_bucket pins the two together so they cannot drift again. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
68ba4198b8 |
fix(channels): make channel connect flow deterministic (#3582)
* fix(channels): make channel connect flow deterministic * make format * fix(channels): apply connect-code before allowed_users on telegram and wechat The bind-bootstrap reorder shipped for slack/dingtalk only. Telegram and WeChat still gate _check_user/allowed_users before connect-code dispatch, so a newly allowlisted-but-unbound user is silently rejected when binding via the browser deep-link / connect-code flow — the same deadlock the PR fixes. - telegram: consume the /start deep-link token before the allowed_users gate. - wechat: handle the /connect code before the allowed_users gate, and defer inbound file extraction + context-token tracking past the gate so blocked senders no longer trigger CDN downloads or token bookkeeping. Adds regression tests for both adapters mirroring the slack/dingtalk coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): enforce single-active-owner invariant at the DB layer _revoke_other_active_owners did a SELECT-then-UPDATE in app code with no row lock or constraint covering active rows. Under READ COMMITTED, two concurrent connect-code consumes for the same (provider, external_account_id, workspace_id) from different owners could each observe "no other active owner" and both commit a connected row, leaving find_connection_by_external_identity nondeterministic. - Add a partial unique index on (provider, external_account_id, workspace_id) WHERE status != 'revoked' (portable to SQLite >= 3.8.0 and PostgreSQL) so the database guarantees at most one non-revoked row per external identity. - Reorder upsert_connection to revoke other owners' active rows before the new connected row is flushed (so the index is satisfied at commit), wrapped in a bounded rollback-and-retry loop. A losing concurrent writer now retries against the now-visible state instead of committing a duplicate. Adds DB-constraint, revoked-slot-reuse, and concurrent-upsert regression tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): harden connect-status polling primitive pollChannelConnectionUntilResolved was a free-floating recursive setTimeout started from onSuccess with no cancellation, no per-provider dedup, a redundant second endpoint per tick, and an unbounded loop on a non-finite expires_in. - Extract a framework-agnostic, cancellable poller (connect-poll.ts) that polls only listChannelConnections() and invalidates the providers query once when the bind resolves, instead of fetching both endpoints every tick. - Guard expires_in with a finite check + default window so undefined/NaN can no longer produce a poll loop that runs until the page closes. - Track one active poll handle per provider in useConnectChannelProvider via a ref Map: a new connect cancels the prior poll for that provider, and a useEffect cleanup cancels all polls on unmount. Adds unit tests for resolve-and-stop, cancellation, and non-finite-expiry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): stop leaking blocked-sender content in DingTalk INFO log; document bind semantics Moving the allowed_users gate past _extract_text meant the parsed-message INFO log (text=%r, first 100 chars) fired for senders that allowed_users would have rejected, defeating the filter's noise/privacy role. Move that log to after the allowed_users gate so blocked senders' message text never reaches INFO logs. Also document the two operator-relevant semantic changes in backend/CLAUDE.md: connect-code dispatch runs before allowed_users (so allowed_users is no longer a bind-time defense; the model relies on code confidentiality + 600s TTL + one-time consumption), and the single-active-owner-per-external-identity transfer semantics now backed by the partial unique index. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(channels): note connect-code-vs-allowlist and ownership transfer in operator guide Mirror the backend/CLAUDE.md notes in the operator-facing IM_CHANNEL_CONNECTIONS.md: connect codes are consumed before allowed_users (so a not-yet-allowlisted user can still complete a first bind, and allowed_users is not a bind-time defense), and an external identity has at most one active owner with last-bind-wins transfer enforced at the DB layer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(channels): lift connect-code dispatch into Channel base class Each adapter duplicated the ordering-sensitive boilerplate of extracting a /connect code and guarding on the connection repo before its allowed_users gate. The duplication is what let telegram/wechat drift and keep the gate ahead of the bind. Centralize it: - Move `_connection_repo` onto Channel.__init__ (removing 7 duplicate assignments). - Add Channel._pending_connect_code(text), which guards on the repo and extracts the code, documenting that adapters MUST consult it before authorization so a browser-initiated bind can bootstrap a not-yet-authorized identity. - Route slack, discord, feishu, dingtalk, wechat, and wecom through the helper. This also fixes a latent inconsistency where slack dispatched a bind even when no connection repo was configured. Pure refactor — the full channel suite stays green; adds a direct unit test for the base helper's contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * make format * fix(channels): redact DingTalk parsed-message INFO log content Log text_len instead of the first 100 chars of message text, so message content never reaches INFO logs (the after-gate move already keeps blocked senders out entirely). This takes over the redaction from #3584 so only this PR touches dingtalk.py, letting the two PRs merge in any order conflict-free. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
8c0830aea1 |
fix(channels): add operational guardrails (#3584)
* fix(channels): add operational guardrails * make format * fix(channels): converge with #3582 to avoid merge-order conflicts Drop this PR's DingTalk INFO-log redaction and hand it to #3582, which already restructures that handler and will redact the same log there. This PR no longer touches dingtalk.py, so the two PRs can merge to main in any order without a conflict. For WeChat, drop the contested thread_ts priority reorder (review #3) and keep only what inbound dedupe needs: a server-stable message_id in the inbound metadata (message_id/msg_id, no client_id per review #6). This is a single added line inside the metadata dict, a region #3582 never touches, so it auto-merges regardless of order. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): address three correctness review findings 1. Connect-code cap was racy (willem #1): _create_state ran delete-expired, count, and insert as three separate transactions, so concurrent connect POSTs from one owner could each see count < cap and all insert past it. Add ChannelConnectionRepository.create_oauth_state_within_cap which does delete+count+insert in a single transaction serialized per (owner, provider) — Postgres via pg_advisory_xact_lock, SQLite via the write lock the leading DELETE takes — and have the router use it. 2. Inbound dedupe key fell back to "" workspace (willem #3): two workspaces delivering without team/guild/aibotid would collapse to the same key and dedupe each other's messages. _inbound_dedupe_key now fails closed (returns None) when no workspace identifier is present. 3. Dedupe key was recorded on receipt and never released on failure (ShenAC #1): a transient error (DB blip, Gateway 503) left the key in place for the full TTL, so a provider redelivery of the same message_id — exactly the retry dedupe should absorb — was silently dropped. _handle_message now releases the key in the unexpected-exception branch so redelivery can recover, while keeping record-on-receipt so retries during handling are still deduped. Tests: repo cap enforcement incl. concurrent-issuance non-leak; dedupe fail-closed; dedupe key release-on-failure redelivery recovery. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): address cleanup/efficiency and test review findings Efficiency / cleanup: - Dedupe key set drops client-generated ids (client_msg_id, client_id); keep only server-stable event_id/message_id/msg_id, which a provider's own redelivery preserves (ShenAC #6). Every provider already emits message_id. - TTL/overflow pruning of _recent_inbound_events is now O(k): switch to an OrderedDict and popitem(last=False) from the front instead of scanning all 4096 entries on every inbound (willem #4). - Log "received inbound" only after the dedupe check so a provider retrying N times no longer logs N accepts; document that manager dedupe covers the agent run/final answer, not provider ack side-effects (willem #5, ShenAC #2). - Slack drops the redundant `team_id or event.get("team")` fallback the caller already resolved (willem #6). - create_oauth_state_within_cap prunes only this owner/provider's expired codes instead of a global DELETE on every connect POST; global cleanup still runs on consume_oauth_state (willem #7). Tests: - Dedupe test uses tmp_path instead of a leaked mkdtemp, uses distinct objects per publish, and adds a negative control: a different message_id is still processed, catching over-dedupe regressions (willem #8, ShenAC #4). - Slack HTTP-mode rejection test supplies app_token so the missing-token early return can't mask the guard, giving the state assertions teeth (ShenAC #3). - count_oauth_states test pins that the active row survives, not just the count (ShenAC #5). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * make format --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
97dd9ecf73 |
fix(sandbox): stop flagging string-literal path fragments as unsafe absolute paths (#3623)
* fix(sandbox): stop flagging string-literal path fragments as unsafe paths
The host-bash absolute-path guard scans the raw command string, so /segment
sequences inside string literals, f-strings, and templates were treated as
absolute path arguments and rejected — e.g. python -c "print(f'/端口{port}')"
or a REST template /devices/{id}/port. Whether a fragment tripped the guard
depended on the character right before the slash (a word char suppressed the
match), so the same literal could pass or fail unpredictably, pushing the model
into retry loops that bloat context and wall-clock time.
Exempt matches carrying non-ASCII characters or format braces: real host paths
a command would open contain neither, so these are text, not paths. The guard
is best-effort (not a security boundary), and plain ASCII host paths like
/etc/passwd — including ones written inside a code string such as
open('/etc/passwd') — stay rejected.
* fix(sandbox): only exempt identifier-template braces, not bash brace expansion
The literal-fragment exemption exempted any path fragment containing { or },
which let bash brace expansion (cat /etc/{passwd,shadow}) and ${VAR} expansion
reconstitute real host paths past validate_local_bash_command_paths. Tighten
the brace branch to only exempt fragments where every {...} block is a single
identifier-like placeholder (/devices/{id}/port, f-string /{port}); reject
${VAR} shell-variable expansion. Add parametrized regression tests for the
brace-expansion and shell-var bypasses.
|
||
|
|
0bbbbc06f4 |
feat(community): add Serper Google Images provider for image_search (#3575)
* feat(community): add Serper Google Images provider for image_search Add a Serper-backed `image_search` tool alongside the existing Serper `web_search` provider, so users with a SERPER_API_KEY can pull Google Images results as reference images for downstream image generation. - Share request/response handling between web_search and image_search via `_serper_post` / `_response_items`, with bounded `max_results` (capped at 10) and query normalization. - Add a best-effort SSRF guard (`_safe_public_url`) that rejects non-http(s), localhost and private/non-global IP image URLs; filtered entries are dropped and never consume the result limit. - doctor: flag literal `api_key` values in config as a warning and steer users toward `.env` + `$SERPER_API_KEY`. - Docs/config: document the Serper image_search provider and SERPER_API_KEY, and discourage committing literal keys to config.yaml. - Tests: cover the provider end-to-end (100% line coverage on tools.py) and the doctor literal-key warning path. * fix(community): block obfuscated IPv4 literals in Serper image SSRF guard The image_search SSRF guard only rejected dotted-decimal IP literals; encoded forms such as decimal (http://2130706433/), hex (0x7f000001) and octal (0177.0.0.1) raised ValueError in ip_address() and were allowed through, even though many HTTP clients resolve them to private addresses like 127.0.0.1. Add _decode_ipv4() to permissively decode these inet_aton-style encodings and apply the same is_global check; hostnames that do not decode to an IP (e.g. cafe.com) are still treated as hosts and left to fetch-time re-validation. Addresses PR review feedback. Tests cover decimal/hex/octal loopback and private encodings plus non-IP edge cases; tools.py stays at 100% line coverage. * test(community): cover IPv4-mapped IPv6 URL filtering * fix(community): address Serper image search review feedback - Block trailing-dot hostname SSRF bypass (localhost./127.0.0.1.) in _safe_public_url by stripping the FQDN root label before checks. - Keep a filtered image/thumbnail URL empty instead of collapsing onto its counterpart, preserving the high-res/preview contract. - Evaluate the SSRF guard once per field rather than twice. - Treat a null-typed organic/images field as "no results" rather than a malformed payload. - doctor.py: when a config $VAR is unset, fall through to the default env var before reporting it as not set. |
||
|
|
ec16b6650d |
fix(channel): force reload config on channel restart (#3619)
* fix: force reload config on channel restart * fix: detect config content changes for reload |
||
|
|
6a4a30fa2b |
fix(sandbox): return actionable hint when read_file hits a binary file (#3624)
read_file decodes with UTF-8. Binary uploads (.xlsx, images, ...) raise UnicodeDecodeError in the sandbox layer. UnicodeDecodeError is a ValueError subclass, not an OSError, so it bypassed the typed handlers and fell through to the generic except, surfacing a vague "Unexpected error reading file" message. The model could not tell the file was binary, so it retried read_file instead of switching to bash + pandas/openpyxl, burning LLM round-trips and bloating context with repeated failures. Add a dedicated UnicodeDecodeError handler that tells the model the file is binary and to use bash with a suitable library (or view_image for images). |
||
|
|
c81ab268fb | fix(serialization): stop stripping __interrupt__ from channel values (#3595) (#3605) | ||
|
|
a72af8ea37 |
feat(subagents): attribute subagent spans to parent thread's Langfuse session (#3611)
The subagent execution path did not call inject_langfuse_metadata(...) and built its model with attach_tracing=True, so subagent LLM/tool spans landed in Langfuse as isolated top-level traces carrying fresh session ids and the default user. They were findable in the unfiltered trace list but did not group under the parent thread's session card, and Langfuse cost attribution for subagent traffic did not line up with the parent conversation — even though DeerFlow's internal token accounting (SubagentTokenCollector) was already correct. Extend the lead-agent tracing wiring to the subagent path so a single subagent run produces one trace that shares the parent thread's session_id and user_id, with a subagent:<name> trace name: - subagents/executor.py: append build_tracing_callbacks() output to run_config["callbacks"] (preserving SubagentTokenCollector) and call inject_langfuse_metadata(...) with thread_id, user_id, and the normalized subagent:<name> trace name. Build the model with attach_tracing=False so model-level tracing does not double-count with the graph-root callbacks — the same pairing the lead agent uses. - tools/builtins/task_tool.py: resolve user_id via resolve_runtime_user_id(runtime) at the parent tool layer (before the background thread starts) and thread it through SubagentExecutor.__init__, because the _current_user contextvar is not guaranteed to survive the _execution_pool boundary. Trace topology is unchanged: subagent traces remain separate top-level traces in the same session, not nested as child spans under the lead trace (Plan B follow-up). Tests: tests/test_subagent_executor.py::TestSubagentTracingWiring covers the callback append, the session/user/trace-name injection, the disabled-langfuse no-op, the DEFAULT_USER_ID fallback, the empty-name trace-name fallback, and the env-tag emission. Existing test_create_agent_threads_explicit_app_config_to_model_and_middlewares now also asserts attach_tracing=False. Docs: CLAUDE.md Tracing System section documents subagents/executor.py as a third injection point alongside worker.py and client.py. |
||
|
|
f212da9f89 |
fix(sandbox): create shell session before retrying on a fresh id (#3577)
* fix(sandbox): create shell session before retrying on a fresh id The AIO sandbox recovery path generated a UUID and passed it straight to exec_command(id=...). The sandbox image only auto-creates a session when exec_command is called with *no* id; an exec carrying an unknown id returns HTTP 404 "Session not found". So every ErrorObservation recovery itself 404'd, turning a transient session lapse into an unrecoverable tool error that looped the run up to the LangGraph recursion limit. Explicitly create_session(id=fresh_id) before targeting that id on retry. create_session is idempotent (returns the existing session if the id already exists), so this is safe under the serializing lock. Updated the regression test to assert the retry targets exactly the created session id rather than a fabricated, uncreated one. * fix(sandbox): release the one-shot recovery session after retry The fresh session created on the ErrorObservation recovery path is used for exactly one command -- the next execute_command runs with no id and returns to the default session. Under persistent session corruption every command would create another session that is never reused or released, accumulating sessions on the container. Release it best-effort with cleanup_session() in a finally, swallowing any cleanup error so it never masks a successful retry. Addresses review feedback on #3577. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
926406e0d6 |
fix(channels): make runtime provider state authoritative (#3580)
* fix(channels): make runtime provider state authoritative * make format * fix(channels): close runtime provider config races and status gaps Address review findings on the runtime-provider-state change: - configure/disconnect now re-read the live app.state.channels_config after the worker await and mutate only the affected provider key in place, so a concurrent mutation for a different provider is no longer clobbered by a stale pre-await snapshot. - disconnect revokes DB connection rows before committing the store and cache, so a repo failure cannot leave the store/cache "disconnected" while the DB keeps "connected" rows a later re-configure would silently reactivate. - _provider_response preserves non-connected statuses (e.g. revoked) when the provider is unavailable, only masking a stale "connected" row as not_connected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
0966131b31 |
fix(channels): require bound identity for user-owned IM messages (#3578)
* fix(channels): require bound identity for user-owned IM messages * make format * docs: document bound identity channel config * refactor: reuse channel connection config * refactor _requires_bound_identity() * refactor from_app_config() * make format * fix: reject unbound channel chats before semaphore * security enhancement * make format * fix: enforce bound-identity admission at command entry point The bound-identity gate only ran for non-command messages in _handle_message() and as a fallback inside _handle_chat(). Commands had no equivalent boundary, so an unbound platform user could send /new and reach _create_thread() directly, creating an unowned Gateway thread and empty checkpoint. Info commands (/status, /models, /memory) likewise leaked Gateway state to unbound users. Add the same _requires_bound_identity() check at the top of _handle_command(), rejecting via _reject_unbound_channel_message() before any thread creation or Gateway query. The gate is a no-op in legacy open-bot mode (require_bound_identity=False) and auth-disabled mode. Provider-level binding flows (/connect, /start) are consumed by the provider adapter before reaching the manager, so they are unaffected. Tests: - unbound auth-enabled /new is rejected before threads.create - bound auth-enabled /new still creates the thread Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): carry workspace fallback decision on inbound messages * fix(channels): recheck bound identity by normalized workspace * fix(channels): avoid duplicate bound identity checks * fix(channels): preserve verified routing for bound identity rejects * fix(channels): clarify bound identity upgrade failures --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
05be7ea688 |
fix(subagents): raise general-purpose max_turns to 150 and default timeout to 30min (#3610)
* fix(subagents): raise general-purpose max_turns to 150 and default timeout to 30min Deep-research subtasks failed out of the box with GraphRecursionError (Recursion limit of 100 reached): the built-in general-purpose subagent caps at max_turns=100. Raise it to 150 and bump the default subagent timeout from 900s (15min) to 1800s (30min) so the extra turns have time to run instead of shifting the failure to a timeout. The lead agent recursion_limit (100) is unchanged; the failures are subagent-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(subagents): clarify lead recursion_limit is independent of subagent max_turns Add comments at both lead recursion_limit=100 sites (gateway services + channel manager) explaining the lead's LangGraph super-step budget is separate from subagent depth, so the two 100s are not conflated. Comment-only, no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(subagents): clarify built-in vs custom timeout scope; pin bash max_turns in test Review follow-ups: (1) clarify SubagentConfig docstring + global timeout field/comment that the 1800 default applies to built-in subagents (custom agents keep their own timeout_seconds); (2) pin bash.max_turns==60 in the defaults regression test so the config.example.yaml doc cannot drift; (3) rename test_default_timeout_preserved_when_no_config -> test_explicit_global_timeout_propagates_to_general_purpose since it intentionally exercises an explicit non-default 900. No runtime behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
d2cc991d55 | make ai follow-up suggestions optional (#3591) | ||
|
|
f43aa78107 |
fix(agents): sync agent_name across context/configurable and reject empty soul (#3549) (#3553)
* fix(agents): sync agent_name across context/configurable and reject empty soul (#3549) Two independent issues caused custom agent creation to silently fail: 1. build_run_config only wrote agent_name into one container (configurable or context), so setup_agent — which reads ToolRuntime.context exclusively since LangGraph >=1.1.9 — saw agent_name=None and wrote SOUL.md to the global base_dir instead of users/{user_id}/agents/{name}/. Mirror the dual-write pattern already used by merge_run_context_overrides and naming.py so both containers always carry the same value. 2. setup_agent persisted whatever soul string it received, including empty or whitespace-only content, and still reported success. The frontend then surfaced an unusable agent and the global default SOUL.md could be silently overwritten with empty content. Reject empty soul before any filesystem operation so the model can retry. Tests: - test_gateway_services.py: dual-write regressions for both configurable and context entry paths, explicit-agent-name precedence on both sides, and a shape-parity test against merge_run_context_overrides. - test_setup_agent_tool.py: empty/whitespace soul rejection, plus no-overwrite guarantees for existing global and per-agent SOUL.md. * Update services.py |
||
|
|
47e9570d86 |
fix(subagent): isolate subagent from parent run checkpointer (#3559)
Subagent _create_agent() now passes checkpointer=False to prevent inheriting the parent run's synchronous checkpointer via copy_context(), which would cause NotImplementedError when aget_tuple() is called on the async path. Subagents are one-shot delegations that never resume, so persistence is unnecessary. |
||
|
|
6e839342a7 |
feat(community): add Brave Search web search tool (#3528)
* feat(community): add Brave Search web search tool Add a community web_search provider backed by the official Brave Search API (https://api.search.brave.com/res/v1/web/search). API key is read from the tool config (inline api_key) or the BRAVE_SEARCH_API_KEY env var. Output schema (title/url/content) matches existing search tools. No new dependencies (uses the existing httpx). Also wires up the setup wizard, doctor health check, config example, and EN/ZH docs. * refactor(community): drop redundant [:count] slice in Brave search The Brave API already caps results via the `count` request param, so client-side slicing was redundant. Tests now simulate the API honoring `count` instead of relying on the slice. Addresses PR review nit. * style(tests): apply ruff format to test_doctor.py Collapse multiline write_text calls onto single lines to satisfy the CI ruff formatter (lint-backend was failing on format --check). |
||
|
|
8955b3222a |
fix(sandbox): merge idempotent sandbox state updates (#3518)
* fix(sandbox): merge idempotent sandbox state updates * fix(sandbox): merge idempotent sandbox state updates |
||
|
|
094296440f |
fix(history): strip base64 image data from REST endpoint responses (#3535)
ViewImageMiddleware persists full base64 image payloads in hide_from_ui human messages inside checkpoints. All REST endpoints that returned serialize_channel_values(channel_values) sent these multi-megabyte payloads to the frontend, freezing the UI on threads with images. Add strip_data_url_image_blocks() to remove data:-scheme image_url content blocks from hide_from_ui messages, and serialize_channel_values_for_api() as a convenience wrapper used by all six affected call sites across threads, runs, and thread_runs routers. SSE streaming is unaffected (still uses serialize_channel_values). Fixes #3496 |
||
|
|
579e416459 |
perf(runtime): index messages in MemoryRunEventStore to avoid O(n) scans (#3531)
list_messages re-scanned every event in the thread on each call (category filter + seq filter) — O(total events) per paginated request on the default run-events backend. Maintain a messages-only, seq-sorted projection of _events (shared dict refs, no copies) and locate the seq window with bisect: list_messages drops to O(log m + page) and count_messages to O(1). The index is kept in lockstep at every mutation site (put / put_batch via _put_one, delete_by_run, delete_by_thread). Externally observable behavior is unchanged — the full RunEventStore contract suite passes across memory/db/jsonl. Add a test covering pagination over non-contiguous message seqs (messages interleaved with trace events), including in-gap and exact-boundary cursors. Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
c002596ab4 | chore(todo): remove unused completion reminder counter (#3530) | ||
|
|
0d3bfe0a76 |
perf(runtime): index runs by thread_id to avoid O(n) scans in RunManager (#3499)
* perf(runtime): index runs by thread_id to avoid O(n) scans in RunManager RunManager.list_by_thread, create_or_reject (inflight check), and has_inflight each filtered every in-memory run by thread_id — an O(total in-memory runs) scan that grows with overall gateway traffic rather than the queried thread's depth. Add a thread_id -> run_ids secondary index (an insertion-ordered dict used as an ordered set) maintained in lockstep with _runs under the existing lock at every add/remove site (create, create_or_reject, both rollbacks, cleanup). The three per-thread queries now run in O(runs-in-thread); insertion order is preserved so list_by_thread keeps stable tie-breaking. Behavior unchanged. Adds 6 regression tests; full RunManager suite 146 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(runtime): cover create_or_reject rollback + clarify thread-index guard docstrings Address review on #3499 (fancyboi999): - Reword _thread_records_locked docstring: lockstep under self._lock is the correctness guarantee; self._runs.get is one-directional defense-in-depth (drops stale ids, cannot recover index-missing ids), not reconciliation. - Add test_failed_create_or_reject_unindexes_run covering the create_or_reject rollback/unindex mutation site (the last untested mutation path). - Fix _FailingPutRunStore docstring ("initial put" -> "every put"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
aa015462a7 |
feat(im): Add user-owned IM channel connections (#3487)
* Add user-owned IM channel connections * Fix dev startup and channel connect popup * Use async channel connect flow * Harden dev service daemon startup * Support local IM channel connections * Align IM connections with local channels * Fix safe user id digest algorithm * Address Copilot IM channel feedback * Address IM channel review comments * Support all integrated IM channel connections * Format additional channel connection tests * Keep unavailable channel connect buttons clickable * Fix IM channel provider icons * Add runtime setup for enabled IM channels * Guard global shortcut key handling * Keep configured IM channels editable * Avoid password autofill for channel secrets * Make channel threads visible to connection owners * Persist IM runtime config locally * Allow disconnecting runtime IM channels * Route no-auth channel sessions to local user * Use default user for auth-disabled local mode * Show IM channel source on threads * Prefill IM channel runtime config * Reflect IM channel runtime health * Ignore Feishu message read events * Ignore Feishu non-content message events * Let setup wizard enable IM channels * Fix frontend formatting after merge * Stabilize backend tests without local config * Isolate channel runtime config tests * Address channel connection review comments * Use sha256 user buckets with legacy migration * Ensure runtime IM channels are ready after restart * Persist disconnected IM channel state * Address channel connection review comments * Address channel connection review findings Frontend connect flow: - Open the runtime-config dialog only when a provider still needs credentials; configured providers go straight to the connect flow, so the binding-code/deep-link path is reachable from the UI again. - After saving credentials, continue into the connect flow when a user binding is still required (multi-user mode) instead of stopping at a "Connected" toast. - Extract shared provider-state helpers to core/channels/provider-state and add unit + e2e coverage for the direct-connect and configure-then-connect paths. Provider status semantics: - Report connection_status from the user's newest connection row; with no binding it is not_connected, except in auth-disabled local mode where a configured running channel is effectively connected. Concurrency and event-loop correctness: - Offload ChannelRuntimeConfigStore construction and writes, channel service construction, and Slack connection replies to threads; add a tests/blocking_io/ anchor for the runtime-config handlers. - Consume binding codes with a conditional UPDATE so a code can only be used once under concurrent workers; retry upsert_connection as an update when a concurrent insert wins the unique constraint. - Serialize ensure_channel_ready per channel so concurrent provider polls cannot double-start a channel worker. Config and migration hardening: - Stop mutating the get_app_config()-cached Telegram provider config; the runtime store now owns the UI-entered bot username. - Register channel_connections in STARTUP_ONLY_FIELDS with the standardized startup-only Field description. - Match the legacy unsafe-id bucket by recomputing its exact SHA-1 name so another user's same-prefix bucket can never be migrated. - Remove the unused Telegram process_webhook_update path and document src/core/channels in the frontend docs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Address PR review comments on authz scoping and channel runtime Security (review feedback from ShenAC-SAC): - Scope internal-token callers to the connection owner carried in X-DeerFlow-Owner-User-Id instead of bypassing owner checks outright, in both require_permission(owner_check=True) and the stateless run endpoints. Internal callers keep access to their own and shared/legacy threads, and may claim a default-owned channel thread for its real owner, but a leaked internal token no longer grants cross-user thread access. - Require admin privileges for POST/DELETE /api/channels/{provider}/ runtime-config: runtime credentials and channel workers are instance-wide shared state (same model as the MCP config API). Read-only provider listing stays available to all users. Performance (review feedback from willem-bd): - Skip the redundant thread channel-metadata PATCH after the first successful backfill per thread. - Reuse the per-connection Slack WebClient until its token changes instead of constructing one per outbound message. - Reconcile channel readiness for all providers concurrently in GET /api/channels/providers. Also resolve the code-quality unused-import flag in the blocking-io anchor by pre-importing the channel service via importlib. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Fix prettier formatting in provider-state test Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Reconcile UI runtime channel config with config reload on restart Main now reloads a channel's config.yaml entry on restart_channel() (#3514, issue #3497). Adapt the user-owned connection flow to coexist: - configure_channel() restarts with reload_config=False — the caller just supplied the authoritative config (browser-entered credentials that are never written to config.yaml), so a file reload must not clobber it with the stale on-disk entry. - _load_channel_config() re-applies the UI runtime-store overlay used at startup, so an operator-triggered restart keeps browser-entered credentials for channels without a config.yaml entry and does not resurrect a channel disconnected from the UI. - Offload the reload's disk IO (config.yaml + runtime store) with asyncio.to_thread, matching the blocking-IO policy on this branch. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> |
||
|
|
b8f5ed360f |
fix(skills): keep skill archive installation off the event loop (#3505)
* fix(skills): keep skill archive installation off the event loop ainstall_skill_from_archive — the async entry point awaited by the gateway POST /skills/install route — ran its entire filesystem pipeline inline on the event loop: zip extraction, frontmatter validation, rglob enumeration, per-file read_text, shutil.copytree staging, and tempdir cleanup. Restructure into offloaded phases: prepare (extract + validate) and commit (stage + move) run via asyncio.to_thread, the tempdir lifecycle is offloaded, and the security scanner's file enumeration and reads move off the loop — only the per-file LLM scan (genuinely async) stays awaited. Security decision logic and exception contract are unchanged. Anchor: tests/blocking_io/test_skills_install.py drives the real install pipeline (real .skill archive, real FS; only scan_skill_content stubbed) under the strict Blockbuster gate. Verified red on pre-fix code (BlockingError: os.stat), green with the fix. * fix(skills): log temp-dir cleanup failures instead of swallowing them Review follow-up on the install offload: rmtree(ignore_errors=True) kept the primary install exception but silently leaked the extraction dir on cleanup failure. Keep the never-mask behaviour, add a warning log. * fix(skills): bound install tmp cleanup and pass skill_dir explicitly (review) - Wrap the best-effort temp-dir cleanup in asyncio.wait_for (5s) so a hung filesystem in the finally block cannot stall or mask the install outcome; timeout is logged like the existing OSError path. - Hoist _collect_scannable_files to module level with skill_dir as an explicit argument instead of a closure capture. |
||
|
|
330a2ff8c5 |
feat(community): add SearXNG and Browserless web search/fetch tools (#3451)
* feat(community): add SearXNG and Browserless web search/fetch tools - SearXNG web_search: privacy-focused meta search engine integration with configurable base_url via config.yaml tool settings - Browserless web_fetch: headless browser page fetching with readability article extraction - Both tools are fully configurable through tool config section - No external API keys required for basic operation * fix: address PR review feedback and add unit tests - Guard config.model_extra against None values (review #1, #2) - Coerce max_results to int when reading from config (review #2) - Fix web_fetch_tool to use direct HTTP fetch instead of reusing the web_search client config (review #3) - Fix misleading docstring for SearxngClient.fetch (review #4) - Remove unused target_url variable to pass Ruff lint (review #5) - Normalize bool config values with _normalize_bool helper to handle env-resolved string values correctly (review #6) - Add unit tests for both SearXNG and Browserless client classes and their tool functions with mocked httpx (review #7, #8) * fix: convert to async httpx to avoid blocking I/O on event loop - Replace httpx.Client with httpx.AsyncClient in both client classes - Convert tool functions to async def - Wrap readability_extractor calls in asyncio.to_thread() - Update all tests to use pytest.mark.asyncio and async mocks - Fix import sorting to pass Ruff lint * fix(browserless): replace deprecated waitUntil with waitForEvent The Browserless API has deprecated the waitUntil parameter. Replace with waitForEvent which accepts values like 'networkidle'. Default is empty (no wait), configurable via config.yaml. * fix(browserless): remove deprecated gotoTimeout and bestAttempt params The Browserless /content API does not accept gotoTimeout or bestAttempt as top-level payload keys. These were being sent unconditionally, causing 400 Bad Request errors on current Browserless versions. Changes: - Remove goto_timeout_ms parameter and 'gotoTimeout' from payload - Remove best_attempt parameter and 'bestAttempt' from payload - Remove _normalize_bool helper (no longer needed) - Remove goto_timeout_ms and best_attempt config reading in tools.py - Add tests for waitForSelector and reject params - Verify no deprecated params are sent in test_fetch_html_success * refactor(searxng): remove web_fetch_tool, decouple from web_search config SearXNG is a search engine — it should only provide web_search_tool. The web_fetch responsibility belongs to Browserless (headless Chrome) or Jina AI, not SearXNG. Changes: - Remove web_fetch_tool from SearXNG tools.py and __init__.py - Remove SearxngClient.fetch() method (no longer needed) - Remove unused asyncio/readability imports from SearXNG tools.py - Add test for max_results string-to-int coercion from config - Add test for search with categories parameter - Add test for httpx.RequestError handling - Apply ruff format fixes to browserless_client.py and test files |
||
|
|
f401e7baa6 |
[codex] Fix stale AIO sandbox cache reuse (#3494)
* Fix stale AIO sandbox cache reuse * Address AIO sandbox review feedback * Distinguish sandbox health check failures * Keep local discovery recoverable when the runtime check fails LocalContainerBackend.discover() shares _is_container_running, which now raises on transient daemon errors instead of returning False. Discovery has no exception handling in _discover_or_create_with_lock(_async), so a brief Docker hiccup turned a recoverable "could not verify, create instead" into a hard acquire failure. Catch the check failure inside discover() and return None so an unverifiable container is simply not adopted, restoring the pre-change fall-through while keeping raise-on-unknown semantics protecting the destroy path. Reported by fancy-agent on PR #3494. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Narrow the not-found match in container inspect error handling A bare "not found" substring also matches transient failures like "command not found" or "context not found", which would misclassify a check error as "container definitely gone" and bypass the raise-on-unknown contract. Keep Docker's specific "No such object"/"No such container" phrases, and only trust a generic "not found" (Apple Container) when the message names the inspected container or refers to a container/object. Reported by WillemJiang on PR #3494. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> |
||
|
|
919d8bc279 |
fix(sandbox): persist lazily-acquired sandbox state via Command (#3464)
* fix(sandbox): persist lazily-acquired sandbox state via Command
ensure_sandbox_initialized mutates runtime.state in place, which is local
to the current tool invocation and is not picked up by LangGraph's channel
reducer. Subsequent graph steps and downstream consumers (such as
ToolOutputBudgetMiddleware and the sub-agent task_tool) therefore cannot
observe the sandbox id from state.
Wrap tool calls in SandboxMiddleware (wrap_tool_call / awrap_tool_call) to
detect fresh lazy initialization by diffing runtime.state before and after
the handler, and emit a proper state update via Command(update=...):
- ToolMessage results are wrapped into Command(update={sandbox, messages})
- Command results with a dict update are merged on the sandbox key while
preserving messages / goto / graph / resume
- Command results with non-dict updates are left untouched to avoid silent
data loss on unknown update shapes
Tests:
- 7 new unit tests cover lazy-init emit, passthrough, dict-update merge,
non-dict-update passthrough (sync and async)
- Refresh replay golden write_read_file.ultra.events.json: SSE 'values'
events now correctly carry the 'sandbox' key in their keys list, which
is the direct evidence that the fix is effective
Closes #3463
* refactor(sandbox): use dataclasses.replace to preserve Command fields
Address Copilot review on #3464: replace manual field-copy with
dataclasses.replace so any current or future Command fields are
preserved automatically when merging sandbox_update.
Also add a regression test that constructs a Command with non-None
graph/goto/resume to lock this behavior in.
|
||
|
|
b3c2cc42cf |
fix(agents): require config.yaml in resolve_agent_dir to skip memory-only directories (#3390) (#3481)
When memory is enabled, the first conversation with a legacy shared agent creates a per-user agent directory containing only memory.json (no config.yaml). On the second turn, resolve_agent_dir() returned this incomplete directory, causing load_agent_config() to fail with "Agent config not found". Require config.yaml to exist alongside the directory for both the per-user and legacy paths, so that memory-only directories fall through correctly. This aligns resolve_agent_dir with the existing config.yaml check in list_custom_agents. Refs: https://github.com/bytedance/deer-flow/issues/3390 |
||
|
|
167ef4512f |
feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) (#3465)
* feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) Add a `memory.token_counting` option (`tiktoken` | `char`) so deployments in network-restricted environments can opt out of tiktoken entirely. In `char` mode the memory-injection budget uses a network-free character-based estimate and never triggers the BPE download from openaipublic.blob.core.windows.net, which could otherwise block for tens of minutes (see #3402). Also harden the default `tiktoken` path: - cache an in-flight LOADING sentinel so concurrent callers fall back immediately instead of spawning more blocking get_encoding threads when the first load is still running (e.g. under the 5s startup warm-up timeout); - cache failures with a timestamp and retry after a cooldown so a transient network outage self-heals back to accurate counting without a restart; - skip startup warm-up entirely in char mode. The new config is surfaced via the memory config API and config.example.yaml (config_version bumped). Default remains `tiktoken`, so existing deployments are unaffected. * fix(memory): use CJK-aware char token estimate and address review feedback - Replace the flat len(text)//4 fallback with a CJK-aware estimate so Chinese/Japanese/Korean memory content does not over-fill the injection budget - Document the internal tiktoken retry cooldown and char-mode escape hatch - Sync CLAUDE.md / config.example.yaml / MEMORY_IMPROVEMENTS.md wording - Fix MemoryConfigResponse mocks/assertions and add CJK estimate tests |
||
|
|
a57d05fe0a | fix runtime journal run lifecycle events (#3470) | ||
|
|
ae9e8bc0bf |
fix(sandbox): make missing sandbox.mounts host_path a loud ERROR (#3244) (#3250)
In Docker production deployments, LocalSandboxProvider runs inside the
deer-flow-gateway container, so any `sandbox.mounts[].host_path` from
config.yaml is resolved against the gateway container's filesystem — not
the host machine. When the path isn't also bind-mounted into the gateway
service, the mount was silently dropped with only a WARNING log, leaving
agents reading an empty directory in production while the same config
worked under `make dev`.
Escalate the missing-host_path branch to logger.error with explicit
guidance about Docker bind mounts and docker-compose, so the failure is
hard to miss in default log configurations. Skip behaviour is preserved
to avoid breaking existing deployments.
Also clarify the misleading `VolumeMountConfig.host_path` field
description so it documents reality for both providers:
- LocalSandboxProvider checks host_path from inside the gateway process
(host in `make dev`, container in `make up`).
- AioSandboxProvider (DooD) passes host_path straight to `docker -v`
for the sandbox container, where the host Docker daemon resolves it
from the host machine's perspective.
config.example.yaml's `sandbox.mounts` comment gets a Note: block
pointing operators at the docker-compose bind-mount requirement so the
Docker-mode gotcha is discoverable from the canonical template.
Adds a regression test that:
- confirms missing host_path is still skipped (no behaviour break);
- asserts an ERROR record is emitted referencing the offending paths;
- asserts the message contains actionable Docker/gateway/docker-compose
keywords so future refactors can't quietly downgrade it.
Refs: https://github.com/bytedance/deer-flow/issues/3244
|
||
|
|
16391e35ab |
fix(skills): harden slash skill activation across chat channels (#3466)
* support slash skill activation * format slash skill activation * Preserve slash skill activation with uploads * Address slash skill review feedback * Address slash skill follow-up review * Fix lazy slash skill storage resolution * Keep slash skill activation out of system prompt * Address slash skill review issues * fix: harden slash skill command handling * feat(frontend): add slash skill autocomplete * fix: address slash skill review feedback * fix: preserve slash skill text for IM uploads |
||
|
|
37337b77f9 |
feat(models): add StepFun reasoning model adapter (#3461)
Add PatchedChatStepFun adapter for StepFun reasoning models (step-3.7-flash, step-3.5-flash). Captures reasoning from both streaming and non-streaming responses and replays it on historical assistant messages for multi-turn tool-call conversations. - New: PatchedChatStepFun adapter with streaming/non-streaming reasoning capture - Support both reasoning and reasoning_content field names - 17 unit tests covering all response paths - Updated: config.example.yaml with StepFun configuration example |
||
|
|
8db16bb3d8 |
fix(config): coerce null config.yaml list sections to empty list (#3434)
Copying config.example.yaml to config.yaml and starting DeerFlow crashed with `pydantic ValidationError: models — Input should be a valid list [input_value=None]`, because the example ships every entry under `models:` commented out, so PyYAML parses the key as null. Reported in #1444. Add a field_validator(mode="before") on AppConfig that coerces null models/tools/tool_groups to [] (matching their default_factory=list), and emit an actionable warning from from_file when no models are configured (pointing to config.example.yaml / make setup). Adds regression tests. Closes #1444 Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
0fb18e368c |
refactor(lead-agent): make build_middlewares public to drop the last cross-module private import (#3458)
`client.py` imported the private `_build_middlewares` from `agent.py` across a module boundary and called it as public API. Because the `_` name signals "module-private, no external callers", any future rename or signature change silently breaks the embedded `DeerFlowClient` path — and the test suite even monkeypatched `deerflow.client._build_middlewares`, baking the leak in. `DeerFlowClient` is a lead-agent variant that genuinely needs the lead agent's full middleware composition, so make the dependency honest: promote the helper to a documented public entry point `build_middlewares` and update every in-repo caller. Found during #3341 review; #3341 already removed one such leak (`_assemble_deferred` -> public `assemble_deferred_tools`) and left this one out of scope on purpose. - agent.py: rename def + both internal call sites; expand the docstring into a public-entry-point contract and document the previously-undocumented model_name / app_config / deferred_setup params - client.py: import + call site now use the public name (removes the last cross-module private import) - scripts/tool-error-degradation-detection.sh: update its import + call site - tests (5 files): update monkeypatch/patch targets and direct calls - docs (backend/CLAUDE.md, plan_mode_usage.md, middlewares.mdx): sync the live references that describe the symbol as current API Pure mechanical rename, no behavior change. Historical design docs (rfc, superpowers spec) intentionally keep the old name as point-in-time records. Closes #3431 |
||
|
|
f92a26d56f |
fix(web_fetch): support proxy for Jina reader in restricted networks (#3418) (#3430)
* fix(web_fetch): support proxy for Jina reader in restricted networks The web_fetch tool built a bare httpx.AsyncClient() with no proxy awareness, so users behind a corporate proxy / in Docker / WSL could not reach https://r.jina.ai and web_fetch timed out. - Add optional `proxy` / `trust_env` params to JinaClient.crawl and wire them from the `web_fetch` tool config (with type coercion for YAML string values). - Pass internal service hostnames through NO_PROXY in both compose files so proxy env inherited via env_file does not break in-cluster calls (gateway/provisioner/etc). - Load proxy vars from .env into the shell in scripts/docker.sh so the NO_PROXY interpolation can merge user-provided values on `make` path. - Document proxy/trust_env options in config.example.yaml. Closes #3418 * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
3b6dd0a4e3 |
feat(subagents): extend deferred MCP tool loading to subagents (#3432)
* feat(subagents): extend deferred MCP tool loading to subagents (#3341) Subagents now reuse the lead agent's deferred-tool path: when tool_search.enabled, MCP tool schemas are withheld from the model and surfaced by name in <available-deferred-tools>, fetched on demand via the generated tool_search helper. DeferredToolFilterMiddleware deterministically rewrites request.tools to hide the deferred schemas (the prompt section is discovery only, not enforcement). Consolidates the assembly into deerflow.tools.builtins.tool_search, now the single home for both assemble_deferred_tools (centralized fail-closed guard, replacing the lead-only private _assemble_deferred) and the relocated get_deferred_tools_prompt_section. Shared by every build path: lead agent, embedded client, and subagent executor. tool_search is appended after the subagent's name-level tool policy and is treated as infrastructure: its catalog is built from the already policy-filtered list, so it can never surface a tool the policy denied. Follow-up to #3370. Fixes #3341. * test(subagents): assert the real middleware builder emits a working deferred filter (#3341) The existing recipe test hand-constructs DeferredToolFilterMiddleware, so it cannot catch a regression in how build_subagent_runtime_middlewares (the call executor._create_agent actually makes) wires the deferred setup into the filter. Add a test that sources the filter from the real builder given a real setup and runs it through a graph: a wrong catalog hash would silently stop promotion, a dropped filter would stop hiding — both now caught. Running the full real middleware stack is intentionally avoided (the other runtime middlewares need sandbox/thread infra to execute, which would make the test flaky); their attachment + ordering before Safety stays locked in test_tool_error_handling_middleware.py. * test(subagents): keep executor tests config-free in CI * chore: trigger ci * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
cd5bedaa74 |
feat: MiniMax provider for image/video/podcast skills + new music-generation skill (#3437)
* docs(spec): MiniMax integration for generation skills + new music skill Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(plan): MiniMax generation providers implementation plan Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(skills): add importlib loader + FakeResp for skill tests * test(skills): register loaded module in sys.modules; raise requests.HTTPError in FakeResp * feat(image-generation): add MiniMax provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(image-generation): guard unknown provider, derive ref MIME, strengthen tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(video-generation): add MiniMax provider with async poll/download Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(video-generation): surface base_resp errors while polling; add timeout test * feat(podcast-generation): add MiniMax t2a_v2 provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(podcast-generation): restore TTS credential guard; add volcengine + voice tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(music-generation): new MiniMax music skill via skill-creator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(music-generation): treat empty lyrics as absent; test no-audio-data path * refactor(skills): add request timeouts to MiniMax network calls Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding 'Explicit returns mixed with implicit (fall through) returns' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix(models): strip inconsistent user-message names for MiniMax chat DeerFlow middlewares tag user messages with provenance names (user-input, summary, loop_warning); langchain serializes them into the OpenAI-compatible payload and MiniMax rejects mismatched user-message names with "user name must be consistent (2013)". PatchedChatMiniMax now drops the per-message name from user-role messages. Point the config.example MiniMax models at PatchedChatMiniMax so they also get reasoning_content mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(image-generation): MiniMax sends JSON prompt field, guard 1500-char limit MiniMax image-01 takes one text string capped at 1500 chars, but the skill was sending the whole structured JSON. The MiniMax provider now extracts the JSON `prompt` field (relying on prompt_optimizer to expand it) and fails fast with a clear error before calling the API when that field exceeds 1500 chars. Authoring stays provider-agnostic; Gemini still receives the full JSON. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(podcast-generation): per-provider TTS concurrency and retry/backoff Each TTS provider owns its concurrency internally — MiniMax runs single-threaded to reduce rate-limit failures, Volcengine keeps 4 workers — with automatic retry and backoff on transient HTTP and base_resp errors. No caller-facing concurrency knob. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(skills): address Copilot review comments on generation skills - video: add raise_for_status + timeout to the Gemini download/POST/poll calls so non-2xx responses surface as clear HTTP errors instead of JSON/KeyError or hangs - video: check the task Fail status before the generic base_resp check so the failure keeps its task_id context - video/image: create the output file parent directory before writing (matching music-generation) so nested output paths do not raise FileNotFoundError - music: require a non-empty prompt and fail fast with ValueError instead of sending an empty prompt to the API Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(scripts): reclaim dev ports across worktrees in make stop/dev All deer-flow worktrees (main checkout + linked worktrees) hardcode the same dev ports (8001/3000/2026), so a service started from any worktree must be reclaimable from another. stop_all now resolves the set of worktree roots (DEERFLOW_ROOTS) and treats a process as deer-flow-owned when its open files live under any of them. It also force-kills survivors on 2026 alongside 8001/3000, fixing `make dev` aborting on the nginx port preflight when a prior nginx lingered on 2026. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(view-image): hide the injected image-context message from the UI ViewImageMiddleware injects a HumanMessage (text + base64 images) so the vision model can see viewed images, but it was the only internal injector that set neither hide_from_ui nor a hidden name, so it leaked into the chat UI (and IM channels) as a user bubble reading "Here are the images you've viewed:". Mark it with additional_kwargs={"hide_from_ui": True}, matching todo/dynamic_context injections, which the frontend isHiddenFromUIMessage and the channel sender already honor. The model still receives the full content. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(minimax): mark M2.7 models as text-only (no vision) MiniMax M2.7 / M2.7-highspeed do not support vision; only M3 does. The provider config asserted vision support for M2.7 in four places. - config.example.yaml: 4 M2.7 entries -> supports_vision: false - backend/docs/CONFIGURATION.md: M2.7 + highspeed -> supports_vision: false - wizard: add LLMProvider.model_vision_overrides + extra_config_for() so selecting an M2.7 model writes supports_vision: false while M3 (default) keeps vision; wire it through setup_wizard.py - tests: M2.7-highspeed fixture -> supports_vision=False; add test_minimax_vision_is_per_model Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> |
||
|
|
64d923b0fd |
fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes (#3417)
* fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes
ToolOutputBudgetMiddleware persisted oversized tool results to the host
filesystem and returned a /mnt/user-data/outputs virtual path. For sandboxes
that do not use thread-data mounts (e.g. remote AIO sandbox), that virtual
path does not exist inside the sandbox, so the model's read_file tool could
not read it back and reported 'file not found'.
Branch on SandboxProvider.uses_thread_data_mounts:
- Mounted sandboxes (local Docker, AIO + LocalContainerBackend) keep the
original host-disk path; the host outputs dir is bind-mounted to the same
virtual path inside the sandbox, so behavior is unchanged.
- Non-mounted (remote) sandboxes externalize into the sandbox itself via
execute_command('mkdir -p ...') + write_file + 'test -s' validation. The
validation step is required because AIO sandbox execute_command returns
'Error: ...' as a string on failure instead of raising, so a silent mkdir
failure would otherwise leak through.
Any failure (rejected subdir, mkdir/write/validate error) falls back to the
existing inline head+tail truncation, so an unreadable path is never returned
to the model.
The sandbox resolver reads the sandbox_id that SandboxMiddleware already
writes into runtime.state['sandbox']; it never calls provider.acquire(),
keeping the tool-call hot path free of blocking I/O. Tools that do not use a
sandbox (web_search, MCP, ...) resolve to None and fall through to inline
truncation, which is the safe behavior for them.
Fixes #3416
* fix(middleware): address Copilot review feedback on sandbox externalization
- Make get_sandbox_provider() lookup best-effort in _budget_content: only
query when outputs_path or sandbox is available, and fall back to inline
truncation if provider initialization raises rather than propagating
the error. A resolved sandbox instance is sufficient on its own to take
the non-mounted externalization branch.
- Strict-match the sandbox post-write validation echo
(check.strip() == 'OK') to avoid false positives if execute_command
ever surfaces unrelated stdout/stderr containing 'OK' as a substring.
Refs: #3417
* test: fix flaky tests relying on /nonexistent/... path under container root
Two tests in this module (test_returns_none_on_invalid_path and
test_fallback_when_disk_write_fails) used paths like
'/nonexistent/impossible/path' to trigger _externalize's OSError
fallback. These paths are creatable when the test process runs as root
inside the CI container: os.makedirs(..., exist_ok=True) successfully
creates the entire chain under /, so the OSError branch is never hit
and the tests fail. Reproducible on main independently of this PR.
Switch to '/dev/null/cannot-mkdir-here'. /dev/null is a character
device on both Linux and macOS, so os.makedirs always fails with
NotADirectoryError regardless of privileges, reliably exercising the
OSError fallback.
* fix(tool-output-budget): only consult sandbox provider when a sandbox is resolved
The previous revision called get_sandbox_provider() whenever externalization
was triggered, including on the legacy host-disk path. Environments without
a configured sandbox -- in particular CI runners without a config.yaml --
would raise FileNotFoundError there, get caught, and silently fall back to
inline truncation. That defeated the host-disk externalization path that
predates this PR and was the root cause of the regressing legacy tests.
Restructure the branching so the provider is only consulted when a sandbox
has actually been resolved for the current tool call:
- sandbox resolved + provider.uses_thread_data_mounts: host-disk write
(bind-mounted into the sandbox, equivalent to a sandbox-side write).
- sandbox resolved + non-mounted provider: sandbox write (#3416).
- no sandbox + outputs_path: host-disk write
(legacy / non-sandbox tools, no provider call at all).
- otherwise: inline fallback.
No test changes; the legacy externalization tests are provider-agnostic by
construction and now pass without monkeypatching.
Refs: #3416
* test(tool-output-budget): assert legacy path does not call sandbox provider
Lock in the contract introduced by
|
||
|
|
519200728a |
fix(middleware): offload memory injection off event loop to prevent tiktoken blocking (#3402) (#3411)
* fix(middleware): offload memory injection off event loop to prevent tiktoken blocking (#3402) DynamicContextMiddleware.abefore_agent() called _inject() synchronously on the asyncio event loop. The first time memory is injected (second request), _inject() → format_memory_for_injection() → _count_tokens() → tiktoken.get_encoding("cl100k_base") needs to download the BPE data from openaipublic.blob.core.windows.net. In network-restricted environments this download blocks until the OS TCP timeout (~26 min), starving ALL concurrent handlers including /api/v1/auth/me. Fix: - abefore_agent now uses asyncio.to_thread(self._inject, state) so file I/O and tiktoken never block the event loop. - Extract _get_tiktoken_encoding() with a module-level cache so tiktoken.get_encoding() is called at most once per encoding name. - Add warm_tiktoken_cache() startup helper; gateway lifespan pre-warms the cache via asyncio.to_thread so the first request never triggers a cold download. - _count_tokens falls back to len(text) // 4 on any encoding failure. Tests: - tests/test_tiktoken_cache_and_count_tokens.py (12 tests): cache hit/miss, fallback paths, warm-up helper. - tests/blocking_io/test_dynamic_context_middleware.py (2 tests): Blockbuster gate verifies abefore_agent does not block the event loop; async/sync parity check. Fixes #3402 * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix the lint error * fix(memory): use future annotations to avoid NameError when tiktoken is absent Add `from __future__ import annotations` to prompt.py so that tiktoken.Encoding type hints are never evaluated at runtime. Without this, environments where tiktoken is not installed could raise NameError on the module-level cache and function return annotations. Addresses Copilot review comment on PR #3411. * fix(middleware): bound abefore_agent injection with timeout to prevent hung requests Wrap the asyncio.to_thread(self._inject) offload in asyncio.wait_for() with a 5-second cap. If the startup warm-up failed silently (e.g. network blip during deploy), a cold tiktoken BPE download on the first request can block until the OS TCP timeout (~26 min). The bounded timeout ensures the request degrades gracefully (no memory/date context for that turn) rather than hanging. Adds test_abefore_agent_returns_none_on_timeout to the blocking-IO regression anchors. Addresses review feedback from xg-gh-25 on PR #3411. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
f725a963d5 |
fix(runtime): protect sync singleton init and reset (#3413)
* fix(runtime): protect sync singleton init/reset with threading.Lock * fix(runtime): serialize sync singleton init and reset * make format * test(runtime): assert store reset creates new singleton * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(runtime): load config outside singleton locks * fix(runtime): share checkpointer config loading helper --------- Co-authored-by: GODDiao <diaoshengjia@gmail.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> |
||
|
|
10c1d9f417 | fix(search): fix DDGS Wikipedia region handling (#3423) | ||
|
|
8d2e55a05f |
fix(subagent): structured subagent_status field over text parsing (#3146) (#3154)
* fix(subagent): structured subagent_status field over text parsing Closes #3146. ## Why The frontend used to derive subtask card state by string-matching the leading text of the `task` tool's result. That contract surface was fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases where new backend wording (`Task cancelled by user.`, `Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware` exception wrappers) silently broke the card lifecycle. The frontend fallback kept growing more prefixes; any future rewording would break it again. ## Design 1. **Backend → frontend contract**: `ToolMessage.additional_kwargs` carries `subagent_status` (one of `completed | failed | cancelled | timed_out | polling_timed_out`) and an optional `subagent_error` blob. The frontend prefers it over parsing `content`. 2. **Centralised stamping, not 8 sprinkled stamps**: rather than have each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:` paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware` stamps the field after every task-tool call. Adding a new return path in `task_tool.py` cannot now skip the stamp. 3. **Cross-language contract fixture**: the prefix→status mapping is the one piece both sides must agree on. The shared fixture at `contracts/subagent_status_contract.json` lists every backend return string, the expected status, and what the error substring should contain. Backend test (`backend/tests/test_subagent_status_contract.py`) and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`) both load that fixture and assert the same cases. A wording drift on either side fails the matching language's test. 4. **Round-trip serialisation pinned**: the round-trip test asserts `ToolMessage.model_dump_json()` → `model_validate_json()` preserves `additional_kwargs.subagent_status`. Catches the case where a future LangChain or Pydantic upgrade silently strips unknown kwargs. 5. **Frontend status collapse documented**: the backend has five status values, the frontend card has three (`completed | failed | in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all collapse to `failed` with the original status preserved in `error`. `parseSubtaskResult` returns `in_progress` for unknown values so a backend that ships a new enum variant before the frontend upgrades degrades to the legacy prefix fallback instead of getting pinned. ## Changes Backend: - `deerflow.subagents.status_contract` — new module exporting `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`, `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and `make_subagent_additional_kwargs(status, error)`. - `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status` helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call` stamp on the success path; `_build_error_message` stamps on the wrapper path (carrying `ExcClass: detail` into `subagent_error`). Non-task tools are untouched. - New tests: `test_subagent_status_contract.py` (19 cases from the shared fixture + status-enum / blank-error / unknown-status rejection) and `test_tool_error_handling_subagent_stamp.py` (middleware integration: terminal-content stamps, non-terminal doesn't, non-task tools untouched, async path mirrors sync, existing additional_kwargs survive, JSON round-trip preserved). Frontend: - `parseSubtaskResult(text, additionalKwargs?)` — prefers the structured stamp; falls back to the legacy prefix matcher for historical threads / unknown future status values. - `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse. - `message-list.tsx` passes `message.additional_kwargs` through. - `subtask-result.test.ts` adds a structured-status block + a fixture-driven contract block; legacy prefix tests stay green for the fallback path. Contract: - `contracts/subagent_status_contract.json` — single source of truth both languages load. Whitespace variants, varied N for polling timeouts, the 3 pre-execution `Error:` returns task_tool produces, and the middleware wrapper shape are all in there. ## Test plan - `make lint` clean (backend + frontend). - `pytest tests/test_subagent_status_contract.py tests/test_tool_error_handling_subagent_stamp.py` → 37 passed. - `pnpm test --run` → 103 passed (was 76, +27 new). ## Migration / fallback retirement The text-prefix fallback stays in place until backend telemetry shows the frontend never hits it for newly produced messages. At that point a follow-up PR can drop the prefix branches and keep only the structured-status branch. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior prefix-only fix), #3146 (this issue). * fix(subtask): back-fill result/error from text when structured status present Three follow-ups on the PR #3154 review: 1. `readStructuredStatus` no longer short-circuits the prefix parse. The backend currently stamps only the `subagent_status` enum value; the human-facing `result` body and wrapped-error message still live in `ToolMessage.content`. Dropping the text parse meant successful tasks rendered empty completed pills and wrapped failures lost their diagnostic. Now both shapes get composed: structured status wins, `result`/`error` come from text when both sides agree, and a lying success body under a `failed` stamp is dropped instead of leaking. 2. Replace the ESM-incompatible `__dirname` fixture lookup in subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`. The frontend package is `"type": "module"`, so the previous path would have thrown at runtime if anything ever changed under the contract directory. 3. Drop the `$schema` reference from contracts/subagent_status_contract.json pointing at a file that doesn't exist in the tree. Three new tests cover the structured + text composition: completed back-fills the success body, failed back-fills the wrapper text, and unrecognised content under a `failed` stamp stays empty rather than echoing noise. |
||
|
|
d8b728f7cb |
fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) (#3392)
* fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) Adopt an owner-task lifecycle for pooled MCP ClientSessions so each session is entered, initialized, and exited within a single asyncio task on its owning event loop. This eliminates the anyio "Attempted to exit cancel scope in a different task than it was entered in" RuntimeError that surfaced when stdio MCP tools were used via the sync tool wrapper (which spins up and tears down event loops across tasks). Also harden the pool lifecycle: - track in-flight session creation per (server, scope) to dedupe concurrent get_session() calls for the same key - make close_scope/close_server/close_all/close_all_sync cover both established entries and in-flight creations so sessions cannot be resurrected or leaked after close - handle cross-loop preemption of an in-flight creation by cancelling the stale owner task instead of only signalling it - define close_all_sync() semantics for a running loop on the current thread (signal-only, async completion) and route reset_mcp_tools_cache through a deterministic async close in that case * fix(mcp): avoid reset deadlock on running loop cache reset * fix(mcp): address session pool review feedback |
||
|
|
befe334f10 |
fix(config): make the reload boundary discoverable from code (#3144) (#3153)
* fix(config): make the reload boundary discoverable from code, not just docs Closes #3144. The hot-reload contract — per-run fields are resolved through `get_app_config()` on every request, infrastructure fields snapshot at gateway startup — landed in `backend/CLAUDE.md` as part of #3131. A maintainer reading `get_config()` or an `AppConfig` field still had to context-switch to that document to know which fields require a process restart, and there was no enforcement that the prose list stayed in sync with the code. This commit moves the boundary to a machine-readable single source of truth and surfaces it where the code lives: - New `deerflow.config.reload_boundary` module owns the registry of restart-required fields (`STARTUP_ONLY_FIELDS`) and a tiny helper API (`is_startup_only_field`, `iter_startup_only_field_paths`, `format_field_description`). The standardised `"startup-only:"` prefix is exported as `STARTUP_ONLY_PREFIX` so future scanners / lint hooks / doc generators can pivot off it without re-parsing prose. - `AppConfig`'s `database`, `checkpointer`, `run_events`, `stream_bridge`, `sandbox`, and `log_level` fields now build their `Field(description=...)` from `format_field_description(...)`. The same text shows up in IDE hover (Pydantic v2 exposes `description` via `model_fields[...]`). - `channels` is restart-required too but lives outside the AppConfig Pydantic schema (the config section is consumed directly by `start_channel_service`). The registry owns it so the boundary is not split between two places. - `get_config()` docstring points to the registry instead of leaving the reader to find `CLAUDE.md`. The `CLAUDE.md` table collapses to a one-liner pointing back at `reload_boundary.py` so the boundary has one canonical location, not two. Drift coverage in `tests/test_reload_boundary.py`: - Every registered field has a non-trivial reason. - Iterator / membership helpers stay in sync with the dict. - Every registry entry that maps to an `AppConfig` field also carries the `"startup-only:"` prefix in the schema (catches "forgot to update the schema"). - Reverse drift: any AppConfig field whose description starts with the prefix must be registered (catches "marked restart-required in the schema but forgot the registry"). - The runtime introspection that IDE hover depends on (`AppConfig.model_fields["database"].description`) is pinned, so a future Pydantic upgrade or schema swap that breaks the hover surface shows up as a test failure rather than a silent regression. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior boundary fix in prose form). * fix(config): preserve field doc and correct log_level reload reason Two follow-ups on the PR #3153 review: 1. The `log_level` STARTUP_ONLY_FIELDS reason previously claimed `apply_logging_level()` mutates the root logger level. It does not: only the `deerflow` / `app` logger levels are set, and root handler thresholds are conditionally lowered so messages from those loggers can propagate. Reword to match the actual behavior so operators reading IDE hover get accurate restart guidance. 2. `format_field_description(field_path)` was the sole `Field(description=)` for every restart-required field, which silently overwrote the original human-facing documentation — most visibly the `log_level` field that used to list debug/info/warning/error and clarify that third-party libraries are not affected. Extend the helper with a keyword-only `field_doc` parameter that composes the startup-only marker with the original prose so IDE hover documents both *why* the field is restart-required and *what* it actually accepts. Updated all six restart-required AppConfig fields (`log_level`, `database`, `sandbox`, `run_events`, `checkpointer`, `stream_bridge`) to pass their original descriptions through the helper. Tests: two new cases in `test_reload_boundary.py` pin (a) the helper composition and (b) every AppConfig restart-required field still surfaces a recognisable substring of its original documentation. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
d133b1119a |
fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) (#3378)
* fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) The SummarizationMiddleware runs its summary LLM call inside a before_model hook. Without a nostream tag the summary tokens were captured by LangGraph's messages-tuple stream callback and broadcast to the frontend as a phantom AI message. Generate a dedicated summary model copy tagged with "nostream" (merged on top of any existing tags such as "middleware:summarize" so RunJournal attribution is preserved) and override _create_summary / _acreate_summary to invoke it directly. This avoids temporarily swapping the shared self.model, which would otherwise leak the RunnableBinding across concurrent runs and break parent logic that inspects the raw model (profile / _get_ls_params). Add regression tests covering nostream tagging, concurrent-run isolation, raw model preservation, and existing-tag merge. * fix(summarization): address nostream review feedback |
||
|
|
88e36d9686 |
fix(#3189): prevent write_file streaming timeout on long reports (#3195)
* fix(#3189): prevent write_file streaming timeout on long reports Adds a layered defense against StreamChunkTimeoutError caused by oversized single-shot write_file tool calls: - factory: default stream_chunk_timeout to 240s for OpenAI-compatible clients (overridable via ModelConfig.stream_chunk_timeout in config.yaml) - sandbox/tools: server-side 80 KB length guard on non-append write_file calls (configurable via DEERFLOW_WRITE_FILE_MAX_BYTES env var, 0 disables); rejects oversized payloads with a structured error pointing the model at str_replace or append=True - middleware: classify StreamChunkTimeoutError as transient but cap retries at 1 via per-exception _RETRY_BUDGET_OVERRIDES (same-payload retry on a chunk-gap timeout buffers the same way upstream; full 3-attempt loop would stack 6-12 min of dead air) - middleware: surface an actionable user-facing message for stream-drop exceptions instead of leaking the raw langchain stack - prompts: add a routing-style File Editing Workflow hint to both lead_agent and general_purpose subagent prompts, pointing the model at str_replace for incremental edits (mirrors Claude Code's Edit / Codex's apply_patch) - tests: behavioural coverage for size guard, retry budget override, stream-drop user message, factory default injection Refs #3189 * fix(#3189): drop stream_chunk_timeout for non-OpenAI providers Address CR feedback on PR #3195: - factory: pop `stream_chunk_timeout` from kwargs for any model_use_path other than `langchain_openai:ChatOpenAI` instead of returning early. `ModelConfig.stream_chunk_timeout` is part of the shared schema, so a user-supplied value on a non-OpenAI provider would otherwise be forwarded to its constructor and raise `TypeError: unexpected keyword argument`. - factory: rewrite docstring to describe the actual `exclude_none=True` behaviour (explicit null is excluded and falls back to the default) instead of the misleading "None falling out via exclude_none=True keeps its value". - tests: add regression coverage asserting the kwarg is stripped before reaching a non-OpenAI provider's constructor. Refs: bytedance#3189 * fix(#3189): restrict stream-drop user copy to StreamChunkTimeoutError only Per CR on #3195: narrow _STREAM_DROP_EXCEPTIONS to StreamChunkTimeoutError. Generic httpx RemoteProtocolError / ReadError fall back to the standard 'temporarily unavailable' copy, since they routinely fire on transient network blips where the 'split the output' guidance is misleading. Retry/backoff classification is unchanged — both remain transient/retriable. Tests updated to reflect new copy, plus a symmetric regression test for ReadError. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
268fdd6968 |
fix(gateway): drain in-flight runs before closing checkpointer on shutdown (#3381)
* fix(gateway): drain in-flight runs before closing checkpointer on shutdown Chat runs execute in fire-and-forget background asyncio tasks that write checkpoints through a shared checkpointer. On shutdown, langgraph_runtime's AsyncExitStack tore down the checkpointer's postgres connection pool while those run tasks were still mid-graph. langgraph's AsyncPregelLoop._checkpointer_put_after_previous then ran its `finally: await checkpointer.aput(...)` against the closed pool, raising psycopg_pool.PoolClosed. Because that put runs in a langgraph-internal task (not on run_agent's call stack), run_agent's try/except cannot catch it and it surfaces as "unhandled exception during asyncio.run() shutdown". Add RunManager.shutdown() to cancel and bounded-await all in-flight runs, and call it from langgraph_runtime BEFORE the AsyncExitStack closes the checkpointer, so the final checkpoint write lands while the pool is still open. The drain is bounded by a timeout so a stuck run cannot hang worker shutdown, and is shielded so a second shutdown signal cannot abandon it mid-drain and reopen the race. Closes #3373 * fix(gateway): address review — preserve completed-run status, bound drain persistence Addresses Copilot review on #3381: - RunManager.shutdown(): decide run status AFTER the drain. Under the lock it now only requests cancellation; after asyncio.wait it marks/persists `interrupted` only for runs still pending or ended cancelled. A run that completes (e.g. `success`) during the drain window keeps its real terminal status instead of being unconditionally overwritten. - Bound the trailing status persistence within the timeout budget (deadline = loop.time()+timeout; gather wrapped in asyncio.wait_for) so a slow store backing off under DB pressure cannot push shutdown past the deadline. - deps: use asyncio.create_task instead of asyncio.ensure_future. - tests: wait deterministically for the run to be in-flight (poll the first checkpoint) instead of a fixed sleep; init shutdown_calls explicitly in the recovery test double; add regression test asserting a run completing during the drain keeps its status (in memory and in the store). * fix(gateway): address maintainer review — surface failed drain persists, clarify timeout constant Addresses @WillemJiang review on #3381: - shutdown(): inspect the gather result of the trailing interrupted-status persistence. _persist_status is best-effort (it catches + logs its own failure with exc_info and returns False, so it never raises out of the gather), but the aggregate result was never checked — a partial failure had no shutdown-level visibility. Now any escaped Exception is logged, and any False (a persist that did not confirm) is logged with the run_id. Added regression test test_shutdown_surfaces_failed_interrupted_persist. - deps: clarify the _RUN_DRAIN_TIMEOUT_SECONDS comment — state the actual value of _SHUTDOWN_HOOK_TIMEOUT_SECONDS (5.0s) and that both count toward the lifespan shutdown window. Kept as two separate constants (independent teardown steps that may diverge) rather than one shared "must match" value. - Verified no other test fake needs the shutdown stub: _FakeRunManager in test_worker_langfuse_metadata.py is a run_agent() argument (worker path), never injected into langgraph_runtime, so it never receives shutdown(). |
||
|
|
1aac408dd0 | fix upload file size contract (#3408) | ||
|
|
2bbc7879fa |
refactor(tool-search): consolidate MCP metadata tag and harden deferred-tool setup (#3370)
Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus hardening of malformed/empty tool_search queries; no change to the deferral mechanism or search ranking. - Add deerflow/tools/mcp_metadata.py as the single source of truth for the "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public is_mcp_tool). Removes the duplicated magic string and the private, cross-module _is_mcp_tool import. - tool_search.search: never raise on model-generated input. Extract _compile_catalog_regex (shared compile-with-literal-fallback); return empty for empty/whitespace queries and a bare "+" instead of matching everything or raising IndexError. - DeferredToolSetup: document the empty-vs-populated invariant. - build_deferred_tool_setup: comment the two distinct empty-return branches. - _assemble_deferred: add return type, rename local to deferred_setup, build the final list with an explicit append. - Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and bare-"+" queries. |