deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-06-13 10:55:59 +00:00

Author	SHA1	Message	Date
DanielWalnut	839fa99237	feat(telegram): stream agent replies by editing the placeholder message in place (#3534 ) * docs(spec): telegram streaming output design Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(plan): telegram streaming implementation plan Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(telegram): report streaming support for telegram channel Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(channels): use slack as the non-streaming sample channel in manager tests Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(telegram): register running-reply placeholder as stream target Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(telegram): pin last_edit_at sentinel in placeholder registration test Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * refactor(telegram): extract _send_new_message from send() Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(telegram): edit streamed message in place for non-final updates Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(telegram): finalize streamed message with overflow splitting When is_final=True arrives and stream state exists, pop the state, edit the streamed placeholder with the final text, split overflow into follow-up send_message calls, update _last_bot_message, and clear stream state. Falls back to _send_new_message when no stream state is registered. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(telegram): exercise the not-modified handler in final edit path Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs: telegram channel now streams replies via message editing Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(telegram): harden final-delivery path with guarded retry and chunk retries Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(channels): accept runtime 'messages' SSE event for streaming text accumulation The embedded runtime (matching LangGraph Platform semantics) emits SSE event name 'messages' for the requested 'messages-tuple' stream mode, so the manager never accumulated token deltas and streaming channels only updated from end-of-step 'values' snapshots — on Telegram this looked like 'Working on it...' followed by the full answer in one block. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(telegram): widen stream-edit throttle to 3s in group chats Telegram caps bots at 20 messages/minute per group, stricter than the 1 msg/s per-chat guideline. Groups have negative chat ids, so pick the interval by sign. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(telegram): address review findings — thread fallback messages, bound stream registry, share stream-event constants - Fallback/new stream messages now carry reply_to_message_id parsed from thread_ts so they stay nested under the user's message (finding 1) - STREAM_MODES / MESSAGE_STREAM_EVENTS constants link the requested stream modes to the SSE event names they arrive under (finding 2) - _register_stream_message bounds the in-flight registry at 256 entries, evicting oldest, guarding against leaks when a final never arrives (finding 4) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 08:38:28 +08:00
dependabot[bot]	3475f7cdad	chore(deps): bump starlette from 1.0.0 to 1.0.1 in /backend (#3546 ) Bumps [starlette](https://github.com/Kludex/starlette) from 1.0.0 to 1.0.1. - [Release notes](https://github.com/Kludex/starlette/releases) - [Changelog](https://github.com/Kludex/starlette/blob/main/docs/release-notes.md) - [Commits](https://github.com/Kludex/starlette/compare/1.0.0...1.0.1) --- updated-dependencies: - dependency-name: starlette dependency-version: 1.0.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-13 08:14:16 +08:00
dependabot[bot]	83bc2fb1ae	chore(deps): bump aiohttp from 3.13.5 to 3.14.0 in /backend (#3545 ) --- updated-dependencies: - dependency-name: aiohttp dependency-version: 3.14.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-13 08:13:49 +08:00
Huixin615	a17d2ff8f8	fix(mcp): surface admin-required state on settings tools page (#3527 ) (#3533 ) GET /api/mcp/config returns 403 for non-admin users, but the previous client returned the error body as MCPConfig, causing MCPServerList to crash with 'Cannot convert undefined or null to object' on Object.entries(config.mcp_servers). - api.ts: introduce MCPConfigRequestError; loadMCPConfig and updateMCPConfig now throw it (carrying status + isAdminRequired) instead of letting non-2xx bodies leak through as parsed config - tool-settings-page.tsx: render a friendly 'admin privileges required' empty state when the React Query error is an admin-required MCPConfigRequestError; keep MCPServerList resilient with Object.entries(servers ?? {}) and an empty-state for no servers - i18n: add settings.tools.adminRequired and settings.tools.empty in en-US, zh-CN and the Translations type - tests: cover 403 / 5xx / instanceof / detail-fallback for both loadMCPConfig and updateMCPConfig in tests/unit/core/mcp/api.test.ts Refs: #3527	2026-06-13 07:36:57 +08:00
ly-wang19	420a886e1d	fix(channels): offload blocking filesystem IO in inbound file ingestion (#3529 ) _ingest_inbound_files ensured the thread uploads dir (mkdir), enumerated it (iterdir/is_file) to de-duplicate names, and wrote each downloaded attachment (write_upload_file_no_symlink) directly on the event loop. Offload the directory prep and every per-file write via asyncio.to_thread; the genuinely async network read (file_reader) stays on the loop. Externally observable behavior is unchanged. Found via `make detect-blocking-io` (HIGH: iterdir on an async path). Add tests/blocking_io/test_channels_ingest.py anchor, verified red->green under the strict Blockbuster gate. Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 06:38:54 +08:00
ly-wang19	579e416459	perf(runtime): index messages in MemoryRunEventStore to avoid O(n) scans (#3531 ) list_messages re-scanned every event in the thread on each call (category filter + seq filter) — O(total events) per paginated request on the default run-events backend. Maintain a messages-only, seq-sorted projection of _events (shared dict refs, no copies) and locate the seq window with bisect: list_messages drops to O(log m + page) and count_messages to O(1). The index is kept in lockstep at every mutation site (put / put_batch via _put_one, delete_by_run, delete_by_thread). Externally observable behavior is unchanged — the full RunEventStore contract suite passes across memory/db/jsonl. Add a test covering pagination over non-contiguous message seqs (messages interleaved with trace events), including in-gap and exact-boundary cursors. Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 22:58:30 +08:00
AochenShen99	c002596ab4	chore(todo): remove unused completion reminder counter (#3530 )	2026-06-12 22:48:47 +08:00
AochenShen99	a838546a2b	chore(blocking-io): fail-loud repo-root resolution and shared detector CLI shim (#3512 ) * chore(blocking-io): fail-loud repo-root resolution and shared detector CLI shim The three detectors resolved REPO_ROOT with depth-indexed Path(__file__).resolve().parents[4]. If a detector file ever moves to a different directory depth, scan roots resolve under the wrong directory and the detector reports zero findings with no error — a silent-zero failure shape for a detection tool. - Add support/detectors/repo_root.py: resolve the repo root by walking upward to the .git marker (checked with exists() so git worktrees, where .git is a file, also resolve), raising RuntimeError when no marker is found. All three detectors use it at import time, so a relocated detector fails loudly instead of scanning an empty tree. - Extract scripts/_detector_cli.py from the three character-identical CLI shims; the sys.path computation lives in one place and raises when backend/tests cannot be found. - tests/test_detector_repo_root.py pins: resolution from an unmarked location raises instead of returning an empty scan; all three detectors share the resolved root; each CLI shim delegates to its detector. Testing: backend `make test` (4278 passed); smoke-ran `make detect-blocking-io`, `make detect-thread-boundaries`, and `scripts/scan_changed_blocking_io.py --base upstream/main`. Closes #3510 (review follow-up to #3503). * chore(blocking-io): declare detector modules import-only, drop script-mode residue Adversarial review caught that blocking_io_static.py and thread_boundaries.py kept shebangs and __main__ blocks but can no longer run as plain scripts: the new `from support.detectors.repo_root import` executes before anything puts backend/tests on sys.path, so direct invocation dies with ModuleNotFoundError before argparse. Direct execution was never a documented entry point (Makefile targets, the scripts/ shims, the blocking-io-guard skill, and tests all go through the support.detectors package), so converge on import-only instead of re-adding per-module bootstrap: drop the shebangs and the now unreachable __main__ blocks (plus the `import sys` they kept alive) and state the supported entry points in each module docstring. The shim delegation tests in test_detector_repo_root.py pin the supported CLI paths. Testing: backend `make test` (4278 passed); `make detect-blocking-io` and `make detect-thread-boundaries` smoke-ran.	2026-06-12 17:16:01 +08:00
zengxi	bbce6c0ac0	docs(config): add SearXNG and Browserless configuration examples (#3513 ) * docs(config): add SearXNG and Browserless configuration examples Add commented-out configuration examples for the SearXNG web search and Browserless web fetch tools introduced in PR #3451. - SearXNG: self-hosted metasearch engine (base_url, max_results) - Browserless: headless Chrome renderer (base_url, token, timeout_s, wait_for_event, wait_for_selector, reject_resource_types, etc.) Also bump config_version to 13 since the tool schema has new options. * fix(config): align defaults with code and remove unconfigured keys - SearXNG default port: 8088 (matches searxng/tools.py fallback) - Browserless default port: 3032 (matches browserless/tools.py fallback) - Remove wait_for_selector_timeout_ms, reject_resource_types, reject_request_pattern from example (not yet read from config) - Note Docker service ports differ from code defaults	2026-06-12 16:50:32 +08:00
ly-wang19	0d3bfe0a76	perf(runtime): index runs by thread_id to avoid O(n) scans in RunManager (#3499 ) * perf(runtime): index runs by thread_id to avoid O(n) scans in RunManager RunManager.list_by_thread, create_or_reject (inflight check), and has_inflight each filtered every in-memory run by thread_id — an O(total in-memory runs) scan that grows with overall gateway traffic rather than the queried thread's depth. Add a thread_id -> run_ids secondary index (an insertion-ordered dict used as an ordered set) maintained in lockstep with _runs under the existing lock at every add/remove site (create, create_or_reject, both rollbacks, cleanup). The three per-thread queries now run in O(runs-in-thread); insertion order is preserved so list_by_thread keeps stable tie-breaking. Behavior unchanged. Adds 6 regression tests; full RunManager suite 146 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(runtime): cover create_or_reject rollback + clarify thread-index guard docstrings Address review on #3499 (fancyboi999): - Reword _thread_records_locked docstring: lockstep under self._lock is the correctness guarantee; self._runs.get is one-directional defense-in-depth (drops stale ids, cannot recover index-missing ids), not reconciliation. - Add test_failed_create_or_reject_unindexes_run covering the create_or_reject rollback/unindex mutation site (the last untested mutation path). - Fix _FailingPutRunStore docstring ("initial put" -> "every put"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 16:48:47 +08:00
Xinmin Zeng	503eeac788	fix(frontend): render user messages as plain text and cap blockquote nesting (#3502 ) * fix(frontend): render user messages as plain text and cap blockquote nesting User messages are typed or pasted plain text, not authored Markdown, but they were rendered through the full Streamdown pipeline. Pasted source files got fragmented (indented chunks become code blocks, paragraphs collapse and lose indentation), "$...$" spans were KaTeX-ified, and a message with thousands of nested ">" markers overflowed the call stack in marked's recursive blockquote lexer, permanently crashing the thread on every load. Render human message content verbatim with pre-wrap instead, and cap blockquote nesting at 100 levels at the Streamdown chokepoint so model output cannot trigger the same recursion either. Closes #3500 * fix(frontend): absorb marked lexer crashes with a render fallback boundary Review found two gaps in the nesting cap: marked's list and blockquote tokenizers are mutually recursive, so a list marker in front of the quote chain ("- > > > ...") bypassed the blockquote-only regex and still overflowed the stack; and the line-based rewrite was fence-blind, silently truncating literal ">" runs inside code blocks. Add an error boundary around Streamdown that renders the raw content as plain pre-wrap text when rendering throws (retrying on the next content change), keep the cap as a fast path for the dominant pure-">" case, and make it skip fenced and indented code lines.	2026-06-12 16:15:40 +08:00
DanielWalnut	aa015462a7	feat(im): Add user-owned IM channel connections (#3487 ) * Add user-owned IM channel connections * Fix dev startup and channel connect popup * Use async channel connect flow * Harden dev service daemon startup * Support local IM channel connections * Align IM connections with local channels * Fix safe user id digest algorithm * Address Copilot IM channel feedback * Address IM channel review comments * Support all integrated IM channel connections * Format additional channel connection tests * Keep unavailable channel connect buttons clickable * Fix IM channel provider icons * Add runtime setup for enabled IM channels * Guard global shortcut key handling * Keep configured IM channels editable * Avoid password autofill for channel secrets * Make channel threads visible to connection owners * Persist IM runtime config locally * Allow disconnecting runtime IM channels * Route no-auth channel sessions to local user * Use default user for auth-disabled local mode * Show IM channel source on threads * Prefill IM channel runtime config * Reflect IM channel runtime health * Ignore Feishu message read events * Ignore Feishu non-content message events * Let setup wizard enable IM channels * Fix frontend formatting after merge * Stabilize backend tests without local config * Isolate channel runtime config tests * Address channel connection review comments * Use sha256 user buckets with legacy migration * Ensure runtime IM channels are ready after restart * Persist disconnected IM channel state * Address channel connection review comments * Address channel connection review findings Frontend connect flow: - Open the runtime-config dialog only when a provider still needs credentials; configured providers go straight to the connect flow, so the binding-code/deep-link path is reachable from the UI again. - After saving credentials, continue into the connect flow when a user binding is still required (multi-user mode) instead of stopping at a "Connected" toast. - Extract shared provider-state helpers to core/channels/provider-state and add unit + e2e coverage for the direct-connect and configure-then-connect paths. Provider status semantics: - Report connection_status from the user's newest connection row; with no binding it is not_connected, except in auth-disabled local mode where a configured running channel is effectively connected. Concurrency and event-loop correctness: - Offload ChannelRuntimeConfigStore construction and writes, channel service construction, and Slack connection replies to threads; add a tests/blocking_io/ anchor for the runtime-config handlers. - Consume binding codes with a conditional UPDATE so a code can only be used once under concurrent workers; retry upsert_connection as an update when a concurrent insert wins the unique constraint. - Serialize ensure_channel_ready per channel so concurrent provider polls cannot double-start a channel worker. Config and migration hardening: - Stop mutating the get_app_config()-cached Telegram provider config; the runtime store now owns the UI-entered bot username. - Register channel_connections in STARTUP_ONLY_FIELDS with the standardized startup-only Field description. - Match the legacy unsafe-id bucket by recomputing its exact SHA-1 name so another user's same-prefix bucket can never be migrated. - Remove the unused Telegram process_webhook_update path and document src/core/channels in the frontend docs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Address PR review comments on authz scoping and channel runtime Security (review feedback from ShenAC-SAC): - Scope internal-token callers to the connection owner carried in X-DeerFlow-Owner-User-Id instead of bypassing owner checks outright, in both require_permission(owner_check=True) and the stateless run endpoints. Internal callers keep access to their own and shared/legacy threads, and may claim a default-owned channel thread for its real owner, but a leaked internal token no longer grants cross-user thread access. - Require admin privileges for POST/DELETE /api/channels/{provider}/ runtime-config: runtime credentials and channel workers are instance-wide shared state (same model as the MCP config API). Read-only provider listing stays available to all users. Performance (review feedback from willem-bd): - Skip the redundant thread channel-metadata PATCH after the first successful backfill per thread. - Reuse the per-connection Slack WebClient until its token changes instead of constructing one per outbound message. - Reconcile channel readiness for all providers concurrently in GET /api/channels/providers. Also resolve the code-quality unused-import flag in the blocking-io anchor by pre-importing the channel service via importlib. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Fix prettier formatting in provider-state test Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Reconcile UI runtime channel config with config reload on restart Main now reloads a channel's config.yaml entry on restart_channel() (#3514, issue #3497). Adapt the user-owned connection flow to coexist: - configure_channel() restarts with reload_config=False — the caller just supplied the authoritative config (browser-entered credentials that are never written to config.yaml), so a file reload must not clobber it with the stale on-disk entry. - _load_channel_config() re-applies the UI runtime-store overlay used at startup, so an operator-triggered restart keeps browser-entered credentials for channels without a config.yaml entry and does not resurrect a channel disconnected from the UI. - Offload the reload's disk IO (config.yaml + runtime store) with asyncio.to_thread, matching the blocking-IO policy on this branch. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 15:24:58 +08:00
AochenShen99	b8f5ed360f	fix(skills): keep skill archive installation off the event loop (#3505 ) * fix(skills): keep skill archive installation off the event loop ainstall_skill_from_archive — the async entry point awaited by the gateway POST /skills/install route — ran its entire filesystem pipeline inline on the event loop: zip extraction, frontmatter validation, rglob enumeration, per-file read_text, shutil.copytree staging, and tempdir cleanup. Restructure into offloaded phases: prepare (extract + validate) and commit (stage + move) run via asyncio.to_thread, the tempdir lifecycle is offloaded, and the security scanner's file enumeration and reads move off the loop — only the per-file LLM scan (genuinely async) stays awaited. Security decision logic and exception contract are unchanged. Anchor: tests/blocking_io/test_skills_install.py drives the real install pipeline (real .skill archive, real FS; only scan_skill_content stubbed) under the strict Blockbuster gate. Verified red on pre-fix code (BlockingError: os.stat), green with the fix. * fix(skills): log temp-dir cleanup failures instead of swallowing them Review follow-up on the install offload: rmtree(ignore_errors=True) kept the primary install exception but silently leaked the extraction dir on cleanup failure. Keep the never-mask behaviour, add a warning log. * fix(skills): bound install tmp cleanup and pass skill_dir explicitly (review) - Wrap the best-effort temp-dir cleanup in asyncio.wait_for (5s) so a hung filesystem in the finally block cannot stall or mask the install outcome; timeout is logged like the existing OSError path. - Hoist _collect_scannable_files to module level with skill_dir as an explicit argument instead of a closure capture.	2026-06-12 15:17:40 +08:00
hataa	76136d22b4	fix(channels): reload config on channel restart, fixes #3497 (#3514 )	2026-06-12 14:45:22 +08:00
AochenShen99	dc2ababf00	feat(skill): add blocking-io-guard — SOP skill for blocking-IO triage and runtime anchors (#3503 ) * feat(blocking-io): add changed-lines blocking-IO scanner (L1) * feat(blocking-io): add scan-changed CLI wrapper * feat(skill): add blocking-io-guard developer SOP skill * docs(blocking-io): point contributors at the blocking-io-guard skill * style(blocking-io): apply ruff format to scanner and tests * docs(backend): document changed-lines blocking-IO scanner in CLAUDE.md * feat(skill): add post-fix re-scan check and PR batching policy * refactor(skill): fix SOP step ordering, align template with repo conventions - Move re-scan into an explicit 'apply the fix' step (was wedged after anchor generation while telling you to go back before the anchor) - Renumber steps 0-6; drop undefined 'L1' jargon - Mode A: document that the diff is <base>...HEAD (commit first) - Mode B: prefer make detect-blocking-io + findings JSON file - anchor template: module-level pytestmark per tests/blocking_io convention - CLAUDE.md: fix 'git diff --base' phrasing * fix(skill): catch findings introduced without touching the blocking line Review follow-up: changed-line intersection alone misses the case where a new async caller exposes an old sync helper — the static finding sits on the untouched blocking line, so Mode A returned empty and the SOP stopped on a false 'no blocking-IO surface'. Selection is now a union over the changed files: - findings on added lines of git diff <base>...HEAD (kept: a second identical symbol in an already-flagged function collides on the stable key and only this selection sees it); - findings new versus the merge base, matched by (path, function, symbol) — never line numbers. Base sources are materialized via git show <merge-base>:<path>; files absent at base count every head finding as new. SKILL.md now states the residual same-file-only blind spot (cross-file async callers) instead of treating an empty list as proof of zero exposure, and only requires reading sop-skeleton.md when generalizing to another detector domain. * docs(skill): examples teach test-writing, the teeth check defines the rule All examples in the references/template are filesystem-flavored; make explicit that they are instances, not the SOP's boundary — the same rules apply to every detector category (FILE_IO, HTTP, SUBPROCESS, SLEEP) and acceptance is always red/green teeth, never similarity to an example. Neutralize the template's arrange comment accordingly. * fix(blocking-io): harden changed-lines scanner per review - Dedup the union selection by the stable key (path, function, symbol) instead of dict identity, so a future selector returning copied dicts cannot silently empty the result. - parse_changed_lines now handles any unified diff: context lines advance the new-file counter, \-markers and deletions do not, and the counter resets at each +++ header. Previously correct only for --unified=0. - Add blocking_io_static.scan_source (in-memory scan); base-version comparison no longer round-trips through temp files. - Empty Mode A report now prints the same-file-only reachability caveat at the point of use instead of relying on the SOP text alone. * docs(skill): bound best-effort cleanup when the offload sits in finally Lesson from the #3505 review: the SOP routinely drives 'offload the cleanup branch' transformations, and an awaited cleanup in finally can mask or stall the primary exception. One sentence in Step 2 closes that gap at the point where the fix is written.	2026-06-12 10:20:38 +08:00
zengxi	330a2ff8c5	feat(community): add SearXNG and Browserless web search/fetch tools (#3451 ) * feat(community): add SearXNG and Browserless web search/fetch tools - SearXNG web_search: privacy-focused meta search engine integration with configurable base_url via config.yaml tool settings - Browserless web_fetch: headless browser page fetching with readability article extraction - Both tools are fully configurable through tool config section - No external API keys required for basic operation * fix: address PR review feedback and add unit tests - Guard config.model_extra against None values (review #1, #2) - Coerce max_results to int when reading from config (review #2) - Fix web_fetch_tool to use direct HTTP fetch instead of reusing the web_search client config (review #3) - Fix misleading docstring for SearxngClient.fetch (review #4) - Remove unused target_url variable to pass Ruff lint (review #5) - Normalize bool config values with _normalize_bool helper to handle env-resolved string values correctly (review #6) - Add unit tests for both SearXNG and Browserless client classes and their tool functions with mocked httpx (review #7, #8) * fix: convert to async httpx to avoid blocking I/O on event loop - Replace httpx.Client with httpx.AsyncClient in both client classes - Convert tool functions to async def - Wrap readability_extractor calls in asyncio.to_thread() - Update all tests to use pytest.mark.asyncio and async mocks - Fix import sorting to pass Ruff lint * fix(browserless): replace deprecated waitUntil with waitForEvent The Browserless API has deprecated the waitUntil parameter. Replace with waitForEvent which accepts values like 'networkidle'. Default is empty (no wait), configurable via config.yaml. * fix(browserless): remove deprecated gotoTimeout and bestAttempt params The Browserless /content API does not accept gotoTimeout or bestAttempt as top-level payload keys. These were being sent unconditionally, causing 400 Bad Request errors on current Browserless versions. Changes: - Remove goto_timeout_ms parameter and 'gotoTimeout' from payload - Remove best_attempt parameter and 'bestAttempt' from payload - Remove _normalize_bool helper (no longer needed) - Remove goto_timeout_ms and best_attempt config reading in tools.py - Add tests for waitForSelector and reject params - Verify no deprecated params are sent in test_fetch_html_success * refactor(searxng): remove web_fetch_tool, decouple from web_search config SearXNG is a search engine — it should only provide web_search_tool. The web_fetch responsibility belongs to Browserless (headless Chrome) or Jina AI, not SearXNG. Changes: - Remove web_fetch_tool from SearXNG tools.py and __init__.py - Remove SearxngClient.fetch() method (no longer needed) - Remove unused asyncio/readability imports from SearXNG tools.py - Add test for max_results string-to-int coercion from config - Add test for search with categories parameter - Add test for httpx.RequestError handling - Apply ruff format fixes to browserless_client.py and test files	2026-06-12 09:45:26 +08:00
snaplap	0367fe6c7a	fix(frontend): prevent user message bubble overflow with long unbreakable strings (#3488 ) - Add max-w-full min-w-0 to user message wrapper div to constrain width - Change bubble width from w-fit to w-full max-w-full for consistent layout - Add break-words to user message content for long string wrapping - Add overflow-x-clip as defensive overflow protection Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-11 22:55:48 +08:00
zgenu	c733d3c917	fix(frontend): isolate new chat thread messages (#3508 ) * fix(frontend): isolate new chat thread messages * fix(frontend): keep live messages visible in new chat * fix(frontend): reset thread-local message refs	2026-06-11 22:12:15 +08:00
Huixin615	b6fbf0d105	fix(frontend): keep workspace interactive when SSR auth probe cannot reach gateway (#3493 ) (#3495 ) * fix(frontend): keep workspace interactive when SSR auth probe cannot reach gateway (#3493) When the SSR auth probe at /api/v1/auth/me times out or fails, the workspace layout used to render a static fallback page without AuthProvider or QueryClientProvider, making logout and every other interaction non-functional until the gateway recovered. Render the normal WorkspaceContent in 'gateway_unavailable' mode instead, surfacing a polite offline banner that re-probes the gateway in the background and hides itself the moment refreshUser() returns an authenticated user. The probe is reentrancy-guarded so a slow gateway cannot pile up parallel /auth/me requests. Closes #3493 * fix(workspace): silent probe in offline banner to avoid /login redirect during gateway recovery (#3493) The banner previously delegated retry probes to AuthProvider.refreshUser(), which treats any 401 from /api/v1/auth/me as 'session expired' and force-redirects to /login. During gateway recovery, the first few requests may transiently return 401 before the gateway is fully ready, which would incorrectly kick the user out — defeating the purpose of the offline banner. Now the banner silently fetches /api/v1/auth/me itself and only delegates to refreshUser() on 200 OK. Non-200 responses (401 / 5xx / network) are swallowed and retried on the next interval tick, ensuring the user stays logged in across short gateway outages. Verified in Docker: - docker pause deer-flow-gateway → banner appears, page interactive - docker unpause deer-flow-gateway → banner auto-disappears within 10s, user remains on /workspace/chats/new with full session restored - All 117 unit tests pass * fix(workspace): fix banner polling leak and persistent 401 handling (#3493) - Stop polling immediately after user recovery: add user to effect dependencies, cleanup interval when user !== null - Handle persistent 401: trigger login redirect after 3 consecutive unauthorized responses - Extract decision logic to pure helper, add 8 unit tests covering all critical paths * fix(workspace): address CR feedback on gateway offline recovery (#3493) - gateway-offline-banner-helpers: decrement (not reset) auth-failure streak on transient outcomes so a flapping gateway (401 alternating with 5xx) still converges on session-expired - gateway-offline-banner: reuse probe response body to apply user directly via new AuthProvider.applyUser, halving the recovery burst against an already-struggling gateway - gateway-offline-banner: extract classifyProbe into helpers for unit testability; log probe failures via console.warn instead of swallowing - gateway-offline-fallback: new shared component used by both workspace and (auth) layouts so auth pages recover the same way the workspace does, fixing the lockup where bare static HTML had no AuthProvider - AuthProvider.logout: fall back to hard navigation when the gateway logout fetch fails, matching legacy form-POST behaviour and avoiding stale client state during outage - tests: extend gateway-offline-banner-helpers.test with flapping convergence and classifyProbe branch coverage (19 cases total)	2026-06-11 21:14:49 +08:00
DanielWalnut	f401e7baa6	[codex] Fix stale AIO sandbox cache reuse (#3494 ) * Fix stale AIO sandbox cache reuse * Address AIO sandbox review feedback * Distinguish sandbox health check failures * Keep local discovery recoverable when the runtime check fails LocalContainerBackend.discover() shares _is_container_running, which now raises on transient daemon errors instead of returning False. Discovery has no exception handling in _discover_or_create_with_lock(_async), so a brief Docker hiccup turned a recoverable "could not verify, create instead" into a hard acquire failure. Catch the check failure inside discover() and return None so an unverifiable container is simply not adopted, restoring the pre-change fall-through while keeping raise-on-unknown semantics protecting the destroy path. Reported by fancy-agent on PR #3494. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * Narrow the not-found match in container inspect error handling A bare "not found" substring also matches transient failures like "command not found" or "context not found", which would misclassify a check error as "container definitely gone" and bypass the raise-on-unknown contract. Keep Docker's specific "No such object"/"No such container" phrases, and only trust a generic "not found" (Apple Container) when the message names the inspected container or refers to a container/object. Reported by WillemJiang on PR #3494. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 17:53:37 +08:00
Huixin615	919d8bc279	fix(sandbox): persist lazily-acquired sandbox state via Command (#3464 ) * fix(sandbox): persist lazily-acquired sandbox state via Command ensure_sandbox_initialized mutates runtime.state in place, which is local to the current tool invocation and is not picked up by LangGraph's channel reducer. Subsequent graph steps and downstream consumers (such as ToolOutputBudgetMiddleware and the sub-agent task_tool) therefore cannot observe the sandbox id from state. Wrap tool calls in SandboxMiddleware (wrap_tool_call / awrap_tool_call) to detect fresh lazy initialization by diffing runtime.state before and after the handler, and emit a proper state update via Command(update=...): - ToolMessage results are wrapped into Command(update={sandbox, messages}) - Command results with a dict update are merged on the sandbox key while preserving messages / goto / graph / resume - Command results with non-dict updates are left untouched to avoid silent data loss on unknown update shapes Tests: - 7 new unit tests cover lazy-init emit, passthrough, dict-update merge, non-dict-update passthrough (sync and async) - Refresh replay golden write_read_file.ultra.events.json: SSE 'values' events now correctly carry the 'sandbox' key in their keys list, which is the direct evidence that the fix is effective Closes #3463 * refactor(sandbox): use dataclasses.replace to preserve Command fields Address Copilot review on #3464: replace manual field-copy with dataclasses.replace so any current or future Command fields are preserved automatically when merging sandbox_update. Also add a regression test that constructs a Command with non-None graph/goto/resume to lock this behavior in.	2026-06-11 17:50:36 +08:00
Willem Jiang	2d5f0787de	Update lint-check.yml with the job setting	2026-06-11 00:07:36 +08:00
Huixin615	5819bd8a59	fix(frontend): paginate workspace chat list beyond 50 threads (#3482 ) (#3485 ) * fix(frontend): paginate workspace chat list beyond 50 threads (#3482) The sidebar 'Recent chats' and /workspace/chats list were hard-capped at the first 50 threads returned by threads.search. Replace the single-shot useThreads() consumers with useInfiniteThreads() and add an IntersectionObserver sentinel to each list so further pages are fetched on demand. In search mode on the chats page, the sentinel is replaced by an explicit 'Load more' button to prevent the observer from draining the entire backend list while the filtered view stays empty. - Add useInfiniteThreads + page-size constant and pure cache helpers (map/filterInfiniteThreadsCache, getInfiniteThreadsNextPageParam) - Mirror rename / delete / stream-finish updates into the new infinite cache so optimistic UI stays consistent - Extend the e2e mock to honour limit/offset slicing - Unit tests for the cache helpers and pagination boundary - Playwright e2e covering chats page + sidebar load-more, and the search-mode guard against runaway auto-pagination - Add en/zh i18n entries for the search-mode load-more button Fixes #3482 * docs(frontend): clarify infinite-threads offset semantics and test post-delete invariant - Add docstring to getInfiniteThreadsNextPageParam explaining that TanStack Query freezes the returned offset into pageParams once, so optimistic cache mutations that shrink page lengths (filterInfiniteThreadsCache on delete) cannot retroactively move the offset backwards. Delete/rename paths reconcile against the backend via invalidateQueries in onSettled. - Add unit test covering the post-delete invariant. - Fix misleading comment in thread-list-infinite-scroll.spec.ts: the thread-search mock does not sort by updated_at; it returns the array in the order provided. Addresses Copilot CR comments on #3485. * fix(frontend): mirror onCreated upsert into infinite cache; add sidebar Load-older button Address review feedback on #3485: - New upsertThreadInInfiniteCache helper; useThreadStream onCreated now upserts into both the legacy ['threads','search'] cache and the new infinite cache, so a freshly created thread appears in the sidebar immediately during streaming instead of only after the run finishes and onSettled invalidates the query. Restores parity with main. - Sidebar Recent Chats now exposes a visible 'Load older chats' button alongside the IntersectionObserver sentinel, so keyboard-only users and environments where IO is unavailable can still reach older conversations. - Add zh-CN / en-US / types entry for chats.loadOlderChats. - Cover the new helper with 3 unit tests (no-op on uninitialised cache, prepend new thread to first page, merge with existing entry without duplication).	2026-06-10 23:59:38 +08:00
hataa	b3c2cc42cf	fix(agents): require config.yaml in resolve_agent_dir to skip memory-only directories (#3390 ) (#3481 ) When memory is enabled, the first conversation with a legacy shared agent creates a per-user agent directory containing only memory.json (no config.yaml). On the second turn, resolve_agent_dir() returned this incomplete directory, causing load_agent_config() to fail with "Agent config not found". Require config.yaml to exist alongside the directory for both the per-user and legacy paths, so that memory-only directories fall through correctly. This aligns resolve_agent_dir with the existing config.yaml check in list_custom_agents. Refs: https://github.com/bytedance/deer-flow/issues/3390	2026-06-10 23:57:17 +08:00
Ryker_Feng	167ef4512f	feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429 ) (#3465 ) * feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) Add a `memory.token_counting` option (`tiktoken` \| `char`) so deployments in network-restricted environments can opt out of tiktoken entirely. In `char` mode the memory-injection budget uses a network-free character-based estimate and never triggers the BPE download from openaipublic.blob.core.windows.net, which could otherwise block for tens of minutes (see #3402). Also harden the default `tiktoken` path: - cache an in-flight LOADING sentinel so concurrent callers fall back immediately instead of spawning more blocking get_encoding threads when the first load is still running (e.g. under the 5s startup warm-up timeout); - cache failures with a timestamp and retry after a cooldown so a transient network outage self-heals back to accurate counting without a restart; - skip startup warm-up entirely in char mode. The new config is surfaced via the memory config API and config.example.yaml (config_version bumped). Default remains `tiktoken`, so existing deployments are unaffected. * fix(memory): use CJK-aware char token estimate and address review feedback - Replace the flat len(text)//4 fallback with a CJK-aware estimate so Chinese/Japanese/Korean memory content does not over-fill the injection budget - Document the internal tiktoken retry cooldown and char-mode escape hatch - Sync CLAUDE.md / config.example.yaml / MEMORY_IMPROVEMENTS.md wording - Fix MemoryConfigResponse mocks/assertions and add CJK estimate tests	2026-06-10 23:26:15 +08:00
Xinmin Zeng	ba9cc5e972	fix(gateway): enforce thread ownership on stateless run endpoints (#3473 ) POST /api/runs/stream and /api/runs/wait accept thread_id in the request body but performed no owner authorization, letting any authenticated user start runs on -- and read /wait checkpoint channel_values from -- another user's thread (cross-user IDOR, #3472). The @require_permission(owner_check=True) decorator resolves ownership from the thread_id path param, so it cannot cover these body-param endpoints. Enforce ownership inside start_run() before create_or_reject via ThreadMetaStore.check_access: missing rows (auto-created temp threads) and NULL-owner rows stay accessible, while a thread owned by another user returns 404 (matching thread_runs.py). The internal system role (IM channels acting for platform users) is exempt. Closes #3472	2026-06-10 23:03:39 +08:00
Xinmin Zeng	05ae4467ae	fix(docker): default Gateway to a single worker to prevent multi-worker breakage (#3475 ) The default `make up` started the Gateway with `--workers 4`, but run state (RunManager and the stream bridge) is held in-process and nginx uses no sticky sessions. With the default config, same-run requests scatter across workers that each keep their own run state, breaking run cancellation (409), SSE reconnect (hangs on heartbeats), multitask de-duplication, and IM channels (duplicate replies). The shared cross-worker stream bridge does not exist yet. Default GATEWAY_WORKERS to 1 so the out-of-the-box deployment is correct, document the single-worker boundary in the README, and add a regression test pinning the default while keeping it overridable. This is a stop-gap, not a multi-worker implementation; the full fix (shared run state + stream bridge) is tracked in #3191. Refs #3239, #3260	2026-06-10 21:36:25 +08:00
DanielWalnut	2b795265e7	fix: align auth-disabled mode and mock history loading (#3471 ) * fix: align auth-disabled mode and mock history loading * fix: address auth-disabled review feedback * test: cover auth-disabled backend contract * style: format frontend tests * fix: address follow-up review comments	2026-06-10 16:11:00 +08:00
Nan Gao	a57d05fe0a	fix runtime journal run lifecycle events (#3470 )	2026-06-10 08:33:29 +08:00
Lucy Shen	ae9e8bc0bf	fix(sandbox): make missing sandbox.mounts host_path a loud ERROR (#3244 ) (#3250 ) In Docker production deployments, LocalSandboxProvider runs inside the deer-flow-gateway container, so any `sandbox.mounts[].host_path` from config.yaml is resolved against the gateway container's filesystem — not the host machine. When the path isn't also bind-mounted into the gateway service, the mount was silently dropped with only a WARNING log, leaving agents reading an empty directory in production while the same config worked under `make dev`. Escalate the missing-host_path branch to logger.error with explicit guidance about Docker bind mounts and docker-compose, so the failure is hard to miss in default log configurations. Skip behaviour is preserved to avoid breaking existing deployments. Also clarify the misleading `VolumeMountConfig.host_path` field description so it documents reality for both providers: - LocalSandboxProvider checks host_path from inside the gateway process (host in `make dev`, container in `make up`). - AioSandboxProvider (DooD) passes host_path straight to `docker -v` for the sandbox container, where the host Docker daemon resolves it from the host machine's perspective. config.example.yaml's `sandbox.mounts` comment gets a Note: block pointing operators at the docker-compose bind-mount requirement so the Docker-mode gotcha is discoverable from the canonical template. Adds a regression test that: - confirms missing host_path is still skipped (no behaviour break); - asserts an ERROR record is emitted referencing the offending paths; - asserts the message contains actionable Docker/gateway/docker-compose keywords so future refactors can't quietly downgrade it. Refs: https://github.com/bytedance/deer-flow/issues/3244	2026-06-09 23:16:14 +08:00
DanielWalnut	16391e35ab	fix(skills): harden slash skill activation across chat channels (#3466 ) * support slash skill activation * format slash skill activation * Preserve slash skill activation with uploads * Address slash skill review feedback * Address slash skill follow-up review * Fix lazy slash skill storage resolution * Keep slash skill activation out of system prompt * Address slash skill review issues * fix: harden slash skill command handling * feat(frontend): add slash skill autocomplete * fix: address slash skill review feedback * fix: preserve slash skill text for IM uploads v2.0-m1-rc3	2026-06-09 23:07:17 +08:00
tanghang97	18bbb82f07	Fix 'make dev' failure in Windows environment (#3236 ) * fix: Solving the problem of "make dev" failing to start in Windows environment * fix: revert the change to the startup_config and fix the lint errors * fix: Address Copilot review feedback - Validate wait-for-port input and avoid PowerShell port interpolation - Require Python 3 in serve.sh launcher detection - Keep Windows event loop policy setup in sitecustomize only - Clarify sitecustomize process-wide backend behavior	2026-06-09 22:37:54 +08:00
ly-wang19	b62c5a7b5b	fix(agents): offload blocking filesystem IO in the custom-agent router off the event loop (#3457 ) * fix(agents): offload blocking filesystem IO in delete_agent off the event loop delete_agent is an async route handler but resolved the agent directory (Paths.base_dir -> Path.resolve), probed it (Path.exists), and removed it (shutil.rmtree) directly on the event loop, blocking it for the duration of every delete. Surfaced by 'make detect-blocking-io'. Move the resolve/exists/rmtree sequence into a sync helper run via asyncio.to_thread, mapping its outcome back to the existing 404/409/500 responses (behavior unchanged). Adds a tests/blocking_io/ regression anchor under the strict Blockbuster gate, mirroring test_skills_load.py (#1917). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(agents): offload blocking filesystem IO in create_agent_endpoint too Like delete_agent, the async create_agent_endpoint resolved and created the agent directory and wrote config.yaml + SOUL.md (with rmtree cleanup on failure) directly on the event loop. Move the whole create-or-409 sequence into a sync helper run via asyncio.to_thread; behavior is unchanged (201 / 409 / 500). Extends the blocking_io regression anchor to cover create as well as delete and renames it to test_agents_router.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-09 22:24:53 +08:00
Admire	5b81588b87	fix(frontend): fallback Streamdown clipboard copy (#3397 ) * fix(frontend): fallback streamdown clipboard copy * fix(frontend): address clipboard fallback review * fix(frontend): normalize clipboard fallback rejection * fix(frontend): harden clipboard fallback install * fix(frontend): clarify clipboard fallback errors * fix(frontend): cover clipboard fallback edge cases * fix(frontend): tighten clipboard fallback cleanup * fix(frontend): reduce clipboard fallback copy window * fix(frontend): guard clipboard item fallback install * fix(frontend): clean up clipboard fallback on selection errors * Address clipboard fallback review feedback * fix(frontend): guard clipboard fallback install during SSR	2026-06-09 22:09:13 +08:00
Nan Gao	63ce88f874	fix(replay-e2e): key fixtures by caller and conversation (#3453 ) * add caller identity in replay e2e * make format * fix(replay-e2e): stabilize title caller replay * fix(replay-e2e): use captured caller without run manager --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-09 21:58:31 +08:00
hataa	37337b77f9	feat(models): add StepFun reasoning model adapter (#3461 ) Add PatchedChatStepFun adapter for StepFun reasoning models (step-3.7-flash, step-3.5-flash). Captures reasoning from both streaming and non-streaming responses and replays it on historical assistant messages for multi-turn tool-call conversations. - New: PatchedChatStepFun adapter with streaming/non-streaming reasoning capture - Support both reasoning and reasoning_content field names - 17 unit tests covering all response paths - Updated: config.example.yaml with StepFun configuration example	2026-06-09 18:01:43 +08:00
ly-wang19	8db16bb3d8	fix(config): coerce null config.yaml list sections to empty list (#3434 ) Copying config.example.yaml to config.yaml and starting DeerFlow crashed with `pydantic ValidationError: models — Input should be a valid list [input_value=None]`, because the example ships every entry under `models:` commented out, so PyYAML parses the key as null. Reported in #1444. Add a field_validator(mode="before") on AppConfig that coerces null models/tools/tool_groups to [] (matching their default_factory=list), and emit an actionable warning from from_file when no models are configured (pointing to config.example.yaml / make setup). Adds regression tests. Closes #1444 Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-09 15:45:28 +08:00
AochenShen99	93e3281cbf	fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459 ) (#3460 ) * fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459) #3426 switched the dev gateway's --reload-exclude patterns to absolute paths. uvicorn only excludes an absolute path directly when it already exists as a directory; otherwise it globs the pattern, and Python 3.12's pathlib raises NotImplementedError("Non-relative patterns are unsupported") for an absolute glob pattern. serve.sh mkdir'd the .deer-flow excludes but not backend/sandbox, so `make dev` crashed on startup on a fresh checkout under Python 3.12 (#3454). docker/dev-entrypoint.sh had the same latent gap. Create backend/sandbox in both launchers so every absolute exclude stays on uvicorn's is_dir() short-circuit. Add a regression test that pins the uvicorn mechanism (crash on missing dir, safe once created) and enforces that every absolute --reload-exclude is mkdir'd before launch. Closes #3459 * test(dev): harden reload-exclude invariant parser against false pass/negatives The launcher invariant test parsed shell with a "mkdir -p" line filter and a substring membership check. Two latent gaps (sub-threshold for this fix, but this code guards a user-facing startup path, so close them): - A `\`-continued multi-line `mkdir` would drop arguments on continuation lines, silently weakening coverage. - Substring membership could false-pass when an exclude is a path-prefix of a different created dir (e.g. `/app/backend/sandbox` "found" inside `/app/backend/sandbox-other`). Fold line-continuations, drop comments, and shlex-tokenize each `mkdir` argument list into an exact set (quotes stripped, `$VAR` literal); assert exact set membership. Same shlex handling for `--reload-exclude` values. Verified the parser still flags the pre-fix missing `backend/sandbox` (RED preserved) and no longer false-passes on a path-prefix. * fix(dev): gitignore backend/sandbox runtime dir + pin mkdir-before-launch Address two review findings on the #3459 fix: - backend/sandbox was described as "gitignored runtime state" but no ignore rule actually matched it. Add an anchored `/sandbox/` to backend/.gitignore (anchored so it does NOT shadow the source package backend/packages/harness/deerflow/sandbox/) so sandbox artifacts created at runtime can't pollute the working tree or be committed by accident. New test asserts content under backend/sandbox is ignored, making the claim verifiable. - The launcher invariant test only proved the sandbox mkdir exists somewhere, not that it runs before uvicorn starts. Add an order test (sandbox mkdir line must precede the `uv run uvicorn` launch) so a future edit can't move the mkdir below the launch and silently reintroduce the crash. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * test(dev): fix reload-exclude parser to handle serve.sh's quoted flag bundle The previous autofix tokenized each whole line with shlex, but serve.sh packs every flag into a single double-quoted `GATEWAY_EXTRA_FLAGS="..."` assignment. shlex collapses that into one token, so no `--reload-exclude` flag is found and `test_launcher_precreates_every_absolute_reload_exclude[scripts/serve.sh]` failed CI with "expected at least one absolute reload-exclude". Parse `--reload-exclude` with a regex that matches a balanced single/double quoted group or a bare token, so the assignment's surrounding `"` is never swallowed into the value. This recovers all three serve.sh excludes (the prior regex also silently dropped the last `$BACKEND_RUNTIME_HOME` because the adjacent closing quote broke shlex) while still covering dev-entrypoint.sh and the space-separated `--reload-exclude <value>` form. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-09 15:29:40 +08:00
AochenShen99	0fb18e368c	refactor(lead-agent): make build_middlewares public to drop the last cross-module private import (#3458 ) `client.py` imported the private `_build_middlewares` from `agent.py` across a module boundary and called it as public API. Because the `_` name signals "module-private, no external callers", any future rename or signature change silently breaks the embedded `DeerFlowClient` path — and the test suite even monkeypatched `deerflow.client._build_middlewares`, baking the leak in. `DeerFlowClient` is a lead-agent variant that genuinely needs the lead agent's full middleware composition, so make the dependency honest: promote the helper to a documented public entry point `build_middlewares` and update every in-repo caller. Found during #3341 review; #3341 already removed one such leak (`_assemble_deferred` -> public `assemble_deferred_tools`) and left this one out of scope on purpose. - agent.py: rename def + both internal call sites; expand the docstring into a public-entry-point contract and document the previously-undocumented model_name / app_config / deferred_setup params - client.py: import + call site now use the public name (removes the last cross-module private import) - scripts/tool-error-degradation-detection.sh: update its import + call site - tests (5 files): update monkeypatch/patch targets and direct calls - docs (backend/CLAUDE.md, plan_mode_usage.md, middlewares.mdx): sync the live references that describe the symbol as current API Pure mechanical rename, no behavior change. Historical design docs (rfc, superpowers spec) intentionally keep the old name as point-in-time records. Closes #3431	2026-06-09 11:56:28 +08:00
Xinmin Zeng	90e23bfd09	fix(ci): consolidate PR/issue labeling and fix reviewing-job crash + label thrash (#3455 ) * fix(ci): consolidate PR/issue labeling into one triage.yml; fix reviewing crash & label thrash - Replace pr-labeler + pr-triage + issue-triage with a single triage.yml; drop actions/labeler. Its sync-labels removed labels outside its config (clobbered size/risk/needs-validation and could clobber maintainer labels). Area is now computed in-script and reconciled only within owned namespaces (area:/size//risk:/needs-validation); first-time/reviewing are add-only. - reviewing: gate on author_association in {OWNER,MEMBER,COLLABORATOR} + user.type==='User' instead of getCollaboratorPermissionLevel, which 404'd on bot reviewers ('Copilot is not a user') and crashed the job. Excludes all review bots with no denylist and no API call. - Read live state (listFiles + listLabelsOnIssue) not the stale event payload, so rapid synchronize events converge instead of thrashing. Size churn excludes lockfiles/snapshots. * fix(ci): read labels live via paginate in reviewing & issue-triage jobs Address review feedback on #3455: - reviewing: listLabelsOnIssue now paginates (per_page:100) instead of the default 30, matching pr-labels, so a 'reviewing' label is never missed on PRs with many labels. - issue-triage: read live labels via the API instead of the event payload, consistent with the live-state reads documented in the header.	2026-06-09 11:14:19 +08:00
Ryker_Feng	f92a26d56f	fix(web_fetch): support proxy for Jina reader in restricted networks (#3418 ) (#3430 ) * fix(web_fetch): support proxy for Jina reader in restricted networks The web_fetch tool built a bare httpx.AsyncClient() with no proxy awareness, so users behind a corporate proxy / in Docker / WSL could not reach https://r.jina.ai and web_fetch timed out. - Add optional `proxy` / `trust_env` params to JinaClient.crawl and wire them from the `web_fetch` tool config (with type coercion for YAML string values). - Pass internal service hostnames through NO_PROXY in both compose files so proxy env inherited via env_file does not break in-cluster calls (gateway/provisioner/etc). - Load proxy vars from .env into the shell in scripts/docker.sh so the NO_PROXY interpolation can merge user-provided values on `make` path. - Document proxy/trust_env options in config.example.yaml. Closes #3418 * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-08 23:25:29 +08:00
AochenShen99	3b6dd0a4e3	feat(subagents): extend deferred MCP tool loading to subagents (#3432 ) * feat(subagents): extend deferred MCP tool loading to subagents (#3341) Subagents now reuse the lead agent's deferred-tool path: when tool_search.enabled, MCP tool schemas are withheld from the model and surfaced by name in <available-deferred-tools>, fetched on demand via the generated tool_search helper. DeferredToolFilterMiddleware deterministically rewrites request.tools to hide the deferred schemas (the prompt section is discovery only, not enforcement). Consolidates the assembly into deerflow.tools.builtins.tool_search, now the single home for both assemble_deferred_tools (centralized fail-closed guard, replacing the lead-only private _assemble_deferred) and the relocated get_deferred_tools_prompt_section. Shared by every build path: lead agent, embedded client, and subagent executor. tool_search is appended after the subagent's name-level tool policy and is treated as infrastructure: its catalog is built from the already policy-filtered list, so it can never surface a tool the policy denied. Follow-up to #3370. Fixes #3341. * test(subagents): assert the real middleware builder emits a working deferred filter (#3341) The existing recipe test hand-constructs DeferredToolFilterMiddleware, so it cannot catch a regression in how build_subagent_runtime_middlewares (the call executor._create_agent actually makes) wires the deferred setup into the filter. Add a test that sources the filter from the real builder given a real setup and runs it through a graph: a wrong catalog hash would silently stop promotion, a dropped filter would stop hiding — both now caught. Running the full real middleware stack is intentionally avoided (the other runtime middlewares need sandbox/thread infra to execute, which would make the test flaky); their attachment + ordering before Safety stays locked in test_tool_error_handling_middleware.py. * test(subagents): keep executor tests config-free in CI * chore: trigger ci * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> v2.0-m1-rc2	2026-06-08 23:17:22 +08:00
Xun	3c2b60aaae	fix(threads): assign new checkpoint ID in update_thread_state (#2391 ) * async * add test * test(threads): assert aput preserves endpoint-assigned checkpoint id Confirm the update_thread_state fix is real, not a no-op: all supported savers (InMemorySaver, AsyncSqliteSaver, AsyncPostgresSaver) persist and echo checkpoint["id"] verbatim rather than minting their own. Add assertions that each POST /state response's checkpoint_id round-tripped into persisted history and kept its uuid6 time-ordering through aput, and document the verified contract in the router. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 23:12:25 +08:00
zgenu	67ad6e232f	fix(dev): exclude runtime state from gateway reload (#3426 )	2026-06-08 22:54:23 +08:00
DanielWalnut	cd5bedaa74	feat: MiniMax provider for image/video/podcast skills + new music-generation skill (#3437 ) * docs(spec): MiniMax integration for generation skills + new music skill Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(plan): MiniMax generation providers implementation plan Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(skills): add importlib loader + FakeResp for skill tests * test(skills): register loaded module in sys.modules; raise requests.HTTPError in FakeResp * feat(image-generation): add MiniMax provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(image-generation): guard unknown provider, derive ref MIME, strengthen tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(video-generation): add MiniMax provider with async poll/download Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(video-generation): surface base_resp errors while polling; add timeout test * feat(podcast-generation): add MiniMax t2a_v2 provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(podcast-generation): restore TTS credential guard; add volcengine + voice tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(music-generation): new MiniMax music skill via skill-creator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(music-generation): treat empty lyrics as absent; test no-audio-data path * refactor(skills): add request timeouts to MiniMax network calls Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding 'Explicit returns mixed with implicit (fall through) returns' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix(models): strip inconsistent user-message names for MiniMax chat DeerFlow middlewares tag user messages with provenance names (user-input, summary, loop_warning); langchain serializes them into the OpenAI-compatible payload and MiniMax rejects mismatched user-message names with "user name must be consistent (2013)". PatchedChatMiniMax now drops the per-message name from user-role messages. Point the config.example MiniMax models at PatchedChatMiniMax so they also get reasoning_content mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(image-generation): MiniMax sends JSON prompt field, guard 1500-char limit MiniMax image-01 takes one text string capped at 1500 chars, but the skill was sending the whole structured JSON. The MiniMax provider now extracts the JSON `prompt` field (relying on prompt_optimizer to expand it) and fails fast with a clear error before calling the API when that field exceeds 1500 chars. Authoring stays provider-agnostic; Gemini still receives the full JSON. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(podcast-generation): per-provider TTS concurrency and retry/backoff Each TTS provider owns its concurrency internally — MiniMax runs single-threaded to reduce rate-limit failures, Volcengine keeps 4 workers — with automatic retry and backoff on transient HTTP and base_resp errors. No caller-facing concurrency knob. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(skills): address Copilot review comments on generation skills - video: add raise_for_status + timeout to the Gemini download/POST/poll calls so non-2xx responses surface as clear HTTP errors instead of JSON/KeyError or hangs - video: check the task Fail status before the generic base_resp check so the failure keeps its task_id context - video/image: create the output file parent directory before writing (matching music-generation) so nested output paths do not raise FileNotFoundError - music: require a non-empty prompt and fail fast with ValueError instead of sending an empty prompt to the API Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(scripts): reclaim dev ports across worktrees in make stop/dev All deer-flow worktrees (main checkout + linked worktrees) hardcode the same dev ports (8001/3000/2026), so a service started from any worktree must be reclaimable from another. stop_all now resolves the set of worktree roots (DEERFLOW_ROOTS) and treats a process as deer-flow-owned when its open files live under any of them. It also force-kills survivors on 2026 alongside 8001/3000, fixing `make dev` aborting on the nginx port preflight when a prior nginx lingered on 2026. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(view-image): hide the injected image-context message from the UI ViewImageMiddleware injects a HumanMessage (text + base64 images) so the vision model can see viewed images, but it was the only internal injector that set neither hide_from_ui nor a hidden name, so it leaked into the chat UI (and IM channels) as a user bubble reading "Here are the images you've viewed:". Mark it with additional_kwargs={"hide_from_ui": True}, matching todo/dynamic_context injections, which the frontend isHiddenFromUIMessage and the channel sender already honor. The model still receives the full content. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(minimax): mark M2.7 models as text-only (no vision) MiniMax M2.7 / M2.7-highspeed do not support vision; only M3 does. The provider config asserted vision support for M2.7 in four places. - config.example.yaml: 4 M2.7 entries -> supports_vision: false - backend/docs/CONFIGURATION.md: M2.7 + highspeed -> supports_vision: false - wizard: add LLMProvider.model_vision_overrides + extra_config_for() so selecting an M2.7 model writes supports_vision: false while M3 (default) keeps vision; wire it through setup_wizard.py - tests: M2.7-highspeed fixture -> supports_vision=False; add test_minimax_vision_is_per_model Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-06-08 22:04:38 +08:00
DanielWalnut	1651d1f1f5	fix(frontend): restructure Memory settings toolbar into two rows (#3433 ) The search input, filter tabs, and four action buttons were crammed into a single horizontal row, which squeezed the search box into an unusable sliver and truncated the "Summaries" filter tab to "Summarie". Split the toolbar into two rows: search + filter tabs on the first, actions on the second. The search input now keeps a usable min width, filter tabs use whitespace-nowrap so they never truncate, and the destructive "Clear all memory" button is pushed to the far right (ml-auto) to separate it from the constructive actions. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 19:17:14 +08:00
Xinmin Zeng	799bef6d9d	fix(replay-e2e): match by conversation, not the living system prompt (#3436 ) * fix(replay-e2e): match by conversation, not the living system prompt The model-replay match key hashed the full input including the lead-agent system prompt. That prompt is edited frequently (e.g. #3195 added a "File Editing Workflow" section), so the committed fixture went stale the moment the prompt changed on main — turning the Layer-2 render gate RED on every unrelated PR (#3430, #3432, ...). This was a self-inflicted false positive. Root-cause fix: - replay_provider._canonical_messages now EXCLUDES the system message from the hash. The conversation (human/ai/tool) is the stable contract that identifies a recorded turn; the system prompt is an internal detail not part of the front-back contract under test. (Mirrors how open-design keys its mock picker on the user prompt, not the system internals.) Proven robust: injecting a prompt edit no longer causes a replay miss. - Layer-1 golden was BLIND to replay misses: the gateway swallows a miss into an assistant error message, so the shape-only golden stayed green on a stale fixture. It now inspects replay_provider.replay_misses() and fails loud. (Layer-2 already fails on a miss.) - Re-recorded write_read_file.ultra fixture + regenerated golden under the new conversation-only hash. - Layer-2 render spec: assert the in-graph auto-title (deterministic); the follow-up suggestion is fired async and depends on a clean JSON model output, so assert it only when the fixture captured one — never gate on its absence (recording flakiness must not block CI). - docs: REPLAY_E2E.md updated. Verified: Layer-1 golden green (no miss), Layer-2 both specs green, CI=true make test 4033 passed / 0 failed, frontend pnpm check clean. * test(replay-e2e): restore suggestions coverage with a reliable capture Addresses review feedback (the suggestion path was dropped from Layer-2): - record spec now waits for the `/suggestions` response before checking capture stability, so the recorded fixture reliably includes the frontend-fired suggestions turn (previously the stability window could return before suggestions fired, yielding a fixture without it). - Re-recorded write_read_file.ultra: 5 turns (write_file, auto-title, read_file, answer, suggestions). Golden unchanged — suggestions is a separate /suggestions call, not part of the /runs/stream SSE sequence. - Layer-2 spec: restore the hard `EXPECTED_SUGGESTION` assertion. With the record spec now waiting for /suggestions, a fixture missing the suggestion turn means a broken recording and must fail loud, not pass silently. Verified: Layer-1 golden green (no miss), Layer-2 both specs green (auto-title + suggestion render), frontend pnpm check clean. * ci: re-trigger (flaky Docker Hub image pull in sandbox e2e, unrelated) backend-unit-tests failed only in test_sandbox_orphan_reconciliation_e2e.py with 'docker pull busybox:latest ... context deadline exceeded' — a CI-runner network flake reaching Docker Hub, not related to this docs/tests-only change. Empty commit to re-run CI. --------- Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>	2026-06-08 17:32:41 +08:00
DanielWalnut	3b105d1e5f	fix(suggestions): strip inline <think> reasoning before parsing follow-up questions (#3435 ) Reasoning models such as MiniMax-M3 inline their chain-of-thought into the message content as <think>...</think> (reasoning_split defaults to false) instead of a separate reasoning_content field. The follow-up-suggestions endpoint extracted the JSON array via find('[') / rfind(']'), which silently broke whenever the reasoning text contained '[' or ']' — or when long thinking hit max_tokens and truncated before the array was emitted — returning empty suggestions. - Add _strip_think_blocks() and apply it before JSON extraction; it removes complete <think>...</think> blocks (case-insensitive) and drops an unclosed <think> left by max_tokens truncation. - Document the MiniMax thinking toggle in config.example.yaml (when_thinking_enabled: adaptive / when_thinking_disabled: disabled) so thinking_enabled=False actually disables reasoning on M3; note that M2.x models always think and rely on the defensive strip above. - Tests cover complete/unclosed think blocks, brackets-inside-think, think + code-fence, and an end-to-end suggestions case reproducing the empty-result bug. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 15:48:00 +08:00
Xinmin Zeng	88759015e4	test(e2e): deterministic record/replay front-back contract verification (#3365 ) * test(e2e): record/replay front-back contract verification Guards the front-back contract with a deterministic, key-free record/replay harness (mirrors open-design's golden-trace approach): - ReplayChatModel (tests/replay_provider.py): replays recorded LLM turns by a normalized hash of the model input. Strips <system-reminder>/date/uuid/tmp-path so one fixture replays across days and from both the browser and direct-POST paths; a miss raises loudly (no silent divergence). - Recording is record-through-browser (scripts/record_gateway.py + build_fixture_from_jsonl.py + frontend/tests/e2e-record): a real run is driven through the real frontend so captured inputs match exactly what the browser sends; fixtures contain no API key. - Layer 1 — backend golden (tests/test_replay_golden.py): replay through the real gateway, assert the SSE event sequence == committed golden. - Layer 2 — full-stack render (frontend/tests/e2e-real-backend): real Next.js + real gateway (replay model) + Chromium; assert the replayed auto-title and follow-up suggestions render. DOM assertions are the gate; visual regression is a local dev gate (CI uploads the render as an artifact). - CI (.github/workflows/replay-e2e.yml): both layers, triggered on EITHER side of the contract (frontend/** or backend gateway/harness/fixtures). * test(e2e): multi-run render-order cross-stack scenario (#3352) Guards the dangerous front-back class where a backend ordering change silently breaks a frontend assumption while both sides' unit tests stay green. Reproduces issue #3352: backend list_by_thread returns runs newest-first (#2932) and the frontend prepended per-run pages, inverting chronological order once the checkpoint no longer held the older messages. - tests/seed_runs_router.py: test-only seeder, mounted on the replay gateway only when DEERFLOW_ENABLE_TEST_SEED=1 (never in the production app). Seeds a thread with >=2 runs + per-run message events and no checkpoint -- the #3352 precondition -- so the frontend per-run reload path is the sole source of truth and the prepend inversion is observable. - frontend/tests/e2e-real-backend/multi-run-order.spec.ts: drives the real frontend against the real gateway, asserts the first run renders above the second. Reverting the #3354 fix turns it red. - replay-e2e.yml: trigger on the new replay test-infra paths. - docs: REPLAY_E2E.md cross-stack scenario section. * test(e2e): address Copilot review on the replay harness - Fix stale recorder references (scripts/record_traces.py -> scripts/record_gateway.py + scripts/build_fixture_from_jsonl.py) in replay_provider.py, test_replay_golden.py, _replay_fixture.py. - MODE_CONTEXT['ultra']: thinking_enabled False -> True, mirroring the frontend's `context.mode !== 'flash'` (hooks.ts). It did not affect the hashed input (Layer 1 golden still green), but the table now matches the real frontend context it claims to mirror. - replay_provider.py docstring: stop claiming memory is recorded-enabled; the replay config disables memory/summarization for determinism (title stays, as an in-graph deterministic call). - record_gateway.py / run_replay_gateway.py: override DEER_FLOW_HOME instead of setdefault, so an outer value can't leak into the hermetic harness. - record_gateway.py: clear error when DEERFLOW_RECORD_OUT is unset (was a bare KeyError). - playwright.record.config.ts: forward OPENAI_/DEERFLOW_RECORD_OUT only when set, so the gateway raises a clear 'missing env' error instead of getting ''. test(e2e): address Copilot review round 2 - seed_runs_router.py: constrain SeedMessage.role to Literal['human','ai'] so a bad value is a clean 422 at the boundary instead of a 500 (KeyError on _EVENT_TYPE). - record-write-read-file.spec.ts: waitForCaptureStable now throws on timeout instead of returning the last count, so a truncated/partial recording can't pass silently. - real-backend-render.spec.ts: guard the suggestions JSON.parse; a bracket-prefixed non-JSON turn falls back to '' so the existing not.toBe('') assertion fails clearly instead of a generic parse throw.	2026-06-08 12:35:03 +08:00
Huixin615	64d923b0fd	fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes (#3417 ) * fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes ToolOutputBudgetMiddleware persisted oversized tool results to the host filesystem and returned a /mnt/user-data/outputs virtual path. For sandboxes that do not use thread-data mounts (e.g. remote AIO sandbox), that virtual path does not exist inside the sandbox, so the model's read_file tool could not read it back and reported 'file not found'. Branch on SandboxProvider.uses_thread_data_mounts: - Mounted sandboxes (local Docker, AIO + LocalContainerBackend) keep the original host-disk path; the host outputs dir is bind-mounted to the same virtual path inside the sandbox, so behavior is unchanged. - Non-mounted (remote) sandboxes externalize into the sandbox itself via execute_command('mkdir -p ...') + write_file + 'test -s' validation. The validation step is required because AIO sandbox execute_command returns 'Error: ...' as a string on failure instead of raising, so a silent mkdir failure would otherwise leak through. Any failure (rejected subdir, mkdir/write/validate error) falls back to the existing inline head+tail truncation, so an unreadable path is never returned to the model. The sandbox resolver reads the sandbox_id that SandboxMiddleware already writes into runtime.state['sandbox']; it never calls provider.acquire(), keeping the tool-call hot path free of blocking I/O. Tools that do not use a sandbox (web_search, MCP, ...) resolve to None and fall through to inline truncation, which is the safe behavior for them. Fixes #3416 * fix(middleware): address Copilot review feedback on sandbox externalization - Make get_sandbox_provider() lookup best-effort in _budget_content: only query when outputs_path or sandbox is available, and fall back to inline truncation if provider initialization raises rather than propagating the error. A resolved sandbox instance is sufficient on its own to take the non-mounted externalization branch. - Strict-match the sandbox post-write validation echo (check.strip() == 'OK') to avoid false positives if execute_command ever surfaces unrelated stdout/stderr containing 'OK' as a substring. Refs: #3417 * test: fix flaky tests relying on /nonexistent/... path under container root Two tests in this module (test_returns_none_on_invalid_path and test_fallback_when_disk_write_fails) used paths like '/nonexistent/impossible/path' to trigger _externalize's OSError fallback. These paths are creatable when the test process runs as root inside the CI container: os.makedirs(..., exist_ok=True) successfully creates the entire chain under /, so the OSError branch is never hit and the tests fail. Reproducible on main independently of this PR. Switch to '/dev/null/cannot-mkdir-here'. /dev/null is a character device on both Linux and macOS, so os.makedirs always fails with NotADirectoryError regardless of privileges, reliably exercising the OSError fallback. * fix(tool-output-budget): only consult sandbox provider when a sandbox is resolved The previous revision called get_sandbox_provider() whenever externalization was triggered, including on the legacy host-disk path. Environments without a configured sandbox -- in particular CI runners without a config.yaml -- would raise FileNotFoundError there, get caught, and silently fall back to inline truncation. That defeated the host-disk externalization path that predates this PR and was the root cause of the regressing legacy tests. Restructure the branching so the provider is only consulted when a sandbox has actually been resolved for the current tool call: - sandbox resolved + provider.uses_thread_data_mounts: host-disk write (bind-mounted into the sandbox, equivalent to a sandbox-side write). - sandbox resolved + non-mounted provider: sandbox write (#3416). - no sandbox + outputs_path: host-disk write (legacy / non-sandbox tools, no provider call at all). - otherwise: inline fallback. No test changes; the legacy externalization tests are provider-agnostic by construction and now pass without monkeypatching. Refs: #3416 * test(tool-output-budget): assert legacy path does not call sandbox provider Lock in the contract introduced by `d6e2d25b`: when no sandbox is resolved for a tool call, _budget_content must externalize to the host outputs directory without consulting get_sandbox_provider(). Regressing this would re-break legacy / non-sandbox tools in environments without a configured sandbox (e.g. CI without config.yaml), which is the failure mode #3416's fix avoids. The test injects a get_sandbox_provider that raises on call, so any future refactor that moves the provider lookup out of the sandbox-only branch will fail loudly. Refs: #3416	2026-06-08 12:24:48 +08:00

1 2 3 4 5 ...

2270 Commits