Commit Graph

665 Commits

Author SHA1 Message Date
taohe e0cb0df5da Persist disconnected IM channel state 2026-06-11 22:46:36 +08:00
taohe b17bc5efc9 Ensure runtime IM channels are ready after restart 2026-06-11 22:19:03 +08:00
taohe 7ce047b723 Merge remote-tracking branch 'origin/main' into codex/im-channel-connections 2026-06-11 21:13:25 +08:00
taohe 0268ebbe6a Use sha256 user buckets with legacy migration 2026-06-11 21:08:08 +08:00
taohe 0798489734 Address channel connection review comments 2026-06-11 20:54:57 +08:00
taohe 5e3cc83d17 Isolate channel runtime config tests 2026-06-11 18:15:53 +08:00
taohe 6956b247fd Stabilize backend tests without local config 2026-06-11 18:07:06 +08:00
DanielWalnut f401e7baa6 [codex] Fix stale AIO sandbox cache reuse (#3494)
* Fix stale AIO sandbox cache reuse

* Address AIO sandbox review feedback

* Distinguish sandbox health check failures

* Keep local discovery recoverable when the runtime check fails

LocalContainerBackend.discover() shares _is_container_running, which now
raises on transient daemon errors instead of returning False. Discovery has
no exception handling in _discover_or_create_with_lock(_async), so a brief
Docker hiccup turned a recoverable "could not verify, create instead" into a
hard acquire failure. Catch the check failure inside discover() and return
None so an unverifiable container is simply not adopted, restoring the
pre-change fall-through while keeping raise-on-unknown semantics protecting
the destroy path.

Reported by fancy-agent on PR #3494.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Narrow the not-found match in container inspect error handling

A bare "not found" substring also matches transient failures like "command
not found" or "context not found", which would misclassify a check error as
"container definitely gone" and bypass the raise-on-unknown contract. Keep
Docker's specific "No such object"/"No such container" phrases, and only
trust a generic "not found" (Apple Container) when the message names the
inspected container or refers to a container/object.

Reported by WillemJiang on PR #3494.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 17:53:37 +08:00
taohe d1768606c0 Merge remote-tracking branch 'origin/main' into codex/im-channel-connections
# Conflicts:
#	backend/app/gateway/services.py
#	frontend/src/app/workspace/chats/page.tsx
2026-06-11 17:51:16 +08:00
Huixin615 919d8bc279 fix(sandbox): persist lazily-acquired sandbox state via Command (#3464)
* fix(sandbox): persist lazily-acquired sandbox state via Command

ensure_sandbox_initialized mutates runtime.state in place, which is local
to the current tool invocation and is not picked up by LangGraph's channel
reducer. Subsequent graph steps and downstream consumers (such as
ToolOutputBudgetMiddleware and the sub-agent task_tool) therefore cannot
observe the sandbox id from state.

Wrap tool calls in SandboxMiddleware (wrap_tool_call / awrap_tool_call) to
detect fresh lazy initialization by diffing runtime.state before and after
the handler, and emit a proper state update via Command(update=...):

- ToolMessage results are wrapped into Command(update={sandbox, messages})
- Command results with a dict update are merged on the sandbox key while
  preserving messages / goto / graph / resume
- Command results with non-dict updates are left untouched to avoid silent
  data loss on unknown update shapes

Tests:
- 7 new unit tests cover lazy-init emit, passthrough, dict-update merge,
  non-dict-update passthrough (sync and async)
- Refresh replay golden write_read_file.ultra.events.json: SSE 'values'
  events now correctly carry the 'sandbox' key in their keys list, which
  is the direct evidence that the fix is effective

Closes #3463

* refactor(sandbox): use dataclasses.replace to preserve Command fields

Address Copilot review on #3464: replace manual field-copy with
dataclasses.replace so any current or future Command fields are
preserved automatically when merging sandbox_update.

Also add a regression test that constructs a Command with non-None
graph/goto/resume to lock this behavior in.
2026-06-11 17:50:36 +08:00
taohe ddd1c5e42f Let setup wizard enable IM channels 2026-06-11 17:34:22 +08:00
taohe a270e8b310 Ignore Feishu non-content message events 2026-06-11 17:22:16 +08:00
taohe f330ddce01 Ignore Feishu message read events 2026-06-11 17:15:44 +08:00
taohe b26b30ac3d Reflect IM channel runtime health 2026-06-11 17:11:55 +08:00
taohe dae7c7870e Prefill IM channel runtime config 2026-06-11 17:02:02 +08:00
taohe 4f56437030 Show IM channel source on threads 2026-06-11 16:51:04 +08:00
taohe 42fd0cc22f Use default user for auth-disabled local mode 2026-06-11 16:33:37 +08:00
taohe a4202028d9 Route no-auth channel sessions to local user 2026-06-11 16:22:14 +08:00
taohe 4a0278420f Allow disconnecting runtime IM channels 2026-06-11 16:10:02 +08:00
taohe ade4a55cfe Persist IM runtime config locally 2026-06-11 15:58:40 +08:00
taohe 09872af36c Make channel threads visible to connection owners 2026-06-11 15:40:49 +08:00
taohe 9d51e38641 Keep configured IM channels editable 2026-06-11 14:40:37 +08:00
taohe c4368c9018 Add runtime setup for enabled IM channels 2026-06-11 12:10:16 +08:00
taohe 89da9b70db Format additional channel connection tests 2026-06-11 11:20:39 +08:00
taohe a52deada8b Support all integrated IM channel connections 2026-06-11 11:19:27 +08:00
taohe b7097baaec Address IM channel review comments 2026-06-11 10:33:44 +08:00
taohe 87200ff920 Address Copilot IM channel feedback 2026-06-11 08:26:48 +08:00
hataa b3c2cc42cf fix(agents): require config.yaml in resolve_agent_dir to skip memory-only directories (#3390) (#3481)
When memory is enabled, the first conversation with a legacy shared agent
creates a per-user agent directory containing only memory.json (no
config.yaml). On the second turn, resolve_agent_dir() returned this
incomplete directory, causing load_agent_config() to fail with
"Agent config not found".

Require config.yaml to exist alongside the directory for both the
per-user and legacy paths, so that memory-only directories fall
through correctly. This aligns resolve_agent_dir with the existing
config.yaml check in list_custom_agents.

Refs: https://github.com/bytedance/deer-flow/issues/3390
2026-06-10 23:57:17 +08:00
Ryker_Feng 167ef4512f feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429) (#3465)
* feat(memory): add memory.token_counting config to avoid tiktoken network dependency (#3429)

Add a `memory.token_counting` option (`tiktoken` | `char`) so deployments in
network-restricted environments can opt out of tiktoken entirely. In `char`
mode the memory-injection budget uses a network-free character-based estimate
and never triggers the BPE download from openaipublic.blob.core.windows.net,
which could otherwise block for tens of minutes (see #3402).

Also harden the default `tiktoken` path:
- cache an in-flight LOADING sentinel so concurrent callers fall back
  immediately instead of spawning more blocking get_encoding threads when the
  first load is still running (e.g. under the 5s startup warm-up timeout);
- cache failures with a timestamp and retry after a cooldown so a transient
  network outage self-heals back to accurate counting without a restart;
- skip startup warm-up entirely in char mode.

The new config is surfaced via the memory config API and config.example.yaml
(config_version bumped). Default remains `tiktoken`, so existing deployments
are unaffected.

* fix(memory): use CJK-aware char token estimate and address review feedback

- Replace the flat len(text)//4 fallback with a CJK-aware estimate so
  Chinese/Japanese/Korean memory content does not over-fill the injection budget
- Document the internal tiktoken retry cooldown and char-mode escape hatch
- Sync CLAUDE.md / config.example.yaml / MEMORY_IMPROVEMENTS.md wording
- Fix MemoryConfigResponse mocks/assertions and add CJK estimate tests
2026-06-10 23:26:15 +08:00
Xinmin Zeng ba9cc5e972 fix(gateway): enforce thread ownership on stateless run endpoints (#3473)
POST /api/runs/stream and /api/runs/wait accept thread_id in the request
body but performed no owner authorization, letting any authenticated user
start runs on -- and read /wait checkpoint channel_values from -- another
user's thread (cross-user IDOR, #3472).

The @require_permission(owner_check=True) decorator resolves ownership from
the thread_id *path* param, so it cannot cover these body-param endpoints.
Enforce ownership inside start_run() before create_or_reject via
ThreadMetaStore.check_access: missing rows (auto-created temp threads) and
NULL-owner rows stay accessible, while a thread owned by another user
returns 404 (matching thread_runs.py). The internal system role (IM
channels acting for platform users) is exempt.

Closes #3472
2026-06-10 23:03:39 +08:00
taohe 6a94b58ad1 Fix safe user id digest algorithm 2026-06-10 22:53:07 +08:00
taohe d06643d8a2 Align IM connections with local channels 2026-06-10 22:16:47 +08:00
taohe 92c185b90d Support local IM channel connections 2026-06-10 21:59:33 +08:00
taohe 9effa7be6d Merge remote-tracking branch 'origin/main' into codex/im-channel-connections 2026-06-10 21:42:12 +08:00
Xinmin Zeng 05ae4467ae fix(docker): default Gateway to a single worker to prevent multi-worker breakage (#3475)
The default `make up` started the Gateway with `--workers 4`, but run state
(RunManager and the stream bridge) is held in-process and nginx uses no sticky
sessions. With the default config, same-run requests scatter across workers that
each keep their own run state, breaking run cancellation (409), SSE reconnect
(hangs on heartbeats), multitask de-duplication, and IM channels (duplicate
replies). The shared cross-worker stream bridge does not exist yet.

Default GATEWAY_WORKERS to 1 so the out-of-the-box deployment is correct,
document the single-worker boundary in the README, and add a regression test
pinning the default while keeping it overridable. This is a stop-gap, not a
multi-worker implementation; the full fix (shared run state + stream bridge) is
tracked in #3191.

Refs #3239, #3260
2026-06-10 21:36:25 +08:00
taohe ec5ed185cd Merge remote-tracking branch 'origin/main' into codex/im-channel-connections
# Conflicts:
#	backend/app/channels/discord.py
#	backend/app/channels/manager.py
#	backend/app/channels/slack.py
#	backend/app/channels/telegram.py
2026-06-10 21:13:02 +08:00
taohe dbe3a3bb0d Add user-owned IM channel connections 2026-06-10 21:07:44 +08:00
DanielWalnut 2b795265e7 fix: align auth-disabled mode and mock history loading (#3471)
* fix: align auth-disabled mode and mock history loading

* fix: address auth-disabled review feedback

* test: cover auth-disabled backend contract

* style: format frontend tests

* fix: address follow-up review comments
2026-06-10 16:11:00 +08:00
Nan Gao a57d05fe0a fix runtime journal run lifecycle events (#3470) 2026-06-10 08:33:29 +08:00
Lucy Shen ae9e8bc0bf fix(sandbox): make missing sandbox.mounts host_path a loud ERROR (#3244) (#3250)
In Docker production deployments, LocalSandboxProvider runs inside the
deer-flow-gateway container, so any `sandbox.mounts[].host_path` from
config.yaml is resolved against the gateway container's filesystem — not
the host machine. When the path isn't also bind-mounted into the gateway
service, the mount was silently dropped with only a WARNING log, leaving
agents reading an empty directory in production while the same config
worked under `make dev`.

Escalate the missing-host_path branch to logger.error with explicit
guidance about Docker bind mounts and docker-compose, so the failure is
hard to miss in default log configurations. Skip behaviour is preserved
to avoid breaking existing deployments.

Also clarify the misleading `VolumeMountConfig.host_path` field
description so it documents reality for both providers:

  - LocalSandboxProvider checks host_path from inside the gateway process
    (host in `make dev`, container in `make up`).
  - AioSandboxProvider (DooD) passes host_path straight to `docker -v`
    for the sandbox container, where the host Docker daemon resolves it
    from the host machine's perspective.

config.example.yaml's `sandbox.mounts` comment gets a Note: block
pointing operators at the docker-compose bind-mount requirement so the
Docker-mode gotcha is discoverable from the canonical template.

Adds a regression test that:
  - confirms missing host_path is still skipped (no behaviour break);
  - asserts an ERROR record is emitted referencing the offending paths;
  - asserts the message contains actionable Docker/gateway/docker-compose
    keywords so future refactors can't quietly downgrade it.

Refs: https://github.com/bytedance/deer-flow/issues/3244
2026-06-09 23:16:14 +08:00
DanielWalnut 16391e35ab fix(skills): harden slash skill activation across chat channels (#3466)
* support slash skill activation

* format slash skill activation

* Preserve slash skill activation with uploads

* Address slash skill review feedback

* Address slash skill follow-up review

* Fix lazy slash skill storage resolution

* Keep slash skill activation out of system prompt

* Address slash skill review issues

* fix: harden slash skill command handling

* feat(frontend): add slash skill autocomplete

* fix: address slash skill review feedback

* fix: preserve slash skill text for IM uploads
2026-06-09 23:07:17 +08:00
tanghang97 18bbb82f07 Fix 'make dev' failure in Windows environment (#3236)
* fix: Solving the problem of "make dev" failing to start in Windows environment

* fix: revert the change to the startup_config and fix the lint errors

* fix: Address Copilot review feedback

- Validate wait-for-port input and avoid PowerShell port interpolation
- Require Python 3 in serve.sh launcher detection
- Keep Windows event loop policy setup in sitecustomize only
- Clarify sitecustomize process-wide backend behavior
2026-06-09 22:37:54 +08:00
ly-wang19 b62c5a7b5b fix(agents): offload blocking filesystem IO in the custom-agent router off the event loop (#3457)
* fix(agents): offload blocking filesystem IO in delete_agent off the event loop

delete_agent is an async route handler but resolved the agent directory (Paths.base_dir -> Path.resolve), probed it (Path.exists), and removed it (shutil.rmtree) directly on the event loop, blocking it for the duration of every delete. Surfaced by 'make detect-blocking-io'.

Move the resolve/exists/rmtree sequence into a sync helper run via asyncio.to_thread, mapping its outcome back to the existing 404/409/500 responses (behavior unchanged). Adds a tests/blocking_io/ regression anchor under the strict Blockbuster gate, mirroring test_skills_load.py (#1917).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agents): offload blocking filesystem IO in create_agent_endpoint too

Like delete_agent, the async create_agent_endpoint resolved and created the agent directory and wrote config.yaml + SOUL.md (with rmtree cleanup on failure) directly on the event loop. Move the whole create-or-409 sequence into a sync helper run via asyncio.to_thread; behavior is unchanged (201 / 409 / 500). Extends the blocking_io regression anchor to cover create as well as delete and renames it to test_agents_router.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-09 22:24:53 +08:00
Nan Gao 63ce88f874 fix(replay-e2e): key fixtures by caller and conversation (#3453)
* add caller identity in replay e2e

* make format

* fix(replay-e2e): stabilize title caller replay

* fix(replay-e2e): use captured caller without run manager

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-09 21:58:31 +08:00
hataa 37337b77f9 feat(models): add StepFun reasoning model adapter (#3461)
Add PatchedChatStepFun adapter for StepFun reasoning models (step-3.7-flash,
step-3.5-flash). Captures reasoning from both streaming and non-streaming
responses and replays it on historical assistant messages for multi-turn
tool-call conversations.

- New: PatchedChatStepFun adapter with streaming/non-streaming reasoning capture
- Support both reasoning and reasoning_content field names
- 17 unit tests covering all response paths
- Updated: config.example.yaml with StepFun configuration example
2026-06-09 18:01:43 +08:00
ly-wang19 8db16bb3d8 fix(config): coerce null config.yaml list sections to empty list (#3434)
Copying config.example.yaml to config.yaml and starting DeerFlow crashed with `pydantic ValidationError: models — Input should be a valid list [input_value=None]`, because the example ships every entry under `models:` commented out, so PyYAML parses the key as null. Reported in #1444.

Add a field_validator(mode="before") on AppConfig that coerces null models/tools/tool_groups to [] (matching their default_factory=list), and emit an actionable warning from from_file when no models are configured (pointing to config.example.yaml / make setup). Adds regression tests.

Closes #1444

Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-09 15:45:28 +08:00
AochenShen99 93e3281cbf fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459) (#3460)
* fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459)

#3426 switched the dev gateway's --reload-exclude patterns to absolute
paths. uvicorn only excludes an absolute path directly when it already
exists as a directory; otherwise it globs the pattern, and Python 3.12's
pathlib raises NotImplementedError("Non-relative patterns are unsupported")
for an absolute glob pattern. serve.sh mkdir'd the .deer-flow excludes but
not backend/sandbox, so `make dev` crashed on startup on a fresh checkout
under Python 3.12 (#3454). docker/dev-entrypoint.sh had the same latent gap.

Create backend/sandbox in both launchers so every absolute exclude stays on
uvicorn's is_dir() short-circuit. Add a regression test that pins the uvicorn
mechanism (crash on missing dir, safe once created) and enforces that every
absolute --reload-exclude is mkdir'd before launch.

Closes #3459

* test(dev): harden reload-exclude invariant parser against false pass/negatives

The launcher invariant test parsed shell with a "mkdir -p" line filter and a
substring membership check. Two latent gaps (sub-threshold for this fix, but
this code guards a user-facing startup path, so close them):

- A `\`-continued multi-line `mkdir` would drop arguments on continuation
  lines, silently weakening coverage.
- Substring membership could false-pass when an exclude is a path-prefix of a
  different created dir (e.g. `/app/backend/sandbox` "found" inside
  `/app/backend/sandbox-other`).

Fold line-continuations, drop comments, and shlex-tokenize each `mkdir`
argument list into an exact set (quotes stripped, `$VAR` literal); assert exact
set membership. Same shlex handling for `--reload-exclude` values. Verified the
parser still flags the pre-fix missing `backend/sandbox` (RED preserved) and no
longer false-passes on a path-prefix.

* fix(dev): gitignore backend/sandbox runtime dir + pin mkdir-before-launch

Address two review findings on the #3459 fix:

- backend/sandbox was described as "gitignored runtime state" but no ignore
  rule actually matched it. Add an anchored `/sandbox/` to backend/.gitignore
  (anchored so it does NOT shadow the source package
  backend/packages/harness/deerflow/sandbox/) so sandbox artifacts created at
  runtime can't pollute the working tree or be committed by accident. New test
  asserts content under backend/sandbox is ignored, making the claim verifiable.

- The launcher invariant test only proved the sandbox mkdir exists somewhere,
  not that it runs before uvicorn starts. Add an order test (sandbox mkdir line
  must precede the `uv run uvicorn` launch) so a future edit can't move the
  mkdir below the launch and silently reintroduce the crash.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* test(dev): fix reload-exclude parser to handle serve.sh's quoted flag bundle

The previous autofix tokenized each whole line with shlex, but serve.sh packs
every flag into a single double-quoted `GATEWAY_EXTRA_FLAGS="..."` assignment.
shlex collapses that into one token, so no `--reload-exclude` flag is found and
`test_launcher_precreates_every_absolute_reload_exclude[scripts/serve.sh]`
failed CI with "expected at least one absolute reload-exclude".

Parse `--reload-exclude` with a regex that matches a balanced single/double
quoted group or a bare token, so the assignment's surrounding `"` is never
swallowed into the value. This recovers all three serve.sh excludes (the prior
regex also silently dropped the last `$BACKEND_RUNTIME_HOME` because the
adjacent closing quote broke shlex) while still covering dev-entrypoint.sh and
the space-separated `--reload-exclude <value>` form.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-09 15:29:40 +08:00
AochenShen99 0fb18e368c refactor(lead-agent): make build_middlewares public to drop the last cross-module private import (#3458)
`client.py` imported the private `_build_middlewares` from `agent.py` across a
module boundary and called it as public API. Because the `_` name signals
"module-private, no external callers", any future rename or signature change
silently breaks the embedded `DeerFlowClient` path — and the test suite even
monkeypatched `deerflow.client._build_middlewares`, baking the leak in.

`DeerFlowClient` is a lead-agent variant that genuinely needs the lead agent's
full middleware composition, so make the dependency honest: promote the helper
to a documented public entry point `build_middlewares` and update every in-repo
caller. Found during #3341 review; #3341 already removed one such leak
(`_assemble_deferred` -> public `assemble_deferred_tools`) and left this one out
of scope on purpose.

- agent.py: rename def + both internal call sites; expand the docstring into a
  public-entry-point contract and document the previously-undocumented
  model_name / app_config / deferred_setup params
- client.py: import + call site now use the public name (removes the last
  cross-module private import)
- scripts/tool-error-degradation-detection.sh: update its import + call site
- tests (5 files): update monkeypatch/patch targets and direct calls
- docs (backend/CLAUDE.md, plan_mode_usage.md, middlewares.mdx): sync the live
  references that describe the symbol as current API

Pure mechanical rename, no behavior change. Historical design docs (rfc,
superpowers spec) intentionally keep the old name as point-in-time records.

Closes #3431
2026-06-09 11:56:28 +08:00
Ryker_Feng f92a26d56f fix(web_fetch): support proxy for Jina reader in restricted networks (#3418) (#3430)
* fix(web_fetch): support proxy for Jina reader in restricted networks

The web_fetch tool built a bare httpx.AsyncClient() with no proxy
awareness, so users behind a corporate proxy / in Docker / WSL could
not reach https://r.jina.ai and web_fetch timed out.

- Add optional `proxy` / `trust_env` params to JinaClient.crawl and
  wire them from the `web_fetch` tool config (with type coercion for
  YAML string values).
- Pass internal service hostnames through NO_PROXY in both compose
  files so proxy env inherited via env_file does not break in-cluster
  calls (gateway/provisioner/etc).
- Load proxy vars from .env into the shell in scripts/docker.sh so the
  NO_PROXY interpolation can merge user-provided values on `make` path.
- Document proxy/trust_env options in config.example.yaml.

Closes #3418

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 23:25:29 +08:00
AochenShen99 3b6dd0a4e3 feat(subagents): extend deferred MCP tool loading to subagents (#3432)
* feat(subagents): extend deferred MCP tool loading to subagents (#3341)

Subagents now reuse the lead agent's deferred-tool path: when
tool_search.enabled, MCP tool schemas are withheld from the model and
surfaced by name in <available-deferred-tools>, fetched on demand via the
generated tool_search helper. DeferredToolFilterMiddleware deterministically
rewrites request.tools to hide the deferred schemas (the prompt section is
discovery only, not enforcement).

Consolidates the assembly into deerflow.tools.builtins.tool_search, now the
single home for both assemble_deferred_tools (centralized fail-closed guard,
replacing the lead-only private _assemble_deferred) and the relocated
get_deferred_tools_prompt_section. Shared by every build path: lead agent,
embedded client, and subagent executor.

tool_search is appended after the subagent's name-level tool policy and is
treated as infrastructure: its catalog is built from the already
policy-filtered list, so it can never surface a tool the policy denied.

Follow-up to #3370. Fixes #3341.

* test(subagents): assert the real middleware builder emits a working deferred filter (#3341)

The existing recipe test hand-constructs DeferredToolFilterMiddleware, so it
cannot catch a regression in how build_subagent_runtime_middlewares (the call
executor._create_agent actually makes) wires the deferred setup into the
filter. Add a test that sources the filter from the real builder given a real
setup and runs it through a graph: a wrong catalog hash would silently stop
promotion, a dropped filter would stop hiding — both now caught.

Running the full real middleware stack is intentionally avoided (the other
runtime middlewares need sandbox/thread infra to execute, which would make the
test flaky); their attachment + ordering before Safety stays locked in
test_tool_error_handling_middleware.py.

* test(subagents): keep executor tests config-free in CI

* chore: trigger ci

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 23:17:22 +08:00