fix(subagents): raise general-purpose max_turns to 150 and default timeout to 30min (#3610)

* fix(subagents): raise general-purpose max_turns to 150 and default timeout to 30min Deep-research subtasks failed out of the box with GraphRecursionError (Recursion limit of 100 reached): the built-in general-purpose subagent caps at max_turns=100. Raise it to 150 and bump the default subagent timeout from 900s (15min) to 1800s (30min) so the extra turns have time to run instead of shifting the failure to a timeout. The lead agent recursion_limit (100) is unchanged; the failures are subagent-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(subagents): clarify lead recursion_limit is independent of subagent max_turns Add comments at both lead recursion_limit=100 sites (gateway services + channel manager) explaining the lead's LangGraph super-step budget is separate from subagent depth, so the two 100s are not conflated. Comment-only, no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(subagents): clarify built-in vs custom timeout scope; pin bash max_turns in test Review follow-ups: (1) clarify SubagentConfig docstring + global timeout field/comment that the 1800 default applies to built-in subagents (custom agents keep their own timeout_seconds); (2) pin bash.max_turns==60 in the defaults regression test so the config.example.yaml doc cannot drift; (3) rename test_default_timeout_preserved_when_no_config -> test_explicit_global_timeout_propagates_to_general_purpose since it intentionally exercises an explicit non-default 900. No runtime behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 13:05:58 +00:00 · 2026-06-16 19:55:04 +08:00
parent d2cc991d55
commit 05be7ea688
9 changed files with 50 additions and 16 deletions
@@ -311,7 +311,7 @@ Proxied through nginx: `/api/langgraph/*` → Gateway LangGraph-compatible runti

 **Built-in Agents**: `general-purpose` (all tools except `task`) and `bash` (command specialist)
 **Execution**: Dual thread pool - `_scheduler_pool` (3 workers) + `_execution_pool` (3 workers)
-**Concurrency**: `MAX_CONCURRENT_SUBAGENTS = 3` enforced by `SubagentLimitMiddleware` (truncates excess tool calls in `after_model`), 15-minute timeout
+**Concurrency**: `MAX_CONCURRENT_SUBAGENTS = 3` enforced by `SubagentLimitMiddleware` (truncates excess tool calls in `after_model`); default subagent timeout `subagents.timeout_seconds=1800` (30 min) and built-in `general-purpose` `max_turns=150` (raised from 100/15-min so deep-research subtasks stop hitting `GraphRecursionError` out of the box)
 **Flow**: `task()` tool → `SubagentExecutor` → background thread → poll 5s → SSE events → result
 **Events**: `task_started`, `task_running`, `task_completed`/`task_failed`/`task_timed_out`
 **Deferred MCP tools** (if `tool_search.enabled`): `SubagentExecutor._build_initial_state` assembles deferral after policy filtering via the shared `assemble_deferred_tools` (fail-closed), appends the `tool_search` tool, injects the `<available-deferred-tools>` section into the subagent's `SystemMessage`, and threads the setup to `_create_agent`, which attaches `DeferredToolFilterMiddleware` through `build_subagent_runtime_middlewares(deferred_setup=...)`. Subagents thus withhold full MCP schemas until promotion, same as the lead agent; each task run gets a fresh `ThreadState` so promotion is isolated per run