fix(#3189): prevent write_file streaming timeout on long reports (#3195)

* fix(#3189): prevent write_file streaming timeout on long reports

  Adds a layered defense against StreamChunkTimeoutError caused by oversized single-shot write_file tool calls:

  - factory: default stream_chunk_timeout to 240s for OpenAI-compatible clients (overridable via ModelConfig.stream_chunk_timeout in config.yaml)
  - sandbox/tools: server-side 80 KB length guard on non-append write_file calls (configurable via DEERFLOW_WRITE_FILE_MAX_BYTES env var, 0 disables); rejects oversized payloads with a structured error pointing the model at str_replace or append=True
  - middleware: classify StreamChunkTimeoutError as transient but cap retries at 1 via per-exception _RETRY_BUDGET_OVERRIDES (same-payload retry on a chunk-gap timeout buffers the same way upstream; full 3-attempt loop would stack 6-12 min of dead air)
  - middleware: surface an actionable user-facing message for stream-drop exceptions instead of leaking the raw langchain stack
  - prompts: add a routing-style File Editing Workflow hint to both lead_agent and general_purpose subagent prompts, pointing the model at str_replace for incremental edits (mirrors Claude Code's Edit / Codex's apply_patch)
  - tests: behavioural coverage for size guard, retry budget override, stream-drop user message, factory default injection

  Refs #3189

* fix(#3189): drop stream_chunk_timeout for non-OpenAI providers

  Address CR feedback on PR #3195:

  - factory: pop `stream_chunk_timeout` from kwargs for any model_use_path other than `langchain_openai:ChatOpenAI` instead of returning early. `ModelConfig.stream_chunk_timeout` is part of the shared schema, so a user-supplied value on a non-OpenAI provider would otherwise be forwarded to its constructor and raise `TypeError: unexpected keyword argument`.
  - factory: rewrite docstring to describe the actual `exclude_none=True` behaviour (explicit null is excluded and falls back to the default) instead of the misleading "None falling out via exclude_none=True keeps its value".
  - tests: add regression coverage asserting the kwarg is stripped before reaching a non-OpenAI provider's constructor.

  Refs: bytedance#3189

* fix(#3189): restrict stream-drop user copy to StreamChunkTimeoutError only

  Per CR on #3195: narrow _STREAM_DROP_EXCEPTIONS to StreamChunkTimeoutError. Generic httpx RemoteProtocolError / ReadError fall back to the standard 'temporarily unavailable' copy, since they routinely fire on transient network blips where the 'split the output' guidance is misleading. Retry/backoff classification is unchanged; both remain transient/retriable. Tests updated to reflect new copy, plus a symmetric regression test for ReadError.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
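
For readers who want a concrete picture of the write_file length guard described in the commit message, here is a minimal sketch. Only the env var name `DEERFLOW_WRITE_FILE_MAX_BYTES`, the 80 KB default, the "0 disables" rule, and the non-append scope come from the message. The function names, the error fields, the choice of 80 * 1024 bytes, and the fallback on unparsable values are assumptions made for illustration, not the project's actual code.

```python
import os

# Assumption: "80 KB" is read here as 80 * 1024 bytes.
DEFAULT_MAX_BYTES = 80 * 1024


def max_write_bytes(env=None) -> int:
    """Resolve the limit from DEERFLOW_WRITE_FILE_MAX_BYTES (0 disables the guard)."""
    env = os.environ if env is None else env
    raw = env.get("DEERFLOW_WRITE_FILE_MAX_BYTES")
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_BYTES
    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_MAX_BYTES
    # Assumption: negative values are treated like 0 (guard disabled).
    return max(value, 0)


def check_write_size(content: str, append: bool = False, env=None):
    """Return None when the write may proceed, else a structured error dict."""
    limit = max_write_bytes(env)
    if append or limit == 0:
        return None  # appends are exempt; a limit of 0 disables the guard
    size = len(content.encode("utf-8"))
    if size <= limit:
        return None
    return {
        "error": "write_file_too_large",  # hypothetical error code
        "size_bytes": size,
        "limit_bytes": limit,
        "hint": "Use str_replace for edits, or write in chunks with append=True.",
    }
```

Measuring the UTF-8 encoded length rather than the character count keeps the limit meaningful for non-ASCII reports; whether the real guard does the same is not stated in the commit message.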
2026-06-10 09:25:57 +00:00 · 2026-06-07 17:47:11 +08:00
parent 268fdd6968
commit 88e36d9686
10 changed files with 677 additions and 4 deletions
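
The factory behaviour described in the commit message (default of 240s for `langchain_openai:ChatOpenAI`, kwarg stripped for every other provider) could look roughly like the sketch below. The helper name `apply_stream_chunk_timeout` and the constants are hypothetical; the real factory is not shown in this diff, so treat this as an illustration of the rule rather than the actual implementation.

```python
OPENAI_USE_PATH = "langchain_openai:ChatOpenAI"
DEFAULT_STREAM_CHUNK_TIMEOUT = 240.0  # seconds, per the commit message


def apply_stream_chunk_timeout(model_use_path: str, kwargs: dict) -> dict:
    """Return a copy of kwargs with stream_chunk_timeout defaulted or removed."""
    out = dict(kwargs)
    if model_use_path != OPENAI_USE_PATH:
        # Other providers do not accept this kwarg; forwarding it would raise
        # TypeError in their constructor, so drop it instead of returning early.
        out.pop("stream_chunk_timeout", None)
        return out
    # A missing key (including an explicit null already dropped upstream by
    # exclude_none=True) falls back to the default.
    if out.get("stream_chunk_timeout") is None:
        out["stream_chunk_timeout"] = DEFAULT_STREAM_CHUNK_TIMEOUT
    return out
```

For example, `apply_stream_chunk_timeout("langchain_anthropic:ChatAnthropic", {"stream_chunk_timeout": 60, "temperature": 0})` returns only the `temperature` entry, while an OpenAI path with no timeout configured gains the 240s default.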
@@ -373,7 +373,11 @@ def test_sync_read_error_triggers_retry_loop(monkeypatch: pytest.MonkeyPatch) ->
    result = middleware.wrap_model_call(SimpleNamespace(), handler)

    assert isinstance(result, AIMessage)
+    # ReadError is a generic connection drop, not a chunk-gap timeout, so
+    # it must fall back to the legacy transient copy rather than the
+    # specialized "split the work into smaller steps" guidance (#3195 CR).
    assert "temporarily unavailable" in result.content
+    assert "streaming response was interrupted" not in result.content
    assert attempts == 3  # exhausted all retries
    assert len(waits) == 2  # slept between attempts 1→2 and 2→3

@@ -397,7 +401,11 @@ async def test_async_read_error_triggers_retry_loop(monkeypatch: pytest.MonkeyPa
    result = await middleware.awrap_model_call(SimpleNamespace(), handler)

    assert isinstance(result, AIMessage)
+    # ReadError is a generic connection drop, not a chunk-gap timeout, so
+    # it must fall back to the legacy transient copy rather than the
+    # specialized "split the work into smaller steps" guidance (#3195 CR).
    assert "temporarily unavailable" in result.content
+    assert "streaming response was interrupted" not in result.content
    assert attempts == 3  # exhausted all retries
    assert len(waits) == 2  # slept between attempts 1→2 and 2→3

@@ -462,3 +470,211 @@ async def test_async_circuit_breaker_trips_and_recovers(monkeypatch: pytest.Monk
    assert result.content == "Success"
    assert middleware._circuit_failure_count == 0  # RESET
    assert middleware._check_circuit() is False
+
+
+class _StreamChunkTimeoutError(Exception):
+    """Local stand-in for langchain_openai's StreamChunkTimeoutError —
+    matched by class name, no langchain-openai import needed (mirrors
+    how this file already stubs httpx.ReadError / RemoteProtocolError).
+    """
+
+
+_StreamChunkTimeoutError.__name__ = "StreamChunkTimeoutError"
+
+
+def test_classify_error_stream_chunk_timeout_is_retriable() -> None:
+    """StreamChunkTimeoutError must be classified as transient/retriable."""
+    middleware = _build_middleware()
+    exc = _StreamChunkTimeoutError("No streaming chunk received for 120.0s (model=mimo-v2.5, chunks_received=58).")
+    exc.__class__.__name__ = "StreamChunkTimeoutError"
+    retriable, reason = middleware._classify_error(exc)
+    assert retriable is True
+    assert reason == "transient"
+
+
+def test_sync_stream_chunk_timeout_retries_once(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    """Sync handler raising StreamChunkTimeoutError is retried exactly once —
+    the per-exception override caps it at 2 total attempts (1 first call + 1
+    retry) even when retry_max_attempts=3.
+    Same-payload retry on a chunk-gap timeout buffers the same way upstream;
+    a full 3-attempt loop would stack 6-12 minutes of dead air before
+    surfacing failure. We keep one cheap reconnect for genuine transient TCP
+    blips, then surface the failure so the model can re-plan on its next turn.
+    """
+    middleware = _build_middleware(
+        retry_max_attempts=3,
+        retry_base_delay_ms=10,
+        retry_cap_delay_ms=10,
+    )
+    attempts = 0
+    waits: list[float] = []
+    monkeypatch.setattr("time.sleep", lambda d: waits.append(d))
+
+    def handler(_request) -> AIMessage:
+        nonlocal attempts
+        attempts += 1
+        raise _StreamChunkTimeoutError("No streaming chunk received for 120.0s")
+
+    result = middleware.wrap_model_call(SimpleNamespace(), handler)
+
+    assert isinstance(result, AIMessage)
+    assert "streaming response was interrupted" in result.content
+    # Override caps StreamChunkTimeoutError at 2 attempts (1 first call + 1 retry).
+    assert attempts == 2
+    # Exactly one sleep between the first attempt and the single retry.
+    assert len(waits) == 1
+
+
+@pytest.mark.anyio
+async def test_async_stream_chunk_timeout_retries_once(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    """Async mirror of the sync test: StreamChunkTimeoutError is capped at
+    2 attempts (1 first call + 1 retry) so we don't stack 6-12 minutes of
+    dead air on a same-payload buffering failure.
+    """
+    middleware = _build_middleware(
+        retry_max_attempts=3,
+        retry_base_delay_ms=10,
+        retry_cap_delay_ms=10,
+    )
+    attempts = 0
+    waits: list[float] = []
+
+    async def fake_sleep(d: float) -> None:
+        waits.append(d)
+
+    monkeypatch.setattr(asyncio, "sleep", fake_sleep)
+
+    async def handler(_request) -> AIMessage:
+        nonlocal attempts
+        attempts += 1
+        raise _StreamChunkTimeoutError("No streaming chunk received for 120.0s")
+
+    result = await middleware.awrap_model_call(SimpleNamespace(), handler)
+
+    assert isinstance(result, AIMessage)
+    assert "streaming response was interrupted" in result.content
+    assert attempts == 2
+    # Exactly one sleep between the first attempt and the single retry.
+    assert len(waits) == 1
+
+
+def test_max_attempts_for_returns_override_for_stream_chunk_timeout() -> None:
+    """StreamChunkTimeoutError must use the tightened budget (2 = "keep one retry"),
+    not the default of 3."""
+    middleware = _build_middleware(retry_max_attempts=3)
+    exc = _StreamChunkTimeoutError("upstream stalled")
+    exc.__class__.__name__ = "StreamChunkTimeoutError"
+
+    assert middleware._max_attempts_for(exc) == 2
+
+
+def test_max_attempts_for_falls_back_to_default_for_unlisted_exception() -> None:
+    """ReadError / RemoteProtocolError keep the full retry budget — only
+    StreamChunkTimeoutError pays for stalling upstream for `stream_chunk_timeout`
+    seconds per attempt, so only it gets the tighter cap.
+    """
+    middleware = _build_middleware(retry_max_attempts=3)
+
+    read_err = _ReadError("conn reset")
+    read_err.__class__.__name__ = "ReadError"
+    proto_err = _RemoteProtocolError("peer closed")
+    proto_err.__class__.__name__ = "RemoteProtocolError"
+
+    assert middleware._max_attempts_for(read_err) == 3
+    assert middleware._max_attempts_for(proto_err) == 3
+    assert middleware._max_attempts_for(FakeError("boom")) == 3
+
+
+def test_max_attempts_for_override_never_exceeds_user_cap() -> None:
+    """If the operator lowered retry_max_attempts below the override default,
+    the user-configured cap wins — overrides only ever *tighten*, never loosen.
+    """
+    middleware = _build_middleware(retry_max_attempts=1)
+    exc = _StreamChunkTimeoutError("upstream stalled")
+    exc.__class__.__name__ = "StreamChunkTimeoutError"
+
+    assert middleware._max_attempts_for(exc) == 1
+
+
+def test_user_message_for_stream_chunk_timeout_mentions_split_or_shorten() -> None:
+    """When the retry budget for StreamChunkTimeoutError is exhausted, the user
+    message must guide the user toward splitting / shortening the request
+    instead of suggesting a generic retry. This is the actionable advice
+    Reviewer B asked for in the follow-up CR (issue #3189).
+    """
+    middleware = _build_middleware()
+    exc = _StreamChunkTimeoutError("No streaming chunk received for 120.0s")
+    exc.__class__.__name__ = "StreamChunkTimeoutError"
+
+    message = middleware._build_user_message(exc, reason="transient")
+
+    assert "streaming response was interrupted" in message
+    assert "split" in message or "shorten" in message
+    # The generic "temporarily unavailable" wording must NOT appear here,
+    # otherwise the actionable guidance is buried.
+    assert "temporarily unavailable" not in message
+
+
+def test_user_message_for_remote_protocol_error_uses_generic_transient_copy() -> None:
+    """RemoteProtocolError is a generic connection drop that can fire on
+    transient network blips with perfectly normal payloads. The
+    "split the work into smaller steps" guidance only applies when the
+    upstream chunk-gap watchdog fires (StreamChunkTimeoutError), so
+    RemoteProtocolError must fall back to the legacy transient copy.
+    Regression guard for the #3195 CR feedback.
+    """
+    middleware = _build_middleware()
+    exc = _RemoteProtocolError("Server closed connection unexpectedly")
+    exc.__class__.__name__ = "RemoteProtocolError"
+
+    message = middleware._build_user_message(exc, reason="transient")
+
+    assert "temporarily unavailable" in message
+    assert "streaming response was interrupted" not in message
+
+
+def test_user_message_for_read_error_uses_generic_transient_copy() -> None:
+    """httpx.ReadError is symmetric to RemoteProtocolError: a generic
+    connection drop that must NOT receive the "split the work" guidance.
+    Regression guard for the #3195 CR feedback.
+    """
+    middleware = _build_middleware()
+    exc = _ReadError("connection dropped mid-stream")
+    exc.__class__.__name__ = "ReadError"
+
+    message = middleware._build_user_message(exc, reason="transient")
+
+    assert "temporarily unavailable" in message
+    assert "streaming response was interrupted" not in message
+
+
+def test_user_message_for_generic_transient_keeps_legacy_copy() -> None:
+    """Generic transient errors (HTTP 503, 'cluster busy', etc.) must keep
+    the original 'temporarily unavailable' message; only
+    StreamChunkTimeoutError gets the new specialized copy. This prevents
+    regressions for callers who already rely on the legacy wording.
+    """
+    middleware = _build_middleware()
+    exc = FakeError("server busy", status_code=503)
+
+    message = middleware._build_user_message(exc, reason="transient")
+
+    assert "temporarily unavailable" in message
+    assert "streaming response was interrupted" not in message
+
+
+def test_user_message_for_quota_unchanged() -> None:
+    """Sanity check: the quota / auth branches must remain untouched by the
+    stream-drop refactor.
+    """
+    middleware = _build_middleware()
+    exc = FakeError("insufficient_quota", status_code=429, code="insufficient_quota")
+
+    message = middleware._build_user_message(exc, reason="quota")
+
+    assert "out of quota" in message
+    assert "streaming response was interrupted" not in message
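
The retry-budget behaviour these tests pin down (an override of 2 total attempts for StreamChunkTimeoutError, never exceeding the operator's `retry_max_attempts`, with unlisted exceptions keeping the full budget) can be sketched as follows. The middleware source is not part of this diff, so the names below are hypothetical stand-ins for `_RETRY_BUDGET_OVERRIDES` and `_max_attempts_for`, and the loop omits backoff sleeps for brevity.

```python
# Hypothetical names; only the observable behaviour is taken from the tests.
RETRY_BUDGET_OVERRIDES = {"StreamChunkTimeoutError": 2}  # 1 first call + 1 retry


class StreamChunkTimeoutError(Exception):
    """Local stand-in, matched by class name like the stubs in the test file."""


def max_attempts_for(exc: BaseException, retry_max_attempts: int) -> int:
    """Overrides only ever tighten the budget, never loosen it."""
    override = RETRY_BUDGET_OVERRIDES.get(type(exc).__name__)
    if override is None:
        return retry_max_attempts
    return min(override, retry_max_attempts)


def run_with_budget(handler, retry_max_attempts: int = 3) -> int:
    """Call handler until it succeeds or the budget for the raised error is
    spent, and return the number of attempts made."""
    attempts = 0
    while True:
        attempts += 1
        try:
            handler()
            return attempts
        except Exception as exc:
            if attempts >= max_attempts_for(exc, retry_max_attempts):
                return attempts


def always_stalls() -> None:
    raise StreamChunkTimeoutError("No streaming chunk received for 120.0s")
```

Under these assumptions, `run_with_budget(always_stalls, retry_max_attempts=3)` stops after two attempts, while a handler that always raises an unlisted exception uses all three.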