fix(subagent): structured subagent_status field over text parsing (#3146) (#3154)

* fix(subagent): structured subagent_status field over text parsing

Closes #3146.

## Why

The frontend used to derive subtask card state by string-matching the
leading text of the `task` tool's result. That contract surface was
fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases
where new backend wording (`Task cancelled by user.`,
`Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware`
exception wrappers) silently broke the card lifecycle. The frontend
fallback kept growing more prefixes; any future rewording would break
it again.

## Design

1. **Backend → frontend contract**: `ToolMessage.additional_kwargs`
   carries `subagent_status` (one of `completed | failed | cancelled |
   timed_out | polling_timed_out`) and an optional `subagent_error`
   blob. The frontend prefers it over parsing `content`.

2. **Centralised stamping, not 8 sprinkled stamps**: rather than have
   each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:`
   paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware`
   stamps the field after every task-tool call. Adding a new return
   path in `task_tool.py` cannot now skip the stamp.

3. **Cross-language contract fixture**: the prefix→status mapping is
   the one piece both sides must agree on. The shared fixture at
   `contracts/subagent_status_contract.json` lists every backend return
   string, the expected status, and what the error substring should
   contain. Backend test (`backend/tests/test_subagent_status_contract.py`)
   and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`)
   both load that fixture and assert the same cases. A wording drift on
   either side fails the matching language's test.

4. **Round-trip serialisation pinned**: the round-trip test asserts
   `ToolMessage.model_dump_json()` → `model_validate_json()` preserves
   `additional_kwargs.subagent_status`. Catches the case where a future
   LangChain or Pydantic upgrade silently strips unknown kwargs.

5. **Frontend status collapse documented**: the backend has five status
   values, the frontend card has three (`completed | failed |
   in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all
   collapse to `failed` with the original status preserved in `error`.
   `parseSubtaskResult` returns `in_progress` for unknown values so a
   backend that ships a new enum variant before the frontend upgrades
   degrades to the legacy prefix fallback instead of getting pinned.

## Changes

Backend:
- `deerflow.subagents.status_contract` — new module exporting
  `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`,
  `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and
  `make_subagent_additional_kwargs(status, error)`.
- `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status`
  helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call`
  stamp on the success path; `_build_error_message` stamps on the
  wrapper path (carrying `ExcClass: detail` into `subagent_error`).
  Non-task tools are untouched.
- New tests: `test_subagent_status_contract.py` (19 cases from the
  shared fixture + status-enum / blank-error / unknown-status
  rejection) and `test_tool_error_handling_subagent_stamp.py`
  (middleware integration: terminal-content stamps, non-terminal
  doesn't, non-task tools untouched, async path mirrors sync,
  existing additional_kwargs survive, JSON round-trip preserved).

Frontend:
- `parseSubtaskResult(text, additionalKwargs?)` — prefers the
  structured stamp; falls back to the legacy prefix matcher for
  historical threads / unknown future status values.
- `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse.
- `message-list.tsx` passes `message.additional_kwargs` through.
- `subtask-result.test.ts` adds a structured-status block + a
  fixture-driven contract block; legacy prefix tests stay green for
  the fallback path.

Contract:
- `contracts/subagent_status_contract.json` — single source of truth
  both languages load. Whitespace variants, varied N for polling
  timeouts, the 3 pre-execution `Error:` returns task_tool produces,
  and the middleware wrapper shape are all in there.

## Test plan
- `make lint` clean (backend + frontend).
- `pytest tests/test_subagent_status_contract.py
   tests/test_tool_error_handling_subagent_stamp.py` → 37 passed.
- `pnpm test --run` → 103 passed (was 76, +27 new).

## Migration / fallback retirement

The text-prefix fallback stays in place until backend telemetry shows
the frontend never hits it for newly produced messages. At that point
a follow-up PR can drop the prefix branches and keep only the
structured-status branch.

Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131
(prior prefix-only fix), #3146 (this issue).

* fix(subtask): back-fill result/error from text when structured status present

Three follow-ups on the PR #3154 review:

1. `readStructuredStatus` no longer short-circuits the prefix parse.
   The backend currently stamps only the `subagent_status` enum value;
   the human-facing `result` body and wrapped-error message still live
   in `ToolMessage.content`. Dropping the text parse meant successful
   tasks rendered empty completed pills and wrapped failures lost their
   diagnostic. Now both shapes get composed: structured status wins,
   `result`/`error` come from text when both sides agree, and a lying
   success body under a `failed` stamp is dropped instead of leaking.

2. Replace the ESM-incompatible `__dirname` fixture lookup in
   subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`.
   The frontend package is `"type": "module"`, so the previous path
   would have thrown at runtime if anything ever changed under the
   contract directory.

3. Drop the `$schema` reference from contracts/subagent_status_contract.json
   pointing at a file that doesn't exist in the tree.

Three new tests cover the structured + text composition: completed
back-fills the success body, failed back-fills the wrapper text, and
unrecognised content under a `failed` stamp stays empty rather than
echoing noise.
This commit is contained in:
Xinmin Zeng
2026-06-07 22:49:55 +08:00
committed by GitHub
parent d8b728f7cb
commit 8d2e55a05f
8 changed files with 780 additions and 17 deletions
+113 -13
View File
@@ -8,6 +8,35 @@ export interface SubtaskResultUpdate {
error?: string;
}
/**
* Structured-status keys the backend stamps onto
* ``ToolMessage.additional_kwargs`` for every ``task`` tool result.
*
* The values mirror the Python contract in
* ``backend/packages/harness/deerflow/subagents/status_contract.py``
* (``SUBAGENT_STATUS_KEY`` / ``SUBAGENT_ERROR_KEY``). The cross-language
* fixture at ``contracts/subagent_status_contract.json`` pins both sides
* to the same values.
*/
export const SUBAGENT_STATUS_KEY = "subagent_status";
export const SUBAGENT_ERROR_KEY = "subagent_error";
/**
* Map from the backend ``subagent_status`` value to the frontend
* {@link SubtaskStatus} enum. The frontend collapses ``cancelled`` /
* ``timed_out`` / ``polling_timed_out`` into ``failed`` because the
* subtask card only renders three pill states. The richer backend
* vocabulary still survives on ``error`` for tooling that wants the
* detail.
*/
const STRUCTURED_STATUS_TO_SUBTASK: Record<string, SubtaskStatus> = {
completed: "completed",
failed: "failed",
cancelled: "failed",
timed_out: "failed",
polling_timed_out: "failed",
};
/**
* Prefix strings the backend `task` tool writes into its result `content`.
*
@@ -34,24 +63,68 @@ export const POLLING_TIMEOUT_PREFIX = "Task polling timed out";
export const ERROR_WRAPPER_PATTERN = /^Error\b/i;
/**
* Map a `task` tool result string to a {@link SubtaskStatus}.
* Map a `task` tool result to a {@link SubtaskStatus}.
*
* Bytedance/deer-flow issue #3107 BUG-007: parent-visible task tool errors do
* not always start with one of the three legacy prefixes (e.g. when
* `ToolErrorHandlingMiddleware` wraps an exception as
* `Error: Tool 'task' failed ...`). Treat any leading `Error:` token as a
* terminal failure so subtask cards stop being stuck on "in_progress".
* Bytedance/deer-flow issue #3146: prefers the structured
* ``additional_kwargs.subagent_status`` field the backend now stamps via
* ``ToolErrorHandlingMiddleware``. Falls back to the legacy prefix
* matching for messages that pre-date the stamping commit (historical
* threads, third-party clients, or any tool path that bypasses the
* middleware). Both shapes converge on the same {@link SubtaskStatus}
* vocabulary the card UI renders.
*
* When the structured field is present, the prefix parser is still run
* so the success `result` body and the wrapped-error message can be
* back-filled from `content`. Today the backend only stamps the
* `subagent_status` enum value — the human-facing payload still lives
* in `content`, so dropping the prefix parse would regress the subtask
* card display. Structured fields win on conflict: if `subagent_status`
* and the text disagree, the text-derived `result`/`error` are
* discarded so a malformed wrapper can't sneak through.
*
* Returning `in_progress` is the **deliberate** fallback for content that
* matches none of the known prefixes. LangChain only ever emits a
* `ToolMessage` once the tool itself has returned (success or wrapped
* exception), so an unknown shape means "the contract changed underneath us"
* — surfacing it as still-running prompts the operator to investigate, where
* eagerly marking it terminal-failed would mask the drift.
* matches none of the known prefixes and carries no structured stamp.
* LangChain only ever emits a `ToolMessage` once the tool itself has
* returned (success or wrapped exception), so an unknown shape means
* "the contract changed underneath us" — surfacing it as still-running
* prompts the operator to investigate, where eagerly marking it
* terminal-failed would mask the drift.
*/
export function parseSubtaskResult(text: string): SubtaskResultUpdate {
const trimmed = text.trim();
export function parseSubtaskResult(
text: string,
additionalKwargs?: Record<string, unknown> | null,
): SubtaskResultUpdate {
const fromText = parseFromText(text.trim());
const structured = readStructuredStatus(additionalKwargs);
if (!structured) {
return fromText;
}
const update: SubtaskResultUpdate = { status: structured.status };
// Structured `subagent_error` wins; otherwise inherit the text-derived
// error only when both sides agree on the status (so a "Task Succeeded"
// body can't bleed into a `failed` structured stamp and vice versa).
if (structured.error) {
update.error = structured.error;
} else if (
fromText.status === structured.status &&
fromText.error !== undefined
) {
update.error = fromText.error;
}
// Result body only matters for `completed`; require text agreement so
// a lying success prefix under a `failed` stamp is dropped.
if (
structured.status === "completed" &&
fromText.status === "completed" &&
fromText.result !== undefined
) {
update.result = fromText.result;
}
return update;
}
function parseFromText(trimmed: string): SubtaskResultUpdate {
if (trimmed.startsWith(SUCCESS_PREFIX)) {
return {
status: "completed",
@@ -86,3 +159,30 @@ export function parseSubtaskResult(text: string): SubtaskResultUpdate {
return { status: "in_progress" };
}
interface StructuredStatus {
status: SubtaskStatus;
error?: string;
}
function readStructuredStatus(
additionalKwargs: Record<string, unknown> | null | undefined,
): StructuredStatus | null {
if (!additionalKwargs) return null;
const rawStatus = additionalKwargs[SUBAGENT_STATUS_KEY];
if (typeof rawStatus !== "string") return null;
const mapped = STRUCTURED_STATUS_TO_SUBTASK[rawStatus];
if (mapped === undefined) {
// Unknown future status value — stay on the legacy prefix fallback
// so a backend that ships a new enum variant before the frontend
// upgrades still renders something predictable instead of getting
// pinned to "in_progress" by an empty branch.
return null;
}
const rawError = additionalKwargs[SUBAGENT_ERROR_KEY];
const result: StructuredStatus = { status: mapped };
if (typeof rawError === "string" && rawError.trim()) {
result.error = rawError;
}
return result;
}