mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-10 09:25:57 +00:00
8d2e55a05f
* fix(subagent): structured subagent_status field over text parsing Closes #3146. ## Why The frontend used to derive subtask card state by string-matching the leading text of the `task` tool's result. That contract surface was fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases where new backend wording (`Task cancelled by user.`, `Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware` exception wrappers) silently broke the card lifecycle. The frontend fallback kept growing more prefixes; any future rewording would break it again. ## Design 1. **Backend → frontend contract**: `ToolMessage.additional_kwargs` carries `subagent_status` (one of `completed | failed | cancelled | timed_out | polling_timed_out`) and an optional `subagent_error` blob. The frontend prefers it over parsing `content`. 2. **Centralised stamping, not 8 sprinkled stamps**: rather than have each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:` paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware` stamps the field after every task-tool call. Adding a new return path in `task_tool.py` cannot now skip the stamp. 3. **Cross-language contract fixture**: the prefix→status mapping is the one piece both sides must agree on. The shared fixture at `contracts/subagent_status_contract.json` lists every backend return string, the expected status, and what the error substring should contain. Backend test (`backend/tests/test_subagent_status_contract.py`) and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`) both load that fixture and assert the same cases. A wording drift on either side fails the matching language's test. 4. **Round-trip serialisation pinned**: the round-trip test asserts `ToolMessage.model_dump_json()` → `model_validate_json()` preserves `additional_kwargs.subagent_status`. Catches the case where a future LangChain or Pydantic upgrade silently strips unknown kwargs. 5. **Frontend status collapse documented**: the backend has five status values, the frontend card has three (`completed | failed | in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all collapse to `failed` with the original status preserved in `error`. `parseSubtaskResult` returns `in_progress` for unknown values so a backend that ships a new enum variant before the frontend upgrades degrades to the legacy prefix fallback instead of getting pinned. ## Changes Backend: - `deerflow.subagents.status_contract` — new module exporting `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`, `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and `make_subagent_additional_kwargs(status, error)`. - `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status` helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call` stamp on the success path; `_build_error_message` stamps on the wrapper path (carrying `ExcClass: detail` into `subagent_error`). Non-task tools are untouched. - New tests: `test_subagent_status_contract.py` (19 cases from the shared fixture + status-enum / blank-error / unknown-status rejection) and `test_tool_error_handling_subagent_stamp.py` (middleware integration: terminal-content stamps, non-terminal doesn't, non-task tools untouched, async path mirrors sync, existing additional_kwargs survive, JSON round-trip preserved). Frontend: - `parseSubtaskResult(text, additionalKwargs?)` — prefers the structured stamp; falls back to the legacy prefix matcher for historical threads / unknown future status values. - `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse. - `message-list.tsx` passes `message.additional_kwargs` through. - `subtask-result.test.ts` adds a structured-status block + a fixture-driven contract block; legacy prefix tests stay green for the fallback path. Contract: - `contracts/subagent_status_contract.json` — single source of truth both languages load. Whitespace variants, varied N for polling timeouts, the 3 pre-execution `Error:` returns task_tool produces, and the middleware wrapper shape are all in there. ## Test plan - `make lint` clean (backend + frontend). - `pytest tests/test_subagent_status_contract.py tests/test_tool_error_handling_subagent_stamp.py` → 37 passed. - `pnpm test --run` → 103 passed (was 76, +27 new). ## Migration / fallback retirement The text-prefix fallback stays in place until backend telemetry shows the frontend never hits it for newly produced messages. At that point a follow-up PR can drop the prefix branches and keep only the structured-status branch. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior prefix-only fix), #3146 (this issue). * fix(subtask): back-fill result/error from text when structured status present Three follow-ups on the PR #3154 review: 1. `readStructuredStatus` no longer short-circuits the prefix parse. The backend currently stamps only the `subagent_status` enum value; the human-facing `result` body and wrapped-error message still live in `ToolMessage.content`. Dropping the text parse meant successful tasks rendered empty completed pills and wrapped failures lost their diagnostic. Now both shapes get composed: structured status wins, `result`/`error` come from text when both sides agree, and a lying success body under a `failed` stamp is dropped instead of leaking. 2. Replace the ESM-incompatible `__dirname` fixture lookup in subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`. The frontend package is `"type": "module"`, so the previous path would have thrown at runtime if anything ever changed under the contract directory. 3. Drop the `$schema` reference from contracts/subagent_status_contract.json pointing at a file that doesn't exist in the tree. Three new tests cover the structured + text composition: completed back-fills the success body, failed back-fills the wrapper text, and unrecognised content under a `failed` stamp stays empty rather than echoing noise.
290 lines
11 KiB
TypeScript
290 lines
11 KiB
TypeScript
import { readFileSync } from "node:fs";
|
|
import { fileURLToPath } from "node:url";
|
|
|
|
import { describe, expect, it } from "vitest";
|
|
|
|
import {
|
|
SUBAGENT_ERROR_KEY,
|
|
SUBAGENT_STATUS_KEY,
|
|
parseSubtaskResult,
|
|
} from "@/core/tasks/subtask-result";
|
|
|
|
interface ContractCase {
|
|
name: string;
|
|
content: string;
|
|
expected_status: string | null;
|
|
expected_error_contains: string | null;
|
|
}
|
|
|
|
interface ContractFile {
|
|
valid_status_values: string[];
|
|
cases: ContractCase[];
|
|
}
|
|
|
|
// The frontend package is ESM (`"type": "module"`), so `__dirname` is not
|
|
// defined. Resolve the cross-language fixture relative to this module URL.
|
|
const CONTRACT_PATH = fileURLToPath(
|
|
new URL(
|
|
"../../../../../contracts/subagent_status_contract.json",
|
|
import.meta.url,
|
|
),
|
|
);
|
|
const CONTRACT: ContractFile = JSON.parse(
|
|
readFileSync(CONTRACT_PATH, "utf-8"),
|
|
) as ContractFile;
|
|
|
|
describe("parseSubtaskResult", () => {
|
|
it("recognises the standard success prefix", () => {
|
|
const parsed = parseSubtaskResult(
|
|
"Task Succeeded. Result: investigated and produced a 3-page report",
|
|
);
|
|
expect(parsed.status).toBe("completed");
|
|
expect(parsed.result).toBe("investigated and produced a 3-page report");
|
|
});
|
|
|
|
it("recognises the standard failure prefix", () => {
|
|
const parsed = parseSubtaskResult(
|
|
"Task failed. underlying tool raised RuntimeError",
|
|
);
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toBe("underlying tool raised RuntimeError");
|
|
});
|
|
|
|
it("recognises the standard timeout prefix", () => {
|
|
const parsed = parseSubtaskResult("Task timed out after 900s");
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toBe("Task timed out after 900s");
|
|
});
|
|
|
|
it("recognises the cancelled-by-user prefix", () => {
|
|
// bytedance/deer-flow#3131 review: this is one of the five terminal
|
|
// strings task_tool.py actually emits — the previous cut treated it as
|
|
// unrecognised content and pushed the card back to in_progress.
|
|
const parsed = parseSubtaskResult("Task cancelled by user.");
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toBe("Task cancelled by user.");
|
|
});
|
|
|
|
it("recognises the polling-timed-out prefix", () => {
|
|
// Emitted by task_tool when the background polling loop runs out of
|
|
// budget waiting for the subagent to reach a terminal state.
|
|
const parsed = parseSubtaskResult(
|
|
"Task polling timed out after 15 minutes. This may indicate the background task is stuck. Status: RUNNING",
|
|
);
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toContain("polling timed out");
|
|
});
|
|
|
|
it("recognises polling-timed-out with different durations", () => {
|
|
// `task_tool` emits `Task polling timed out after {N} minutes` where N
|
|
// varies with the configured subagent timeout. Guard against the regex
|
|
// accidentally being pinned to a specific number.
|
|
for (const n of [1, 5, 60]) {
|
|
const parsed = parseSubtaskResult(
|
|
`Task polling timed out after ${n} minutes. Status: RUNNING`,
|
|
);
|
|
expect(parsed.status).toBe("failed");
|
|
}
|
|
});
|
|
|
|
it("trims whitespace around cancelled and polling-timed-out prefixes", () => {
|
|
// Streaming chunks sometimes arrive with leading/trailing newlines.
|
|
expect(parseSubtaskResult(" Task cancelled by user. \n").status).toBe(
|
|
"failed",
|
|
);
|
|
expect(
|
|
parseSubtaskResult("\n\nTask polling timed out after 3 minutes").status,
|
|
).toBe("failed");
|
|
});
|
|
|
|
it("recognises task_tool pre-execution Error: returns via the wrapper", () => {
|
|
// `task_tool.py` returns three `Error:` strings for unknown subagent
|
|
// type, host-bash disabled, and "task disappeared". They share the
|
|
// ERROR_WRAPPER_PATTERN, not a dedicated prefix, so this guards
|
|
// against a refactor splitting them off.
|
|
for (const text of [
|
|
"Error: Unknown subagent type 'foo'. Available: bash, general-purpose",
|
|
"Error: Host bash subagent is disabled by configuration",
|
|
"Error: Task 1234 disappeared from background tasks",
|
|
]) {
|
|
expect(parseSubtaskResult(text).status).toBe("failed");
|
|
}
|
|
});
|
|
|
|
it("treats middleware-wrapped tool errors as terminal failures", () => {
|
|
// bytedance/deer-flow issue #3107 BUG-007: the parent-visible ToolMessage
|
|
// produced by ToolErrorHandlingMiddleware never matches the three legacy
|
|
// prefixes, so subtask cards stay stuck on "in_progress".
|
|
const parsed = parseSubtaskResult(
|
|
"Error: Tool 'task' failed with TypeError: 'AsyncCallbackManager' object is not iterable. Continue with available context, or choose an alternative tool.",
|
|
);
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toContain("AsyncCallbackManager");
|
|
});
|
|
|
|
it("treats any other Error: prefix as a terminal failure", () => {
|
|
const parsed = parseSubtaskResult("Error: subagent worker pool exhausted");
|
|
expect(parsed.status).toBe("failed");
|
|
});
|
|
|
|
it("keeps unrecognised non-error output as in_progress", () => {
|
|
// Streaming partial chunks should not flip the card to terminal early.
|
|
const parsed = parseSubtaskResult("Investigating ...");
|
|
expect(parsed.status).toBe("in_progress");
|
|
expect(parsed.error).toBeUndefined();
|
|
expect(parsed.result).toBeUndefined();
|
|
});
|
|
|
|
it("trims surrounding whitespace before matching prefixes", () => {
|
|
const parsed = parseSubtaskResult(" Task Succeeded. Result: ok ");
|
|
expect(parsed.status).toBe("completed");
|
|
expect(parsed.result).toBe("ok");
|
|
});
|
|
});
|
|
|
|
/**
|
|
* Structured-status path (bytedance/deer-flow#3146).
|
|
*
|
|
* The backend stamps `ToolMessage.additional_kwargs.subagent_status`
|
|
* directly. The frontend should prefer that over reverse-engineering it
|
|
* from the content string.
|
|
*/
|
|
describe("parseSubtaskResult — structured additional_kwargs (preferred path)", () => {
|
|
it("uses additional_kwargs.subagent_status when present", () => {
|
|
const parsed = parseSubtaskResult("Task Succeeded. Result: foo", {
|
|
[SUBAGENT_STATUS_KEY]: "completed",
|
|
});
|
|
expect(parsed.status).toBe("completed");
|
|
});
|
|
|
|
it("collapses cancelled / timed_out / polling_timed_out to failed for the card UI", () => {
|
|
for (const backendStatus of [
|
|
"cancelled",
|
|
"timed_out",
|
|
"polling_timed_out",
|
|
]) {
|
|
const parsed = parseSubtaskResult("anything at all", {
|
|
[SUBAGENT_STATUS_KEY]: backendStatus,
|
|
});
|
|
expect(parsed.status).toBe("failed");
|
|
}
|
|
});
|
|
|
|
it("uses subagent_error when supplied", () => {
|
|
const parsed = parseSubtaskResult("ignored content", {
|
|
[SUBAGENT_STATUS_KEY]: "failed",
|
|
[SUBAGENT_ERROR_KEY]: "boom from backend",
|
|
});
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toBe("boom from backend");
|
|
});
|
|
|
|
it("ignores empty / non-string subagent_error", () => {
|
|
const parsed = parseSubtaskResult("ignored content", {
|
|
[SUBAGENT_STATUS_KEY]: "failed",
|
|
[SUBAGENT_ERROR_KEY]: "",
|
|
});
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toBeUndefined();
|
|
});
|
|
|
|
it("falls back to prefix parsing when the structured status is missing", () => {
|
|
const parsed = parseSubtaskResult("Task Succeeded. Result: foo", {
|
|
// No subagent_status here — backend versions that pre-date the
|
|
// middleware stamping commit still need to render.
|
|
other_field: "irrelevant",
|
|
});
|
|
expect(parsed.status).toBe("completed");
|
|
expect(parsed.result).toBe("foo");
|
|
});
|
|
|
|
it("falls back to prefix parsing when the structured status is an unknown future value", () => {
|
|
const parsed = parseSubtaskResult("Task Succeeded. Result: foo", {
|
|
[SUBAGENT_STATUS_KEY]: "renamed_in_v3",
|
|
});
|
|
// Falls back to prefix and still finds the success path.
|
|
expect(parsed.status).toBe("completed");
|
|
});
|
|
|
|
it("structured status overrides legacy text — opposite content", () => {
|
|
// Defence: if backend sends `failed` structured but the content
|
|
// accidentally starts with "Task Succeeded.", we must trust the
|
|
// structured field. The structured field is the source of truth.
|
|
const parsed = parseSubtaskResult("Task Succeeded. Result: this is a lie", {
|
|
[SUBAGENT_STATUS_KEY]: "failed",
|
|
});
|
|
expect(parsed.status).toBe("failed");
|
|
// The misleading success body must be dropped — `result` is reserved
|
|
// for the completed pill, and the suspicious text isn't replayed as
|
|
// an error either.
|
|
expect(parsed.result).toBeUndefined();
|
|
expect(parsed.error).toBeUndefined();
|
|
});
|
|
|
|
it("back-fills `result` from the success-prefixed content when structured says completed", () => {
|
|
// The backend currently stamps `subagent_status: completed` but the
|
|
// success body still lives in `content`. Without back-fill the card
|
|
// would render an empty completed pill (regression flagged in PR #3154
|
|
// Copilot review).
|
|
const parsed = parseSubtaskResult(
|
|
"Task Succeeded. Result: investigated and produced a 3-page report",
|
|
{ [SUBAGENT_STATUS_KEY]: "completed" },
|
|
);
|
|
expect(parsed.status).toBe("completed");
|
|
expect(parsed.result).toBe("investigated and produced a 3-page report");
|
|
});
|
|
|
|
it("back-fills `error` from a wrapped-error body when structured says failed and no subagent_error", () => {
|
|
// Same regression on the failure side: the wrapper text is the only
|
|
// place the diagnostic message exists when the backend stamps the
|
|
// enum but not `subagent_error`.
|
|
const parsed = parseSubtaskResult(
|
|
"Error: Tool 'task' failed with TypeError: boom",
|
|
{ [SUBAGENT_STATUS_KEY]: "failed" },
|
|
);
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toContain("TypeError: boom");
|
|
});
|
|
|
|
it("leaves `error` undefined when structured says failed with no error and unrecognised text", () => {
|
|
// Don't dump arbitrary content into the error field — better to render
|
|
// an empty `failed` pill than to surface noise.
|
|
const parsed = parseSubtaskResult("partial streaming chunk", {
|
|
[SUBAGENT_STATUS_KEY]: "failed",
|
|
});
|
|
expect(parsed.status).toBe("failed");
|
|
expect(parsed.error).toBeUndefined();
|
|
});
|
|
});
|
|
|
|
/**
|
|
* Cross-language contract test (bytedance/deer-flow#3146).
|
|
*
|
|
* Loads the shared fixture at ``contracts/subagent_status_contract.json``
|
|
* and runs every case through the legacy prefix parser. The matching
|
|
* backend test (`backend/tests/test_subagent_status_contract.py`) runs
|
|
* the same cases through ``extract_subagent_status``. Any drift between
|
|
* the two implementations surfaces here.
|
|
*
|
|
* Status-collapse expectations:
|
|
* - `completed` → `completed`
|
|
* - `failed` → `failed`
|
|
* - `cancelled` / `timed_out` / `polling_timed_out` → `failed`
|
|
* (the frontend card has three pill states, not five)
|
|
* - `null` → `in_progress`
|
|
*/
|
|
describe("parseSubtaskResult — shared contract fixture", () => {
|
|
const expectedCardStatus = (backendStatus: string | null): string => {
|
|
if (backendStatus === null) return "in_progress";
|
|
if (backendStatus === "completed") return "completed";
|
|
return "failed";
|
|
};
|
|
|
|
for (const c of CONTRACT.cases) {
|
|
it(`legacy prefix parser matches contract: ${c.name}`, () => {
|
|
const parsed = parseSubtaskResult(c.content);
|
|
expect(parsed.status).toBe(expectedCardStatus(c.expected_status));
|
|
});
|
|
}
|
|
});
|