mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-10 09:25:57 +00:00
8d2e55a05f
* fix(subagent): structured subagent_status field over text parsing Closes #3146. ## Why The frontend used to derive subtask card state by string-matching the leading text of the `task` tool's result. That contract surface was fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases where new backend wording (`Task cancelled by user.`, `Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware` exception wrappers) silently broke the card lifecycle. The frontend fallback kept growing more prefixes; any future rewording would break it again. ## Design 1. **Backend → frontend contract**: `ToolMessage.additional_kwargs` carries `subagent_status` (one of `completed | failed | cancelled | timed_out | polling_timed_out`) and an optional `subagent_error` blob. The frontend prefers it over parsing `content`. 2. **Centralised stamping, not 8 sprinkled stamps**: rather than have each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:` paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware` stamps the field after every task-tool call. Adding a new return path in `task_tool.py` cannot now skip the stamp. 3. **Cross-language contract fixture**: the prefix→status mapping is the one piece both sides must agree on. The shared fixture at `contracts/subagent_status_contract.json` lists every backend return string, the expected status, and what the error substring should contain. Backend test (`backend/tests/test_subagent_status_contract.py`) and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`) both load that fixture and assert the same cases. A wording drift on either side fails the matching language's test. 4. **Round-trip serialisation pinned**: the round-trip test asserts `ToolMessage.model_dump_json()` → `model_validate_json()` preserves `additional_kwargs.subagent_status`. Catches the case where a future LangChain or Pydantic upgrade silently strips unknown kwargs. 5. **Frontend status collapse documented**: the backend has five status values, the frontend card has three (`completed | failed | in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all collapse to `failed` with the original status preserved in `error`. `parseSubtaskResult` returns `in_progress` for unknown values so a backend that ships a new enum variant before the frontend upgrades degrades to the legacy prefix fallback instead of getting pinned. ## Changes Backend: - `deerflow.subagents.status_contract` — new module exporting `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`, `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and `make_subagent_additional_kwargs(status, error)`. - `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status` helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call` stamp on the success path; `_build_error_message` stamps on the wrapper path (carrying `ExcClass: detail` into `subagent_error`). Non-task tools are untouched. - New tests: `test_subagent_status_contract.py` (19 cases from the shared fixture + status-enum / blank-error / unknown-status rejection) and `test_tool_error_handling_subagent_stamp.py` (middleware integration: terminal-content stamps, non-terminal doesn't, non-task tools untouched, async path mirrors sync, existing additional_kwargs survive, JSON round-trip preserved). Frontend: - `parseSubtaskResult(text, additionalKwargs?)` — prefers the structured stamp; falls back to the legacy prefix matcher for historical threads / unknown future status values. - `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse. - `message-list.tsx` passes `message.additional_kwargs` through. - `subtask-result.test.ts` adds a structured-status block + a fixture-driven contract block; legacy prefix tests stay green for the fallback path. Contract: - `contracts/subagent_status_contract.json` — single source of truth both languages load. Whitespace variants, varied N for polling timeouts, the 3 pre-execution `Error:` returns task_tool produces, and the middleware wrapper shape are all in there. ## Test plan - `make lint` clean (backend + frontend). - `pytest tests/test_subagent_status_contract.py tests/test_tool_error_handling_subagent_stamp.py` → 37 passed. - `pnpm test --run` → 103 passed (was 76, +27 new). ## Migration / fallback retirement The text-prefix fallback stays in place until backend telemetry shows the frontend never hits it for newly produced messages. At that point a follow-up PR can drop the prefix branches and keep only the structured-status branch. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior prefix-only fix), #3146 (this issue). * fix(subtask): back-fill result/error from text when structured status present Three follow-ups on the PR #3154 review: 1. `readStructuredStatus` no longer short-circuits the prefix parse. The backend currently stamps only the `subagent_status` enum value; the human-facing `result` body and wrapped-error message still live in `ToolMessage.content`. Dropping the text parse meant successful tasks rendered empty completed pills and wrapped failures lost their diagnostic. Now both shapes get composed: structured status wins, `result`/`error` come from text when both sides agree, and a lying success body under a `failed` stamp is dropped instead of leaking. 2. Replace the ESM-incompatible `__dirname` fixture lookup in subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`. The frontend package is `"type": "module"`, so the previous path would have thrown at runtime if anything ever changed under the contract directory. 3. Drop the `$schema` reference from contracts/subagent_status_contract.json pointing at a file that doesn't exist in the tree. Three new tests cover the structured + text composition: completed back-fills the success body, failed back-fills the wrapper text, and unrecognised content under a `failed` stamp stays empty rather than echoing noise.
189 lines
6.9 KiB
TypeScript
189 lines
6.9 KiB
TypeScript
import type { Subtask } from "./types";
|
|
|
|
export type SubtaskStatus = Subtask["status"];
|
|
|
|
export interface SubtaskResultUpdate {
|
|
status: SubtaskStatus;
|
|
result?: string;
|
|
error?: string;
|
|
}
|
|
|
|
/**
|
|
* Structured-status keys the backend stamps onto
|
|
* ``ToolMessage.additional_kwargs`` for every ``task`` tool result.
|
|
*
|
|
* The values mirror the Python contract in
|
|
* ``backend/packages/harness/deerflow/subagents/status_contract.py``
|
|
* (``SUBAGENT_STATUS_KEY`` / ``SUBAGENT_ERROR_KEY``). The cross-language
|
|
* fixture at ``contracts/subagent_status_contract.json`` pins both sides
|
|
* to the same values.
|
|
*/
|
|
export const SUBAGENT_STATUS_KEY = "subagent_status";
|
|
export const SUBAGENT_ERROR_KEY = "subagent_error";
|
|
|
|
/**
|
|
* Map from the backend ``subagent_status`` value to the frontend
|
|
* {@link SubtaskStatus} enum. The frontend collapses ``cancelled`` /
|
|
* ``timed_out`` / ``polling_timed_out`` into ``failed`` because the
|
|
* subtask card only renders three pill states. The richer backend
|
|
* vocabulary still survives on ``error`` for tooling that wants the
|
|
* detail.
|
|
*/
|
|
const STRUCTURED_STATUS_TO_SUBTASK: Record<string, SubtaskStatus> = {
|
|
completed: "completed",
|
|
failed: "failed",
|
|
cancelled: "failed",
|
|
timed_out: "failed",
|
|
polling_timed_out: "failed",
|
|
};
|
|
|
|
/**
|
|
* Prefix strings the backend `task` tool writes into its result `content`.
|
|
*
|
|
* These values are not user-facing copy — they are part of the
|
|
* backend↔frontend contract defined in
|
|
* `backend/packages/harness/deerflow/tools/builtins/task_tool.py` (returned
|
|
* from the tool body) and in
|
|
* `backend/packages/harness/deerflow/agents/middlewares/tool_error_handling_middleware.py`
|
|
* (wrapper for tool exceptions). Any change here must be paired with the
|
|
* matching backend change. Exported so a future structured-status migration
|
|
* can reference the same values from one place.
|
|
*
|
|
* `task_tool.py` also emits three `Error:` strings for pre-execution failures
|
|
* — unknown subagent type, host-bash disabled, and "task disappeared from
|
|
* background tasks". They are handled by {@link ERROR_WRAPPER_PATTERN}
|
|
* rather than dedicated prefixes because the wrapper already produces
|
|
* exactly the right `terminal failed` shape.
|
|
*/
|
|
export const SUCCESS_PREFIX = "Task Succeeded. Result:";
|
|
export const FAILURE_PREFIX = "Task failed.";
|
|
export const TIMEOUT_PREFIX = "Task timed out";
|
|
export const CANCELLED_PREFIX = "Task cancelled by user.";
|
|
export const POLLING_TIMEOUT_PREFIX = "Task polling timed out";
|
|
export const ERROR_WRAPPER_PATTERN = /^Error\b/i;
|
|
|
|
/**
|
|
* Map a `task` tool result to a {@link SubtaskStatus}.
|
|
*
|
|
* Bytedance/deer-flow issue #3146: prefers the structured
|
|
* ``additional_kwargs.subagent_status`` field the backend now stamps via
|
|
* ``ToolErrorHandlingMiddleware``. Falls back to the legacy prefix
|
|
* matching for messages that pre-date the stamping commit (historical
|
|
* threads, third-party clients, or any tool path that bypasses the
|
|
* middleware). Both shapes converge on the same {@link SubtaskStatus}
|
|
* vocabulary the card UI renders.
|
|
*
|
|
* When the structured field is present, the prefix parser is still run
|
|
* so the success `result` body and the wrapped-error message can be
|
|
* back-filled from `content`. Today the backend only stamps the
|
|
* `subagent_status` enum value — the human-facing payload still lives
|
|
* in `content`, so dropping the prefix parse would regress the subtask
|
|
* card display. Structured fields win on conflict: if `subagent_status`
|
|
* and the text disagree, the text-derived `result`/`error` are
|
|
* discarded so a malformed wrapper can't sneak through.
|
|
*
|
|
* Returning `in_progress` is the **deliberate** fallback for content that
|
|
* matches none of the known prefixes and carries no structured stamp.
|
|
* LangChain only ever emits a `ToolMessage` once the tool itself has
|
|
* returned (success or wrapped exception), so an unknown shape means
|
|
* "the contract changed underneath us" — surfacing it as still-running
|
|
* prompts the operator to investigate, where eagerly marking it
|
|
* terminal-failed would mask the drift.
|
|
*/
|
|
export function parseSubtaskResult(
|
|
text: string,
|
|
additionalKwargs?: Record<string, unknown> | null,
|
|
): SubtaskResultUpdate {
|
|
const fromText = parseFromText(text.trim());
|
|
const structured = readStructuredStatus(additionalKwargs);
|
|
if (!structured) {
|
|
return fromText;
|
|
}
|
|
|
|
const update: SubtaskResultUpdate = { status: structured.status };
|
|
// Structured `subagent_error` wins; otherwise inherit the text-derived
|
|
// error only when both sides agree on the status (so a "Task Succeeded"
|
|
// body can't bleed into a `failed` structured stamp and vice versa).
|
|
if (structured.error) {
|
|
update.error = structured.error;
|
|
} else if (
|
|
fromText.status === structured.status &&
|
|
fromText.error !== undefined
|
|
) {
|
|
update.error = fromText.error;
|
|
}
|
|
// Result body only matters for `completed`; require text agreement so
|
|
// a lying success prefix under a `failed` stamp is dropped.
|
|
if (
|
|
structured.status === "completed" &&
|
|
fromText.status === "completed" &&
|
|
fromText.result !== undefined
|
|
) {
|
|
update.result = fromText.result;
|
|
}
|
|
return update;
|
|
}
|
|
|
|
function parseFromText(trimmed: string): SubtaskResultUpdate {
|
|
if (trimmed.startsWith(SUCCESS_PREFIX)) {
|
|
return {
|
|
status: "completed",
|
|
result: trimmed.slice(SUCCESS_PREFIX.length).trim(),
|
|
};
|
|
}
|
|
|
|
if (trimmed.startsWith(FAILURE_PREFIX)) {
|
|
return {
|
|
status: "failed",
|
|
error: trimmed.slice(FAILURE_PREFIX.length).trim(),
|
|
};
|
|
}
|
|
|
|
if (trimmed.startsWith(TIMEOUT_PREFIX)) {
|
|
return { status: "failed", error: trimmed };
|
|
}
|
|
|
|
if (trimmed.startsWith(CANCELLED_PREFIX)) {
|
|
return { status: "failed", error: trimmed };
|
|
}
|
|
|
|
if (trimmed.startsWith(POLLING_TIMEOUT_PREFIX)) {
|
|
return { status: "failed", error: trimmed };
|
|
}
|
|
|
|
// ToolErrorHandlingMiddleware-style wrapper, or any other terminal error
|
|
// signal the backend forwards to the lead agent.
|
|
if (ERROR_WRAPPER_PATTERN.test(trimmed)) {
|
|
return { status: "failed", error: trimmed };
|
|
}
|
|
|
|
return { status: "in_progress" };
|
|
}
|
|
|
|
interface StructuredStatus {
|
|
status: SubtaskStatus;
|
|
error?: string;
|
|
}
|
|
|
|
function readStructuredStatus(
|
|
additionalKwargs: Record<string, unknown> | null | undefined,
|
|
): StructuredStatus | null {
|
|
if (!additionalKwargs) return null;
|
|
const rawStatus = additionalKwargs[SUBAGENT_STATUS_KEY];
|
|
if (typeof rawStatus !== "string") return null;
|
|
const mapped = STRUCTURED_STATUS_TO_SUBTASK[rawStatus];
|
|
if (mapped === undefined) {
|
|
// Unknown future status value — stay on the legacy prefix fallback
|
|
// so a backend that ships a new enum variant before the frontend
|
|
// upgrades still renders something predictable instead of getting
|
|
// pinned to "in_progress" by an empty branch.
|
|
return null;
|
|
}
|
|
const rawError = additionalKwargs[SUBAGENT_ERROR_KEY];
|
|
const result: StructuredStatus = { status: mapped };
|
|
if (typeof rawError === "string" && rawError.trim()) {
|
|
result.error = rawError;
|
|
}
|
|
return result;
|
|
}
|