* fix(frontend): paginate workspace chat list beyond 50 threads (#3482)
The sidebar 'Recent chats' and /workspace/chats list were hard-capped
at the first 50 threads returned by threads.search. Replace the
single-shot useThreads() consumers with useInfiniteThreads() and add
an IntersectionObserver sentinel to each list so further pages are
fetched on demand.
In search mode on the chats page, the sentinel is replaced by an
explicit 'Load more' button to prevent the observer from draining the
entire backend list while the filtered view stays empty.
- Add useInfiniteThreads + page-size constant and pure cache helpers
(map/filterInfiniteThreadsCache, getInfiniteThreadsNextPageParam)
- Mirror rename / delete / stream-finish updates into the new
infinite cache so optimistic UI stays consistent
- Extend the e2e mock to honour limit/offset slicing
- Unit tests for the cache helpers and pagination boundary
- Playwright e2e covering chats page + sidebar load-more, and the
search-mode guard against runaway auto-pagination
- Add en/zh i18n entries for the search-mode load-more button
Fixes#3482
* docs(frontend): clarify infinite-threads offset semantics and test post-delete invariant
- Add docstring to getInfiniteThreadsNextPageParam explaining that TanStack
Query freezes the returned offset into pageParams once, so optimistic cache
mutations that shrink page lengths (filterInfiniteThreadsCache on delete)
cannot retroactively move the offset backwards. Delete/rename paths
reconcile against the backend via invalidateQueries in onSettled.
- Add unit test covering the post-delete invariant.
- Fix misleading comment in thread-list-infinite-scroll.spec.ts: the
thread-search mock does not sort by updated_at; it returns the array in
the order provided.
Addresses Copilot CR comments on #3485.
* fix(frontend): mirror onCreated upsert into infinite cache; add sidebar Load-older button
Address review feedback on #3485:
- New upsertThreadInInfiniteCache helper; useThreadStream onCreated now
upserts into both the legacy ['threads','search'] cache and the new
infinite cache, so a freshly created thread appears in the sidebar
immediately during streaming instead of only after the run finishes
and onSettled invalidates the query. Restores parity with main.
- Sidebar Recent Chats now exposes a visible 'Load older chats' button
alongside the IntersectionObserver sentinel, so keyboard-only users
and environments where IO is unavailable can still reach older
conversations.
- Add zh-CN / en-US / types entry for chats.loadOlderChats.
- Cover the new helper with 3 unit tests (no-op on uninitialised cache,
prepend new thread to first page, merge with existing entry without
duplication).
`client.py` imported the private `_build_middlewares` from `agent.py` across a
module boundary and called it as public API. Because the `_` name signals
"module-private, no external callers", any future rename or signature change
silently breaks the embedded `DeerFlowClient` path — and the test suite even
monkeypatched `deerflow.client._build_middlewares`, baking the leak in.
`DeerFlowClient` is a lead-agent variant that genuinely needs the lead agent's
full middleware composition, so make the dependency honest: promote the helper
to a documented public entry point `build_middlewares` and update every in-repo
caller. Found during #3341 review; #3341 already removed one such leak
(`_assemble_deferred` -> public `assemble_deferred_tools`) and left this one out
of scope on purpose.
- agent.py: rename def + both internal call sites; expand the docstring into a
public-entry-point contract and document the previously-undocumented
model_name / app_config / deferred_setup params
- client.py: import + call site now use the public name (removes the last
cross-module private import)
- scripts/tool-error-degradation-detection.sh: update its import + call site
- tests (5 files): update monkeypatch/patch targets and direct calls
- docs (backend/CLAUDE.md, plan_mode_usage.md, middlewares.mdx): sync the live
references that describe the symbol as current API
Pure mechanical rename, no behavior change. Historical design docs (rfc,
superpowers spec) intentionally keep the old name as point-in-time records.
Closes#3431
* docs(spec): MiniMax integration for generation skills + new music skill
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(plan): MiniMax generation providers implementation plan
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(skills): add importlib loader + FakeResp for skill tests
* test(skills): register loaded module in sys.modules; raise requests.HTTPError in FakeResp
* feat(image-generation): add MiniMax provider with env auto-detect
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(image-generation): guard unknown provider, derive ref MIME, strengthen tests
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(video-generation): add MiniMax provider with async poll/download
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(video-generation): surface base_resp errors while polling; add timeout test
* feat(podcast-generation): add MiniMax t2a_v2 provider with env auto-detect
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(podcast-generation): restore TTS credential guard; add volcengine + voice tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(music-generation): new MiniMax music skill via skill-creator
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(music-generation): treat empty lyrics as absent; test no-audio-data path
* refactor(skills): add request timeouts to MiniMax network calls
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Potential fix for pull request finding 'Explicit returns mixed with implicit (fall through) returns'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* fix(models): strip inconsistent user-message names for MiniMax chat
DeerFlow middlewares tag user messages with provenance names (user-input, summary, loop_warning); langchain serializes them into the OpenAI-compatible payload and MiniMax rejects mismatched user-message names with "user name must be consistent (2013)". PatchedChatMiniMax now drops the per-message name from user-role messages. Point the config.example MiniMax models at PatchedChatMiniMax so they also get reasoning_content mapping.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(image-generation): MiniMax sends JSON prompt field, guard 1500-char limit
MiniMax image-01 takes one text string capped at 1500 chars, but the skill was sending the whole structured JSON. The MiniMax provider now extracts the JSON `prompt` field (relying on prompt_optimizer to expand it) and fails fast with a clear error before calling the API when that field exceeds 1500 chars. Authoring stays provider-agnostic; Gemini still receives the full JSON.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(podcast-generation): per-provider TTS concurrency and retry/backoff
Each TTS provider owns its concurrency internally — MiniMax runs single-threaded to reduce rate-limit failures, Volcengine keeps 4 workers — with automatic retry and backoff on transient HTTP and base_resp errors. No caller-facing concurrency knob.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(skills): address Copilot review comments on generation skills
- video: add raise_for_status + timeout to the Gemini download/POST/poll calls so non-2xx responses surface as clear HTTP errors instead of JSON/KeyError or hangs
- video: check the task Fail status before the generic base_resp check so the failure keeps its task_id context
- video/image: create the output file parent directory before writing (matching music-generation) so nested output paths do not raise FileNotFoundError
- music: require a non-empty prompt and fail fast with ValueError instead of sending an empty prompt to the API
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(scripts): reclaim dev ports across worktrees in make stop/dev
All deer-flow worktrees (main checkout + linked worktrees) hardcode the same dev ports (8001/3000/2026), so a service started from any worktree must be reclaimable from another. stop_all now resolves the set of worktree roots (DEERFLOW_ROOTS) and treats a process as deer-flow-owned when its open files live under any of them. It also force-kills survivors on 2026 alongside 8001/3000, fixing `make dev` aborting on the nginx port preflight when a prior nginx lingered on 2026.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(view-image): hide the injected image-context message from the UI
ViewImageMiddleware injects a HumanMessage (text + base64 images) so the vision model can see viewed images, but it was the only internal injector that set neither hide_from_ui nor a hidden name, so it leaked into the chat UI (and IM channels) as a user bubble reading "Here are the images you've viewed:". Mark it with additional_kwargs={"hide_from_ui": True}, matching todo/dynamic_context injections, which the frontend isHiddenFromUIMessage and the channel sender already honor. The model still receives the full content.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(minimax): mark M2.7 models as text-only (no vision)
MiniMax M2.7 / M2.7-highspeed do not support vision; only M3 does. The
provider config asserted vision support for M2.7 in four places.
- config.example.yaml: 4 M2.7 entries -> supports_vision: false
- backend/docs/CONFIGURATION.md: M2.7 + highspeed -> supports_vision: false
- wizard: add LLMProvider.model_vision_overrides + extra_config_for() so
selecting an M2.7 model writes supports_vision: false while M3 (default)
keeps vision; wire it through setup_wizard.py
- tests: M2.7-highspeed fixture -> supports_vision=False; add
test_minimax_vision_is_per_model
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
The search input, filter tabs, and four action buttons were crammed into
a single horizontal row, which squeezed the search box into an unusable
sliver and truncated the "Summaries" filter tab to "Summarie".
Split the toolbar into two rows: search + filter tabs on the first,
actions on the second. The search input now keeps a usable min width,
filter tabs use whitespace-nowrap so they never truncate, and the
destructive "Clear all memory" button is pushed to the far right
(ml-auto) to separate it from the constructive actions.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(replay-e2e): match by conversation, not the living system prompt
The model-replay match key hashed the full input including the lead-agent
system prompt. That prompt is edited frequently (e.g. #3195 added a "File
Editing Workflow" section), so the committed fixture went stale the moment
the prompt changed on main — turning the Layer-2 render gate RED on every
unrelated PR (#3430, #3432, ...). This was a self-inflicted false positive.
Root-cause fix:
- replay_provider._canonical_messages now EXCLUDES the system message from
the hash. The conversation (human/ai/tool) is the stable contract that
identifies a recorded turn; the system prompt is an internal detail not
part of the front-back contract under test. (Mirrors how open-design keys
its mock picker on the user prompt, not the system internals.) Proven
robust: injecting a prompt edit no longer causes a replay miss.
- Layer-1 golden was BLIND to replay misses: the gateway swallows a miss
into an assistant error message, so the shape-only golden stayed green on
a stale fixture. It now inspects replay_provider.replay_misses() and fails
loud. (Layer-2 already fails on a miss.)
- Re-recorded write_read_file.ultra fixture + regenerated golden under the
new conversation-only hash.
- Layer-2 render spec: assert the in-graph auto-title (deterministic); the
follow-up suggestion is fired async and depends on a clean JSON model
output, so assert it only when the fixture captured one — never gate on
its absence (recording flakiness must not block CI).
- docs: REPLAY_E2E.md updated.
Verified: Layer-1 golden green (no miss), Layer-2 both specs green,
CI=true make test 4033 passed / 0 failed, frontend pnpm check clean.
* test(replay-e2e): restore suggestions coverage with a reliable capture
Addresses review feedback (the suggestion path was dropped from Layer-2):
- record spec now waits for the `/suggestions` response before checking
capture stability, so the recorded fixture reliably includes the
frontend-fired suggestions turn (previously the stability window could
return before suggestions fired, yielding a fixture without it).
- Re-recorded write_read_file.ultra: 5 turns (write_file, auto-title,
read_file, answer, suggestions). Golden unchanged — suggestions is a
separate /suggestions call, not part of the /runs/stream SSE sequence.
- Layer-2 spec: restore the hard `EXPECTED_SUGGESTION` assertion. With the
record spec now waiting for /suggestions, a fixture missing the suggestion
turn means a broken recording and must fail loud, not pass silently.
Verified: Layer-1 golden green (no miss), Layer-2 both specs green
(auto-title + suggestion render), frontend pnpm check clean.
* ci: re-trigger (flaky Docker Hub image pull in sandbox e2e, unrelated)
backend-unit-tests failed only in test_sandbox_orphan_reconciliation_e2e.py
with 'docker pull busybox:latest ... context deadline exceeded' — a CI-runner
network flake reaching Docker Hub, not related to this docs/tests-only change.
Empty commit to re-run CI.
---------
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
* test(e2e): record/replay front-back contract verification
Guards the front-back contract with a deterministic, key-free record/replay
harness (mirrors open-design's golden-trace approach):
- ReplayChatModel (tests/replay_provider.py): replays recorded LLM turns by a
normalized hash of the model input. Strips <system-reminder>/date/uuid/tmp-path
so one fixture replays across days and from both the browser and direct-POST
paths; a miss raises loudly (no silent divergence).
- Recording is record-through-browser (scripts/record_gateway.py +
build_fixture_from_jsonl.py + frontend/tests/e2e-record): a real run is driven
through the real frontend so captured inputs match exactly what the browser
sends; fixtures contain no API key.
- Layer 1 — backend golden (tests/test_replay_golden.py): replay through the real
gateway, assert the SSE event sequence == committed golden.
- Layer 2 — full-stack render (frontend/tests/e2e-real-backend): real Next.js +
real gateway (replay model) + Chromium; assert the replayed auto-title and
follow-up suggestions render. DOM assertions are the gate; visual regression is
a local dev gate (CI uploads the render as an artifact).
- CI (.github/workflows/replay-e2e.yml): both layers, triggered on EITHER side of
the contract (frontend/** or backend gateway/harness/fixtures).
* test(e2e): multi-run render-order cross-stack scenario (#3352)
Guards the dangerous front-back class where a backend ordering change
silently breaks a frontend assumption while both sides' unit tests stay
green. Reproduces issue #3352: backend list_by_thread returns runs
newest-first (#2932) and the frontend prepended per-run pages, inverting
chronological order once the checkpoint no longer held the older messages.
- tests/seed_runs_router.py: test-only seeder, mounted on the replay
gateway only when DEERFLOW_ENABLE_TEST_SEED=1 (never in the production
app). Seeds a thread with >=2 runs + per-run message events and no
checkpoint -- the #3352 precondition -- so the frontend per-run reload
path is the sole source of truth and the prepend inversion is observable.
- frontend/tests/e2e-real-backend/multi-run-order.spec.ts: drives the real
frontend against the real gateway, asserts the first run renders above
the second. Reverting the #3354 fix turns it red.
- replay-e2e.yml: trigger on the new replay test-infra paths.
- docs: REPLAY_E2E.md cross-stack scenario section.
* test(e2e): address Copilot review on the replay harness
- Fix stale recorder references (scripts/record_traces.py ->
scripts/record_gateway.py + scripts/build_fixture_from_jsonl.py) in
replay_provider.py, test_replay_golden.py, _replay_fixture.py.
- MODE_CONTEXT['ultra']: thinking_enabled False -> True, mirroring the
frontend's `context.mode !== 'flash'` (hooks.ts). It did not affect the
hashed input (Layer 1 golden still green), but the table now matches the
real frontend context it claims to mirror.
- replay_provider.py docstring: stop claiming memory is recorded-enabled;
the replay config disables memory/summarization for determinism (title
stays, as an in-graph deterministic call).
- record_gateway.py / run_replay_gateway.py: override DEER_FLOW_HOME instead
of setdefault, so an outer value can't leak into the hermetic harness.
- record_gateway.py: clear error when DEERFLOW_RECORD_OUT is unset (was a
bare KeyError).
- playwright.record.config.ts: forward OPENAI_*/DEERFLOW_RECORD_OUT only when
set, so the gateway raises a clear 'missing env' error instead of getting ''.
* test(e2e): address Copilot review round 2
- seed_runs_router.py: constrain SeedMessage.role to Literal['human','ai']
so a bad value is a clean 422 at the boundary instead of a 500
(KeyError on _EVENT_TYPE).
- record-write-read-file.spec.ts: waitForCaptureStable now throws on
timeout instead of returning the last count, so a truncated/partial
recording can't pass silently.
- real-backend-render.spec.ts: guard the suggestions JSON.parse; a
bracket-prefixed non-JSON turn falls back to '' so the existing
not.toBe('') assertion fails clearly instead of a generic parse throw.
* fix(frontend): truncate overflowing text in agent cards
Long custom agent names, descriptions, skills and tool-group labels
overflowed the agent card and broke its layout (issue #3389). The title
already had `truncate`, but it never took effect: an ancestor flex
container lacked `min-w-0`, so the flex item refused to shrink below its
content width.
- Restore the truncation chain by adding `min-w-0` to the title's flex
ancestors so `truncate` can finally take effect.
- Cap and ellipsize model / skill / tool-group badges via a small
`TruncatedBadge` (`block max-w-full truncate`).
- Reveal the full value on hover, but only when the text is actually
clipped (`TruncatedTooltip`, width + height detection), so names,
descriptions and labels stay readable without popping redundant
tooltips on short cards.
* fix(frontend): wrap unbreakable strings in agent card tooltips
A long token with no break opportunity (no spaces or hyphens) could still
overflow the tooltip horizontally. Add `break-words` next to the existing
`text-wrap` so such strings wrap instead of overflowing.
Addresses Copilot review feedback on tooltip wrapping robustness.
* fix(frontend): show agent card tooltips instantly
Drop the explicit `delayDuration` so card tooltips fall back to the
provider's default 0ms delay. Instant feedback is better UX for revealing
text that is already clipped, per maintainer review.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(subagent): structured subagent_status field over text parsing
Closes#3146.
## Why
The frontend used to derive subtask card state by string-matching the
leading text of the `task` tool's result. That contract surface was
fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases
where new backend wording (`Task cancelled by user.`,
`Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware`
exception wrappers) silently broke the card lifecycle. The frontend
fallback kept growing more prefixes; any future rewording would break
it again.
## Design
1. **Backend → frontend contract**: `ToolMessage.additional_kwargs`
carries `subagent_status` (one of `completed | failed | cancelled |
timed_out | polling_timed_out`) and an optional `subagent_error`
blob. The frontend prefers it over parsing `content`.
2. **Centralised stamping, not 8 sprinkled stamps**: rather than have
each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:`
paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware`
stamps the field after every task-tool call. Adding a new return
path in `task_tool.py` cannot now skip the stamp.
3. **Cross-language contract fixture**: the prefix→status mapping is
the one piece both sides must agree on. The shared fixture at
`contracts/subagent_status_contract.json` lists every backend return
string, the expected status, and what the error substring should
contain. Backend test (`backend/tests/test_subagent_status_contract.py`)
and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`)
both load that fixture and assert the same cases. A wording drift on
either side fails the matching language's test.
4. **Round-trip serialisation pinned**: the round-trip test asserts
`ToolMessage.model_dump_json()` → `model_validate_json()` preserves
`additional_kwargs.subagent_status`. Catches the case where a future
LangChain or Pydantic upgrade silently strips unknown kwargs.
5. **Frontend status collapse documented**: the backend has five status
values, the frontend card has three (`completed | failed |
in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all
collapse to `failed` with the original status preserved in `error`.
`parseSubtaskResult` returns `in_progress` for unknown values so a
backend that ships a new enum variant before the frontend upgrades
degrades to the legacy prefix fallback instead of getting pinned.
## Changes
Backend:
- `deerflow.subagents.status_contract` — new module exporting
`SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`,
`SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and
`make_subagent_additional_kwargs(status, error)`.
- `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status`
helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call`
stamp on the success path; `_build_error_message` stamps on the
wrapper path (carrying `ExcClass: detail` into `subagent_error`).
Non-task tools are untouched.
- New tests: `test_subagent_status_contract.py` (19 cases from the
shared fixture + status-enum / blank-error / unknown-status
rejection) and `test_tool_error_handling_subagent_stamp.py`
(middleware integration: terminal-content stamps, non-terminal
doesn't, non-task tools untouched, async path mirrors sync,
existing additional_kwargs survive, JSON round-trip preserved).
Frontend:
- `parseSubtaskResult(text, additionalKwargs?)` — prefers the
structured stamp; falls back to the legacy prefix matcher for
historical threads / unknown future status values.
- `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse.
- `message-list.tsx` passes `message.additional_kwargs` through.
- `subtask-result.test.ts` adds a structured-status block + a
fixture-driven contract block; legacy prefix tests stay green for
the fallback path.
Contract:
- `contracts/subagent_status_contract.json` — single source of truth
both languages load. Whitespace variants, varied N for polling
timeouts, the 3 pre-execution `Error:` returns task_tool produces,
and the middleware wrapper shape are all in there.
## Test plan
- `make lint` clean (backend + frontend).
- `pytest tests/test_subagent_status_contract.py
tests/test_tool_error_handling_subagent_stamp.py` → 37 passed.
- `pnpm test --run` → 103 passed (was 76, +27 new).
## Migration / fallback retirement
The text-prefix fallback stays in place until backend telemetry shows
the frontend never hits it for newly produced messages. At that point
a follow-up PR can drop the prefix branches and keep only the
structured-status branch.
Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131
(prior prefix-only fix), #3146 (this issue).
* fix(subtask): back-fill result/error from text when structured status present
Three follow-ups on the PR #3154 review:
1. `readStructuredStatus` no longer short-circuits the prefix parse.
The backend currently stamps only the `subagent_status` enum value;
the human-facing `result` body and wrapped-error message still live
in `ToolMessage.content`. Dropping the text parse meant successful
tasks rendered empty completed pills and wrapped failures lost their
diagnostic. Now both shapes get composed: structured status wins,
`result`/`error` come from text when both sides agree, and a lying
success body under a `failed` stamp is dropped instead of leaking.
2. Replace the ESM-incompatible `__dirname` fixture lookup in
subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`.
The frontend package is `"type": "module"`, so the previous path
would have thrown at runtime if anything ever changed under the
contract directory.
3. Drop the `$schema` reference from contracts/subagent_status_contract.json
pointing at a file that doesn't exist in the tree.
Three new tests cover the structured + text composition: completed
back-fills the success body, failed back-fills the wrapper text, and
unrecognised content under a `failed` stamp stays empty rather than
echoing noise.
* fix(frontend): preserve chronological order of thread history after context compression
Iterate runs from newest to match backend `list_by_thread` (newest-first) and the prepend semantics of the history loader, so refreshed history renders in A→B→C→D→E→F order.
Fixes#3352
* fix(frontend): auto-continue loading runs with no visible messages after context compression
* fix(frontend): surface backend detail when agent name check fails
The new-agent page caught AgentNameCheckError but only branched on
reason === "backend_unreachable". Everything else (notably the 422
"Invalid agent name '...'. Must match ^[A-Za-z0-9-]+$" response from
GET /api/agents/check when the user submits a name with disallowed
characters — trailing space, dot, Chinese, invisible whitespace from
copy-paste) fell through to the generic fallback "Could not verify
name availability — please try again", swallowing the detail that
already told the user exactly what to fix.
Add a request_failed branch that surfaces err.message (which
checkAgentName already populates from the backend's detail at
core/agents/api.ts). The disabled / backend_unreachable / unknown-
error paths are unchanged.
Pin the contract with unit tests covering: 200 success, fetch
rejection, 502/503/504 network errors, agents_api disabled detail,
422 validation detail carried verbatim, statusText fallback when
detail is absent, and a regression guard against misclassifying a
422 as agents_api disabled.
Closes#3041
* fix(frontend): localise the error prefix when surfacing backend detail
The previous commit surfaced the backend's raw `err.message` on the
new-agent page when the name check failed. The detail itself is
English (backend's `_validate_agent_name` text, any 5xx business
message, etc.) and dropping it bare into a zh-CN page produced a
jarring English-among-Chinese line that didn't match neighbouring
strings like "已存在同名智能体" / "无法验证名称可用性".
Add `nameStepCheckErrorWithDetail` as a templated string ("Name
check failed: {detail}" / "名称校验失败:{detail}"), mirroring the
existing `nameStepBootstrapMessage` `{name}` template pattern. The
page wraps `err.message` in it when present and falls back to the
plain `nameStepCheckError` when the detail is empty.
Rendered output (verified locally with a Console fetch mock that
returns 500 + detail):
zh-CN: 名称校验失败:Database connection lost: SQLAlchemy connection
pool exhausted (max 5 connections, all in use)
en-US: Name check failed: Database connection lost: SQLAlchemy
connection pool exhausted (max 5 connections, all in use)
The localised prefix tells the user *what operation* failed; the
raw detail tells them *why*. Translating the detail itself would
be lossy (any unbounded backend string would need a translation
table) and would break the debuggability the previous commit
delivered.
Refs #3041
* fix(frontend): distinguish backend detail from generated fallback in AgentNameCheckError
Addresses Copilot's review on #3048: the previous commits keyed off
`err.message`, but `checkAgentName` substitutes a generated fallback
string ("Failed to check agent name: ${statusText}") when the backend
sent no detail. That guaranteed `err.message` was always truthy, made
the `nameStepCheckError` fallback branch unreachable in practice, and
could surface awkward strings like "名称校验失败:Failed to check
agent name: Bad Gateway" in the UI.
Add an explicit `detail: string | null` field to AgentNameCheckError.
`checkAgentName` populates it only when the backend response actually
carried a string `detail` (defensive guard against the dict-shaped
detail that other deer-flow endpoints use for typed error codes).
The new-agent page now selects on `err.detail` instead of `err.message`
so the localised fallback wins when no real detail exists.
Also fix the prettier formatting that broke lint-frontend CI on the
previous push.
Test changes:
- The 422 carry-through test now asserts both `detail` and `message`
hold the backend string verbatim.
- A new "falls back to statusText in message but leaves detail null"
test pins the contract that no real detail ⇒ no UI surface leak.
- A new "treats non-string detail as null" test guards against future
backend schema drift toward dict-shaped detail.
Refs #3041#3048
When a user starts a new conversation, the sidebar list did not display
it until the AI finished streaming and generated a title. This made it
impossible to switch back to an in-progress conversation when working
with multiple threads concurrently.
Optimistically insert the new thread into the TanStack Query cache
during the `onCreated` callback so the sidebar renders a placeholder
entry ("New chat") as soon as the backend acknowledges thread creation.
The existing `onUpdateEvent` title handler and `onFinish` query
invalidation then update the entry in-place with the real title.
* fix(frontend): strip unclosed <think> tags from streaming AI content
During streaming, an opening <think> tag may arrive in one chunk
while the matching </think> arrives in a later chunk. The existing
splitInlineReasoning regex only matched fully closed pairs, so the
mid-flight reasoning was left in message.content and rendered into
the chat bubble via the markdown pipeline's rehypeRaw plugin until
the closing tag landed.
Extend splitInlineReasoning with a second pass: after stripping every
closed <think>...</think> pair, route any remaining content from a
lone opener to the reasoning slot and leave only the preceding
preamble in content. Closed-tag behavior is unchanged.
Covers every provider whose stream emits reasoning inline as <think>
tags (MiniMax streaming path, MindIE, PatchedChatOpenAI, and any
gateway-served DeepSeek/OpenAI-compatible model).
* style(frontend): apply prettier formatting to streaming reasoning tests
* fix(frontend): skip <think> split for literal think tags in inline code
Treats a `<think>` opener immediately preceded by a backtick as part of
markdown inline code rather than a streaming reasoning marker. Prevents
permanent content truncation when an AI message documents the `<think>`
tag literally (e.g. ``Use `<think>` markers``), where the streaming-safe
fallback would otherwise route the rest of the answer into the reasoning
panel because no `</think>` ever arrives.
Adds regression tests for both the post-stream and mid-stream cases.