fix(suggestions): strip inline <think> reasoning before parsing follow-up questions (#3435)

Reasoning models such as MiniMax-M3 inline their chain-of-thought into the message content as <think>...</think> (reasoning_split defaults to false) instead of a separate reasoning_content field. The follow-up-suggestions endpoint extracted the JSON array via find('[') / rfind(']'), which silently broke whenever the reasoning text contained '[' or ']', or when long thinking hit max_tokens and truncated before the array was emitted, returning empty suggestions.

- Add _strip_think_blocks() and apply it before JSON extraction; it removes complete <think>...</think> blocks (case-insensitive) and drops an unclosed <think> left by max_tokens truncation.
- Document the MiniMax thinking toggle in config.example.yaml (when_thinking_enabled: adaptive / when_thinking_disabled: disabled) so thinking_enabled=False actually disables reasoning on M3; note that M2.x models always think and rely on the defensive strip above.
- Tests cover complete/unclosed think blocks, brackets-inside-think, think + code-fence, and an end-to-end suggestions case reproducing the empty-result bug.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
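
The stripping step described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the commit: the names `strip_think_blocks` and `extract_suggestions`, the regular expressions, and the decision to return an empty list on a parse failure are all assumptions made for the example.

```python
import json
import re

# Hypothetical patterns; the real _strip_think_blocks() may be written differently.
# Complete blocks: non-greedy so that several <think>...</think> pairs are removed one by one.
_COMPLETE_THINK = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)
# Unclosed block: an opening tag with no close (e.g. after max_tokens truncation);
# everything from the tag to the end of the text is dropped.
_UNCLOSED_THINK = re.compile(r"<think>.*", re.IGNORECASE | re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Remove inline reasoning so only the model's final answer remains."""
    text = _COMPLETE_THINK.sub("", text)
    text = _UNCLOSED_THINK.sub("", text)
    return text.strip()


def extract_suggestions(content: str) -> list[str]:
    """Strip reasoning first, then locate the JSON array via find('[') / rfind(']')."""
    cleaned = strip_think_blocks(content)
    start = cleaned.find("[")
    end = cleaned.rfind("]")
    if start == -1 or end <= start:
        return []  # assumption: no array found means no suggestions
    try:
        parsed = json.loads(cleaned[start : end + 1])
    except json.JSONDecodeError:
        return []
    return [item for item in parsed if isinstance(item, str)]


if __name__ == "__main__":
    raw = '<think>Options [a] or [b]?</think>\n["What next?", "Why?"]'
    print(extract_suggestions(raw))
```

Without the strip, `find('[')` in the example input would land on the `[a]` inside the reasoning text, and the resulting slice would not be valid JSON, which matches the empty-result failure the commit message describes.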
2026-06-11 09:55:59 +00:00 · 2026-06-08 15:48:00 +08:00
parent 88759015e4
commit 3b105d1e5f
4 changed files with 115 additions and 2 deletions
@@ -263,7 +263,7 @@ CORS is same-origin by default when requests enter through nginx on port 2026. S
 | **Uploads** (`/api/threads/{id}/uploads`) | `POST /` - upload files (auto-converts PDF/PPT/Excel/Word); `GET /list` - list; `DELETE /{filename}` - delete |
 | **Threads** (`/api/threads/{id}`) | `DELETE /` - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
 | **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; active content types (`text/html`, `application/xhtml+xml`, `image/svg+xml`) are always forced as download attachments to reduce XSS risk; `?download=true` still forces download for other file types |
-| **Suggestions** (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
+| **Suggestions** (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized and inline reasoning (`<think>...</think>`, including unclosed/truncated blocks from reasoning models like MiniMax-M3) is stripped before JSON parsing |
 | **Thread Runs** (`/api/threads/{id}/runs`) | `POST /` - create background run; `POST /stream` - create + SSE stream; `POST /wait` - create + block; `GET /` - list runs; `GET /{rid}` - run details; `POST /{rid}/cancel` - cancel; `GET /{rid}/join` - join SSE; `GET /{rid}/messages` - paginated messages `{data, has_more}`; `GET /{rid}/events` - full event stream; `GET /../messages` - thread messages with feedback; `GET /../token-usage` - aggregate tokens |
 | **Feedback** (`/api/threads/{id}/runs/{rid}/feedback`) | `PUT /` - upsert feedback; `DELETE /` - delete user feedback; `POST /` - create feedback; `GET /` - list feedback; `GET /stats` - aggregate stats; `DELETE /{fid}` - delete specific |
 | **Runs** (`/api/runs`) | `POST /stream` - stateless run + SSE; `POST /wait` - stateless run + block; `GET /{rid}/messages` - paginated messages by run_id `{data, has_more}` (cursor: `after_seq`/`before_seq`); `GET /{rid}/feedback` - list feedback by run_id |