fix(suggestions): strip inline <think> reasoning before parsing follow-up questions (#3435)

Reasoning models such as MiniMax-M3 inline their chain-of-thought into the
message content as <think>...</think> (reasoning_split defaults to false)
instead of a separate reasoning_content field. The follow-up-suggestions
endpoint extracted the JSON array via find('[') / rfind(']'), which silently
broke whenever the reasoning text contained '[' or ']', or when long thinking
hit max_tokens and truncated before the array was emitted. In both cases the
endpoint returned empty suggestions.

- Add _strip_think_blocks() and apply it before JSON extraction; it removes
  complete <think>...</think> blocks (case-insensitive) and drops an unclosed
  <think> left by max_tokens truncation.
- Document the MiniMax thinking toggle in config.example.yaml
  (when_thinking_enabled: adaptive / when_thinking_disabled: disabled) so
  thinking_enabled=False actually disables reasoning on M3; note that M2.x
  models always think and rely on the defensive strip above.
- Tests cover complete/unclosed think blocks, brackets-inside-think,
  think + code-fence, and an end-to-end suggestions case reproducing the
  empty-result bug.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 17:35:57 +00:00 · 2026-06-08 15:48:00 +08:00
parent 88759015e4
commit 3b105d1e5f
4 changed files with 115 additions and 2 deletions
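
Reviewer note: the Python side of this change is not shown in the hunks below, so the following is only a minimal sketch of the behaviour the commit message describes, not the committed code. The names strip_think_blocks and extract_suggestions, and both regexes, are assumptions made for illustration; only the find('[') / rfind(']') extraction and the "strip complete blocks, then drop an unclosed <think>" order come from the message itself.

```python
from __future__ import annotations

import json
import re

# Hypothetical patterns; the real _strip_think_blocks() may differ.
# Complete <think>...</think> blocks, matched case-insensitively across newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)
# An opening <think> with no closing tag (e.g. output cut off by max_tokens).
_UNCLOSED_THINK = re.compile(r"<think>.*\Z", re.IGNORECASE | re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Remove inline reasoning so later parsing only sees the answer."""
    text = _THINK_BLOCK.sub("", text)      # complete blocks first
    text = _UNCLOSED_THINK.sub("", text)   # then a dangling, truncated block
    return text.strip()


def extract_suggestions(content: str) -> list[str]:
    """Strip reasoning, then pull the JSON array out with find/rfind."""
    cleaned = strip_think_blocks(content)
    start, end = cleaned.find("["), cleaned.rfind("]")
    if start == -1 or end <= start:
        return []  # no array emitted (for example, truncated while thinking)
    try:
        parsed = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, str)]
```

Without the strip step, a response such as `<think>options [a] or [b]?</think>["What next?"]` makes find('[') land inside the reasoning text, so the slice handed to the JSON parser is invalid and the endpoint falls back to an empty list. Stripping first leaves only the array, and a response truncated mid-thought yields an empty string, which is reported as "no suggestions" rather than a parse error.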
@@ -289,7 +289,23 @@ models:
  #   temperature: 1.0  # MiniMax requires temperature in (0.0, 1.0]
  #   supports_vision: true
  #   supports_thinking: true
+  #   # MiniMax inlines its chain-of-thought into `content` as <think>...</think>
+  #   # (reasoning_split defaults to false), not in a separate reasoning_content
+  #   # field. Declare the thinking toggle so non-thinking paths (flash mode,
+  #   # follow-up suggestions, title/memory generation) truly disable reasoning
+  #   # instead of wasting tokens on — and parsing around — inline <think> blocks.
+  #   when_thinking_enabled:
+  #     extra_body:
+  #       thinking:
+  #         type: adaptive
+  #   when_thinking_disabled:
+  #     extra_body:
+  #       thinking:
+  #         type: disabled

+  # NOTE: M2.x models always think — passing thinking:{type:disabled} has no
+  # effect (per MiniMax docs), so the toggle above is omitted for M2.7. The
+  # follow-up-suggestions endpoint strips inline <think> defensively regardless.
  # - name: minimax-m2.7
  #   display_name: MiniMax M2.7
  #   use: langchain_openai:ChatOpenAI
@@ -331,7 +347,23 @@ models:
  #   temperature: 1.0  # MiniMax requires temperature in (0.0, 1.0]
  #   supports_vision: true
  #   supports_thinking: true
+  #   # MiniMax inlines its chain-of-thought into `content` as <think>...</think>
+  #   # (reasoning_split defaults to false), not in a separate reasoning_content
+  #   # field. Declare the thinking toggle so non-thinking paths (flash mode,
+  #   # follow-up suggestions, title/memory generation) truly disable reasoning
+  #   # instead of wasting tokens on — and parsing around — inline <think> blocks.
+  #   when_thinking_enabled:
+  #     extra_body:
+  #       thinking:
+  #         type: adaptive
+  #   when_thinking_disabled:
+  #     extra_body:
+  #       thinking:
+  #         type: disabled

+  # NOTE: M2.x models always think — passing thinking:{type:disabled} has no
+  # effect (per MiniMax docs), so the toggle above is omitted for M2.7. The
+  # follow-up-suggestions endpoint strips inline <think> defensively regardless.
  # - name: minimax-m2.7
  #   display_name: MiniMax M2.7
  #   use: langchain_openai:ChatOpenAI