mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-10 09:25:57 +00:00
feat: MiniMax provider for image/video/podcast skills + new music-generation skill (#3437)
* docs(spec): MiniMax integration for generation skills + new music skill Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(plan): MiniMax generation providers implementation plan Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(skills): add importlib loader + FakeResp for skill tests * test(skills): register loaded module in sys.modules; raise requests.HTTPError in FakeResp * feat(image-generation): add MiniMax provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(image-generation): guard unknown provider, derive ref MIME, strengthen tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(video-generation): add MiniMax provider with async poll/download Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(video-generation): surface base_resp errors while polling; add timeout test * feat(podcast-generation): add MiniMax t2a_v2 provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(podcast-generation): restore TTS credential guard; add volcengine + voice tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(music-generation): new MiniMax music skill via skill-creator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(music-generation): treat empty lyrics as absent; test no-audio-data path * refactor(skills): add request timeouts to MiniMax network calls Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding 'Explicit returns mixed with implicit (fall through) returns' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix(models): strip inconsistent user-message names for MiniMax chat DeerFlow middlewares tag user messages with provenance names (user-input, summary, loop_warning); langchain serializes them into the OpenAI-compatible payload and MiniMax rejects mismatched user-message names with "user name must be consistent (2013)". PatchedChatMiniMax now drops the per-message name from user-role messages. Point the config.example MiniMax models at PatchedChatMiniMax so they also get reasoning_content mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(image-generation): MiniMax sends JSON prompt field, guard 1500-char limit MiniMax image-01 takes one text string capped at 1500 chars, but the skill was sending the whole structured JSON. The MiniMax provider now extracts the JSON `prompt` field (relying on prompt_optimizer to expand it) and fails fast with a clear error before calling the API when that field exceeds 1500 chars. Authoring stays provider-agnostic; Gemini still receives the full JSON. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(podcast-generation): per-provider TTS concurrency and retry/backoff Each TTS provider owns its concurrency internally — MiniMax runs single-threaded to reduce rate-limit failures, Volcengine keeps 4 workers — with automatic retry and backoff on transient HTTP and base_resp errors. No caller-facing concurrency knob. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(skills): address Copilot review comments on generation skills - video: add raise_for_status + timeout to the Gemini download/POST/poll calls so non-2xx responses surface as clear HTTP errors instead of JSON/KeyError or hangs - video: check the task Fail status before the generic base_resp check so the failure keeps its task_id context - video/image: create the output file parent directory before writing (matching music-generation) so nested output paths do not raise FileNotFoundError - music: require a non-empty prompt and fail fast with ValueError instead of sending an empty prompt to the API Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(scripts): reclaim dev ports across worktrees in make stop/dev All deer-flow worktrees (main checkout + linked worktrees) hardcode the same dev ports (8001/3000/2026), so a service started from any worktree must be reclaimable from another. stop_all now resolves the set of worktree roots (DEERFLOW_ROOTS) and treats a process as deer-flow-owned when its open files live under any of them. It also force-kills survivors on 2026 alongside 8001/3000, fixing `make dev` aborting on the nginx port preflight when a prior nginx lingered on 2026. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(view-image): hide the injected image-context message from the UI ViewImageMiddleware injects a HumanMessage (text + base64 images) so the vision model can see viewed images, but it was the only internal injector that set neither hide_from_ui nor a hidden name, so it leaked into the chat UI (and IM channels) as a user bubble reading "Here are the images you've viewed:". Mark it with additional_kwargs={"hide_from_ui": True}, matching todo/dynamic_context injections, which the frontend isHiddenFromUIMessage and the channel sender already honor. The model still receives the full content. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(minimax): mark M2.7 models as text-only (no vision) MiniMax M2.7 / M2.7-highspeed do not support vision; only M3 does. The provider config asserted vision support for M2.7 in four places. - config.example.yaml: 4 M2.7 entries -> supports_vision: false - backend/docs/CONFIGURATION.md: M2.7 + highspeed -> supports_vision: false - wizard: add LLMProvider.model_vision_overrides + extra_config_for() so selecting an M2.7 model writes supports_vision: false while M3 (default) keeps vision; wire it through setup_wizard.py - tests: M2.7-highspeed fixture -> supports_vision=False; add test_minimax_vision_is_per_model Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
This commit is contained in:
@@ -113,7 +113,7 @@ models:
|
||||
base_url: https://api.minimax.io/v1
|
||||
max_tokens: 4096
|
||||
temperature: 1.0 # MiniMax requires temperature in (0.0, 1.0]
|
||||
supports_vision: true
|
||||
supports_vision: false # M2.7 is text-only; M3 supports vision
|
||||
|
||||
- name: minimax-m2.7-highspeed
|
||||
display_name: MiniMax M2.7 Highspeed
|
||||
@@ -123,7 +123,7 @@ models:
|
||||
base_url: https://api.minimax.io/v1
|
||||
max_tokens: 4096
|
||||
temperature: 1.0 # MiniMax requires temperature in (0.0, 1.0]
|
||||
supports_vision: true
|
||||
supports_vision: false # M2.7 is text-only; M3 supports vision
|
||||
- name: openrouter-gemini-2.5-flash
|
||||
display_name: Gemini 2.5 Flash (OpenRouter)
|
||||
use: langchain_openai:ChatOpenAI
|
||||
|
||||
@@ -179,8 +179,10 @@ class ViewImageMiddleware(AgentMiddleware[ViewImageMiddlewareState]):
|
||||
# Create the image details message with text and image content
|
||||
image_content = self._create_image_details_message(state)
|
||||
|
||||
# Create a new human message with mixed content (text + images)
|
||||
human_msg = HumanMessage(content=image_content)
|
||||
# Create a new human message with mixed content (text + images). This is
|
||||
# internal context for the model only, so hide it from the chat UI and IM
|
||||
# channels (matches the other middleware-injected context messages).
|
||||
human_msg = HumanMessage(content=image_content, additional_kwargs={"hide_from_ui": True})
|
||||
|
||||
logger.debug("Injecting image details message with images before LLM call")
|
||||
|
||||
|
||||
@@ -114,8 +114,27 @@ class PatchedChatMiniMax(ChatOpenAI):
|
||||
}
|
||||
else:
|
||||
payload["extra_body"] = {"reasoning_split": True}
|
||||
self._strip_user_message_names(payload)
|
||||
return payload
|
||||
|
||||
@staticmethod
|
||||
def _strip_user_message_names(payload: dict) -> None:
|
||||
"""Drop the per-message ``name`` field from user-role messages.
|
||||
|
||||
DeerFlow middlewares tag user messages with internal provenance names
|
||||
(``user-input``, ``summary``, ``loop_warning``, ...). ``langchain_openai``
|
||||
serializes those into the OpenAI-compatible request, but MiniMax requires
|
||||
every user-role ``name`` to be identical and otherwise rejects the request
|
||||
with ``invalid params, user name must be consistent (2013)``. MiniMax does
|
||||
not use the per-message author name, so strip it.
|
||||
"""
|
||||
messages = payload.get("messages")
|
||||
if not isinstance(messages, list):
|
||||
return
|
||||
for message in messages:
|
||||
if isinstance(message, dict) and message.get("role") == "user":
|
||||
message.pop("name", None)
|
||||
|
||||
def _convert_chunk_to_generation_chunk(
|
||||
self,
|
||||
chunk: dict,
|
||||
|
||||
@@ -715,7 +715,7 @@ def test_openai_compatible_provider_multiple_models(monkeypatch):
|
||||
base_url="https://api.minimax.io/v1",
|
||||
api_key="test-key",
|
||||
temperature=1.0,
|
||||
supports_vision=True,
|
||||
supports_vision=False, # M2.7 is text-only; M3 supports vision
|
||||
supports_thinking=False,
|
||||
)
|
||||
cfg = _make_app_config([m1, m2])
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
from langchain_core.messages import AIMessageChunk, HumanMessage
|
||||
from langchain_core.messages import AIMessage, AIMessageChunk, HumanMessage, SystemMessage
|
||||
|
||||
from deerflow.models.patched_minimax import PatchedChatMiniMax
|
||||
|
||||
@@ -21,6 +21,30 @@ def test_get_request_payload_preserves_thinking_and_forces_reasoning_split():
|
||||
assert payload["extra_body"]["reasoning_split"] is True
|
||||
|
||||
|
||||
def test_get_request_payload_strips_inconsistent_user_message_names():
|
||||
"""MiniMax rejects user messages whose `name` fields differ (error 2013).
|
||||
|
||||
DeerFlow middlewares tag user messages with internal provenance names
|
||||
(e.g. "summary", "user-input", "loop_warning"). langchain serializes those
|
||||
into the OpenAI-compatible payload, and MiniMax requires every user-role
|
||||
name to be consistent. Strip them so the request is accepted.
|
||||
"""
|
||||
model = _make_model()
|
||||
|
||||
payload = model._get_request_payload(
|
||||
[
|
||||
SystemMessage(content="system"),
|
||||
HumanMessage(content="older summary", name="summary"),
|
||||
AIMessage(content="ok"),
|
||||
HumanMessage(content="latest question", name="user-input"),
|
||||
]
|
||||
)
|
||||
|
||||
user_messages = [m for m in payload["messages"] if m["role"] == "user"]
|
||||
assert len(user_messages) == 2
|
||||
assert all(m.get("name") is None for m in user_messages)
|
||||
|
||||
|
||||
def test_create_chat_result_maps_reasoning_details_to_reasoning_content():
|
||||
model = _make_model()
|
||||
response = {
|
||||
|
||||
@@ -54,6 +54,29 @@ class TestProviders:
|
||||
assert providers["deepseek"].use == "deerflow.models.patched_deepseek:PatchedChatDeepSeek"
|
||||
assert providers["volcengine"].extra_config["api_base"] == "https://ark.cn-beijing.volces.com/api/v3"
|
||||
|
||||
def test_minimax_vision_is_per_model(self):
|
||||
"""M3 supports vision; M2.7 variants are text-only.
|
||||
|
||||
The provider-level extra_config carries the default (M3) capability, but
|
||||
extra_config_for() must drop vision when an M2.7 model is selected.
|
||||
"""
|
||||
providers = {provider.name: provider for provider in LLM_PROVIDERS}
|
||||
|
||||
for name in ("minimax", "minimax_cn"):
|
||||
provider = providers[name]
|
||||
assert provider.extra_config["supports_vision"] is True
|
||||
assert provider.extra_config_for("MiniMax-M3")["supports_vision"] is True
|
||||
assert provider.extra_config_for("MiniMax-M2.7")["supports_vision"] is False
|
||||
assert provider.extra_config_for("MiniMax-M2.7-highspeed")["supports_vision"] is False
|
||||
# Override must not mutate the shared provider-level config.
|
||||
assert provider.extra_config["supports_vision"] is True
|
||||
|
||||
def test_extra_config_for_returns_provider_config_without_override(self):
|
||||
"""Providers without per-model overrides return their config unchanged."""
|
||||
providers = {provider.name: provider for provider in LLM_PROVIDERS}
|
||||
openai = providers["openai"]
|
||||
assert openai.extra_config_for("gpt-5") == openai.extra_config
|
||||
|
||||
def test_llm_providers_have_required_fields(self):
|
||||
for p in LLM_PROVIDERS:
|
||||
assert p.name
|
||||
|
||||
@@ -356,6 +356,9 @@ class TestInjectImageMessage:
|
||||
# Mixed-content payload: list of text + image_url blocks
|
||||
assert isinstance(injected.content, list)
|
||||
assert any(isinstance(b, dict) and b.get("type") == "image_url" for b in injected.content)
|
||||
# Internal injection: must be hidden from the chat UI (and IM channels),
|
||||
# like the other middleware-injected context messages.
|
||||
assert injected.additional_kwargs.get("hide_from_ui") is True
|
||||
|
||||
|
||||
class TestBeforeModel:
|
||||
|
||||
Reference in New Issue
Block a user