| name | description |
|---|---|
| image-generation | Use this skill when the user requests to generate, create, imagine, or visualize images including characters, scenes, products, or any visual content. Supports structured prompts and reference images for guided generation. |
Image Generation Skill
Overview
This skill generates high-quality images using structured prompts and a Python script. The workflow includes creating JSON-formatted prompts and executing image generation with optional reference images.
Core Capabilities
- Create structured JSON prompts for AIGC image generation
- Support multiple reference images for style/composition guidance
- Generate images through automated Python script execution
- Handle various image generation scenarios (character design, scenes, products, etc.)
Workflow
Step 1: Understand Requirements
When a user requests image generation, identify:
- Subject/content: What should be in the image
- Style preferences: Art style, mood, color palette
- Technical specs: Aspect ratio, composition, lighting
- Reference images: Any images to guide generation
- You don't need to check the folder under /mnt/user-data
Step 2: Create Structured Prompt
Generate a structured JSON file in /mnt/user-data/workspace/ with naming pattern: {descriptive-name}.json
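As a rough illustration of the naming pattern above, the following sketch writes a prompt dictionary to {descriptive-name}.json. The helper name save_prompt and its slug rule are assumptions made for this example; the skill itself only prescribes the directory and the file-name pattern.

```python
import json
import re
from pathlib import Path


def save_prompt(prompt: dict, name: str,
                workspace: str = "/mnt/user-data/workspace") -> Path:
    """Write a structured prompt to <workspace>/<descriptive-name>.json.

    Hypothetical helper: not part of the skill's script.
    """
    # Turn a free-form name into a lowercase, hyphen-separated file stem.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    path = Path(workspace) / f"{slug}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps any non-ASCII text readable in the file.
    path.write_text(json.dumps(prompt, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return path
```

For example, save_prompt({"prompt": "..."}, "Tokyo Street Woman") would produce a file named tokyo-street-woman.json in the workspace directory.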
Step 3: Execute Generation
Call the Python script:
python /mnt/skills/public/image-generation/scripts/generate.py \
--prompt-file /mnt/user-data/workspace/prompt-file.json \
--reference-images /path/to/ref1.jpg /path/to/ref2.png \
--output-file /mnt/user-data/outputs/generated-image.jpg \
--aspect-ratio 16:9
Parameters:
- --prompt-file: Absolute path to JSON prompt file (required)
- --reference-images: Absolute paths to reference images (optional, space-separated)
- --output-file: Absolute path to output image file (required)
- --aspect-ratio: Aspect ratio of the generated image (optional, default: 16:9)
[!NOTE] Do NOT read the Python file; just call it with the parameters.
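When the call is assembled programmatically, the parameter rules above can be sketched as a small argument builder. The function name build_command is hypothetical; only the script path and the four flags come from this document.

```python
import shlex

SCRIPT = "/mnt/skills/public/image-generation/scripts/generate.py"


def build_command(prompt_file, output_file, reference_images=(), aspect_ratio=None):
    """Return the argv list for one generate.py call (illustrative helper)."""
    cmd = ["python", SCRIPT, "--prompt-file", prompt_file]
    if reference_images:
        # All paths follow a single --reference-images flag, space-separated.
        cmd += ["--reference-images", *reference_images]
    cmd += ["--output-file", output_file]
    if aspect_ratio:
        # Omitting the flag leaves the script's default (16:9) in effect.
        cmd += ["--aspect-ratio", aspect_ratio]
    return cmd


example = build_command(
    "/mnt/user-data/workspace/prompt-file.json",
    "/mnt/user-data/outputs/generated-image.jpg",
    aspect_ratio="16:9",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(example))
```

Passing the list to subprocess.run(cmd, check=True) instead of a shell string avoids quoting problems with paths that contain spaces.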
Character Generation Example
User request: "Create a Tokyo street style woman character in 1990s"
Create prompt file: /mnt/user-data/workspace/asian-woman.json
{
  "characters": [{
    "gender": "female",
    "age": "mid-20s",
    "ethnicity": "Japanese",
    "body_type": "slender, elegant",
    "facial_features": "delicate features, expressive eyes, subtle makeup with emphasis on lips, long dark hair partially wet from rain",
    "clothing": "stylish trench coat, designer handbag, high heels, contemporary Tokyo street fashion",
    "accessories": "minimal jewelry, statement earrings, leather handbag",
    "era": "1990s"
  }],
  "negative_prompt": "blurry face, deformed, low quality, overly sharp digital look, oversaturated colors, artificial lighting, studio setting, posed, selfie angle",
  "style": "Leica M11 street photography aesthetic, film-like rendering, natural color palette with slight warmth, bokeh background blur, analog photography feel",
  "composition": "medium shot, rule of thirds, subject slightly off-center, environmental context of Tokyo street visible, shallow depth of field isolating subject",
  "lighting": "neon lights from signs and storefronts, wet pavement reflections, soft ambient city glow, natural street lighting, rim lighting from background neons",
  "color_palette": "muted naturalistic tones, warm skin tones, cool blue and magenta neon accents, desaturated compared to digital photography, film grain texture"
}
Execute generation:
python /mnt/skills/public/image-generation/scripts/generate.py \
--prompt-file /mnt/user-data/workspace/asian-woman.json \
--output-file /mnt/user-data/outputs/asian-woman-01.jpg \
--aspect-ratio 2:3
With reference images:
{
  "characters": [{
    "gender": "based on [Image 1]",
    "age": "based on [Image 1]",
    "ethnicity": "human from [Image 1] adapted to Star Wars universe",
    "body_type": "based on [Image 1]",
    "facial_features": "matching [Image 1] with slight weathered look from space travel",
    "clothing": "Star Wars style outfit - worn leather jacket with utility vest, cargo pants with tactical pouches, scuffed boots, belt with holster",
    "accessories": "blaster pistol on hip, comlink device on wrist, goggles pushed up on forehead, satchel with supplies, personal vehicle based on [Image 2]",
    "era": "Star Wars universe, post-Empire era"
  }],
  "prompt": "Character inspired by [Image 1] standing next to a vehicle inspired by [Image 2] on a bustling alien planet street in Star Wars universe aesthetic. Character wearing worn leather jacket with utility vest, cargo pants with tactical pouches, scuffed boots, belt with blaster holster. The vehicle adapted to Star Wars aesthetic with weathered metal panels, repulsor engines, desert dust covering, parked on the street. Exotic alien marketplace street with multi-level architecture, weathered metal structures, hanging market stalls with colorful awnings, alien species walking by as background characters. Twin suns casting warm golden light, atmospheric dust particles in air, moisture vaporators visible in distance. Gritty lived-in Star Wars aesthetic, practical effects look, film grain texture, cinematic composition.",
  "negative_prompt": "clean futuristic look, sterile environment, overly CGI appearance, fantasy medieval elements, Earth architecture, modern city",
  "style": "Star Wars original trilogy aesthetic, lived-in universe, practical effects inspired, cinematic film look, slightly desaturated with warm tones",
  "composition": "medium wide shot, character in foreground with alien street extending into background, environmental storytelling, rule of thirds",
  "lighting": "warm golden hour lighting from twin suns, rim lighting on character, atmospheric haze, practical light sources from market stalls",
  "color_palette": "warm sandy tones, ochre and sienna, dusty blues, weathered metals, muted earth colors with pops of alien market colors",
  "technical": {
    "aspect_ratio": "9:16",
    "quality": "high",
    "detail_level": "highly detailed with film-like texture"
  }
}
python /mnt/skills/public/image-generation/scripts/generate.py \
--prompt-file /mnt/user-data/workspace/star-wars-scene.json \
--reference-images /mnt/user-data/uploads/character-ref.jpg /mnt/user-data/uploads/vehicle-ref.jpg \
--output-file /mnt/user-data/outputs/star-wars-scene-01.jpg \
--aspect-ratio 16:9
Common Scenarios
Use different JSON schemas for different scenarios.
Character Design:
- Physical attributes (gender, age, ethnicity, body type)
- Facial features and expressions
- Clothing and accessories
- Historical era or setting
- Pose and context
Scene Generation:
- Environment description
- Time of day, weather
- Mood and atmosphere
- Focal points and composition
Product Visualization:
- Product details and materials
- Lighting setup
- Background and context
- Presentation angle
Specific Templates
Read the following template file only when matching the user request.
Output Handling
After generation:
- Images are typically saved in /mnt/user-data/outputs/
- Share generated images with the user using the present_files tool
- Provide a brief description of the generation result
- Offer to iterate if adjustments are needed
Tips: Enhancing Generation with Reference Images
For scenarios where visual accuracy is critical, use the image_search tool first to find reference images before generation.
Recommended scenarios for using image_search tool:
- Character/Portrait Generation: Search for similar poses, expressions, or styles to guide facial features and body proportions
- Specific Objects or Products: Find reference images of real objects to ensure accurate representation
- Architectural or Environmental Scenes: Search for location references to capture authentic details
- Fashion and Clothing: Find style references to ensure accurate garment details and styling
Example workflow:
- Call the image_search tool to find suitable reference images: image_search(query="Japanese woman street photography 1990s", size="Large")
- Download the returned image URLs to local files
- Use the downloaded images as the --reference-images parameter in the generation script
This approach significantly improves generation quality by providing the model with concrete visual guidance rather than relying solely on text descriptions.
Providers (Gemini / MiniMax)
This skill auto-selects the provider by environment variables (no CLI change):
- GEMINI_API_KEY set → use Gemini (default, unchanged).
- Only MINIMAX_API_KEY set → use MiniMax (/v1/image_generation, model image-01).
- Force one explicitly with IMAGE_GENERATION_PROVIDER=gemini|minimax.
MiniMax optional overrides: MINIMAX_API_HOST (default https://api.minimaxi.com),
MINIMAX_IMAGE_MODEL (default image-01). Reference images are sent as the MiniMax
subject_reference character image. The CLI and --prompt-file / --reference-images
/ --output-file / --aspect-ratio arguments are identical for both providers.
MiniMax prompt handling (provider-internal). Authoring is provider-agnostic — write
the same structured JSON regardless of which provider is active. MiniMax image-01
consumes a single text string, so the MiniMax path itself sends only the JSON prompt
field (the other fields such as style / composition / negative_prompt apply to the
Gemini path) and enables prompt_optimizer so MiniMax expands it server-side. MiniMax
caps that prompt at 1500 characters; if the prompt field is longer, the script returns
an error instead of calling the API. The Gemini path receives the full structured JSON.
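The MiniMax prompt handling described above can be pictured roughly as follows. The function name minimax_prompt and the handling of a missing prompt field are assumptions for illustration; the document states only that the prompt field is sent and that anything over 1500 characters is rejected before the API call.

```python
import json

MINIMAX_PROMPT_LIMIT = 1500  # character cap stated in this document


def minimax_prompt(prompt_json: str) -> str:
    """Return the text the MiniMax path would send for a structured prompt."""
    data = json.loads(prompt_json)
    # Only the top-level "prompt" field is used; style, composition, etc. are ignored.
    text = data.get("prompt") if isinstance(data, dict) else None
    if not isinstance(text, str) or not text.strip():
        # Assumed behavior: a missing or empty prompt is treated as an error.
        raise ValueError('structured prompt has no non-empty "prompt" field')
    if len(text) > MINIMAX_PROMPT_LIMIT:
        # Fail before any network call is made.
        raise ValueError(
            f"prompt is {len(text)} characters; "
            f"MiniMax allows at most {MINIMAX_PROMPT_LIMIT}"
        )
    return text
```

One consequence is that a structured file relying only on fields such as characters or style would carry nothing to the MiniMax path, which is why a consolidated prompt field matters there.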
Notes
- Always use English for prompts regardless of the user's language
- JSON format ensures structured, parsable prompts
- Reference images enhance generation quality significantly
- Iterative refinement is normal for optimal results
- For character generation, include the detailed character object plus a consolidated prompt field