feat(models): add vLLM provider support (#1860)

support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.

Co-authored-by: NmanQAQ <normangyao@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This commit is contained in:
NmanQAQ
2026-04-06 15:18:34 +08:00
committed by GitHub
parent 5fd2c581f6
commit dd30e609f7
8 changed files with 534 additions and 5 deletions
+22
View File
@@ -245,6 +245,28 @@ models:
# max_tokens: 8192
# temperature: 0.7
# Example: vLLM 0.19.0 (OpenAI-compatible, with reasoning toggle)
# DeerFlow's vLLM provider preserves vLLM reasoning across tool-call turns and
# toggles Qwen-style reasoning by writing
# extra_body.chat_template_kwargs.enable_thinking=true/false.
# Some reasoning models also require the server to be started with
# `vllm serve ... --reasoning-parser <parser>`.
# - name: qwen3-32b-vllm
# display_name: Qwen3 32B (vLLM)
# use: deerflow.models.vllm_provider:VllmChatModel
# model: Qwen/Qwen3-32B
# api_key: $VLLM_API_KEY
# base_url: http://localhost:8000/v1
# request_timeout: 600.0
# max_retries: 2
# max_tokens: 8192
# supports_thinking: true
# supports_vision: false
# when_thinking_enabled:
# extra_body:
# chat_template_kwargs:
# enable_thinking: true
# ============================================================================
# Tool Groups Configuration
# ============================================================================