fix: ensure researcher agent uses web search tool instead of generating URLs (#702) (#704)

* fix: ensure researcher agent uses web search tool instead of generating URLs (#702)

- Add enforce_researcher_search configuration option (default: True) to control web search requirement
- Strengthen researcher prompts in both English and Chinese with explicit instructions to use web_search tool
- Implement validate_web_search_usage function to detect if web search tool was used during research
- Add validation logic that warns when researcher doesn't use web search tool
- Enhance logging for web search tools with special markers for easy tracking
- Skip validation during unit tests to avoid test failures
- Update _execute_agent_step to accept config parameter for proper configuration access

This addresses issue #702 where the researcher agent was generating URLs on its own instead of using the web search tool.
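
A minimal sketch of the validation described above, not the code from this commit. It assumes messages are plain dictionaries with an optional `tool_calls` list; the real agent most likely uses framework message objects. The names `validate_web_search_usage` and `enforce_researcher_search` come from the commit message. `WEB_SEARCH_TOOL_NAMES` and `check_researcher_output` are hypothetical.

```python
from typing import Any, Iterable

# Hypothetical set of tool names treated as "web search";
# the real list is not shown on this commit page.
WEB_SEARCH_TOOL_NAMES = {"web_search"}


def validate_web_search_usage(messages: Iterable[dict[str, Any]]) -> bool:
    """Return True if any message records a call to a web search tool."""
    for message in messages:
        # Assumed shape: {"tool_calls": [{"name": "...", "args": {...}}]}
        for call in message.get("tool_calls") or []:
            if call.get("name") in WEB_SEARCH_TOOL_NAMES:
                return True
    return False


def check_researcher_output(
    messages: Iterable[dict[str, Any]],
    enforce_researcher_search: bool = True,
) -> list[str]:
    """Collect warnings rather than raising, mirroring the 'warns' wording above."""
    warnings: list[str] = []
    if enforce_researcher_search and not validate_web_search_usage(messages):
        warnings.append(
            "Researcher did not call the web_search tool; URLs may be fabricated."
        )
    return warnings
```

Returning warnings instead of raising keeps the check advisory, which matches the commit's description of a validation that warns rather than fails.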

* fix: addressed the code review comment

* fix the unit test error and update the code
Willem Jiang
2025-11-24 20:07:28 +08:00
committed by GitHub
parent cc9414f978
commit 478291df07
6 changed files with 84 additions and 11 deletions
+4 -1
@@ -37,7 +37,8 @@ You have access to two types of tools:
3. **Plan the Solution**: Determine the best approach to solve the problem using the available tools.
4. **Execute the Solution**:
- Forget your previous knowledge, so you **should leverage the tools** to retrieve the information.
-   - Use the {% if resources %}**local_search_tool** or{% endif %}**web_search** or other suitable search tool to perform a search with the provided keywords.
+   - **CRITICAL**: You MUST use the {% if resources %}**local_search_tool** or{% endif %}**web_search** tool to search for information. NEVER generate URLs on your own. All URLs must come from tool results.
+   - **MANDATORY**: Always perform at least one web search using the **web_search** tool at the beginning of your research. This is not optional.
- When the task includes time range requirements:
- Incorporate appropriate time-based search parameters in your queries (e.g., "after:2020", "before:2023", or specific date ranges)
- Ensure search results respect the specified time constraints.
@@ -71,6 +72,8 @@ You have access to two types of tools:
# Notes
+- **CRITICAL**: NEVER generate URLs on your own. All URLs must come from search tool results. This is a mandatory requirement.
+- **MANDATORY**: Always start with a web search. Do not rely on your internal knowledge.
- Always verify the relevance and credibility of the information gathered.
- If no URL is provided, focus solely on the search results.
- Never do any math or any file operations.
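
The rule in these notes, that every URL must come from search tool results, can also be checked after the fact. The sketch below is illustrative only and is not part of this commit. It assumes the final report and the raw tool results are available as plain strings. `extract_urls` and `find_unsourced_urls` are hypothetical helper names.

```python
import re

# Simple http(s) URL matcher; stops at whitespace and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_urls(text: str) -> set[str]:
    """Return the http(s) URLs found in text, with trailing punctuation stripped."""
    return {match.rstrip(".,;:") for match in URL_PATTERN.findall(text)}


def find_unsourced_urls(report: str, tool_results: list[str]) -> set[str]:
    """Return URLs cited in the report that appear in none of the tool results."""
    sourced: set[str] = set()
    for result in tool_results:
        sourced |= extract_urls(result)
    return extract_urls(report) - sourced
```

A non-empty result means the report cites at least one URL that no tool returned, which is the failure mode described in issue #702.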
+4 -1
@@ -37,7 +37,8 @@ CURRENT_TIME: {{ CURRENT_TIME }}
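3. **Plan the Solution**: Determine the best approach to solve the problem using the available tools.
4. **Execute the Solution**:
- Forget your previous knowledge, so you **should leverage the tools** to retrieve information.
-   - Use the {% if resources %}**local_search_tool** or {% endif %}**web_search** or another suitable search tool to perform a search with the provided keywords
+   - **Key requirement**: You must use the {% if resources %}**local_search_tool** or {% endif %}**web_search** tool to search for information. Never generate URLs yourself. All URLs must come from tool results
+   - **Mandatory requirement**: At the start of the research, you must perform at least one web search using the **web_search** tool. This is not optional.
- When the task includes time range requirements:
- Incorporate appropriate time-based search parameters in your queries (e.g., "after:2020", "before:2023", or specific date ranges)
- Ensure search results respect the specified time constraints.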
@@ -66,6 +67,8 @@ CURRENT_TIME: {{ CURRENT_TIME }}
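# Notes
+- **Key requirement**: Never generate URLs yourself. All URLs must come from search tool results. This is a mandatory requirement.
+- **Mandatory requirement**: Always start with a web search. Do not rely on your internal knowledge.
- Always verify the relevance and credibility of the information gathered.
- If no URL is provided, focus solely on the search results.
- Do not perform any math or file operations.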