mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-05-25 17:36:00 +00:00
test(skills): add evaluation + trigger analysis for systematic-literature-review (#2061)
* test(skills): add trigger eval set for systematic-literature-review skill 20 eval queries (10 should-trigger, 10 should-not-trigger) for use with skill-creator's run_eval.py. Includes real-world SLR queries contributed by @VANDRANKI (issue #1862 author) and edge cases for routing disambiguation with academic-paper-review. * test(skills): add grader expectations for SLR skill evaluation 5 eval cases with 39 expectations covering: - Standard SLR flow (APA/BibTeX/IEEE format selection) - Keyword extraction and search behavior - Subagent dispatch for metadata extraction - Report structure (themes, convergences, gaps, per-paper annotations) - Negative case: single-paper routing to academic-paper-review - Edge case: implicit SLR without explicit keywords * refactor(skills): shorten SLR description for better trigger rate Reduce description from 833 to 344 chars. Key changes: - Lead with "systematic literature review" as primary trigger phrase - Strengthen single-paper exclusion: "Not for single-paper tasks" - Remove verbose example patterns that didn't improve routing Tested with run_eval.py (10 runs/query): - False positive "best paper on RL": 67% → 20% (improved) - True positive explicit SLR query: ~30% (unchanged) Low recall is a routing-layer limitation, not a description issue — see PR description for full analysis. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
@@ -1,6 +1,6 @@
|
||||
---
|
||||
name: systematic-literature-review
|
||||
description: Use this skill whenever the user wants to survey, synthesize, or do a systematic literature review (SLR) across multiple academic papers on a topic. Triggers on queries like "review the literature on X", "survey recent papers about Y", "do an SLR on Z", "what does the literature say about W", "summarize recent research in A", or "compare findings across papers on B". Make sure to use this skill even when the user does not say the word "systematic" — the defining signal is that they want a synthesis across MANY papers rather than a deep read of a single one. Distinct from `academic-paper-review`, which does single-paper peer review. This skill searches arXiv, extracts structured metadata from each paper in parallel via subagents, synthesizes themes across the set, and emits a report in APA, IEEE, or BibTeX citation format.
|
||||
description: Use this skill when the user wants a systematic literature review, survey, or synthesis across multiple academic papers on a topic. Also covers annotated bibliographies and cross-paper comparisons. Searches arXiv and outputs reports in APA, IEEE, or BibTeX format. Not for single-paper tasks — use academic-paper-review for reviewing one paper.
|
||||
---
|
||||
|
||||
# Systematic Literature Review Skill
|
||||
|
||||
Reference in New Issue
Block a user