feat(skill): add blocking-io-guard — SOP skill for blocking-IO triage and runtime anchors (#3503)

* feat(blocking-io): add changed-lines blocking-IO scanner (L1)

* feat(blocking-io): add scan-changed CLI wrapper

* feat(skill): add blocking-io-guard developer SOP skill

* docs(blocking-io): point contributors at the blocking-io-guard skill

* style(blocking-io): apply ruff format to scanner and tests

* docs(backend): document changed-lines blocking-IO scanner in CLAUDE.md

* feat(skill): add post-fix re-scan check and PR batching policy

* refactor(skill): fix SOP step ordering, align template with repo conventions

  - Move re-scan into an explicit 'apply the fix' step (was wedged after
    anchor generation while telling you to go back before the anchor)
  - Renumber steps 0-6; drop undefined 'L1' jargon
  - Mode A: document that the diff is <base>...HEAD (commit first)
  - Mode B: prefer make detect-blocking-io + findings JSON file
  - anchor template: module-level pytestmark per tests/blocking_io convention
  - CLAUDE.md: fix 'git diff --base' phrasing

* fix(skill): catch findings introduced without touching the blocking line

  Review follow-up: changed-line intersection alone misses the case where a
  new async caller exposes an old sync helper: the static finding sits on the
  untouched blocking line, so Mode A returned empty and the SOP stopped on a
  false 'no blocking-IO surface'. Selection is now a union over the changed
  files:

  - findings on added lines of git diff <base>...HEAD (kept: a second
    identical symbol in an already-flagged function collides on the stable
    key and only this selection sees it);
  - findings new versus the merge base, matched by (path, function, symbol),
    never line numbers.

  Base sources are materialized via git show <merge-base>:<path>; files
  absent at base count every head finding as new. SKILL.md now states the
  residual same-file-only blind spot (cross-file async callers) instead of
  treating an empty list as proof of zero exposure, and only requires reading
  sop-skeleton.md when generalizing to another detector domain.

* docs(skill): examples teach test-writing, the teeth check defines the rule

  All examples in the references/template are filesystem-flavored; make
  explicit that they are instances, not the SOP's boundary: the same rules
  apply to every detector category (FILE_IO, HTTP, SUBPROCESS, SLEEP) and
  acceptance is always red/green teeth, never similarity to an example.
  Neutralize the template's arrange comment accordingly.

* fix(blocking-io): harden changed-lines scanner per review

  - Dedup the union selection by the stable key (path, function, symbol)
    instead of dict identity, so a future selector returning copied dicts
    cannot silently empty the result.
  - parse_changed_lines now handles any unified diff: context lines advance
    the new-file counter, \-markers and deletions do not, and the counter
    resets at each +++ header. Previously correct only for --unified=0.
  - Add blocking_io_static.scan_source (in-memory scan); base-version
    comparison no longer round-trips through temp files.
  - Empty Mode A report now prints the same-file-only reachability caveat at
    the point of use instead of relying on the SOP text alone.

* docs(skill): bound best-effort cleanup when the offload sits in finally

  Lesson from the #3505 review: the SOP routinely drives 'offload the cleanup
  branch' transformations, and an awaited cleanup in finally can mask or
  stall the primary exception. One sentence in Step 2 closes that gap at the
  point where the fix is written.
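
Editorial sketch of the counter rules described for `parse_changed_lines` above: added lines are recorded, context lines advance the new-file counter, and deletions and `\` markers do not. This is not the repository's implementation; the name `added_lines_by_file` and the return shape are assumptions made for illustration, and it assumes Python 3.9+. Unlike the behaviour the message describes, this sketch takes the counter from each `@@` hunk header and tracks the remaining hunk length, so an added line whose text begins with `++` is not mistaken for a `+++` file header.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@"; a missing length means 1.
_HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines_by_file(diff_text: str) -> dict[str, set[int]]:
    """Map each new-side path to the new-file line numbers the diff added."""
    added: dict[str, set[int]] = {}
    path = None          # current new-side path, None for deleted files
    new_lineno = 0       # next line number on the new side
    old_left = new_left = 0  # lines still expected in the current hunk

    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:  # inside a hunk body
            if line.startswith("\\"):
                continue  # "\ No newline at end of file": counts for nothing
            if line.startswith("+"):
                if path is not None:
                    added.setdefault(path, set()).add(new_lineno)
                new_lineno += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1  # deletion: new-side counter does not move
            else:
                new_lineno += 1  # context line exists on both sides
                old_left -= 1
                new_left -= 1
            continue

        if line.startswith("+++ "):
            target = line[4:].split("\t")[0]
            path = None if target == "/dev/null" else target.removeprefix("b/")
            continue

        match = _HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_lineno = int(match.group(2))
            new_left = int(match.group(3)) if match.group(3) is not None else 1

    return added
```

The output of `git diff <base>...HEAD` can be passed in as `diff_text`; the resulting line sets are what the "findings on added lines" selection would be compared against.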
2026-06-13 19:06:01 +00:00 · 2026-06-12 10:20:38 +08:00
parent 330a2ff8c5
commit dc2ababf00
10 changed files with 703 additions and 4 deletions
@@ -0,0 +1,141 @@
+---
+name: blocking-io-guard
+description: Ensure async-path backend code that could block the asyncio event loop is protected by a teeth-verified runtime anchor in tests/blocking_io/. Use when changing backend Python under app/, packages/harness/deerflow/, or scripts/, when running a blocking-IO triage round over the whole repo, or when a reviewer/CI asks for blocking-IO coverage. Runs a deterministic scan (changed-lines or full-repo), routes each candidate, drafts/extends an anchor, and proves it fails when the blocking IO regresses.
+---
+
+# Blocking-IO Guard Skill
+
+Help a contributor ship backend async changes together with the runtime anchor
+that lets DeerFlow's blocking-IO CI gate actually see the new code. The dynamic
+detector only catches blocking IO on paths a test executes — this skill closes
+that gap, either for your own diff or for a repo-wide triage round.
+
+Read `references/good-anchor-rules.md` before writing any anchor.
+Only read `references/sop-skeleton.md` when generalizing this SOP to another
+detector domain — it is not needed to execute the steps below.
+
+## When to use
+
+- Your change touches Python under `backend/app/`,
+  `backend/packages/harness/deerflow/`, or `backend/scripts/` and may run on
+  the async event loop (Mode A). If unsure, run Step 0 — it answers
+  deterministically.
+- You are doing a maintenance triage round over the existing codebase
+  (Mode B).
+
+## SOP (router)
+
+### Step 0 — Scope (deterministic)
+
+**Mode A — your own diff** (default, pre-PR). From repo root:
+
+```bash
+uv run --project backend python scripts/scan_changed_blocking_io.py --base origin/main
+```
+
+Lists blocking-IO candidates your change introduces: findings on lines the
+diff added, **plus** findings that are new versus the merge base — the latter
+catches a new async caller exposing an old sync helper whose blocking line is
+not in the diff. The diff is `<base>...HEAD`, so **commit your work first** —
+uncommitted lines are not selected.
+
+If the list is empty, this change introduces no blocking-IO surface *that the
+static detector can see in the changed files*. One residual blind spot
+remains: reachability is same-file only, so a new async caller of a sync
+helper **defined in another file** is invisible to both selections. If your
+diff adds an async call into a helper that lives elsewhere, check that helper
+manually (codegraph or `git grep`) before stopping.
+
+**Mode B — full-repo triage round.** From repo root:
+
+```bash
+make detect-blocking-io
+```
+
+Prints a summary and writes the complete structured finding list to
+`.deer-flow/blocking-io-findings.json`. Work HIGH priority first; do not start
+MEDIUM until every HIGH is dispositioned (fixed, guarded, or recorded
+NO-ACTION).
+
+**Batching policy (PR sizing).** One **fix unit** per PR while any HIGH
+remains: a fix unit is one root cause — usually a single HIGH, but two HIGHs
+resolved by the same one-place fix belong together. Once no HIGH remains,
+MEDIUM/LOW may be batched (about five per round, grouped by module or by
+disposition) so each PR stays reviewable. A new Blockbuster rule is never
+batched with anything — it always ships alone (see Step 5).
+
+Both modes emit the same JSON shape per finding: `priority`, `location`
+(path/line/function), `blocking_call` (category/operation/symbol),
+`event_loop_exposure`, `reason`, `code`. Priority is a deterministic review
+ordering, not proof of a bug — Step 1 makes the actual call.
+
+### Step 1 — Judge each candidate (router)
+
+Read the code around each candidate and route it:
+
+- **Already offloaded** (`asyncio.to_thread`, `run_in_executor`, async client) →
+  **GUARD**: add/extend an anchor that locks the offload so a future edit cannot
+  move it back onto the loop.
+- **On the loop, not offloaded** → **FIX+ANCHOR**: offload the production code
+  (your fix), then add an anchor that guards it.
+- **Not actually exposed / acceptable** (rare: scanner false positive,
+  startup-only code) → **NO-ACTION**: record one line of why.
+- **Cross-file caveat**: the scanner's async reachability is same-file only
+  (`ASYNC_REACHABLE_SAME_FILE`). If the candidate is a *sync helper*, check for
+  async callers in other files (codegraph or `git grep`) before deciding
+  NO-ACTION.
+
+### Step 2 — Apply the fix, then re-scan (FIX+ANCHOR only)
+
+Offload the blocking call in production code, then re-run the Step 0 scan and
+confirm the candidate no longer appears. Match by the stable key
+**(path, function, symbol)**: line numbers shift after edits, so never
+compare by line.
+If the offloaded call sits in a `finally` / cleanup path, keep it best-effort
+and bounded (swallow-and-log, `asyncio.wait_for`) so a failing or hung
+cleanup cannot mask the primary exception.
+
+- The finding must disappear. If it still shows, the fix did not remove the
+  blocking pattern (e.g. the call is still a direct call, not offloaded) —
+  go back before touching any test.
+- GUARD / NO-ACTION routes skip this step: a residual finding there is
+  *expected* (the raw call still exists inside a sync helper with the offload
+  at the caller, or the exposure was judged acceptable).
+
+This is pattern-level feedback in seconds; it complements but never replaces
+Step 5 — only the runtime gate proves the event loop is actually protected.
+
+### Step 3 — Check existing anchors
+
+Look in `backend/tests/blocking_io/` for a test that drives the production async
+entry point reaching this candidate's branch.
+
+- Covers this branch already → go to Step 5 (re-verify teeth).
+- Covers the entry point but not this branch (e.g. happy path covered,
+  cleanup/404/409 not) → **extend** that anchor.
+- None → create one from `templates/anchor.template.py`.
+
+### Step 4 — Generate / extend the anchor
+
+Follow `references/good-anchor-rules.md`. Drive the *specific* branch (e.g. force
+the create failure that hits the cleanup `shutil.rmtree`). Never bypass the
+blocking surface with a test-only `asyncio.to_thread` wrapper.
+
+### Step 5 — Verify teeth (mandatory; also the anchor-vs-rule discriminator)
+
+1. Reintroduce the block (GUARD: temporarily revert the offload; FIX+ANCHOR: run
+   against the pre-fix code).
+2. Run `cd backend && make test-blocking-io` (or target the one test). It **must
+   go RED**.
+3. Restore the fix. It **must go GREEN**.
+
+A real block that stays GREEN means Blockbuster has no rule for that
+primitive — that is the **RULE** route; see `references/good-anchor-rules.md`
+for the admission criteria before adding one.
+
+### Step 6 — Deliver
+
+Commit the anchor(s) with your change; `make test-blocking-io` green. In the PR,
+note: candidates found, each disposition, the re-scan result (Step 2), and
+the teeth evidence (red→green). Include the reason for any NO-ACTION. A new
+Blockbuster rule, if any, goes in its own commit with the evidence from Step 5.
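
Editorial sketch for Step 2's advice on cleanup paths: one possible shape for an offloaded cleanup in `finally` that is bounded with `asyncio.wait_for` and swallows and logs its own failure. `best_effort_cleanup` and `create_workspace` are hypothetical names, not DeerFlow code, and the 5-second timeout is an arbitrary assumption. Requires Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


async def best_effort_cleanup(func, *args, timeout: float = 5.0) -> None:
    """Run a blocking cleanup off the loop, bounded in time, never re-raising."""
    try:
        await asyncio.wait_for(asyncio.to_thread(func, *args), timeout)
    except Exception:
        # Covers both a failing cleanup and the timeout. CancelledError is a
        # BaseException, so task cancellation still propagates.
        logger.warning("best-effort cleanup failed or timed out", exc_info=True)


async def create_workspace(root: Path, populate, remove=shutil.rmtree) -> Path:
    """Hypothetical async entry point with a cleanup branch on failure."""
    workdir = root / "workspace"
    await asyncio.to_thread(workdir.mkdir, parents=True)
    created = False
    try:
        await populate(workdir)
        created = True
        return workdir
    finally:
        if not created:
            # The primary exception is in flight here; the helper cannot
            # replace it with a cleanup error or stall past the timeout.
            await best_effort_cleanup(remove, workdir)
```

One limit of this shape: `asyncio.wait_for` stops the coroutine from waiting, but the worker thread running the cleanup is not interrupted and may still finish later.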
@@ -0,0 +1,65 @@
+# Good anchor rules + teeth (blocking-IO fill)
+
+Distilled from `backend/docs/BLOCKING_IO_DETECTION.md`. An anchor lives in
+`backend/tests/blocking_io/`; the suite's conftest runs each test under the
+strict Blockbuster gate scoped to `app.*` / `deerflow.*`.
+
+The examples in this file and in `templates/` are all filesystem-flavored.
+They demonstrate how to *write* the test, not what the SOP covers: the same
+rules apply to every category the detector reports (FILE_IO, HTTP,
+SUBPROCESS, SLEEP), and the acceptance criterion is always the teeth check
+below — never similarity to an example.
+
+## A good anchor
+
+- Calls the **real production async entry point** — not a low-level helper,
+  unless that helper *is* the entry point production executes.
+- Does **not** bypass the blocking surface with a test-only
+  `asyncio.to_thread` / `run_in_executor` wrapper.
+- Uses **real local filesystem** inputs when the bug shape is filesystem IO.
+- Mocks **only** the external dependency boundary (network service, third-party
+  saver), never the offload being guarded.
+- Drives the **specific branch** you are protecting (error / cleanup / 404 /
+  409), not just the happy path.
+
+## Teeth (the acceptance test)
+
+An anchor only counts if the gate actually fires when the code blocks:
+
+1. Reintroduce the block (revert the offload, or run pre-fix code).
+2. `cd backend && make test-blocking-io` → the anchor **must fail** (RED).
+3. Restore the fix → the anchor **must pass** (GREEN).
+
+A green-on-happy-path anchor with no proven red is fake coverage. Don't ship it.
+
+## The RULE route (rare; strict admission criteria)
+
+Blockbuster's built-in rules cover the common blocking primitives well. The
+two deliberate openings in this SOP are:
+
+1. **Coverage opening** (the normal case): the rules already see the
+   primitive — you only need an anchor so runtime detection executes the real
+   business path and CI prevents regression.
+2. **Rule opening** (rare): you reintroduced a *real* block and the gate
+   stayed GREEN — Blockbuster has no rule for that primitive.
+
+A project rule lives in `_PROJECT_BLOCKING_RULES` inside
+`backend/tests/support/detectors/blocking_io_runtime.py` and changes detection
+for the **entire** blocking-IO suite — global blast radius. Admission criteria
+for adding one:
+
+- You have the **fails-to-fail anchor** as evidence: a good anchor (per the
+  rules above) that drives a genuinely blocking path and stays green. No
+  evidence, no rule.
+- The primitive is a real blocking call (verified against its implementation
+  or docs), not a false positive of the static detector.
+- The rule ships in its **own commit**, naming the primitive, the anchor that
+  exposed the gap, and the suite-wide impact. Run the full
+  `make test-blocking-io` suite after adding it — a new rule can turn other
+  previously-green tests red, and each such red is either a real latent bug
+  (fix it) or rule overreach (narrow the rule).
+- If you are not in a position to own that blast radius (e.g. external
+  contributor), escalate to a maintainer with the evidence instead.
+
+**Never add a runtime rule just because a path is untested** — that case needs
+an anchor, not a rule.
@@ -0,0 +1,34 @@
+# SOP skeleton (generic shape — extraction seam)
+
+This is the domain-agnostic shape the blocking-IO skill instantiates. It exists
+so a second detector/gate domain can reuse the flow without copying it. Do not
+add machinery for that until a second domain actually appears (YAGNI).
+
+A domain provides:
+- a **static detector** that can scan a diff (or the whole tree) and emit
+  located candidates,
+- a **CI gate** that fails when the bad pattern executes,
+- a **test location** for guard tests,
+- **good-test rules** for that gate,
+- a **teeth definition** (how to make the gate fire on purpose).
+
+Steps:
+1. **Scope (deterministic):** intersect the diff's added lines with the
+   detector's findings → candidates this change introduced/touched. (Or, in
+   triage mode, take the full finding list ordered by priority.)
+2. **Judge (router):** per candidate — guard existing fix / fix + guard /
+   no-action / rule (the gate cannot see the primitive).
+3. **Fix + re-scope (fixes only):** apply the fix, re-run the detector; the
+   fixed candidate must vanish from the findings (match by a stable key, not
+   line numbers). Pattern-level feedback in seconds — complements, never
+   replaces, step 5.
+4. **Generate:** draft or extend a guard test per the good-test rules, driving
+   the specific branch.
+5. **Verify teeth:** make the bad pattern happen → gate must fail; restore →
+   gate must pass. A pattern that stays green while genuinely bad is the
+   "rule" signal, not a coverage success.
+6. **Deliver:** commit the verified guard test; any gate-rule change ships in
+   its own commit with the fails-to-fail evidence attached.
+
+To add a domain: supply a new fill doc (like `good-anchor-rules.md`) + detector,
+and promote this file into a parent skill the instances point at.
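
Editorial sketch of the stable-key match from step 3, instantiated for the blocking-IO finding shape. The nested dict layout (`location.path`, `location.function`, `blocking_call.symbol`) is inferred from the field list in SKILL.md and is an assumption, as are the helper names `stable_key`, `new_versus_base`, and `fix_confirmed`.

```python
from typing import Any

Finding = dict[str, Any]
StableKey = tuple[str, str, str]


def stable_key(finding: Finding) -> StableKey:
    """(path, function, symbol): survives edits that shift line numbers."""
    return (
        finding["location"]["path"],
        finding["location"]["function"],
        finding["blocking_call"]["symbol"],
    )


def new_versus_base(head: list[Finding], base: list[Finding]) -> list[Finding]:
    """Head findings whose key is absent at base, deduplicated by key."""
    base_keys = {stable_key(f) for f in base}
    seen: set[StableKey] = set()
    selected: list[Finding] = []
    for finding in head:
        key = stable_key(finding)
        if key in base_keys or key in seen:
            continue
        seen.add(key)
        selected.append(finding)
    return selected


def fix_confirmed(candidate: Finding, rescan: list[Finding]) -> bool:
    """True when the fixed candidate no longer appears in the re-scan."""
    return stable_key(candidate) not in {stable_key(f) for f in rescan}
```

Deduplicating by key rather than by object identity means copied finding dicts still compare equal, which is the property the commit message calls out for the union selection.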
@@ -0,0 +1,32 @@
+"""Template: a tests/blocking_io/ runtime anchor.
+
+Copy into backend/tests/blocking_io/test_<area>.py and adapt. The suite's
+conftest already wraps every test here in the strict Blockbuster gate, so you do
+NOT import or activate the detector — just drive the real async entry point.
+
+Teeth check before you commit (see references/good-anchor-rules.md):
+  1. reintroduce the block  -> `cd backend && make test-blocking-io` must FAIL
+  2. restore the fix        -> it must PASS
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+# from app.<module> import <real_async_entry_point>
+
+pytestmark = pytest.mark.asyncio
+
+
+async def test_<entry_point>_offloads_blocking_io_on_<branch>(tmp_path: Path) -> None:
+    # Arrange: real inputs at the boundary the code blocks on (FS -> tmp_path;
+    #   HTTP/subprocess -> stub the external service). Mock ONLY the external
+    #   boundary, never the offload under test.
+
+    # Act + Assert: call the REAL production async entry point and drive the
+    # specific branch you are guarding (e.g. force a failure to hit the cleanup
+    # path). If the entry point performs blocking IO on the loop, the gate fails.
+    #   await <real_async_entry_point>(...)
+    raise NotImplementedError("Replace with the real async entry point call.")