mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-13 19:06:01 +00:00
dc2ababf00
* feat(blocking-io): add changed-lines blocking-IO scanner (L1) * feat(blocking-io): add scan-changed CLI wrapper * feat(skill): add blocking-io-guard developer SOP skill * docs(blocking-io): point contributors at the blocking-io-guard skill * style(blocking-io): apply ruff format to scanner and tests * docs(backend): document changed-lines blocking-IO scanner in CLAUDE.md * feat(skill): add post-fix re-scan check and PR batching policy * refactor(skill): fix SOP step ordering, align template with repo conventions - Move re-scan into an explicit 'apply the fix' step (was wedged after anchor generation while telling you to go back before the anchor) - Renumber steps 0-6; drop undefined 'L1' jargon - Mode A: document that the diff is <base>...HEAD (commit first) - Mode B: prefer make detect-blocking-io + findings JSON file - anchor template: module-level pytestmark per tests/blocking_io convention - CLAUDE.md: fix 'git diff --base' phrasing * fix(skill): catch findings introduced without touching the blocking line Review follow-up: changed-line intersection alone misses the case where a new async caller exposes an old sync helper — the static finding sits on the untouched blocking line, so Mode A returned empty and the SOP stopped on a false 'no blocking-IO surface'. Selection is now a union over the changed files: - findings on added lines of git diff <base>...HEAD (kept: a second identical symbol in an already-flagged function collides on the stable key and only this selection sees it); - findings new versus the merge base, matched by (path, function, symbol) — never line numbers. Base sources are materialized via git show <merge-base>:<path>; files absent at base count every head finding as new. SKILL.md now states the residual same-file-only blind spot (cross-file async callers) instead of treating an empty list as proof of zero exposure, and only requires reading sop-skeleton.md when generalizing to another detector domain. * docs(skill): examples teach test-writing, the teeth check defines the rule All examples in the references/template are filesystem-flavored; make explicit that they are instances, not the SOP's boundary — the same rules apply to every detector category (FILE_IO, HTTP, SUBPROCESS, SLEEP) and acceptance is always red/green teeth, never similarity to an example. Neutralize the template's arrange comment accordingly. * fix(blocking-io): harden changed-lines scanner per review - Dedup the union selection by the stable key (path, function, symbol) instead of dict identity, so a future selector returning copied dicts cannot silently empty the result. - parse_changed_lines now handles any unified diff: context lines advance the new-file counter, \-markers and deletions do not, and the counter resets at each +++ header. Previously correct only for --unified=0. - Add blocking_io_static.scan_source (in-memory scan); base-version comparison no longer round-trips through temp files. - Empty Mode A report now prints the same-file-only reachability caveat at the point of use instead of relying on the SOP text alone. * docs(skill): bound best-effort cleanup when the offload sits in finally Lesson from the #3505 review: the SOP routinely drives 'offload the cleanup branch' transformations, and an awaited cleanup in finally can mask or stall the primary exception. One sentence in Step 2 closes that gap at the point where the fix is written.
160 lines
5.0 KiB
Markdown
160 lines
5.0 KiB
Markdown
# Blocking IO detection usage and maintenance
|
||
|
||
This document describes how to use and maintain DeerFlow backend blocking-IO
|
||
detection for async event-loop safety.
|
||
|
||
The goal is narrow: find and prevent synchronous IO from blocking backend
|
||
async event-loop paths. Static and runtime detection are complementary, but
|
||
they have different jobs.
|
||
|
||
## Static detector
|
||
|
||
The static detector is the discovery tool. It scans backend source code and
|
||
reports candidate blocking-IO call sites that may need human review.
|
||
|
||
Run it from the repository root:
|
||
|
||
```bash
|
||
make detect-blocking-io
|
||
```
|
||
|
||
Or from `backend/`:
|
||
|
||
```bash
|
||
make detect-blocking-io
|
||
```
|
||
|
||
The report is written to:
|
||
|
||
```text
|
||
.deer-flow/blocking-io-findings.json
|
||
```
|
||
|
||
Use this output for review and triage. A static finding is a candidate, not
|
||
proof that production blocks the event loop at runtime. The current static
|
||
rules are intentionally broad; prefer triaging existing output before adding
|
||
new static rules.
|
||
|
||
Add a static rule only when review finds a recurring high-risk blocking
|
||
pattern that is invisible to the current detector.
|
||
|
||
## Runtime detector
|
||
|
||
The runtime detector is the CI regression guard. It uses Blockbuster to fail a
|
||
focused test when code under `app.*` or `deerflow.*` performs blocking IO on
|
||
the asyncio event-loop thread.
|
||
|
||
Run it from `backend/`:
|
||
|
||
```bash
|
||
make test-blocking-io
|
||
```
|
||
|
||
The runtime gate starts from confirmed production bugs and protects those
|
||
paths from regressing. It does not prove that the entire backend is free of
|
||
blocking IO; it only covers the production paths exercised by
|
||
`backend/tests/blocking_io/`.
|
||
|
||
## Maintenance workflow
|
||
|
||
Use the static detector to find candidates, then use review to decide which
|
||
async production paths are worth protecting in CI.
|
||
|
||
The normal workflow is:
|
||
|
||
1. Run the static detector to find backend blocking-IO candidates.
|
||
2. Use human review to pick high-risk production async paths.
|
||
3. Add or update a focused runtime anchor in `backend/tests/blocking_io/`.
|
||
4. Let CI prevent that path from regressing.
|
||
|
||
Contributors changing backend async code can run the `blocking-io-guard` skill
|
||
(`.agent/skills/blocking-io-guard/`) to execute steps 1–3 for their own diff: it
|
||
scans the change for blocking-IO candidates, drafts or extends a runtime anchor,
|
||
and verifies the anchor fails when the blocking IO regresses.
|
||
|
||
Runtime detection has two maintenance paths.
|
||
|
||
### Add a runtime rule
|
||
|
||
Add a runtime rule when Blockbuster's default rules do not cover a generic
|
||
blocking primitive used by production code.
|
||
|
||
Rules belong in:
|
||
|
||
```text
|
||
backend/tests/support/detectors/blocking_io_runtime.py
|
||
```
|
||
|
||
Add them to `_PROJECT_BLOCKING_RULES`, not directly inside individual tests.
|
||
Keeping rules centralized makes it clear which extra primitives DeerFlow
|
||
expects Blockbuster to catch.
|
||
|
||
Example shape:
|
||
|
||
```python
|
||
import subprocess
|
||
|
||
from blockbuster import BlockBusterFunction
|
||
|
||
_PROJECT_BLOCKING_RULES = (
|
||
(
|
||
"subprocess.Popen.__init__",
|
||
BlockBusterFunction(
|
||
subprocess.Popen,
|
||
"__init__",
|
||
scanned_modules=["app", "deerflow"],
|
||
),
|
||
),
|
||
)
|
||
```
|
||
|
||
Do not add a runtime rule just because a business path is not tested. A rule
|
||
only expands what Blockbuster can intercept after code runs.
|
||
|
||
### Add a runtime anchor
|
||
|
||
Add a runtime anchor when a high-risk async production path should be protected
|
||
by CI but no existing `backend/tests/blocking_io/` test executes it.
|
||
|
||
Anchors belong in:
|
||
|
||
```text
|
||
backend/tests/blocking_io/
|
||
```
|
||
|
||
A good anchor should:
|
||
|
||
- Call the real production async entry point.
|
||
- Avoid bypassing the blocking surface with test-only `asyncio.to_thread`
|
||
wrappers.
|
||
- Use real local filesystem inputs when the bug shape is filesystem IO.
|
||
- Mock only the external dependency boundary, such as a network service or
|
||
third-party saver class.
|
||
- Fail if a future change moves the blocking operation back onto the event
|
||
loop.
|
||
|
||
Avoid testing only the low-level helper unless that helper is the production
|
||
async entry point. The runtime gate is most useful when it protects the caller
|
||
that production actually executes.
|
||
|
||
## Current runtime coverage
|
||
|
||
The runtime anchors protect confirmed blocking-IO bug shapes:
|
||
|
||
- SQLite checkpointer setup, including path resolution and parent-directory
|
||
creation.
|
||
- Subagent skill metadata loading through `SubagentExecutor._load_skills()`.
|
||
- `JsonlRunEventStore` async API (`put` / `list_*` / `delete_*`): the JSONL
|
||
run-event backend offloads its synchronous file IO via `asyncio.to_thread`
|
||
(fix #3084); this anchor drives the real async API under the gate so any
|
||
blocking IO reintroduced on the loop fails, not only removal of one
|
||
`to_thread` call.
|
||
- `UploadsMiddleware.before_agent` uploads-directory scan: a sync-only middleware
|
||
hook runs on the event loop under async graph execution, so the scan is
|
||
offloaded via `abefore_agent` + `run_in_executor`.
|
||
- Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and
|
||
patches are restored after exceptions.
|
||
|
||
As static detection and review identify more high-risk async paths, add new
|
||
runtime anchors incrementally.
|