deer-flow/backend/docs/BLOCKING_IO_DETECTION.md

# Blocking IO detection usage and maintenance

This document describes how to use and maintain DeerFlow backend blocking-IO
detection for async event-loop safety.

The goal is narrow: find and prevent synchronous IO from blocking backend
async event-loop paths. Static and runtime detection are complementary, but
they have different jobs.

## Static detector

The static detector is the discovery tool. It scans backend source code and
reports candidate blocking-IO call sites that may need human review.

Run it from the repository root:

```bash
make detect-blocking-io
```

Or from `backend/`:

```bash
make detect-blocking-io
```

The report is written to:

```text
.deer-flow/blocking-io-findings.json
```

Use this output for review and triage. A static finding is a candidate, not
proof that production blocks the event loop at runtime. The current static
rules are intentionally broad; prefer triaging existing output before adding
new static rules.

Add a static rule only when review finds a recurring high-risk blocking
pattern that is invisible to the current detector.

## Runtime detector

The runtime detector is the CI regression guard. It uses Blockbuster to fail a
focused test when code under `app.*` or `deerflow.*` performs blocking IO on
the asyncio event-loop thread.

Run it from `backend/`:

```bash
make test-blocking-io
```

The runtime gate starts from confirmed production bugs and protects those
paths from regressing. It does not prove that the entire backend is free of
blocking IO; it only covers the production paths exercised by
`backend/tests/blocking_io/`.

## Maintenance workflow

Use the static detector to find candidates, then use review to decide which
async production paths are worth protecting in CI.

The normal workflow is:

1. Run the static detector to find backend blocking-IO candidates.
2. Use human review to pick high-risk production async paths.
3. Add or update a focused runtime anchor in `backend/tests/blocking_io/`.
4. Let CI prevent that path from regressing.

Contributors changing backend async code can run the `blocking-io-guard` skill
(`.agent/skills/blocking-io-guard/`) to execute steps 1–3 for their own diff: it
scans the change for blocking-IO candidates, drafts or extends a runtime anchor,
and verifies the anchor fails when the blocking IO regresses.

Runtime detection has two maintenance paths.

### Add a runtime rule

Add a runtime rule when Blockbuster's default rules do not cover a generic
blocking primitive used by production code.

Rules belong in:

```text
backend/tests/support/detectors/blocking_io_runtime.py
```

Add them to `_PROJECT_BLOCKING_RULES`, not directly inside individual tests.
Keeping rules centralized makes it clear which extra primitives DeerFlow
expects Blockbuster to catch.

Example shape:

```python
import subprocess

from blockbuster import BlockBusterFunction

_PROJECT_BLOCKING_RULES = (
    (
        "subprocess.Popen.__init__",
        BlockBusterFunction(
            subprocess.Popen,
            "__init__",
            scanned_modules=["app", "deerflow"],
        ),
    ),
)
```

Do not add a runtime rule just because a business path is not tested. A rule
only expands what Blockbuster can intercept after code runs.

### Add a runtime anchor

Add a runtime anchor when a high-risk async production path should be protected
by CI but no existing `backend/tests/blocking_io/` test executes it.

Anchors belong in:

```text
backend/tests/blocking_io/
```

A good anchor should:

- Call the real production async entry point.
- Avoid bypassing the blocking surface with test-only `asyncio.to_thread`
  wrappers.
- Use real local filesystem inputs when the bug shape is filesystem IO.
- Mock only the external dependency boundary, such as a network service or
  third-party saver class.
- Fail if a future change moves the blocking operation back onto the event
  loop.

Avoid testing only the low-level helper unless that helper is the production
async entry point. The runtime gate is most useful when it protects the caller
that production actually executes.

## Current runtime coverage

The runtime anchors protect confirmed blocking-IO bug shapes:

- SQLite checkpointer setup, including path resolution and parent-directory
  creation.
- Subagent skill metadata loading through `SubagentExecutor._load_skills()`.
- `JsonlRunEventStore` async API (`put` / `list_*` / `delete_*`): the JSONL
  run-event backend offloads its synchronous file IO via `asyncio.to_thread`
  (fix #3084); this anchor drives the real async API under the gate so any
  blocking IO reintroduced on the loop fails, not only removal of one
  `to_thread` call.
- `UploadsMiddleware.before_agent` uploads-directory scan: a sync-only middleware
  hook runs on the event loop under async graph execution, so the scan is
  offloaded via `abefore_agent` + `run_in_executor`.
- Gate health checks: Blockbuster catches unoffloaded calls, opt-out works, and
  patches are restored after exceptions.

As static detection and review identify more high-risk async paths, add new
runtime anchors incrementally.