mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-10 09:25:57 +00:00
93e3281cbf
* fix(dev): create backend/sandbox before uvicorn reload-exclude (#3459) #3426 switched the dev gateway's --reload-exclude patterns to absolute paths. uvicorn only excludes an absolute path directly when it already exists as a directory; otherwise it globs the pattern, and Python 3.12's pathlib raises NotImplementedError("Non-relative patterns are unsupported") for an absolute glob pattern. serve.sh mkdir'd the .deer-flow excludes but not backend/sandbox, so `make dev` crashed on startup on a fresh checkout under Python 3.12 (#3454). docker/dev-entrypoint.sh had the same latent gap. Create backend/sandbox in both launchers so every absolute exclude stays on uvicorn's is_dir() short-circuit. Add a regression test that pins the uvicorn mechanism (crash on missing dir, safe once created) and enforces that every absolute --reload-exclude is mkdir'd before launch. Closes #3459 * test(dev): harden reload-exclude invariant parser against false pass/negatives The launcher invariant test parsed shell with a "mkdir -p" line filter and a substring membership check. Two latent gaps (sub-threshold for this fix, but this code guards a user-facing startup path, so close them): - A `\`-continued multi-line `mkdir` would drop arguments on continuation lines, silently weakening coverage. - Substring membership could false-pass when an exclude is a path-prefix of a different created dir (e.g. `/app/backend/sandbox` "found" inside `/app/backend/sandbox-other`). Fold line-continuations, drop comments, and shlex-tokenize each `mkdir` argument list into an exact set (quotes stripped, `$VAR` literal); assert exact set membership. Same shlex handling for `--reload-exclude` values. Verified the parser still flags the pre-fix missing `backend/sandbox` (RED preserved) and no longer false-passes on a path-prefix. * fix(dev): gitignore backend/sandbox runtime dir + pin mkdir-before-launch Address two review findings on the #3459 fix: - backend/sandbox was described as "gitignored runtime state" but no ignore rule actually matched it. Add an anchored `/sandbox/` to backend/.gitignore (anchored so it does NOT shadow the source package backend/packages/harness/deerflow/sandbox/) so sandbox artifacts created at runtime can't pollute the working tree or be committed by accident. New test asserts content under backend/sandbox is ignored, making the claim verifiable. - The launcher invariant test only proved the sandbox mkdir exists somewhere, not that it runs before uvicorn starts. Add an order test (sandbox mkdir line must precede the `uv run uvicorn` launch) so a future edit can't move the mkdir below the launch and silently reintroduce the crash. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * test(dev): fix reload-exclude parser to handle serve.sh's quoted flag bundle The previous autofix tokenized each whole line with shlex, but serve.sh packs every flag into a single double-quoted `GATEWAY_EXTRA_FLAGS="..."` assignment. shlex collapses that into one token, so no `--reload-exclude` flag is found and `test_launcher_precreates_every_absolute_reload_exclude[scripts/serve.sh]` failed CI with "expected at least one absolute reload-exclude". Parse `--reload-exclude` with a regex that matches a balanced single/double quoted group or a bare token, so the assignment's surrounding `"` is never swallowed into the value. This recovers all three serve.sh excludes (the prior regex also silently dropped the last `$BACKEND_RUNTIME_HOME` because the adjacent closing quote broke shlex) while still covering dev-entrypoint.sh and the space-separated `--reload-exclude <value>` form. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
186 lines
8.1 KiB
Python
186 lines
8.1 KiB
Python
"""Regression for #3459 / #3454 — dev gateway reload-exclude must not crash.
|
|
|
|
#3426 switched the dev gateway's ``--reload-exclude`` patterns from relative
|
|
(``sandbox/``) to absolute (``$REPO_ROOT/backend/sandbox``). uvicorn only
|
|
excludes such a path directly when it already exists as a directory; otherwise
|
|
it falls back to ``Path.cwd().glob(pattern)``, and on **Python 3.12**
|
|
``pathlib.Path.glob()`` raises ``NotImplementedError: Non-relative patterns are
|
|
unsupported`` for an absolute pattern. ``serve.sh`` created the ``.deer-flow``
|
|
excludes but not ``backend/sandbox``, so a fresh checkout crashed ``make dev``
|
|
on startup.
|
|
|
|
Two layers of coverage:
|
|
|
|
* ``test_*_resolve_*`` exercises uvicorn's real ``resolve_reload_patterns`` to
|
|
pin the failure mode and the fix's mechanism.
|
|
* ``test_launcher_precreates_every_absolute_reload_exclude`` enforces the actual
|
|
invariant on both launchers: every absolute exclude dir is ``mkdir -p``'d
|
|
before uvicorn starts. This encodes the root cause, so any future absolute
|
|
exclude that forgets its ``mkdir`` fails here.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import re
|
|
import shlex
|
|
import subprocess
|
|
import sys
|
|
from pathlib import Path
|
|
|
|
import pytest
|
|
from uvicorn.config import resolve_reload_patterns
|
|
|
|
REPO_ROOT = Path(__file__).resolve().parents[2]
|
|
|
|
LAUNCHERS = {
|
|
"scripts/serve.sh": REPO_ROOT / "scripts" / "serve.sh",
|
|
"docker/dev-entrypoint.sh": REPO_ROOT / "docker" / "dev-entrypoint.sh",
|
|
}
|
|
|
|
# Shell terminators / redirects that end a simple command's argument list.
|
|
_CMD_BOUNDARY = re.compile(r"[;&|<>]")
|
|
|
|
|
|
def _logical_lines(script: str) -> list[str]:
|
|
"""Fold ``\\``-continuations and drop comment lines, yielding logical lines.
|
|
|
|
A ``mkdir`` or ``--reload-exclude`` list split across lines with a trailing
|
|
backslash becomes one line here, so an argument on a continuation line can't
|
|
be silently dropped by per-line scanning.
|
|
"""
|
|
folded = script.replace("\\\n", " ")
|
|
return [line for line in folded.splitlines() if not line.lstrip().startswith("#")]
|
|
|
|
|
|
def _shlex(fragment: str) -> list[str]:
|
|
"""Tokenize a shell fragment (quotes stripped, ``$VAR`` kept literal,
|
|
trailing ``# comment`` honored); tolerate pathological quoting."""
|
|
try:
|
|
return shlex.split(fragment, comments=True)
|
|
except ValueError:
|
|
return fragment.split()
|
|
|
|
|
|
# ``--reload-exclude`` followed by ``=`` or whitespace, then a value that is a
|
|
# single-quoted group, a double-quoted group, or a bare token. The quoted
|
|
# alternatives match a *balanced* pair first, so serve.sh's surrounding
|
|
# ``GATEWAY_EXTRA_FLAGS="..."`` closing quote is never swallowed into the value.
|
|
_RELOAD_EXCLUDE = re.compile(r"""--reload-exclude[=\s]+('[^']*'|"[^"]*"|[^\s'"]+)""")
|
|
|
|
|
|
def _reload_exclude_values(script: str) -> list[str]:
|
|
"""Every ``--reload-exclude`` value, with surrounding quotes removed.
|
|
|
|
Handles both CLI forms (``--reload-exclude=<value>`` and the space form
|
|
``--reload-exclude <value>``) and both shell quotings the launchers use:
|
|
|
|
* ``docker/dev-entrypoint.sh`` puts each flag on its own line.
|
|
* ``scripts/serve.sh`` packs every flag into a single double-quoted
|
|
``GATEWAY_EXTRA_FLAGS="... --reload-exclude='$X' ..."`` assignment. A
|
|
whole-line ``shlex`` would collapse that assignment into one token and
|
|
find no flags (this is what regressed serve.sh in CI); matching balanced
|
|
inner quotes here keeps the assignment's closing ``"`` out of the value,
|
|
so every exclude — including the last ``$BACKEND_RUNTIME_HOME`` — is seen.
|
|
"""
|
|
values: list[str] = []
|
|
for line in _logical_lines(script):
|
|
for raw in _RELOAD_EXCLUDE.findall(line):
|
|
values.append(raw.strip("\"'"))
|
|
return values
|
|
|
|
|
|
def _mkdir_dirs(script: str) -> set[str]:
|
|
"""Exact set of directories created by every ``mkdir`` command.
|
|
|
|
Tokenizes each ``mkdir`` argument list rather than substring-matching, so
|
|
``/app/backend/sandbox`` is not falsely considered created by, say,
|
|
``mkdir -p /app/backend/sandbox-other``.
|
|
"""
|
|
dirs: set[str] = set()
|
|
for line in _logical_lines(script):
|
|
match = re.search(r"\bmkdir\b(.*)", line)
|
|
if not match:
|
|
continue
|
|
args = _CMD_BOUNDARY.split(match.group(1), maxsplit=1)[0]
|
|
for token in _shlex(args):
|
|
if token.startswith("-"): # skip flags such as -p
|
|
continue
|
|
dirs.add(token)
|
|
return dirs
|
|
|
|
|
|
@pytest.mark.skipif(
|
|
sys.version_info >= (3, 13),
|
|
reason="pathlib accepts absolute glob patterns on 3.13+, so the crash is 3.12-only",
|
|
)
|
|
def test_resolve_reload_patterns_crashes_on_missing_absolute_dir(tmp_path):
|
|
"""The exact #3454 failure: absolute exclude + missing dir on Python 3.12."""
|
|
missing = tmp_path / "sandbox" # absolute path that does not exist yet
|
|
assert not missing.exists()
|
|
with pytest.raises(NotImplementedError):
|
|
resolve_reload_patterns([str(missing)], [])
|
|
|
|
|
|
def test_resolve_reload_patterns_is_safe_once_dir_exists(tmp_path):
|
|
"""The fix's mechanism: a pre-created dir takes uvicorn's is_dir() path."""
|
|
sandbox = tmp_path / "sandbox"
|
|
sandbox.mkdir()
|
|
_patterns, directories = resolve_reload_patterns([str(sandbox)], [])
|
|
resolved = {d.resolve() for d in directories}
|
|
assert sandbox.resolve() in resolved
|
|
|
|
|
|
@pytest.mark.parametrize("name", list(LAUNCHERS))
|
|
def test_launcher_precreates_every_absolute_reload_exclude(name):
|
|
"""Every absolute ``--reload-exclude`` dir must be created by ``mkdir`` first.
|
|
|
|
Relative glob patterns (``*.pyc``, ``__pycache__``) are safe and skipped;
|
|
anything anchored at ``/`` or a shell variable is an absolute path that
|
|
uvicorn would glob — and crash on — unless it already exists. Membership is
|
|
an exact match against the parsed ``mkdir`` argument set (not a substring
|
|
test), so a path-prefix can't produce a false pass.
|
|
"""
|
|
script = LAUNCHERS[name].read_text(encoding="utf-8")
|
|
created = _mkdir_dirs(script)
|
|
|
|
absolute_excludes = [v for v in _reload_exclude_values(script) if v.startswith(("/", "$"))]
|
|
assert absolute_excludes, f"{name}: expected at least one absolute reload-exclude"
|
|
|
|
for value in absolute_excludes:
|
|
assert value in created, f"{name}: absolute reload-exclude {value!r} is never created via mkdir (created dirs: {sorted(created)})"
|
|
|
|
|
|
@pytest.mark.parametrize("name", list(LAUNCHERS))
|
|
def test_sandbox_mkdir_precedes_uvicorn_launch(name):
|
|
"""The sandbox mkdir must come before the uvicorn launch, not just exist.
|
|
|
|
``_mkdir_dirs`` only proves the mkdir is present somewhere; this pins script
|
|
order so a future edit can't move (or guard) the mkdir below the launch and
|
|
silently reintroduce the #3454 crash on a fresh checkout. ``uv run uvicorn``
|
|
matches the launch but not serve.sh's ``stop_all`` kill line.
|
|
"""
|
|
lines = LAUNCHERS[name].read_text(encoding="utf-8").splitlines()
|
|
launch_idx = next((i for i, ln in enumerate(lines) if "uv run uvicorn" in ln), None)
|
|
mkdir_idx = next((i for i, ln in enumerate(lines) if re.search(r"\bmkdir\b", ln) and "sandbox" in ln), None)
|
|
|
|
assert launch_idx is not None, f"{name}: could not locate the 'uv run uvicorn' launch line"
|
|
assert mkdir_idx is not None, f"{name}: could not locate the sandbox mkdir line"
|
|
assert mkdir_idx < launch_idx, f"{name}: sandbox mkdir (line {mkdir_idx + 1}) must precede uvicorn launch (line {launch_idx + 1})"
|
|
|
|
|
|
def test_precreated_sandbox_artifacts_are_gitignored():
|
|
"""backend/sandbox is runtime state — its contents must stay out of git so
|
|
sandbox artifacts can't be accidentally committed (matches the reload-exclude
|
|
intent). A content path is existence-independent, unlike the bare dir path.
|
|
|
|
Guards against the inaccurate "gitignored" claim by making it verifiable.
|
|
"""
|
|
probe = "backend/sandbox/__artifact_probe__"
|
|
result = subprocess.run(
|
|
["git", "-C", str(REPO_ROOT), "check-ignore", "-q", probe],
|
|
capture_output=True,
|
|
)
|
|
if result.returncode == 128: # not a git checkout (e.g. packaged install)
|
|
pytest.skip("not inside a git working tree")
|
|
assert result.returncode == 0, "backend/sandbox/* should be gitignored (see backend/.gitignore '/sandbox/')"
|