mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-05-21 15:36:48 +00:00
e93f658472
* fix(task-tool): unwrap callback manager when locating usage recorder
`config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the
`AsyncCallbackManager` LangChain hands to async tool runs), not just a plain
list. The previous `for cb in callbacks` loop raised
`TypeError: 'AsyncCallbackManager' object is not iterable`, which
`ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage
even though the subagent had completed internally — Ultra mode lost subagent
results and the lead agent fell back to redoing the work.
Unwrap `BaseCallbackManager.handlers` before searching for the recorder.
Refs: bytedance/deer-flow#3107 (BUG-002)
* fix(frontend): treat any task tool error as a terminal subtask failure
The subtask card status machine matched only three English prefixes (`Task
Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell
through to `in_progress`, so a `task` tool error wrapped by
`ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card
spinning forever even after the run had ended.
Extract the prefix logic into `parseSubtaskResult` and recognise any leading
`Error:` token as a terminal failure. The extracted function is unit-tested
against the legacy prefixes plus the `AsyncCallbackManager` regression
captured in the upstream issue.
Refs: bytedance/deer-flow#3107 (BUG-007)
* fix(frontend): exclude hidden, reasoning, and tool payloads from chat export
`formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without
running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts
therefore included `hide_from_ui` system reminders, memory injections,
provider `reasoning_content`, tool calls, and tool result messages — content
that is intentionally hidden in the chat view.
Filter the export to the user-visible transcript by default and gate
reasoning / tool calls / tool messages / hidden messages behind explicit
`ExportOptions` flags so a future debug export can opt back in without
forking the formatter.
Refs: bytedance/deer-flow#3107 (BUG-006)
* fix(gateway): route get_config through get_app_config for mtime hot reload
`get_config(request)` returned the `app.state.config` snapshot captured at
startup. The worker / lead-agent path then threaded that frozen `AppConfig`
through `RunContext` and `agent_factory`, so per-run fields edited in
`config.yaml` (notably `max_tokens`) were ignored until the gateway process
was restarted — even though `get_app_config()` already does mtime-based
reload at the bottom layer.
Route the request dependency through `get_app_config()` directly. Runtime
`ContextVar` overrides (`push_current_app_config`) and test-injected
singletons (`set_app_config`) keep working; `app.state.config` is now only
read at startup for one-shot bootstrap (logging level, IM channels,
`langgraph_runtime` engines).
`tests/test_gateway_deps_config.py` encoded the old snapshot contract and is
removed; `tests/test_gateway_config_freshness.py` replaces it with mtime,
ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and
`test_uploads_router.py` now inject test configs via FastAPI
`dependency_overrides[get_config]` instead of mutating `app.state.config`.
Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know
which fields are picked up on the next request vs. which still require a
restart (`database`, `checkpointer`, `run_events`, `stream_bridge`,
`sandbox.use`, `log_level`, `channels.*`).
Refs: bytedance/deer-flow#3107 (BUG-001)
* fix(gateway): broaden get_config 503 to any config-load failure
Address review feedback on the previous commit:
1. Narrow exception catch removed. The old contract returned 503 whenever
`app.state.config is None`. The first cut only mapped
`FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and
pydantic `ValidationError` to bubble up as 500. At the request boundary
we treat any inability to materialise the config as "configuration not
available" (503) and log the original exception so the operator still
has the stack.
2. Removed the unused `request: Request` parameter and the matching
`# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency
to accept `Request`; the only call site uses the no-arg form.
3. `backend/CLAUDE.md` boundary now lists the *reason* each field is
restart-required (engine binding, singleton caching, one-shot
`apply_logging_level`, etc.), not just the field name, so reviewers do
not have to reverse-engineer the boundary themselves.
Tests parametrise four exception classes (`FileNotFoundError`,
`PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each.
Refs: bytedance/deer-flow#3107 (BUG-001)
* fix(task-tool): defend _find_usage_recorder against non-list callbacks
Address review feedback. The previous commit handled the two common shapes
LangChain hands to async tool runs — a plain `list[BaseCallbackHandler]` and
a `BaseCallbackManager` subclass — but iterated any other shape directly,
which would still raise `TypeError` if e.g. a single handler instance leaked
through without a list wrapper.
Treat any non-list, non-manager `config["callbacks"]` value as "no recorder"
rather than crash. Docstring now lists all four shapes explicitly. New tests
cover the single-handler-object case, `runtime is None`, `callbacks is None`,
and `runtime.config` being a non-dict — all required to be silent no-ops.
Refs: bytedance/deer-flow#3107 (BUG-002)
* fix(frontend): drop dead identity ternary and add opt-in export tests
Address review feedback on the previous export commit:
1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content`
expression in `formatThreadAsJSON`. Both branches returned the same value;
the message content now flows through unchanged whether it is a string or
the rich `MessageContent[]` shape (LangChain JSON-serialises the array
structure correctly already).
2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four
flags are not currently wired to any UI control — callers wanting a debug
export must build the options object explicitly. The default behaviour
continues to match the explicit prescription in
bytedance/deer-flow#3107 BUG-006.
3. Added opt-in coverage. The previous tests only exercised the
`options = {}` default path; the new cases verify each flag flips the
corresponding payload back into the export so a future debug-export
surface does not silently break the contract.
Refs: bytedance/deer-flow#3107 (BUG-006)
* fix(frontend): export subtask prefix constants and document fallback intent
Address review feedback on the previous BUG-007 commit:
1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the
`ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly
pins them as part of the backend↔frontend contract defined in
`task_tool.py` and `tool_error_handling_middleware.py`, so any future
structured-status migration (e.g. backend writing
`additional_kwargs.subagent_status` instead of leading text) can
reference these from one canonical place rather than redefine them.
2. The `in_progress` fallback now carries a docstring explaining the
deliberate choice — LangChain only ever emits a `ToolMessage` once the
tool itself has returned, so unrecognised content means the contract
has drifted and "still running" is the right operator signal (eagerly
marking it terminal-failed would mask the drift).
No behaviour change; this is documentation and an API export.
Refs: bytedance/deer-flow#3107 (BUG-007)
* fix(gateway): drop app.state.config snapshot and freeze run_events_config
Address @ShenAC-SAC's BUG-001 review on #3131. The previous cut still
stored an ``AppConfig`` snapshot on ``app.state.config`` for startup
bootstrap. Two follow-on hazards from that:
1. Future code touching the gateway lifespan could accidentally start
reading ``app.state.config`` again, silently regressing the request
hot path back to a stale snapshot.
2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the
startup-bound ``event_store`` and a *live* ``run_events_config``
field — so an operator who edited ``run_events.backend`` mid-flight
would have produced a run context whose ``event_store`` and
``run_events_config`` referred to different backends.
Clean approach (aligned with the direction in PR #3128):
- ``lifespan()`` keeps a local ``startup_config`` variable and passes it
explicitly into ``langgraph_runtime(app, startup_config)`` and into
``start_channel_service``. No ``app.state.config`` attribute is set at
any point.
- ``langgraph_runtime`` now accepts ``startup_config`` as a required
parameter, removing the ``getattr(app.state, "config", None)`` lookup
and the "config not initialised" runtime error.
- The matching ``run_events_config`` is frozen onto ``app.state`` next
to ``run_event_store`` so ``get_run_context`` reads the two from the
same startup-time source. ``app_config`` continues to be resolved
live via ``get_app_config()``.
- ``backend/CLAUDE.md`` boundary explanation updated to spell out the
``startup_config`` / ``get_app_config()`` split.
New regression test ``test_run_context_app_config_reflects_yaml_edit``
exercises the worker-feeding path: it asserts that ``ctx.app_config``
follows a mid-flight ``config.yaml`` edit while
``ctx.run_events_config`` stays frozen to the startup snapshot the
event store was built from.
Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review
* fix(frontend): parse Task cancelled and polling timed out as terminal
Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually
emits five terminal strings:
- `Task Succeeded. Result: …`
- `Task failed. …`
- `Task timed out. …`
- `Task cancelled by user.` ← previously matched none
- `Task polling timed out after N minutes …` ← previously matched none
The previous cut handled three; the last two fell through to the
"unknown content" branch and pushed the subtask card back to
`in_progress` even though the backend had already reached a terminal
state. Add explicit matches plus regression tests for both. The
`in_progress` fallback is now reserved for genuinely unrecognised
output (i.e. contract drift), as documented.
Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review
* fix(frontend): sanitize JSON export content via the Markdown content path
Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on
#3131. The previous cut filtered hidden/tool messages out of the JSON
export but still serialised `msg.content` verbatim, so:
- inline `<think>…</think>` wrappers stayed in the exported `content`
even with `includeReasoning: false`,
- content-array thinking blocks leaked the `thinking` field,
- `<uploaded_files>…</uploaded_files>` markers leaked the workspace
paths a user uploaded files to.
JSON now goes through the same sanitiser the Markdown path uses
(`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and
tool_calls remain gated behind their `ExportOptions` flags. AI / human
rows that sanitise to empty content with no opted-in reasoning or tool
calls are dropped so the JSON matches the Markdown path's `continue`
on empty assistant fragments.
New regression tests cover the three leak shapes the reviewer called
out plus the empty-content-drop case.
Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review
* test(gateway): align lifespan stub with langgraph_runtime two-arg signature
Codex round-3 review of c0bc7a06 flagged this: changing
`langgraph_runtime` to require `startup_config` as a second positional
argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in
`test_gateway_lifespan_shutdown.py`, which is patched into
`app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout
regression. Lifespan would then call the stub with two args and raise
`TypeError` before the bounded-shutdown assertion ran.
Update the stub to match the new signature. The shutdown test itself is
unaffected — it only cares about the channel `stop_channel_service` hang
path.
Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review
* fix(frontend): strip every known backend marker in export, not just uploads
Codex round-3 review of 258ca800 and the matching maintainer feedback on
PR #3131 made the same point: the JSON export now ran the
Markdown-side sanitiser, but that sanitiser only stripped
`<uploaded_files>`. The full set of payloads middleware embeds inside
message `content` is larger:
- `<uploaded_files>` — `UploadsMiddleware`
- `<system-reminder>` — `DynamicContextMiddleware`
- `<memory>` — `DynamicContextMiddleware` (nested inside system-reminder)
- `<current_date>` — `DynamicContextMiddleware`
The primary protection is still `isHiddenFromUIMessage`: the
`<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never
reaches the formatter. This commit adds the second line of defence so a
regression that drops the `hide_from_ui` flag — or any future middleware
that injects the same tag vocabulary into a visible HumanMessage —
cannot leak the payload into the export file.
Concrete changes:
- New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)`
helper in `core/messages/utils.ts`. The constant doubles as
documentation for the backend↔frontend contract.
- `formatMessageContent` in `export.ts` now calls `stripInternalMarkers`
instead of `stripUploadedFilesTag`. UI render paths
(`message-list-item.tsx`) keep using the narrower function so a user
legitimately typing `<memory>` in a meta-discussion is preserved.
- The "drop empty rows" guard in `buildJSONMessage` switched from
`=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when
`extractReasoningContentFromMessage` returned the empty string (which it
legitimately can), the JSON path emitted `{reasoning: ""}` while the
Markdown path's `!reasoning` `continue` correctly dropped the row.
New regression tests cover the defence-in-depth strip with a
`<system-reminder><memory><current_date>` payload deliberately *not*
marked `hide_from_ui`; tool-message sanitization under
`includeToolMessages: true`; the mixed-content-array case
(`thinking + text + image_url`); and the opted-in empty-reasoning drop.
Live verification on a real Ultra-mode thread that uploaded a PDF
(`曾鑫民-薪资交易流水.pdf`): backend state's first HumanMessage carries the
`<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part
of a content-array. The Markdown and JSON export blobs both come back
free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`,
`tool_calls`, and reasoning — while preserving the user's `这是什么 ?`
prompt and the assistant's visible answer.
Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review
* test(frontend): cover trim, varied N, and pre-execution Error: prefixes
Codex round-3 review of 50e2c257 flagged three coverage gaps in the
subtask-status parser:
1. `Task cancelled by user.` and `Task polling timed out` previously had
no whitespace-trim coverage — the original trim test only exercised
the success prefix. Streaming chunks can arrive with leading/trailing
newlines; the regex needed an explicit assertion.
2. The polling-timeout case was tested only at one `N` (15 minutes). The
backend interpolates the live `timeout_seconds // 60` value, so the
matcher must hold for any positive integer. Now we run the case for
1, 5, and 60 minutes.
3. `task_tool.py` also emits three `Error:` strings for pre-execution
failures — unknown subagent type, host-bash disabled, and "task
disappeared from background tasks". They are intentionally handled by
`ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper
already produces the right terminal-failed shape) but had no test
coverage proving that wiring. Codex was right that a refactor splitting
one of them off into its own prefix would silently break things.
The JSDoc on the constants block now spells the three pre-execution
errors out so the relationship between `task_tool.py` returns and the
prefix vocabulary is explicit.
No production code change beyond the docstring — this commit is pure
coverage hardening for the contract that already exists.
Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review
676 lines
27 KiB
Python
676 lines
27 KiB
Python
import asyncio
|
|
import os
|
|
import stat
|
|
from io import BytesIO
|
|
from pathlib import Path
|
|
from types import SimpleNamespace
|
|
from unittest.mock import AsyncMock, MagicMock, patch
|
|
|
|
import pytest
|
|
from _router_auth_helpers import call_unwrapped, make_authed_test_app
|
|
from fastapi import HTTPException, UploadFile
|
|
from fastapi.testclient import TestClient
|
|
|
|
from app.gateway.deps import get_config
|
|
from app.gateway.routers import uploads
|
|
|
|
|
|
class ChunkedUpload:
|
|
def __init__(self, filename: str, chunks: list[bytes]):
|
|
self.filename = filename
|
|
self._chunks = list(chunks)
|
|
self.read_calls: list[int | None] = []
|
|
|
|
async def read(self, size: int | None = None) -> bytes:
|
|
self.read_calls.append(size)
|
|
if size is None:
|
|
raise AssertionError("upload must be read with an explicit chunk size")
|
|
if not self._chunks:
|
|
return b""
|
|
return self._chunks.pop(0)
|
|
|
|
|
|
def _mounted_provider() -> MagicMock:
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
return provider
|
|
|
|
|
|
def test_upload_files_writes_thread_storage_and_skips_local_sandbox_sync(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
provider.acquire.return_value = "local"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
assert len(result.files) == 1
|
|
assert result.files[0]["filename"] == "notes.txt"
|
|
assert (thread_uploads_dir / "notes.txt").read_bytes() == b"hello uploads"
|
|
|
|
sandbox.update_file.assert_not_called()
|
|
|
|
|
|
def test_upload_files_auto_renames_duplicate_form_filenames(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
result = asyncio.run(
|
|
call_unwrapped(
|
|
uploads.upload_files,
|
|
"thread-local",
|
|
request=MagicMock(),
|
|
files=[
|
|
UploadFile(filename="data.txt", file=BytesIO(b"first")),
|
|
UploadFile(filename="data.txt", file=BytesIO(b"second")),
|
|
],
|
|
config=SimpleNamespace(),
|
|
)
|
|
)
|
|
|
|
assert result.success is True
|
|
assert [file_info["filename"] for file_info in result.files] == ["data.txt", "data_1.txt"]
|
|
assert "original_filename" not in result.files[0]
|
|
assert result.files[1]["original_filename"] == "data.txt"
|
|
assert (thread_uploads_dir / "data.txt").read_bytes() == b"first"
|
|
assert (thread_uploads_dir / "data_1.txt").read_bytes() == b"second"
|
|
|
|
|
|
def test_upload_files_skips_acquire_when_thread_data_is_mounted(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-mounted", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
assert (thread_uploads_dir / "notes.txt").read_bytes() == b"hello uploads"
|
|
provider.acquire.assert_not_called()
|
|
provider.get.assert_not_called()
|
|
|
|
|
|
def test_upload_files_does_not_auto_convert_documents_by_default(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
provider.acquire.return_value = "local"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_auto_convert_documents_enabled", return_value=False),
|
|
patch.object(uploads, "convert_file_to_markdown", AsyncMock()) as convert_mock,
|
|
):
|
|
file = UploadFile(filename="report.pdf", file=BytesIO(b"pdf-bytes"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
assert len(result.files) == 1
|
|
assert result.files[0]["filename"] == "report.pdf"
|
|
assert "markdown_file" not in result.files[0]
|
|
convert_mock.assert_not_called()
|
|
assert not (thread_uploads_dir / "report.md").exists()
|
|
|
|
|
|
def test_upload_files_syncs_non_local_sandbox_and_marks_markdown_file(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = False
|
|
provider.acquire.return_value = "aio-1"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
async def fake_convert(file_path: Path) -> Path:
|
|
md_path = file_path.with_suffix(".md")
|
|
md_path.write_text("converted", encoding="utf-8")
|
|
return md_path
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_auto_convert_documents_enabled", return_value=True),
|
|
patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
|
|
):
|
|
file = UploadFile(filename="report.pdf", file=BytesIO(b"pdf-bytes"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
assert len(result.files) == 1
|
|
file_info = result.files[0]
|
|
assert file_info["filename"] == "report.pdf"
|
|
assert file_info["markdown_file"] == "report.md"
|
|
|
|
assert (thread_uploads_dir / "report.pdf").read_bytes() == b"pdf-bytes"
|
|
assert (thread_uploads_dir / "report.md").read_text(encoding="utf-8") == "converted"
|
|
|
|
sandbox.update_file.assert_any_call("/mnt/user-data/uploads/report.pdf", b"pdf-bytes")
|
|
sandbox.update_file.assert_any_call("/mnt/user-data/uploads/report.md", b"converted")
|
|
|
|
|
|
def test_upload_files_makes_non_local_files_sandbox_writable(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = False
|
|
provider.acquire.return_value = "aio-1"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
async def fake_convert(file_path: Path) -> Path:
|
|
md_path = file_path.with_suffix(".md")
|
|
md_path.write_text("converted", encoding="utf-8")
|
|
return md_path
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_auto_convert_documents_enabled", return_value=True),
|
|
patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
|
|
patch.object(uploads, "_make_file_sandbox_writable") as make_writable,
|
|
):
|
|
file = UploadFile(filename="report.pdf", file=BytesIO(b"pdf-bytes"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
make_writable.assert_any_call(thread_uploads_dir / "report.pdf")
|
|
make_writable.assert_any_call(thread_uploads_dir / "report.md")
|
|
|
|
|
|
def test_upload_files_does_not_adjust_permissions_for_local_sandbox(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
provider.acquire.return_value = "local"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_make_file_sandbox_writable") as make_writable,
|
|
):
|
|
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
make_writable.assert_not_called()
|
|
|
|
|
|
def test_upload_files_acquires_non_local_sandbox_before_writing(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = False
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
def acquire_before_writes(thread_id: str) -> str:
|
|
assert list(thread_uploads_dir.iterdir()) == []
|
|
return "aio-1"
|
|
|
|
provider.acquire.side_effect = acquire_before_writes
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert result.success is True
|
|
provider.acquire.assert_called_once_with("thread-aio")
|
|
sandbox.update_file.assert_called_once_with("/mnt/user-data/uploads/notes.txt", b"hello uploads")
|
|
|
|
|
|
def test_upload_files_fails_before_writing_when_non_local_sandbox_unavailable(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = False
|
|
provider.acquire.side_effect = RuntimeError("sandbox unavailable")
|
|
file = ChunkedUpload("notes.txt", [b"hello uploads"])
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
with pytest.raises(RuntimeError, match="sandbox unavailable"):
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert list(thread_uploads_dir.iterdir()) == []
|
|
assert file.read_calls == []
|
|
provider.get.assert_not_called()
|
|
|
|
|
|
def test_upload_files_rejects_too_many_files_before_writing(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=_mounted_provider()),
|
|
patch.object(uploads, "_get_upload_limits", return_value=uploads.UploadLimits(max_files=1, max_file_size=10, max_total_size=20)),
|
|
):
|
|
files = [
|
|
ChunkedUpload("one.txt", [b"one"]),
|
|
ChunkedUpload("two.txt", [b"two"]),
|
|
]
|
|
with pytest.raises(HTTPException) as exc_info:
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=files, config=SimpleNamespace()))
|
|
|
|
assert exc_info.value.status_code == 413
|
|
assert list(thread_uploads_dir.iterdir()) == []
|
|
assert files[0].read_calls == []
|
|
assert files[1].read_calls == []
|
|
|
|
|
|
def test_upload_files_rejects_oversized_single_file_and_removes_partial_file(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = _mounted_provider()
|
|
file = ChunkedUpload("big.txt", [b"123456"])
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_get_upload_limits", return_value=uploads.UploadLimits(max_files=10, max_file_size=5, max_total_size=20)),
|
|
):
|
|
with pytest.raises(HTTPException) as exc_info:
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert exc_info.value.status_code == 413
|
|
assert not (thread_uploads_dir / "big.txt").exists()
|
|
assert file.read_calls == [8192]
|
|
provider.acquire.assert_not_called()
|
|
|
|
|
|
def test_upload_files_rejects_total_size_over_limit_and_cleans_request_files(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=_mounted_provider()),
|
|
patch.object(uploads, "_get_upload_limits", return_value=uploads.UploadLimits(max_files=10, max_file_size=10, max_total_size=5)),
|
|
):
|
|
files = [
|
|
ChunkedUpload("first.txt", [b"123"]),
|
|
ChunkedUpload("second.txt", [b"456"]),
|
|
]
|
|
with pytest.raises(HTTPException) as exc_info:
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=files, config=SimpleNamespace()))
|
|
|
|
assert exc_info.value.status_code == 413
|
|
assert not (thread_uploads_dir / "first.txt").exists()
|
|
assert not (thread_uploads_dir / "second.txt").exists()
|
|
|
|
|
|
def test_upload_files_does_not_sync_non_local_sandbox_when_total_size_exceeds_limit(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = False
|
|
provider.acquire.return_value = "aio-1"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_get_upload_limits", return_value=uploads.UploadLimits(max_files=10, max_file_size=10, max_total_size=5)),
|
|
):
|
|
files = [
|
|
ChunkedUpload("first.txt", [b"123"]),
|
|
ChunkedUpload("second.txt", [b"456"]),
|
|
]
|
|
with pytest.raises(HTTPException) as exc_info:
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=files, config=SimpleNamespace()))
|
|
|
|
assert exc_info.value.status_code == 413
|
|
provider.acquire.assert_called_once_with("thread-aio")
|
|
provider.get.assert_called_once_with("aio-1")
|
|
sandbox.update_file.assert_not_called()
|
|
|
|
|
|
def test_upload_files_does_not_sync_non_local_sandbox_when_conversion_fails(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = False
|
|
provider.acquire.return_value = "aio-1"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
patch.object(uploads, "_auto_convert_documents_enabled", return_value=True),
|
|
patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=RuntimeError("conversion failed"))),
|
|
):
|
|
file = UploadFile(filename="report.pdf", file=BytesIO(b"pdf-bytes"))
|
|
with pytest.raises(HTTPException) as exc_info:
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-aio", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
|
|
assert exc_info.value.status_code == 500
|
|
provider.acquire.assert_called_once_with("thread-aio")
|
|
provider.get.assert_called_once_with("aio-1")
|
|
sandbox.update_file.assert_not_called()
|
|
assert not (thread_uploads_dir / "report.pdf").exists()
|
|
|
|
|
|
def test_make_file_sandbox_writable_adds_write_bits_for_regular_files(tmp_path):
|
|
file_path = tmp_path / "report.pdf"
|
|
file_path.write_bytes(b"pdf-bytes")
|
|
os_chmod_mode = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
|
|
file_path.chmod(os_chmod_mode)
|
|
|
|
uploads._make_file_sandbox_writable(file_path)
|
|
|
|
updated_mode = stat.S_IMODE(file_path.stat().st_mode)
|
|
assert updated_mode & stat.S_IWUSR
|
|
assert updated_mode & stat.S_IWGRP
|
|
assert updated_mode & stat.S_IWOTH
|
|
|
|
|
|
def test_make_file_sandbox_writable_skips_symlinks(tmp_path):
|
|
file_path = tmp_path / "target-link.txt"
|
|
file_path.write_text("hello", encoding="utf-8")
|
|
symlink_stat = MagicMock(st_mode=stat.S_IFLNK)
|
|
|
|
with (
|
|
patch.object(uploads.os, "lstat", return_value=symlink_stat),
|
|
patch.object(uploads.os, "chmod") as chmod,
|
|
):
|
|
uploads._make_file_sandbox_writable(file_path)
|
|
|
|
chmod.assert_not_called()
|
|
|
|
|
|
def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
provider = MagicMock()
|
|
provider.acquire.return_value = "local"
|
|
sandbox = MagicMock()
|
|
provider.get.return_value = sandbox
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
# These filenames must be rejected outright
|
|
for bad_name in ["..", "."]:
|
|
file = UploadFile(filename=bad_name, file=BytesIO(b"data"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
assert result.success is True
|
|
assert result.files == [], f"Expected no files for unsafe filename {bad_name!r}"
|
|
|
|
# Path-traversal prefixes are stripped to the basename and accepted safely
|
|
file = UploadFile(filename="../etc/passwd", file=BytesIO(b"data"))
|
|
result = asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=[file], config=SimpleNamespace()))
|
|
assert result.success is True
|
|
assert len(result.files) == 1
|
|
assert result.files[0]["filename"] == "passwd"
|
|
|
|
# Only the safely normalised file should exist
|
|
assert [f.name for f in thread_uploads_dir.iterdir()] == ["passwd"]
|
|
|
|
|
|
def test_upload_files_rejects_preexisting_symlink_destination(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
outside_file = tmp_path / "outside.txt"
|
|
outside_file.write_text("protected", encoding="utf-8")
|
|
(thread_uploads_dir / "victim.txt").symlink_to(outside_file)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="victim.txt", file=BytesIO(b"attacker upload"))
|
|
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
|
|
|
|
assert result.success is False
|
|
assert result.files == []
|
|
assert result.skipped_files == ["victim.txt"]
|
|
assert "skipped 1 unsafe file" in result.message
|
|
assert outside_file.read_text(encoding="utf-8") == "protected"
|
|
assert (thread_uploads_dir / "victim.txt").is_symlink()
|
|
|
|
|
|
def test_upload_files_rejects_dangling_symlink_destination(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
missing_target = tmp_path / "missing-target.txt"
|
|
(thread_uploads_dir / "victim.txt").symlink_to(missing_target)
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="victim.txt", file=BytesIO(b"attacker upload"))
|
|
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
|
|
|
|
assert result.success is False
|
|
assert result.files == []
|
|
assert result.skipped_files == ["victim.txt"]
|
|
assert not missing_target.exists()
|
|
assert (thread_uploads_dir / "victim.txt").is_symlink()
|
|
|
|
|
|
def test_upload_files_rejects_hardlinked_destination_without_truncating(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
outside_file = tmp_path / "outside.txt"
|
|
outside_file.write_text("protected", encoding="utf-8")
|
|
os.link(outside_file, thread_uploads_dir / "victim.txt")
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="victim.txt", file=BytesIO(b"attacker upload"))
|
|
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
|
|
|
|
assert result.success is False
|
|
assert result.files == []
|
|
assert result.skipped_files == ["victim.txt"]
|
|
assert outside_file.read_text(encoding="utf-8") == "protected"
|
|
assert (thread_uploads_dir / "victim.txt").read_text(encoding="utf-8") == "protected"
|
|
|
|
|
|
def test_upload_files_overwrites_existing_regular_file(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
existing_file = thread_uploads_dir / "notes.txt"
|
|
existing_file.write_bytes(b"old upload")
|
|
assert existing_file.stat().st_nlink == 1
|
|
|
|
provider = MagicMock()
|
|
provider.uses_thread_data_mounts = True
|
|
|
|
with (
|
|
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
|
):
|
|
file = UploadFile(filename="notes.txt", file=BytesIO(b"new upload"))
|
|
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
|
|
|
|
assert result.success is True
|
|
assert [file_info["filename"] for file_info in result.files] == ["notes.txt"]
|
|
assert existing_file.read_bytes() == b"new upload"
|
|
assert existing_file.stat().st_nlink == 1
|
|
|
|
|
|
def test_delete_uploaded_file_removes_generated_markdown_companion(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
(thread_uploads_dir / "report.pdf").write_bytes(b"pdf-bytes")
|
|
(thread_uploads_dir / "report.md").write_text("converted", encoding="utf-8")
|
|
|
|
with patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir):
|
|
result = asyncio.run(call_unwrapped(uploads.delete_uploaded_file, "thread-aio", "report.pdf", request=MagicMock()))
|
|
|
|
assert result == {"success": True, "message": "Deleted report.pdf"}
|
|
assert not (thread_uploads_dir / "report.pdf").exists()
|
|
assert not (thread_uploads_dir / "report.md").exists()
|
|
|
|
|
|
def test_auto_convert_documents_enabled_defaults_to_false_on_config_errors():
|
|
class BrokenConfig:
|
|
def __getattribute__(self, name):
|
|
if name == "uploads":
|
|
raise RuntimeError("boom")
|
|
return super().__getattribute__(name)
|
|
|
|
assert uploads._auto_convert_documents_enabled(BrokenConfig()) is False
|
|
|
|
|
|
def test_auto_convert_documents_enabled_reads_dict_backed_uploads_config():
|
|
cfg = MagicMock()
|
|
cfg.uploads = {"auto_convert_documents": True}
|
|
|
|
assert uploads._auto_convert_documents_enabled(cfg) is True
|
|
|
|
|
|
def test_auto_convert_documents_enabled_accepts_boolean_and_string_truthy_values():
|
|
false_cfg = MagicMock()
|
|
false_cfg.uploads = MagicMock(auto_convert_documents=False)
|
|
|
|
true_cfg = MagicMock()
|
|
true_cfg.uploads = MagicMock(auto_convert_documents=True)
|
|
|
|
string_true_cfg = MagicMock()
|
|
string_true_cfg.uploads = MagicMock(auto_convert_documents="YES")
|
|
|
|
string_false_cfg = MagicMock()
|
|
string_false_cfg.uploads = MagicMock(auto_convert_documents="false")
|
|
|
|
assert uploads._auto_convert_documents_enabled(false_cfg) is False
|
|
assert uploads._auto_convert_documents_enabled(true_cfg) is True
|
|
assert uploads._auto_convert_documents_enabled(string_true_cfg) is True
|
|
assert uploads._auto_convert_documents_enabled(string_false_cfg) is False
|
|
|
|
|
|
def test_upload_limits_endpoint_reads_uploads_config():
|
|
cfg = MagicMock()
|
|
cfg.uploads = {
|
|
"max_files": 15,
|
|
"max_file_size": "1048576",
|
|
"max_total_size": 2097152,
|
|
}
|
|
|
|
result = asyncio.run(call_unwrapped(uploads.get_upload_limits, "thread-local", request=MagicMock(), config=cfg))
|
|
|
|
assert result.max_files == 15
|
|
assert result.max_file_size == 1048576
|
|
assert result.max_total_size == 2097152
|
|
|
|
|
|
def test_upload_limits_endpoint_requires_thread_access():
|
|
cfg = MagicMock()
|
|
cfg.uploads = {}
|
|
app = make_authed_test_app(owner_check_passes=False)
|
|
app.state.config = cfg
|
|
app.dependency_overrides[get_config] = lambda: cfg
|
|
app.include_router(uploads.router)
|
|
|
|
with TestClient(app) as client:
|
|
response = client.get("/api/threads/thread-local/uploads/limits")
|
|
|
|
assert response.status_code == 404
|
|
|
|
|
|
def test_upload_limits_accept_legacy_config_keys():
|
|
cfg = MagicMock()
|
|
cfg.uploads = {
|
|
"max_file_count": 7,
|
|
"max_single_file_size": 123,
|
|
"max_total_size": 456,
|
|
}
|
|
|
|
limits = uploads._get_upload_limits(cfg)
|
|
|
|
assert limits == uploads.UploadLimits(max_files=7, max_file_size=123, max_total_size=456)
|
|
|
|
|
|
def test_upload_files_uses_configured_file_count_limit(tmp_path):
|
|
thread_uploads_dir = tmp_path / "uploads"
|
|
thread_uploads_dir.mkdir(parents=True)
|
|
|
|
cfg = MagicMock()
|
|
cfg.uploads = {"max_files": 1}
|
|
|
|
with (
|
|
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
|
patch.object(uploads, "get_sandbox_provider", return_value=_mounted_provider()),
|
|
):
|
|
files = [
|
|
ChunkedUpload("one.txt", [b"one"]),
|
|
ChunkedUpload("two.txt", [b"two"]),
|
|
]
|
|
with pytest.raises(HTTPException) as exc_info:
|
|
asyncio.run(call_unwrapped(uploads.upload_files, "thread-local", request=MagicMock(), files=files, config=cfg))
|
|
|
|
assert exc_info.value.status_code == 413
|