fix(channels): scope IM files and helper commands to owner (#3579)

* fix(channels): scope IM files and helper commands to owner

* fix(memory): honor bound IM owner for /memory gateway endpoints

The channel manager already attaches X-DeerFlow-Owner-User-Id for /memory
and /models, but the memory router resolved user_id solely from
get_effective_user_id(), which returns the synthetic internal user
(DEFAULT_USER_ID) for channel workers. A bound IM /memory therefore read
the default/internal memory instead of the connection owner's.

Resolve the owner via _resolve_memory_user_id(request) across all
/api/memory* endpoints: trusted internal callers act on behalf of the owner
named in the header, while browser/API callers fall back to
get_effective_user_id(). Mirrors the
threads router's get_trusted_internal_owner_user_id pattern, completing
acceptance criterion #3 of #3539.

Add end-to-end tests asserting the resolved user_id (not just that the
header is sent) and that a spoofed owner header from a browser user is
ignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): align memory bucket and reuse cached storage owner

Address PR #3579 review feedback:

- Memory router now sanitizes the trusted owner header via make_safe_user_id
  before routing, matching the channel file pipeline
  (_safe_user_id_for_run/prepare_user_dir_for_raw_id). A bound owner id needing
  sanitization now resolves to the same bucket as its files/uploads instead of
  failing with a 500 in _validate_user_id.
- _handle_chat reuses the storage_user_id cached at the top of the method for
  artifact delivery instead of re-deriving _channel_storage_user_id(msg), so
  uploads and outputs cannot drift to different buckets if a channel rewrites
  the InboundMessage in receive_file.
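To illustrate why the header is sanitized before routing, here is a sketch with a hypothetical sanitizer (make_safe_user_id_sketch, which replaces anything outside [A-Za-z0-9_-] with "_") and a hypothetical validate_user_id standing in for _validate_user_id. The real make_safe_user_id may normalize differently. The raw id is rejected, while the sanitized id passes and is the same value the file pipeline would derive, so memory and files share one bucket.

```python
import re

_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")
_VALID = re.compile(r"[A-Za-z0-9_-]+")


def make_safe_user_id_sketch(raw_user_id: str) -> str:
    """Hypothetical sanitizer: map unsafe characters to underscores."""
    return _UNSAFE.sub("_", raw_user_id)


def validate_user_id(user_id: str) -> str:
    """Hypothetical stand-in for _validate_user_id: reject unsafe ids."""
    if not _VALID.fullmatch(user_id):
        raise ValueError(f"invalid user id: {user_id!r}")
    return user_id


def memory_bucket(raw_owner: str) -> str:
    # Memory router path: sanitize, then validate.
    return validate_user_id(make_safe_user_id_sketch(raw_owner))


def file_bucket(raw_owner: str) -> str:
    # Channel file pipeline path: same normalization, so the same bucket.
    return validate_user_id(make_safe_user_id_sketch(raw_owner))
```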

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stage unbound IM files under the run's user bucket

Address PR #3579 review feedback (#5): _channel_storage_user_id now mirrors
_resolve_run_params' identity policy, falling back to safe(msg.user_id) instead
of returning None for unbound auth-enabled channels.

Previously an unbound msg ran under safe(platform_user_id) but staged uploads
under get_effective_user_id() in the dispatcher task (unset contextvar ->
"default"), so files landed in users/default/... while the agent read from
users/{safe_platform_user_id}/.... Bound and unbound channels now write where
the agent reads. Returns None only when no identity is available.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): reuse cached storage owner in streaming artifact delivery

Address PR #3579 review feedback (#6): thread the storage_user_id resolved in
_handle_chat into _handle_streaming_chat instead of re-deriving
_channel_storage_user_id(msg) in the finally block. Avoids re-running
_safe_user_id_for_run (and its possible filesystem touch) on the streaming-error
path and guarantees artifact delivery targets the same bucket as the uploads.
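A minimal sketch of the "resolve once, thread it through" pattern, using hypothetical names (Msg, resolve_bucket, receive_file, handle_chat, handle_streaming_chat) that only mirror the shape of the channel manager. The channel hook returns a rewritten message; because the bucket is computed before that hook and passed down, artifact delivery in the finally block still matches the upload bucket.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for InboundMessage."""

    user_id: str
    text: str = ""


def resolve_bucket(msg: Msg) -> str:
    """Hypothetical resolver; the real one may sanitize or touch the filesystem."""
    return f"users/{msg.user_id}"


async def receive_file(msg: Msg) -> Msg:
    """Hypothetical channel hook that returns a rewritten message."""
    return replace(msg, user_id="rewritten", text=msg.text + " [file]")


async def handle_streaming_chat(msg: Msg, *, storage_bucket: str) -> str:
    try:
        pass  # the streamed run would happen here
    finally:
        # Deliver artifacts from the bucket threaded in by the caller,
        # rather than calling resolve_bucket(msg) on the rewritten msg.
        artifact_bucket = storage_bucket
    return artifact_bucket


async def handle_chat(msg: Msg) -> tuple[str, str]:
    storage_bucket = resolve_bucket(msg)  # resolved once, before receive_file
    msg = await receive_file(msg)
    upload_bucket = storage_bucket  # uploads are staged here
    artifact_bucket = await handle_streaming_chat(msg, storage_bucket=storage_bucket)
    return upload_bucket, artifact_bucket
```

Re-deriving from the rewritten message would have produced a different bucket, which is exactly the drift the cached value prevents.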

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): document owner-scoped IM file storage

Address PR #3579 review feedback (#4): the IM Channels and File Upload sections
still described pre-PR default-bucket behaviour. Document that receive_file,
_ingest_inbound_files/ensure_uploads_dir/get_uploads_dir, and
_resolve_attachments/_prepare_artifact_delivery are owner-scoped via the user_id
kwarg, and that the bucket matches the memory bucket from _resolve_memory_user_id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): unify run identity and storage bucket resolution

Address PR #3579 review feedback (#3): _resolve_run_params no longer duplicates
the owner-resolution rule inline. After the #5 fix the inline block and
_channel_storage_user_id computed the identical sanitized-with-platform-fallback
value, so the run identity now calls the same helper, making it the single
source of truth for run_context["user_id"] and the file/artifact storage bucket.

_owner_headers stays deliberately separate: it sends the raw owner id over HTTP
for the gateway to re-resolve (no sanitize, no platform fallback), documented on
both helpers. test_run_identity_matches_storage_bucket pins the two together so
they cannot drift again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Author: Nan Gao
Date: 2026-06-18 05:45:35 +02:00 (committed by GitHub)
Parent: 2b301e8211
Commit: 525af0da14
8 changed files with 461 additions and 55 deletions
+8 -1
@@ -405,6 +405,12 @@ Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk
 10. For commands (`/new`, `/status`, `/models`, `/memory`, `/help`): handle locally or query Gateway API
 11. Outbound → channel callbacks → platform reply
+**Owner-scoped file storage**: inbound files, uploads, and output artifacts are staged under the DeerFlow owner's bucket so they land where the agent run reads/writes (`users/{user_id}/threads/{thread_id}/user-data/{uploads,outputs}`). `ChannelManager._handle_chat` resolves the storage owner once via `_channel_storage_user_id(msg)` (sanitized owner id, falling back to `safe(msg.user_id)` for unbound auth-enabled channels — mirroring `_resolve_run_params`'s run identity; `None` only when no identity is available) and threads it as the `user_id=` kwarg through the file pipeline:
+- `Channel.receive_file(msg, thread_id, user_id=...)` — owner-bound channels persist downloaded files under the owner's bucket instead of the default bucket
+- `_ingest_inbound_files(...)` and the underlying `ensure_uploads_dir` / `get_uploads_dir` — owner-scoped via the same kwarg
+- `_resolve_attachments` / `_prepare_artifact_delivery` — resolve output artifacts from the bound owner's bucket
+The cached value is reused for both the blocking (`runs.wait`) and streaming (`_handle_streaming_chat`) paths, so uploads and artifact delivery always target the same bucket even if a channel returns a rewritten `InboundMessage` from `receive_file`. The bucket id matches the memory bucket resolved by `_resolve_memory_user_id` (both normalize through `make_safe_user_id`).
 **Configuration** (`config.yaml` -> `channels`):
 - `langgraph_url` - LangGraph-compatible Gateway API base URL (default: `http://localhost:8001/api`)
 - `gateway_url` - Gateway API URL for auxiliary commands (default: `http://localhost:8001`)
@@ -438,6 +444,7 @@ Bridges external messaging platforms (Feishu, Slack, Telegram, Discord, DingTalk
 - Per-agent per-user memory at `{base_dir}/users/{user_id}/agents/{agent_name}/memory.json`
 - Custom agent definitions (`SOUL.md` + `config.yaml`) are also per-user at `{base_dir}/users/{user_id}/agents/{agent_name}/`. The legacy shared layout `{base_dir}/agents/{agent_name}/` remains read-only fallback for unmigrated installations
 - `user_id` is resolved via `get_effective_user_id()` from `deerflow.runtime.user_context`
+- The `/api/memory*` endpoints resolve the owner through `_resolve_memory_user_id(request)`: trusted internal callers (IM channel workers carrying the `X-DeerFlow-Owner-User-Id` header, e.g. a bound `/memory` command) act for the connection owner; browser/API callers fall back to `get_effective_user_id()`. The header is only honored after `AuthMiddleware` validated the internal token, mirroring `get_trusted_internal_owner_user_id` used by the threads router
 - In no-auth mode, `user_id` defaults to `"default"` (constant `DEFAULT_USER_ID`)
 - Absolute `storage_path` in config opts out of per-user isolation
 - **Migration**: Run `PYTHONPATH=. python scripts/migrate_user_isolation.py` to move legacy `memory.json`, `threads/`, and `agents/` into per-user layout. Supports `--dry-run` (preview changes) and `--user-id USER_ID` (assign unowned legacy data to a user, defaults to `default`).
@@ -624,7 +631,7 @@ Multi-file upload with automatic document conversion:
 - Supports: PDF, PPT, Excel, Word documents (converted via `markitdown`)
 - Rejects directory inputs before copying so uploads stay all-or-nothing
 - Reuses one conversion worker per request when called from an active event loop
-- Files stored in thread-isolated directories
+- Files stored in thread-isolated directories under the resolving user's bucket (`users/{user_id}/threads/{thread_id}/user-data/uploads`). For IM channels the owner is threaded explicitly via the `user_id=` kwarg (see IM Channels → Owner-scoped file storage); HTTP/embedded callers resolve it from `get_effective_user_id()`
 - Duplicate filenames in a single upload request are auto-renamed with `_N` suffixes so later files do not truncate earlier files
 - Agent receives uploaded file list via `UploadsMiddleware`
+3 -1
@@ -178,7 +178,7 @@ class Channel(ABC):
             except Exception:
                 logger.exception("[%s] failed to upload file %s", self.name, attachment.filename)
-    async def receive_file(self, msg: InboundMessage, thread_id: str) -> InboundMessage:
+    async def receive_file(self, msg: InboundMessage, thread_id: str, *, user_id: str | None = None) -> InboundMessage:
         """
         Optionally process and materialize inbound file attachments for this channel.
@@ -190,8 +190,10 @@ class Channel(ABC):
         Args:
             msg: The inbound message, possibly containing file metadata in msg.files.
             thread_id: The resolved DeerFlow thread ID for sandbox path context.
+            user_id: Optional DeerFlow storage user ID for user-scoped channel workers.
         Returns:
             The (possibly modified) InboundMessage, with text and/or files updated as needed.
         """
+        del user_id
         return msg
+15 -7
@@ -311,7 +311,7 @@ class FeishuChannel(Channel):
             raise RuntimeError(f"Feishu file upload failed: code={response.code}, msg={response.msg}")
         return response.data.file_key
-    async def receive_file(self, msg: InboundMessage, thread_id: str) -> InboundMessage:
+    async def receive_file(self, msg: InboundMessage, thread_id: str, *, user_id: str | None = None) -> InboundMessage:
         """Download a Feishu file into the thread uploads directory.
         Returns the sandbox virtual path when the image is persisted successfully.
@@ -326,15 +326,23 @@ class FeishuChannel(Channel):
         text = msg.text
         for file in files:
             if file.get("image_key"):
-                virtual_path = await self._receive_single_file(msg.thread_ts, file["image_key"], "image", thread_id)
+                virtual_path = await self._receive_single_file(msg.thread_ts, file["image_key"], "image", thread_id, user_id=user_id)
                 text = text.replace("[image]", virtual_path, 1)
             elif file.get("file_key"):
-                virtual_path = await self._receive_single_file(msg.thread_ts, file["file_key"], "file", thread_id)
+                virtual_path = await self._receive_single_file(msg.thread_ts, file["file_key"], "file", thread_id, user_id=user_id)
                 text = text.replace("[file]", virtual_path, 1)
         msg.text = text
         return msg
-    async def _receive_single_file(self, message_id: str, file_key: str, type: Literal["image", "file"], thread_id: str) -> str:
+    async def _receive_single_file(
+        self,
+        message_id: str,
+        file_key: str,
+        type: Literal["image", "file"],
+        thread_id: str,
+        *,
+        user_id: str | None = None,
+    ) -> str:
         request = self._GetMessageResourceRequest.builder().message_id(message_id).file_key(file_key).type(type).build()
         def inner():
@@ -373,9 +381,9 @@ class FeishuChannel(Channel):
             return f"Failed to obtain the [{type}]"
         paths = get_paths()
-        user_id = get_effective_user_id()
-        paths.ensure_thread_dirs(thread_id, user_id=user_id)
-        uploads_dir = paths.sandbox_uploads_dir(thread_id, user_id=user_id).resolve()
+        effective_user_id = user_id or get_effective_user_id()
+        paths.ensure_thread_dirs(thread_id, user_id=effective_user_id)
+        uploads_dir = paths.sandbox_uploads_dir(thread_id, user_id=effective_user_id).resolve()
         ext = "png" if type == "image" else "bin"
         raw_filename = getattr(response, "file_name", "") or f"feishu_{file_key[-12:]}.{ext}"
+61 -20
@@ -525,6 +525,34 @@ def _safe_user_id_for_run(raw_user_id: str) -> str:
     return make_safe_user_id(raw_user_id)
+def _channel_storage_user_id(msg: InboundMessage) -> str | None:
+    """Resolve the canonical DeerFlow user id for a channel-triggered message.
+    Single source of truth for both the agent **run identity**
+    (``_resolve_run_params`` → ``run_context["user_id"]``) and the **file/artifact
+    storage bucket** (``receive_file`` / ``_ingest_inbound_files`` /
+    ``_prepare_artifact_delivery``), so the bucket the agent reads/writes always
+    matches where channel files are staged. Prefer the bound DeerFlow owner,
+    otherwise fall back to the sanitized raw platform user id. Without that
+    fallback, an unbound auth-enabled channel would run under ``safe(msg.user_id)``
+    but stage files under ``get_effective_user_id()`` (the dispatcher task's unset
+    contextvar → ``"default"``), so uploads would land in ``users/default/...``
+    while the agent reads ``users/{safe_platform_user_id}/...``. Returns ``None``
+    only when neither identity is available, leaving the caller to fall back to the
+    contextvar/default user.
+    Distinct from :func:`_owner_headers`, which deliberately sends the *raw* owner
+    id (no sanitize, no platform fallback) over HTTP for gateway to re-resolve;
+    this helper is the in-process, sanitized, filesystem-facing identity.
+    """
+    owner_user_id = _effective_owner_user_id(msg)
+    if owner_user_id:
+        return _safe_user_id_for_run(owner_user_id)
+    if msg.user_id:
+        return _safe_user_id_for_run(msg.user_id)
+    return None
 def _resolve_slash_skill_command(
     text: str,
     available_skills: set[str] | None = None,
@@ -551,7 +579,7 @@ def _resolve_slash_skill_command(
         raise SlashSkillCommandResolutionError("Failed to resolve slash skill command. Please check the skill configuration.") from exc
-def _resolve_attachments(thread_id: str, artifacts: list[str]) -> list[ResolvedAttachment]:
+def _resolve_attachments(thread_id: str, artifacts: list[str], *, user_id: str | None = None) -> list[ResolvedAttachment]:
     """Resolve virtual artifact paths to host filesystem paths with metadata.
     Only paths under ``/mnt/user-data/outputs/`` are accepted; any other
@@ -565,15 +593,15 @@ def _resolve_attachments(thread_id: str, artifacts: list[str]) -> list[ResolvedA
     attachments: list[ResolvedAttachment] = []
     paths = get_paths()
-    user_id = get_effective_user_id()
-    outputs_dir = paths.sandbox_outputs_dir(thread_id, user_id=user_id).resolve()
+    effective_user_id = user_id or get_effective_user_id()
+    outputs_dir = paths.sandbox_outputs_dir(thread_id, user_id=effective_user_id).resolve()
     for virtual_path in artifacts:
         # Security: only allow files from the agent outputs directory
         if not virtual_path.startswith(_OUTPUTS_VIRTUAL_PREFIX):
             logger.warning("[Manager] rejected non-outputs artifact path: %s", virtual_path)
             continue
         try:
-            actual = paths.resolve_virtual_path(thread_id, virtual_path, user_id=user_id)
+            actual = paths.resolve_virtual_path(thread_id, virtual_path, user_id=effective_user_id)
             # Verify the resolved path is actually under the outputs directory
             # (guards against path-traversal even after prefix check)
             try:
@@ -605,13 +633,15 @@ def _prepare_artifact_delivery(
     thread_id: str,
     response_text: str,
     artifacts: list[str],
+    *,
+    user_id: str | None = None,
 ) -> tuple[str, list[ResolvedAttachment]]:
     """Resolve attachments and append filename fallbacks to the text response."""
     attachments: list[ResolvedAttachment] = []
     if not artifacts:
         return response_text, attachments
-    attachments = _resolve_attachments(thread_id, artifacts)
+    attachments = _resolve_attachments(thread_id, artifacts, user_id=user_id)
     resolved_virtuals = {attachment.virtual_path for attachment in attachments}
     unresolved = [path for path in artifacts if path not in resolved_virtuals]
@@ -628,7 +658,7 @@ def _prepare_artifact_delivery(
     return response_text, attachments
-async def _ingest_inbound_files(thread_id: str, msg: InboundMessage) -> list[dict[str, Any]]:
+async def _ingest_inbound_files(thread_id: str, msg: InboundMessage, *, user_id: str | None = None) -> list[dict[str, Any]]:
     if not msg.files:
         return []
@@ -643,7 +673,7 @@ async def _ingest_inbound_files(thread_id: str, msg: InboundMessage) -> list[dic
     def _prepare_uploads_dir() -> tuple[Path, set[str]]:
         # Worker thread: ensure_uploads_dir's mkdir and the iterdir enumeration are
         # blocking filesystem IO that must stay off the event loop.
-        target = ensure_uploads_dir(thread_id)
+        target = ensure_uploads_dir(thread_id, user_id=user_id)
         existing = {entry.name for entry in target.iterdir() if entry.is_file()}
         return target, existing
@@ -833,11 +863,12 @@ class ChannelManager:
         # owns the connection. Preserve the raw platform user under
         # ``channel_user_id`` for platform-facing lookups and audits.
         run_context_identity: dict[str, Any] = {"thread_id": thread_id}
-        owner_user_id = _effective_owner_user_id(msg)
-        if owner_user_id:
-            run_context_identity["user_id"] = _safe_user_id_for_run(owner_user_id)
-        elif msg.user_id:
-            run_context_identity["user_id"] = _safe_user_id_for_run(msg.user_id)
+        # Single source of truth for the run identity: the same helper that scopes
+        # inbound files and outbound artifacts, so the bucket the agent reads/writes
+        # always matches where channel files are staged.
+        run_user_id = _channel_storage_user_id(msg)
+        if run_user_id:
+            run_context_identity["user_id"] = run_user_id
         if msg.user_id:
             run_context_identity["channel_user_id"] = msg.user_id
@@ -1215,6 +1246,7 @@ class ChannelManager:
             return
         client = self._get_client()
+        storage_user_id = _channel_storage_user_id(msg)
         # Look up existing DeerFlow thread.
         # topic_id may be None (e.g. Telegram private chats) — the store
@@ -1240,12 +1272,12 @@ class ChannelManager:
             service = get_channel_service()
             channel = service.get_channel(msg.channel_name) if service else None
             logger.info("[Manager] preparing receive file context for %d attachments", len(msg.files))
-            msg = await channel.receive_file(msg, thread_id) if channel else msg
+            msg = await channel.receive_file(msg, thread_id, user_id=storage_user_id) if channel else msg
         if extra_context:
             run_context.update(extra_context)
         original_text = msg.text
-        uploaded = await _ingest_inbound_files(thread_id, msg)
+        uploaded = await _ingest_inbound_files(thread_id, msg, user_id=storage_user_id)
         if uploaded:
             msg.text = f"{_format_uploaded_files_block(uploaded)}\n\n{msg.text}".strip()
         human_message = _human_input_message(msg.text, original_content=original_text)
@@ -1259,6 +1291,7 @@ class ChannelManager:
                 run_config,
                 run_context,
                 human_message,
+                storage_user_id=storage_user_id,
             )
             return
@@ -1296,7 +1329,10 @@ class ChannelManager:
             len(artifacts),
         )
-        response_text, attachments = _prepare_artifact_delivery(thread_id, response_text, artifacts)
+        # Reuse the storage owner cached at the top of _handle_chat so uploads and
+        # artifact delivery always resolve to the same bucket, even if a future
+        # channel.receive_file returns a rewritten InboundMessage.
+        response_text, attachments = _prepare_artifact_delivery(thread_id, response_text, artifacts, user_id=storage_user_id)
         if not response_text:
             if attachments:
@@ -1328,6 +1364,7 @@ class ChannelManager:
         run_config: dict[str, Any],
         run_context: dict[str, Any],
         human_message: dict[str, Any],
+        storage_user_id: str | None = None,
     ) -> None:
         logger.info("[Manager] invoking runs.stream(thread_id=%s, text_len=%d)", thread_id, len(msg.text or ""))
@@ -1400,7 +1437,10 @@ class ChannelManager:
             response_text = _extract_response_text(result)
             pending_clarification = _has_current_turn_clarification(result)
             artifacts = _extract_artifacts(result)
-            response_text, attachments = _prepare_artifact_delivery(thread_id, response_text, artifacts)
+            # Reuse the storage owner resolved by _handle_chat so artifact delivery
+            # matches the upload bucket and we avoid re-running _safe_user_id_for_run
+            # (and its possible filesystem touch) on the streaming-error path.
+            response_text, attachments = _prepare_artifact_delivery(thread_id, response_text, artifacts, user_id=storage_user_id)
             if not response_text:
                 if attachments:
@@ -1481,9 +1521,9 @@ class ChannelManager:
             thread_id = await self._lookup_thread_id(msg)
             reply = f"Active thread: {thread_id}" if thread_id else "No active conversation."
         elif reply is None and command == "models":
-            reply = await self._fetch_gateway("/api/models", "models")
+            reply = await self._fetch_gateway("/api/models", "models", msg=msg)
         elif reply is None and command == "memory":
-            reply = await self._fetch_gateway("/api/memory", "memory")
+            reply = await self._fetch_gateway("/api/memory", "memory", msg=msg)
         elif reply is None and command == "help":
             reply = (
                 "Available commands:\n"
@@ -1526,16 +1566,17 @@ class ChannelManager:
         )
         await self.bus.publish_outbound(outbound)
-    async def _fetch_gateway(self, path: str, kind: str) -> str:
+    async def _fetch_gateway(self, path: str, kind: str, *, msg: InboundMessage | None = None) -> str:
         """Fetch data from the Gateway API for command responses."""
         import httpx
         try:
+            headers = _owner_headers(msg) if msg is not None else None
             async with httpx.AsyncClient() as http:
                 resp = await http.get(
                     f"{self._gateway_url}{path}",
                     timeout=10,
-                    headers=create_internal_auth_headers(),
+                    headers=headers or create_internal_auth_headers(),
                 )
                 resp.raise_for_status()
                 data = resp.json()
+43 -19
@@ -1,8 +1,9 @@
 """Memory API router for retrieving and managing global memory data."""
-from fastapi import APIRouter, HTTPException
+from fastapi import APIRouter, HTTPException, Request
 from pydantic import BaseModel, Field
+from app.gateway.internal_auth import get_trusted_internal_owner_user_id
 from deerflow.agents.memory.updater import (
     clear_memory_data,
     create_memory_fact,
@@ -13,11 +14,34 @@ from deerflow.agents.memory.updater import (
     update_memory_fact,
 )
 from deerflow.config.memory_config import get_memory_config
+from deerflow.config.paths import make_safe_user_id
 from deerflow.runtime.user_context import get_effective_user_id
 router = APIRouter(prefix="/api", tags=["memory"])
+def _resolve_memory_user_id(request: Request) -> str:
+    """Resolve the memory owner for this request.
+    Honors the trusted internal owner header that channel workers attach when
+    acting for a connection owner, so an IM ``/memory`` command reads the bound
+    owner's memory instead of the synthetic internal user. The header is only
+    honored after ``AuthMiddleware`` validated the internal token (see
+    ``get_trusted_internal_owner_user_id``). Browser/API callers are never
+    internal, so this falls back to the normal contextvar-based effective user.
+    The trusted owner header carries the *raw* owner id, so sanitize it through
+    ``make_safe_user_id`` (the same normalization the channel file pipeline applies
+    via ``_safe_user_id_for_run``/``prepare_user_dir_for_raw_id``). This keeps the
+    memory bucket aligned with the owner's file/upload bucket and avoids a 500 when
+    the raw id contains characters ``_validate_user_id`` would reject.
+    """
+    raw_owner = get_trusted_internal_owner_user_id(request)
+    if raw_owner:
+        return make_safe_user_id(raw_owner)
+    return get_effective_user_id()
 class ContextSection(BaseModel):
     """Model for context sections (user and history)."""
@@ -115,7 +139,7 @@ class MemoryStatusResponse(BaseModel):
     summary="Get Memory Data",
     description="Retrieve the current global memory data including user context, history, and facts.",
 )
-async def get_memory() -> MemoryResponse:
+async def get_memory(http_request: Request) -> MemoryResponse:
     """Get the current global memory data.
     Returns:
@@ -149,7 +173,7 @@ async def get_memory() -> MemoryResponse:
         }
         ```
     """
-    memory_data = get_memory_data(user_id=get_effective_user_id())
+    memory_data = get_memory_data(user_id=_resolve_memory_user_id(http_request))
     return MemoryResponse(**memory_data)
@@ -160,7 +184,7 @@ async def get_memory() -> MemoryResponse:
     summary="Reload Memory Data",
     description="Reload memory data from the storage file, refreshing the in-memory cache.",
 )
-async def reload_memory() -> MemoryResponse:
+async def reload_memory(http_request: Request) -> MemoryResponse:
     """Reload memory data from file.
     This forces a reload of the memory data from the storage file,
@@ -169,7 +193,7 @@ async def reload_memory() -> MemoryResponse:
     Returns:
         The reloaded memory data.
     """
-    memory_data = reload_memory_data(user_id=get_effective_user_id())
+    memory_data = reload_memory_data(user_id=_resolve_memory_user_id(http_request))
     return MemoryResponse(**memory_data)
@@ -180,10 +204,10 @@ async def reload_memory() -> MemoryResponse:
     summary="Clear All Memory Data",
     description="Delete all saved memory data and reset the memory structure to an empty state.",
 )
-async def clear_memory() -> MemoryResponse:
+async def clear_memory(http_request: Request) -> MemoryResponse:
     """Clear all persisted memory data."""
     try:
-        memory_data = clear_memory_data(user_id=get_effective_user_id())
+        memory_data = clear_memory_data(user_id=_resolve_memory_user_id(http_request))
     except OSError as exc:
         raise HTTPException(status_code=500, detail="Failed to clear memory data.") from exc
@@ -197,14 +221,14 @@ async def clear_memory() -> MemoryResponse:
summary="Create Memory Fact", summary="Create Memory Fact",
description="Create a single saved memory fact manually.", description="Create a single saved memory fact manually.",
) )
async def create_memory_fact_endpoint(request: FactCreateRequest) -> MemoryResponse: async def create_memory_fact_endpoint(request: FactCreateRequest, http_request: Request) -> MemoryResponse:
"""Create a single fact manually.""" """Create a single fact manually."""
try: try:
memory_data = create_memory_fact( memory_data = create_memory_fact(
content=request.content, content=request.content,
category=request.category, category=request.category,
confidence=request.confidence, confidence=request.confidence,
user_id=get_effective_user_id(), user_id=_resolve_memory_user_id(http_request),
) )
except ValueError as exc: except ValueError as exc:
raise _map_memory_fact_value_error(exc) from exc raise _map_memory_fact_value_error(exc) from exc
@@ -221,10 +245,10 @@ async def create_memory_fact_endpoint(request: FactCreateRequest) -> MemoryRespo
summary="Delete Memory Fact", summary="Delete Memory Fact",
description="Delete a single saved memory fact by its fact id.", description="Delete a single saved memory fact by its fact id.",
) )
async def delete_memory_fact_endpoint(fact_id: str) -> MemoryResponse: async def delete_memory_fact_endpoint(fact_id: str, http_request: Request) -> MemoryResponse:
"""Delete a single fact from memory by fact id.""" """Delete a single fact from memory by fact id."""
try: try:
memory_data = delete_memory_fact(fact_id, user_id=get_effective_user_id()) memory_data = delete_memory_fact(fact_id, user_id=_resolve_memory_user_id(http_request))
except KeyError as exc: except KeyError as exc:
raise HTTPException(status_code=404, detail=f"Memory fact '{fact_id}' not found.") from exc raise HTTPException(status_code=404, detail=f"Memory fact '{fact_id}' not found.") from exc
except OSError as exc: except OSError as exc:
@@ -240,7 +264,7 @@ async def delete_memory_fact_endpoint(fact_id: str) -> MemoryResponse:
summary="Patch Memory Fact", summary="Patch Memory Fact",
description="Partially update a single saved memory fact by its fact id while preserving omitted fields.", description="Partially update a single saved memory fact by its fact id while preserving omitted fields.",
) )
async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -> MemoryResponse: async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest, http_request: Request) -> MemoryResponse:
"""Partially update a single fact manually.""" """Partially update a single fact manually."""
try: try:
memory_data = update_memory_fact( memory_data = update_memory_fact(
@@ -248,7 +272,7 @@ async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -
content=request.content, content=request.content,
category=request.category, category=request.category,
confidence=request.confidence, confidence=request.confidence,
user_id=get_effective_user_id(), user_id=_resolve_memory_user_id(http_request),
) )
except ValueError as exc: except ValueError as exc:
raise _map_memory_fact_value_error(exc) from exc raise _map_memory_fact_value_error(exc) from exc
@@ -267,9 +291,9 @@ async def update_memory_fact_endpoint(fact_id: str, request: FactPatchRequest) -
summary="Export Memory Data", summary="Export Memory Data",
description="Export the current global memory data as JSON for backup or transfer.", description="Export the current global memory data as JSON for backup or transfer.",
) )
async def export_memory() -> MemoryResponse: async def export_memory(http_request: Request) -> MemoryResponse:
"""Export the current memory data.""" """Export the current memory data."""
memory_data = get_memory_data(user_id=get_effective_user_id()) memory_data = get_memory_data(user_id=_resolve_memory_user_id(http_request))
return MemoryResponse(**memory_data) return MemoryResponse(**memory_data)
@@ -280,10 +304,10 @@ async def export_memory() -> MemoryResponse:
summary="Import Memory Data", summary="Import Memory Data",
description="Import and overwrite the current global memory data from a JSON payload.", description="Import and overwrite the current global memory data from a JSON payload.",
) )
async def import_memory(request: MemoryResponse) -> MemoryResponse: async def import_memory(request: MemoryResponse, http_request: Request) -> MemoryResponse:
"""Import and persist memory data.""" """Import and persist memory data."""
try: try:
memory_data = import_memory_data(request.model_dump(), user_id=get_effective_user_id()) memory_data = import_memory_data(request.model_dump(), user_id=_resolve_memory_user_id(http_request))
except OSError as exc: except OSError as exc:
raise HTTPException(status_code=500, detail="Failed to import memory data.") from exc raise HTTPException(status_code=500, detail="Failed to import memory data.") from exc
@@ -336,14 +360,14 @@ async def get_memory_config_endpoint() -> MemoryConfigResponse:
summary="Get Memory Status", summary="Get Memory Status",
description="Retrieve both memory configuration and current data in a single request.", description="Retrieve both memory configuration and current data in a single request.",
) )
async def get_memory_status() -> MemoryStatusResponse: async def get_memory_status(http_request: Request) -> MemoryStatusResponse:
"""Get the memory system status including configuration and data. """Get the memory system status including configuration and data.
Returns: Returns:
Combined memory configuration and current data. Combined memory configuration and current data.
""" """
config = get_memory_config() config = get_memory_config()
memory_data = get_memory_data(user_id=get_effective_user_id()) memory_data = get_memory_data(user_id=_resolve_memory_user_id(http_request))
return MemoryStatusResponse( return MemoryStatusResponse(
config=MemoryConfigResponse( config=MemoryConfigResponse(
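Every endpoint above resolves its bucket through `_resolve_memory_user_id(http_request)`, whose body falls outside this excerpt. The tests added later in this change pin its contract. A trusted internal caller acts for the sanitized owner header. Any other caller falls back to `get_effective_user_id()`, and a spoofed header is ignored.

Below is a minimal standalone sketch of that contract, not the gateway's code. The header value, role string, default user id, `resolve_memory_user_id`, and the sanitization rule in `make_safe_user_id_sketch` are all hypothetical stand-ins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for INTERNAL_OWNER_USER_ID_HEADER_NAME,
# INTERNAL_SYSTEM_ROLE and DEFAULT_USER_ID; the real values may differ.
OWNER_HEADER = "X-DeerFlow-Owner-User-Id"
INTERNAL_ROLE = "internal"
DEFAULT_USER_ID = "default"


@dataclass
class User:
    id: str
    system_role: str


@dataclass
class Request:
    # Plain dict, so lookups are case-sensitive (Starlette's Headers are not).
    headers: dict[str, str]
    user: User | None = None


def make_safe_user_id_sketch(raw: str) -> str:
    # Hypothetical rule: map every character outside [A-Za-z0-9_-] to "_".
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw)


def resolve_memory_user_id(request: Request, effective_user_id: Callable[[], str]) -> str:
    """Choose the memory bucket for one request (sketch of the tested contract)."""
    user = request.user
    owner = request.headers.get(OWNER_HEADER, "").strip()
    is_trusted_internal = (
        user is not None and user.id == DEFAULT_USER_ID and user.system_role == INTERNAL_ROLE
    )
    if is_trusted_internal and owner:
        # Sanitize so memory lands in the same bucket as the owner's uploads.
        return make_safe_user_id_sketch(owner)
    # Browser/API callers: any owner header is ignored.
    return effective_user_id()


if __name__ == "__main__":
    internal = Request({OWNER_HEADER: "feishu|ou_AbC/123"}, User(DEFAULT_USER_ID, INTERNAL_ROLE))
    spoofed = Request({OWNER_HEADER: "owner-1"}, User("real-user", "user"))
    print(resolve_memory_user_id(internal, lambda: DEFAULT_USER_ID))  # feishu_ou_AbC_123
    print(resolve_memory_user_id(spoofed, lambda: "real-user"))  # real-user
```

Gating on the caller's identity rather than on the header's presence is what makes the spoofed-header tests meaningful. A browser user can send any header, but only a request authenticated with the internal token carries the internal role.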
@@ -37,15 +37,15 @@ def validate_thread_id(thread_id: str) -> None:
         raise ValueError(f"Invalid thread_id: {thread_id!r}")
-def get_uploads_dir(thread_id: str) -> Path:
+def get_uploads_dir(thread_id: str, *, user_id: str | None = None) -> Path:
     """Return the uploads directory path for a thread (no side effects)."""
     validate_thread_id(thread_id)
-    return get_paths().sandbox_uploads_dir(thread_id, user_id=get_effective_user_id())
+    return get_paths().sandbox_uploads_dir(thread_id, user_id=user_id or get_effective_user_id())
-def ensure_uploads_dir(thread_id: str) -> Path:
+def ensure_uploads_dir(thread_id: str, *, user_id: str | None = None) -> Path:
     """Return the uploads directory for a thread, creating it if needed."""
-    base = get_uploads_dir(thread_id)
+    base = get_uploads_dir(thread_id, user_id=user_id)
     base.mkdir(parents=True, exist_ok=True)
     return base
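`get_uploads_dir` and `ensure_uploads_dir` now accept a keyword-only `user_id` that overrides the context user. The fallback is written `user_id or get_effective_user_id()`, so an empty string behaves like `None` and also falls back to the context user.

The sketch below reproduces that pattern standalone. Several pieces are assumptions standing in for `get_paths().sandbox_uploads_dir`: the `ContextVar`, the temporary `BASE_DIR`, the thread-id regex, and the `users/<user>/threads/<thread>/uploads` layout.

```python
from __future__ import annotations

import re
import tempfile
from contextvars import ContextVar
from pathlib import Path

# Assumption: a request-scoped user like the one get_effective_user_id() reads.
_current_user: ContextVar[str] = ContextVar("current_user", default="default")
# Assumption: stands in for the real Paths base directory.
BASE_DIR = Path(tempfile.mkdtemp())
# Assumption: a conservative thread-id rule that rejects path separators.
_THREAD_ID_RE = re.compile(r"[A-Za-z0-9_-]+")


def get_effective_user_id() -> str:
    return _current_user.get()


def get_uploads_dir(thread_id: str, *, user_id: str | None = None) -> Path:
    """Return the uploads path for a thread without creating it."""
    if not _THREAD_ID_RE.fullmatch(thread_id):
        raise ValueError(f"Invalid thread_id: {thread_id!r}")
    # `or` sends both None and "" to the context user.
    owner = user_id or get_effective_user_id()
    return BASE_DIR / "users" / owner / "threads" / thread_id / "uploads"


def ensure_uploads_dir(thread_id: str, *, user_id: str | None = None) -> Path:
    """Same path as get_uploads_dir, created on demand."""
    base = get_uploads_dir(thread_id, user_id=user_id)
    base.mkdir(parents=True, exist_ok=True)
    return base


if __name__ == "__main__":
    owner_dir = ensure_uploads_dir("thread-owner", user_id="owner-1")
    print(owner_dir.relative_to(BASE_DIR).as_posix())
    # users/owner-1/threads/thread-owner/uploads
    print(get_uploads_dir("thread-owner", user_id="").relative_to(BASE_DIR).as_posix())
    # users/default/threads/thread-owner/uploads
```

Making `user_id` keyword-only keeps existing positional callers unchanged while letting the channel pipeline pass the owner bucket explicitly, as `_ingest_inbound_files(..., user_id="owner-1")` does in the tests.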
+200 -3
@@ -679,6 +679,55 @@ class TestChannelManager:
         _run(go())
+    def test_fetch_gateway_uses_bound_owner_headers(self, monkeypatch):
+        from app.channels.manager import ChannelManager
+        from app.gateway.internal_auth import INTERNAL_OWNER_USER_ID_HEADER_NAME
+        class MockResponse:
+            def raise_for_status(self):
+                return None
+            def json(self):
+                return {"facts": [{"text": "owner fact"}]}
+        class MockAsyncClient:
+            def __init__(self, *args, **kwargs):
+                return None
+            async def __aenter__(self):
+                return self
+            async def __aexit__(self, exc_type, exc, tb):
+                return None
+            async def get(self, url, **kwargs):
+                calls.append({"url": url, **kwargs})
+                return MockResponse()
+        calls = []
+        monkeypatch.setattr("app.channels.manager.httpx.AsyncClient", MockAsyncClient)
+        async def go():
+            bus = MessageBus()
+            store = ChannelStore(path=Path(tempfile.mkdtemp()) / "store.json")
+            manager = ChannelManager(bus=bus, store=store, gateway_url="http://gateway:8001")
+            msg = InboundMessage(
+                channel_name="slack",
+                chat_id="C123",
+                user_id="U-platform",
+                owner_user_id="deerflow-user-1",
+                connection_id="connection-1",
+                text="/memory",
+                msg_type=InboundMessageType.COMMAND,
+            )
+            reply = await manager._fetch_gateway("/api/memory", "memory", msg=msg)
+            assert reply == "Memory contains 1 fact(s)."
+            assert calls[0]["headers"][INTERNAL_OWNER_USER_ID_HEADER_NAME] == "deerflow-user-1"
+        _run(go())
     def test_handle_chat_calls_channel_receive_file_for_inbound_files(self, monkeypatch):
         from app.channels.manager import ChannelManager
@@ -716,7 +765,9 @@ class TestChannelManager:
             inbound = InboundMessage(
                 channel_name="test",
                 chat_id="chat1",
-                user_id="user1",
+                user_id="platform-user",
+                owner_user_id="owner-1",
+                connection_id="connection-1",
                 text="hi [image]",
                 files=[{"image_key": "img_1"}],
             )
@@ -729,6 +780,7 @@ class TestChannelManager:
             assert called_msg.text == "hi [image]"
             assert isinstance(called_thread_id, str)
             assert called_thread_id
+            assert mock_channel.receive_file.await_args.kwargs["user_id"] == "owner-1"
             mock_client.runs.wait.assert_called_once()
             run_call_args = mock_client.runs.wait.call_args
@@ -736,6 +788,70 @@ class TestChannelManager:
         _run(go())
+    def test_ingest_inbound_files_uses_explicit_owner_bucket(self, tmp_path, monkeypatch):
+        from app.channels.manager import INBOUND_FILE_READERS, _ingest_inbound_files
+        from deerflow.config.paths import Paths
+        paths = Paths(tmp_path)
+        monkeypatch.setattr("deerflow.uploads.manager.get_paths", lambda: paths)
+        async def read_file(file_info, client):
+            del file_info, client
+            return b"owner data"
+        INBOUND_FILE_READERS["owner-test"] = read_file
+        async def go():
+            try:
+                created = await _ingest_inbound_files(
+                    "thread-owner",
+                    InboundMessage(
+                        channel_name="owner-test",
+                        chat_id="C123",
+                        user_id="U-platform",
+                        text="file",
+                        files=[{"filename": "report.txt", "type": "file"}],
+                    ),
+                    user_id="owner-1",
+                )
+            finally:
+                INBOUND_FILE_READERS.pop("owner-test", None)
+            assert created == [
+                {
+                    "filename": "report.txt",
+                    "size": len(b"owner data"),
+                    "path": "/mnt/user-data/uploads/report.txt",
+                    "is_image": False,
+                }
+            ]
+            assert (paths.sandbox_uploads_dir("thread-owner", user_id="owner-1") / "report.txt").read_bytes() == b"owner data"
+            assert not paths.sandbox_uploads_dir("thread-owner").exists()
+        _run(go())
+    def test_channel_storage_user_id_falls_back_to_platform_user(self, monkeypatch):
+        """Unbound auth-enabled channels stage files under the same bucket the run uses.
+        ``_resolve_run_params`` runs an unbound msg under ``safe(msg.user_id)``, so
+        ``_channel_storage_user_id`` must resolve to the same value instead of
+        ``None`` (which would fall back to ``"default"`` in the dispatcher task and
+        cross buckets — the agent would read uploads the channel never wrote there).
+        """
+        from app.channels.manager import _channel_storage_user_id, _safe_user_id_for_run
+        # Auth enabled (no auth-disabled owner), unbound (no owner_user_id).
+        monkeypatch.setattr("app.channels.manager._auth_disabled_owner_user_id", lambda: None)
+        unbound = InboundMessage(channel_name="slack", chat_id="C1", user_id="U-platform", text="hi")
+        assert _channel_storage_user_id(unbound) == _safe_user_id_for_run("U-platform")
+        bound = InboundMessage(channel_name="slack", chat_id="C1", user_id="U-platform", text="hi", owner_user_id="owner-1")
+        assert _channel_storage_user_id(bound) == _safe_user_id_for_run("owner-1")
+        anonymous = InboundMessage(channel_name="slack", chat_id="C1", user_id="", text="hi")
+        assert _channel_storage_user_id(anonymous) is None
     def test_handle_chat_creates_thread(self):
         from app.channels.manager import ChannelManager
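The docstring of `test_channel_storage_user_id_falls_back_to_platform_user` states the rule that `test_run_identity_matches_storage_bucket` later pins:

- A bound message is scoped to the sanitized owner id.
- An unbound message is scoped to the sanitized platform user id, the same value `_resolve_run_params` runs it as.
- A message with no user id yields `None`.

Below is a minimal standalone sketch of that single-resolver design. `Msg`, `storage_user_id`, `run_user_id`, and `safe_user_id` (with its sanitization rule) are hypothetical stand-ins for `InboundMessage`, `_channel_storage_user_id`, `_resolve_run_params`, and `_safe_user_id_for_run`. The auth-disabled branch, stubbed to `None` in the test, is left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Msg:
    # Minimal stand-in for InboundMessage: only the fields the policy reads.
    user_id: str
    owner_user_id: str | None = None


def safe_user_id(raw: str) -> str:
    # Hypothetical sanitizer; the real _safe_user_id_for_run may differ.
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw)


def storage_user_id(msg: Msg) -> str | None:
    """Bucket for uploads/artifacts; also used as the run identity below."""
    if msg.owner_user_id:  # bound IM connection: act as the DeerFlow owner
        return safe_user_id(msg.owner_user_id)
    if msg.user_id:  # unbound: same platform-user fallback the run uses
        return safe_user_id(msg.user_id)
    return None  # anonymous: nothing to scope to


def run_user_id(msg: Msg) -> str | None:
    # Delegating (rather than re-deriving) is what keeps the two from drifting.
    return storage_user_id(msg)


if __name__ == "__main__":
    for m in (Msg("U-platform", "owner-1"), Msg("U-platform"), Msg("feishu|ou_AbC/123"), Msg("")):
        print(run_user_id(m), storage_user_id(m))
```

Routing both call sites through one function turns "uploads and runs land in the same bucket" into a structural property rather than two parallel implementations that tests must keep in sync.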
@@ -1862,7 +1978,8 @@ class TestChannelManager:
     def test_handle_command_slash_skill_with_attachment_preserves_original_content(self, monkeypatch, tmp_path):
         from app.channels.manager import ChannelManager
-        async def fake_ingest(thread_id, msg):
+        async def fake_ingest(thread_id, msg, *, user_id=None):
+            del user_id
             return [
                 {
                     "filename": "report.pdf",
@@ -1916,7 +2033,8 @@ class TestChannelManager:
     def test_streaming_slash_skill_with_attachment_preserves_original_content(self, monkeypatch, tmp_path):
         from app.channels.manager import ChannelManager
-        async def fake_ingest(thread_id, msg):
+        async def fake_ingest(thread_id, msg, *, user_id=None):
+            del user_id
             return [
                 {
                     "filename": "report.pdf",
@@ -2658,6 +2776,31 @@ class TestResolveRunParamsUserId:
         assert run_context["user_id"] == "123456"
         assert run_context["channel_user_id"] == "123456"
+    @pytest.mark.parametrize(
+        "kwargs",
+        [
+            {"user_id": "U-platform", "owner_user_id": "deerflow-user-1"},  # bound
+            {"user_id": "U-platform"},  # unbound auth-enabled
+            {"user_id": "feishu|ou_AbC/123"},  # unbound needing sanitization
+        ],
+    )
+    def test_run_identity_matches_storage_bucket(self, kwargs, monkeypatch):
+        """The run user_id and the file/artifact storage bucket share one resolver.
+        Pins #2 and #3 to a single source of truth so they cannot drift: whatever
+        _resolve_run_params puts in run_context["user_id"] is exactly what
+        _channel_storage_user_id scopes uploads/artifacts to.
+        """
+        from app.channels.manager import _channel_storage_user_id
+        manager = self._manager()
+        monkeypatch.delenv("DEER_FLOW_AUTH_DISABLED", raising=False)
+        msg = InboundMessage(channel_name="slack", chat_id="C123", text="hi", **kwargs)
+        _, _, run_context = manager._resolve_run_params(msg, "thread-1")
+        assert run_context["user_id"] == _channel_storage_user_id(msg)
     def test_connection_owner_user_id_takes_precedence_over_platform_user_id(self, monkeypatch):
         manager = self._manager()
         monkeypatch.delenv("DEER_FLOW_AUTH_DISABLED", raising=False)
@@ -3429,6 +3572,60 @@ class TestFormatArtifactText:
 class TestHandleChatWithArtifacts:
+    def test_bound_owner_artifacts_resolve_from_owner_outputs_bucket(self, tmp_path, monkeypatch):
+        from app.channels.manager import ChannelManager
+        from deerflow.config.paths import Paths
+        paths = Paths(tmp_path)
+        monkeypatch.setattr("deerflow.config.paths.get_paths", lambda: paths)
+        outputs_dir = paths.sandbox_outputs_dir("test-thread-123", user_id="owner-1")
+        outputs_dir.mkdir(parents=True)
+        (outputs_dir / "report.md").write_text("owner report", encoding="utf-8")
+        async def go():
+            bus = MessageBus()
+            store = ChannelStore(path=tmp_path / "store.json")
+            manager = ChannelManager(bus=bus, store=store)
+            run_result = {
+                "messages": [
+                    {"type": "human", "content": "generate report"},
+                    {
+                        "type": "ai",
+                        "content": "Here is your report.",
+                        "tool_calls": [
+                            {"name": "present_files", "args": {"filepaths": ["/mnt/user-data/outputs/report.md"]}},
+                        ],
+                    },
+                    {"type": "tool", "name": "present_files", "content": "ok"},
+                ],
+            }
+            mock_client = _make_mock_langgraph_client(run_result=run_result)
+            manager._client = mock_client
+            outbound_received = []
+            bus.subscribe_outbound(lambda msg: outbound_received.append(msg))
+            await manager.start()
+            await bus.publish_inbound(
+                InboundMessage(
+                    channel_name="test",
+                    chat_id="c1",
+                    user_id="U-platform",
+                    owner_user_id="owner-1",
+                    connection_id="connection-1",
+                    text="generate report",
+                )
+            )
+            await _wait_for(lambda: len(outbound_received) >= 1)
+            await manager.stop()
+            assert len(outbound_received) == 1
+            assert len(outbound_received[0].attachments) == 1
+            assert outbound_received[0].attachments[0].actual_path == outputs_dir / "report.md"
+        _run(go())
     def test_artifacts_appended_to_text(self):
         from app.channels.manager import ChannelManager
+127
@@ -1,3 +1,5 @@
+import asyncio
+from types import SimpleNamespace
 from unittest.mock import patch
 from fastapi import FastAPI
@@ -303,3 +305,128 @@ def test_update_memory_fact_route_returns_specific_error_for_invalid_confidence(
     assert response.status_code == 400
     assert response.json()["detail"] == "Invalid confidence value; must be between 0 and 1."
+def _internal_owner_request(owner_user_id: str) -> SimpleNamespace:
+    """Build a trusted-internal request carrying the connection owner header.
+    Mirrors what ``AuthMiddleware`` stamps for a channel worker that holds the
+    internal token (``request.state.user`` is the synthetic internal user) and
+    what ``ChannelManager._fetch_gateway`` attaches via ``_owner_headers``.
+    """
+    from app.gateway.internal_auth import INTERNAL_OWNER_USER_ID_HEADER_NAME, INTERNAL_SYSTEM_ROLE
+    from deerflow.runtime.user_context import DEFAULT_USER_ID
+    return SimpleNamespace(
+        headers={INTERNAL_OWNER_USER_ID_HEADER_NAME: owner_user_id},
+        state=SimpleNamespace(user=SimpleNamespace(id=DEFAULT_USER_ID, system_role=INTERNAL_SYSTEM_ROLE)),
+    )
+def test_get_memory_honors_bound_owner_header() -> None:
+    """A bound IM ``/memory`` reads the owner's bucket, not the internal user's."""
+    seen: dict[str, str] = {}
+    def fake_get_memory_data(*, user_id: str) -> dict:
+        seen["user_id"] = user_id
+        return _sample_memory(facts=[{"id": "f", "content": "owner fact", "category": "context", "confidence": 0.9, "createdAt": "", "source": "owner"}])
+    with patch("app.gateway.routers.memory.get_memory_data", side_effect=fake_get_memory_data):
+        response = asyncio.run(memory.get_memory(_internal_owner_request("owner-1")))
+    assert seen["user_id"] == "owner-1"
+    assert response.facts[0].content == "owner fact"
+def test_get_memory_sanitizes_unsafe_owner_header() -> None:
+    """A bound owner id needing sanitization routes to the safe bucket, not a 500.
+    The trusted owner header carries the raw owner id. The memory router must
+    normalize it through the same ``make_safe_user_id`` the channel file pipeline
+    applies, so the memory bucket matches the owner's file/upload bucket and the
+    raw id never reaches ``_validate_user_id`` unsanitized.
+    """
+    from deerflow.config.paths import make_safe_user_id
+    raw_owner = "feishu|ou_AbC/123"
+    seen: dict[str, str] = {}
+    def fake_get_memory_data(*, user_id: str) -> dict:
+        seen["user_id"] = user_id
+        return _sample_memory()
+    with patch("app.gateway.routers.memory.get_memory_data", side_effect=fake_get_memory_data):
+        asyncio.run(memory.get_memory(_internal_owner_request(raw_owner)))
+    expected = make_safe_user_id(raw_owner)
+    assert seen["user_id"] == expected
+    assert seen["user_id"] != raw_owner
+def test_get_memory_falls_back_to_effective_user_for_browser_requests() -> None:
+    """Non-internal callers ignore the owner header and use the effective user."""
+    from app.gateway.internal_auth import INTERNAL_OWNER_USER_ID_HEADER_NAME
+    seen: dict[str, str] = {}
+    def fake_get_memory_data(*, user_id: str) -> dict:
+        seen["user_id"] = user_id
+        return _sample_memory()
+    # A real browser user (system_role "user") must never be overridden even if
+    # a spoofed owner header is present — the header is only honored for the
+    # synthetic internal caller after the internal token is validated.
+    browser_request = SimpleNamespace(
+        headers={INTERNAL_OWNER_USER_ID_HEADER_NAME: "owner-1"},
+        state=SimpleNamespace(user=SimpleNamespace(id="real-user", system_role="user")),
+    )
+    with patch("app.gateway.routers.memory.get_effective_user_id", return_value="real-user"):
+        with patch("app.gateway.routers.memory.get_memory_data", side_effect=fake_get_memory_data):
+            asyncio.run(memory.get_memory(browser_request))
+    assert seen["user_id"] == "real-user"
+def _browser_request_with_spoofed_owner_header() -> SimpleNamespace:
+    from app.gateway.internal_auth import INTERNAL_OWNER_USER_ID_HEADER_NAME
+    return SimpleNamespace(
+        headers={INTERNAL_OWNER_USER_ID_HEADER_NAME: "owner-1"},
+        state=SimpleNamespace(user=SimpleNamespace(id="real-user", system_role="user")),
+    )
+def test_clear_memory_scopes_destructive_write_to_bound_owner() -> None:
+    """A bound IM caller clears the owner's bucket; a browser user keeps their own."""
+    seen: dict[str, str] = {}
+    def fake_clear(*, user_id: str) -> dict:
+        seen["user_id"] = user_id
+        return _sample_memory()
+    with patch("app.gateway.routers.memory.clear_memory_data", side_effect=fake_clear):
+        asyncio.run(memory.clear_memory(_internal_owner_request("owner-1")))
+        assert seen["user_id"] == "owner-1"
+        with patch("app.gateway.routers.memory.get_effective_user_id", return_value="real-user"):
+            asyncio.run(memory.clear_memory(_browser_request_with_spoofed_owner_header()))
+        assert seen["user_id"] == "real-user"
+def test_import_memory_scopes_overwrite_to_bound_owner() -> None:
+    """A bound IM caller overwrites the owner's bucket; a spoofed header is ignored."""
+    seen: dict[str, str] = {}
+    payload = memory.MemoryResponse(**_sample_memory())
+    def fake_import(_data: dict, *, user_id: str) -> dict:
+        seen["user_id"] = user_id
+        return _sample_memory()
+    with patch("app.gateway.routers.memory.import_memory_data", side_effect=fake_import):
+        asyncio.run(memory.import_memory(payload, _internal_owner_request("owner-1")))
+        assert seen["user_id"] == "owner-1"
+        with patch("app.gateway.routers.memory.get_effective_user_id", return_value="real-user"):
+            asyncio.run(memory.import_memory(payload, _browser_request_with_spoofed_owner_header()))
+        assert seen["user_id"] == "real-user"