Files
deer-flow/backend/app/channels/base.py
T
Nan Gao 525af0da14 fix(channels): scope IM files and helper commands to owner (#3579)
* fix(channels): scope IM files and helper commands to owner

* fix(memory): honor bound IM owner for /memory gateway endpoints

The channel manager already attaches X-DeerFlow-Owner-User-Id for /memory
and /models, but the memory router resolved user_id solely from
get_effective_user_id(), which returns the synthetic internal user
(DEFAULT_USER_ID) for channel workers. A bound IM /memory therefore read
the default/internal memory instead of the connection owner's.

Resolve the owner via _resolve_memory_user_id(request) across all
/api/memory* endpoints: trusted internal callers act for the owner header,
browser/API callers fall back to get_effective_user_id(). Mirrors the
threads router's get_trusted_internal_owner_user_id pattern, completing
acceptance criterion #3 of #3539.

Add end-to-end tests asserting the resolved user_id (not just that the
header is sent) and that a spoofed owner header from a browser user is
ignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): align memory bucket and reuse cached storage owner

Address PR #3579 review feedback:

- Memory router now sanitizes the trusted owner header via make_safe_user_id
  before routing, matching the channel file pipeline
  (_safe_user_id_for_run/prepare_user_dir_for_raw_id). A bound owner id needing
  sanitization now resolves to the same bucket as its files/uploads instead of
  500ing in _validate_user_id.
- _handle_chat reuses the storage_user_id cached at the top of the method for
  artifact delivery instead of re-deriving _channel_storage_user_id(msg), so
  uploads and outputs cannot drift to different buckets if a channel rewrites
  the InboundMessage in receive_file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stage unbound IM files under the run's user bucket

Address PR #3579 review feedback (#5): _channel_storage_user_id now mirrors
_resolve_run_params' identity policy, falling back to safe(msg.user_id) instead
of returning None for unbound auth-enabled channels.

Previously an unbound msg ran under safe(platform_user_id) but staged uploads
under get_effective_user_id() in the dispatcher task (unset contextvar ->
"default"), so files landed in users/default/... while the agent read from
users/{safe_platform_user_id}/.... Bound and unbound channels now write where
the agent reads. Returns None only when no identity is available.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): reuse cached storage owner in streaming artifact delivery

Address PR #3579 review feedback (#6): thread the storage_user_id resolved in
_handle_chat into _handle_streaming_chat instead of re-deriving
_channel_storage_user_id(msg) in the finally block. Avoids re-running
_safe_user_id_for_run (and its possible filesystem touch) on the streaming-error
path and guarantees artifact delivery targets the same bucket as the uploads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): document owner-scoped IM file storage

Address PR #3579 review feedback (#4): the IM Channels and File Upload sections
still described pre-PR default-bucket behaviour. Document that receive_file,
_ingest_inbound_files/ensure_uploads_dir/get_uploads_dir, and
_resolve_attachments/_prepare_artifact_delivery are owner-scoped via the user_id
kwarg, and that the bucket matches the memory bucket from _resolve_memory_user_id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): unify run identity and storage bucket resolution

Address PR #3579 review feedback (#3): _resolve_run_params no longer duplicates
the owner-resolution rule inline. After the #5 fix the inline block and
_channel_storage_user_id computed the identical sanitized-with-platform-fallback
value, so the run identity now calls the same helper, making it the single
source of truth for run_context["user_id"] and the file/artifact storage bucket.

_owner_headers stays deliberately separate: it sends the raw owner id over HTTP
for the gateway to re-resolve (no sanitize, no platform fallback), documented on
both helpers. test_run_identity_matches_storage_bucket pins the two together so
they cannot drift again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 11:45:35 +08:00

200 lines
7.6 KiB
Python

"""Abstract base class for IM channels."""
from __future__ import annotations
import asyncio
import logging
from abc import ABC, abstractmethod
from collections.abc import Awaitable, Callable
from concurrent.futures import CancelledError as FutureCancelledError
from typing import Any, TypeVar
from app.channels.commands import extract_connect_code
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
logger = logging.getLogger(__name__)
T = TypeVar("T")
class Channel(ABC):
"""Base class for all IM channel implementations.
Each channel connects to an external messaging platform and:
1. Receives messages, wraps them as InboundMessage, publishes to the bus.
2. Subscribes to outbound messages and sends replies back to the platform.
Subclasses must implement ``start``, ``stop``, and ``send``.
"""
def __init__(self, name: str, bus: MessageBus, config: dict[str, Any]) -> None:
self.name = name
self.bus = bus
self.config = config
self._running = False
self._connection_repo: Any = config.get("connection_repo")
@property
def is_running(self) -> bool:
return self._running
@property
def supports_streaming(self) -> bool:
return False
# -- lifecycle ---------------------------------------------------------
@abstractmethod
async def start(self) -> None:
"""Start listening for messages from the external platform."""
@abstractmethod
async def stop(self) -> None:
"""Gracefully stop the channel."""
# -- outbound ----------------------------------------------------------
@abstractmethod
async def send(self, msg: OutboundMessage) -> None:
"""Send a message back to the external platform.
The implementation should use ``msg.chat_id`` and ``msg.thread_ts``
to route the reply to the correct conversation/thread.
"""
async def send_file(self, msg: OutboundMessage, attachment: ResolvedAttachment) -> bool:
"""Upload a single file attachment to the platform.
Returns True if the upload succeeded, False otherwise.
Default implementation returns False (no file upload support).
"""
return False
# -- helpers -----------------------------------------------------------
async def _send_with_retry(
self,
operation: Callable[[], Awaitable[T]],
*,
max_retries: int,
log_prefix: str | None = None,
operation_name: str = "send",
) -> T:
"""Run an outbound send operation with the shared channel retry policy."""
prefix = log_prefix or f"[{self.name}]"
last_exc: Exception | None = None
for attempt in range(max_retries):
try:
return await operation()
except Exception as exc:
last_exc = exc
if attempt < max_retries - 1:
delay = 2**attempt
logger.warning(
"%s %s failed (attempt %d/%d), retrying in %ds: %s",
prefix,
operation_name,
attempt + 1,
max_retries,
delay,
exc,
)
await asyncio.sleep(delay)
logger.error("%s %s failed after %d attempts: %s", prefix, operation_name, max_retries, last_exc)
if last_exc is None:
raise RuntimeError(f"{self.name} {operation_name} failed without an exception from any attempt")
raise last_exc
def _log_future_error(self, fut: Any, name: str, msg_id: Any) -> None:
"""Callback for concurrent futures scheduled from channel worker threads."""
try:
exc = fut.exception()
except (asyncio.CancelledError, FutureCancelledError, asyncio.InvalidStateError):
return
except Exception:
logger.exception("[%s] failed to inspect future for %s (msg_id=%s)", self.name, name, msg_id)
return
if exc:
logger.error("[%s] %s failed for msg_id=%s: %s", self.name, name, msg_id, exc)
def _pending_connect_code(self, text: str) -> str | None:
"""Return the one-time bind code if *text* is a ``/connect <code>`` command
and channel connections are configured, else ``None``.
Adapters MUST consult this **before** applying their ``allowed_users`` /
``_check_user`` gate, so a browser-initiated bind can bootstrap an external
identity that the platform bot has never seen and is therefore not yet
authorized. (Telegram uses its deep-link ``/start <token>`` flow instead.)
"""
if self._connection_repo is None:
return None
return extract_connect_code(text)
def _make_inbound(
self,
chat_id: str,
user_id: str,
text: str,
*,
msg_type: InboundMessageType = InboundMessageType.CHAT,
thread_ts: str | None = None,
files: list[dict[str, Any]] | None = None,
metadata: dict[str, Any] | None = None,
) -> InboundMessage:
"""Convenience factory for creating InboundMessage instances."""
return InboundMessage(
channel_name=self.name,
chat_id=chat_id,
user_id=user_id,
text=text,
msg_type=msg_type,
thread_ts=thread_ts,
files=files or [],
metadata=metadata or {},
)
async def _on_outbound(self, msg: OutboundMessage) -> None:
"""Outbound callback registered with the bus.
Only forwards messages targeted at this channel.
Sends the text message first, then uploads any file attachments.
File uploads are skipped entirely when the text send fails to avoid
partial deliveries (files without accompanying text).
"""
if msg.channel_name == self.name:
try:
await self.send(msg)
except Exception:
logger.exception("Failed to send outbound message on channel %s", self.name)
return # Do not attempt file uploads when the text message failed
for attachment in msg.attachments:
try:
success = await self.send_file(msg, attachment)
if not success:
logger.warning("[%s] file upload skipped for %s", self.name, attachment.filename)
except Exception:
logger.exception("[%s] failed to upload file %s", self.name, attachment.filename)
async def receive_file(self, msg: InboundMessage, thread_id: str, *, user_id: str | None = None) -> InboundMessage:
"""
Optionally process and materialize inbound file attachments for this channel.
By default, this method does nothing and simply returns the original message.
Subclasses (e.g. FeishuChannel) may override this to download files (images, documents, etc)
referenced in msg.files, save them to the sandbox, and update msg.text to include
the sandbox file paths for downstream model consumption.
Args:
msg: The inbound message, possibly containing file metadata in msg.files.
thread_id: The resolved DeerFlow thread ID for sandbox path context.
user_id: Optional DeerFlow storage user ID for user-scoped channel workers.
Returns:
The (possibly modified) InboundMessage, with text and/or files updated as needed.
"""
del user_id
return msg