mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-18 21:55:59 +00:00
fix(channels): make channel connect flow deterministic (#3582)
* fix(channels): make channel connect flow deterministic * make format * fix(channels): apply connect-code before allowed_users on telegram and wechat The bind-bootstrap reorder shipped for slack/dingtalk only. Telegram and WeChat still gate _check_user/allowed_users before connect-code dispatch, so a newly allowlisted-but-unbound user is silently rejected when binding via the browser deep-link / connect-code flow — the same deadlock the PR fixes. - telegram: consume the /start deep-link token before the allowed_users gate. - wechat: handle the /connect code before the allowed_users gate, and defer inbound file extraction + context-token tracking past the gate so blocked senders no longer trigger CDN downloads or token bookkeeping. Adds regression tests for both adapters mirroring the slack/dingtalk coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): enforce single-active-owner invariant at the DB layer _revoke_other_active_owners did a SELECT-then-UPDATE in app code with no row lock or constraint covering active rows. Under READ COMMITTED, two concurrent connect-code consumes for the same (provider, external_account_id, workspace_id) from different owners could each observe "no other active owner" and both commit a connected row, leaving find_connection_by_external_identity nondeterministic. - Add a partial unique index on (provider, external_account_id, workspace_id) WHERE status != 'revoked' (portable to SQLite >= 3.8.0 and PostgreSQL) so the database guarantees at most one non-revoked row per external identity. - Reorder upsert_connection to revoke other owners' active rows before the new connected row is flushed (so the index is satisfied at commit), wrapped in a bounded rollback-and-retry loop. A losing concurrent writer now retries against the now-visible state instead of committing a duplicate. Adds DB-constraint, revoked-slot-reuse, and concurrent-upsert regression tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): harden connect-status polling primitive pollChannelConnectionUntilResolved was a free-floating recursive setTimeout started from onSuccess with no cancellation, no per-provider dedup, a redundant second endpoint per tick, and an unbounded loop on a non-finite expires_in. - Extract a framework-agnostic, cancellable poller (connect-poll.ts) that polls only listChannelConnections() and invalidates the providers query once when the bind resolves, instead of fetching both endpoints every tick. - Guard expires_in with a finite check + default window so undefined/NaN can no longer produce a poll loop that runs until the page closes. - Track one active poll handle per provider in useConnectChannelProvider via a ref Map: a new connect cancels the prior poll for that provider, and a useEffect cleanup cancels all polls on unmount. Adds unit tests for resolve-and-stop, cancellation, and non-finite-expiry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(channels): stop leaking blocked-sender content in DingTalk INFO log; document bind semantics Moving the allowed_users gate past _extract_text meant the parsed-message INFO log (text=%r, first 100 chars) fired for senders that allowed_users would have rejected, defeating the filter's noise/privacy role. Move that log to after the allowed_users gate so blocked senders' message text never reaches INFO logs. Also document the two operator-relevant semantic changes in backend/CLAUDE.md: connect-code dispatch runs before allowed_users (so allowed_users is no longer a bind-time defense; the model relies on code confidentiality + 600s TTL + one-time consumption), and the single-active-owner-per-external-identity transfer semantics now backed by the partial unique index. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(channels): note connect-code-vs-allowlist and ownership transfer in operator guide Mirror the backend/CLAUDE.md notes in the operator-facing IM_CHANNEL_CONNECTIONS.md: connect codes are consumed before allowed_users (so a not-yet-allowlisted user can still complete a first bind, and allowed_users is not a bind-time defense), and an external identity has at most one active owner with last-bind-wins transfer enforced at the DB layer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(channels): lift connect-code dispatch into Channel base class Each adapter duplicated the ordering-sensitive boilerplate of extracting a /connect code and guarding on the connection repo before its allowed_users gate. The duplication is what let telegram/wechat drift and keep the gate ahead of the bind. Centralize it: - Move `_connection_repo` onto Channel.__init__ (removing 7 duplicate assignments). - Add Channel._pending_connect_code(text), which guards on the repo and extracts the code, documenting that adapters MUST consult it before authorization so a browser-initiated bind can bootstrap a not-yet-authorized identity. - Route slack, discord, feishu, dingtalk, wechat, and wecom through the helper. This also fixes a latent inconsistency where slack dispatched a bind even when no connection repo was configured. Pure refactor — the full channel suite stays green; adds a direct unit test for the base helper's contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * make format * fix(channels): redact DingTalk parsed-message INFO log content Log text_len instead of the first 100 chars of message text, so message content never reaches INFO logs (the after-gate move already keeps blocked senders out entirely). This takes over the redaction from #3584 so only this PR touches dingtalk.py, letting the two PRs merge in any order conflict-free. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -9,6 +9,7 @@ from collections.abc import Awaitable, Callable
|
||||
from concurrent.futures import CancelledError as FutureCancelledError
|
||||
from typing import Any, TypeVar
|
||||
|
||||
from app.channels.commands import extract_connect_code
|
||||
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@@ -31,6 +32,7 @@ class Channel(ABC):
|
||||
self.bus = bus
|
||||
self.config = config
|
||||
self._running = False
|
||||
self._connection_repo: Any = config.get("connection_repo")
|
||||
|
||||
@property
|
||||
def is_running(self) -> bool:
|
||||
@@ -117,6 +119,19 @@ class Channel(ABC):
|
||||
if exc:
|
||||
logger.error("[%s] %s failed for msg_id=%s: %s", self.name, name, msg_id, exc)
|
||||
|
||||
def _pending_connect_code(self, text: str) -> str | None:
|
||||
"""Return the one-time bind code if *text* is a ``/connect <code>`` command
|
||||
and channel connections are configured, else ``None``.
|
||||
|
||||
Adapters MUST consult this **before** applying their ``allowed_users`` /
|
||||
``_check_user`` gate, so a browser-initiated bind can bootstrap an external
|
||||
identity that the platform bot has never seen and is therefore not yet
|
||||
authorized. (Telegram uses its deep-link ``/start <token>`` flow instead.)
|
||||
"""
|
||||
if self._connection_repo is None:
|
||||
return None
|
||||
return extract_connect_code(text)
|
||||
|
||||
def _make_inbound(
|
||||
self,
|
||||
chat_id: str,
|
||||
|
||||
@@ -14,7 +14,7 @@ from typing import Any
|
||||
import httpx
|
||||
|
||||
from app.channels.base import Channel
|
||||
from app.channels.commands import extract_connect_code, is_known_channel_command
|
||||
from app.channels.commands import is_known_channel_command
|
||||
from app.channels.connection_identity import attach_connection_identity
|
||||
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
|
||||
|
||||
@@ -137,7 +137,6 @@ class DingTalkChannel(Channel):
|
||||
self._incoming_messages: dict[str, Any] = {}
|
||||
self._incoming_messages_lock = threading.Lock()
|
||||
self._card_repliers: dict[str, Any] = {}
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
|
||||
@property
|
||||
def supports_streaming(self) -> bool:
|
||||
@@ -366,26 +365,13 @@ class DingTalkChannel(Channel):
|
||||
msg_id = message.message_id or ""
|
||||
sender_nick = message.sender_nick or ""
|
||||
|
||||
if self._allowed_users and sender_staff_id not in self._allowed_users:
|
||||
logger.debug("[DingTalk] ignoring message from non-allowed user: %s", sender_staff_id)
|
||||
return
|
||||
|
||||
text = self._extract_text(message)
|
||||
if not text:
|
||||
logger.info("[DingTalk] empty text, ignoring message")
|
||||
return
|
||||
|
||||
logger.info(
|
||||
"[DingTalk] parsed message: conv_type=%s, msg_id=%s, sender=%s(%s), text=%r",
|
||||
conversation_type,
|
||||
msg_id,
|
||||
sender_staff_id,
|
||||
sender_nick,
|
||||
text[:100],
|
||||
)
|
||||
|
||||
connect_code = extract_connect_code(text)
|
||||
if connect_code and self._connection_repo is not None:
|
||||
connect_code = self._pending_connect_code(text)
|
||||
if connect_code:
|
||||
if self._main_loop and self._main_loop.is_running():
|
||||
fut = asyncio.run_coroutine_threadsafe(
|
||||
self._bind_connection_from_connect_code(
|
||||
@@ -402,6 +388,22 @@ class DingTalkChannel(Channel):
|
||||
logger.warning("[DingTalk] main loop not running, cannot bind channel connection")
|
||||
return
|
||||
|
||||
if self._allowed_users and sender_staff_id not in self._allowed_users:
|
||||
logger.debug("[DingTalk] ignoring message from non-allowed user: %s", sender_staff_id)
|
||||
return
|
||||
|
||||
# Log only metadata (length, not content) so message text never reaches
|
||||
# INFO logs, and only after the allowed_users gate so blocked senders are
|
||||
# not logged at all.
|
||||
logger.info(
|
||||
"[DingTalk] parsed message: conv_type=%s, msg_id=%s, sender=%s(%s), text_len=%d",
|
||||
conversation_type,
|
||||
msg_id,
|
||||
sender_staff_id,
|
||||
sender_nick,
|
||||
len(text or ""),
|
||||
)
|
||||
|
||||
if _is_dingtalk_command(text):
|
||||
msg_type = InboundMessageType.COMMAND
|
||||
else:
|
||||
|
||||
@@ -10,7 +10,7 @@ from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from app.channels.base import Channel
|
||||
from app.channels.commands import extract_connect_code, is_known_channel_command
|
||||
from app.channels.commands import is_known_channel_command
|
||||
from app.channels.connection_identity import attach_connection_identity
|
||||
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
|
||||
|
||||
@@ -71,7 +71,6 @@ class DiscordChannel(Channel):
|
||||
self._discord_loop: asyncio.AbstractEventLoop | None = None
|
||||
self._main_loop: asyncio.AbstractEventLoop | None = None
|
||||
self._discord_module = None
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
|
||||
async def start(self) -> None:
|
||||
if self._running:
|
||||
@@ -293,7 +292,7 @@ class DiscordChannel(Channel):
|
||||
text = text.replace(bot_mention or "", "").replace(alt_mention or "", "").replace(standard_mention or "", "").strip()
|
||||
# Don't return early if text is empty — still process the mention (e.g., create thread)
|
||||
|
||||
connect_code = extract_connect_code(text)
|
||||
connect_code = self._pending_connect_code(text)
|
||||
if connect_code and await self._bind_connection_from_connect_code(message, connect_code):
|
||||
return
|
||||
|
||||
|
||||
@@ -11,7 +11,7 @@ import time
|
||||
from typing import Any, Literal
|
||||
|
||||
from app.channels.base import Channel
|
||||
from app.channels.commands import extract_connect_code, is_known_channel_command
|
||||
from app.channels.commands import is_known_channel_command
|
||||
from app.channels.connection_identity import attach_connection_identity
|
||||
from app.channels.message_bus import (
|
||||
PENDING_CLARIFICATION_METADATA_KEY,
|
||||
@@ -72,7 +72,6 @@ class FeishuChannel(Channel):
|
||||
self._CreateImageRequestBody = None
|
||||
self._GetMessageResourceRequest = None
|
||||
self._thread_lock = threading.Lock()
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
|
||||
@staticmethod
|
||||
def _non_empty_str(value: Any) -> str | None:
|
||||
@@ -851,8 +850,8 @@ class FeishuChannel(Channel):
|
||||
logger.info("[Feishu] empty text, ignoring message")
|
||||
return
|
||||
|
||||
connect_code = extract_connect_code(text)
|
||||
if connect_code and self._connection_repo is not None:
|
||||
connect_code = self._pending_connect_code(text)
|
||||
if connect_code:
|
||||
if self._main_loop and self._main_loop.is_running():
|
||||
fut = asyncio.run_coroutine_threadsafe(
|
||||
self._bind_connection_from_connect_code(
|
||||
|
||||
@@ -9,7 +9,7 @@ from typing import Any
|
||||
from markdown_to_mrkdwn import SlackMarkdownConverter
|
||||
|
||||
from app.channels.base import Channel
|
||||
from app.channels.commands import extract_connect_code, is_known_channel_command
|
||||
from app.channels.commands import is_known_channel_command
|
||||
from app.channels.connection_identity import attach_connection_identity
|
||||
from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
|
||||
|
||||
@@ -65,7 +65,6 @@ class SlackChannel(Channel):
|
||||
self._web_client = None
|
||||
self._loop: asyncio.AbstractEventLoop | None = None
|
||||
self._allowed_users = _normalize_allowed_users(config.get("allowed_users", []))
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
self._web_client_factory = config.get("web_client_factory")
|
||||
self._connection_web_clients: dict[str, tuple[str, Any]] = {}
|
||||
configured_bot_user_id = config.get("bot_user_id")
|
||||
@@ -295,18 +294,13 @@ class SlackChannel(Channel):
|
||||
|
||||
user_id = event.get("user", "")
|
||||
|
||||
# Check allowed users
|
||||
if self._allowed_users and user_id not in self._allowed_users:
|
||||
logger.debug("Ignoring message from non-allowed user: %s", user_id)
|
||||
return
|
||||
|
||||
text = event.get("text", "").strip()
|
||||
if event.get("type") == "app_mention":
|
||||
text = _strip_leading_slack_bot_mention(text, self._bot_user_id)
|
||||
if not text:
|
||||
return
|
||||
|
||||
connect_code = extract_connect_code(text)
|
||||
connect_code = self._pending_connect_code(text)
|
||||
if connect_code:
|
||||
if self._loop and self._loop.is_running():
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
@@ -319,6 +313,12 @@ class SlackChannel(Channel):
|
||||
)
|
||||
return
|
||||
|
||||
# Check allowed users after connect-code handling so browser-initiated
|
||||
# binding can bootstrap a new external identity.
|
||||
if self._allowed_users and user_id not in self._allowed_users:
|
||||
logger.debug("Ignoring message from non-allowed user: %s", user_id)
|
||||
return
|
||||
|
||||
channel_id = event.get("channel", "")
|
||||
thread_ts = event.get("thread_ts") or event.get("ts", "")
|
||||
|
||||
|
||||
@@ -52,7 +52,6 @@ class TelegramChannel(Channel):
|
||||
# stream_key ("chat_id:thread_ts") -> state of the in-flight streamed
|
||||
# bot message being edited in place: {"message_id", "last_edit_at", "last_text"}
|
||||
self._stream_messages: dict[str, dict[str, Any]] = {}
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
|
||||
@property
|
||||
def supports_streaming(self) -> bool:
|
||||
@@ -463,13 +462,15 @@ class TelegramChannel(Channel):
|
||||
|
||||
async def _cmd_start(self, update, context) -> None:
|
||||
"""Handle /start command."""
|
||||
if not self._check_user(update.effective_user.id):
|
||||
return
|
||||
args = getattr(context, "args", []) if context is not None else []
|
||||
if args:
|
||||
# Handle the deep-link bind token before applying allowed_users so a
|
||||
# browser-initiated bind can bootstrap a new external identity.
|
||||
handled = await self._bind_connection_from_start_token(update, str(args[0]))
|
||||
if handled:
|
||||
return
|
||||
if not self._check_user(update.effective_user.id):
|
||||
return
|
||||
await update.message.reply_text("Welcome to DeerFlow! Send me a message to start a conversation.\nType /help for available commands.")
|
||||
|
||||
async def _process_incoming_with_reply(self, chat_id: str, msg_id: int, inbound: InboundMessage) -> None:
|
||||
|
||||
@@ -22,7 +22,7 @@ from cryptography.hazmat.primitives import padding
|
||||
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
|
||||
|
||||
from app.channels.base import Channel
|
||||
from app.channels.commands import extract_connect_code, is_known_channel_command
|
||||
from app.channels.commands import is_known_channel_command
|
||||
from app.channels.connection_identity import attach_connection_identity
|
||||
from app.channels.message_bus import InboundMessage, InboundMessageType, MessageBus, OutboundMessage, ResolvedAttachment
|
||||
|
||||
@@ -254,7 +254,6 @@ class WechatChannel(Channel):
|
||||
self._state_dir = self._resolve_state_dir(config.get("state_dir"))
|
||||
self._cursor_path = self._state_dir / "wechat-getupdates.json" if self._state_dir else None
|
||||
self._auth_path = self._state_dir / "wechat-auth.json" if self._state_dir else None
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
self._load_state()
|
||||
|
||||
async def start(self) -> None:
|
||||
@@ -591,24 +590,16 @@ class WechatChannel(Channel):
|
||||
return
|
||||
|
||||
chat_id = str(raw_message.get("from_user_id") or raw_message.get("ilink_user_id") or "").strip()
|
||||
if not chat_id or not self._check_user(chat_id):
|
||||
if not chat_id:
|
||||
return
|
||||
|
||||
text = self._extract_text(raw_message)
|
||||
files = await self._extract_inbound_files(raw_message)
|
||||
if not text and not files:
|
||||
return
|
||||
|
||||
context_token = str(raw_message.get("context_token") or "").strip()
|
||||
thread_ts = context_token or str(raw_message.get("client_id") or raw_message.get("msg_id") or "").strip() or None
|
||||
|
||||
if context_token:
|
||||
self._context_tokens_by_chat[chat_id] = context_token
|
||||
if thread_ts:
|
||||
self._context_tokens_by_thread[thread_ts] = context_token
|
||||
|
||||
connect_code = extract_connect_code(text)
|
||||
if connect_code and self._connection_repo is not None:
|
||||
# Handle the connect code before applying allowed_users so a browser-initiated
|
||||
# bind can bootstrap an external identity that is not yet whitelisted.
|
||||
connect_code = self._pending_connect_code(text)
|
||||
if connect_code:
|
||||
handled = await self._bind_connection_from_connect_code(
|
||||
chat_id=chat_id,
|
||||
context_token=context_token,
|
||||
@@ -617,6 +608,20 @@ class WechatChannel(Channel):
|
||||
if handled:
|
||||
return
|
||||
|
||||
if not self._check_user(chat_id):
|
||||
return
|
||||
|
||||
files = await self._extract_inbound_files(raw_message)
|
||||
if not text and not files:
|
||||
return
|
||||
|
||||
thread_ts = context_token or str(raw_message.get("client_id") or raw_message.get("msg_id") or "").strip() or None
|
||||
|
||||
if context_token:
|
||||
self._context_tokens_by_chat[chat_id] = context_token
|
||||
if thread_ts:
|
||||
self._context_tokens_by_thread[thread_ts] = context_token
|
||||
|
||||
inbound = self._make_inbound(
|
||||
chat_id=chat_id,
|
||||
user_id=chat_id,
|
||||
|
||||
@@ -8,7 +8,7 @@ from collections.abc import Awaitable, Callable
|
||||
from typing import Any, cast
|
||||
|
||||
from app.channels.base import Channel
|
||||
from app.channels.commands import extract_connect_code, is_known_channel_command
|
||||
from app.channels.commands import is_known_channel_command
|
||||
from app.channels.connection_identity import attach_connection_identity
|
||||
from app.channels.message_bus import (
|
||||
InboundMessage,
|
||||
@@ -31,7 +31,6 @@ class WeComChannel(Channel):
|
||||
self._ws_frames: dict[str, dict[str, Any]] = {}
|
||||
self._ws_stream_ids: dict[str, str] = {}
|
||||
self._working_message = "Working on it..."
|
||||
self._connection_repo = config.get("connection_repo")
|
||||
|
||||
@property
|
||||
def supports_streaming(self) -> bool:
|
||||
@@ -295,8 +294,8 @@ class WeComChannel(Channel):
|
||||
|
||||
user_id = (body.get("from") or {}).get("userid")
|
||||
|
||||
connect_code = extract_connect_code(text)
|
||||
if connect_code and self._connection_repo is not None:
|
||||
connect_code = self._pending_connect_code(text)
|
||||
if connect_code:
|
||||
handled = await self._bind_connection_from_connect_code(
|
||||
frame=frame,
|
||||
user_id=str(user_id or ""),
|
||||
|
||||
Reference in New Issue
Block a user