fix(channels): make channel connect flow deterministic (#3582)

* fix(channels): make channel connect flow deterministic

* make format

* fix(channels): apply connect-code before allowed_users on telegram and wechat

The bind-bootstrap reorder shipped for slack/dingtalk only. Telegram and WeChat still gate _check_user/allowed_users before connect-code dispatch, so a newly allowlisted-but-unbound user is silently rejected when binding via the browser deep-link / connect-code flow: the same deadlock the PR fixes.

- telegram: consume the /start deep-link token before the allowed_users gate.
- wechat: handle the /connect code before the allowed_users gate, and defer inbound file extraction + context-token tracking past the gate so blocked senders no longer trigger CDN downloads or token bookkeeping.

Adds regression tests for both adapters mirroring the slack/dingtalk coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): enforce single-active-owner invariant at the DB layer

_revoke_other_active_owners did a SELECT-then-UPDATE in app code with no row lock or constraint covering active rows. Under READ COMMITTED, two concurrent connect-code consumes for the same (provider, external_account_id, workspace_id) from different owners could each observe "no other active owner" and both commit a connected row, leaving find_connection_by_external_identity nondeterministic.

- Add a partial unique index on (provider, external_account_id, workspace_id) WHERE status != 'revoked' (portable to SQLite >= 3.8.0 and PostgreSQL) so the database guarantees at most one non-revoked row per external identity.
- Reorder upsert_connection to revoke other owners' active rows before the new connected row is flushed (so the index is satisfied at commit), wrapped in a bounded rollback-and-retry loop. A losing concurrent writer now retries against the now-visible state instead of committing a duplicate.

Adds DB-constraint, revoked-slot-reuse, and concurrent-upsert regression tests.
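
As an illustration of the partial unique index described above, here is a minimal sketch using Python's standard `sqlite3` module. The table name, index name, and column types are assumptions made for the example; only the indexed columns and the `WHERE status != 'revoked'` predicate come from this message. It is not the repository's actual schema or migration.

```python
import sqlite3

# Hypothetical schema: the table and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE channel_connections (
        id TEXT PRIMARY KEY,
        owner_user_id TEXT NOT NULL,
        provider TEXT NOT NULL,
        external_account_id TEXT NOT NULL,
        workspace_id TEXT NOT NULL,
        status TEXT NOT NULL
    )
    """
)
# Partial unique index: only non-revoked rows compete for the identity slot.
conn.execute(
    """
    CREATE UNIQUE INDEX uq_active_identity
    ON channel_connections (provider, external_account_id, workspace_id)
    WHERE status != 'revoked'
    """
)


def insert_connection(row_id: str, owner: str, status: str = "connected") -> None:
    """Insert a row for the shared identity (slack, U-shared, T1)."""
    conn.execute(
        "INSERT INTO channel_connections VALUES (?, ?, 'slack', 'U-shared', 'T1', ?)",
        (row_id, owner, status),
    )


insert_connection("c1", "alice")

# A second active owner for the same identity violates the index.
try:
    insert_connection("c2", "bob")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Revoking alice's row frees the slot, so bob's bind now succeeds.
conn.execute("UPDATE channel_connections SET status = 'revoked' WHERE id = 'c1'")
insert_connection("c2", "bob")

active_owners = [
    row[0]
    for row in conn.execute(
        "SELECT owner_user_id FROM channel_connections WHERE status != 'revoked'"
    )
]
```

The revoke-then-insert order at the end mirrors the reordering this message describes for upsert_connection: the old owner's row has to leave the indexed set before the new connected row is written.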
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): harden connect-status polling primitive

pollChannelConnectionUntilResolved was a free-floating recursive setTimeout started from onSuccess with no cancellation, no per-provider dedup, a redundant second endpoint per tick, and an unbounded loop on a non-finite expires_in.

- Extract a framework-agnostic, cancellable poller (connect-poll.ts) that polls only listChannelConnections() and invalidates the providers query once when the bind resolves, instead of fetching both endpoints every tick.
- Guard expires_in with a finite check + default window so undefined/NaN can no longer produce a poll loop that runs until the page closes.
- Track one active poll handle per provider in useConnectChannelProvider via a ref Map: a new connect cancels the prior poll for that provider, and a useEffect cleanup cancels all polls on unmount.

Adds unit tests for resolve-and-stop, cancellation, and non-finite-expiry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stop leaking blocked-sender content in DingTalk INFO log; document bind semantics

Moving the allowed_users gate past _extract_text meant the parsed-message INFO log (text=%r, first 100 chars) fired for senders that allowed_users would have rejected, defeating the filter's noise/privacy role. Move that log to after the allowed_users gate so blocked senders' message text never reaches INFO logs.

Also document the two operator-relevant semantic changes in backend/CLAUDE.md: connect-code dispatch runs before allowed_users (so allowed_users is no longer a bind-time defense; the model relies on code confidentiality + 600s TTL + one-time consumption), and the single-active-owner-per-external-identity transfer semantics now backed by the partial unique index.
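
The ordering documented above (connect-code dispatch first, allowed_users gate second) can be sketched as follows. Everything in this block is hypothetical: `SketchChannel`, the `/connect` regex, and the in-memory `valid_codes` set merely stand in for the real adapters and connection repo, and the 600s TTL is left out for brevity.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical pattern; the real adapters may parse the command differently.
CONNECT_RE = re.compile(r"^/connect\s+(\S+)$")


@dataclass
class SketchChannel:
    allowed_users: set[str]
    valid_codes: set[str]  # stands in for unconsumed connect codes in the repo
    bound: list[str] = field(default_factory=list)
    delivered: list[tuple[str, str]] = field(default_factory=list)

    def handle(self, sender: str, text: str) -> str:
        # 1. Connect-code dispatch runs before any authorization check, so a
        #    sender who is not yet in allowed_users can complete a first bind.
        match = CONNECT_RE.match(text.strip())
        if match:
            code = match.group(1)
            if code in self.valid_codes:
                self.valid_codes.discard(code)  # one-time consumption
                self.bound.append(sender)
                return "bound"
            return "invalid-code"

        # 2. Only then does the allowlist gate apply. An empty allowlist is
        #    treated as "allow everyone" in this sketch.
        if self.allowed_users and sender not in self.allowed_users:
            return "blocked"

        self.delivered.append((sender, text))
        return "delivered"
```

In this sketch a blocked sender holding a valid code still binds, while the same sender's ordinary messages stay blocked. That is why the protection at bind time rests on the code being secret and single-use, not on the allowlist.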
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): note connect-code-vs-allowlist and ownership transfer in operator guide

Mirror the backend/CLAUDE.md notes in the operator-facing IM_CHANNEL_CONNECTIONS.md: connect codes are consumed before allowed_users (so a not-yet-allowlisted user can still complete a first bind, and allowed_users is not a bind-time defense), and an external identity has at most one active owner with last-bind-wins transfer enforced at the DB layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): lift connect-code dispatch into Channel base class

Each adapter duplicated the ordering-sensitive boilerplate of extracting a /connect code and guarding on the connection repo before its allowed_users gate. The duplication is what let telegram/wechat drift and keep the gate ahead of the bind. Centralize it:

- Move `_connection_repo` onto Channel.__init__ (removing 7 duplicate assignments).
- Add Channel._pending_connect_code(text), which guards on the repo and extracts the code, documenting that adapters MUST consult it before authorization so a browser-initiated bind can bootstrap a not-yet-authorized identity.
- Route slack, discord, feishu, dingtalk, wechat, and wecom through the helper. This also fixes a latent inconsistency where slack dispatched a bind even when no connection repo was configured.

Pure refactor: the full channel suite stays green; adds a direct unit test for the base helper's contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* make format

* fix(channels): redact DingTalk parsed-message INFO log content

Log text_len instead of the first 100 chars of message text, so message content never reaches INFO logs (the after-gate move already keeps blocked senders out entirely). This takes over the redaction from #3584 so only this PR touches dingtalk.py, letting the two PRs merge in any order conflict-free.
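
A rough sketch of the redaction described above: the parsed-message log runs only after the allowlist gate and records the text length, never the text. The logger name, function name, and log format here are assumptions for illustration and do not reproduce dingtalk.py.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("sketch.dingtalk")


def log_parsed_message(sender_id: str, text: str, allowed_users: set[str]) -> bool:
    """Return True if the message passed the gate and was logged."""
    # Gate first: blocked senders produce no parsed-message log at all.
    if allowed_users and sender_id not in allowed_users:
        return False
    # Log only the length, so message content never reaches INFO logs.
    logger.info("parsed message: sender=%s text_len=%d", sender_id, len(text))
    return True
```

Logging the length keeps a signal for debugging (empty versus non-empty payloads) without putting message content into logs.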
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 21:55:59 +00:00 · 2026-06-18 04:15:31 +02:00
parent 8c0830aea1
commit 68ba4198b8
21 changed files with 695 additions and 80 deletions
@@ -5,7 +5,35 @@ from __future__ import annotations
 from datetime import UTC, datetime, timedelta
 from unittest.mock import AsyncMock, MagicMock

-from app.channels.message_bus import InboundMessage, MessageBus
+from app.channels.base import Channel
+from app.channels.message_bus import InboundMessage, MessageBus, OutboundMessage
+
+
+class _StubChannel(Channel):
+    """Minimal concrete Channel used to exercise base-class helpers directly."""
+
+    async def start(self) -> None:  # pragma: no cover - not exercised
+        pass
+
+    async def stop(self) -> None:  # pragma: no cover - not exercised
+        pass
+
+    async def send(self, msg: OutboundMessage) -> None:  # pragma: no cover - not exercised
+        pass
+
+
+def test_pending_connect_code_extracts_code_when_connections_configured():
+    channel = _StubChannel(name="stub", bus=MessageBus(), config={"connection_repo": object()})
+    # A connect command yields its code; ordinary text does not.
+    assert channel._pending_connect_code("/connect abc123") == "abc123"
+    assert channel._pending_connect_code("hello world") is None
+
+
+def test_pending_connect_code_is_none_when_connections_disabled():
+    # With no connection repo, binding is not configured and connect codes are
+    # ignored so the message falls through to normal handling.
+    channel = _StubChannel(name="stub", bus=MessageBus(), config={})
+    assert channel._pending_connect_code("/connect abc123") is None


 async def _make_repo(tmp_path, name: str):
@@ -88,6 +88,119 @@ class TestChannelConnectionRepository:
        assert second["external_account_name"] == "Alice Telegram"
        assert len(await repo.list_connections("alice")) == 1

+    @pytest.mark.anyio
+    async def test_upsert_connection_transfers_external_identity_between_owners(self, repo):
+        await repo.upsert_connection(
+            owner_user_id="alice",
+            provider="slack",
+            external_account_id="U-shared",
+            workspace_id="T1",
+            status="connected",
+        )
+
+        bob = await repo.upsert_connection(
+            owner_user_id="bob",
+            provider="slack",
+            external_account_id="U-shared",
+            workspace_id="T1",
+            status="connected",
+        )
+
+        alice_rows = await repo.list_connections("alice")
+        resolved = await repo.find_connection_by_external_identity(
+            provider="slack",
+            external_account_id="U-shared",
+            workspace_id="T1",
+        )
+
+        assert alice_rows[0]["status"] == "revoked"
+        assert bob["status"] == "connected"
+        assert resolved is not None
+        assert resolved["owner_user_id"] == "bob"
+        assert resolved["id"] == bob["id"]
+
+    @pytest.mark.anyio
+    async def test_active_identity_unique_index_rejects_second_connected_owner(self, repo):
+        # The single-active-owner invariant must be enforced by the database, not
+        # only by the app-level revoke step (which can race under READ COMMITTED).
+        from sqlalchemy.exc import IntegrityError
+
+        await repo.upsert_connection(
+            owner_user_id="alice",
+            provider="slack",
+            external_account_id="U-shared",
+            workspace_id="T1",
+            status="connected",
+        )
+
+        with pytest.raises(IntegrityError):
+            async with repo.session_factory() as session:
+                session.add(
+                    ChannelConnectionRow(
+                        id="manual-duplicate-active",
+                        owner_user_id="bob",
+                        provider="slack",
+                        external_account_id="U-shared",
+                        workspace_id="T1",
+                        status="connected",
+                    )
+                )
+                await session.commit()
+
+    @pytest.mark.anyio
+    async def test_active_identity_unique_index_allows_revoked_rows(self, repo):
+        # A revoked row must not occupy the active-identity slot, so a fresh
+        # connected bind for the same identity is allowed afterwards.
+        first = await repo.upsert_connection(
+            owner_user_id="alice",
+            provider="slack",
+            external_account_id="U-shared",
+            workspace_id="T1",
+            status="connected",
+        )
+        await repo.disconnect_connection(connection_id=first["id"], owner_user_id="alice")
+
+        second = await repo.upsert_connection(
+            owner_user_id="bob",
+            provider="slack",
+            external_account_id="U-shared",
+            workspace_id="T1",
+            status="connected",
+        )
+        assert second["status"] == "connected"
+
+    @pytest.mark.anyio
+    async def test_concurrent_upserts_keep_single_active_owner(self, repo):
+        import asyncio
+
+        async def connect(owner: str):
+            return await repo.upsert_connection(
+                owner_user_id=owner,
+                provider="slack",
+                external_account_id="U-shared",
+                workspace_id="T1",
+                status="connected",
+            )
+
+        await asyncio.gather(connect("alice"), connect("bob"))
+
+        async with repo.session_factory() as session:
+            connected = (
+                (
+                    await session.execute(
+                        select(ChannelConnectionRow).where(
+                            ChannelConnectionRow.provider == "slack",
+                            ChannelConnectionRow.external_account_id == "U-shared",
+                            ChannelConnectionRow.workspace_id == "T1",
+                            ChannelConnectionRow.status == "connected",
+                        )
+                    )
+                )
+                .scalars()
+                .all()
+            )
+        assert len(connected) == 1
+
    @pytest.mark.anyio
    async def test_credentials_are_encrypted_at_rest_and_decrypted_by_repository(self, repo):
        connection = await repo.upsert_connection(
@@ -4881,6 +4881,41 @@ class TestSlackAllowedUsers:
        assert inbound.chat_id == "C123"
        assert inbound.text == "hello from slack"

+    def test_connect_code_bypasses_allowed_users_filter(self):
+        from app.channels.slack import SlackChannel
+
+        bus = MessageBus()
+        bus.publish_inbound = AsyncMock()
+        channel = SlackChannel(
+            bus=bus,
+            config={"allowed_users": ["U-allowed"], "connection_repo": object()},
+        )
+        channel._loop = MagicMock()
+        channel._loop.is_running.return_value = True
+        channel._bind_connection_from_connect_code = AsyncMock(return_value=True)
+        channel._add_reaction = MagicMock()
+        channel._send_running_reply = MagicMock()
+
+        event = {
+            "user": "U-blocked",
+            "text": "/connect slack-bind-code",
+            "team": "T123",
+            "channel": "C123",
+            "ts": "1710000000.000100",
+        }
+
+        with patch(
+            "app.channels.slack.asyncio.run_coroutine_threadsafe",
+            side_effect=self._submit_coro,
+        ) as submit:
+            channel._handle_message_event(event)
+
+        channel._bind_connection_from_connect_code.assert_called_once()
+        submit.assert_called_once()
+        bus.publish_inbound.assert_not_awaited()
+        channel._add_reaction.assert_not_called()
+        channel._send_running_reply.assert_not_called()
+
    def test_app_mention_strips_leading_bot_mention_before_command_detection(self):
        from app.channels.slack import SlackChannel

@@ -435,6 +435,49 @@ class TestAllowedUsersFiltering:

        _run(go())

+    def test_non_allowed_user_message_content_not_logged(self, caplog):
+        import logging
+
+        async def go():
+            bus = MessageBus()
+            bus.publish_inbound = AsyncMock()
+            channel = DingTalkChannel(bus, config={"allowed_users": ["user_001"]})
+            channel._client_id = "test_key"
+            channel._main_loop = asyncio.get_event_loop()
+            channel._running = True
+
+            msg = _make_chatbot_message(sender_staff_id="user_blocked", text="secret blocked content")
+            with caplog.at_level(logging.INFO, logger="app.channels.dingtalk"):
+                channel._on_chatbot_message(msg)
+                await asyncio.sleep(0.1)
+
+            bus.publish_inbound.assert_not_awaited()
+            # The parsed-message INFO log (with message content) must not fire for
+            # a blocked sender — allowed_users still acts as a privacy/noise filter.
+            assert "parsed message" not in caplog.text
+            assert "secret blocked content" not in caplog.text
+
+        _run(go())
+
+    def test_connect_code_bypasses_allowed_users_filter(self):
+        async def go():
+            bus = MessageBus()
+            bus.publish_inbound = AsyncMock()
+            channel = DingTalkChannel(bus, config={"allowed_users": ["user_001"], "connection_repo": object()})
+            channel._client_id = "test_key"
+            channel._main_loop = asyncio.get_event_loop()
+            channel._running = True
+            channel._bind_connection_from_connect_code = AsyncMock(return_value=True)
+
+            msg = _make_chatbot_message(sender_staff_id="user_blocked", text="/connect dingtalk-bind-code")
+            channel._on_chatbot_message(msg)
+
+            await asyncio.sleep(0.1)
+            channel._bind_connection_from_connect_code.assert_awaited_once()
+            bus.publish_inbound.assert_not_awaited()
+
+        _run(go())
+
    def test_empty_allowed_users_allows_all(self):
        async def go():
            bus = MessageBus()
@@ -71,6 +71,38 @@ async def test_start_with_deep_link_state_binds_telegram_chat(repo):
    assert "connected" in update.message.reply_text.await_args.args[0].lower()


+@pytest.mark.anyio
+async def test_start_token_bypasses_allowed_users_filter(repo):
+    # A newly allowlisted-but-unbound user must be able to bootstrap their first
+    # bind via the deep-link start token even though their Telegram id is not yet
+    # in allowed_users. The allowed_users gate must run after token handling.
+    state = "telegram-bind-state"
+    await repo.create_oauth_state(
+        owner_user_id="deerflow-user-1",
+        provider="telegram",
+        state=state,
+        expires_at=datetime.now(UTC) + timedelta(minutes=5),
+    )
+    channel = TelegramChannel(
+        bus=MessageBus(),
+        config={
+            "bot_token": "test-token",
+            "connection_repo": repo,
+            "allowed_users": [999],  # newcomer (42) is not whitelisted
+        },
+    )
+    update = _telegram_update(text=f"/start {state}", user_id=42)
+    context = MagicMock()
+    context.args = [state]
+
+    await channel._cmd_start(update, context)
+
+    connections = await repo.list_connections("deerflow-user-1")
+    assert len(connections) == 1
+    assert connections[0]["external_account_id"] == "42"
+    assert "connected" in update.message.reply_text.await_args.args[0].lower()
+
+
 @pytest.mark.anyio
 async def test_bound_telegram_message_publishes_connection_identity(repo):
    connection = await repo.upsert_connection(
@@ -7,6 +7,7 @@ import base64
 import json
 from pathlib import Path
 from typing import Any
+from unittest.mock import AsyncMock

 from app.channels.message_bus import InboundMessageType, MessageBus, OutboundMessage

@@ -359,6 +360,66 @@ def test_allowed_users_filter_blocks_non_whitelisted_sender():
    _run(go())


+def test_connect_code_bypasses_allowed_users_filter(tmp_path: Path):
+    from app.channels.wechat import WechatChannel
+    from deerflow.persistence.channel_connections import ChannelConnectionRepository, ChannelCredentialCipher
+    from deerflow.persistence.engine import close_engine, get_session_factory, init_engine
+
+    async def go():
+        from datetime import UTC, datetime, timedelta
+
+        await init_engine("sqlite", url=f"sqlite+aiosqlite:///{tmp_path / 'wechat.db'}", sqlite_dir=str(tmp_path))
+        try:
+            repo = ChannelConnectionRepository(
+                get_session_factory(),
+                cipher=ChannelCredentialCipher.from_key("wechat-secret"),
+            )
+            code = "wechat-bind-code"
+            await repo.create_oauth_state(
+                owner_user_id="deerflow-user-1",
+                provider="wechat",
+                state=code,
+                expires_at=datetime.now(UTC) + timedelta(minutes=5),
+            )
+
+            bus = MessageBus()
+            published = []
+
+            async def capture(msg):
+                published.append(msg)
+
+            bus.publish_inbound = capture  # type: ignore[method-assign]
+
+            # The newcomer ("blocked-user") is not in allowed_users yet, but a valid
+            # /connect code must still bootstrap their first bind.
+            channel = WechatChannel(
+                bus=bus,
+                config={"bot_token": "test-token", "allowed_users": ["allowed-user"], "connection_repo": repo},
+            )
+            channel._send_connection_reply = AsyncMock()  # type: ignore[method-assign]
+
+            await channel._handle_update(
+                {
+                    "message_type": 1,
+                    "from_user_id": "blocked-user",
+                    "context_token": "ctx-connect",
+                    "item_list": [{"type": 1, "text_item": {"text": f"/connect {code}"}}],
+                }
+            )
+
+            connections = await repo.list_connections("deerflow-user-1")
+            assert len(connections) == 1
+            assert connections[0]["provider"] == "wechat"
+            assert connections[0]["external_account_id"] == "blocked-user"
+            # The connect-code reply was sent and no normal inbound was published.
+            channel._send_connection_reply.assert_awaited_once()
+            assert published == []
+        finally:
+            await close_engine()
+
+    _run(go())
+
+
 def test_send_uses_cached_context_token(monkeypatch):
    from app.channels.wechat import WechatChannel