Commit Graph

2 Commits

Author SHA1 Message Date
Nan Gao 68ba4198b8 fix(channels): make channel connect flow deterministic (#3582)
* fix(channels): make channel connect flow deterministic

* make format

* fix(channels): apply connect-code before allowed_users on telegram and wechat

The bind-bootstrap reorder shipped for slack/dingtalk only. Telegram and
WeChat still gate _check_user/allowed_users before connect-code dispatch, so
a newly allowlisted-but-unbound user is silently rejected when binding via the
browser deep-link / connect-code flow — the same deadlock the PR fixes.

- telegram: consume the /start deep-link token before the allowed_users gate.
- wechat: handle the /connect code before the allowed_users gate, and defer
  inbound file extraction + context-token tracking past the gate so blocked
  senders no longer trigger CDN downloads or token bookkeeping.

Adds regression tests for both adapters mirroring the slack/dingtalk coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): enforce single-active-owner invariant at the DB layer

_revoke_other_active_owners did a SELECT-then-UPDATE in app code with no row
lock or constraint covering active rows. Under READ COMMITTED, two concurrent
connect-code consumes for the same (provider, external_account_id, workspace_id)
from different owners could each observe "no other active owner" and both commit
a connected row, leaving find_connection_by_external_identity nondeterministic.

- Add a partial unique index on (provider, external_account_id, workspace_id)
  WHERE status != 'revoked' (portable to SQLite >= 3.8.0 and PostgreSQL) so the
  database guarantees at most one non-revoked row per external identity.
- Reorder upsert_connection to revoke other owners' active rows before the new
  connected row is flushed (so the index is satisfied at commit), wrapped in a
  bounded rollback-and-retry loop. A losing concurrent writer now retries
  against the now-visible state instead of committing a duplicate.

Adds DB-constraint, revoked-slot-reuse, and concurrent-upsert regression tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): harden connect-status polling primitive

pollChannelConnectionUntilResolved was a free-floating recursive setTimeout
started from onSuccess with no cancellation, no per-provider dedup, a redundant
second endpoint per tick, and an unbounded loop on a non-finite expires_in.

- Extract a framework-agnostic, cancellable poller (connect-poll.ts) that polls
  only listChannelConnections() and invalidates the providers query once when the
  bind resolves, instead of fetching both endpoints every tick.
- Guard expires_in with a finite check + default window so undefined/NaN can no
  longer produce a poll loop that runs until the page closes.
- Track one active poll handle per provider in useConnectChannelProvider via a
  ref Map: a new connect cancels the prior poll for that provider, and a useEffect
  cleanup cancels all polls on unmount.

Adds unit tests for resolve-and-stop, cancellation, and non-finite-expiry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stop leaking blocked-sender content in DingTalk INFO log; document bind semantics

Moving the allowed_users gate past _extract_text meant the parsed-message INFO
log (text=%r, first 100 chars) fired for senders that allowed_users would have
rejected, defeating the filter's noise/privacy role. Move that log to after the
allowed_users gate so blocked senders' message text never reaches INFO logs.

Also document the two operator-relevant semantic changes in backend/CLAUDE.md:
connect-code dispatch runs before allowed_users (so allowed_users is no longer a
bind-time defense; the model relies on code confidentiality + 600s TTL + one-time
consumption), and the single-active-owner-per-external-identity transfer semantics
now backed by the partial unique index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): note connect-code-vs-allowlist and ownership transfer in operator guide

Mirror the backend/CLAUDE.md notes in the operator-facing IM_CHANNEL_CONNECTIONS.md:
connect codes are consumed before allowed_users (so a not-yet-allowlisted user can
still complete a first bind, and allowed_users is not a bind-time defense), and an
external identity has at most one active owner with last-bind-wins transfer enforced
at the DB layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): lift connect-code dispatch into Channel base class

Each adapter duplicated the ordering-sensitive boilerplate of extracting a
/connect code and guarding on the connection repo before its allowed_users gate.
The duplication is what let telegram/wechat drift and keep the gate ahead of the
bind. Centralize it:

- Move `_connection_repo` onto Channel.__init__ (removing 7 duplicate assignments).
- Add Channel._pending_connect_code(text), which guards on the repo and extracts
  the code, documenting that adapters MUST consult it before authorization so a
  browser-initiated bind can bootstrap a not-yet-authorized identity.
- Route slack, discord, feishu, dingtalk, wechat, and wecom through the helper.
  This also fixes a latent inconsistency where slack dispatched a bind even when
  no connection repo was configured.

Pure refactor — the full channel suite stays green; adds a direct unit test for
the base helper's contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* make format

* fix(channels): redact DingTalk parsed-message INFO log content

Log text_len instead of the first 100 chars of message text, so message
content never reaches INFO logs (the after-gate move already keeps blocked
senders out entirely). This takes over the redaction from #3584 so only this
PR touches dingtalk.py, letting the two PRs merge in any order conflict-free.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 10:15:31 +08:00
He Wang 08afdcb907 feat(channels): add DingTalk channel integration (#2628)
* feat(channels): add DingTalk channel integration

Add a new DingTalk messaging channel using the dingtalk-stream SDK
with Stream Push (WebSocket), requiring no public IP. Supports both
plain sampleMarkdown replies and optional AI Card streaming for a
typewriter effect when card_template_id is configured.

- Add DingTalkChannel implementation with token management, message
  routing, allowed_users filtering, and markdown adaptation
- Register dingtalk in channel service registry and capability map
- Propagate inbound metadata to outbound messages in ChannelManager
  for DingTalk sender context (sender_staff_id, conversation_type)
- Add dingtalk-stream dependency to pyproject.toml
- Add configuration examples in config.example.yaml and .env.example
- Update all README translations with setup instructions
- Add comprehensive test suite (test_dingtalk_channel.py) and
  metadata propagation test in test_channels.py
- Update backend CLAUDE.md to document DingTalk channel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): address PR review feedback for DingTalk integration

- Replace runtime mutation of CHANNEL_CAPABILITIES with a
  `supports_streaming` property on the Channel base class, overridden
  by DingTalkChannel, FeishuChannel, and WeComChannel
- Store stream client reference and attempt graceful disconnect in
  stop(); guard _on_chatbot_message with _running check to prevent
  post-stop message processing
- Use msg.chat_id as the primary routing key in send/send_file via
  a shared _resolve_routing helper, with metadata as fallback
- Fix process() return type annotation from tuple[str, str] to
  tuple[int, str] to match AckMessage.STATUS_OK
- Protect _incoming_messages with threading.Lock for cross-thread
  safety between the Stream Push thread and the asyncio loop
- Re-add Docker Compose URL guidance removed during DingTalk setup
  docs addition in README.md
- Fix incomplete sentence in README_zh.md (missing verb "启用")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(docs): restore plain paragraph format for Docker Compose note

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): fix isinstance TypeError and add file size guard in DingTalk channel

Use tuple syntax for isinstance() type check to avoid runtime TypeError
with PEP 604 union types. Add upload size limit (20MB) before reading
files into memory. Narrow exception handlers to specific types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): propagate markdown fallback errors and validate access token response

- Re-raise exceptions in _send_markdown_fallback to prevent partial
  deliveries (files sent without accompanying text)
- Validate _get_access_token response: reject non-dict bodies, empty
  tokens, and coerce invalid expireIn to a safe default
- Add tests for both fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): validate upload response and broaden send_file exception handling

- Validate _upload_media JSON response: handle JSONDecodeError and
  non-dict payloads gracefully by returning None
- Broaden send_file exception tuple to include TypeError and
  AttributeError for unexpected JSON shapes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): fix streaming race on channel registration and slim outbound metadata

- Register channel in service before calling start() to avoid race
  where background receiver publishes inbound before registration,
  causing manager to fall back to static CHANNEL_CAPABILITIES
- Strip known-large metadata keys (raw_message, ref_msg) from outbound
  messages to prevent memory bloat from propagated inbound payloads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update service.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update CLAUDE.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-30 11:25:33 +08:00