Mirror of https://github.com/bytedance/deer-flow.git (synced 2026-06-18 21:55:59 +00:00)
fix(channels): add operational guardrails (#3584)
* fix(channels): add operational guardrails

* make format

* fix(channels): converge with #3582 to avoid merge-order conflicts

Drop this PR's DingTalk INFO-log redaction and hand it to #3582, which
already restructures that handler and will redact the same log there.
This PR no longer touches dingtalk.py, so the two PRs can merge to main
in any order without a conflict.

For WeChat, drop the contested thread_ts priority reorder (review #3)
and keep only what inbound dedupe needs: a server-stable message_id in
the inbound metadata (message_id/msg_id, no client_id per review #6).
This is a single added line inside the metadata dict, a region #3582
never touches, so it auto-merges regardless of order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): address three correctness review findings

1. Connect-code cap was racy (willem #1): _create_state ran
   delete-expired, count, and insert as three separate transactions, so
   concurrent connect POSTs from one owner could each see count < cap
   and all insert past it. Add
   ChannelConnectionRepository.create_oauth_state_within_cap, which does
   delete+count+insert in a single transaction serialized per
   (owner, provider): Postgres via pg_advisory_xact_lock, SQLite via the
   write lock the leading DELETE takes. The router now uses it.

2. Inbound dedupe key fell back to "" workspace (willem #3): two
   workspaces delivering without team/guild/aibotid would collapse to
   the same key and dedupe each other's messages. _inbound_dedupe_key
   now fails closed (returns None) when no workspace identifier is
   present.

3. Dedupe key was recorded on receipt and never released on failure
   (ShenAC #1): a transient error (DB blip, Gateway 503) left the key in
   place for the full TTL, so a provider redelivery of the same
   message_id (exactly the retry dedupe should absorb) was silently
   dropped. _handle_message now releases the key in the
   unexpected-exception branch so redelivery can recover, while keeping
   record-on-receipt so retries during handling are still deduped.

Tests: repo cap enforcement incl. concurrent-issuance non-leak; dedupe
fail-closed; dedupe key release-on-failure redelivery recovery.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): address cleanup/efficiency and test review findings

Efficiency / cleanup:
- Dedupe key set drops client-generated ids (client_msg_id, client_id);
  keep only server-stable event_id/message_id/msg_id, which a provider's
  own redelivery preserves (ShenAC #6). Every provider already emits
  message_id.
- TTL/overflow pruning of _recent_inbound_events is now O(k): switch to
  an OrderedDict and popitem(last=False) from the front instead of
  scanning all 4096 entries on every inbound (willem #4).
- Log "received inbound" only after the dedupe check so a provider
  retrying N times no longer logs N accepts; document that manager
  dedupe covers the agent run/final answer, not provider ack
  side-effects (willem #5, ShenAC #2).
- Slack drops the redundant `team_id or event.get("team")` fallback the
  caller already resolved (willem #6).
- create_oauth_state_within_cap prunes only this owner/provider's
  expired codes instead of a global DELETE on every connect POST; global
  cleanup still runs on consume_oauth_state (willem #7).

Tests:
- Dedupe test uses tmp_path instead of a leaked mkdtemp, uses distinct
  objects per publish, and adds a negative control: a different
  message_id is still processed, catching over-dedupe regressions
  (willem #8, ShenAC #4).
- Slack HTTP-mode rejection test supplies app_token so the missing-token
  early return can't mask the guard, giving the state assertions teeth
  (ShenAC #3).
- count_oauth_states test pins that the active row survives, not just
  the count (ShenAC #5).
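
The dedupe behaviour described in the message above (a fail-closed key, record-on-receipt with release-on-failure, and front-of-queue pruning) can be illustrated with a minimal sketch. This is not the repository's code: `InboundDeduper`, `make_key`, `handle`, and the TTL default are hypothetical names and values chosen for illustration; only the server-stable field names, the 4096 bound, and the `OrderedDict.popitem(last=False)` technique come from the commit message. Treating a `None` key as "process without dedupe" is also an assumption.

```python
import time
from collections import OrderedDict

# Server-stable id fields named in the commit message; client ids are excluded.
STABLE_ID_FIELDS = ("event_id", "message_id", "msg_id")


class InboundDeduper:
    """Hypothetical helper: remembers recently seen inbound keys."""

    def __init__(self, ttl=300.0, max_entries=4096, clock=time.monotonic):
        self._ttl = ttl              # assumed TTL, not taken from the source
        self._max = max_entries
        self._clock = clock
        self._seen = OrderedDict()   # key -> time recorded, oldest first

    @staticmethod
    def make_key(provider, workspace_id, metadata):
        # Fail closed: without a workspace identifier, return None rather
        # than a shared "" bucket that two workspaces could collide in.
        if not workspace_id:
            return None
        for field in STABLE_ID_FIELDS:
            value = metadata.get(field)
            if value:
                return f"{provider}:{workspace_id}:{value}"
        return None

    def _prune(self, now):
        # Entries are in insertion order, so only the expired or overflowing
        # front entries are touched: O(k) instead of a full scan.
        while self._seen:
            _, recorded_at = next(iter(self._seen.items()))
            if now - recorded_at < self._ttl and len(self._seen) <= self._max:
                break
            self._seen.popitem(last=False)

    def record(self, key):
        """Return True and remember the key if unseen, else False."""
        now = self._clock()
        self._prune(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        while len(self._seen) > self._max:
            self._seen.popitem(last=False)
        return True

    def release(self, key):
        self._seen.pop(key, None)


def handle(deduper, key, process):
    """Record on receipt; release the key if processing fails unexpectedly."""
    if key is not None and not deduper.record(key):
        return "duplicate"
    try:
        process()
    except Exception:
        if key is not None:
            deduper.release(key)  # let a provider redelivery try again
        raise
    return "processed"
```

Recording before processing means a redelivery that arrives mid-handling is still absorbed, while the release in the exception branch keeps a transient failure from blocking the retry for the whole TTL.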

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* make format

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
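
The connect-code cap fix (delete expired, count, and insert inside one transaction) can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not the repository's `create_oauth_state_within_cap`: the table layout, the function name `create_state_within_cap`, and the TTL are hypothetical, and the sketch takes the write lock up front with `BEGIN IMMEDIATE` rather than relying on the leading DELETE to acquire it, as the commit message says the real code does.

```python
import sqlite3
import time

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE oauth_states (
    code       TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    provider   TEXT NOT NULL,
    expires_at REAL NOT NULL
)
"""


def open_db(path=":memory:"):
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def create_state_within_cap(conn, owner, provider, code, cap, ttl=600.0, now=None):
    """Insert a state row unless (owner, provider) already holds `cap` live rows."""
    now = time.time() if now is None else now
    # Take the write lock first, so no other writer can interleave between
    # the count and the insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Prune only this owner/provider's expired codes, not the whole table.
        conn.execute(
            "DELETE FROM oauth_states "
            "WHERE owner = ? AND provider = ? AND expires_at <= ?",
            (owner, provider, now),
        )
        (active,) = conn.execute(
            "SELECT COUNT(*) FROM oauth_states WHERE owner = ? AND provider = ?",
            (owner, provider),
        ).fetchone()
        allowed = active < cap
        if allowed:
            conn.execute(
                "INSERT INTO oauth_states (code, owner, provider, expires_at) "
                "VALUES (?, ?, ?, ?)",
                (code, owner, provider, now + ttl),
            )
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return allowed
```

Because the count and the insert share one write transaction, two concurrent callers cannot both observe `count < cap` and both insert, which is the race the first review finding describes.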
@@ -98,24 +98,13 @@ def test_slack_send_uses_connection_bot_token_when_connection_id_is_present():
     anyio.run(go)


-def test_slack_http_events_mode_initializes_operator_web_client(monkeypatch):
+def test_slack_http_events_mode_is_rejected(monkeypatch, caplog):
     import anyio

     from app.channels.slack import SlackChannel

-    class FakeWebClient:
-        def __init__(self, token: str) -> None:
-            self.token = token
-            self.messages: list[dict] = []
-
-        def auth_test(self):
-            return {"user_id": "B-http"}
-
-        def chat_postMessage(self, **kwargs):
-            self.messages.append(kwargs)
-
     slack_sdk = ModuleType("slack_sdk")
-    slack_sdk.WebClient = FakeWebClient
+    slack_sdk.WebClient = object
     socket_mode = ModuleType("slack_sdk.socket_mode")
     socket_mode.SocketModeClient = object
     response = ModuleType("slack_sdk.socket_mode.response")
@@ -129,26 +118,20 @@ def test_slack_http_events_mode_initializes_operator_web_client(monkeypatch):
             bus=MessageBus(),
             config={
                 "bot_token": "xoxb-operator",
+                # Provide app_token too so the missing-token early return cannot
+                # fire before the HTTP-mode guard — otherwise the state assertions
+                # below would hold even if the guard were deleted.
+                "app_token": "xapp-token",
                 "event_delivery": "http",
                 "connection_repo": MagicMock(),
             },
         )

-        await channel.start()
-        assert channel._running is True
-        assert channel._web_client is not None
-        assert channel._web_client.token == "xoxb-operator"
-        assert channel._bot_user_id == "B-http"
+        with caplog.at_level("ERROR", logger="app.channels.slack"):
+            await channel.start()

-        await channel._post_connection_reply("C123", "Slack connected to DeerFlow.", "1710000000.000100")
-
-        assert channel._web_client.messages == [
-            {
-                "channel": "C123",
-                "text": "Slack connected to DeerFlow.",
-                "thread_ts": "1710000000.000100",
-            }
-        ]
-        await channel.stop()
+        assert channel._running is False
+        assert channel._web_client is None
+        assert "Slack HTTP Events mode is not supported" in caplog.text

     anyio.run(go)
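
The comment added in the hunk above explains why the test supplies `app_token`: an earlier missing-token return would otherwise leave the channel stopped whether or not the HTTP-mode guard exists. A minimal sketch of that masking effect follows. `SketchChannel`, its `guard_enabled` switch, and the order of its checks are hypothetical and only assumed to mirror the real `SlackChannel.start`.

```python
import logging

logger = logging.getLogger("sketch.channels.slack")


class SketchChannel:
    """Hypothetical stand-in with two early returns in start()."""

    def __init__(self, config, guard_enabled=True):
        self.config = config
        self.guard_enabled = guard_enabled  # False simulates a deleted guard
        self.running = False
        self.web_client = None

    def start(self):
        # Early return 1: required tokens are missing.
        if not self.config.get("bot_token") or not self.config.get("app_token"):
            logger.error("Slack tokens are missing")
            return
        # Early return 2: the guard the test is meant to exercise.
        if self.guard_enabled and self.config.get("event_delivery") == "http":
            logger.error("Slack HTTP Events mode is not supported")
            return
        self.web_client = object()
        self.running = True


def running_after_start(config, guard_enabled):
    channel = SketchChannel(config, guard_enabled=guard_enabled)
    channel.start()
    return channel.running


without_app_token = {"bot_token": "xoxb-operator", "event_delivery": "http"}
with_app_token = dict(without_app_token, app_token="xapp-token")

# Without app_token the channel stays stopped even when the guard is gone,
# so "assert not running" cannot detect a deleted guard.
masked = running_after_start(without_app_token, guard_enabled=False)

# With app_token the same assertion fails once the guard is removed.
exposed = running_after_start(with_app_token, guard_enabled=False)
```

In this sketch `masked` stays `False` and `exposed` becomes `True`, which is the difference the added `app_token` entry is there to expose.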