mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-06-18 13:46:02 +00:00
fix(channels): make channel connect flow deterministic (#3582)
* fix(channels): make channel connect flow deterministic

* make format

* fix(channels): apply connect-code before allowed_users on telegram and wechat

The bind-bootstrap reorder shipped for slack/dingtalk only. Telegram and WeChat still gate _check_user/allowed_users before connect-code dispatch, so a newly allowlisted-but-unbound user is silently rejected when binding via the browser deep-link / connect-code flow: the same deadlock the PR fixes.

- telegram: consume the /start deep-link token before the allowed_users gate.
- wechat: handle the /connect code before the allowed_users gate, and defer inbound file extraction + context-token tracking past the gate so blocked senders no longer trigger CDN downloads or token bookkeeping.

Adds regression tests for both adapters mirroring the slack/dingtalk coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): enforce single-active-owner invariant at the DB layer

_revoke_other_active_owners did a SELECT-then-UPDATE in app code with no row lock or constraint covering active rows. Under READ COMMITTED, two concurrent connect-code consumes for the same (provider, external_account_id, workspace_id) from different owners could each observe "no other active owner" and both commit a connected row, leaving find_connection_by_external_identity nondeterministic.

- Add a partial unique index on (provider, external_account_id, workspace_id) WHERE status != 'revoked' (portable to SQLite >= 3.8.0 and PostgreSQL) so the database guarantees at most one non-revoked row per external identity.
- Reorder upsert_connection to revoke other owners' active rows before the new connected row is flushed (so the index is satisfied at commit), wrapped in a bounded rollback-and-retry loop. A losing concurrent writer now retries against the now-visible state instead of committing a duplicate.

Adds DB-constraint, revoked-slot-reuse, and concurrent-upsert regression tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): harden connect-status polling primitive

pollChannelConnectionUntilResolved was a free-floating recursive setTimeout started from onSuccess with no cancellation, no per-provider dedup, a redundant second endpoint per tick, and an unbounded loop on a non-finite expires_in.

- Extract a framework-agnostic, cancellable poller (connect-poll.ts) that polls only listChannelConnections() and invalidates the providers query once when the bind resolves, instead of fetching both endpoints every tick.
- Guard expires_in with a finite check + default window so undefined/NaN can no longer produce a poll loop that runs until the page closes.
- Track one active poll handle per provider in useConnectChannelProvider via a ref Map: a new connect cancels the prior poll for that provider, and a useEffect cleanup cancels all polls on unmount.

Adds unit tests for resolve-and-stop, cancellation, and non-finite-expiry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stop leaking blocked-sender content in DingTalk INFO log; document bind semantics

Moving the allowed_users gate past _extract_text meant the parsed-message INFO log (text=%r, first 100 chars) fired for senders that allowed_users would have rejected, defeating the filter's noise/privacy role. Move that log to after the allowed_users gate so blocked senders' message text never reaches INFO logs.

Also document the two operator-relevant semantic changes in backend/CLAUDE.md: connect-code dispatch runs before allowed_users (so allowed_users is no longer a bind-time defense; the model relies on code confidentiality + 600s TTL + one-time consumption), and the single-active-owner-per-external-identity transfer semantics now backed by the partial unique index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): note connect-code-vs-allowlist and ownership transfer in operator guide

Mirror the backend/CLAUDE.md notes in the operator-facing IM_CHANNEL_CONNECTIONS.md: connect codes are consumed before allowed_users (so a not-yet-allowlisted user can still complete a first bind, and allowed_users is not a bind-time defense), and an external identity has at most one active owner with last-bind-wins transfer enforced at the DB layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): lift connect-code dispatch into Channel base class

Each adapter duplicated the ordering-sensitive boilerplate of extracting a /connect code and guarding on the connection repo before its allowed_users gate. The duplication is what let telegram/wechat drift and keep the gate ahead of the bind. Centralize it:

- Move `_connection_repo` onto Channel.__init__ (removing 7 duplicate assignments).
- Add Channel._pending_connect_code(text), which guards on the repo and extracts the code, documenting that adapters MUST consult it before authorization so a browser-initiated bind can bootstrap a not-yet-authorized identity.
- Route slack, discord, feishu, dingtalk, wechat, and wecom through the helper. This also fixes a latent inconsistency where slack dispatched a bind even when no connection repo was configured.

Pure refactor: the full channel suite stays green; adds a direct unit test for the base helper's contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* make format

* fix(channels): redact DingTalk parsed-message INFO log content

Log text_len instead of the first 100 chars of message text, so message content never reaches INFO logs (the after-gate move already keeps blocked senders out entirely). This takes over the redaction from #3584 so only this PR touches dingtalk.py, letting the two PRs merge in any order conflict-free.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
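The ordering described in the commit message (connect-code dispatch first, allowed_users gate second) can be illustrated with a small sketch. The backend adapters are Python; this is a TypeScript illustration only, and every name in it (`InboundMessage`, `pendingConnectCode`, `handleInbound`, the `/connect <code>` message shape, the `consumeCode` callback) is hypothetical rather than taken from the repository.

```typescript
// Hypothetical illustration of "bind before authorization"; not the project's code.
interface InboundMessage {
  senderId: string;
  text: string;
}

type Outcome = "bound" | "rejected" | "handled";

// Assumed message shape: "/connect <code>". Returns the code, or undefined.
function pendingConnectCode(text: string): string | undefined {
  const match = /^\/connect\s+(\S+)\s*$/.exec(text.trim());
  return match?.[1];
}

function handleInbound(
  message: InboundMessage,
  allowedUsers: ReadonlySet<string>,
  consumeCode: (code: string, senderId: string) => boolean,
): Outcome {
  // 1. Connect-code dispatch runs first, so a sender who is not yet
  //    authorized can still complete a first bind.
  const code = pendingConnectCode(message.text);
  if (code !== undefined) {
    return consumeCode(code, message.senderId) ? "bound" : "rejected";
  }
  // 2. Only ordinary messages reach the allowlist gate.
  if (!allowedUsers.has(message.senderId)) {
    return "rejected";
  }
  return "handled";
}

// Usage: a one-time code store backed by a Set (delete() returns true once).
const codes = new Set(["abc123"]);
const consume = (code: string) => codes.delete(code);
const allowed = new Set(["alice"]);

const firstBind = handleInbound({ senderId: "bob", text: "/connect abc123" }, allowed, consume);
const replayedBind = handleInbound({ senderId: "bob", text: "/connect abc123" }, allowed, consume);
const blockedChat = handleInbound({ senderId: "bob", text: "hello" }, allowed, consume);
const allowedChat = handleInbound({ senderId: "alice", text: "hello" }, allowed, consume);
```

In this sketch the bind's safety rests on the code being secret and single-use, which matches the commit message's note that allowed_users is no longer a bind-time defense.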
@@ -0,0 +1,93 @@
import type { ChannelConnection, ChannelProviderId } from "./types";

export const CONNECT_POLL_INTERVAL_MS = 2000;
// Fallback bind window used when the backend response omits or garbles
// `expires_in`, so a non-finite value can never produce an unbounded poll loop.
const DEFAULT_CONNECT_EXPIRES_S = 600;

export interface ConnectPollHandle {
  cancel: () => void;
}

export interface ConnectPollOptions {
  provider: ChannelProviderId;
  expiresInSeconds: number;
  /** Fetch the latest connections: the single source of truth for "connected". */
  fetchConnections: () => Promise<ChannelConnection[]>;
  /** Invoked once when the provider's connection resolves to "connected". */
  onConnected: () => void;
  intervalMs?: number;
  now?: () => number;
}

/**
 * Poll the connections endpoint until the given provider reports `connected`
 * or the bind window elapses. Returns a handle whose `cancel()` stops the loop
 * (used to dedup repeated connects and to clean up on unmount).
 *
 * Only the connections endpoint is polled; `onConnected` lets the caller refresh
 * derived provider state exactly once when the bind lands, instead of fetching
 * both endpoints on every tick.
 */
export function startConnectionPoll(
  options: ConnectPollOptions,
): ConnectPollHandle {
  const {
    provider,
    expiresInSeconds,
    fetchConnections,
    onConnected,
    intervalMs = CONNECT_POLL_INTERVAL_MS,
    now = Date.now,
  } = options;

  const expires =
    Number.isFinite(expiresInSeconds) && expiresInSeconds > 0
      ? expiresInSeconds
      : DEFAULT_CONNECT_EXPIRES_S;
  const deadline = now() + expires * 1000;

  let timer: ReturnType<typeof setTimeout> | undefined;
  let cancelled = false;

  const cancel = () => {
    cancelled = true;
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };

  const schedule = () => {
    timer = setTimeout(() => {
      timer = undefined;
      if (cancelled) {
        return;
      }
      void fetchConnections()
        .then((connections) => {
          if (cancelled) {
            return;
          }
          const connected = connections.some(
            (item) => item.provider === provider && item.status === "connected",
          );
          if (connected) {
            onConnected();
            return;
          }
          if (now() < deadline) {
            schedule();
          }
        })
        .catch(() => {
          if (!cancelled && now() < deadline) {
            schedule();
          }
        });
    }, intervalMs);
  };

  schedule();
  return { cancel };
}
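The `expires` guard in `startConnectionPoll` is easiest to reason about in isolation. The following is a minimal standalone sketch of the same idea, not the shipped code: `resolveDeadlineMs` and `FALLBACK_EXPIRES_S` are hypothetical names, and the 600-second fallback simply mirrors `DEFAULT_CONNECT_EXPIRES_S`.

```typescript
// Hypothetical helper isolating the finite-expiry guard; names are illustrative.
const FALLBACK_EXPIRES_S = 600; // mirrors DEFAULT_CONNECT_EXPIRES_S

// Returns the absolute deadline (ms) for a bind window that starts at `nowMs`.
function resolveDeadlineMs(
  expiresInSeconds: number | undefined,
  nowMs: number,
): number {
  const seconds =
    expiresInSeconds !== undefined &&
    Number.isFinite(expiresInSeconds) &&
    expiresInSeconds > 0
      ? expiresInSeconds
      : FALLBACK_EXPIRES_S; // undefined, NaN, ±Infinity, zero, and negatives land here
  return nowMs + seconds * 1000;
}

console.log(resolveDeadlineMs(120, 0)); // 120000
console.log(resolveDeadlineMs(Number.NaN, 0)); // 600000
console.log(resolveDeadlineMs(undefined, 0)); // 600000
```

The guard matters because every comparison against `NaN` evaluates to `false`: a loop that stops only when "now >= deadline" would never stop if the deadline were `NaN`, which is the unbounded-poll case the commit message describes.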
@@ -1,4 +1,5 @@
import { useMutation, useQuery, useQueryClient } from "@tanstack/react-query";
import { useEffect, useRef } from "react";

import {
  configureChannelProvider,
@@ -8,6 +9,7 @@ import {
  listChannelConnections,
  listChannelProviders,
} from "./api";
import { startConnectionPoll, type ConnectPollHandle } from "./connect-poll";
import type { ChannelProviderId, ChannelRuntimeConfigValues } from "./types";

export const channelProviderQueryKey = ["channelProviders"] as const;
@@ -36,14 +38,49 @@ export function useChannelConnections() {

export function useConnectChannelProvider() {
  const queryClient = useQueryClient();
  const pollersRef = useRef<Map<ChannelProviderId, ConnectPollHandle>>(
    new Map(),
  );

  // Cancel any in-flight polls when the component using this hook unmounts.
  useEffect(() => {
    const pollers = pollersRef.current;
    return () => {
      pollers.forEach((handle) => handle.cancel());
      pollers.clear();
    };
  }, []);

  return useMutation({
    mutationFn: (provider: ChannelProviderId) =>
      connectChannelProvider(provider),
    onSuccess: (result, provider) => {
      void queryClient.invalidateQueries({ queryKey: channelProviderQueryKey });
      void queryClient.invalidateQueries({
        queryKey: channelConnectionsQueryKey,
      });

      // Replace any existing poll for this provider so repeated Connect clicks
      // don't spawn parallel polling chains racing on the same query keys.
      pollersRef.current.get(provider)?.cancel();
      pollersRef.current.set(
        provider,
        startConnectionPoll({
          provider,
          expiresInSeconds: result.expires_in,
          fetchConnections: () =>
            queryClient.fetchQuery({
              queryKey: channelConnectionsQueryKey,
              queryFn: () => listChannelConnections(),
            }),
          onConnected: () => {
            // Refresh derived provider state exactly once when the bind lands.
            void queryClient.invalidateQueries({
              queryKey: channelProviderQueryKey,
            });
          },
        }),
      );
    },
  });
}
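The "one active poll handle per provider" bookkeeping in the hook does not depend on React. Here is a small sketch of the same pattern as a plain class, assuming only that a handle exposes `cancel()`. `PollRegistry`, `PollHandle`, and `fakePoll` are hypothetical names introduced for illustration; the hook itself uses a ref-held `Map` directly.

```typescript
// Hypothetical, framework-free version of the per-key dedup used by the hook.
interface PollHandle {
  cancel: () => void;
}

class PollRegistry<K> {
  private readonly handles = new Map<K, PollHandle>();

  // Cancel the previous handle for `key` (if any), then start and track a new one.
  replace(key: K, start: () => PollHandle): void {
    this.handles.get(key)?.cancel();
    this.handles.set(key, start());
  }

  // Cancel everything, e.g. from an unmount or teardown hook.
  cancelAll(): void {
    this.handles.forEach((handle) => handle.cancel());
    this.handles.clear();
  }

  get size(): number {
    return this.handles.size;
  }
}

// Usage with fake polls that only record which handle was cancelled.
const cancelledLabels: string[] = [];
const fakePoll = (label: string) => (): PollHandle => ({
  cancel: () => {
    cancelledLabels.push(label);
  },
});

const registry = new PollRegistry<string>();
registry.replace("telegram", fakePoll("telegram#1"));
registry.replace("telegram", fakePoll("telegram#2")); // cancels telegram#1
registry.replace("slack", fakePoll("slack#1"));
registry.cancelAll(); // cancels telegram#2, then slack#1
```

Cancelling before replacing is what keeps repeated Connect clicks from leaving an orphaned timer chain behind; the registry never holds more than one handle per key.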
@@ -0,0 +1,101 @@
import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";

import { startConnectionPoll } from "@/core/channels/connect-poll";
import type { ChannelConnection } from "@/core/channels/types";

function connection(provider: string, status: string): ChannelConnection {
  return {
    id: `${provider}-1`,
    provider,
    status,
    scopes: [],
    metadata: {},
  };
}

beforeEach(() => {
  vi.useFakeTimers();
});

afterEach(() => {
  vi.useRealTimers();
});

describe("startConnectionPoll", () => {
  test("polls connections until the provider is connected, then resolves once", async () => {
    const responses: ChannelConnection[][] = [
      [connection("telegram", "pending")],
      [connection("telegram", "connected")],
    ];
    const fetchConnections = vi.fn(async () => responses.shift() ?? []);
    const onConnected = vi.fn();

    startConnectionPoll({
      provider: "telegram",
      expiresInSeconds: 600,
      fetchConnections,
      onConnected,
      intervalMs: 1000,
    });

    await vi.advanceTimersByTimeAsync(1000);
    expect(fetchConnections).toHaveBeenCalledTimes(1);
    expect(onConnected).not.toHaveBeenCalled();

    await vi.advanceTimersByTimeAsync(1000);
    expect(fetchConnections).toHaveBeenCalledTimes(2);
    expect(onConnected).toHaveBeenCalledTimes(1);

    // No further polling after the connection resolves.
    await vi.advanceTimersByTimeAsync(5000);
    expect(fetchConnections).toHaveBeenCalledTimes(2);
  });

  test("cancel() stops scheduled polling and fires no further fetches", async () => {
    const fetchConnections = vi.fn(async () => [
      connection("telegram", "pending"),
    ]);
    const handle = startConnectionPoll({
      provider: "telegram",
      expiresInSeconds: 600,
      fetchConnections,
      onConnected: vi.fn(),
      intervalMs: 1000,
    });

    await vi.advanceTimersByTimeAsync(1000);
    expect(fetchConnections).toHaveBeenCalledTimes(1);

    handle.cancel();
    await vi.advanceTimersByTimeAsync(10000);
    expect(fetchConnections).toHaveBeenCalledTimes(1);
  });

  test("a non-finite expires_in falls back to a finite deadline and terminates", async () => {
    const fetchConnections = vi.fn(async () => [
      connection("telegram", "pending"),
    ]);
    let nowValue = 0;
    startConnectionPoll({
      provider: "telegram",
      expiresInSeconds: Number.NaN,
      fetchConnections,
      onConnected: vi.fn(),
      intervalMs: 1000,
      now: () => nowValue,
    });

    nowValue = 1;
    await vi.advanceTimersByTimeAsync(1000);
    expect(fetchConnections).toHaveBeenCalledTimes(1);

    // Jump past the fallback expiry window: the loop must stop instead of
    // running forever (Date.now() >= NaN would otherwise never be true).
    nowValue = 10_000_000;
    await vi.advanceTimersByTimeAsync(1000);
    expect(fetchConnections).toHaveBeenCalledTimes(2);

    await vi.advanceTimersByTimeAsync(10000);
    expect(fetchConnections).toHaveBeenCalledTimes(2);
  });
});