Commit Graph

6 Commits

Author SHA1 Message Date
Nan Gao 525af0da14 fix(channels): scope IM files and helper commands to owner (#3579)
* fix(channels): scope IM files and helper commands to owner

* fix(memory): honor bound IM owner for /memory gateway endpoints

The channel manager already attaches X-DeerFlow-Owner-User-Id for /memory
and /models, but the memory router resolved user_id solely from
get_effective_user_id(), which returns the synthetic internal user
(DEFAULT_USER_ID) for channel workers. A bound IM /memory therefore read
the default/internal memory instead of the connection owner's.

Resolve the owner via _resolve_memory_user_id(request) across all
/api/memory* endpoints: trusted internal callers act for the owner header,
browser/API callers fall back to get_effective_user_id(). Mirrors the
threads router's get_trusted_internal_owner_user_id pattern, completing
acceptance criterion #3 of #3539.

Add end-to-end tests asserting the resolved user_id (not just that the
header is sent) and that a spoofed owner header from a browser user is
ignored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): align memory bucket and reuse cached storage owner

Address PR #3579 review feedback:

- Memory router now sanitizes the trusted owner header via make_safe_user_id
  before routing, matching the channel file pipeline
  (_safe_user_id_for_run/prepare_user_dir_for_raw_id). A bound owner id needing
  sanitization now resolves to the same bucket as its files/uploads instead of
  500ing in _validate_user_id.
- _handle_chat reuses the storage_user_id cached at the top of the method for
  artifact delivery instead of re-deriving _channel_storage_user_id(msg), so
  uploads and outputs cannot drift to different buckets if a channel rewrites
  the InboundMessage in receive_file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stage unbound IM files under the run's user bucket

Address PR #3579 review feedback (#5): _channel_storage_user_id now mirrors
_resolve_run_params' identity policy, falling back to safe(msg.user_id) instead
of returning None for unbound auth-enabled channels.

Previously an unbound msg ran under safe(platform_user_id) but staged uploads
under get_effective_user_id() in the dispatcher task (unset contextvar ->
"default"), so files landed in users/default/... while the agent read from
users/{safe_platform_user_id}/.... Bound and unbound channels now write where
the agent reads. Returns None only when no identity is available.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): reuse cached storage owner in streaming artifact delivery

Address PR #3579 review feedback (#6): thread the storage_user_id resolved in
_handle_chat into _handle_streaming_chat instead of re-deriving
_channel_storage_user_id(msg) in the finally block. Avoids re-running
_safe_user_id_for_run (and its possible filesystem touch) on the streaming-error
path and guarantees artifact delivery targets the same bucket as the uploads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): document owner-scoped IM file storage

Address PR #3579 review feedback (#4): the IM Channels and File Upload sections
still described pre-PR default-bucket behaviour. Document that receive_file,
_ingest_inbound_files/ensure_uploads_dir/get_uploads_dir, and
_resolve_attachments/_prepare_artifact_delivery are owner-scoped via the user_id
kwarg, and that the bucket matches the memory bucket from _resolve_memory_user_id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): unify run identity and storage bucket resolution

Address PR #3579 review feedback (#3): _resolve_run_params no longer duplicates
the owner-resolution rule inline. After the #5 fix the inline block and
_channel_storage_user_id computed the identical sanitized-with-platform-fallback
value, so the run identity now calls the same helper, making it the single
source of truth for run_context["user_id"] and the file/artifact storage bucket.

_owner_headers stays deliberately separate: it sends the raw owner id over HTTP
for the gateway to re-resolve (no sanitize, no platform fallback), documented on
both helpers. test_run_identity_matches_storage_bucket pins the two together so
they cannot drift again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 11:45:35 +08:00
Nan Gao 68ba4198b8 fix(channels): make channel connect flow deterministic (#3582)
* fix(channels): make channel connect flow deterministic

* make format

* fix(channels): apply connect-code before allowed_users on telegram and wechat

The bind-bootstrap reorder shipped for slack/dingtalk only. Telegram and
WeChat still gate _check_user/allowed_users before connect-code dispatch, so
a newly allowlisted-but-unbound user is silently rejected when binding via the
browser deep-link / connect-code flow — the same deadlock the PR fixes.

- telegram: consume the /start deep-link token before the allowed_users gate.
- wechat: handle the /connect code before the allowed_users gate, and defer
  inbound file extraction + context-token tracking past the gate so blocked
  senders no longer trigger CDN downloads or token bookkeeping.

Adds regression tests for both adapters mirroring the slack/dingtalk coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): enforce single-active-owner invariant at the DB layer

_revoke_other_active_owners did a SELECT-then-UPDATE in app code with no row
lock or constraint covering active rows. Under READ COMMITTED, two concurrent
connect-code consumes for the same (provider, external_account_id, workspace_id)
from different owners could each observe "no other active owner" and both commit
a connected row, leaving find_connection_by_external_identity nondeterministic.

- Add a partial unique index on (provider, external_account_id, workspace_id)
  WHERE status != 'revoked' (portable to SQLite >= 3.8.0 and PostgreSQL) so the
  database guarantees at most one non-revoked row per external identity.
- Reorder upsert_connection to revoke other owners' active rows before the new
  connected row is flushed (so the index is satisfied at commit), wrapped in a
  bounded rollback-and-retry loop. A losing concurrent writer now retries
  against the now-visible state instead of committing a duplicate.

Adds DB-constraint, revoked-slot-reuse, and concurrent-upsert regression tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): harden connect-status polling primitive

pollChannelConnectionUntilResolved was a free-floating recursive setTimeout
started from onSuccess with no cancellation, no per-provider dedup, a redundant
second endpoint per tick, and an unbounded loop on a non-finite expires_in.

- Extract a framework-agnostic, cancellable poller (connect-poll.ts) that polls
  only listChannelConnections() and invalidates the providers query once when the
  bind resolves, instead of fetching both endpoints every tick.
- Guard expires_in with a finite check + default window so undefined/NaN can no
  longer produce a poll loop that runs until the page closes.
- Track one active poll handle per provider in useConnectChannelProvider via a
  ref Map: a new connect cancels the prior poll for that provider, and a useEffect
  cleanup cancels all polls on unmount.

Adds unit tests for resolve-and-stop, cancellation, and non-finite-expiry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(channels): stop leaking blocked-sender content in DingTalk INFO log; document bind semantics

Moving the allowed_users gate past _extract_text meant the parsed-message INFO
log (text=%r, first 100 chars) fired for senders that allowed_users would have
rejected, defeating the filter's noise/privacy role. Move that log to after the
allowed_users gate so blocked senders' message text never reaches INFO logs.

Also document the two operator-relevant semantic changes in backend/CLAUDE.md:
connect-code dispatch runs before allowed_users (so allowed_users is no longer a
bind-time defense; the model relies on code confidentiality + 600s TTL + one-time
consumption), and the single-active-owner-per-external-identity transfer semantics
now backed by the partial unique index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(channels): note connect-code-vs-allowlist and ownership transfer in operator guide

Mirror the backend/CLAUDE.md notes in the operator-facing IM_CHANNEL_CONNECTIONS.md:
connect codes are consumed before allowed_users (so a not-yet-allowlisted user can
still complete a first bind, and allowed_users is not a bind-time defense), and an
external identity has at most one active owner with last-bind-wins transfer enforced
at the DB layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(channels): lift connect-code dispatch into Channel base class

Each adapter duplicated the ordering-sensitive boilerplate of extracting a
/connect code and guarding on the connection repo before its allowed_users gate.
The duplication is what let telegram/wechat drift and keep the gate ahead of the
bind. Centralize it:

- Move `_connection_repo` onto Channel.__init__ (removing 7 duplicate assignments).
- Add Channel._pending_connect_code(text), which guards on the repo and extracts
  the code, documenting that adapters MUST consult it before authorization so a
  browser-initiated bind can bootstrap a not-yet-authorized identity.
- Route slack, discord, feishu, dingtalk, wechat, and wecom through the helper.
  This also fixes a latent inconsistency where slack dispatched a bind even when
  no connection repo was configured.

Pure refactor — the full channel suite stays green; adds a direct unit test for
the base helper's contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* make format

* fix(channels): redact DingTalk parsed-message INFO log content

Log text_len instead of the first 100 chars of message text, so message
content never reaches INFO logs (the after-gate move already keeps blocked
senders out entirely). This takes over the redaction from #3584 so only this
PR touches dingtalk.py, letting the two PRs merge in any order conflict-free.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 10:15:31 +08:00
Nan Gao e732a741bf fix(channels): centralize shared channel retry helpers (#3583) 2026-06-17 15:44:40 +08:00
He Wang 08afdcb907 feat(channels): add DingTalk channel integration (#2628)
* feat(channels): add DingTalk channel integration

Add a new DingTalk messaging channel using the dingtalk-stream SDK
with Stream Push (WebSocket), requiring no public IP. Supports both
plain sampleMarkdown replies and optional AI Card streaming for a
typewriter effect when card_template_id is configured.

- Add DingTalkChannel implementation with token management, message
  routing, allowed_users filtering, and markdown adaptation
- Register dingtalk in channel service registry and capability map
- Propagate inbound metadata to outbound messages in ChannelManager
  for DingTalk sender context (sender_staff_id, conversation_type)
- Add dingtalk-stream dependency to pyproject.toml
- Add configuration examples in config.example.yaml and .env.example
- Update all README translations with setup instructions
- Add comprehensive test suite (test_dingtalk_channel.py) and
  metadata propagation test in test_channels.py
- Update backend CLAUDE.md to document DingTalk channel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): address PR review feedback for DingTalk integration

- Replace runtime mutation of CHANNEL_CAPABILITIES with a
  `supports_streaming` property on the Channel base class, overridden
  by DingTalkChannel, FeishuChannel, and WeComChannel
- Store stream client reference and attempt graceful disconnect in
  stop(); guard _on_chatbot_message with _running check to prevent
  post-stop message processing
- Use msg.chat_id as the primary routing key in send/send_file via
  a shared _resolve_routing helper, with metadata as fallback
- Fix process() return type annotation from tuple[str, str] to
  tuple[int, str] to match AckMessage.STATUS_OK
- Protect _incoming_messages with threading.Lock for cross-thread
  safety between the Stream Push thread and the asyncio loop
- Re-add Docker Compose URL guidance removed during DingTalk setup
  docs addition in README.md
- Fix incomplete sentence in README_zh.md (missing verb "启用")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(docs): restore plain paragraph format for Docker Compose note

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): fix isinstance TypeError and add file size guard in DingTalk channel

Use tuple syntax for isinstance() type check to avoid runtime TypeError
with PEP 604 union types. Add upload size limit (20MB) before reading
files into memory. Narrow exception handlers to specific types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): propagate markdown fallback errors and validate access token response

- Re-raise exceptions in _send_markdown_fallback to prevent partial
  deliveries (files sent without accompanying text)
- Validate _get_access_token response: reject non-dict bodies, empty
  tokens, and coerce invalid expireIn to a safe default
- Add tests for both fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): validate upload response and broaden send_file exception handling

- Validate _upload_media JSON response: handle JSONDecodeError and
  non-dict payloads gracefully by returning None
- Broaden send_file exception tuple to include TypeError and
  AttributeError for unexpected JSON shapes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): fix streaming race on channel registration and slim outbound metadata

- Register channel in service before calling start() to avoid race
  where background receiver publishes inbound before registration,
  causing manager to fall back to static CHANNEL_CAPABILITIES
- Strip known-large metadata keys (raw_message, ref_msg) from outbound
  messages to prevent memory bloat from propagated inbound payloads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update service.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update CLAUDE.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-30 11:25:33 +08:00
JilongSun 88e535269e Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages

- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).

* style(backend): format code with ruff for lint compliance

- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues

* fix(feishu): handle file write operation asynchronously to prevent blocking

* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code

* test(feishu): add tests for receive_file method and placeholder replacement

* fix(manager): remove unnecessary type casting for channel retrieval

* fix(feishu): update logging messages to reflect resource handling instead of image

* fix(feishu): sanitize filename by replacing invalid characters in file uploads

* fix(feishu): improve filename sanitization and reorder image key handling in message processing

* fix(feishu): add thread lock to prevent filename conflicts during file downloads

* fix(test): correct bad merge in test_feishu_parser.py

* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
2026-04-06 22:14:12 +08:00
DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)
* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 22:55:52 +08:00