4 Commits

Author SHA1 Message Date
Xinmin Zeng 68d8caec1f fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867)
* fix(agents): make update_agent honor runtime.context user_id like setup_agent

PR #2784 hardened setup_agent to prefer runtime.context["user_id"] (set by
inject_authenticated_user_context from the auth-validated request) over the
contextvar, so an agent created during the bootstrap flow always lands under
users/<auth_uid>/agents/<name>. update_agent was left calling
get_effective_user_id() unconditionally — the same class of bug that produced
issues #2782 / #2862 still applies whenever the contextvar is not available
on the executing task (background work, future cross-process drivers,
checkpoint resume on a different task). In that regime update_agent silently
routes writes to users/default/agents/<name>, corrupting the shared default
bucket and losing the user's edit.

Extract the resolution policy into a shared resolve_runtime_user_id helper
on deerflow.runtime.user_context and route both setup_agent and update_agent
through it so the two halves of the lifecycle stay in lockstep.

Add load-bearing end-to-end tests that drive a real langchain.agents
create_agent graph with a fake LLM, exercising the full pipeline:

  HTTP wire format
    -> app.gateway.services.start_run config-assembly
    -> deerflow.runtime.runs.worker._build_runtime_context
    -> langchain.agents create_agent graph
    -> ToolNode dispatch (sync + async + sub-graph + ContextThreadPoolExecutor)
    -> setup_agent / update_agent

The negative-control tests intentionally land in users/default/ to prove the
positive tests are actually load-bearing rather than vacuously passing.

The new test_update_agent_e2e_user_isolation suite included a test that
failed against main and now passes after this fix.

* style: ruff format on new e2e tests

* test(e2e): real-server HTTP test driving setup_agent through the full ASGI stack

Adds tests/test_setup_agent_http_e2e_real_server.py — a single load-bearing
test that drives the entire FastAPI gateway through starlette.testclient.
TestClient with no mocks above the LLM:

  - lifespan boots (config, sqlite engine, LangGraph runtime, channels)
  - POST /api/v1/auth/register (real password hash, real sqlite write,
    issues access_token + csrf_token cookies)
  - POST /api/threads (real thread_meta + checkpoint creation)
  - POST /api/threads/{id}/runs/stream with the exact wire shape the React
    frontend sends (assistant_id + input + config + context with
    agent_name/is_bootstrap)
  - AuthMiddleware -> CSRFMiddleware -> require_permission ->
    start_run -> inject_authenticated_user_context ->
    asyncio.create_task(run_agent) -> worker._build_runtime_context ->
    Runtime injection -> ToolNode dispatch -> real setup_agent
  - Asserts SOUL.md is under users/<authenticated_uid>/agents/<name>/
    and NOT under users/default/agents/<name>/.

DEER_FLOW_HOME and the sqlite path are redirected into tmp_path so the test
never touches the real .deer-flow directory or developer database. The only
patch above the LLM boundary is replacing create_chat_model with a fake that
emits a single setup_agent tool_call.

This is the "真实验证" answer: it reproduces what curl-against-uvicorn would
do, minus the network socket layer.

* test: address Copilot review on user-isolation e2e tests

- Drop "currently expected to FAIL" wording from update_agent e2e docstring
  and header (Copilot review): the fix is in this PR, the test pins the
  corrected behaviour rather than driving a future change.
- Rephrase the assertion failure messages from "BUG:" to "REGRESSION:" to
  match the test's role on the fixed branch.
- Bound _drain_stream with a wall-clock timeout, a max-bytes cap, and an
  early break on the "event: end" SSE frame (Copilot review). Stops the
  test from hanging on a stuck run or runaway heartbeat loop.
- Replace the misleading "patch both module aliases" comment with an
  explanation of why patching lead_agent.agent.create_chat_model is the
  only correct target (Copilot review): lead_agent rebinds the symbol
  into its own namespace at import time, so patching deerflow.models is
  too late.

* test(refactor): address WillemJiang review on user-isolation e2e tests

- Extract the duplicated FakeToolCallingModel (and a
  build_single_tool_call_model helper) into tests/_agent_e2e_helpers.py.
  All three e2e files now import from the shared module instead of
  redefining the shim locally.
- Convert the manual p.start() / p.stop() try/finally blocks in
  test_update_agent_e2e_user_isolation.py to contextlib.ExitStack so
  patch lifecycle is Pythonic and exception-safe.
- Lift the isolated_app fixture's private-attribute resets into a
  named _reset_process_singletons helper with a comment block
  explaining why each singleton has to be invalidated for true e2e
  isolation, and why raising=False is intentional. Makes the
  fragility visible and the intent self-documenting rather than
  leaving the resets inline as opaque monkeypatch calls.

Net change: -59 lines (143 -> 84) across the three test files, with
every assertion intact. Full suite remains 69 passed / lint clean.

* test(e2e): make real-server test self-supply its config

CI's actions/checkout only ships config.example.yaml (the real config.yaml
is gitignored), so the production config-discovery search
(./config.yaml -> ../config.yaml -> $DEER_FLOW_CONFIG_PATH) finds nothing
and the test fails at lifespan boot with FileNotFoundError. The dev-machine
run passed only because a local config.yaml happened to exist.

Write a minimal AppConfig-valid yaml into tmp_path and pin
DEER_FLOW_CONFIG_PATH to it. The yaml carries just what the schema requires
(a single fake-test-model entry, LocalSandboxProvider, sqlite database).
The LLM never gets instantiated because the test patches create_chat_model
on the lead agent module, so the api_key/base_url stay placeholders.

Verified by hiding the local config.yaml to mirror the CI checkout — the
test now passes in both environments.
2026-05-12 23:18:54 +08:00
Maz Benoscar 30a5846219 fix(tools): make write_file append discoverable in model-facing schema (#2843)
* fix: make tool argument behavior discoverable

The write_file tool already supported append=false by default with append=true for end-of-file writes, but the parsed docstring did not describe append in the model-facing schema. This records the overwrite default and append path in the tool description, adds resilient schema regression coverage, and keeps backend sandbox docs aligned.

The regression now also checks that every public parameter in the existing tool schema test matrix has a description. Enabling docstring parsing on setup_agent and update_agent fills the two existing gaps with their existing Args docs instead of duplicating descriptions elsewhere.

Constraint: Issue #2831 asks for a small docstring/schema discoverability fix without changing runtime file-writing behavior
Rejected: Changing write_file defaults | would alter existing overwrite semantics and broaden the fix beyond schema discoverability
Rejected: Exact phrase assertions | too brittle for future docstring rewording while testing the same behavior
Confidence: high
Scope-risk: narrow
Directive: Keep model-facing tool parameters documented through parsed docstrings or equivalent schema descriptions
Tested: cd backend && uv run pytest tests/test_setup_agent_tool.py tests/test_update_agent_tool.py tests/test_tool_args_schema_no_pydantic_warning.py tests/test_sandbox_tools_security.py::test_str_replace_and_append_on_same_path_should_preserve_both_updates -q
Tested: cd backend && uv run ruff check packages/harness/deerflow/sandbox/tools.py packages/harness/deerflow/tools/builtins/setup_agent_tool.py packages/harness/deerflow/tools/builtins/update_agent_tool.py tests/test_tool_args_schema_no_pydantic_warning.py
Not-tested: Full backend test suite
Co-authored-by: OmX <omx@oh-my-codex.dev>

* Fix the lint error

---------

Co-authored-by: OmX <omx@oh-my-codex.dev>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-10 23:09:03 +08:00
He Wang 7de9b5828b fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning (#2774)
* fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning

Add deerflow/tools/types.py with:

    Runtime = ToolRuntime[dict[str, Any], ThreadState]

Replace every runtime: ToolRuntime[ContextT, ThreadState] and
runtime: ToolRuntime[dict[str, Any], ThreadState] annotation in
sandbox/tools.py, present_file_tool.py, task_tool.py, view_image_tool.py,
and skill_manage_tool.py with the new Runtime alias.

The unbound ContextT TypeVar (default None) caused
PydanticSerializationUnexpectedValue warnings on every tool call because
LangChain's BaseTool._parse_input calls model_dump() on the auto-generated
args_schema while DeerFlow passes a dict as runtime context.
Binding the context to dict[str, Any] aligns Pydantic's serialization
expectations with reality and removes the noise from all run modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(tools): extend Runtime alias to setup_agent and update_agent tools

Replace bare ToolRuntime annotations in setup_agent_tool.py and
update_agent_tool.py with the shared Runtime alias introduced in the
previous commit, and add both tools to the Pydantic serialization
warning regression test (13 cases total).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools): loosen Pydantic warning filter to avoid version-specific format

Replace the brittle "field_name='context'" substring check with a looser
"context" match so the assertion stays valid if Pydantic changes its
internal warning format across versions.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools): simplify warning filter and clean up docstring

Remove the "context" substring condition from the Pydantic warning
filter — asserting that no PydanticSerializationUnexpectedValue fires
at all is both simpler and more comprehensive, since the test payload
contains only the tool's own args plus runtime.

Also update the module docstring to remove the version-specific warning
format example that was inconsistent with the looser filter.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 14:50:33 +08:00
yangzheli 59c4a3f0a4 feat(agent): add custom-agent self-updates with user isolation (#2713)
* feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616)

Custom agents had no built-in way to persist updates to their own SOUL.md /
config.yaml from a normal chat — `setup_agent` was only bound during the
bootstrap flow, so when the user asked the agent to refine its description
or personality, the agent would shell out via bash/write_file and the edits
landed in a temporary sandbox/tool workspace instead of
`{base_dir}/agents/{agent_name}/`.

Changes:
- New `update_agent` builtin tool with partial-update semantics (only the
  fields you pass are written) and atomic temp-file + os.replace writes so
  a failed update never corrupts existing SOUL.md / config.yaml.
- Lead agent now binds `update_agent` in the non-bootstrap path whenever
  `agent_name` is set in the runtime context. Default agent (no
  agent_name) and bootstrap flow are unchanged.
- New `<self_update>` system-prompt section is injected for custom agents,
  instructing them to use `update_agent` — and explicitly NOT bash /
  write_file — to persist self-updates.
- Tests: 11 new cases in `tests/test_update_agent_tool.py` covering
  validation (missing/invalid agent_name, unknown agent, no fields),
  partial updates (soul-only, description-only, skills=[] vs omitted),
  no-op detection, atomic-write safety, and AgentConfig round-tripping;
  plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the
  self-update prompt section.
- Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx
  (en/zh) with the new tool description.

* feat(agent): isolate custom agents per user

Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: consistent write/delete targets & add --user-id to migration

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 23:17:42 +08:00