* feat(trace): Add `run_name` to the trace info for suggestions and memory.
before(in langsmith):
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
memory_agent
lead_agent
feat(trace): Add `run_name` to the trace info for suggestions and memory.
before(in langsmith):
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
memory_agent
lead_agent
* feat(trace): Add `run_name` to the trace info for system agents.
before(in langsmith):
CodexChatModel
CodexChatModel
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
title_agent
security_agent
memory_agent
lead_agent
* chore(code format):code format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(subagents): support per-subagent skill loading and custom subagent types (#2230)
Add per-subagent skill configuration and custom subagent type registration,
aligned with Codex's role-based config layering and per-session skill injection.
Backend:
- SubagentConfig gains `skills` field (None=all, []=none, list=whitelist)
- New CustomSubagentConfig for user-defined subagent types in config.yaml
- SubagentsAppConfig gains `custom_agents` section and `get_skills_for()`
- Registry resolves custom agents with three-layer config precedence
- SubagentExecutor loads skills per-session as conversation items (Codex pattern)
- task_tool no longer appends skills to system_prompt
- Lead agent system prompt dynamically lists all registered subagent types
- setup_agent tool accepts optional skills parameter
- Gateway agents API transparently passes skills in CRUD operations
Frontend:
- Agent/CreateAgentRequest/UpdateAgentRequest types include skills field
- Agent card displays skills as badges alongside tool_groups
Config:
- config.example.yaml documents custom_agents and per-agent skills override
Tests:
- 40 new tests covering all skill config, custom agents, and registry logic
- Existing tests updated for new get_skills_prompt_section signature
Closes#2230
* fix: address review feedback on skills PR
- Remove stale get_skills_prompt_section monkeypatches from test_task_tool_core_logic.py
(task_tool no longer imports this function after skill injection moved to executor)
- Add key prefixes (tg:/sk:) to agent-card badges to prevent React key collisions
between tool_groups and skills
* fix(ci): resolve lint and test failures
- Format agent-card.tsx with prettier (lint-frontend)
- Remove stale "Skills Appendix" system_prompt assertion — skills are now
loaded per-session by SubagentExecutor, not appended to system_prompt
* fix(ci): sort imports in test_subagent_skills_config.py (ruff I001)
* fix(ci): use nullish coalescing in agent-card badge condition (eslint)
* fix: address review feedback on skills PR
- Use model_fields_set in AgentUpdateRequest to distinguish "field omitted"
from "explicitly set to null" — fixes skills=None ambiguity where None
means "inherit all" but was treated as "don't change"
- Move lazy import of get_subagent_config outside loop in
_build_available_subagents_description to avoid repeated import overhead
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The tool is registered as `present_files` (plural) in present_file_tool.py,
but four references in documentation and prompt strings incorrectly used the
singular form `present_file`. This could cause confusion and potentially
lead to incorrect tool invocations.
Changed files:
- backend/docs/GUARDRAILS.md
- backend/docs/ARCHITECTURE.md
- backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences)
* fix(subagent): inherit parent agent's tool_groups in task_tool
When a custom agent defines tool_groups (e.g. [file:read, file:write, bash]),
the restriction is correctly applied to the lead agent. However, when the lead
agent delegates work to a subagent via the task tool, get_available_tools() is
called without the groups parameter, causing the subagent to receive ALL tools
(including web_search, web_fetch, image_search, etc.) regardless of the parent
agent's configuration.
This fix propagates tool_groups through run metadata so that task_tool passes
the same group filter when building the subagent's tool set.
Changes:
- agent.py: include tool_groups in run metadata
- task_tool.py: read tool_groups from metadata and pass to get_available_tools()
* fix: initialize metadata before conditional block and update tests for tool_groups propagation
- Initialize metadata = {} before the 'if runtime is not None' block to
avoid Ruff F821 (possibly-undefined variable) and simplify the
parent_tool_groups expression.
- Update existing test assertion to expect groups=None in
get_available_tools call signature.
- Add 3 new test cases:
- test_task_tool_propagates_tool_groups_to_subagent
- test_task_tool_no_tool_groups_passes_none
- test_task_tool_runtime_none_passes_groups_none
* fix(memory): cache corruption, thread-safety, and caller mutation bugs
Bug 1 (updater.py): deep-copy current_memory before passing to
_apply_updates() so a subsequent save() failure cannot leave a
partially-mutated object in the storage cache.
Bug 3 (storage.py): add _cache_lock (threading.Lock) to
FileMemoryStorage and acquire it around every read/write of
_memory_cache, fixing concurrent-access races between the background
timer thread and HTTP reload calls.
Bug 4 (storage.py): replace in-place mutation
memory_data["lastUpdated"] = ...
with a shallow copy
memory_data = {**memory_data, "lastUpdated": ...}
so save() no longer silently modifies the caller's dict.
Regression tests added for all three bugs in test_memory_storage.py
and test_memory_updater.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format test_memory_updater.py with ruff
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: remove stale bug-number labels from code comments and docstrings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(checkpointer): create parent directory before opening SQLite in sync provider
The sync checkpointer factory (_sync_checkpointer_cm) opens a SQLite
connection without first ensuring the parent directory exists. The async
provider and both store providers already call ensure_sqlite_parent_dir(),
but this call was missing from the sync path.
When the deer-flow harness package is used from an external virtualenv
(where the .deer-flow directory is not pre-created), the missing parent
directory causes:
sqlite3.OperationalError: unable to open database file
Add the missing ensure_sqlite_parent_dir() call in the sync SQLite
branch, consistent with the async provider, and add a regression test.
Closes#2259
* style: fix ruff format + add call-order assertion for ensure_parent_dir
- Fix formatting in test_checkpointer.py (ruff format)
- Add test_sqlite_ensure_parent_dir_before_connect to verify
ensure_sqlite_parent_dir is called before from_conn_string
(addresses Copilot review suggestion)
---------
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix(memory): use asyncio.to_thread for blocking file I/O in aupdate_memory
`_finalize_update` performs synchronous blocking operations (os.mkdir,
file open/write/rename/stat) that were called directly from the async
`aupdate_memory` method, causing `BlockingError` from blockbuster when
running under an ASGI server. Wrap the call with `asyncio.to_thread` to
offload all blocking I/O to a thread pool.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): use unique temp filename to prevent concurrent write collision
`file_path.with_suffix(".tmp")` produces a fixed path — concurrent saves
for the same agent (now possible after wrapping _finalize_update in
asyncio.to_thread) would clobber the same temp file. Use a UUID-suffixed
temp file so each write is isolated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): also offload _prepare_update_prompt to thread pool
FileMemoryStorage.load() inside _prepare_update_prompt performs
synchronous stat() and file read, blocking the event loop just like
_finalize_update did. Wrap _prepare_update_prompt in asyncio.to_thread
for the same reason.
The async path now has no blocking file I/O on the event loop:
to_thread(_prepare_update_prompt) → await model.ainvoke() → to_thread(_finalize_update)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(todo-middleware): prevent premature agent exit with incomplete todos
When plan mode is active (is_plan_mode=True), the agent occasionally
exits the loop and outputs a final response while todo items are still
incomplete. This happens because the routing edge only checks for
tool_calls, not todo completion state.
Fixes#2112
Add an after_model override to TodoMiddleware with
@hook_config(can_jump_to=["model"]). When the model produces a
response with no tool calls but there are still incomplete todos, the
middleware injects a todo_completion_reminder HumanMessage and returns
jump_to=model to force another model turn. A cap of 2 reminders
prevents infinite loops when the agent cannot make further progress.
Also adds _completion_reminder_count() helper and 14 new unit tests
covering all edge cases of the new after_model / aafter_model logic.
* Remove unnecessary blank line in test file
* Fix runtime argument annotation in before_model
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: mark memory updater async migration as completed
- Update TODO.md to mark the replacement of sync model.invoke()
with async model.ainvoke() in title_middleware and memory updater
as completed using [x] format
Addresses #2131
* feat: switch memory updater to async LLM calls
- Add async aupdate_memory() method using await model.ainvoke()
- Convert sync update_memory() to use async wrapper
- Add _run_async_update_sync() for nested loop context handling
- Maintain backward compatibility with existing sync API
- Add ThreadPoolExecutor for async execution from sync contexts
Addresses #2131
* test: add tests for async memory updater
- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
Addresses #2131
* fix: apply ruff formatting to memory updater
- Format multi-line expressions to single line
- Ensure code style consistency with project standards
- Fix lint issues caught by GitHub Actions
* test: add comprehensive tests for async memory updater
- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
- Ensure backward compatibility with sync API
* fix: satisfy ruff formatting in memory updater test
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(title): strip <think> tags from title model responses and assistant context
Reasoning models (e.g. minimax M2.7, DeepSeek-R1) emit <think>...</think>
blocks before their actual output. When such a model is used as the title
model (or as the main agent), the raw thinking content leaked into the thread
title stored in state, so the chat list showed the internal monologue instead
of a meaningful title.
Fixes#1884
- Add `_strip_think_tags()` helper using a regex to remove all <think>...</think> blocks
- Apply it in `_parse_title()` so the title model response is always clean
- Apply it to the assistant message in `_build_title_prompt()` so thinking
content from the first AI turn is not fed back to the title model
- Add four new unit tests covering: stripping in parse, think-only response,
assistant prompt stripping, and end-to-end async flow with think tags
* Fix the lint error
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware
The existing hash-based loop detection only catches identical tool call
sets. When the agent calls the same tool type (e.g. read_file) on many
different files, each call produces a unique hash and bypasses detection.
This causes the agent to exhaust recursion_limit, consuming 150K-225K
tokens per failed run.
Add a second detection layer that tracks cumulative call counts per tool
type per thread. Warns at 30 calls (configurable) and forces stop at 50.
The hard stop message now uses the actual returned message instead of a
hardcoded constant, so both hash-based and frequency-based stops produce
accurate diagnostics.
Also fix _apply() to use the warning message returned by
_track_and_check() for hard stops, instead of always using _HARD_STOP_MSG.
Closes#1987
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(lint): remove unused imports and fix line length
- Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG
imports from test file (F401)
- Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501)
* style: apply ruff format
* test: add LRU eviction and per-thread reset coverage for frequency state
Address review feedback from @WillemJiang:
- Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction
- Add test for reset(thread_id=...) clearing only the target thread's
frequency state while leaving others intact
* fix(makefile): route Windows shell-script targets through Git Bash (#2060)
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com>
* fix: improve sandbox security and preserve multimodal content
* Add unit test modifications for test_injects_uploaded_files_tag_into_list_content
* format updated_content
* Add regression tests for multimodal upload content and host bash default safety
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995)
Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON
strings instead of native arrays. Add defensive type checking in
_format_clarification_message() to deserialize string options before
iteration, preventing per-character rendering.
* fix(middleware): normalize options after JSON deserialization
Address Copilot review feedback:
- Add post-deserialization normalization so options is always a list
(handles json.loads returning a scalar string, dict, or None)
- Add test for JSON-encoded scalar string ("development")
- Fix test_json_string_with_mixed_types to use actual mixed types
* fix(backend): use timezone-aware UTC in memory modules
Replace datetime.utcnow() with datetime.now(timezone.utc) and a shared
utc_now_iso_z() helper so persisted ISO timestamps keep the trailing Z
suffix without triggering Python 3.12+ deprecation warnings.
Made-with: Cursor
* refactor(backend): use removesuffix for utc_now_iso_z suffix
Makes the +00:00 -> Z transform explicit for the trailing offset only
(Copilot review on PR #1992).
Made-with: Cursor
* style(backend): satisfy ruff UP017 with datetime.UTC in memory queue
Made-with: Cursor
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(backend): make loop detection hash tool calls by stable keys
The loop detection middleware previously hashed full tool call arguments,
which made repeated calls look different when only non-essential argument
details changed. In particular, `read_file` calls with nearby line ranges
could bypass repetition detection even when the agent was effectively
reading the same file region again and again.
- Hash tool calls using stable keys instead of the full raw args payload
- Bucket `read_file` line ranges so nearby reads map to the same region key
- Prefer stable identifiers such as `path`, `url`, `query`, or `command`
before falling back to JSON serialization of args
- Keep hashing order-independent so the same tool call set produces the
same hash regardless of call order
Fixes#1905
* fix(backend): harden loop detection hash normalization
- Normalize and parse stringified tool args defensively
- Expand stable key derivation to include pattern, glob, and cmd
- Normalize reversed read_file ranges before bucketing
Fixes#1905
* fix(backend): harden loop detection tool format
* exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged.
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware
Add _validate_input() to reject malformed bash commands before regex
classification: empty commands, oversized commands (>10 000 chars), and
null bytes that could cause detection/execution layer inconsistency.
* fix(sandbox): address Copilot review — type guard, log truncation, reject reason
- Coerce None/non-string command to str before validation
- Truncate oversized commands in audit logs to prevent log amplification
- Propagate reject_reason through _pre_process() to block message
- Remove L2 label from comments and test class names
* fix(sandbox): isinstance type guard + async input sanitisation tests
Address review comments:
- Replace str() coercion with isinstance(raw_command, str) guard so
non-string truthy values (0, [], False) fall back to empty string
instead of passing validation as "0"/"[]"/"False".
- Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests
covering empty, null-byte, oversized, and None command via
awrap_tool_call path.
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection
Two fixes to the memory system:
1. _fact_content_key() now lowercases content before comparison, preventing
semantically duplicate facts like "User prefers Python" and "user prefers
python" from being stored separately.
2. Adds detect_reinforcement() to MemoryMiddleware (closes#1719), mirroring
detect_correction(). When users signal approval ("yes exactly", "perfect",
"完全正确", etc.), the memory updater now receives reinforcement_detected=True
and injects a hint prompting the LLM to record confirmed preferences and
behaviors with high confidence.
Changes across the full signal path:
- memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement()
- queue.py: reinforcement_detected field in ConversationContext and add()
- updater.py: reinforcement_detected param in update_memory() and
update_memory_from_conversation(); builds reinforcement_hint alongside
the existing correction_hint
Tests: 11 new tests covering deduplication, hint injection, and signal
detection (Chinese + English patterns, window boundary, conflict with correction).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review comments on reinforcement detection
- Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants
- Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise
- Use casefold() instead of lower() for Unicode-aware fact deduplication
- Add missing test coverage for reinforcement_detected OR merge and forwarding in queue
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline
- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
headings common in academic papers; excludes pure-number table headers
and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)
* fix(uploads): address Copilot review comments on extract_outline regex
- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix: inject longTermBackground into memory prompt
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware
LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].
The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).
Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str
This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).
* test(middleware): add unit tests for _append_text and list content hard stop
Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:
- TestAppendText: unit tests for the new _append_text() static method
covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
works correctly with list content (Anthropic thinking mode), None
content, and str content
Requested by reviewer in PR #1823.
* fix(middleware): improve _append_text robustness and test isolation
- Add explicit isinstance(content, str) check with fallback for
unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
does not modify the original list
* style: fix ruff format whitespace in test file
---------
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* feat(uploads): inject document outline into agent context for converted files
Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.
- Add `extract_outline()` to `file_conversion.py`: recognises standard
Markdown headings (#/##/###) and SEC-style bold structural headings
(**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
under each file entry with `L{line}: {title}` format and a `read_file`
prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
injection in `UploadsMiddleware`; existing test updated for new `outline`
field in `uploaded_files` state
Partially addresses #1647 (agent ignores uploaded files).
* fix(uploads): stream outline file reads and strip inline bold from heading titles
- Switch extract_outline() from read_text().splitlines() to open()+line iteration
so large converted documents are not loaded into memory on every agent turn;
exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
(Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
to satisfy ruff CI lint
* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES
When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:
... (showing first 50 headings; use `read_file` to explore further)
Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.
* style: apply ruff format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id depending on the
LangGraph invocation path. When absent, uploads_dir resolved to None
and the entire outline/historical-files attachment was silently skipped.
Apply the same fallback pattern already used by ThreadDataMiddleware:
try get_config().configurable.thread_id, with a RuntimeError guard for
test environments where get_config() is called outside a runnable context.
Discovered via live integration testing (curl against local LangGraph).
Unit tests inject uploads_dir directly and would not catch this.
* style: apply ruff format to uploads_middleware.py
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* feat(agent): 为AgentConfig添加skills字段并更新lead_agent系统提示
在AgentConfig中添加skills字段以支持配置agent可用技能
更新lead_agent的系统提示模板以包含可用技能信息
* fix: resolve agent skill configuration edge cases and add tests
* Update backend/packages/harness/deerflow/agents/lead_agent/prompt.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* refactor(agent): address PR review comments for skills configuration
- Add detailed docstring to `skills` field in `AgentConfig` to clarify the semantics of `None` vs `[]`.
- Add unit tests in `test_custom_agent.py` to verify `load_agent_config()` correctly parses omitted skills and explicit empty lists.
- Fix `test_make_lead_agent_empty_skills_passed_correctly` to include `agent_name` in the runtime config, ensuring it exercises the real code path.
* docs: 添加关于按代理过滤技能的配置说明
在配置示例文件和文档中添加说明,解释如何通过代理的config.yaml文件限制加载的技能
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli
Implement all core LangGraph Platform API endpoints in the Gateway,
allowing it to fully replace the langgraph-cli dev server for local
development. This eliminates a heavyweight dependency and simplifies
the development stack.
Changes:
- Add runs lifecycle endpoints (create, stream, wait, cancel, join)
- Add threads CRUD and search endpoints
- Add assistants compatibility endpoints (search, get, graph, schemas)
- Add StreamBridge (in-memory pub/sub for SSE) and async provider
- Add RunManager with atomic create_or_reject (eliminates TOCTOU race)
- Add worker with interrupt/rollback cancel actions and runtime context injection
- Route /api/langgraph/* to Gateway in nginx config
- Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore)
- Add unit tests for RunManager, SSE format, and StreamBridge
* fix: drain bridge queue on client disconnect to prevent backpressure
When on_disconnect=continue, keep consuming events from the bridge
without yielding, so the worker is not blocked by a full queue.
Only on_disconnect=cancel breaks out immediately.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove pytest import
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Fix default stream_mode to ["values", "messages-tuple"]
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Remove unused if_exists field from ThreadCreateRequest
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: address review comments on gateway LangGraph API
- Mount runs.py router in app.py (missing include_router)
- Normalize interrupt_before/after "*" to node list before run_agent()
- Use entry.id for SSE event ID instead of counter
- Drain bridge queue on disconnect when on_disconnect=continue
- Reuse serialization helper in wait_run() for consistent wire format
- Reject unsupported multitask_strategy with 400
- Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway
* feat: extract app.state access into deps.py
Encapsulate read/write operations for singleton objects (RunManager,
StreamBridge, checkpointer) held in app.state into a shared utility,
reducing repeated access patterns across router modules.
* feat: extract deerflow.runtime.serialization module with tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace duplicated serialization with deerflow.runtime.serialization
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: extract app/gateway/services.py with run lifecycle logic
Create a service layer that centralizes SSE formatting, input/config
normalization, and run lifecycle management. Router modules will delegate
to these functions instead of using private cross-imported helpers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: wire routers to use services layer, remove cross-module private imports
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff formatting to refactored files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(runtime): support LangGraph dev server and add compat route
- Enable official LangGraph dev server for local development workflow
- Decouple runtime components from agents package for better separation
- Provide gateway-backed fallback route when dev server is skipped
- Simplify lifecycle management using context manager in gateway
* feat(runtime): add Store providers with auto-backend selection
- Add async_provider.py and provider.py under deerflow/runtime/store/
- Support memory, sqlite, postgres backends matching checkpointer config
- Integrate into FastAPI lifespan via AsyncExitStack in deps.py
- Replace hardcoded InMemoryStore with config-driven factory
* refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures
- Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert)
- Replace checkpoint-scanning search with two-phase strategy:
phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer
for legacy/LangGraph Server threads with lazy migration
- Extend Store record schema with values field for title persistence
- Sync thread title from checkpoint to Store after run completion
- Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both
GET and POST methods; POST handles interrupt/rollback actions
- Fix /threads/{id}/state 500 by separating read_config and
write_config, adding checkpoint_ns to configurable, and
shallow-copying checkpoint/metadata before mutation
- Sync title to Store on state update for immediate search reflection
- Move _upsert_thread_in_store into services.py, remove duplicate logic
- Add _sync_thread_title_after_run: await run task, read final
checkpoint title, write back to Store record
- Spawn title sync as background task from start_run when Store exists
* refactor(runtime): deduplicate store and checkpointer provider logic
Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py
and use it in all three places that previously inlined the same mkdir logic.
Consolidate duplicate error constants in store/async_provider.py by importing
from store/provider.py instead of redefining them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store
_resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in
runtime/store/provider.py. agents/checkpointer/provider and
agents/checkpointer/async_provider import from there, reversing the
previous dependency direction (store → checkpointer becomes
checkpointer → store).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py
Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of
checkpointer/provider.py into a dedicated _sqlite_utils module.
Functions are now public (no underscore prefix), making cross-module
imports semantically correct. All four provider files import from
the single shared location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): use adelete_thread to fully remove thread checkpoints on delete
AsyncSqliteSaver has no adelete method — the previous hasattr check
always evaluated to False, silently leaving all checkpoint rows in the
database. Switch to adelete_thread(thread_id) which deletes every
checkpoint and pending-write row for the thread across all namespaces
(including sub-graph checkpoints).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint
app.py had unreachable code after the async-with lifespan refactor:
bridge_cm and ckpt_cm were referenced but never defined (F821), and
the channel service startup/shutdown was outside the langgraph_runtime
block so it never ran. Move channel service lifecycle inside the
async-with block where it belongs.
Replace str+Enum inheritance in RunStatus and DisconnectMode with
StrEnum as suggested by UP042.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: format with ruff
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(sandbox): add SandboxAuditMiddleware for bash command security auditing
Addresses the LocalSandbox escape vector reported in #1224 where bash tool
calls can execute destructive commands against the host filesystem.
- Add SandboxAuditMiddleware with three-tier command classification:
- High-risk (block): rm -rf /, curl|bash, dd if=, mkfs, /etc/shadow access
- Medium-risk (warn): pip install, apt install, chmod 777
- Safe (pass): normal workspace operations
- Register middleware after GuardrailMiddleware in _build_runtime_middlewares,
applied to both lead agent and subagents
- Structured audit log via standard logger (visible in langgraph.log)
- Medium-risk commands execute but append a warning to the tool result,
allowing the LLM to self-correct without blocking legitimate workflows
- High-risk commands return an error ToolMessage without calling the handler,
so the agent loop continues gracefully
* fix(lint): sort imports in test_sandbox_audit_middleware
* refactor(sandbox-audit): address Copilot review feedback (3/5/6)
- Fix class docstring to match implementation: medium-risk commands are
executed with a warning appended (not rejected), and cwd anchoring note
removed (handled in a separate PR)
- Remove capsys.disabled() from benchmark test to avoid CI log noise;
keep assertions for recall/precision targets
- Remove misleading 'cwd fix' from test module docstring
* test(sandbox-audit): add async tests for awrap_tool_call
* fix(sandbox-audit): address Copilot review feedback (1/2)
- Narrow rm high-risk regex to only block truly destructive targets
(/, /*, ~, ~/*, /home, /root); legitimate workspace paths like
/mnt/user-data/ are no longer false-positived
- Handle list-typed ToolMessage content in _append_warn_to_result;
append a text block instead of str()-ing the list to avoid breaking
structured content normalization
* style: apply ruff format to sandbox_audit_middleware files
* fix(sandbox-audit): update benchmark comment to match assert-based implementation
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>