Mirror of https://github.com/bytedance/deer-flow.git
Synced 2026-05-23 16:35:59 +00:00
e37912e2c85925fcdc712fa0b50fafe08602f884
61 commits

0c37509b38 fix(middleware): Prevent todo completion reminder IMMessage leak (#2907)
* fix(middleware): Prevent todo completion reminder IMMessage leak (#2892)
* make format
* fix(middleware): Clear stale todo reminder counts (#2892)
* add a size guard for _completion_reminder_counts and add an integration test
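The commit only names the fix. One way to clear stale per-thread counts and also cap the size of a module-level dict is an LRU-style bound, sketched below. The `ReminderCounts` class, the eviction policy and the default cap are assumptions. This is not the code behind `_completion_reminder_counts`.

```python
from collections import OrderedDict


class ReminderCounts:
    """Per-thread reminder counter with stale-entry clearing and a size cap (hypothetical)."""

    def __init__(self, max_size: int = 1000) -> None:  # cap is an assumed value
        self.max_size = max_size
        self._counts: "OrderedDict[str, int]" = OrderedDict()

    def bump(self, thread_id: str) -> int:
        """Increment and return a thread's count, marking it most recently used."""
        count = self._counts.pop(thread_id, 0) + 1
        self._counts[thread_id] = count
        # Size guard: evict the least recently bumped threads once over the cap.
        while len(self._counts) > self.max_size:
            self._counts.popitem(last=False)
        return count

    def clear(self, thread_id: str) -> None:
        """Drop a thread's count, e.g. once all of its todos are complete."""
        self._counts.pop(thread_id, None)

    def __len__(self) -> int:
        return len(self._counts)


counts = ReminderCounts(max_size=2)
counts.bump("a")
counts.bump("a")  # "a" now has count 2
counts.bump("b")
counts.bump("c")  # over the cap, so "a" is evicted
```

Eviction bounds memory even when a thread never reaches the completion path that would call `clear`.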

181d836541 fix(middleware): normalize tool result adjacency before model calls (#2939)
* normalizing tool-call transcripts before invocation
* test(middleware): cover tool result regrouping edge cases

eab7ae3d62 feat: stream subagent token usage to header via terminal task events (#2882)
* feat: real-time subagent token usage display in header and per-turn
Backend:
- Persist subagent token usage to AIMessage.usage_metadata via TokenUsageMiddleware, so accumulateUsage() naturally includes subagent tokens without frontend state management
- Cache subagent usage by tool_call_id in task_tool, write back to the dispatching AIMessage on next model response
- Emit subagent token usage on all terminal task events (task_completed, task_failed, task_cancelled, task_timed_out)
- Report subagent usage to parent RunJournal for API totals
- Search backward from ToolMessage to find dispatching AIMessage for correct multi-tool-call attribution
Frontend:
- Remove subagentUsage state, custom event handling, and prop threading; subagent tokens are now embedded in message metadata
- Simplify selectHeaderTokenUsage (no subagentUsage parameter)
- Per-turn inline badges show turn-specific usage via message accumulation
- Remove isLoading guard from MessageTokenUsageList for dynamic updates during streaming
* fix: prevent header token double counting from baseline reset race
onFinish, onError, and the thread-switch useEffect all reset pendingUsageBaselineMessageIdsRef to an empty Set. If thread.isLoading is still true on the next render, all messages pass the getMessagesAfterBaseline filter and their tokens are added to backendUsage (which already includes them), causing the header to display up to 2× the actual token count. Capture current message IDs instead of using an empty Set so that getMessagesAfterBaseline correctly returns no pending messages even if thread.isLoading lags behind the stream end.
* fix: write back subagent tokens for all concurrent task tool calls
TokenUsageMiddleware only processed messages[-2], so when a single model response dispatched multiple task tool calls, only the last ToolMessage had its cached subagent usage written back to the dispatch AIMessage.usage_metadata. Earlier tasks' usage stayed in _subagent_usage_cache indefinitely (a leak) and never appeared in the per-turn inline token display. Walk backward through all consecutive ToolMessages before the new AIMessage, and accumulate updates targeting the same dispatch message into one state update so overlapping writes don't clobber each other.
* fix: clean up subagent usage cache entry on task cancellation
When a task_tool invocation is cancelled via CancelledError, any cached subagent usage entry leaked because the TokenUsageMiddleware writeback path never fires after cancellation. Pop the cache entry before re-raising to prevent unbounded growth of the module-level _subagent_usage_cache dict.
* fix: address token usage review feedback
* fix: handle missing config for subagent usage cache
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

20d2d2b373 fix(middleware): Handle invalid tool calls in dangling pairing middleware (#2890) (#2891)

08ee7adeba fix(lint): remove duplicate is_dynamic_context_reminder definition (#2837)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

881ff71252 fix(harness): preserve dynamic context across summarization (#2823)

f76e4e35c8 fix title generation with dynamic context reminder (#2830)

c1b7f1d189 feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801)
* feat(middleware): inject dynamic context via DynamicContextMiddleware
Move memory and current date out of the system prompt and into a
dedicated <system-reminder> HumanMessage injected once per session
(frozen-snapshot pattern) via a new DynamicContextMiddleware.
This keeps the system prompt byte-exact across all users and sessions,
enabling maximum Anthropic/Bedrock prefix-cache reuse.
Key design decisions:
- ID-swap technique: reminder takes the first HumanMessage's ID
(replacing it in-place via add_messages), original content gets a
derived `{id}__user` ID (appended after). Preserves correct ordering.
- hide_from_ui: True on reminder messages so frontend filters them out.
- Midnight crossing: date-update reminder injected before the current
turn's HumanMessage when the conversation spans midnight.
- INFO-level logging for production diagnostics.
Also adds prompt-caching breakpoint budget enforcement tests and
updates ClaudeChatModel docs to reference the new pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(token-usage): log input/output token detail breakdown in middleware
Extend the LLM token usage log line to include input_token_details and
output_token_details (cache_creation, cache_read, reasoning, audio, etc.)
when present. Adds tests covering Anthropic cache detail logging from
both usage_metadata and response_metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
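The sketch below builds a log line that appends the detail breakdowns only when they are present. The keys follow LangChain's `usage_metadata` shape (`input_tokens`, `output_tokens`, `total_tokens`, `input_token_details`, `output_token_details`). The function name and the exact log format are assumptions.

```python
def format_usage_line(usage: dict) -> str:
    """Build one log line from a usage_metadata-style dict, adding detail breakdowns if present."""
    parts = [
        f"input={usage.get('input_tokens', 0)}",
        f"output={usage.get('output_tokens', 0)}",
        f"total={usage.get('total_tokens', 0)}",
    ]
    for key in ("input_token_details", "output_token_details"):
        details = usage.get(key)
        if details:  # skip missing or empty breakdowns
            inner = ", ".join(f"{k}={v}" for k, v in sorted(details.items()))
            parts.append(f"{key}={{{inner}}}")
    return "LLM token usage: " + " ".join(parts)
```

A middleware would then pass the returned string to `logger.info`. Sorting the detail keys keeps the line stable when providers emit them in different orders.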
* fix: fix nginx
* fix(middleware): always inject date; gate memory on injection_enabled
Date injection is now unconditional — it is part of the static system
prompt replacement and should always be present. Memory injection
remains gated by `memory.injection_enabled` in the app config.
Previously the entire DynamicContextMiddleware was skipped when
injection_enabled was False, which also suppressed the date.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): format files and correct test assertions for token usage middleware
- ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py
- Remove unused pytest import from test_dynamic_context_middleware.py
- Fix two tests that asserted response_metadata fallback logic that
doesn't exist: replace with tests that match actual middleware behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(middleware): address Copilot review comments on DynamicContextMiddleware
- Use additional_kwargs flag for reminder detection instead of content
substring matching, so user messages containing '<system-reminder>'
are not mistakenly treated as injected reminders
- Generate stable UUID when original HumanMessage.id is None to prevent
ambiguous 'None__user' derived IDs and message collisions
- Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO
- Add two new tests: missing-id UUID fallback and user-text false-positive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

5fd0e6ac89 fix(middleware): sync raw tool call metadata (#2757)

daa3ffc29b feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711)
* Make loop detection configurable
Expose LoopDetectionMiddleware thresholds through config.yaml while preserving existing defaults and allowing the middleware to be disabled. Refs bytedance/deer-flow#2517
* feat(loop-detection): add per-tool tool_freq_overrides to Phase 1
Adds ToolFreqOverride model and tool_freq_overrides field to LoopDetectionConfig, wires it through LoopDetectionMiddleware, and documents the option in config.example.yaml. Resolves the gap flagged in the #2586 review: without per-tool overrides, users hit by #2510/#2511 (RNA-seq workflows exceeding the bash hard limit) had no way to raise thresholds for one tool without loosening the global limit for every tool.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* docs(loop-detection): document tool_freq_overrides in LoopDetectionMiddleware docstring
Add the missing Args entry for tool_freq_overrides, explaining the (warn, hard_limit) tuple structure and how per-tool thresholds supersede the global tool_freq_warn / tool_freq_hard_limit for named tools. Also run ruff format on the three files flagged by the lint check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(loop-detection): validate LoopDetectionMiddleware __init__ params eagerly
Raise a clear ValueError at construction time instead of crashing at unpack time inside _track_and_check when bad values are passed:
- tool_freq_overrides: must be 2-tuples of positive ints with hard_limit >= warn
- scalar thresholds: warn_threshold, hard_limit, tool_freq_warn, tool_freq_hard_limit must be >= 1, and hard limits must be >= their warn pairs
- window_size, max_tracked_threads must be >= 1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(test): isolate credential loader directory-path test from real ~/.claude
The test didn't monkeypatch HOME, so on any machine with real Claude Code credentials at ~/.claude/.credentials.json the function fell through to those credentials and the assertion failed. Adding a HOME redirect ensures the default credential path doesn't exist during the test.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style(test): add blank lines after import pytest in TestInitValidation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(loop-detection): collapse dual validation to LoopDetectionConfig
Modifications
- LoopDetectionMiddleware.__init__: stripped of all ValueError raises; becomes a plain field-assignment constructor.
- LoopDetectionMiddleware.from_config: classmethod that builds the middleware from a Pydantic-validated LoopDetectionConfig and handles the ToolFreqOverride -> tuple[int, int] conversion.
- agents/factory.py: SDK construction routed through LoopDetectionMiddleware.from_config(LoopDetectionConfig()) so the defaults path is Pydantic-validated too.
- agents/lead_agent/agent.py: uses from_config instead of unpacking config fields by hand.
- tests/test_loop_detection_middleware.py: deleted TestInitValidation (16 methods exercising the removed __init__ checks); added TestFromConfig (4 tests: scalar field mapping, override tuple conversion, empty overrides, behavioral smoke test).
Result: one validation layer (Pydantic), zero duplication, no __new__ hacks. Both production construction sites flow through LoopDetectionConfig.
Test results
make test -> 2977 passed, 18 skipped, 0 failed (137s)
make format -> All checks passed; 411 files left unchanged
* feat(agents): make loop_detection configurable in create_deerflow_agent
Adds a `loop_detection: bool | AgentMiddleware = True` field to RuntimeFeatures, mirroring the existing pattern used by `sandbox`, `memory`, and `vision`. SDK users can now disable LoopDetectionMiddleware or replace it with a custom instance built from their own LoopDetectionConfig (e.g. `LoopDetectionMiddleware.from_config(my_cfg)`) instead of being stuck with the hardcoded defaults previously installed by the SDK factory. The lead-agent path (which already reads AppConfig.loop_detection) is unchanged, and the default `True` preserves prior always-on behavior for all existing callers.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: knight0940 <631532668@qq.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Amorend <142649913+knight0940@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
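The final shape of this change can be sketched with standard-library dataclasses standing in for the Pydantic models. That shape is: validation in the config, a plain constructor, and a `from_config` classmethod that turns `ToolFreqOverride` objects into `(warn, hard_limit)` tuples. `FreqChecker` and `check` are hypothetical. Treating a count equal to a threshold as crossing it is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolFreqOverride:
    warn: int
    hard_limit: int

    def __post_init__(self) -> None:
        if self.warn < 1 or self.hard_limit < self.warn:
            raise ValueError("override needs 1 <= warn <= hard_limit")


@dataclass
class LoopDetectionConfig:
    tool_freq_warn: int
    tool_freq_hard_limit: int
    tool_freq_overrides: dict[str, ToolFreqOverride] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The single validation layer: the consumer below never re-checks.
        if self.tool_freq_warn < 1 or self.tool_freq_hard_limit < self.tool_freq_warn:
            raise ValueError("need 1 <= tool_freq_warn <= tool_freq_hard_limit")


class FreqChecker:
    """Hypothetical stand-in for the middleware: a plain field-assignment constructor."""

    def __init__(self, warn: int, hard_limit: int, overrides: dict[str, tuple[int, int]]) -> None:
        self.warn = warn
        self.hard_limit = hard_limit
        self.overrides = overrides

    @classmethod
    def from_config(cls, cfg: LoopDetectionConfig) -> "FreqChecker":
        # Convert validated override objects into (warn, hard_limit) tuples.
        overrides = {name: (o.warn, o.hard_limit) for name, o in cfg.tool_freq_overrides.items()}
        return cls(cfg.tool_freq_warn, cfg.tool_freq_hard_limit, overrides)

    def check(self, tool: str, count: int) -> str:
        # A per-tool override supersedes the global pair for that tool only.
        warn, hard = self.overrides.get(tool, (self.warn, self.hard_limit))
        if count >= hard:
            return "stop"
        if count >= warn:
            return "warn"
        return "ok"
```

Because the thresholds are checked only in the config type, `FreqChecker` can assume valid input. This mirrors the commit's single validation layer.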

e8675f266d fix(loop-detection): keep tool-call pairing on warn injection (#2724) (#2725)
* fix(loop-detection): keep tool-call pairing on warn injection (#2724)
* make format
* fix(loop-detection): avoid IMMessage leak to downstream consumer
* fix(channels): filter loop warning text from IM replies

d02f762ab0 feat: refine token usage display modes (#2329)
* feat: refine token usage display modes
* docs: clarify token usage accounting semantics
* fix: avoid duplicate subtask debug keys
* style: format token usage tests
* chore: address token attribution review feedback
* Update test_token_usage_middleware.py
* Update test_token_usage_middleware.py
* chore: simplify token attribution fallback
* fix token usage metadata follow-up handling
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

8ba01dfd83 refactor: thread app_config through lead and subagent task path (#2666)
* refactor: thread app config through lead prompt
* fix: honor explicit app config across runtime paths
* style: format subagent executor tests
* fix: thread resolved app config and guard subagents-only fallback
Address two PR review findings:
1. _create_summarization_middleware passed the original (possibly None) app_config into create_chat_model, forcing the model factory back to ambient get_app_config() and risking config drift between the middleware's resolved view and the model's view. Pass the resolved AppConfig instance through end-to-end.
2. get_available_subagent_names accepted Any-typed config and forwarded it to is_host_bash_allowed, which reads ``.sandbox``. A SubagentsAppConfig (also accepted upstream as a sum-type input) has no ``.sandbox`` attribute and would be silently treated as "no sandbox configured", incorrectly disabling the bash subagent. Guard on hasattr and fall back to ambient lookup otherwise.
Adds regression tests for both paths.
* chore: simplify hasattr guard and tighten regression tests
- Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant.
- Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent).
- Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>

487c1d939f fix(subagents): use model override for tools and middleware (#2641)
* fix(subagents): use model override for tools and middleware
* fix(config): resolve effective subagent model
* fix(subagents): defer app config loading
* fix(subagents): fully defer config.yaml load in executor __init__
The previous attempt only relocated the explicit get_app_config() call, but left resolve_subagent_model_name(...) running eagerly in __init__. That helper has its own internal get_app_config() fallback, which still fired when both app_config and parent_model were None and config.model == "inherit". This is exactly the path unit tests hit, breaking 21 tests in CI with FileNotFoundError: config.yaml.
Skip the eager resolve in __init__ when it would require loading the config file, and defer to _create_agent (which already has the app_config or get_app_config() fallback).

38714b6ceb refactor: thread app_config through middleware factories (#2652)
* refactor: thread app_config through middleware factories
Continues the incremental config-refactor sequence (#2611 root, #2612 lead path) one layer deeper into the middleware factories. Two ambient lookups inside _build_runtime_middlewares are eliminated and the LLMErrorHandling band-aid removed:
- _build_runtime_middlewares / build_lead_runtime_middlewares / build_subagent_runtime_middlewares now require app_config: AppConfig.
- get_guardrails_config() inside the factory is replaced with app_config.guardrails (semantically identical: same default-factory GuardrailsConfig, verified by direct equality check).
- LLMErrorHandlingMiddleware.__init__ now requires app_config and reads circuit_breaker fields directly. The class-level circuit_failure_threshold / circuit_recovery_timeout_sec defaults are removed along with the try/except (FileNotFoundError, RuntimeError): pass band-aid, in line with the let-it-crash invariant the rest of the refactor enforces.
Caller chain (already-resolved app_config sources):
- _build_middlewares in lead_agent/agent.py: reorder so resolved_app_config = app_config or get_app_config() is computed BEFORE build_lead_runtime_middlewares is called, then passed as kwarg.
- SubagentExecutor: optional app_config parameter (mirrors the lead-agent pattern); _create_agent does the same `or get_app_config()` fallback at agent-build time, so task_tool callers don't need to plumb app_config through yet (typed-context plumbing for tool runtimes is a separate refactor).
Tests:
- test_llm_error_handling_middleware: _make_app_config helper using AppConfig(sandbox=SandboxConfig(use="test")), the same minimal-config pattern conftest already uses. Three direct LLMErrorHandlingMiddleware() calls, each followed by post-construction circuit_breaker mutation, fold cleanly into _build_middleware(circuit_failure_threshold=..., circuit_recovery_timeout_sec=...).
Verification:
- tests/test_llm_error_handling_middleware.py: 14 passed
- tests/test_subagent_executor.py: 28 passed
- tests/test_tool_error_handling_middleware.py: 6 passed
- tests/test_task_tool_core_logic.py: 18 passed (verifies task_tool unchanged behavior)
- Full suite: 2697 passed, 3 skipped. The single intermittent failure in tests/test_client_e2e.py::test_tool_call_produces_events is pre-existing LLM flakiness (the test asserts the model decided to call a tool; reproduces 1/3 on unchanged main as well).
* fix: address middleware app config review comments
* fix: satisfy app config annotation lint
* test: cover explicit app config middleware wiring
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>

db5ad86381 feat: enhance chat history loading with new hooks and UI components (#2338)
* Refactor API fetch calls to use a unified fetch function; enhance chat history loading with new hooks and UI components
- Replaced `fetchWithAuth` with a generic `fetch` function across various API modules for consistency.
- Updated `useThreadStream` and `useThreadHistory` hooks to manage chat history loading, including loading states and pagination.
- Introduced `LoadMoreHistoryIndicator` component for better user experience when loading more chat history.
- Enhanced message handling in `MessageList` to accommodate new loading states and history management.
- Added support for run messages in the thread context, improving the overall message handling logic.
- Updated translations for loading indicators in English and Chinese.
* Fix test assertions for run ordering in RunManager tests
- Updated assertions in `test_list_by_thread` to reflect correct ordering of runs.
- Modified `test_list_by_thread_is_stable_when_timestamps_tie` to ensure stable ordering when timestamps are tied.

2e05f380c4 feat(persistence): per-user filesystem isolation, run-scoped APIs, and state/history simplification (#2153)
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config: Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence: async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store: RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests): full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests): completion preserves fields, list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests): on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests): DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests): create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources: checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query.
Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: annotate DbRunEventStore.put() as low-frequency path Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(threads): fall back to Store search when ThreadMetaRepository is unavailable When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(history): read messages from checkpointer instead of RunEventStore The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address new Copilot review comments - feedback.py: validate thread_id/run_id before deleting feedback - jsonl.py: add path traversal protection with ID validation - run_repo.py: parse `before` to datetime for PostgreSQL compat - thread_meta_repo.py: fix pagination when metadata filter is active - database_config.py: use resolve_path for sqlite_dir consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Implement skill self-evolution and skill_manage flow (#1874) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> * fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Feature/feishu receive file (#1608) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). 
* style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling * fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915) Two production docker-compose.yaml bugs prevent `make up` from working: 1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in |
||
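The lint-fix item "Fix max_trace_content truncation to use byte length, not char length" is easy to get subtly wrong with multibyte text such as Chinese. A minimal sketch, assuming a hypothetical helper `truncate_utf8` (not DeerFlow's actual function), measures UTF-8 bytes and drops any character that the cut splits in half:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Hypothetical helper: cap `text` at `max_bytes` UTF-8 bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text  # already within the byte budget
    # Slicing raw bytes can split a multibyte character; errors="ignore"
    # discards the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

`"你好世界"` is 4 characters but 12 UTF-8 bytes, so a character-based check against a limit of 7 would not truncate it at all, while the byte-based version returns `"你好"`.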
|
|
d8ecaf46c9 |
feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section. New modules: - deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends - deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton - deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add RunEventStore ABC + MemoryRunEventStore Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints Phase 2-B: run persistence + event storage + token tracking. 
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow - RunRepository implements RunStore ABC via SQLAlchemy ORM - ThreadMetaRepository with owner access control - DbRunEventStore with trace content truncation and cursor pagination - JsonlRunEventStore with per-run files and seq recovery from disk - RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store - RunManager now accepts optional RunStore for persistent backing - Worker creates RunJournal, writes human_message, injects callbacks - Gateway deps use factory functions (RunRepository when DB available) - New endpoints: messages, run messages, run events, token-usage - ThreadCreateRequest gains assistant_id field - 92 tests pass (33 new), zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add user feedback + follow-up run association Phase 2-C: feedback and follow-up tracking. - FeedbackRow ORM model (rating +1/-1, optional message_id, comment) - FeedbackRepository with CRUD, list_by_run/thread, aggregate stats - Feedback API endpoints: create, list, stats, delete - follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread) - Worker writes follow_up_to_run_id into human_message event metadata - Gateway deps: feedback_repo factory + getter - 17 new tests (14 FeedbackRepository + 3 follow-up association) - 109 total tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config - config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data) - New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination - Extended test_run_repository.py (+4 
tests) — completion preserves fields, list ordering desc, limit, owner_none returns all - Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event - Extended test_run_event_store.py (+7 tests) — DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown) - Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle - Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata - 157 total Phase 2 tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * config: move default sqlite_dir to .deer-flow/data Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now() - Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres. - Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): simplify deps.py with getter factory + inline repos - Replace 6 identical getter functions with _require() factory. - Inline 3 _make_*_repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times. - Add thread_meta upsert in start_run (services.py). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(docker): add UV_EXTRAS build arg for optional dependencies Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg: UV_EXTRAS=postgres docker compose build Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(journal): fix flush, token tracking, and consolidate tests RunJournal fixes: - _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush(). - on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage. - worker.py: persist completion data (tokens, message count) to RunStore in finally block. Model factory: - Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses. Test consolidation: - Delete test_phase2b_integration.py (redundant with existing tests). - Move DB-backed lifecycle test into test_run_journal.py. - Add tests for stream_usage injection in test_model_factory.py. - Clean up executor/task_tool dead journal references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): widen content type to str|dict in all store backends Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(events): use metadata flag instead of heuristic for dict content detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(converters): add LangChain-to-OpenAI message format converters Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(converters): handle empty list content as null, clean up test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): human_message content uses OpenAI user message format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): ai_message uses OpenAI format, add ai_tool_call message event - ai_message content now uses {"role": "assistant", "content": "..."} format - New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls - ai_tool_call uses langchain_to_openai_message converter for consistent format - Both events include finish_reason in metadata ("stop" or "tool_calls") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add tool_result message event with OpenAI tool message format Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): summary content uses OpenAI system message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format Add on_chat_model_start to capture structured prompt messages as llm_request events. 
Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add record_middleware method for middleware trace events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(events): add full run sequence integration test for OpenAI content format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): align message events with checkpoint format and add middleware tag injection - Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages - on_tool_end extracts tool_call_id/name/status from ToolMessage objects - on_tool_error now emits tool_result message events with error status - record_middleware uses middleware:{tag} event_type and middleware category - Summarization custom events use middleware:summarize category - TitleMiddleware injects middleware:title tag via get_config() inheritance - SummarizationMiddleware model bound with middleware:summarize tag - Worker writes human_message using HumanMessage.model_dump() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): switch search endpoint to threads_meta table and sync title - POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach - Add ThreadMetaRepository.search() with metadata/status filters - Add ThreadMetaRepository.update_display_name() for title sync - Worker syncs checkpoint title to threads_meta.display_name on run completion - Map display_name to values.title in search response for API compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): history endpoint reads messages from event store - POST /api/threads/{thread_id}/history now combines two data sources: 
checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization) - Strip internal LangGraph metadata keys from response - Remove full channel_values serialization in favor of selective fields Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate optional-dependencies header in pyproject.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(middleware): pass tagged config to TitleMiddleware ainvoke call Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflict in .env.example Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address review feedback on PR #1851 - Fix naive datetime.now() → datetime.now(UTC) in all ORM models - Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint - Encapsulate _store access in RunManager.update_run_completion() - Deduplicate _store.put() logic in RunManager via _persist_to_store() - Add update_run_completion to RunStore ABC + MemoryRunStore - Wire follow_up_to_run_id through the full create path - Add error recovery to RunJournal._flush_sync() lost-event scenario - Add migration note for search_threads breaking change - Fix test_checkpointer_none_fix mock to set database=None Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update uv.lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality Bug fixes: - Sanitize log params to prevent log injection (CodeQL) - Reset threads_meta.status to idle/error when 
run completes - Attach messages only to latest checkpoint in /history response - Write threads_meta on POST /threads so new threads appear in search Lint fixes: - Remove unused imports (journal.py, migrations/env.py, test_converters.py) - Convert lambda to named function (engine.py, Ruff E731) - Remove unused logger definitions in repos (Ruff F841) - Add logging to JSONL decode errors and empty except blocks - Separate assert side-effects in tests (CodeQL) - Remove unused local variables in tests (Ruff F841) - Fix max_trace_content truncation to use byte length, not char length Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: apply ruff format to persistence and runtime files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding 'Statement has no effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * refactor(runtime): introduce RunContext to reduce run_agent parameter bloat Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): move sanitize_log_param to app/gateway/utils.py Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: use SQL aggregation for feedback stats and thread token usage Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query. 
Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: annotate DbRunEventStore.put() as low-frequency path Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(threads): fall back to Store search when ThreadMetaRepository is unavailable When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(history): read messages from checkpointer instead of RunEventStore The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address new Copilot review comments - feedback.py: validate thread_id/run_id before deleting feedback - jsonl.py: add path traversal protection with ID validation - run_repo.py: parse `before` to datetime for PostgreSQL compat - thread_meta_repo.py: fix pagination when metadata filter is active - database_config.py: use resolve_path for sqlite_dir consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Implement skill self-evolution and skill_manage flow (#1874) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> * fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Feature/feishu receive file (#1608) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). 
* style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling * fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915) Two production docker-compose.yaml bugs prevent `make up` from working: 1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in |
||
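The "introduce RunContext to reduce run_agent parameter bloat" refactor relies on a standard-library pattern: a frozen dataclass for shared dependencies plus `dataclasses.replace()` to derive a per-run copy. The sketch below reuses the field names listed in the commit message; the field types, `build_base_context`, and `enrich_context` are hypothetical stand-ins for `get_run_context()` and the enrichment step in `start_run()`, not their real signatures:

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class RunContext:
    """Bundle of run dependencies; frozen so no caller can mutate the shared base."""

    checkpointer: Any = None
    store: Any = None
    event_store: Any = None
    run_events_config: Any = None
    thread_meta_repo: Any = None
    follow_up_to_run_id: Optional[str] = None  # per-run field


def build_base_context(state: Any) -> RunContext:
    """Hypothetical: read singletons off an app.state-like object."""
    return RunContext(
        checkpointer=getattr(state, "checkpointer", None),
        store=getattr(state, "store", None),
        event_store=getattr(state, "event_store", None),
        run_events_config=getattr(state, "run_events_config", None),
        thread_meta_repo=getattr(state, "thread_meta_repo", None),
    )


def enrich_context(base: RunContext, follow_up_to_run_id: Optional[str]) -> RunContext:
    """Hypothetical per-run step: replace() returns a copy; `base` is untouched."""
    return replace(base, follow_up_to_run_id=follow_up_to_run_id)
```

Because the base context is built from process-wide singletons, returning a new instance per run (rather than assigning to fields) keeps one run's `follow_up_to_run_id` from leaking into another.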
|
|
ec8a8cae38 |
fix: gate deferred MCP tool execution (#2513)
* fix: gate deferred MCP tool execution * style: format deferred tool middleware * fix: address deferred tool review feedback |
||
|
|
f9ff3a698d |
fix(middleware): avoid rescuing non-skill tool outputs during summarization (#2458)
* fix(middleware): narrow skill rescue to skill-related tool outputs * fix(summarization): address skill rescue review feedback * fix: wire summarization skill rescue config * fix: remove dead skill tool helper * fix(lint): fix format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
11f557a2c6 |
feat(trace):Add run_name to the trace info for system agents. (#2492)
* feat(trace): Add `run_name` to the trace info for suggestions and memory. before(in langsmith): CodexChatModel CodexChatModel lead_agent after: suggest_agent memory_agent lead_agent * feat(trace): Add `run_name` to the trace info for system agents. before(in langsmith): CodexChatModel CodexChatModel CodexChatModel CodexChatModel lead_agent after: suggest_agent title_agent security_agent memory_agent lead_agent * chore(code format):code format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
a62ca5dd47 |
fix: Catch httpx.ReadError in the error handling (#2309)
* fix: Catch httpx.ReadError in the error handling * fix |
||
|
|
f514e35a36 | fix(backend): make clarification messages idempotent (#2350) (#2351) | ||
|
|
4ba3167f48 |
feat: flush memory before summarization (#2176)
* feat: flush memory before summarization * fix: keep agent-scoped memory on summarization flush * fix: harden summarization hook plumbing * fix: address summarization review feedback * style: format memory middleware |
||
|
|
e4f896e90d |
fix(todo-middleware): prevent premature agent exit with incomplete todos (#2135)
* fix(todo-middleware): prevent premature agent exit with incomplete todos When plan mode is active (is_plan_mode=True), the agent occasionally exits the loop and outputs a final response while todo items are still incomplete. This happens because the routing edge only checks for tool_calls, not todo completion state. Fixes #2112 Add an after_model override to TodoMiddleware with @hook_config(can_jump_to=["model"]). When the model produces a response with no tool calls but there are still incomplete todos, the middleware injects a todo_completion_reminder HumanMessage and returns jump_to=model to force another model turn. A cap of 2 reminders prevents infinite loops when the agent cannot make further progress. Also adds _completion_reminder_count() helper and 14 new unit tests covering all edge cases of the new after_model / aafter_model logic. * Remove unnecessary blank line in test file * Fix runtime argument annotation in before_model * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
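The `after_model` logic described in #2135 can be sketched framework-free. This assumes plain-dict messages and todos with a `status` field whose finished value is `"completed"`; the message shape, the reminder text, and these function signatures are illustrative assumptions, not DeerFlow's `TodoMiddleware` API:

```python
from typing import Any, Optional

MAX_COMPLETION_REMINDERS = 2  # cap described in the commit message
REMINDER_NAME = "todo_completion_reminder"


def _completion_reminder_count(messages: list) -> int:
    """Count reminders already injected into this conversation."""
    return sum(1 for m in messages if m.get("name") == REMINDER_NAME)


def after_model(state: dict) -> Optional[dict]:
    """Return a state update that forces another model turn, or None to proceed."""
    messages = state.get("messages", [])
    if not messages or messages[-1].get("tool_calls"):
        return None  # tool calls present: normal routing to the tools node
    incomplete = [t for t in state.get("todos", []) if t.get("status") != "completed"]
    if not incomplete:
        return None  # every todo is done, so the agent may finish
    if _completion_reminder_count(messages) >= MAX_COMPLETION_REMINDERS:
        return None  # give up rather than loop forever
    reminder: dict[str, Any] = {
        "role": "user",
        "name": REMINDER_NAME,
        "content": "Incomplete todos remain: " + ", ".join(t["content"] for t in incomplete),
    }
    return {"messages": [reminder], "jump_to": "model"}
```

Counting reminders from the message history, rather than in instance state, keeps the cap scoped to the conversation being processed.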
|
|
c91785dd68 |
fix(title): strip <think> tags from title model responses and assistant context (#1927)
* fix(title): strip <think> tags from title model responses and assistant context Reasoning models (e.g. minimax M2.7, DeepSeek-R1) emit <think>...</think> blocks before their actual output. When such a model is used as the title model (or as the main agent), the raw thinking content leaked into the thread title stored in state, so the chat list showed the internal monologue instead of a meaningful title. Fixes #1884 - Add `_strip_think_tags()` helper using a regex to remove all <think>...</think> blocks - Apply it in `_parse_title()` so the title model response is always clean - Apply it to the assistant message in `_build_title_prompt()` so thinking content from the first AI turn is not fed back to the title model - Add four new unit tests covering: stripping in parse, think-only response, assistant prompt stripping, and end-to-end async flow with think tags * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
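The `_strip_think_tags()` fix in #1927 uses a regex to remove `<think>...</think>` blocks. A minimal sketch of that kind of helper follows; the exact pattern DeerFlow uses may differ, and `parse_title` with a caller-supplied `fallback` is a hypothetical illustration of the "think-only response" case:

```python
import re

# Non-greedy so several blocks are removed separately; DOTALL so a block
# spanning multiple lines is matched.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove every <think>...</think> block and trim surrounding whitespace."""
    return _THINK_RE.sub("", text).strip()


def parse_title(raw: str, fallback: str) -> str:
    """Hypothetical: use the cleaned text, or `fallback` if only thinking remained."""
    return strip_think_tags(raw) or fallback
```

The non-greedy `.*?` matters: a greedy `.*` would swallow everything between the first `<think>` and the last `</think>`, including real title text in between.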
|
|
5db71cb68c |
fix(middleware): repair dangling tool-call history after loop interru… (#2035)
* fix(middleware): repair dangling tool-call history after loop interruption (#2029) * docs(backend): fix middleware chain ordering --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> |
||
|
|
4d4ddb3d3f | feat(llm): introduce lightweight circuit breaker to prevent rate-limit bans and resource exhaustion (#2095) | ||
|
|
5b633449f8 |
fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware (#1988)
* fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware The existing hash-based loop detection only catches identical tool call sets. When the agent calls the same tool type (e.g. read_file) on many different files, each call produces a unique hash and bypasses detection. This causes the agent to exhaust recursion_limit, consuming 150K-225K tokens per failed run. Add a second detection layer that tracks cumulative call counts per tool type per thread. Warns at 30 calls (configurable) and forces stop at 50. The hard stop message now uses the actual returned message instead of a hardcoded constant, so both hash-based and frequency-based stops produce accurate diagnostics. Also fix _apply() to use the warning message returned by _track_and_check() for hard stops, instead of always using _HARD_STOP_MSG. Closes #1987 * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(lint): remove unused imports and fix line length - Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG imports from test file (F401) - Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501) * style: apply ruff format * test: add LRU eviction and per-thread reset coverage for frequency state Address review feedback from @WillemJiang: - Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction - Add test for reset(thread_id=...) clearing only the target thread's frequency state while leaving others intact * fix(makefile): route Windows shell-script targets through Git Bash (#2060) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com> |
||
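The second detection layer (cumulative counts per tool type per thread, warn at 30, hard stop at 50) can be sketched as below. The class name, the string verdicts, and warning only once per tool per thread are assumptions (the last is suggested by the `_tool_freq_warned` state mentioned in the commit); LRU eviction of old threads is omitted.

```python
from collections import defaultdict
from typing import Optional


class ToolFrequencyTracker:
    """Per-thread, per-tool-type call counter with a warn threshold and a hard stop."""

    def __init__(self, warn_at: int = 30, stop_at: int = 50) -> None:
        self.warn_at = warn_at
        self.stop_at = stop_at
        # thread_id -> tool name -> cumulative call count
        self._counts = defaultdict(lambda: defaultdict(int))
        # thread_id -> tool names that have already triggered a warning
        self._warned = defaultdict(set)

    def record(self, thread_id: str, tool_name: str) -> str:
        """Count one call and return "ok", "warn" or "stop"."""
        self._counts[thread_id][tool_name] += 1
        count = self._counts[thread_id][tool_name]
        if count >= self.stop_at:
            return "stop"
        if count >= self.warn_at and tool_name not in self._warned[thread_id]:
            self._warned[thread_id].add(tool_name)
            return "warn"
        return "ok"

    def reset(self, thread_id: Optional[str] = None) -> None:
        """Clear one thread's frequency state, or all of it when thread_id is None."""
        if thread_id is None:
            self._counts.clear()
            self._warned.clear()
        else:
            self._counts.pop(thread_id, None)
            self._warned.pop(thread_id, None)
```

Unlike hash-based detection, this counts `read_file` on fifty different paths as fifty calls of the same type, which is exactly the pattern the hash layer misses.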
|
|
02569136df |
fix(sandbox): improve sandbox security and preserve multimodal content (#2114)
* fix: improve sandbox security and preserve multimodal content * Add unit test modifications for test_injects_uploaded_files_tag_into_list_content * format updated_content * Add regression tests for multimodal upload content and host bash default safety |
||
|
|
ad6d934a5f |
fix(middleware): handle string-serialized options in ClarificationMiddleware (#1997)
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995) Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON strings instead of native arrays. Add defensive type checking in _format_clarification_message() to deserialize string options before iteration, preventing per-character rendering. * fix(middleware): normalize options after JSON deserialization Address Copilot review feedback: - Add post-deserialization normalization so options is always a list (handles json.loads returning a scalar string, dict, or None) - Add test for JSON-encoded scalar string ("development") - Fix test_json_string_with_mixed_types to use actual mixed types |
||
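The deserialize-then-normalize step can be sketched as follows; `normalize_options` is a hypothetical name. It guarantees a list whether the model sent a native array, a JSON-encoded array, a JSON-encoded scalar, `null`, or plain non-JSON text.

```python
import json
from typing import Any


def normalize_options(options: Any) -> list:
    """Coerce a clarification 'options' argument into a list before iterating over it."""
    if isinstance(options, str):
        try:
            # Some models send '["a", "b"]' (a JSON string) instead of a native array.
            options = json.loads(options)
        except json.JSONDecodeError:
            # Not JSON at all: treat the whole string as a single option.
            return [options]
    if options is None:
        return []
    if isinstance(options, list):
        return options
    # json.loads may yield a scalar string, a number or a dict; wrap it.
    return [options]
```

Without this step, iterating over the raw string `'["a", "b"]'` yields one "option" per character, which is the per-character rendering the commit fixes.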
|
|
c3170f22da |
fix(backend): make loop detection hash tool calls by stable keys (#1911)
* fix(backend): make loop detection hash tool calls by stable keys The loop detection middleware previously hashed full tool call arguments, which made repeated calls look different when only non-essential argument details changed. In particular, `read_file` calls with nearby line ranges could bypass repetition detection even when the agent was effectively reading the same file region again and again. - Hash tool calls using stable keys instead of the full raw args payload - Bucket `read_file` line ranges so nearby reads map to the same region key - Prefer stable identifiers such as `path`, `url`, `query`, or `command` before falling back to JSON serialization of args - Keep hashing order-independent so the same tool call set produces the same hash regardless of call order Fixes #1905 * fix(backend): harden loop detection hash normalization - Normalize and parse stringified tool args defensively - Expand stable key derivation to include pattern, glob, and cmd - Normalize reversed read_file ranges before bucketing Fixes #1905 * fix(backend): harden loop detection tool format * exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged. --------- Co-authored-by: JeffJiang <for-eleven@hotmail.com> |
||
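Hashing by stable keys can be sketched as below. The key list and the `write_file`/`str_replace` exclusion come from the commit message; the `start_line`/`end_line` argument names, the 200-line bucket size, and the function names are hypothetical. Sorting the per-call keys makes the hash independent of call order.

```python
import hashlib
import json
from typing import Any

_STABLE_KEYS = ("path", "url", "query", "command", "pattern", "glob", "cmd")
_READ_BUCKET = 200  # hypothetical bucket size, in lines
_CONTENT_TOOLS = {"write_file", "str_replace"}  # content matters, never collapse these


def _parse_args(args: Any) -> dict:
    """Defensively turn stringified or odd-typed args into a dict."""
    if isinstance(args, str):
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return {"raw": args}
    return args if isinstance(args, dict) else {"raw": args}


def stable_key(name: str, raw_args: Any) -> str:
    args = _parse_args(raw_args)
    if name == "read_file" and "path" in args:
        start = int(args.get("start_line") or 0)
        end = int(args.get("end_line") or start)
        start, end = min(start, end), max(start, end)  # tolerate reversed ranges
        return f"read_file:{args['path']}:{start // _READ_BUCKET}-{end // _READ_BUCKET}"
    if name not in _CONTENT_TOOLS:
        for key in _STABLE_KEYS:
            if key in args:
                return f"{name}:{key}={args[key]}"
    # Fallback: the full args, serialized deterministically.
    return f"{name}:{json.dumps(args, sort_keys=True, default=str)}"


def hash_tool_calls(calls: list[dict]) -> str:
    keys = sorted(stable_key(c["name"], c.get("args", {})) for c in calls)
    return hashlib.sha256("\n".join(keys).encode("utf-8")).hexdigest()
```

Fixed buckets are a simplification: two reads straddling a bucket boundary still hash differently.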
|
|
3b3e8e1b0b |
feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881)
* fix(sandbox): strengthen regex coverage in SandboxAuditMiddleware
Expand high-risk patterns from 6 to 13 and medium-risk from 4 to 6,
closing several bypass vectors identified by cross-referencing Claude
Code's BashSecurity validator chain against DeerFlow's threat model.
High-risk additions:
- Generalised pipe-to-sh (replaces narrow curl|sh rule)
- Targeted command substitution ($() / backtick with dangerous executables)
- base64 decode piped to execution
- Overwrite system binaries (/usr/bin/, /bin/, /sbin/)
- Overwrite shell startup files (~/.bashrc, ~/.profile, etc.)
- /proc/*/environ leakage
- LD_PRELOAD / LD_LIBRARY_PATH hijack
- /dev/tcp/ bash built-in networking
Medium-risk additions:
- sudo/su (no-op under Docker root, warn only)
- PATH= modification (long attack chain, warn only)
Design decisions:
- Command substitution uses targeted matching (curl/wget/bash/sh/python/
ruby/perl/base64) rather than blanket block to avoid false positives
on safe usage like $(date) or `whoami`.
- Skipped encoding/obfuscation checks (hex, octal, Unicode homoglyphs)
as ROI is low in Docker sandbox — LLMs don't generate encoded commands
and container isolation bounds the blast radius.
- Merged pip/pip3 into a single `pip3?` pattern.
* feat(sandbox): compound command splitting and fork bomb detection
Split compound bash commands (&&, ||, ;) into sub-commands and classify
each independently — prevents dangerous commands hidden after safe
prefixes (e.g. "cd /workspace && rm -rf /") from bypassing detection.
- Add _split_compound_command() with shlex quote-aware splitting
- Add fork bomb detection patterns (classic and while-loop variants)
- Most severe verdict wins; block short-circuits
- 15 new tests covering compound commands, splitting, and fork bombs
* test(sandbox): add async tests for fork bomb and compound commands
Cover awrap_tool_call path for fork bomb detection (3 variants) and
compound command splitting (block/warn/pass scenarios).
* fix(sandbox): address Copilot review — no-whitespace operators, >>/etc/, whole-command scan
- _split_compound_command: replace shlex-based implementation with a
character-by-character quote/escape-aware scanner. shlex.split only
separates '&&' / '||' / ';' when they are surrounded by whitespace,
so payloads like 'rm -rf /&&echo ok' or 'safe;rm -rf /' bypassed the
previous splitter and therefore the per-sub-command classifier.
- _HIGH_RISK_PATTERNS: change r'>\s*/etc/' to r'>+\s*/etc/' so append
redirection ('>>/etc/hosts') is also blocked.
- _classify_command: run a whole-command high-risk scan *before*
splitting. Structural attacks like 'while true; do bash & done'
span multiple shell statements — splitting on ';' destroys the
pattern context, so the raw command must be scanned first.
- tests: add no-whitespace operator cases to TestSplitCompoundCommand
and test_compound_command_classification to lock in the bypass fix.
|
||
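The character-by-character, quote- and escape-aware splitter that replaced the `shlex` version can be sketched as follows; the function name matches the commit, but the body is an illustrative reconstruction. It splits on `&&`, `||` and `;` outside quotes even when no whitespace surrounds them, treats a backslash as escaping the next character (except inside single quotes), and leaves single `|` pipes and `&` intact.

```python
def split_compound_command(command: str) -> list[str]:
    """Split on &&, || and ; that appear outside quotes."""
    parts: list[str] = []
    buf: list[str] = []
    quote = None  # the currently open quote character, if any
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == "\\" and quote != "'" and i + 1 < len(command):
            # A backslash escapes the next character (it is literal inside single quotes).
            buf.append(command[i : i + 2])
            i += 2
            continue
        if quote is not None:
            if ch == quote:
                quote = None
            buf.append(ch)
            i += 1
            continue
        if ch in ("'", '"'):
            quote = ch
            buf.append(ch)
            i += 1
            continue
        if command.startswith(("&&", "||"), i):
            parts.append("".join(buf))
            buf = []
            i += 2
            continue
        if ch == ";":
            parts.append("".join(buf))
            buf = []
            i += 1
            continue
        buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return [p.strip() for p in parts if p.strip()]
```

As the commit notes, splitting alone loses multi-statement structure (for example fork-bomb loops), which is why the whole command is scanned for high-risk patterns before it is split.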
|
|
055e4df049 |
fix(sandbox): add input sanitisation guard to SandboxAuditMiddleware (#1872)
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware Add _validate_input() to reject malformed bash commands before regex classification: empty commands, oversized commands (>10 000 chars), and null bytes that could cause detection/execution layer inconsistency. * fix(sandbox): address Copilot review — type guard, log truncation, reject reason - Coerce None/non-string command to str before validation - Truncate oversized commands in audit logs to prevent log amplification - Propagate reject_reason through _pre_process() to block message - Remove L2 label from comments and test class names * fix(sandbox): isinstance type guard + async input sanitisation tests Address review comments: - Replace str() coercion with isinstance(raw_command, str) guard so non-string truthy values (0, [], False) fall back to empty string instead of passing validation as "0"/"[]"/"False". - Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests covering empty, null-byte, oversized, and None command via awrap_tool_call path. |
||
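The input guard can be sketched as a function that returns a rejection reason or `None`. The name `validate_input` follows the commit; the reason strings are hypothetical. The `isinstance` check reflects the final review round: non-string values fall back to an empty string instead of being stringified.

```python
from typing import Any, Optional

MAX_COMMAND_LENGTH = 10_000


def validate_input(raw_command: Any) -> Optional[str]:
    """Return a rejection reason, or None if the command may proceed to regex classification."""
    # Non-string values (None, 0, [], False) become "" rather than "0", "[]" or "False".
    command = raw_command if isinstance(raw_command, str) else ""
    if not command.strip():
        return "empty command"
    if len(command) > MAX_COMMAND_LENGTH:
        return f"command exceeds {MAX_COMMAND_LENGTH} characters"
    if "\x00" in command:
        # A null byte can make the regex layer and the shell see different commands.
        return "command contains a null byte"
    return None
```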
|
|
1ced6e977c |
fix(backend): preserve viewed image reducer metadata (#1900)
Fix concurrent viewed_images state updates for multi-image input by preserving the reducer metadata in the vision middleware state schema. |
||
|
|
8049785de6 |
fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804)
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection Two fixes to the memory system: 1. _fact_content_key() now lowercases content before comparison, preventing semantically duplicate facts like "User prefers Python" and "user prefers python" from being stored separately. 2. Adds detect_reinforcement() to MemoryMiddleware (closes #1719), mirroring detect_correction(). When users signal approval ("yes exactly", "perfect", "完全正确", etc.), the memory updater now receives reinforcement_detected=True and injects a hint prompting the LLM to record confirmed preferences and behaviors with high confidence. Changes across the full signal path: - memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement() - queue.py: reinforcement_detected field in ConversationContext and add() - updater.py: reinforcement_detected param in update_memory() and update_memory_from_conversation(); builds reinforcement_hint alongside the existing correction_hint Tests: 11 new tests covering deduplication, hint injection, and signal detection (Chinese + English patterns, window boundary, conflict with correction). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(memory): address Copilot review comments on reinforcement detection - Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants - Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise - Use casefold() instead of lower() for Unicode-aware fact deduplication - Add missing test coverage for reinforcement_detected OR merge and forwarding in queue --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
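Both memory fixes can be sketched briefly. `casefold()` is stricter than `lower()` for Unicode (for example "Straße" and "STRASSE" fold to the same key). The reinforcement patterns below are a small hypothetical subset; they show the required end-of-sentence boundary (so "perfectly fine" does not count) and the suppression when a correction was also detected. Trimming whitespace in the key is an added assumption.

```python
import re

# Hypothetical subset of reinforcement patterns, for illustration only.
_REINFORCEMENT_PATTERNS = [
    re.compile(r"\byes[,!]?\s+exactly\b", re.IGNORECASE),
    # "perfect" must end a sentence, so "perfectly" or "perfect timing, but" do not count.
    re.compile(r"\bperfect\b(?=\s*(?:[.!?。！]|$))", re.IGNORECASE),
    re.compile(r"完全正确"),
]


def fact_content_key(content: str) -> str:
    """Key used to deduplicate facts regardless of case."""
    return content.strip().casefold()


def detect_reinforcement(text: str, correction_detected: bool = False) -> bool:
    if correction_detected:
        # Mixed signals: a correction wins, so no reinforcement hint is sent.
        return False
    return any(p.search(text) for p in _REINFORCEMENT_PATTERNS)
```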
|
|
2a150f5d4a |
fix: unblock concurrent threads and workspace hydration (#1839)
* fix: unblock concurrent threads and workspace hydration * fix: restore async title generation * fix: address PR review feedback * style: format lead agent prompt |
||
|
|
163121d327 |
fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline (#1838)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context * fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline - Add _clean_bold_title() to merge adjacent bold spans (** **) produced by pymupdf4llm when bold text crosses span boundaries - Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>** headings common in academic papers; excludes pure-number table headers and rows with more than 4 bold blocks - When outline is empty, read first 5 non-empty lines of the .md as a content preview and surface a grep hint in the agent context - Update _format_file_entry to render the preview + grep hint instead of silently omitting the outline section - Add 3 new extract_outline tests and 2 new middleware tests (65 total) * fix(uploads): address Copilot review comments on extract_outline regex - Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded - Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input - Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex) - Log debug message with exc_info when preview extraction fails |
||
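A regex in the spirit of `_SPLIT_BOLD_HEADING_RE` can be sketched from the review notes: a numeric first block, a title block guarded by a negative lookahead that rejects pure digits/punctuation but accepts non-ASCII titles, `[^*]+` instead of `.+`, and at most two extra blocks (four total). This is a reconstruction, not the project's regex; `match_split_bold_heading` and the whitespace clean-up of `** **` artefacts are hypothetical.

```python
import re
from typing import Optional

_SPLIT_BOLD_HEADING_RE = re.compile(
    r"^\*\*(\d+(?:\.\d+)*\.?)\*\*"          # block 1: section number, e.g. **2.1**
    r"\s*\*\*(?![\d\W_]+\*\*)([^*]+)\*\*"   # block 2: title, not purely digits/punctuation
    r"((?:\s*\*\*[^*]+\*\*){0,2})\s*$"      # up to two more bold blocks (four in total)
)


def match_split_bold_heading(line: str) -> Optional[tuple[str, str]]:
    m = _SPLIT_BOLD_HEADING_RE.match(line.strip())
    if m is None:
        return None
    blocks = [m.group(2)] + re.findall(r"\*\*([^*]+)\*\*", m.group(3))
    # Drop empty "** **" artefacts and join the remaining spans into one title.
    title = " ".join(b.strip() for b in blocks if b.strip())
    return m.group(1), title
```

Because every repeated part is bounded by `**` delimiters and capped at `{0,2}`, malformed input cannot trigger the runaway backtracking that `.+` allowed.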
|
|
bbd0866374 |
feat(uploads): guide agent using agentic search for uploaded documents (#1816)
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context |
||
|
|
db82b59254 |
fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware (#1823)
* fix: inject longTermBackground into memory prompt
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware
LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].
The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).
Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str
This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).
* test(middleware): add unit tests for _append_text and list content hard stop
Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:
- TestAppendText: unit tests for the new _append_text() static method
covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
works correctly with list content (Anthropic thinking mode), None
content, and str content
Requested by reviewer in PR #1823.
* fix(middleware): improve _append_text robustness and test isolation
- Add explicit isinstance(content, str) check with fallback for
unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
does not modify the original list
* style: fix ruff format whitespace in test file
---------
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
|
||
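The `_append_text()` behaviour listed above (None, list, str, and a coercion fallback, without mutating the original list) can be sketched as a standalone function; `append_text` is a hypothetical name for illustration.

```python
from typing import Any


def append_text(content: Any, text: str) -> Any:
    """Append text to AIMessage-style content that may be None, a str, or a list of blocks."""
    if content is None:
        return text
    if isinstance(content, list):
        # Build a new list so the caller's block list is not modified in place.
        return [*content, {"type": "text", "text": text}]
    if isinstance(content, str):
        return content + text
    # Unexpected type: coerce to str instead of raising TypeError.
    return str(content) + text
```

A hard stop would then call it as `append_text(last_msg.content, f"\n\n{_HARD_STOP_MSG}")` rather than concatenating with `+`, which fails for list content.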
|
|
5ff230eafd |
feat(uploads): inject document outline into agent context for converted files (#1738)
* feat(uploads): inject document outline into agent context for converted files
Extract headings from converted .md files and inject them into the
<uploaded_files> context block so the agent can navigate large documents
by line number before reading.
- Add `extract_outline()` to `file_conversion.py`: recognises standard
Markdown headings (#/##/###) and SEC-style bold structural headings
(**ITEM N. BUSINESS**, **PART II**); caps at 50 entries; excludes
cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES)
- Add `_extract_outline_for_file()` helper in `uploads_middleware.py`:
looks for a sibling `.md` file produced by the conversion pipeline
- Update `UploadsMiddleware._create_files_message()` to render the outline
under each file entry with `L{line}: {title}` format and a `read_file`
prompt for range-based reading
- Tests: 10 new tests for `extract_outline()`, 4 new tests for outline
injection in `UploadsMiddleware`; existing test updated for new `outline`
field in `uploaded_files` state
Partially addresses #1647 (agent ignores uploaded files).
* fix(uploads): stream outline file reads and strip inline bold from heading titles
- Switch extract_outline() from read_text().splitlines() to open()+line iteration
so large converted documents are not loaded into memory on every agent turn;
exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion)
- Strip **...** wrapper from standard Markdown heading titles before appending
to outline so agent context stays clean (e.g. "## **Overview**" → "Overview")
(Copilot suggestion)
- Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py
to satisfy ruff CI lint
* fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES
When extract_outline() hits the cap it now appends a sentinel entry
{"truncated": True} instead of silently dropping the rest of the headings.
UploadsMiddleware reads the sentinel and renders a hint line:
... (showing first 50 headings; use `read_file` to explore further)
Without this the agent had no way to know the outline was incomplete and
would treat the first 50 headings as the full document structure.
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id
runtime.context does not always carry thread_id (depends on LangGraph
invocation path). ThreadDataMiddleware already falls back to
get_config().configurable.thread_id — apply the same pattern so
UploadsMiddleware can resolve the uploads directory and attach outlines
in all invocation paths.
* style: apply ruff format
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
|
||
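The outline extraction and its truncation sentinel can be sketched as below. It accepts any iterable of lines, so passing an open file object streams the document instead of loading it whole. Only `#`/`##`/`###` headings are handled (the SEC-style bold headings are omitted), and emitting the sentinel only when a heading beyond the cap actually exists is an assumption.

```python
import re
from typing import Iterable

MAX_OUTLINE_ENTRIES = 50
_HEADING_RE = re.compile(r"^(#{1,3})\s+(.+?)\s*#*\s*$")
_BOLD_WRAP_RE = re.compile(r"^\*\*(.+)\*\*$")


def extract_outline(lines: Iterable[str], max_entries: int = MAX_OUTLINE_ENTRIES) -> list[dict]:
    outline: list[dict] = []
    for line_no, line in enumerate(lines, start=1):
        m = _HEADING_RE.match(line.rstrip("\n"))
        if not m:
            continue
        if len(outline) >= max_entries:
            # More headings exist than we keep: record that, then stop reading.
            outline.append({"truncated": True})
            break
        title = m.group(2)
        bold = _BOLD_WRAP_RE.match(title)
        if bold:
            title = bold.group(1).strip()  # "## **Overview**" -> "Overview"
        outline.append({"level": len(m.group(1)), "title": title, "line": line_no})
    return outline


def format_outline(outline: list[dict]) -> str:
    shown = sum(1 for e in outline if not e.get("truncated"))
    rendered = []
    for entry in outline:
        if entry.get("truncated"):
            rendered.append(f"... (showing first {shown} headings; use `read_file` to explore further)")
        else:
            rendered.append(f"L{entry['line']}: {entry['title']}")
    return "\n".join(rendered)
```

Typical use would be `with open(md_path, encoding="utf-8") as f: outline = extract_outline(f)`.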
|
|
46d0c329c1 |
fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id (#1814)
* fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id runtime.context does not always carry thread_id depending on the LangGraph invocation path. When absent, uploads_dir resolved to None and the entire outline/historical-files attachment was silently skipped. Apply the same fallback pattern already used by ThreadDataMiddleware: try get_config().configurable.thread_id, with a RuntimeError guard for test environments where get_config() is called outside a runnable context. Discovered via live integration testing (curl against local LangGraph). Unit tests inject uploads_dir directly and would not catch this. * style: apply ruff format to uploads_middleware.py |
||
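The fallback order described above can be sketched as a small resolver. In the real middleware the config getter would presumably be LangGraph's `get_config()`; here it is passed in as a parameter (an assumption made so the sketch runs without LangGraph), and `resolve_thread_id` is a hypothetical name.

```python
from typing import Any, Callable, Mapping, Optional


def resolve_thread_id(
    context: Optional[Mapping[str, Any]],
    get_config: Callable[[], Mapping[str, Any]],
) -> Optional[str]:
    """Prefer runtime.context["thread_id"], then configurable.thread_id, else None."""
    thread_id = (context or {}).get("thread_id")
    if thread_id:
        return thread_id
    try:
        config = get_config()
    except RuntimeError:
        # get_config() raises outside a runnable context, e.g. in unit tests.
        return None
    return (config.get("configurable") or {}).get("thread_id")
```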
|
|
8128a3bc57 | fix: enable DanglingToolCallMiddleware for subagents (#1766) | ||
|
|
3a672b39c7 |
Fix/1681 llm call retry handling (#1683)
* fix(runtime): handle llm call errors gracefully * fix(runtime): preserve graph control flow in llm retry middleware --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com> |
||
|
|
0cdecf7b30 |
feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620) (#1668)
* feat(memory): add structured reflection and correction detection * fix(memory): align sourceError schema and prompt guidance --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
|
|
9aa3ff7c48 |
feat(sandbox): add SandboxAuditMiddleware for bash command security auditing (#1532)
* feat(sandbox): add SandboxAuditMiddleware for bash command security auditing Addresses the LocalSandbox escape vector reported in #1224 where bash tool calls can execute destructive commands against the host filesystem. - Add SandboxAuditMiddleware with three-tier command classification: - High-risk (block): rm -rf /, curl|bash, dd if=, mkfs, /etc/shadow access - Medium-risk (warn): pip install, apt install, chmod 777 - Safe (pass): normal workspace operations - Register middleware after GuardrailMiddleware in _build_runtime_middlewares, applied to both lead agent and subagents - Structured audit log via standard logger (visible in langgraph.log) - Medium-risk commands execute but append a warning to the tool result, allowing the LLM to self-correct without blocking legitimate workflows - High-risk commands return an error ToolMessage without calling the handler, so the agent loop continues gracefully * fix(lint): sort imports in test_sandbox_audit_middleware * refactor(sandbox-audit): address Copilot review feedback (3/5/6) - Fix class docstring to match implementation: medium-risk commands are executed with a warning appended (not rejected), and cwd anchoring note removed (handled in a separate PR) - Remove capsys.disabled() from benchmark test to avoid CI log noise; keep assertions for recall/precision targets - Remove misleading 'cwd fix' from test module docstring * test(sandbox-audit): add async tests for awrap_tool_call * fix(sandbox-audit): address Copilot review feedback (1/2) - Narrow rm high-risk regex to only block truly destructive targets (/, /*, ~, ~/*, /home, /root); legitimate workspace paths like /mnt/user-data/ are no longer false-positived - Handle list-typed ToolMessage content in _append_warn_to_result; append a text block instead of str()-ing the list to avoid breaking structured content normalization * style: apply ruff format to sandbox_audit_middleware files * fix(sandbox-audit): update benchmark comment to match assert-based 
implementation --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> |
||
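The three-tier classification (block, warn, pass) can be sketched with a few of the patterns named in the commit. The pattern lists are an illustrative subset, not the middleware's actual rules; the `rm` pattern follows the narrowed form, blocking only `/`, `/*`, `~`, `~/*`, `/home` and `/root` so workspace paths pass.

```python
import re

HIGH_RISK_PATTERNS = [
    # rm -r aimed at /, /*, ~, ~/*, /home or /root; workspace paths are not matched.
    re.compile(r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*\s+(?:/|/\*|~|~/\*|/home|/root)/?(?=\s|$)"),
    re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:ba)?sh\b"),  # download piped into a shell
    re.compile(r"\bdd\s+if="),
    re.compile(r"\bmkfs(?:\.\w+)?\b"),
    re.compile(r"/etc/shadow"),
]
MEDIUM_RISK_PATTERNS = [
    re.compile(r"\bpip3?\s+install\b"),
    re.compile(r"\bapt(?:-get)?\s+install\b"),
    re.compile(r"\bchmod\s+777\b"),
]


def classify_command(command: str) -> str:
    """Return "block" (not executed), "warn" (executed, warning appended) or "pass"."""
    if any(p.search(command) for p in HIGH_RISK_PATTERNS):
        return "block"
    if any(p.search(command) for p in MEDIUM_RISK_PATTERNS):
        return "warn"
    return "pass"
```

Letting medium-risk commands run with an appended warning, rather than rejecting them, keeps legitimate workflows such as installing a package unblocked while still giving the model a signal to reconsider.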
|
|
06a623f9c8 | feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203) | ||
|
|
520c0352b5 |
fix(middleware): fall back to configurable thread_id in MemoryMiddleware (#1425) (#1426)
* fix(middleware): fall back to configurable thread_id in MemoryMiddleware (#1425) * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> |
||
|
|
03b144f9c9 |
fix: replace print() with logging across harness package (#1282)
Replace all bare print() calls with proper logging using Python's standard logging module across the deerflow harness package.
Changes across 8 files (16 print statements replaced):
- agents/middlewares/clarification_middleware.py: use logger.info/debug
- agents/middlewares/memory_middleware.py: use logger.debug
- agents/middlewares/thread_data_middleware.py: use logger.debug
- agents/middlewares/view_image_middleware.py: use logger.debug
- agents/memory/queue.py: use logger.info/debug/warning/error
- agents/lead_agent/prompt.py: use logger.error
- skills/loader.py: use logger.warning
- skills/parser.py: use logger.error
Each file follows the established codebase convention:
    import logging
    logger = logging.getLogger(__name__)
Log levels chosen based on message semantics:
- debug: routine operational details (directory creation, timer resets)
- info: significant state changes (memory queued, updates processed)
- warning: recoverable issues (config load failures, skipped updates)
- error: unexpected failures (parsing errors, memory update errors)
Note: client.py is intentionally excluded as it uses print() for CLI output, which is the correct behavior for a command-line client.
Co-authored-by: moose-lab <moose-lab@users.noreply.github.com> |
||
|
|
4708700723 |
fix(middleware): return proper content format when no images viewed (#1454)
- Fix OpenAI BadRequestError: 'No images have been viewed.' was returned as
a plain string array instead of a properly formatted content block
- The OpenAI API expects message content to be either a string or an array
of objects with 'type' field, not an array of plain strings
- Changed return from ['No images have been viewed.'] to
[{'type': 'text', 'text': 'No images have been viewed.'}]
Fixes #1441
Co-authored-by: JasonOA888 <noreply@github.com>
|