deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-06-10 09:25:57 +00:00

Author	SHA1	Message	Date
Willem Jiang	519200728a	fix(middleware): offload memory injection off event loop to prevent tiktoken blocking (#3402) (#3411)
* fix(middleware): offload memory injection off event loop to prevent tiktoken blocking (#3402)
DynamicContextMiddleware.abefore_agent() called _inject() synchronously on the asyncio event loop. The first time memory is injected (second request), _inject() → format_memory_for_injection() → _count_tokens() → tiktoken.get_encoding("cl100k_base") needs to download the BPE data from openaipublic.blob.core.windows.net. In network-restricted environments this download blocks until the OS TCP timeout (~26 min), starving ALL concurrent handlers including /api/v1/auth/me.
Fix:
- abefore_agent now uses asyncio.to_thread(self._inject, state) so file I/O and tiktoken never block the event loop.
- Extract _get_tiktoken_encoding() with a module-level cache so tiktoken.get_encoding() is called at most once per encoding name.
- Add warm_tiktoken_cache() startup helper; gateway lifespan pre-warms the cache via asyncio.to_thread so the first request never triggers a cold download.
- _count_tokens falls back to len(text) // 4 on any encoding failure.
Tests:
- tests/test_tiktoken_cache_and_count_tokens.py (12 tests): cache hit/miss, fallback paths, warm-up helper.
- tests/blocking_io/test_dynamic_context_middleware.py (2 tests): Blockbuster gate verifies abefore_agent does not block the event loop; async/sync parity check.
Fixes #3402
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix the lint error
* fix(memory): use future annotations to avoid NameError when tiktoken is absent
Add `from __future__ import annotations` to prompt.py so that tiktoken.Encoding type hints are never evaluated at runtime. Without this, environments where tiktoken is not installed could raise NameError on the module-level cache and function return annotations.
Addresses Copilot review comment on PR #3411.
* fix(middleware): bound abefore_agent injection with timeout to prevent hung requests
Wrap the asyncio.to_thread(self._inject) offload in asyncio.wait_for() with a 5-second cap. If the startup warm-up failed silently (e.g. network blip during deploy), a cold tiktoken BPE download on the first request can block until the OS TCP timeout (~26 min). The bounded timeout ensures the request degrades gracefully (no memory/date context for that turn) rather than hanging.
Adds test_abefore_agent_returns_none_on_timeout to the blocking-IO regression anchors.
Addresses review feedback from xg-gh-25 on PR #3411.
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-08 12:21:55 +08:00
DanielWalnut	08ee7adeba	fix(lint): remove duplicate is_dynamic_context_reminder definition (#2837) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 23:40:46 +08:00
DanielWalnut	881ff71252	fix(harness): preserve dynamic context across summarization (#2823)	2026-05-09 19:39:36 +08:00
DanielWalnut	f76e4e35c8	fix title generation with dynamic context reminder (#2830)	2026-05-09 18:22:58 +08:00
DanielWalnut	c1b7f1d189	feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801)
* feat(middleware): inject dynamic context via DynamicContextMiddleware
Move memory and current date out of the system prompt and into a dedicated <system-reminder> HumanMessage injected once per session (frozen-snapshot pattern) via a new DynamicContextMiddleware. This keeps the system prompt byte-exact across all users and sessions, enabling maximum Anthropic/Bedrock prefix-cache reuse.
Key design decisions:
- ID-swap technique: reminder takes the first HumanMessage's ID (replacing it in-place via add_messages), original content gets a derived `{id}__user` ID (appended after). Preserves correct ordering.
- hide_from_ui: True on reminder messages so frontend filters them out.
- Midnight crossing: date-update reminder injected before the current turn's HumanMessage when the conversation spans midnight.
- INFO-level logging for production diagnostics.
Also adds prompt-caching breakpoint budget enforcement tests and updates ClaudeChatModel docs to reference the new pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(token-usage): log input/output token detail breakdown in middleware
Extend the LLM token usage log line to include input_token_details and output_token_details (cache_creation, cache_read, reasoning, audio, etc.) when present. Adds tests covering Anthropic cache detail logging from both usage_metadata and response_metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: fix nginx
* fix(middleware): always inject date; gate memory on injection_enabled
Date injection is now unconditional: it is part of the static system prompt replacement and should always be present. Memory injection remains gated by `memory.injection_enabled` in the app config. Previously the entire DynamicContextMiddleware was skipped when injection_enabled was False, which also suppressed the date.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): format files and correct test assertions for token usage middleware
- ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py
- Remove unused pytest import from test_dynamic_context_middleware.py
- Fix two tests that asserted response_metadata fallback logic that doesn't exist: replace with tests that match actual middleware behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(middleware): address Copilot review comments on DynamicContextMiddleware
- Use additional_kwargs flag for reminder detection instead of content substring matching, so user messages containing '<system-reminder>' are not mistakenly treated as injected reminders
- Generate stable UUID when original HumanMessage.id is None to prevent ambiguous 'None__user' derived IDs and message collisions
- Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO
- Add two new tests: missing-id UUID fallback and user-text false-positive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 09:27:02 +08:00
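
Illustration for c1b7f1d189 (ID-swap technique). The log describes the reminder taking over the first HumanMessage's ID while the original content moves to a derived `{id}__user` ID. A minimal sketch of that idea follows; it is not the repository's code. `Msg`, `merge_by_id`, `build_reminder_updates`, and the `dynamic_context_reminder` flag name are hypothetical stand-ins, and `merge_by_id` only imitates an ID-keyed reducer such as add_messages (same ID replaces in place, new ID appends). It assumes the state holds only the first human message when the reminder is injected.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message with an ID and extra flags."""
    id: Optional[str]
    role: str
    content: str
    extra: dict = field(default_factory=dict)


def merge_by_id(existing: List[Msg], updates: List[Msg]) -> List[Msg]:
    """Imitate an ID-keyed reducer: a known ID replaces in place, a new ID appends."""
    merged = list(existing)
    index = {m.id: i for i, m in enumerate(merged)}
    for m in updates:
        if m.id in index:
            merged[index[m.id]] = m
        else:
            index[m.id] = len(merged)
            merged.append(m)
    return merged


def build_reminder_updates(first_human: Msg, reminder_text: str) -> List[Msg]:
    """Return [reminder, original] so the reminder ends up ahead of the user text."""
    # Assumption: a fresh UUID replaces a missing ID so the derived ID is never "None__user".
    base_id = first_human.id or str(uuid.uuid4())
    reminder = Msg(
        id=base_id,  # reuses the original ID, so the reducer swaps it in place
        role="human",
        content=reminder_text,
        extra={"hide_from_ui": True, "dynamic_context_reminder": True},
    )
    original = Msg(id=f"{base_id}__user", role="human", content=first_human.content)
    return [reminder, original]


state = [Msg("h1", "human", "What changed today?")]
updates = build_reminder_updates(state[0], "<system-reminder>date and memory</system-reminder>")
state = merge_by_id(state, updates)
print([m.id for m in state])
```

Detecting reminders through a flag in `extra` rather than by searching the content mirrors the later review fix in the same commit, where user text containing '<system-reminder>' must not be mistaken for an injected reminder.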

5 Commits
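
Illustration for 519200728a (off-loop injection with a bounded timeout). The log describes three pieces: running the blocking injection via asyncio.to_thread, capping it with asyncio.wait_for at 5 seconds, and falling back to len(text) // 4 when token encoding fails. The sketch below shows that pattern under stated assumptions; `inject_off_loop` and `estimate_tokens` are hypothetical names, and the `encode` callable stands in for a tiktoken encoder so the example runs without tiktoken installed.

```python
import asyncio
from typing import Callable, List, Optional

INJECT_TIMEOUT_SECONDS = 5.0  # the commit message states a 5-second cap


def estimate_tokens(text: str, encode: Optional[Callable[[str], List]] = None) -> int:
    """Count tokens with `encode`; fall back to len(text) // 4 on any failure."""
    if encode is not None:
        try:
            return len(encode(text))
        except Exception:
            pass  # e.g. encoder data unavailable; use the rough estimate below
    return len(text) // 4


async def inject_off_loop(
    inject: Callable[[dict], Optional[dict]],
    state: dict,
    timeout: float = INJECT_TIMEOUT_SECONDS,
) -> Optional[dict]:
    """Run a blocking `inject` in a worker thread; return None if it exceeds `timeout`."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(inject, state), timeout)
    except asyncio.TimeoutError:
        # Degrade gracefully: this turn proceeds without the injected context.
        return None


def fake_inject(state: dict) -> dict:
    # Hypothetical blocking work standing in for file I/O and token counting.
    return {"tokens": estimate_tokens(state["memory"])}


print(asyncio.run(inject_off_loop(fake_inject, {"memory": "a" * 40})))
```

One caveat of this pattern: a timeout stops the caller from waiting, but the worker thread itself keeps running until the blocking call returns, which is why the commit also pre-warms the cache at startup.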