feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801)

* feat(middleware): inject dynamic context via DynamicContextMiddleware

Move memory and current date out of the system prompt and into a dedicated <system-reminder> HumanMessage injected once per session (frozen-snapshot pattern) via a new DynamicContextMiddleware. This keeps the system prompt byte-exact across all users and sessions, enabling maximum Anthropic/Bedrock prefix-cache reuse.

Key design decisions:
- ID-swap technique: reminder takes the first HumanMessage's ID (replacing it in-place via add_messages), original content gets a derived `{id}__user` ID (appended after). Preserves correct ordering.
- hide_from_ui: True on reminder messages so frontend filters them out.
- Midnight crossing: date-update reminder injected before the current turn's HumanMessage when the conversation spans midnight.
- INFO-level logging for production diagnostics.

Also adds prompt-caching breakpoint budget enforcement tests and updates ClaudeChatModel docs to reference the new pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(token-usage): log input/output token detail breakdown in middleware

Extend the LLM token usage log line to include input_token_details and output_token_details (cache_creation, cache_read, reasoning, audio, etc.) when present. Adds tests covering Anthropic cache detail logging from both usage_metadata and response_metadata.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: fix nginx

* fix(middleware): always inject date; gate memory on injection_enabled

Date injection is now unconditional: it is part of the static system prompt replacement and should always be present. Memory injection remains gated by `memory.injection_enabled` in the app config. Previously the entire DynamicContextMiddleware was skipped when injection_enabled was False, which also suppressed the date.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): format files and correct test assertions for token usage middleware

- ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py
- Remove unused pytest import from test_dynamic_context_middleware.py
- Fix two tests that asserted response_metadata fallback logic that doesn't exist: replace with tests that match actual middleware behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(middleware): address Copilot review comments on DynamicContextMiddleware

- Use additional_kwargs flag for reminder detection instead of content substring matching, so user messages containing '<system-reminder>' are not mistakenly treated as injected reminders
- Generate stable UUID when original HumanMessage.id is None to prevent ambiguous 'None__user' derived IDs and message collisions
- Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO
- Add two new tests: missing-id UUID fallback and user-text false-positive

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
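
The ID-swap technique described in the commit message can be illustrated with a small, dependency-free sketch. This is not the DynamicContextMiddleware code: `Msg`, `merge_by_id`, `id_swap`, and the `is_reminder` flag name are hypothetical, and `merge_by_id` is only a simplified model of an ID-based merge (same ID replaces in place, new ID appends), which is the behavior the commit relies on from `add_messages`. The sketch assumes the first human message already has an ID, so the UUID fallback for a missing ID is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message (not the LangChain class)."""
    role: str
    content: str
    id: str
    additional_kwargs: dict = field(default_factory=dict)


def merge_by_id(existing, updates):
    """Simplified ID-based merge: a known ID replaces in place, a new ID appends."""
    merged = list(existing)
    index = {m.id: i for i, m in enumerate(merged)}
    for m in updates:
        if m.id in index:
            merged[index[m.id]] = m
        else:
            index[m.id] = len(merged)
            merged.append(m)
    return merged


def id_swap(messages, reminder_text):
    """Build the update that puts a reminder ahead of the first human message."""
    first = next((m for m in messages if m.role == "human"), None)
    if first is None:
        return []
    # The reminder reuses the original ID, so the merge replaces it in place.
    reminder = Msg(
        "human",
        reminder_text,
        id=first.id,
        # hide_from_ui comes from the commit message; is_reminder is an assumed name.
        additional_kwargs={"hide_from_ui": True, "is_reminder": True},
    )
    # The user's text gets a derived ID, so the merge appends it after the reminder.
    original = Msg("human", first.content, id=f"{first.id}__user")
    return [reminder, original]


state = [Msg("human", "hello", id="h1")]
state = merge_by_id(state, id_swap(state, "<system-reminder>Today is 2026-05-09.</system-reminder>"))
print([m.id for m in state])  # ['h1', 'h1__user']
```

Detecting the reminder through a flag in `additional_kwargs` rather than by searching the content for `<system-reminder>` means a user who types that tag is not mistaken for an injected reminder.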
2026-05-21 15:36:48 +00:00 · 2026-05-09 09:27:02 +08:00
parent 109490da25
commit c1b7f1d189
8 changed files with 623 additions and 12 deletions
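
The token-usage tests in the hunk below pin down three behaviors: totals are always logged when `usage_metadata` exists, the detail dictionaries are appended only when present, and nothing is logged when `usage_metadata` is missing. A minimal sketch of a formatter with those properties follows. The function name `format_usage_line` and the single-space separator between fields are assumptions; the actual TokenUsageMiddleware implementation is not shown in this excerpt.

```python
def format_usage_line(usage):
    """Return a token-usage log line, or None when no usage metadata exists."""
    if not usage:
        return None
    line = (
        f"LLM token usage: input={usage.get('input_tokens', 0)} "
        f"output={usage.get('output_tokens', 0)} "
        f"total={usage.get('total_tokens', 0)}"
    )
    # Append the optional breakdowns only when the provider reported them.
    for key in ("input_token_details", "output_token_details"):
        details = usage.get(key)
        if details:
            line += f" {key}={details}"
    return line


usage = {
    "input_tokens": 350,
    "output_tokens": 240,
    "total_tokens": 590,
    "input_token_details": {"cache_creation": 200, "cache_read": 100},
}
print(format_usage_line(usage))
```

Returning `None` for missing metadata lets the caller skip the log call entirely, which matches the test expecting no "LLM token usage" line when only `response_metadata` is populated.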
@@ -1,5 +1,6 @@
 """Tests for TokenUsageMiddleware attribution annotations."""

+import logging
 from unittest.mock import MagicMock

 from langchain_core.messages import AIMessage
@@ -17,6 +18,82 @@ def _make_runtime():


 class TestTokenUsageMiddleware:
+    def test_logs_cache_token_details(self, caplog):
+        middleware = TokenUsageMiddleware()
+        message = AIMessage(
+            content="Here is the final answer.",
+            usage_metadata={
+                "input_tokens": 350,
+                "output_tokens": 240,
+                "total_tokens": 590,
+                "input_token_details": {
+                    "audio": 10,
+                    "cache_creation": 200,
+                    "cache_read": 100,
+                },
+                "output_token_details": {
+                    "audio": 10,
+                    "reasoning": 200,
+                },
+            },
+        )
+
+        with caplog.at_level(
+            logging.INFO,
+            logger="deerflow.agents.middlewares.token_usage_middleware",
+        ):
+            result = middleware.after_model({"messages": [message]}, _make_runtime())
+
+        assert result is not None
+        assert "LLM token usage: input=350 output=240 total=590" in caplog.text
+        assert "input_token_details={'audio': 10, 'cache_creation': 200, 'cache_read': 100}" in caplog.text
+        assert "output_token_details={'audio': 10, 'reasoning': 200}" in caplog.text
+
+    def test_logs_basic_tokens_when_no_detail_fields_in_usage_metadata(self, caplog):
+        """When usage_metadata has only totals (no input_token_details), log just the counts."""
+        middleware = TokenUsageMiddleware()
+        message = AIMessage(
+            content="Here is the final answer.",
+            usage_metadata={
+                "input_tokens": 350,
+                "output_tokens": 240,
+                "total_tokens": 590,
+            },
+        )
+
+        with caplog.at_level(
+            logging.INFO,
+            logger="deerflow.agents.middlewares.token_usage_middleware",
+        ):
+            result = middleware.after_model({"messages": [message]}, _make_runtime())
+
+        assert result is not None
+        assert "LLM token usage: input=350 output=240 total=590" in caplog.text
+        assert "input_token_details" not in caplog.text
+
+    def test_no_log_when_usage_metadata_is_missing(self, caplog):
+        """When usage_metadata is absent, no token usage line is logged."""
+        middleware = TokenUsageMiddleware()
+        message = AIMessage(
+            content="Here is the final answer.",
+            response_metadata={
+                "usage": {
+                    "input_tokens": 350,
+                    "output_tokens": 240,
+                    "total_tokens": 590,
+                }
+            },
+        )
+
+        with caplog.at_level(
+            logging.INFO,
+            logger="deerflow.agents.middlewares.token_usage_middleware",
+        ):
+            result = middleware.after_model({"messages": [message]}, _make_runtime())
+
+        assert result is not None
+        assert "LLM token usage" not in caplog.text
+
     def test_annotates_todo_updates_with_structured_actions(self):
         middleware = TokenUsageMiddleware()
         message = AIMessage(