deer-flow/frontend/src/content/en/harness/middlewares.mdx
greatmengqi 3e6a34297d refactor(config): eliminate global mutable state — explicit parameter passing on top of main
Squashes 25 PR commits onto current main. AppConfig becomes a pure value
object with no ambient lookup. Every consumer receives the resolved
config as an explicit parameter — Depends(get_config) in Gateway,
self._app_config in DeerFlowClient, runtime.context.app_config in agent
runs, AppConfig.from_file() at the LangGraph Server registration
boundary.

Phase 1 — frozen data + typed context

- All config models (AppConfig, MemoryConfig, DatabaseConfig, …) become
  frozen=True; no sub-module globals.
- AppConfig.from_file() is pure (no side-effect singleton loaders).
- Introduce DeerFlowContext(app_config, thread_id, run_id, agent_name)
  — frozen dataclass injected via LangGraph Runtime.
- Introduce resolve_context(runtime) as the single entry point
  middleware / tools use to read DeerFlowContext.

Phase 2 — pure explicit parameter passing

- Gateway: app.state.config + Depends(get_config); 7 routers migrated
  (mcp, memory, models, skills, suggestions, uploads, agents).
- DeerFlowClient: __init__(config=...) captures config locally.
- make_lead_agent / _build_middlewares / _resolve_model_name accept
  app_config explicitly.
- RunContext.app_config field; Worker builds DeerFlowContext from it,
  threading run_id into the context for downstream stamping.
- Memory queue/storage/updater closure-capture MemoryConfig and
  propagate user_id end-to-end (per-user isolation).
- Sandbox/skills/community/factories/tools thread app_config.
- resolve_context() rejects non-typed runtime.context.
- Test suite migrated off AppConfig.current() monkey-patches.
- AppConfig.current() classmethod deleted.

Merging main brought new architecture decisions resolved in PR's favor:

- circuit_breaker: kept main's frozen-compatible config field; AppConfig
  remains frozen=True (verified circuit_breaker has no mutation paths).
- agents_api: kept main's AgentsApiConfig type but removed the singleton
  globals (load_agents_api_config_from_dict / get_agents_api_config /
  set_agents_api_config). 8 routes in agents.py now read via
  Depends(get_config).
- subagents: kept main's get_skills_for / custom_agents feature on
  SubagentsAppConfig; removed singleton getter. registry.py now reads
  app_config.subagents directly.
- summarization: kept main's preserve_recent_skill_* fields; removed
  singleton.
- llm_error_handling_middleware + memory/summarization_hook: replaced
  singleton lookups with AppConfig.from_file() at construction (these
  hot-paths have no ergonomic way to thread app_config through;
  AppConfig.from_file is a pure load).
- worker.py + thread_data_middleware.py: DeerFlowContext.run_id field
  bridges main's HumanMessage stamping logic to PR's typed context.

Trade-offs (follow-up work):

- main's #2138 (async memory updater) reverted to PR's sync
  implementation. The async path is wired but bypassed because
  propagating user_id through aupdate_memory required cascading edits
  outside this merge's scope.
- tests/test_subagent_skills_config.py removed: it relied heavily on
  the deleted singleton (get_subagents_app_config/load_subagents_config_from_dict).
  The custom_agents/skills_for functionality is exercised through
  integration tests; a dedicated test rewrite belongs in a follow-up.

Verification: backend test suite — 2560 passed, 4 skipped, 84 failures.
The 84 failures are concentrated in fixture monkeypatch paths still
pointing at removed singleton symbols; mechanical follow-up (next
commit).
2026-04-26 21:45:02 +08:00


---
title: Middlewares
description: Every time the Lead Agent calls the LLM, it runs through a **middleware chain** before and after the model call. Middlewares can read and modify the agent's state, inject content into the system prompt, intercept tool calls, and react to model outputs.
---
import { Callout } from "nextra/components";

# Middlewares

<Callout type="info" emoji="🔌">
Middlewares wrap every LLM turn in the Lead Agent. They are the primary
extension point for adding cross-cutting behaviors like memory, summarization,
clarification, and token tracking.
</Callout>
Every time the Lead Agent calls the LLM, it runs through a **middleware chain** before and after the model call. Middlewares can read and modify the agent's state, inject content into the system prompt, intercept tool calls, and react to model outputs.

This design keeps the agent core simple and stable while allowing rich, composable behaviors to be layered in.
## How the chain works
The middleware chain is built once per agent invocation, based on the current configuration and request parameters. The middlewares run in a defined order:

1. Runtime middlewares (error handling, thread data, uploads, dangling tool call patching)
2. `SummarizationMiddleware`: context compression (if enabled)
3. `TodoMiddleware`: task list management (plan mode only)
4. `TokenUsageMiddleware`: token tracking (if enabled)
5. `TitleMiddleware`: automatic thread title generation
6. `MemoryMiddleware`: cross-session memory injection and queuing
7. `ViewImageMiddleware`: image details injection (if model supports vision)
8. `DeferredToolFilterMiddleware`: hides deferred tool schemas (if tool search enabled)
9. `SubagentLimitMiddleware`: limits parallel subagent calls (if subagents enabled)
10. `LoopDetectionMiddleware`: breaks repetitive tool call loops
11. Custom middlewares (if any)
12. `ClarificationMiddleware`: intercepts clarification requests (always last)

The ordering is significant. Summarization runs early to reduce context before other processing. Clarification always runs last so it can intercept after all other middlewares have had their turn.
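
The ordering rules above can be sketched as a small, dependency-free builder. This is an illustration only: `ChainOptions` and `build_chain` are hypothetical names, the flags stand in for whatever the real builder reads from configuration and request parameters, and middlewares are represented by their names rather than real instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainOptions:
    """Hypothetical flags; the real builder reads config and request parameters."""

    summarization: bool = False
    plan_mode: bool = False
    token_usage: bool = False
    vision: bool = False
    tool_search: bool = False
    subagents: bool = False
    custom: tuple[str, ...] = ()


def build_chain(opts: ChainOptions) -> list[str]:
    """Return middleware names in the documented order."""
    chain = ["<runtime middlewares>"]
    if opts.summarization:
        chain.append("SummarizationMiddleware")  # early, to shrink context first
    if opts.plan_mode:
        chain.append("TodoMiddleware")
    if opts.token_usage:
        chain.append("TokenUsageMiddleware")
    chain.append("TitleMiddleware")
    chain.append("MemoryMiddleware")
    if opts.vision:
        chain.append("ViewImageMiddleware")
    if opts.tool_search:
        chain.append("DeferredToolFilterMiddleware")
    if opts.subagents:
        chain.append("SubagentLimitMiddleware")
    chain.append("LoopDetectionMiddleware")
    chain.extend(opts.custom)  # custom middlewares sit just before the end
    chain.append("ClarificationMiddleware")  # always last
    return chain
```

For example, `build_chain(ChainOptions(summarization=True, custom=("MyMiddleware",)))` places `SummarizationMiddleware` right after the runtime middlewares and `MyMiddleware` immediately before `ClarificationMiddleware`.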
## Middleware reference
### ClarificationMiddleware
Intercepts clarification tool calls and converts them into proper user-facing requests for additional information. When the model decides it needs to ask the user something before proceeding, this middleware surfaces that request.
**Configuration**: controlled by `guardrails.clarification` settings.

---
### LoopDetectionMiddleware
Detects when the agent is making the same tool call repeatedly without making progress. When a loop is detected, the middleware intervenes to break the cycle and prevents the agent from burning turns indefinitely.
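
One plausible way to detect such a loop is to fingerprint each tool call and count repeats within a sliding window. The sketch below is an assumption about the general technique, not the middleware's actual implementation; `LoopDetector`, `window`, and `threshold` are hypothetical names.

```python
import hashlib
import json
from collections import deque


class LoopDetector:
    """Flags a tool call once it has repeated `threshold` times in a recent window."""

    def __init__(self, window: int = 10, threshold: int = 3) -> None:
        self.threshold = threshold
        self._recent: deque[str] = deque(maxlen=window)

    @staticmethod
    def _fingerprint(name: str, args: dict) -> str:
        # Sort keys so that argument order does not change the fingerprint.
        payload = json.dumps({"name": name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def observe(self, name: str, args: dict) -> bool:
        """Record a call; return True when the same call fills the threshold."""
        fingerprint = self._fingerprint(name, args)
        self._recent.append(fingerprint)
        return self._recent.count(fingerprint) >= self.threshold
```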
**Configuration**: built-in, no user configuration.

---
### MemoryMiddleware
Reads persisted memory facts at the start of each conversation and injects them into the system prompt. After a conversation ends, queues a background update to incorporate any new information into the memory store.
**Configuration**: see the [Memory](/docs/harness/memory) page and the `memory:` section in `config.yaml`.
```yaml
memory:
  enabled: true
  injection_enabled: true
  max_injection_tokens: 2000
  debounce_seconds: 30
```
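
To make the injection settings concrete, here is a minimal sketch of how a token budget might cap the facts placed into the system prompt. The helper names, the `<memory>` wrapper, and the whitespace-based token estimate are all assumptions for illustration; the real middleware may select and count differently.

```python
def select_facts(facts: list[str], max_injection_tokens: int) -> list[str]:
    """Take facts in order until the (estimated) token budget is used up."""
    selected: list[str] = []
    used = 0
    for fact in facts:
        cost = len(fact.split())  # crude estimate: one token per word
        if used + cost > max_injection_tokens:
            break
        selected.append(fact)
        used += cost
    return selected


def render_memory_block(
    facts: list[str],
    *,
    enabled: bool = True,
    injection_enabled: bool = True,
    max_injection_tokens: int = 2000,
) -> str:
    """Return the text to inject, or an empty string when injection is off."""
    if not (enabled and injection_enabled):
        return ""
    chosen = select_facts(facts, max_injection_tokens)
    if not chosen:
        return ""
    lines = "\n".join(f"- {fact}" for fact in chosen)
    return f"<memory>\n{lines}\n</memory>"
```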
---
### SubagentLimitMiddleware
Limits the number of parallel subagent task calls the agent can make in a single turn. This prevents the agent from spawning an unbounded number of concurrent subagents.
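
The limiting step can be pictured as a filter over the tool calls of one model response. This is a sketch under assumptions: the tool name `task` and the function `limit_task_calls` are hypothetical, and the real middleware may handle the dropped calls differently.

```python
def limit_task_calls(
    tool_calls: list[dict],
    max_concurrent: int,
    task_tool: str = "task",  # hypothetical name of the subagent tool
) -> list[dict]:
    """Keep every non-subagent call, but at most `max_concurrent` subagent calls."""
    kept: list[dict] = []
    task_count = 0
    for call in tool_calls:
        if call["name"] == task_tool:
            if task_count >= max_concurrent:
                continue  # drop the excess subagent call
            task_count += 1
        kept.append(call)
    return kept
```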
**Configuration**: `subagent_enabled` and `max_concurrent_subagents` in the per-request config.

---
### TitleMiddleware
Automatically generates a title for the thread after the first exchange. The title is derived from the user's first message and the agent's response.
**Configuration**: `title:` section in `config.yaml`.
```yaml
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null # use default model
```
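
As an illustration of what `max_words` and `max_chars` imply, the sketch below clamps a generated title to both limits. `clamp_title` is a hypothetical helper, and the order of the two checks is an assumption rather than the middleware's documented behavior.

```python
def clamp_title(raw: str, max_words: int = 6, max_chars: int = 60) -> str:
    """Trim a model-generated title to the configured word and character limits."""
    # Models often wrap titles in quotes; strip them along with whitespace.
    words = raw.strip().strip("\"'").split()
    title = " ".join(words[:max_words])
    if len(title) > max_chars:
        title = title[:max_chars].rstrip()
    return title
```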
---
### TodoMiddleware
When plan mode is active, maintains a structured task list visible to the user. The agent uses the `write_todos` tool to mark tasks as `pending`, `in_progress`, or `completed` as it works through a complex objective.
**Activation**: enabled automatically when `is_plan_mode: true` is set in the request configuration. No `config.yaml` entry required.

---
### TokenUsageMiddleware
Tracks LLM token consumption per model call and logs it at the `info` level. Useful for monitoring costs and understanding where tokens are going in long tasks.
**Configuration**: `token_usage:` section in `config.yaml`.
```yaml
token_usage:
  enabled: false
```
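
A rough sketch of per-call tracking follows. The `TokenUsageTracker` class and its logger name are hypothetical; the `input_tokens`, `output_tokens`, and `total_tokens` keys follow the usage metadata shape that LangChain attaches to model responses, and the running totals are an addition for illustration.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("token_usage")  # hypothetical logger name


class TokenUsageTracker:
    """Logs each model call's usage at info level and keeps running totals."""

    KEYS = ("input_tokens", "output_tokens", "total_tokens")

    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled
        self.totals: defaultdict[str, int] = defaultdict(int)

    def record(self, usage: dict[str, int]) -> None:
        if not self.enabled:
            return
        for key in self.KEYS:
            self.totals[key] += usage.get(key, 0)
        logger.info(
            "LLM token usage: input=%s output=%s total=%s",
            usage.get("input_tokens", 0),
            usage.get("output_tokens", 0),
            usage.get("total_tokens", 0),
        )
```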
---
### SandboxAuditMiddleware
Audits sandbox operations performed during the agent's execution. Provides a record of what files were read, written, and what commands were run.
**Configuration**: built-in runtime middleware, always active when a sandbox is available.

---
### SummarizationMiddleware
When the conversation grows long, summarizes older messages to reduce context size. The summary is injected back into the conversation in place of the original messages, preserving meaning without the full token cost.
**Configuration**: `summarization:` section in `config.yaml`. See detailed configuration below.

---
### ViewImageMiddleware
When the current model supports vision (`supports_vision: true`), this middleware intercepts `view_image` tool calls and injects the image content directly into the model's context so it can be analyzed.
**Activation**: automatically enabled when the resolved model has `supports_vision: true`.

---
### DeferredToolFilterMiddleware
When tool search is enabled, this middleware hides deferred tool schemas from the model's context. Tools are discovered lazily via the `tool_search` tool instead of being listed upfront, reducing context usage.
**Configuration**: `tool_search.enabled: true` in `config.yaml`.
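
The idea of deferred tools can be sketched without any framework: deferred tools are left out of the schema list shown to the model and are only surfaced through a search. `ToolSpec`, `visible_tools`, and the substring matching below are assumptions for illustration; only the `tool_search` tool name comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical minimal tool description."""

    name: str
    description: str
    deferred: bool = False


def visible_tools(tools: list[ToolSpec]) -> list[ToolSpec]:
    """Tools whose schemas are listed upfront in the model's context."""
    return [tool for tool in tools if not tool.deferred]


def tool_search(tools: list[ToolSpec], query: str) -> list[ToolSpec]:
    """Find deferred tools whose name or description mentions the query."""
    needle = query.lower()
    return [
        tool
        for tool in tools
        if tool.deferred
        and (needle in tool.name.lower() or needle in tool.description.lower())
    ]
```

With many rarely used tools marked as deferred, the upfront context contains only the few always-visible schemas, and the rest cost tokens only when a search returns them.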
## Summarization configuration
The `SummarizationMiddleware` is one of the most impactful middlewares for long-horizon tasks. Here is the full configuration reference:
```yaml
summarization:
  enabled: true
  # Model to use for summarization (null = use default model)
  # A lightweight model like gpt-4o-mini is recommended to reduce cost.
  model_name: null
  # Trigger conditions: summarization runs when ANY threshold is met
  trigger:
    - type: tokens # trigger when context exceeds N tokens
      value: 15564
    # - type: messages # trigger when there are more than N messages
    #   value: 50
    # - type: fraction # trigger when context exceeds X% of model max
    #   value: 0.8
  # How much recent history to keep after summarization
  keep:
    type: messages
    value: 10 # keep the 10 most recent messages
    # Alternative: keep by tokens
    # type: tokens
    # value: 3000
  # Maximum tokens to trim when preparing messages for the summarizer
  trim_tokens_to_summarize: 15564
  # Custom summary prompt (null = use default LangChain prompt)
  summary_prompt: null
```
**Trigger types**:

- `tokens`: triggers when the total token count in the conversation exceeds `value`.
- `messages`: triggers when the number of messages exceeds `value`.
- `fraction`: triggers when the context reaches `value` fraction of the model's maximum input token limit.

Multiple triggers can be listed; summarization runs when **any** of them fires.

**Keep types**:

- `messages`: keep the last `value` messages after summarization.
- `tokens`: keep up to `value` tokens of recent history.
- `fraction`: keep up to `value` fraction of the model's max input token limit.
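
The trigger and keep rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the middleware's code: `estimate_tokens` is a crude whitespace-based stand-in for a real tokenizer, messages are plain strings, and `should_summarize` and `split_for_summary` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def should_summarize(
    messages: list[str], triggers: list[dict], max_input_tokens: int
) -> bool:
    """Return True when ANY configured trigger fires."""
    total = sum(estimate_tokens(m) for m in messages)
    for trigger in triggers:
        kind, value = trigger["type"], trigger["value"]
        if kind == "tokens" and total > value:
            return True
        if kind == "messages" and len(messages) > value:
            return True
        if kind == "fraction" and total >= value * max_input_tokens:
            return True
    return False


def split_for_summary(
    messages: list[str], keep: dict, max_input_tokens: int
) -> tuple[list[str], list[str]]:
    """Split into (older messages to summarize, recent messages to keep)."""
    kind, value = keep["type"], keep["value"]
    if kind == "messages":
        cut = max(len(messages) - int(value), 0)
    elif kind in ("tokens", "fraction"):
        budget = value if kind == "tokens" else value * max_input_tokens
        used, cut = 0, len(messages)
        # Walk backwards, keeping whole messages while they fit in the budget.
        for index in range(len(messages) - 1, -1, -1):
            cost = estimate_tokens(messages[index])
            if used + cost > budget:
                break
            used += cost
            cut = index
    else:
        raise ValueError(f"unknown keep type: {kind!r}")
    return messages[:cut], messages[cut:]
```

With twelve messages and `keep: {type: messages, value: 10}`, the first two messages would be handed to the summarizer and the last ten kept verbatim.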
## Writing a custom middleware
Custom middlewares can be injected into the chain for specialized use cases. A middleware must implement the `AgentMiddleware` interface from `langchain.agents.middleware`.
The basic structure is:
```python
from typing import Any

from langchain.agents.middleware import AgentMiddleware


class MyMiddleware(AgentMiddleware):
    def before_model(self, state, runtime) -> dict[str, Any] | None:
        # Runs before the model call.
        # Return a dict of state updates, or None to leave the state unchanged.
        return None

    def after_model(self, state, runtime) -> dict[str, Any] | None:
        # Runs after the model call.
        # Inspect the result and return state updates, or None.
        return None
```
Custom middlewares are supplied through the `custom_middlewares` parameter of `_build_middlewares`, which `make_lead_agent` uses to assemble the chain. They are inserted immediately before `ClarificationMiddleware`, so clarification stays at the end of the chain.