Squashes 25 PR commits onto current main. AppConfig becomes a pure value object with no ambient lookup. Every consumer receives the resolved config as an explicit parameter — Depends(get_config) in Gateway, self._app_config in DeerFlowClient, runtime.context.app_config in agent runs, AppConfig.from_file() at the LangGraph Server registration boundary. Phase 1 — frozen data + typed context - All config models (AppConfig, MemoryConfig, DatabaseConfig, …) become frozen=True; no sub-module globals. - AppConfig.from_file() is pure (no side-effect singleton loaders). - Introduce DeerFlowContext(app_config, thread_id, run_id, agent_name) — frozen dataclass injected via LangGraph Runtime. - Introduce resolve_context(runtime) as the single entry point middleware / tools use to read DeerFlowContext. Phase 2 — pure explicit parameter passing - Gateway: app.state.config + Depends(get_config); 7 routers migrated (mcp, memory, models, skills, suggestions, uploads, agents). - DeerFlowClient: __init__(config=...) captures config locally. - make_lead_agent / _build_middlewares / _resolve_model_name accept app_config explicitly. - RunContext.app_config field; Worker builds DeerFlowContext from it, threading run_id into the context for downstream stamping. - Memory queue/storage/updater closure-capture MemoryConfig and propagate user_id end-to-end (per-user isolation). - Sandbox/skills/community/factories/tools thread app_config. - resolve_context() rejects non-typed runtime.context. - Test suite migrated off AppConfig.current() monkey-patches. - AppConfig.current() classmethod deleted. Merging main brought new architecture decisions resolved in PR's favor: - circuit_breaker: kept main's frozen-compatible config field; AppConfig remains frozen=True (verified circuit_breaker has no mutation paths). - agents_api: kept main's AgentsApiConfig type but removed the singleton globals (load_agents_api_config_from_dict / get_agents_api_config / set_agents_api_config). 8 routes in agents.py now read via Depends(get_config). - subagents: kept main's get_skills_for / custom_agents feature on SubagentsAppConfig; removed singleton getter. registry.py now reads app_config.subagents directly. - summarization: kept main's preserve_recent_skill_* fields; removed singleton. - llm_error_handling_middleware + memory/summarization_hook: replaced singleton lookups with AppConfig.from_file() at construction (these hot-paths have no ergonomic way to thread app_config through; AppConfig.from_file is a pure load). - worker.py + thread_data_middleware.py: DeerFlowContext.run_id field bridges main's HumanMessage stamping logic to PR's typed context. Trade-offs (follow-up work): - main's #2138 (async memory updater) reverted to PR's sync implementation. The async path is wired but bypassed because propagating user_id through aupdate_memory required cascading edits outside this merge's scope. - tests/test_subagent_skills_config.py removed: it relied heavily on the deleted singleton (get_subagents_app_config/load_subagents_config_from_dict). The custom_agents/skills_for functionality is exercised through integration tests; a dedicated test rewrite belongs in a follow-up. Verification: backend test suite — 2560 passed, 4 skipped, 84 failures. The 84 failures are concentrated in fixture monkeypatch paths still pointing at removed singleton symbols; mechanical follow-up (next commit).
20 KiB
Design: Eliminate Global Mutable State in Configuration System
Implements #1811 · Tracked in #2151
Phase 1 (shipped): PR #2271 — frozen config tree, purify
from_file(), 3-tierAppConfig.current()lifecycle,DeerFlowContextfor agent execution path.Phase 2 (proposed): eliminate the remaining implicit-state surface (
_global/_override/current()) via pure explicit parameter passing. See §8.
Problem
deerflow/config/ had three structural issues:
- Dual source of truth — each sub-config existed both as an
AppConfigfield and a module-level global (e.g._memory_config). Consumers didn't know which to trust. - Side-effect coupling —
AppConfig.from_file()silently mutated 8 sub-module globals viaload_*_from_dict()calls. - Incomplete isolation —
ContextVaronly scopedAppConfig, not the 8 sub-config globals.
Design Principle
Config is a value object, not live shared state. Constructed once, immutable, no reload. New config = new object + rebuild agent.
Solution
1. Frozen AppConfig (full tree)
All config models set frozen=True, including DatabaseConfig and RunEventsConfig (added late in review). No mutation after construction.
class MemoryConfig(BaseModel):
model_config = ConfigDict(frozen=True)
class AppConfig(BaseModel):
model_config = ConfigDict(extra="allow", frozen=True)
memory: MemoryConfig
title: TitleConfig
...
Changes use copy-on-write: config.model_copy(update={...}).
2. Pure from_file()
AppConfig.from_file() is a pure function — returns a frozen object, no side effects. All 8 load_*_from_dict() calls and their imports were removed.
3. Deleted sub-module globals
Every sub-config module's global state was deleted:
| Deleted | Files |
|---|---|
_memory_config, get_memory_config(), set_memory_config(), load_memory_config_from_dict() |
memory_config.py |
_title_config, get_title_config(), set_title_config(), load_title_config_from_dict() |
title_config.py |
| Same pattern | summarization_config.py, subagents_config.py, guardrails_config.py, tool_search_config.py, checkpointer_config.py, stream_bridge_config.py, acp_config.py |
_extensions_config, reload_extensions_config(), reset_extensions_config(), set_extensions_config() |
extensions_config.py |
reload_app_config(), reset_app_config(), set_app_config(), mtime detection, push/pop_current_app_config() |
app_config.py |
Consumers migrated from get_memory_config() → AppConfig.current().memory (~100 call-sites).
4. Lifecycle: 3-tier AppConfig.current()
The original plan called for a single ContextVar with hard-fail on uninitialized access. The shipped lifecycle is a 3-tier fallback attached to AppConfig itself (no separate context.py module). The divergence is explained in §7.
# app_config.py
class AppConfig(BaseModel):
...
# Process-global singleton. Atomic pointer swap under the GIL,
# so no lock is needed for current read/write patterns.
_global: ClassVar[AppConfig | None] = None
# Per-context override (tests, multi-client scenarios).
_override: ClassVar[ContextVar[AppConfig]] = ContextVar("deerflow_app_config_override")
@classmethod
def init(cls, config: AppConfig) -> None:
"""Set the process-global. Visible to all subsequent async tasks."""
cls._global = config
@classmethod
def set_override(cls, config: AppConfig) -> Token[AppConfig]:
"""Per-context override. Returns Token for reset_override()."""
return cls._override.set(config)
@classmethod
def reset_override(cls, token: Token[AppConfig]) -> None:
cls._override.reset(token)
@classmethod
def current(cls) -> AppConfig:
"""Priority: per-context override > process-global > auto-load from file."""
try:
return cls._override.get()
except LookupError:
pass
if cls._global is not None:
return cls._global
logger.warning(
"AppConfig.current() called before init(); auto-loading from file. "
"Call AppConfig.init() at process startup to surface config errors early."
)
config = cls.from_file()
cls._global = config
return config
Why three tiers and not one:
- Process-global is required because
ContextVardoesn't propagate config updates across async request boundaries. Gateway receives aPUT /mcp/configon one request, reloads config, and the next request — in a fresh async context — must see the new value. A plain class variable (_global) does this; aContextVardoes not. - Per-context override is retained for test isolation and multi-client scenarios. A test can scope its config without mutating the process singleton.
reset_override()restores the previous state deterministically viaToken. - Auto-load fallback is a backward-compatibility escape hatch with a warning. Call sites that skipped explicit
init()(legacy or test) still work, but the warning surfaces the miss.
5. Per-invocation context: DeerFlowContext
Lives in deerflow/config/deer_flow_context.py (not context.py as originally planned — the name was reserved to avoid implying a lifecycle module).
@dataclass(frozen=True)
class DeerFlowContext:
"""Typed, immutable, per-invocation context injected via LangGraph Runtime."""
app_config: AppConfig
thread_id: str
agent_name: str | None = None
Fields:
| Field | Type | Source | Mutability |
|---|---|---|---|
app_config |
AppConfig |
AppConfig.current() at run start |
Immutable per-run |
thread_id |
str |
Caller-provided | Immutable per-run |
agent_name |
str | None |
Caller-provided (bootstrap only) | Immutable per-run |
Not in context: sandbox_id is mutable runtime state (lazy-acquired mid-execution). It flows through ThreadState.sandbox (state channel), not context. All 3 runtime.context["sandbox_id"] = ... writes in sandbox/tools.py were removed; SandboxMiddleware.after_agent reads from state["sandbox"] only.
Construction per entry point:
# Gateway runtime (worker.py) — primary path
deer_flow_context = DeerFlowContext(
app_config=AppConfig.current(),
thread_id=thread_id,
)
agent.astream(input, config=config, context=deer_flow_context)
# DeerFlowClient (client.py)
AppConfig.init(AppConfig.from_file(config_path))
context = DeerFlowContext(app_config=AppConfig.current(), thread_id=thread_id)
agent.stream(input, config=config, context=context)
# LangGraph Server — legacy path, context=None or dict, fallback via resolve_context()
6. Access pattern by caller type
The shipped code stratifies callers by what runtime.context type they see, and tightened middleware access over time:
| Caller type | Access pattern | Examples |
|---|---|---|
Typed middleware (declares Runtime[DeerFlowContext]) |
runtime.context.app_config.xxx — direct field access, no wrapper |
memory_middleware, title_middleware, thread_data_middleware, uploads_middleware, loop_detection_middleware |
| Tools that may see legacy dict context | resolve_context(runtime).xxx |
sandbox/tools.py (bash-guard gate, sandbox config), task_tool.py (bash subagent gate) |
| Tools with typed runtime | runtime.context.xxx directly |
present_file_tool.py, setup_agent_tool.py, skill_manage_tool.py |
| Non-agent paths (Gateway routers, CLI, factories) | AppConfig.current().xxx |
app/gateway/routers/*, reset_admin.py, models/factory.py |
Middleware hardening (late commit a934a822): the original plan had middlewares call resolve_context(runtime) everywhere. In practice, once the middleware signature was typed as Runtime[DeerFlowContext], the wrapper became defensive noise. The commit removed:
try/exceptwrappers aroundresolve_context(...)in middlewares and sandbox tools- Optional
title_config=Nonefallback on every_build_title_prompt/_format_for_title_modelhelper; they now takeTitleConfigas a required parameter - Ad-hoc
get_config()fallback chains inmemory_middleware
Dropping the swallowed-exception layer means config-resolution bugs surface as errors instead of silently degrading — aligning with let-it-crash.
resolve_context() itself still exists and handles three cases:
def resolve_context(runtime: Any) -> DeerFlowContext:
ctx = getattr(runtime, "context", None)
if isinstance(ctx, DeerFlowContext):
return ctx # typed path (Gateway, Client)
if isinstance(ctx, dict):
return DeerFlowContext( # legacy dict path (with warning if empty thread_id)
app_config=AppConfig.current(),
thread_id=ctx.get("thread_id", ""),
agent_name=ctx.get("agent_name"),
)
# Final fallback: LangGraph configurable (e.g. LangGraph Server)
cfg = get_config().get("configurable", {})
return DeerFlowContext(
app_config=AppConfig.current(),
thread_id=cfg.get("thread_id", ""),
agent_name=cfg.get("agent_name"),
)
7. Divergence from original plan
Two material divergences from the original design, both driven by implementation feedback:
7.1 Lifecycle: ContextVar → process-global + ContextVar override
Original: single ContextVar in a new context.py module. get_app_config() raises ConfigNotInitializedError if unset.
Shipped: process-global AppConfig._global (primary) + ContextVar override (scoped) + auto-load with warning (fallback).
Why: a ContextVar set by Gateway startup is not visible to subsequent requests that spawn fresh async contexts. PUT /mcp/config must update config such that the next incoming request sees the new value in its async task — this requires process-wide state. ContextVar is retained for test isolation (reset_override() works cleanly per test via Token) and for per-client scoping if ever needed.
The ConfigNotInitializedError was replaced by a warning + auto-load. The hard error caught more legitimate bugs but also broke call sites that historically worked without explicit init (internal scripts, test fixtures during import-time). The warning preserves the signal without breaking backward compatibility; backend/tests/conftest.py now has an autouse fixture that sets _global to a minimal AppConfig so tests never hit auto-load.
7.2 Module name: context.py → lifecycle on AppConfig, deer_flow_context.py for the invocation context
Original: lifecycle and DeerFlowContext both in deerflow/config/context.py.
Shipped: lifecycle is classmethods on AppConfig itself (init, current, set_override, reset_override). DeerFlowContext and resolve_context() live in deerflow/config/deer_flow_context.py.
Why: the lifecycle operates on AppConfig directly — putting it on the class removes one level of module coupling. The per-invocation context is conceptually separate (it's agent-execution plumbing, not config lifecycle) so it got its own file with a distinguishing name.
7.3 Client lifecycle: init() + set_override() → init() only
Original (never finalized): DeerFlowClient.__init__ called both init() (process-global) and set_override() so two clients with different configs wouldn't clobber each other.
Shipped: init() only.
Why (commit a934a822): set_override() leaked overrides across test boundaries because the ContextVar wasn't reset between client instances. Single-client is the common case, and tests use the autouse fixture for isolation. Multi-client scoping can be added back with explicit set_override() if the need arises.
What doesn't change
config.yamlschemaextensions_config.jsonloading- External API behavior (Gateway, DeerFlowClient)
Migration scope (Phase 1, actual)
- ~100 call-sites:
get_*_config()→AppConfig.current().xxx - 6 runtime-path migrations: middlewares + sandbox tools read from
runtime.contextorresolve_context() - 3 deleted sandbox_id writes in
sandbox/tools.py - ~100 test locations updated;
conftest.pyautouse fixture added - New tests:
test_config_frozen.py,test_deer_flow_context.py,test_app_config_reload.py - Gateway update flow:
reload_*→AppConfig.init(AppConfig.from_file()) - Dependency: langgraph
Runtime/ToolRuntime(already available at target version)
8. Phase 2: pure explicit parameter passing
Phase 1 shipped a working 3-tier AppConfig.current() lifecycle. The remaining implicit-state surface is:
AppConfig._global: ClassVar— process-level singletonAppConfig._override: ClassVar[ContextVar]— per-context overrideAppConfig.current()— fallback-chain reader with auto-load warning
Phase 2 proposes removing all three. AppConfig reduces to a pure Pydantic value object with from_file() as its only factory. All consumers receive AppConfig as an explicit parameter, either through a typed constructor, a function signature, or LangGraph Runtime[DeerFlowContext].
8.1 Motivation
Phase 1 addressed the data side of the problem: config is now a frozen ADT, sub-module globals deleted, from_file() pure. The access side still relies on implicit ambient lookup:
# Today (Phase 1 shipped):
def _get_memory_prompt() -> str:
config = AppConfig.current().memory # implicit global lookup
...
# Target (Phase 2):
def _get_memory_prompt(config: MemoryConfig) -> str: # explicit dependency
...
Three concrete benefits:
| Benefit | What it buys |
|---|---|
| Referential transparency | A function's result depends only on its inputs. Testing becomes parameter substitution, no patch.object(AppConfig, "current") chains |
| Dependency visibility | A function signature declares what config it needs. No "this deep helper secretly reads .memory" surprises |
| True multi-config isolation | Two DeerFlowClient instances with different configs can run in the same process without any ambient shared state to contend over |
The cost (Phase 1 wouldn't have made this smaller): ~97 production call sites + ~91 test mock sites need touching, plus signature changes for helpers that now accept config as a parameter.
8.2 Non-agent call paths and their target APIs
Phase 1 got the agent-execution path right (runtime.context.app_config.xxx). The unsolved paths split into four categories:
FastAPI Gateway → Depends(get_config)
# app/gateway/app.py — at startup
app.state.config = AppConfig.from_file()
# app/gateway/deps.py
def get_config(request: Request) -> AppConfig:
return request.app.state.config
# app/gateway/routers/models.py
@router.get("/models")
def list_models(config: AppConfig = Depends(get_config)):
...
# app/gateway/routers/mcp.py — config reload replaces AppConfig.init()
@router.put("/config")
def update_mcp(..., request: Request):
...
request.app.state.config = AppConfig.from_file()
app.state.config is a FastAPI-owned attribute on the app object, not a module-level global. Scoped to the app's lifetime, only written at startup and config-reload.
DeerFlowClient → constructor-captured config
class DeerFlowClient:
def __init__(self, config_path: str | None = None, config: AppConfig | None = None):
self._config = config or AppConfig.from_file(config_path)
def chat(self, message: str, thread_id: str) -> str:
context = DeerFlowContext(app_config=self._config, thread_id=thread_id)
...
Multiple DeerFlowClient instances are now first-class — each owns its config, nothing shared.
Agent construction (make_lead_agent, _build_middlewares, prompt helpers) → threaded through
def make_lead_agent(config: RunnableConfig, app_config: AppConfig):
middlewares = _build_middlewares(app_config, runtime_config=config)
...
def _build_middlewares(app_config: AppConfig, runtime_config: RunnableConfig):
if app_config.token_usage.enabled:
middlewares.append(TokenUsageMiddleware())
...
Every helper that reads config is now on a function-signature chain from make_lead_agent.
Background threads (memory debounce Timer, queue consumers) → closure-captured
def MemoryQueue.add(self, conversation, user_id, config: MemoryConfig):
# capture config at enqueue time
def _flush():
self._updater.update(conversation, user_id, config)
self._timer = Timer(config.debounce_seconds, _flush)
self._timer.start()
The captured config lives in the closure, not in a contextvar the thread can't see.
8.3 Target AppConfig shape
class AppConfig(BaseModel):
model_config = ConfigDict(extra="allow", frozen=True)
log_level: str = "info"
memory: MemoryConfig = Field(default_factory=MemoryConfig)
... # same fields as Phase 1
@classmethod
def from_file(cls, config_path: str | None = None) -> Self:
"""Pure factory. Reads file, returns frozen object. No side effects."""
...
@classmethod
def resolve_config_path(cls, config_path: str | None = None) -> Path:
"""Unchanged from Phase 1."""
...
def get_model_config(self, name: str) -> ModelConfig | None:
"""Unchanged."""
...
# Removed:
# - _global: ClassVar
# - _override: ClassVar[ContextVar]
# - init(), set_override(), reset_override(), current()
8.4 DeerFlowContext and resolve_context() after Phase 2
DeerFlowContext is unchanged — it's already Phase 2-compliant.
resolve_context() simplifies: the "fall back to AppConfig.current()" branch goes away. The dict-context legacy path either constructs DeerFlowContext with an explicitly-passed AppConfig (fed by caller) or is deleted if no dict-context callers remain.
def resolve_context(runtime: Any) -> DeerFlowContext:
ctx = getattr(runtime, "context", None)
if isinstance(ctx, DeerFlowContext):
return ctx
raise RuntimeError(
"runtime.context is not a DeerFlowContext. All callers must construct "
"and inject one explicitly; there is no global fallback."
)
Let-it-crash: if Phase 2 is done correctly, every caller constructs a typed context. If one doesn't, fail loudly.
8.5 Trade-off acknowledgment
The three cases where ambient lookup is genuinely tempting (and why we reject them):
| Tempting case | Why ambient looks easier | Why we still reject it |
|---|---|---|
Deep helper in memory/storage.py needs memory.storage_path |
Just threaded through 4 call layers | That's exactly the dependency chain you want visible. It's either there or it's hiding |
| Community tool factory reading API keys from config | "Each tool factory doesn't want to take config" | Each tool factory literally needs the config. Passing it is the honest signature |
| Test that wants to "override just one field globally" | patch.object(AppConfig, "current") is one line |
Tests constructing their own AppConfig is one fixture — and that fixture becomes infrastructure for all future tests |
The rejection is consistent: an explicit parameter is strictly more honest than an implicit global lookup, in every case.
8.6 Scope
- ~97 production call sites:
AppConfig.current()→ parameter - ~91 test mock sites:
patch.object(AppConfig, "current")/AppConfig._global = ...→ fixture injection - ~30 FastAPI endpoints gain
config: AppConfig = Depends(get_config) - ~15 factory / helper functions gain
config: AppConfigparameter - Delete from
app_config.py:_global,_override,init,current,set_override,reset_override - Simplify
resolve_context(): removeAppConfig.current()fallback
Implementation plan: see 2026-04-12-config-refactor-plan.md §Phase 2.