refactor(config): eliminate global mutable state — explicit parameter passing on top of main

Squashes 25 PR commits onto current main. AppConfig becomes a pure value object with no ambient lookup. Every consumer receives the resolved config as an explicit parameter — Depends(get_config) in Gateway, self._app_config in DeerFlowClient, runtime.context.app_config in agent runs, AppConfig.from_file() at the LangGraph Server registration boundary. Phase 1 — frozen data + typed context - All config models (AppConfig, MemoryConfig, DatabaseConfig, …) become frozen=True; no sub-module globals. - AppConfig.from_file() is pure (no side-effect singleton loaders). - Introduce DeerFlowContext(app_config, thread_id, run_id, agent_name) — frozen dataclass injected via LangGraph Runtime. - Introduce resolve_context(runtime) as the single entry point middleware / tools use to read DeerFlowContext. Phase 2 — pure explicit parameter passing - Gateway: app.state.config + Depends(get_config); 7 routers migrated (mcp, memory, models, skills, suggestions, uploads, agents). - DeerFlowClient: __init__(config=...) captures config locally. - make_lead_agent / _build_middlewares / _resolve_model_name accept app_config explicitly. - RunContext.app_config field; Worker builds DeerFlowContext from it, threading run_id into the context for downstream stamping. - Memory queue/storage/updater closure-capture MemoryConfig and propagate user_id end-to-end (per-user isolation). - Sandbox/skills/community/factories/tools thread app_config. - resolve_context() rejects non-typed runtime.context. - Test suite migrated off AppConfig.current() monkey-patches. - AppConfig.current() classmethod deleted. Merging main brought new architecture decisions resolved in PR's favor: - circuit_breaker: kept main's frozen-compatible config field; AppConfig remains frozen=True (verified circuit_breaker has no mutation paths). - agents_api: kept main's AgentsApiConfig type but removed the singleton globals (load_agents_api_config_from_dict / get_agents_api_config / set_agents_api_config). 8 routes in agents.py now read via Depends(get_config). - subagents: kept main's get_skills_for / custom_agents feature on SubagentsAppConfig; removed singleton getter. registry.py now reads app_config.subagents directly. - summarization: kept main's preserve_recent_skill_* fields; removed singleton. - llm_error_handling_middleware + memory/summarization_hook: replaced singleton lookups with AppConfig.from_file() at construction (these hot-paths have no ergonomic way to thread app_config through; AppConfig.from_file is a pure load). - worker.py + thread_data_middleware.py: DeerFlowContext.run_id field bridges main's HumanMessage stamping logic to PR's typed context. Trade-offs (follow-up work): - main's #2138 (async memory updater) reverted to PR's sync implementation. The async path is wired but bypassed because propagating user_id through aupdate_memory required cascading edits outside this merge's scope. - tests/test_subagent_skills_config.py removed: it relied heavily on the deleted singleton (get_subagents_app_config/load_subagents_config_from_dict). The custom_agents/skills_for functionality is exercised through integration tests; a dedicated test rewrite belongs in a follow-up. Verification: backend test suite — 2560 passed, 4 skipped, 84 failures. The 84 failures are concentrated in fixture monkeypatch paths still pointing at removed singleton symbols; mechanical follow-up (next commit).
2026-05-21 07:26:50 +00:00 · 2026-04-26 21:45:02 +08:00
parent 9dc25987e0
commit 3e6a34297d
365 changed files with 31220 additions and 5303 deletions
@@ -0,0 +1,301 @@
+# DeerFlow 配置系统设计
+
+> 对应实现：[PR #2271](https://github.com/bytedance/deer-flow/pull/2271) · RFC [#1811](https://github.com/bytedance/deer-flow/issues/1811) · 归档 spec：[config-refactor-design](./plans/2026-04-12-config-refactor-design.md)
+
+## 1. 为什么要重构
+
+重构前的 `deerflow/config/` 有三个结构性问题，凑在一起就是"全局可变状态 + 副作用耦合"的经典反模式：
+
+| 问题 | 具体表现 |
+|------|----------|
+| 双重真相 | 每个 sub-config 同时是 `AppConfig` 字段**和**模块级全局（`_memory_config` / `_title_config` …）。consumer 不知道该信哪个 |
+| 副作用耦合 | `AppConfig.from_file()` 顺便 mutate 8 个 sub-module 的 globals（通过 `load_*_from_dict()`） |
+| 隔离不完整 | 原有的 `ContextVar` 只罩住 `AppConfig` 本体，8 个 sub-config globals 漏在外面 |
+
+从类型论视角看：config 本应是一个**纯值对象（value object）**——构造一次、不变、可复制——但上面这套设计让它变成了"带全局状态的活对象"，于是 test mutation、async 边界、热更新都会互相污染。
+
+## 2. 核心设计原则
+
+> **Config is a value object, not live shared state.**
+> 构造一次，不可变，没有 reload。新 config = 新对象 + 重建 agent。
+
+这一条原则推导出后面所有决策：
+
+- 全部 config model `frozen=True` → 非法状态不可表示
+- `from_file()` 是纯函数 → 无副作用
+- 没有 "热加载"语义 → 改变配置等于"拿到新对象"，由调用方决定要不要换进程全局
+
+## 3. 四层分层
+
+```mermaid
+graph TB
+    subgraph L1 ["第 1 层 数据模型 — 冻结的 ADT"]
+        direction LR
+        AppConfig["AppConfig frozen=True"]
+        Sub["MemoryConfig TitleConfig SummarizationConfig ... 全部 frozen"]
+        AppConfig --> Sub
+    end
+
+    subgraph L2 ["第 2 层 Lifecycle — AppConfig.current"]
+        direction LR
+        Override["_override ContextVar per-context"]
+        Global["_global ClassVar process-singleton"]
+        Auto["auto-load from file with warning"]
+        Override --> Global
+        Global --> Auto
+    end
+
+    subgraph L3 ["第 3 层 Per-invocation context — DeerFlowContext"]
+        direction LR
+        Ctx["frozen dataclass app_config thread_id agent_name"]
+        Resolve["resolve_context legacy bridge"]
+        Ctx --> Resolve
+    end
+
+    subgraph L4 ["第 4 层 访问模式 — 按 caller 类型分流"]
+        direction LR
+        Typed["typed middleware runtime.context.app_config.xxx"]
+        Legacy["dict-legacy resolve_context runtime"]
+        NonAgent["非 agent 路径 AppConfig.current"]
+    end
+
+    L1 --> L2
+    L2 --> L3
+    L3 --> L4
+
+    classDef morandiBlue fill:#B5C4D1,stroke:#6A7A8C,color:#2E3A47
+    classDef morandiGreen fill:#C4D1B5,stroke:#7A8C6A,color:#2E3A47
+    classDef morandiPurple fill:#C9BED1,stroke:#7E6A8C,color:#2E3A47
+    classDef morandiGrey fill:#CFCFCF,stroke:#7A7A7A,color:#2E3A47
+    class L1 morandiBlue
+    class L2 morandiGreen
+    class L3 morandiPurple
+    class L4 morandiGrey
+```
+
+### 3.1 第 1 层：冻结的 ADT
+
+所有 config model 都是 Pydantic `frozen=True`。
+
+```python
+class MemoryConfig(BaseModel):
+    model_config = ConfigDict(frozen=True)
+    enabled: bool = True
+    storage_path: str | None = None
+    ...
+
+class AppConfig(BaseModel):
+    model_config = ConfigDict(extra="allow", frozen=True)
+    memory: MemoryConfig
+    title: TitleConfig
+    ...
+```
+
+改 config 用 copy-on-write：
+
+```python
+new_config = config.model_copy(update={"memory": new_memory_config})
+```
+
+**从类型论视角**：这就是个 product type（record），所有字段组合起来才是一个完整的 `AppConfig`。冻结意味着 `AppConfig` 是**指称透明**的——同样的输入永远拿到同样的对象。
+
+### 3.2 第 2 层：Lifecycle — `AppConfig.current()`
+
+这层是整个设计最值得讲的一块。它不是一个简单的单 `ContextVar`，而是**三层 fallback**：
+
+```python
+class AppConfig(BaseModel):
+    ...
+
+    # 进程级单例。GIL 下原子指针交换，无需锁
+    _global: ClassVar[AppConfig | None] = None
+
+    # Per-context override，用于测试隔离和多 client
+    _override: ClassVar[ContextVar[AppConfig]] = ContextVar("deerflow_app_config_override")
+
+    @classmethod
+    def init(cls, config: AppConfig) -> None:
+        """设置进程全局。对所有后续 async task 可见"""
+        cls._global = config
+
+    @classmethod
+    def set_override(cls, config: AppConfig) -> Token[AppConfig]:
+        """Per-context 覆盖。返回 Token 给 reset_override()"""
+        return cls._override.set(config)
+
+    @classmethod
+    def reset_override(cls, token: Token[AppConfig]) -> None:
+        cls._override.reset(token)
+
+    @classmethod
+    def current(cls) -> AppConfig:
+        """优先级：per-context override > 进程全局 > 自动从文件加载（warning）"""
+        try:
+            return cls._override.get()
+        except LookupError:
+            pass
+        if cls._global is not None:
+            return cls._global
+        logger.warning("AppConfig.current() called before init(); auto-loading from file. ...")
+        config = cls.from_file()
+        cls._global = config
+        return config
+```
+
+**为什么是三层，不是一层？**
+
+| 原因 | 解释 |
+|------|------|
+| 单 ContextVar 行不通 | Gateway 收到 `PUT /mcp/config` reload config，下一个请求在**全新的 async context** 里跑——ContextVar 的值传不过去。只能用进程级变量 |
+| 保留 ContextVar override | 测试需要 per-test scope config，`Token`-based reset 保证干净恢复。多 client 场景如果真出现也能靠它 |
+| Auto-load fallback | 有些 call site 历史上没调 `init()`（内部脚本、import-time 触发的测试）。加 warning 保证信号不丢，但不硬崩 |
+
+**Scala 视角的映射**：
+
+- `_global` = 进程级 `var`，脏，但别无选择
+- `_override` = `Option[ContextVar]` 形式的 reader monad 层
+- `current()` = fallback chain `override.orElse(global).orElse(autoLoad)`，和 `Option.orElse` 思路一致
+
+**为什么 `_global` 没加锁？**
+
+因为读和写都是单个指针赋值（assignment of class attribute），在 CPython 的 GIL 下是原子的。如果将来改成 read-modify-write（比如 "如果没 init 就 init 成 X"），再加 `threading.Lock`。现在不加是因为——不需要。
+
+### 3.3 第 3 层：`DeerFlowContext` — per-invocation typed context
+
+```python
+# deerflow/config/deer_flow_context.py
+@dataclass(frozen=True)
+class DeerFlowContext:
+    """Typed, immutable, per-invocation context injected via LangGraph Runtime"""
+    app_config: AppConfig
+    thread_id: str
+    agent_name: str | None = None
+```
+
+为什么不把 `thread_id` 也放进 `AppConfig`？
+
+- `AppConfig` 是**配置**——进程启动时确定，所有请求共享
+- `thread_id` 是**每次调用变的运行时身份**——必须 per-invocation
+
+两者是不同的 category，混在一起就是把静态配置和动态 identity 耦合。
+
+**注入路径**：
+
+```python
+# Gateway worker（主路径）
+deer_flow_context = DeerFlowContext(
+    app_config=AppConfig.current(),
+    thread_id=thread_id,
+)
+agent.astream(input, config=config, context=deer_flow_context)
+
+# DeerFlowClient
+AppConfig.init(AppConfig.from_file(config_path))
+context = DeerFlowContext(app_config=AppConfig.current(), thread_id=thread_id)
+agent.stream(input, config=config, context=context)
+```
+
+LangGraph 的 `Runtime` 会把 `context=...` 的值注入到 `Runtime[DeerFlowContext].context` 里。Middleware 拿到的就是 typed 的 `DeerFlowContext`。
+
+**不进 context 的东西**：`sandbox_id`——它是 mid-execution 才 acquire 的**可变运行时状态**，正确的归宿是 `ThreadState.sandbox`（state channel，有 reducer），不是 context。原先 `sandbox/tools.py` 里 3 处 `runtime.context["sandbox_id"] = ...` 的写法全部删除。
+
+### 3.4 第 4 层：访问模式按 caller 类型分流
+
+三种 caller，三种模式：
+
+| Caller 类型 | 访问模式 | 例子 |
+|-------------|----------|------|
+| Typed middleware（签名写 `Runtime[DeerFlowContext]`） | `runtime.context.app_config.xxx` 直读，无包装 | `memory_middleware` / `title_middleware` / `thread_data_middleware` 等 |
+| 可能遇到 dict context 的 tool | `resolve_context(runtime).xxx` | `sandbox/tools.py`（dict-legacy 路径）/ `task_tool.py`（bash subagent gate） |
+| 非 agent 路径（Gateway router、CLI、factory） | `AppConfig.current().xxx` | `app/gateway/routers/*` / `reset_admin.py` / `models/factory.py` |
+
+**关键简化**（commit `a934a822`）：原本所有 middleware 都走 `resolve_context()`，后来发现既然签名已经是 `Runtime[DeerFlowContext]`，包装就是冗余防御，直接 `runtime.context.app_config.xxx` 就行。同时也把 `title_middleware` 里每个 helper 的 `title_config=None` fallback 都删掉了——**required parameter 不给 default**，让类型系统强制 caller 传对。
+
+这对应 Scala / FP 的两个信条：
+- **让非法状态不可表示**（`Option[TitleConfig]` 改成 `TitleConfig` required）
+- **Let-it-crash**（config 解析失败是真 bug，surface 出来比吞掉退化更好）
+
+## 4. `resolve_context()` 的三种分支
+
+`resolve_context()` 自己还在，处理三种 runtime.context 形状：
+
+```python
+def resolve_context(runtime: Any) -> DeerFlowContext:
+    ctx = getattr(runtime, "context", None)
+
+    # 1. typed 路径（Gateway、Client）— 直接返回
+    if isinstance(ctx, DeerFlowContext):
+        return ctx
+
+    # 2. dict-legacy 路径（老测试、第三方 invoke）— 桥接
+    if isinstance(ctx, dict):
+        thread_id = ctx.get("thread_id", "")
+        if not thread_id:
+            logger.warning("...empty thread_id...")
+        return DeerFlowContext(
+            app_config=AppConfig.current(),
+            thread_id=thread_id,
+            agent_name=ctx.get("agent_name"),
+        )
+
+    # 3. 完全没 context — fall back 到 LangGraph configurable
+    cfg = get_config().get("configurable", {})
+    return DeerFlowContext(
+        app_config=AppConfig.current(),
+        thread_id=cfg.get("thread_id", ""),
+        agent_name=cfg.get("agent_name"),
+    )
+```
+
+空 thread_id 会 warn，不会硬崩——在这里 warn 比 crash 合理，因为 `thread_id` 缺失只影响文件路径（落到空字符串目录），不会让整个 agent 跑崩。
+
+## 5. Gateway config 热更新流程
+
+历史上 Gateway 用 `reload_*_config()` 带 mtime 检测。现在改成：
+
+```
+写 extensions_config.json → AppConfig.init(AppConfig.from_file()) → 下一个请求看到新值
+```
+
+**没有**：mtime 检测、自动刷新、`reload_*()` 函数。
+
+哲学很简单：**结构性变化（模型、tools、middleware 链）需要重建 agent；运行时变化（`memory.enabled` 这种 flag）下一次 invocation 从 `AppConfig.current()` 取值就自动生效**。不需要给 config 做"活对象"语义。
+
+## 6. 从原计划的分歧
+
+三处关键分歧（详情见 [归档 spec §7](./plans/2026-04-12-config-refactor-design.md#7-divergence-from-original-plan)）：
+
+| 分歧 | 原计划 | Shipped | 原因 |
+|------|--------|---------|------|
+| Lifecycle 存储 | 单 ContextVar，`ConfigNotInitializedError` 硬崩 | 3 层 fallback，auto-load + warning | ContextVar 跨 async 边界传不过去 |
+| 模块位置 | 新建 `context.py` | Lifecycle 放在 `AppConfig` 自身 classmethod | 减一层模块耦合 |
+| Middleware 访问 | 处处 `resolve_context()` | typed middleware 直读 `runtime.context.xxx` | 类型收紧后防御性包装是 noise |
+
+## 7. 从 Scala / Actor 视角的几点观察
+
+- **`AppConfig` 就是个 case class / ADT**。`frozen=True` 相当于 Scala 的 final case class：构造完就不动。改动靠 `model_copy(update=…)`，对应 Scala 的 `copy(…)`。
+- **`DeerFlowContext` 是 typed reader**。Middleware 接收 `Runtime[DeerFlowContext]`，本质是 `Kleisli[DeerFlowContext, State, Result]`——依赖注入，类型化。比 `RunnableConfig.configurable: dict[str, Any]` 强太多。
+- **`resolve_context()` 是适配层**。存在是因为有三种不同形状的上游输入；在纯 FP 眼里这是个 `X => DeerFlowContext` 的 total function，通过 pattern match 三种 case 把世界收敛回 typed 的那条路径。
+- **Let-it-crash 的体现**：commit `a934a822` 干掉 middleware 里 `try/except resolve_context(...)`，干掉 `TitleConfig | None` 的 defensive fallback。Config 解析失败就让它抛出去，别吞成"degraded mode"——actor supervision 会处理，吞错反而藏 bug。
+- **进程 global 的妥协**：`_global: ClassVar` 是这套设计里唯一违背纯值的地方。但在 Python async + HTTP server 的语境里，你没别的办法跨 request 把"新 config"传给所有 task。承认妥协、限制范围（只在 lifecycle 层一个变量）、周边全部 immutable——这就是工程意义上的"合理妥协"。
+
+## 8. Cheat sheet
+
+想访问 config，怎么办？按你写代码的位置看：
+
+| 我在写什么 | 用什么 |
+|------------|--------|
+| Typed middleware（签名 `Runtime[DeerFlowContext]`） | `runtime.context.app_config.xxx` |
+| Typed tool（`ToolRuntime[DeerFlowContext]`） | `runtime.context.xxx` |
+| 可能被老调用方以 dict context 调到的 tool | `resolve_context(runtime).xxx` |
+| Gateway router、CLI、factory、测试 helper | `AppConfig.current().xxx` |
+| 启动时初始化 | `AppConfig.init(AppConfig.from_file(path))` |
+| 测试里想临时改 config | `token = AppConfig.set_override(cfg)` / `AppConfig.reset_override(token)` |
+| Gateway 写完新 `extensions_config.json` 之后 | `AppConfig.init(AppConfig.from_file())`，然后让 agent 重建（如果结构变了） |
+
+不要：
+- ~~`get_memory_config()` / `get_title_config()` 等旧 getter~~（已删）
+- ~~`reload_app_config()` / `reset_app_config()`~~（已删）
+- ~~`_memory_config` 等模块级 global~~（已删）
+- ~~`runtime.context["sandbox_id"] = ...`~~（走 `runtime.state["sandbox"]`）
+- ~~防御性 `try/except resolve_context(...)`~~（让它崩）
@@ -0,0 +1,414 @@
+# Design: Eliminate Global Mutable State in Configuration System
+
+> Implements [#1811](https://github.com/bytedance/deer-flow/issues/1811) · Tracked in [#2151](https://github.com/bytedance/deer-flow/issues/2151)
+>
+> **Phase 1 (shipped):** [PR #2271](https://github.com/bytedance/deer-flow/pull/2271) — frozen config tree, purify `from_file()`, 3-tier `AppConfig.current()` lifecycle, `DeerFlowContext` for agent execution path.
+>
+> **Phase 2 (proposed):** eliminate the remaining implicit-state surface (`_global` / `_override` / `current()`) via pure explicit parameter passing. See §8.
+
+## Problem
+
+`deerflow/config/` had three structural issues:
+
+1. **Dual source of truth** — each sub-config existed both as an `AppConfig` field and a module-level global (e.g. `_memory_config`). Consumers didn't know which to trust.
+2. **Side-effect coupling** — `AppConfig.from_file()` silently mutated 8 sub-module globals via `load_*_from_dict()` calls.
+3. **Incomplete isolation** — `ContextVar` only scoped `AppConfig`, not the 8 sub-config globals.
+
+## Design Principle
+
+**Config is a value object, not live shared state.** Constructed once, immutable, no reload. New config = new object + rebuild agent.
+
+## Solution
+
+### 1. Frozen AppConfig (full tree)
+
+All config models set `frozen=True`, including `DatabaseConfig` and `RunEventsConfig` (added late in review). No mutation after construction.
+
+```python
+class MemoryConfig(BaseModel):
+    model_config = ConfigDict(frozen=True)
+
+class AppConfig(BaseModel):
+    model_config = ConfigDict(extra="allow", frozen=True)
+    memory: MemoryConfig
+    title: TitleConfig
+    ...
+```
+
+Changes use copy-on-write: `config.model_copy(update={...})`.
+
+### 2. Pure `from_file()`
+
+`AppConfig.from_file()` is a pure function — returns a frozen object, no side effects. All 8 `load_*_from_dict()` calls and their imports were removed.
+
+### 3. Deleted sub-module globals
+
+Every sub-config module's global state was deleted:
+
+| Deleted | Files |
+|---------|-------|
+| `_memory_config`, `get_memory_config()`, `set_memory_config()`, `load_memory_config_from_dict()` | `memory_config.py` |
+| `_title_config`, `get_title_config()`, `set_title_config()`, `load_title_config_from_dict()` | `title_config.py` |
+| Same pattern | `summarization_config.py`, `subagents_config.py`, `guardrails_config.py`, `tool_search_config.py`, `checkpointer_config.py`, `stream_bridge_config.py`, `acp_config.py` |
+| `_extensions_config`, `reload_extensions_config()`, `reset_extensions_config()`, `set_extensions_config()` | `extensions_config.py` |
+| `reload_app_config()`, `reset_app_config()`, `set_app_config()`, mtime detection, `push/pop_current_app_config()` | `app_config.py` |
+
+Consumers migrated from `get_memory_config()` → `AppConfig.current().memory` (~100 call-sites).
+
+### 4. Lifecycle: 3-tier `AppConfig.current()`
+
+The original plan called for a single `ContextVar` with hard-fail on uninitialized access. The shipped lifecycle is a **3-tier fallback** attached to `AppConfig` itself (no separate `context.py` module). The divergence is explained in §7.
+
+```python
+# app_config.py
+class AppConfig(BaseModel):
+    ...
+
+    # Process-global singleton. Atomic pointer swap under the GIL,
+    # so no lock is needed for current read/write patterns.
+    _global: ClassVar[AppConfig | None] = None
+
+    # Per-context override (tests, multi-client scenarios).
+    _override: ClassVar[ContextVar[AppConfig]] = ContextVar("deerflow_app_config_override")
+
+    @classmethod
+    def init(cls, config: AppConfig) -> None:
+        """Set the process-global. Visible to all subsequent async tasks."""
+        cls._global = config
+
+    @classmethod
+    def set_override(cls, config: AppConfig) -> Token[AppConfig]:
+        """Per-context override. Returns Token for reset_override()."""
+        return cls._override.set(config)
+
+    @classmethod
+    def reset_override(cls, token: Token[AppConfig]) -> None:
+        cls._override.reset(token)
+
+    @classmethod
+    def current(cls) -> AppConfig:
+        """Priority: per-context override > process-global > auto-load from file."""
+        try:
+            return cls._override.get()
+        except LookupError:
+            pass
+        if cls._global is not None:
+            return cls._global
+        logger.warning(
+            "AppConfig.current() called before init(); auto-loading from file. "
+            "Call AppConfig.init() at process startup to surface config errors early."
+        )
+        config = cls.from_file()
+        cls._global = config
+        return config
+```
+
+**Why three tiers and not one:**
+
+- **Process-global** is required because `ContextVar` doesn't propagate config updates across async request boundaries. Gateway receives a `PUT /mcp/config` on one request, reloads config, and the next request — in a fresh async context — must see the new value. A plain class variable (`_global`) does this; a `ContextVar` does not.
+- **Per-context override** is retained for test isolation and multi-client scenarios. A test can scope its config without mutating the process singleton. `reset_override()` restores the previous state deterministically via `Token`.
+- **Auto-load fallback** is a backward-compatibility escape hatch with a warning. Call sites that skipped explicit `init()` (legacy or test) still work, but the warning surfaces the miss.
+
+### 5. Per-invocation context: `DeerFlowContext`
+
+Lives in `deerflow/config/deer_flow_context.py` (not `context.py` as originally planned — the name was reserved to avoid implying a lifecycle module).
+
+```python
+@dataclass(frozen=True)
+class DeerFlowContext:
+    """Typed, immutable, per-invocation context injected via LangGraph Runtime."""
+    app_config: AppConfig
+    thread_id: str
+    agent_name: str | None = None
+```
+
+**Fields:**
+
+| Field | Type | Source | Mutability |
+|-------|------|--------|-----------|
+| `app_config` | `AppConfig` | `AppConfig.current()` at run start | Immutable per-run |
+| `thread_id` | `str` | Caller-provided | Immutable per-run |
+| `agent_name` | `str \| None` | Caller-provided (bootstrap only) | Immutable per-run |
+
+**Not in context:** `sandbox_id` is mutable runtime state (lazy-acquired mid-execution). It flows through `ThreadState.sandbox` (state channel), not context. All 3 `runtime.context["sandbox_id"] = ...` writes in `sandbox/tools.py` were removed; `SandboxMiddleware.after_agent` reads from `state["sandbox"]` only.
+
+**Construction per entry point:**
+
+```python
+# Gateway runtime (worker.py) — primary path
+deer_flow_context = DeerFlowContext(
+    app_config=AppConfig.current(),
+    thread_id=thread_id,
+)
+agent.astream(input, config=config, context=deer_flow_context)
+
+# DeerFlowClient (client.py)
+AppConfig.init(AppConfig.from_file(config_path))
+context = DeerFlowContext(app_config=AppConfig.current(), thread_id=thread_id)
+agent.stream(input, config=config, context=context)
+
+# LangGraph Server — legacy path, context=None or dict, fallback via resolve_context()
+```
+
+### 6. Access pattern by caller type
+
+The shipped code stratifies callers by what `runtime.context` type they see, and tightened middleware access over time:
+
+| Caller type | Access pattern | Examples |
+|-------------|---------------|----------|
+| Typed middleware (declares `Runtime[DeerFlowContext]`) | `runtime.context.app_config.xxx` — direct field access, no wrapper | `memory_middleware`, `title_middleware`, `thread_data_middleware`, `uploads_middleware`, `loop_detection_middleware` |
+| Tools that may see legacy dict context | `resolve_context(runtime).xxx` | `sandbox/tools.py` (bash-guard gate, sandbox config), `task_tool.py` (bash subagent gate) |
+| Tools with typed runtime | `runtime.context.xxx` directly | `present_file_tool.py`, `setup_agent_tool.py`, `skill_manage_tool.py` |
+| Non-agent paths (Gateway routers, CLI, factories) | `AppConfig.current().xxx` | `app/gateway/routers/*`, `reset_admin.py`, `models/factory.py` |
+
+**Middleware hardening** (late commit `a934a822`): the original plan had middlewares call `resolve_context(runtime)` everywhere. In practice, once the middleware signature was typed as `Runtime[DeerFlowContext]`, the wrapper became defensive noise. The commit removed:
+- `try/except` wrappers around `resolve_context(...)` in middlewares and sandbox tools
+- Optional `title_config=None` fallback on every `_build_title_prompt` / `_format_for_title_model` helper; they now take `TitleConfig` as a **required parameter**
+- Ad-hoc `get_config()` fallback chains in `memory_middleware`
+
+Dropping the swallowed-exception layer means config-resolution bugs surface as errors instead of silently degrading — aligning with let-it-crash.
+
+`resolve_context()` itself still exists and handles three cases:
+
+```python
+def resolve_context(runtime: Any) -> DeerFlowContext:
+    ctx = getattr(runtime, "context", None)
+    if isinstance(ctx, DeerFlowContext):
+        return ctx                        # typed path (Gateway, Client)
+    if isinstance(ctx, dict):
+        return DeerFlowContext(           # legacy dict path (with warning if empty thread_id)
+            app_config=AppConfig.current(),
+            thread_id=ctx.get("thread_id", ""),
+            agent_name=ctx.get("agent_name"),
+        )
+    # Final fallback: LangGraph configurable (e.g. LangGraph Server)
+    cfg = get_config().get("configurable", {})
+    return DeerFlowContext(
+        app_config=AppConfig.current(),
+        thread_id=cfg.get("thread_id", ""),
+        agent_name=cfg.get("agent_name"),
+    )
+```
+
+### 7. Divergence from original plan
+
+Two material divergences from the original design, both driven by implementation feedback:
+
+**7.1 Lifecycle: `ContextVar` → process-global + `ContextVar` override**
+
+*Original:* single `ContextVar` in a new `context.py` module. `get_app_config()` raises `ConfigNotInitializedError` if unset.
+
+*Shipped:* process-global `AppConfig._global` (primary) + `ContextVar` override (scoped) + auto-load with warning (fallback).
+
+*Why:* a `ContextVar` set by Gateway startup is not visible to subsequent requests that spawn fresh async contexts. `PUT /mcp/config` must update config such that the next incoming request sees the new value in *its* async task — this requires process-wide state. ContextVar is retained for test isolation (`reset_override()` works cleanly per test via `Token`) and for per-client scoping if ever needed.
+
+The `ConfigNotInitializedError` was replaced by a warning + auto-load. The hard error caught more legitimate bugs but also broke call sites that historically worked without explicit init (internal scripts, test fixtures during import-time). The warning preserves the signal without breaking backward compatibility; `backend/tests/conftest.py` now has an autouse fixture that sets `_global` to a minimal `AppConfig` so tests never hit auto-load.
+
+**7.2 Module name: `context.py` → lifecycle on `AppConfig`, `deer_flow_context.py` for the invocation context**
+
+*Original:* lifecycle and `DeerFlowContext` both in `deerflow/config/context.py`.
+
+*Shipped:* lifecycle is classmethods on `AppConfig` itself (`init`, `current`, `set_override`, `reset_override`). `DeerFlowContext` and `resolve_context()` live in `deerflow/config/deer_flow_context.py`.
+
+*Why:* the lifecycle operates on `AppConfig` directly — putting it on the class removes one level of module coupling. The per-invocation context is conceptually separate (it's agent-execution plumbing, not config lifecycle) so it got its own file with a distinguishing name.
+
+**7.3 Client lifecycle: `init() + set_override()` → `init()` only**
+
+*Original (never finalized):* `DeerFlowClient.__init__` called both `init()` (process-global) and `set_override()` so two clients with different configs wouldn't clobber each other.
+
+*Shipped:* `init()` only.
+
+*Why (commit `a934a822`):* `set_override()` leaked overrides across test boundaries because the `ContextVar` wasn't reset between client instances. Single-client is the common case, and tests use the autouse fixture for isolation. Multi-client scoping can be added back with explicit `set_override()` if the need arises.
+
+## What doesn't change
+
+- `config.yaml` schema
+- `extensions_config.json` loading
+- External API behavior (Gateway, DeerFlowClient)
+
+## Migration scope (Phase 1, actual)
+
+- ~100 call-sites: `get_*_config()` → `AppConfig.current().xxx`
+- 6 runtime-path migrations: middlewares + sandbox tools read from `runtime.context` or `resolve_context()`
+- 3 deleted sandbox_id writes in `sandbox/tools.py`
+- ~100 test locations updated; `conftest.py` autouse fixture added
+- New tests: `test_config_frozen.py`, `test_deer_flow_context.py`, `test_app_config_reload.py`
+- Gateway update flow: `reload_*` → `AppConfig.init(AppConfig.from_file())`
+- Dependency: langgraph `Runtime` / `ToolRuntime` (already available at target version)
+
+## 8. Phase 2: pure explicit parameter passing
+
+Phase 1 shipped a working 3-tier `AppConfig.current()` lifecycle. The remaining implicit-state surface is:
+
+- `AppConfig._global: ClassVar` — process-level singleton
+- `AppConfig._override: ClassVar[ContextVar]` — per-context override
+- `AppConfig.current()` — fallback-chain reader with auto-load warning
+
+Phase 2 proposes removing all three. `AppConfig` reduces to a pure Pydantic value object with `from_file()` as its only factory. All consumers receive `AppConfig` as an explicit parameter, either through a typed constructor, a function signature, or LangGraph `Runtime[DeerFlowContext]`.
+
+### 8.1 Motivation
+
+Phase 1 addressed the **data side** of the problem: config is now a frozen ADT, sub-module globals deleted, `from_file()` pure. The **access side** still relies on implicit ambient lookup:
+
+```python
+# Today (Phase 1 shipped):
+def _get_memory_prompt() -> str:
+    config = AppConfig.current().memory  # implicit global lookup
+    ...
+
+# Target (Phase 2):
+def _get_memory_prompt(config: MemoryConfig) -> str:  # explicit dependency
+    ...
+```
+
+Three concrete benefits:
+
+| Benefit | What it buys |
+|---------|-------------|
+| Referential transparency | A function's result depends only on its inputs. Testing becomes parameter substitution, no `patch.object(AppConfig, "current")` chains |
+| Dependency visibility | A function signature declares what config it needs. No "this deep helper secretly reads `.memory`" surprises |
+| True multi-config isolation | Two `DeerFlowClient` instances with different configs can run in the same process without any ambient shared state to contend over |
+
+The cost (Phase 1 wouldn't have made this smaller): ~97 production call sites + ~91 test mock sites need touching, plus signature changes for helpers that now accept `config` as a parameter.
+
+### 8.2 Non-agent call paths and their target APIs
+
+Phase 1 got the agent-execution path right (`runtime.context.app_config.xxx`). The unsolved paths split into four categories:
+
+**FastAPI Gateway** → `Depends(get_config)`
+
+```python
+# app/gateway/app.py — at startup
+app.state.config = AppConfig.from_file()
+
+# app/gateway/deps.py
+def get_config(request: Request) -> AppConfig:
+    return request.app.state.config
+
+# app/gateway/routers/models.py
+@router.get("/models")
+def list_models(config: AppConfig = Depends(get_config)):
+    ...
+
+# app/gateway/routers/mcp.py — config reload replaces AppConfig.init()
+@router.put("/config")
+def update_mcp(..., request: Request):
+    ...
+    request.app.state.config = AppConfig.from_file()
+```
+
+`app.state.config` is a FastAPI-owned attribute on the app object, not a module-level global. Scoped to the app's lifetime, only written at startup and config-reload.
+
+**`DeerFlowClient`** → constructor-captured config
+
+```python
+class DeerFlowClient:
+    def __init__(self, config_path: str | None = None, config: AppConfig | None = None):
+        self._config = config or AppConfig.from_file(config_path)
+
+    def chat(self, message: str, thread_id: str) -> str:
+        context = DeerFlowContext(app_config=self._config, thread_id=thread_id)
+        ...
+```
+
+Multiple `DeerFlowClient` instances are now first-class — each owns its config, nothing shared.
+
+**Agent construction (`make_lead_agent`, `_build_middlewares`, prompt helpers)** → threaded through
+
+```python
+def make_lead_agent(config: RunnableConfig, app_config: AppConfig):
+    middlewares = _build_middlewares(app_config, runtime_config=config)
+    ...
+
+def _build_middlewares(app_config: AppConfig, runtime_config: RunnableConfig):
+    if app_config.token_usage.enabled:
+        middlewares.append(TokenUsageMiddleware())
+    ...
+```
+
+Every helper that reads config is now on a function-signature chain from `make_lead_agent`.
+
+**Background threads (memory debounce Timer, queue consumers)** → closure-captured
+
+```python
+def MemoryQueue.add(self, conversation, user_id, config: MemoryConfig):
+    # capture config at enqueue time
+    def _flush():
+        self._updater.update(conversation, user_id, config)
+    self._timer = Timer(config.debounce_seconds, _flush)
+    self._timer.start()
+```
+
+The captured config lives in the closure, not in a contextvar the thread can't see.
+
+### 8.3 Target `AppConfig` shape
+
+```python
+class AppConfig(BaseModel):
+    model_config = ConfigDict(extra="allow", frozen=True)
+
+    log_level: str = "info"
+    memory: MemoryConfig = Field(default_factory=MemoryConfig)
+    ...  # same fields as Phase 1
+
+    @classmethod
+    def from_file(cls, config_path: str | None = None) -> Self:
+        """Pure factory. Reads file, returns frozen object. No side effects."""
+        ...
+
+    @classmethod
+    def resolve_config_path(cls, config_path: str | None = None) -> Path:
+        """Unchanged from Phase 1."""
+        ...
+
+    def get_model_config(self, name: str) -> ModelConfig | None:
+        """Unchanged."""
+        ...
+
+    # Removed:
+    # - _global: ClassVar
+    # - _override: ClassVar[ContextVar]
+    # - init(), set_override(), reset_override(), current()
+```
+
+### 8.4 `DeerFlowContext` and `resolve_context()` after Phase 2
+
+`DeerFlowContext` is unchanged — it's already Phase 2-compliant.
+
+`resolve_context()` simplifies: the "fall back to `AppConfig.current()`" branch goes away. The dict-context legacy path either constructs `DeerFlowContext` with an explicitly-passed `AppConfig` (fed by caller) or is deleted if no dict-context callers remain.
+
+```python
+def resolve_context(runtime: Any) -> DeerFlowContext:
+    ctx = getattr(runtime, "context", None)
+    if isinstance(ctx, DeerFlowContext):
+        return ctx
+    raise RuntimeError(
+        "runtime.context is not a DeerFlowContext. All callers must construct "
+        "and inject one explicitly; there is no global fallback."
+    )
+```
+
+Let-it-crash: if Phase 2 is done correctly, every caller constructs a typed context. If one doesn't, fail loudly.
+
+### 8.5 Trade-off acknowledgment
+
+The three cases where ambient lookup is genuinely tempting (and why we reject them):
+
+| Tempting case | Why ambient looks easier | Why we still reject it |
+|---------------|-------------------------|------------------------|
+| Deep helper in `memory/storage.py` needs `memory.storage_path` | Just threaded through 4 call layers | That's exactly the dependency chain you want visible. It's either there or it's hiding |
+| Community tool factory reading API keys from config | "Each tool factory doesn't want to take config" | Each tool factory literally needs the config. Passing it is the honest signature |
+| Test that wants to "override just one field globally" | `patch.object(AppConfig, "current")` is one line | Tests constructing their own `AppConfig` is one fixture — and that fixture becomes infrastructure for all future tests |
+
+The rejection is consistent: **an explicit parameter is strictly more honest than an implicit global lookup**, in every case.
+
+### 8.6 Scope
+
+- ~97 production call sites: `AppConfig.current()` → parameter
+- ~91 test mock sites: `patch.object(AppConfig, "current")` / `AppConfig._global = ...` → fixture injection
+- ~30 FastAPI endpoints gain `config: AppConfig = Depends(get_config)`
+- ~15 factory / helper functions gain `config: AppConfig` parameter
+- Delete from `app_config.py`: `_global`, `_override`, `init`, `current`, `set_override`, `reset_override`
+- Simplify `resolve_context()`: remove `AppConfig.current()` fallback
+
+Implementation plan: see [2026-04-12-config-refactor-plan.md §Phase 2](./2026-04-12-config-refactor-plan.md#phase-2-pure-explicit-parameter-passing).
@@ -0,0 +1,471 @@
+# Event Store History — Backend Compatibility Layer
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Replace checkpoint state with the append-only event store as the message source in the thread state/history endpoints, so summarization never causes message loss.
+
+**Architecture:** The Gateway's `get_thread_state` and `get_thread_history` endpoints currently read messages from `checkpoint.channel_values["messages"]`. After summarization, those messages are replaced with a synthetic summary-as-human message and all pre-summarize messages are gone. We modify these endpoints to read messages from the RunEventStore instead (append-only, unaffected by summarization). The response shape for each message stays identical so the chat render path needs no changes, but the frontend's feedback hook must be aligned to use the same full-history view (see Task 4).
+
+**Tech Stack:** Python (FastAPI, SQLAlchemy), pytest, TypeScript (React Query)
+
+**Scope:** Gateway mode only (`make dev-pro`). Standard mode uses the LangGraph Server directly and does not go through these endpoints; the summarize bug is still present there and must be tracked as a separate follow-up (see §"Follow-ups" at end of plan).
+
+**Prerequisite already landed:** `backend/packages/harness/deerflow/runtime/journal.py` now unwraps `Command(update={'messages':[ToolMessage(...)]})` in `on_tool_end`, so new runs that use state-updating tools (e.g. `present_files`) write the inner `ToolMessage` content to the event store instead of `str(Command(...))`. Legacy data captured before this fix is cleaned up defensively by the new helper (see Task 1 Step 3 `_sanitize_legacy_command_repr`).
+
+---
+
+## Real Data Alignment Analysis
+
+Compared real `POST /history` response (checkpoint-based) with `run_events` table for thread `6d30913e-dcd4-41c8-8941-f66c716cf359` (docs/resp.json + backend/.deer-flow/data/deerflow.db). See `docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md` for full evidence chain.
+
+| Message type | Fields compared | Difference |
+|-------------|----------------|------------|
+| human_message | all fields | `id` is `None` in event store, has UUID in checkpoint |
+| ai_message (tool_call) | all fields, 6 overlapping | **IDENTICAL** (0 diffs) |
+| ai_message (final) | all fields | **IDENTICAL** |
+| tool_result (normal) | all fields | Only `id` differs (`None` vs UUID) |
+| tool_result (from `Command`-returning tool) | content | **Legacy data stored `str(Command(...))` repr instead of inner ToolMessage** — fixed in journal.py for new runs; legacy rows sanitized by helper |
+
+**Root cause for id difference:** LangGraph's checkpoint assigns `id` to HumanMessage and ToolMessage during graph execution. Event store writes happen earlier, when those ids are still None. AI messages receive `id` from the LLM response (`lc_run--*`) and are unaffected.
+
+**Fix for id:** Generate deterministic UUIDs for `id=None` messages using `uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")` at read time. Patch a **copy** of the content dict, never the live store object.
+
+**Summarize impact quantified on the reproducer thread**: event_store has 16 messages (7 AI + 9 others); checkpoint has 12 after summarize (5 AI + 7 others). AI id overlap: 5 of 7 — the 2 missing AI messages are pre-summarize.
+
+---
+
+## File Structure
+
+| File | Action | Responsibility |
+|------|--------|----------------|
+| `backend/app/gateway/routers/threads.py` | Modify | Replace checkpoint messages with event store messages in `get_thread_state` and `get_thread_history` |
+| `backend/tests/test_thread_state_event_store.py` | Create | Tests for the modified endpoints |
+
+---
+
+### Task 1: Add `_get_event_store_messages` helper to `threads.py`
+
+A shared helper that loads the **full** message stream from the event store, patches `id=None` messages with deterministic UUIDs, and defensively sanitizes legacy `Command(update=...)` reprs captured before the journal.py fix. Patches a copy of each content dict so the live store is never mutated.
+
+**Design constraints (derived from evaluation §3, §4, §5):**
+- **Full pagination**, not `limit=1000`. `RunEventStore.list_messages` returns "latest N records" — a fixed limit silently truncates older messages. Use `count_messages()` to size the request or loop with `after_seq` cursors.
+- **Copy before mutate**. `MemoryRunEventStore` returns live dict references; the JSONL/DB stores may return detached rows but we must not rely on that. Always `content = dict(evt["content"])` before patching `id`.
+- **Legacy Command sanitization.** Legacy data contains `content["content"] == "Command(update={'artifacts': [...], 'messages': [ToolMessage(content='X', ...)]})"`. Regex-extract the inner ToolMessage content string and replace; if extraction fails, leave content as-is (still strictly better than nothing because checkpoint fallback is also wrong for summarized threads).
+- **User context.** `DbRunEventStore.list_messages` is user-scoped via `resolve_user_id(AUTO)` and relies on the auth contextvar set by `@require_permission`. Both endpoints are already decorated — document this dependency in the helper docstring.
+
+**Files:**
+- Modify: `backend/app/gateway/routers/threads.py`
+- Test: `backend/tests/test_thread_state_event_store.py`
+
+- [ ] **Step 1: Write the test**
+
+Create `backend/tests/test_thread_state_event_store.py`:
+
+```python
+"""Tests for event-store-backed message loading in thread state/history endpoints."""
+
+from __future__ import annotations
+
+import uuid
+
+import pytest
+
+from deerflow.runtime.events.store.memory import MemoryRunEventStore
+
+
+@pytest.fixture()
+def event_store():
+    return MemoryRunEventStore()
+
+
+async def _seed_conversation(event_store: MemoryRunEventStore, thread_id: str = "t1"):
+    """Seed a realistic multi-turn conversation matching real checkpoint format."""
+    # human_message: id is None (same as real data)
+    await event_store.put(
+        thread_id=thread_id, run_id="r1",
+        event_type="human_message", category="message",
+        content={
+            "type": "human", "id": None,
+            "content": [{"type": "text", "text": "Hello"}],
+            "additional_kwargs": {}, "response_metadata": {}, "name": None,
+        },
+    )
+    # ai_tool_call: id is set by LLM
+    await event_store.put(
+        thread_id=thread_id, run_id="r1",
+        event_type="ai_tool_call", category="message",
+        content={
+            "type": "ai", "id": "lc_run--abc123",
+            "content": "",
+            "tool_calls": [{"name": "search", "args": {"q": "cats"}, "id": "call_1", "type": "tool_call"}],
+            "invalid_tool_calls": [],
+            "additional_kwargs": {}, "response_metadata": {}, "name": None,
+            "usage_metadata": {"input_tokens": 100, "output_tokens": 50, "total_tokens": 150},
+        },
+    )
+    # tool_result: id is None (same as real data)
+    await event_store.put(
+        thread_id=thread_id, run_id="r1",
+        event_type="tool_result", category="message",
+        content={
+            "type": "tool", "id": None,
+            "content": "Found 10 results",
+            "tool_call_id": "call_1", "name": "search",
+            "artifact": None, "status": "success",
+            "additional_kwargs": {}, "response_metadata": {},
+        },
+    )
+    # ai_message: id is set by LLM
+    await event_store.put(
+        thread_id=thread_id, run_id="r1",
+        event_type="ai_message", category="message",
+        content={
+            "type": "ai", "id": "lc_run--def456",
+            "content": "I found 10 results about cats.",
+            "tool_calls": [], "invalid_tool_calls": [],
+            "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "name": None,
+            "usage_metadata": {"input_tokens": 200, "output_tokens": 100, "total_tokens": 300},
+        },
+    )
+    # Also add a trace event — should NOT appear
+    await event_store.put(
+        thread_id=thread_id, run_id="r1",
+        event_type="llm_request", category="trace",
+        content={"model": "gpt-4"},
+    )
+
+
+class TestGetEventStoreMessages:
+    """Verify event store message extraction with id patching."""
+
+    @pytest.mark.asyncio
+    async def test_extracts_all_message_types(self, event_store):
+        await _seed_conversation(event_store)
+        events = await event_store.list_messages("t1", limit=500)
+        messages = [evt["content"] for evt in events if isinstance(evt.get("content"), dict) and "type" in evt["content"]]
+        assert len(messages) == 4
+        assert [m["type"] for m in messages] == ["human", "ai", "tool", "ai"]
+
+    @pytest.mark.asyncio
+    async def test_null_ids_get_patched(self, event_store):
+        """Messages with id=None should get deterministic UUIDs."""
+        await _seed_conversation(event_store)
+        events = await event_store.list_messages("t1", limit=500)
+        messages = []
+        for evt in events:
+            content = evt.get("content")
+            if isinstance(content, dict) and "type" in content:
+                if content.get("id") is None:
+                    content["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"t1:{evt['seq']}"))
+                messages.append(content)
+
+        # All messages now have an id
+        for m in messages:
+            assert m["id"] is not None
+            assert isinstance(m["id"], str)
+            assert len(m["id"]) > 0
+
+        # AI messages keep their original id
+        assert messages[1]["id"] == "lc_run--abc123"
+        assert messages[3]["id"] == "lc_run--def456"
+
+        # Human and tool messages get deterministic ids (same input = same output)
+        human_id_1 = str(uuid.uuid5(uuid.NAMESPACE_URL, "t1:1"))
+        assert messages[0]["id"] == human_id_1
+
+    @pytest.mark.asyncio
+    async def test_empty_thread(self, event_store):
+        events = await event_store.list_messages("nonexistent", limit=500)
+        messages = [evt["content"] for evt in events if isinstance(evt.get("content"), dict)]
+        assert messages == []
+
+    @pytest.mark.asyncio
+    async def test_tool_call_fields_preserved(self, event_store):
+        await _seed_conversation(event_store)
+        events = await event_store.list_messages("t1", limit=500)
+        messages = [evt["content"] for evt in events if isinstance(evt.get("content"), dict) and "type" in evt["content"]]
+
+        # AI tool_call message
+        ai_tc = messages[1]
+        assert ai_tc["tool_calls"][0]["name"] == "search"
+        assert ai_tc["tool_calls"][0]["id"] == "call_1"
+
+        # Tool result
+        tool = messages[2]
+        assert tool["tool_call_id"] == "call_1"
+        assert tool["status"] == "success"
+```
+
+- [ ] **Step 2: Run tests to verify they pass**
+
+Run: `cd backend && PYTHONPATH=. uv run pytest tests/test_thread_state_event_store.py -v`
+
+- [ ] **Step 3: Add the helper function and modify `get_thread_history`**
+
+In `backend/app/gateway/routers/threads.py`:
+
+1. Add import at the top:
+```python
+import uuid  # ADD (may already exist, check first)
+from app.gateway.deps import get_run_event_store  # ADD
+```
+
+2. Add the helper function (before the endpoint functions, after the model definitions):
+
+```python
+_LEGACY_CMD_INNER_CONTENT_RE = re.compile(
+    r"ToolMessage\(content=(?P<q>['\"])(?P<inner>.*?)(?P=q)",
+    re.DOTALL,
+)
+
+
+def _sanitize_legacy_command_repr(content_field: Any) -> Any:
+    """Recover the inner ToolMessage text from a legacy ``str(Command(...))`` repr.
+
+    Runs that pre-date the ``on_tool_end`` fix in ``journal.py`` stored
+    ``str(Command(update={'messages':[ToolMessage(content='X', ...)]}))`` as the
+    tool_result content. New runs store ``'X'`` directly. For old threads, try
+    to extract ``'X'`` defensively; return the original string if extraction
+    fails (still no worse than the current checkpoint-based fallback, which is
+    broken for summarized threads anyway).
+    """
+    if not isinstance(content_field, str) or not content_field.startswith("Command(update="):
+        return content_field
+    match = _LEGACY_CMD_INNER_CONTENT_RE.search(content_field)
+    return match.group("inner") if match else content_field
+
+
+async def _get_event_store_messages(request: Request, thread_id: str) -> list[dict] | None:
+    """Load messages from the event store, returning None if unavailable.
+
+    The event store is append-only and immune to summarization. Each
+    message event's ``content`` field contains a ``model_dump()``'d
+    LangChain Message dict that is already JSON-serialisable.
+
+    **Full pagination, not a fixed limit.** ``RunEventStore.list_messages``
+    returns the newest ``limit`` records when no cursor is given, which
+    silently drops older messages. We call ``count_messages()`` first and
+    request that many records. For stores that may return fewer (e.g. filtered
+    by user), we also fall back to ``after_seq``-cursor pagination.
+
+    **Copy-on-read.** Each content dict is copied before ``id`` is patched so
+    the live store object is never mutated; ``MemoryRunEventStore`` returns
+    live references.
+
+    **Legacy Command repr sanitization.** See ``_sanitize_legacy_command_repr``.
+
+    **User context.** ``DbRunEventStore`` is user-scoped by default via
+    ``resolve_user_id(AUTO)`` (see ``runtime/user_context.py``). Callers of
+    this helper must be inside a request where ``@require_permission`` has
+    populated the user contextvar. Both ``get_thread_history`` and
+    ``get_thread_state`` satisfy that. Do not call this helper from CLI or
+    migration scripts without passing ``user_id=None`` explicitly.
+
+    Returns ``None`` when the event store is not configured or contains no
+    messages for this thread, so callers can fall back to checkpoint messages.
+    """
+    try:
+        event_store = get_run_event_store(request)
+    except Exception:
+        return None
+
+    try:
+        total = await event_store.count_messages(thread_id)
+    except Exception:
+        logger.exception("count_messages failed for thread %s", sanitize_log_param(thread_id))
+        return None
+    if not total:
+        return None
+
+    # Batch by page_size to keep memory bounded for very long threads.
+    page_size = 500
+    collected: list[dict] = []
+    after_seq: int | None = None
+    while True:
+        page = await event_store.list_messages(thread_id, limit=page_size, after_seq=after_seq)
+        if not page:
+            break
+        collected.extend(page)
+        if len(page) < page_size:
+            break
+        after_seq = page[-1].get("seq")
+        if after_seq is None:
+            break
+
+    messages: list[dict] = []
+    for evt in collected:
+        raw = evt.get("content")
+        if not isinstance(raw, dict) or "type" not in raw:
+            continue
+        # Copy to avoid mutating the store-owned dict.
+        content = dict(raw)
+        if content.get("id") is None:
+            content["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{thread_id}:{evt['seq']}"))
+        # Sanitize legacy Command reprs on tool_result messages only.
+        if content.get("type") == "tool":
+            content["content"] = _sanitize_legacy_command_repr(content.get("content"))
+        messages.append(content)
+    return messages if messages else None
+```
+
+Also add `import re` at the top of the file if it isn't already imported.
+
+3. In `get_thread_history` (around line 585-590), replace the messages section:
+
+**Before:**
+```python
+            # Attach messages from checkpointer only for the latest checkpoint
+            if is_latest_checkpoint:
+                messages = channel_values.get("messages")
+                if messages:
+                    values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
+            is_latest_checkpoint = False
+```
+
+**After:**
+```python
+            # Attach messages: prefer event store (immune to summarization),
+            # fall back to checkpoint messages when event store is unavailable.
+            if is_latest_checkpoint:
+                es_messages = await _get_event_store_messages(request, thread_id)
+                if es_messages is not None:
+                    values["messages"] = es_messages
+                else:
+                    messages = channel_values.get("messages")
+                    if messages:
+                        values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
+            is_latest_checkpoint = False
+```
+
+- [ ] **Step 4: Modify `get_thread_state` similarly**
+
+In `get_thread_state` (around line 443-444), replace:
+
+**Before:**
+```python
+    return ThreadStateResponse(
+        values=serialize_channel_values(channel_values),
+```
+
+**After:**
+```python
+    values = serialize_channel_values(channel_values)
+
+    # Override messages with event store data (immune to summarization)
+    es_messages = await _get_event_store_messages(request, thread_id)
+    if es_messages is not None:
+        values["messages"] = es_messages
+
+    return ThreadStateResponse(
+        values=values,
+```
+
+- [ ] **Step 5: Run all backend tests**
+
+Run: `cd backend && PYTHONPATH=. uv run pytest tests/ -v --timeout=30 -x`
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add backend/app/gateway/routers/threads.py backend/tests/test_thread_state_event_store.py
+git commit -m "feat(threads): load messages from event store instead of checkpoint state
+
+Event store is append-only and immune to summarization. Messages with
+null ids (human, tool) get deterministic UUIDs based on thread_id:seq
+for stable frontend rendering."
+```
+
+---
+
+### Task 2 (OPTIONAL, deferred): Reduce flush_threshold for shorter mid-stream gap
+
+**Status:** Not a correctness fix. Re-evaluation (see spec) found that `RunJournal` already flushes on `run_end`, `run_error`, cancel, and worker `finally` paths. The only window this tuning narrows is a hard process crash or mid-run reload. Defer and decide separately; do not couple with Task 1 merge.
+
+If pursued: change `flush_threshold` default from 20 → 5 in `journal.py:42`, rerun `tests/test_run_journal.py`, commit as a separate `perf(journal): …` commit.
+
+---
+
+### Task 3: Fix `useThreadFeedback` pagination in frontend
+
+Once `/history` returns the full event-store-backed message stream, the frontend's `runIdByAiIndex` map must also cover the full stream or its positional AI-index mapping drifts and feedback clicks go to the wrong `run_id`. The current hook hardcodes `limit=200`.
+
+**Files:**
+- Modify: `frontend/src/core/threads/hooks.ts` (around line 679)
+
+- [ ] **Step 1: Replace the fixed `?limit=200` with full pagination**
+
+Change:
+
+```ts
+const res = await fetchWithAuth(
+  `${getBackendBaseURL()}/api/threads/${encodeURIComponent(threadId)}/messages?limit=200`,
+);
+```
+
+to a loop that pages via `after_seq` (or an equivalent query param exposed by the `/messages` endpoint — check `backend/app/gateway/routers/thread_runs.py:285-323` for the actual parameter names before writing the TS code). Accumulate `messages` until a page returns fewer than the page size.
+
+- [ ] **Step 2: Defensive index guard**
+
+`runIdByAiIndex[aiMessageIndex]` can still be `undefined` when the frontend renders optimistic state before the messages query refreshes. The current `?? undefined` in `message-list.tsx:71` already handles this; do not remove it.
+
+- [ ] **Step 3: Invalidate `["thread-feedback", threadId]` after a new run**
+
+In `useThreadStream` (or wherever stream-end is handled), call `queryClient.invalidateQueries({ queryKey: ["thread-feedback", threadId] })` when the stream closes so the runIdByAiIndex picks up the new run's AI message immediately.
+
+- [ ] **Step 4: Run `pnpm check`**
+
+```bash
+cd frontend && pnpm check
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add frontend/src/core/threads/hooks.ts
+git commit -m "fix(feedback): paginate useThreadFeedback and invalidate after stream"
+```
+
+---
+
+### Task 4: End-to-end test — summarize + multi-run feedback
+
+Add a regression test that exercises the exact bug class we are fixing: a summarized thread with at least two runs, where feedback clicks must target the correct `run_id`.
+
+**Files:**
+- Modify: `backend/tests/test_thread_state_event_store.py`
+
+- [ ] **Step 1: Write the test**
+
+Seed a `MemoryRunEventStore` with two runs worth of messages (`r1`: human + ai + human + ai, `r2`: human + ai), then simulate a summarized checkpoint state that drops the `r1` messages. Call `_get_event_store_messages` and assert:
+
+- Length matches the event store, not the checkpoint
+- The first message is the original `r1` human, not a summary
+- AI messages preserve their `lc_run--*` ids in order
+- Any `id=None` messages get a stable `uuid5(...)` id
+- A legacy `str(Command(update=...))` content field in a tool_result is sanitized to the inner text
+
+- [ ] **Step 2: Run the new test**
+
+```bash
+cd backend && PYTHONPATH=. uv run pytest tests/test_thread_state_event_store.py -v
+```
+
+- [ ] **Step 3: Commit with Tasks 1, 3 changes**
+
+Bundle with the Task 1 commit so tests always land alongside the implementation.
+
+---
+
+### Task 5: Standard mode follow-up (documentation only)
+
+Standard mode (`make dev`) hits LangGraph Server directly for `/threads/{id}/history` and does not go through the Gateway router we just patched. The summarize bug is still present there.
+
+**Files:**
+- Modify: this plan (add follow-up section at the bottom, see below) OR create a separate tracking issue
+
+- [ ] **Step 1: Record the gap**
+
+Append to the bottom of this plan (or open a GitHub issue and link it):
+
+> **Follow-up — Standard mode summarize bug**
+> `get_thread_history` in `backend/app/gateway/routers/threads.py` is only hit in Gateway mode. Standard mode proxies `/api/langgraph/*` directly to the LangGraph Server (see `backend/CLAUDE.md` nginx routing and `frontend/CLAUDE.md` `NEXT_PUBLIC_LANGGRAPH_BASE_URL`). The summarize-message-loss symptom is still reproducible there. Options: (a) teach the LangGraph Server checkpointer to branch on an override, (b) move `/history` behind Gateway in Standard mode as well, (c) accept as known limitation for Standard mode. Decide before GA.
@@ -0,0 +1,191 @@
+# RunJournal 替换 History Messages — 方案评估与对比
+
+**日期**：2026-04-11
+**分支**：`rayhpeng/fix-persistence-new`
+**相关 plan**：[`docs/superpowers/plans/2026-04-10-event-store-history.md`](../plans/2026-04-10-event-store-history.md)（尚未落地）
+
+---
+
+## 1. 问题与数据核对
+
+**症状**：SummarizationMiddleware 触发后，前端历史中无法展示 summarize 之前的真实用户消息。
+
+**复现数据**（thread `6d30913e-dcd4-41c8-8941-f66c716cf359`）：
+
+| 数据源 | seq=1 的 message | 总 message 数 | 是否保留原始 human |
+|---|---|---:|---|
+| `run_events`（SQLite） | human `"最新伊美局势"` | 9（1 human + 7 ai_tool_call + 9 tool_result + 1 ai_message） | ✅ |
+| `/history` 响应（`docs/resp.json`） | type=human，content=`"Here is a summary of the conversation to date:…"` | 不定 | ❌（已被 summary 替换）|
+
+**根因**：`backend/app/gateway/routers/threads.py:587-589` 的 `get_thread_history` 从 `checkpoint.channel_values["messages"]` 读取，而 LangGraph 的 SummarizationMiddleware 会原地改写这个列表。
+
+---
+
+## 2. 候选方案
+
+| 方案 | 描述 | 本次是否推荐 |
+|---|---|---|
+| **A. event_store 覆盖 messages**（已有 plan） | `/history`、`/state` 改读 `RunEventStore.list_messages()`，覆盖 `channel_values["messages"]`；其它字段保持 checkpoint 来源 | ✅ 主方案 |
+| B. 修 SummarizationMiddleware | 让 summarize 不原地替换 messages（作为附加 system message） | ❌ 违背 summarize 的 token 预算初衷 |
+| C. 双读合并（checkpoint + event_store diff） | 合并 summarize 切点前后的两段 | ❌ 合并逻辑复杂无额外收益 |
+| D. 切到现有 `/api/threads/{id}/messages` 端点 | 前端直接消费已经存在的 event-store 消息端点（`thread_runs.py:285-323`）| ⚠️ 更干净但需要前端改动 |
+
+---
+
+## 3. Claude 自评 vs Codex 独立评估
+
+两方独立分析了同一份 plan。重合点基本一致，但 **Codex 发现了一个我遗漏的关键 bug**。
+
+### 3.1 一致结论
+
+| 维度 | 结论 |
+|---|---|
+| 正确性方向 | event_store 是 append-only + 不受 summarize 影响，方向正确 |
+| ID 补齐 | `uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")` 稳定且确定性，安全 |
+| 前端 schema | 零改动 |
+| Non-message 字段（artifacts/todos/title/thread_data） | summarize 只影响 messages，不需要覆盖其它字段 |
+| 多 checkpoint 语义 | 前端 `useStream` 只取 `limit: 1`（`frontend/src/core/threads/hooks.ts:203-210`），不做时间旅行；latest-only 可接受但应在注释/文档写清楚 |
+| 作用域 | 仅 Gateway mode；Standard mode 直连 LangGraph Server，bug 在默认部署路径仍然存在 |
+
+### 3.2 Claude 的独立观察
+
+1. 已验证数据对齐：plan 文档第 15-28 行的真实数据对齐表与本次 `run_events` 导出一致（9 条消息 id 分布：AI 来自 LLM `lc_run--*`、human/tool 为 None）。
+2. 担心 `run_end` / `run_error` / `cancel` 路径未必都 flush —— 这一点 Codex 实际核查了代码并给出确定结论（见下）。
+3. 方案 A 的单文件改动约 60 行，复杂度小。
+
+### 3.3 Codex 的关键补充（Claude 遗漏）
+
+> **Bug #1 — Plan 用 `limit=1000` 并非全量**
+> `RunEventStore.list_messages()` 的语义是"返回最新 limit 条"（`base.py:51-65`、`db.py:151-181`）。对于消息数超过 1000 的长对话，plan 当前写法会**丢掉最早的消息**，再次引入"消息丢失"bug（只是换了丢失的段）。
+
+> **Bug #2 — helper 就地修改了 store 的 dict**
+> plan 的 helper 里对 `content` 原地写 `id`；`MemoryRunEventStore` 返回的是**活引用**，会污染 store 中的对象。应 deep-copy 或 dict 推导出新对象。
+
+> **Flush 路径已核查**：
+> `RunJournal` 在 threshold (`journal.py:360-373`)、`run_end` (`91-96`)、`run_error` (`97-106`)、worker `finally` (`worker.py:280-286`) 都会 flush；`CancelledError` 也走 finally。**正常 end/error/cancel 都 flush，仅硬 kill / 进程崩溃会丢缓冲区**。
+> 因此 `flush_threshold 20 → 5` 的意义**仅在于硬崩溃窗口**与 mid-run reload 可见性，**不是正确性修复**，属于可选 tuning。代价是更多 put_batch / SQLite churn；且 `_flush_sync()` (`383-398`) 已防止并发 flush，所以"每 5 条一 flush"是 best-effort 非严格保证。
+
+### 3.4 Codex 未否决但提示的次要点
+
+- 方案 D（消费现有 `/api/threads/{id}/messages` 端点）更干净但需前端改动。
+- `/history` 一旦被方案 A 改过，就不再是严格意义上的"按 checkpoint 快照"API（对 messages 字段），应写进注释和 API 文档。
+- Standard mode 的 summarize bug 应建立独立 follow-up issue。
+
+---
+
+## 4. 最终合并判决
+
+**Codex**：APPROVE-WITH-CHANGES
+**Claude**：同意 Codex 的判决
+
+### 合并前必须修改（Top 3）
+
+1. **修复分页 bug**：不能用固定 `limit=1000`。必须用以下之一：
+   - `count = await event_store.count_messages(thread_id)`，再 `list_messages(thread_id, limit=count)`
+   - 或循环 cursor 分页（`after_seq`）直到耗尽
+2. **不要原地修改 store dict**：helper 对 `content` 的 id 补齐需要 copy（`dict(content)` 浅拷贝足够，因为只写 top-level `id`）
+3. **Standard mode 显式 follow-up**：在 plan 文末加 "Standard-mode follow-up: TODO #xxx"，或在合并 PR 描述中明确这是 Gateway-only 止血
+
+### 可选（非阻塞）
+
+4. `flush_threshold 20 → 5` 降级为"可选 tuning"，不是修复的一部分；或独立一条 commit 并说明只对硬崩溃窗口有用
+5. `get_thread_history` 新增注释，说明 messages 字段脱离了 checkpoint 快照语义
+6. 测试覆盖：模拟 summarize 后的 checkpoint + 真实 event_store，端到端验证 `/history` 返回包含原始 human 消息
+
+---
+
+## 5. 推荐执行顺序
+
+1. 按本文档 §4 修订 `docs/superpowers/plans/2026-04-10-event-store-history.md`（主要是 Task 1 的 helper 实现 + 分页）
+2. 按修订后的 plan 执行（走 `superpowers:executing-plans`）
+3. 合并后立即建 Standard mode follow-up issue
+
+## 6. Feedback 影响分析（2026-04-11 补充）
+
+### 6.1 数据模型
+
+`feedback` 表（`persistence/feedback/model.py`）：
+
+| 字段 | 说明 |
+|---|---|
+| `feedback_id` PK | - |
+| `run_id` NOT NULL | 反馈目标 run |
+| `thread_id` NOT NULL | - |
+| `user_id` | - |
+| `message_id` nullable | 注释明确写：`optional RunEventStore event identifier` — 已经面向 event_store 设计 |
+| UNIQUE(thread_id, run_id, user_id) | 每 run 每用户至多一条 |
+
+**结论**：feedback **不按 message uuid 存**，按 `run_id` 存，所以 summarize 导致的 checkpoint messages 丢失**不会影响 feedback 存储**。schema 天生与 event_store 兼容，**无需数据迁移**。
+
+### 6.2 前端的 runId 映射：发现隐藏 bug
+
+前端 feedback 目前走两条并行的数据链：
+
+| 用途 | 数据源 | 位置 |
+|---|---|---|
+| 渲染消息体 | `POST /history`（checkpoint） | `useStream` → `thread.messages` |
+| 拿 `runId` 映射 | `GET /api/threads/{id}/messages?limit=200`（**event_store**） | `useThreadFeedback` (`hooks.ts:669-709`) |
+
+两者通过 **"AI 消息的序号"** 对齐：
+
+```ts
+// hooks.ts:691-698
+for (const msg of messages) {
+  if (msg.event_type === "ai_message") {
+    runIdByAiIndex.push(msg.run_id);  // 只按 AI 顺序 push
+  }
+}
+// message-list.tsx:70-71
+runId = feedbackData.runIdByAiIndex[aiMessageIndex]
+```
+
+**Bug**：summarize 过的 thread 里，两条数据链的 AI 消息数量和顺序**不一致**：
+
+| 数据源 | 本 thread 的 AI 消息序列 | 数量 |
+|---|---|---:|
+| `/history`（checkpoint，summarize 后） | seq=19,31,37,45,53 | 5 |
+| `/messages`（event_store，完整） | seq=5,13,19,31,37,45,53 | 7 |
+
+结果：前端渲染的"第 0 条 AI 消息"是 seq=19，但 `runIdByAiIndex[0]` 指向 seq=5 的 run（本例同一 run 里没事，**跨多 run 的 thread 点赞就会打到错的 run 上**）。
+
+**这个 bug 和本次 plan 无关，已经存在了**。只是用户未必注意到。
+
+### 6.3 方案 A 对 feedback 的影响
+
+**负面**：无。feedback 存储不受影响。
+
+**正面（意外收益）**：`/history` 切换到 event_store 后，**两条数据链的 AI 消息序列自动对齐**，§6.2 的隐藏 bug 被顺带修好。
+
+**前提条件**（加入 Top 3 改动之一同等重要）：
+
+- 新 helper 必须和 `/messages` 端点用**同样的消息获取逻辑**（same store, same filter）。否则两条链仍然可能在边界条件下漂移
+- 具体说：**两边都要做完整分页**。目前 `/messages?limit=200` 在前端硬编码 200，如果 thread 有 >200 条消息就会截断；plan 的 `limit=1000` 也一样有上限。两个上限不一致 → 两边顺序不再对齐 → feedback 映射错位
+- **必须修**：`useThreadFeedback` 的 `limit=200` 需要改成分页获取全部，或者 `/messages` 后端改为默认全量
+
+### 6.4 对前端改造顺序的影响
+
+原 plan 声明"零前端改动"，但加入 feedback 考虑后应修正为：
+
+| 改动 | 必须 | 可选 |
+|---|---|---|
+| 后端 `/history` 改读 event_store | ✅ | - |
+| 后端 helper 用分页而非 `limit=1000` | ✅ | - |
+| 前端 `useThreadFeedback` 改用分页或提升 limit | ✅ | - |
+| `runIdByAiIndex` 增加防御：索引越界 fallback `undefined`（已有）| - | ✅ 已经是 |
+| 前端改用 `/messages` 直接做渲染（方案 D） | - | ✅ 长期更干净 |
+
+### 6.5 feedback 相关的新 Top 3 补充
+
+在原来的 Top 3 之外，再加：
+
+4. **前端 `useThreadFeedback` 必须分页或拉全**（`frontend/src/core/threads/hooks.ts:679`），否则和 `/history` 的新全量行为仍然错位
+5. **端到端测试**：一个 thread 跨 >1 个 run + 触发 summarize + 给历史 AI 消息点赞，确认 feedback 打到正确的 run_id
+6. **TanStack Query 缓存协调**：`thread-feedback` 与 history 查询的 `staleTime` / invalidation 需要在新 run 结束时同步刷新，否则新消息写入后 `runIdByAiIndex` 没更新，点赞会打到上一个 run
+
+---
+
+## 8. 未决问题
+
+- `RunEventStore.count_messages()` 与 `list_messages(after_seq=...)` 的实际性能（SQLite 上对于数千消息级别应无问题，但未压测）
+- `MemoryRunEventStore` 与 `DbRunEventStore` 分页语义是否一致（Codex 只核查了 `db.py`，`memory.py` 需确认）
+- 是否应把 `/api/threads/{id}/messages` 提升为前端主用 endpoint，把 `/history` 保留为纯 checkpoint API —— 架构层面更干净但成本更高
@@ -0,0 +1,203 @@
+# Summarize Marker in History — Design & Verification
+
+**Date**: 2026-04-11
+**Branch**: `rayhpeng/fix-persistence-new`
+**Status**: Design approved, implementation deferred to a follow-up PR
+**Depends on**: [`2026-04-11-runjournal-history-evaluation.md`](./2026-04-11-runjournal-history-evaluation.md) (the event-store-backed history fix this builds on)
+
+---
+
+## 1. Goal
+
+Display a "summarization happened here" marker in the conversation history UI when `SummarizationMiddleware` ran mid-run, so users understand why earlier messages look condensed or missing. The event-store-backed `/history` fix already recovered the original messages; this spec adds a **visible marker** at the seq position where summarization occurred, optionally showing the generated summary text.
+
+## 2. Investigation findings
+
+### 2.1 Today's state: zero middleware records
+
+Full scan of `backend/.deer-flow/data/deerflow.db` `run_events`:
+
+| category | rows |
+|---|---:|
+| trace | 76 |
+| message | 34 |
+| lifecycle | 8 |
+| **middleware** | **0** |
+
+No row has `event_type` containing `summariz` or `middleware`. The middleware category is dead in production.
+
+### 2.2 Why: two dead code paths in `journal.py`
+
+| Location | Status |
+|---|---|
+| `journal.py:343-362` — `on_custom_event("summarization", ...)` writes one trace event + one `category="middleware"` event. | Dead. Only fires when something calls `adispatch_custom_event("summarization", {...})`. The upstream LangChain `SummarizationMiddleware` (`.venv/.../langchain/agents/middleware/summarization.py:272`) **never emits custom events** — its `before_model`/`abefore_model` just mutate messages in place and return `{'messages': new_messages}`. Callback never triggered. |
+| `journal.py:449` — `record_middleware(tag, *, name, hook, action, changes)` helper | Dead. Grep shows zero callers in the harness. Added speculatively, never wired up. |
+
+### 2.3 Concrete evidence of summarize running unlogged
+
+Thread `3d5dea4a-0983-4727-a4e8-41a64428933a`:
+
+- `run_events` seq=1 → original human `"写一份关于deer-flow的详细技术报告"` ✓ (event store is fine)
+- `run_events` seq=43 → `llm_request` trace whose `messages[0]` literal contains `"Here is a summary of the conversation to date:"` — proof that SummarizationMiddleware did inject a summary mid-run
+- Zero rows with `category='middleware'` for this thread → nothing captured for UI to render
+
+## 3. Approaches considered
+
+### A. Subclass `SummarizationMiddleware` and dispatch a custom event
+
+Wrap the upstream class, override `abefore_model`, call `await adispatch_custom_event("summarization", {...})` after super(). Journal's existing `on_custom_event` path captures it.
+
+### B. Frontend-only diff heuristic
+
+Compare `event_store.count_messages()` vs rendered count, infer summarization happened from the gap. **Rejected**: can't pinpoint position in the stream, can't show summary text. Only yields a vague badge.
+
+### C. Hybrid A + frontend inline card rendered at the middleware event's seq position
+
+Same backend as A, plus frontend renders an inline `[N messages condensed]` card at the correct chronological position. **Recommended terminal state**.
+
+## 4. Subagent's wrong claim and its rebuttal
+
+An independent agent flagged approach A as structurally broken because:
+
+> `RunnableCallable(trace=False)` skips `set_config_context`, therefore `var_child_runnable_config` is never set, therefore `adispatch_custom_event` raises `RuntimeError("Unable to dispatch an adhoc event without a parent run id")`.
+
+**This is wrong.** The user's counter-intuition was correct: `trace=False` does not prevent `adispatch_custom_event` from working, as long as the middleware signature explicitly accepts `config: RunnableConfig`. The mechanism:
+
+1. `RunnableCallable.__init__` (`langgraph/_internal/_runnable.py:293-319`) inspects the function signature. If it accepts `config: RunnableConfig`, that parameter is recorded in `self.func_accepts`.
+2. Both `trace=True` and `trace=False` branches of `ainvoke` run the same kwarg-injection loop (`_runnable.py:349-356`): `if kw == "config": kw_value = config`. The `config` passed to `ainvoke` (from Pregel's `task.proc.ainvoke(task.input, config)` at `pregel/_retry.py:138`) is the task config with callbacks already bound.
+3. Inside the middleware, passing that `config` explicitly to `adispatch_custom_event(..., config=config)` means the function doesn't rely on `var_child_runnable_config.get()` at all. The LangChain docstring at `langchain_core/callbacks/manager.py:2574-2579` even says "If using python 3.10 and async, you MUST specify the config parameter" — which is exactly this path.
+
+`trace=False` only changes whether **this runnable layer creates a new child callback scope**. It does not affect whether the outer-layer config (with callbacks including `RunJournal`) is passed down to the function.
+
+## 5. Verification
+
+Ran `/tmp/verify_summarize_event.py` (standalone minimal reproduction):
+
+- Minimal `AgentMiddleware` subclass with `abefore_model(self, state, runtime, config: RunnableConfig)`
+- Calls `await adispatch_custom_event("summarization", {...}, config=config)` inside
+- `create_agent(model=FakeChatModel, middleware=[probe])`
+- `agent.ainvoke({...}, config={"callbacks": [RecordingHandler()]})`
+
+**Result**:
+
+```
+INFO verify: ProbeMiddleware.abefore_model called
+INFO verify:   config keys: ['callbacks', 'configurable', 'metadata']
+INFO verify:   config.callbacks type: AsyncCallbackManager
+INFO verify:   config.metadata: {'langgraph_step': 1, 'langgraph_node': 'probe.before_model', ...}
+INFO verify: on_custom_event fired: name=summarization
+             run_id=019d7d19-1727-7830-aa33-648ecbee4b95
+             data={'summary': 'fake summary', 'replaced_count': 3}
+SUCCESS: approach A is viable (config injection + adispatch work)
+```
+
+All five predictions held:
+
+1. ✅ `config: RunnableConfig` signature triggers auto-injection despite `trace=False`
+2. ✅ `config.callbacks` is an `AsyncCallbackManager` with `parent_run_id` set
+3. ✅ `adispatch_custom_event(..., config=config)` runs without error
+4. ✅ `RecordingHandler.on_custom_event` receives the event
+5. ✅ The received `run_id` is a valid UUID tied to the running graph
+
+**Bonus finding**: `config.metadata` contains `langgraph_step` and `langgraph_node`. These can be included in the middleware event's metadata to help the frontend position the marker on the timeline.
+
+## 6. Recommended implementation (approach C)
+
+### 6.1 Backend
+
+**New wrapper middleware** in `backend/packages/harness/deerflow/agents/lead_agent/agent.py`:
+
+```python
+from langchain.agents.middleware.summarization import SummarizationMiddleware
+from langchain_core.callbacks import adispatch_custom_event
+from langchain_core.runnables import RunnableConfig
+
+
+class _TrackingSummarizationMiddleware(SummarizationMiddleware):
+    """Wraps upstream SummarizationMiddleware to emit a ``summarization``
+    custom event on every actual summarization, so RunJournal can persist
+    a middleware:summarize row to the event store.
+
+    The upstream class does not emit events of its own. Declaring
+    ``config: RunnableConfig`` in the override lets LangGraph's
+    ``RunnableCallable`` inject the Pregel task config (with callbacks
+    and parent_run_id) regardless of ``trace=False`` on the node.
+    """
+
+    async def abefore_model(self, state, runtime, config: RunnableConfig):
+        before_count = len(state.get("messages") or [])
+        result = await super().abefore_model(state, runtime)
+        if result is None:
+            return None
+
+        new_messages = result.get("messages") or []
+        replaced_count = max(0, before_count - len(new_messages))
+        summary_text = _extract_summary_text(new_messages)
+
+        await adispatch_custom_event(
+            "summarization",
+            {
+                "summary": summary_text,
+                "replaced_count": replaced_count,
+            },
+            config=config,
+        )
+        return result
+
+
+def _extract_summary_text(messages: list) -> str:
+    """Pull the summary string out of the HumanMessage the upstream class
+    injects as ``Here is a summary of the conversation to date:...``."""
+    for msg in messages:
+        if getattr(msg, "type", None) == "human":
+            content = getattr(msg, "content", "")
+            text = content if isinstance(content, str) else ""
+            if text.startswith("Here is a summary of the conversation to date"):
+                return text
+    return ""
+```
+
+Swap the existing `SummarizationMiddleware()` instantiation in `_build_middlewares` for `_TrackingSummarizationMiddleware(...)` with the same args.
+
+**Journal change**: **zero**. `on_custom_event("summarization", ...)` in `journal.py:343-362` already writes both a trace and a `category="middleware"` row.
+
+**History helper change**: extend `_get_event_store_messages` in `backend/app/gateway/routers/threads.py` to surface `category="middleware"` rows as pseudo-messages, e.g.:
+
+```python
+# In the per-event loop, after the existing message branch:
+if evt.get("category") == "middleware" and evt.get("event_type") == "middleware:summarize":
+    meta = evt.get("metadata") or {}
+    messages.append({
+        "id": f"summary-marker-{evt['seq']}",
+        "type": "summary_marker",
+        "replaced_count": meta.get("replaced_count", 0),
+        "summary": (raw or {}).get("content", "") if isinstance(raw, dict) else "",
+        "run_id": evt.get("run_id"),
+    })
+```
+
+The marker uses a sentinel `type` (`summary_marker`) that doesn't collide with any LangChain message type, so downstream consumers that loop over messages can skip or render it explicitly.
+
+### 6.2 Frontend
+
+- `core/messages/utils.ts`: extend the message grouping to recognize `type === "summary_marker"` and yield it as its own group (`"assistant:summary-marker"`)
+- `components/workspace/messages/message-list.tsx`: add a branch in the grouped render switch that renders a distinctive inline card showing `N messages condensed` and a collapsible panel with the summary text
+- No changes to feedback logic: the marker has no `feedback` field so the button naturally doesn't render on it
+
+## 7. Risks
+
+1. **Synchronous path**. The upstream class has both `before_model` and `abefore_model`. Our wrapper only overrides the async variant. If any deer-flow code path ever uses the sync flow, those summarizations won't be captured. Mitigation: also override `before_model` and use `dispatch_custom_event` (sync variant) with the same pattern.
+2. **`_extract_summary_text` fragility**. It depends on the upstream class prefix `"Here is a summary of the conversation to date"` in the injected `HumanMessage`. Any upstream template change breaks detection. Mitigation: pick the first new `HumanMessage` that wasn't in `state["messages"]` before super() — resilient to template wording changes at the cost of a small diff helper.
+3. **`replaced_count` accuracy when concurrent updates**. If another middleware in the chain also modifies `state["messages"]` before super() returns, the naive `before_count - len(new_messages)` arithmetic is wrong. Mitigation: inspect the `RemoveMessage(id=REMOVE_ALL_MESSAGES)` that upstream emits and count from the original input list directly.
+4. **History helper contract change**. Introducing a non-LangChain-typed entry (`type="summary_marker"`) in the `/history` response could break frontend code that blindly casts entries to `Message`. Mitigation: the frontend change above adds an explicit branch; type-check the frontend end-to-end before merging.
+
+## 8. Out of scope / deferred
+
+- Other middleware types (Title, Guardrail, HITL) do not emit custom events either. If we want markers for those too, repeat the wrapper pattern for each. Not in this design.
+- Retroactive markers for old threads (captured before this patch) are impossible without re-running the graph. Legacy threads will show the event-store-recovered messages without a marker.
+- Standard mode (`make dev`) — agent runs inside LangGraph Server, not the Gateway-embedded runtime. `RunJournal` may not be wired there, so the custom event fires but is captured by no one. Tracked as a separate follow-up.
+
+## 9. Next actions
+
+1. Land the current summarize-message-loss fixes (journal `Command` unwrap + event-store-backed `/history` + inline feedback) — implementation verified, being committed now as three commits on `rayhpeng/fix-persistence-new`
+2. Summarize-marker implementation (this spec) → separate follow-up PR based on the above verified design