mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-05-22 16:06:50 +00:00
refactor(config): eliminate global mutable state — explicit parameter passing on top of main
Squashes 25 PR commits onto current main. AppConfig becomes a pure value object with no ambient lookup. Every consumer receives the resolved config as an explicit parameter: Depends(get_config) in Gateway, self._app_config in DeerFlowClient, runtime.context.app_config in agent runs, AppConfig.from_file() at the LangGraph Server registration boundary.

Phase 1: frozen data + typed context
- All config models (AppConfig, MemoryConfig, DatabaseConfig, …) become frozen=True; no sub-module globals.
- AppConfig.from_file() is pure (no side-effect singleton loaders).
- Introduce DeerFlowContext(app_config, thread_id, run_id, agent_name), a frozen dataclass injected via LangGraph Runtime.
- Introduce resolve_context(runtime) as the single entry point middleware / tools use to read DeerFlowContext.

Phase 2: pure explicit parameter passing
- Gateway: app.state.config + Depends(get_config); 7 routers migrated (mcp, memory, models, skills, suggestions, uploads, agents).
- DeerFlowClient: __init__(config=...) captures config locally.
- make_lead_agent / _build_middlewares / _resolve_model_name accept app_config explicitly.
- RunContext.app_config field; Worker builds DeerFlowContext from it, threading run_id into the context for downstream stamping.
- Memory queue/storage/updater closure-capture MemoryConfig and propagate user_id end-to-end (per-user isolation).
- Sandbox/skills/community/factories/tools thread app_config.
- resolve_context() rejects non-typed runtime.context.
- Test suite migrated off AppConfig.current() monkey-patches.
- AppConfig.current() classmethod deleted.

Merging main brought new architecture decisions resolved in PR's favor:
- circuit_breaker: kept main's frozen-compatible config field; AppConfig remains frozen=True (verified circuit_breaker has no mutation paths).
- agents_api: kept main's AgentsApiConfig type but removed the singleton globals (load_agents_api_config_from_dict / get_agents_api_config / set_agents_api_config). 8 routes in agents.py now read via Depends(get_config).
- subagents: kept main's get_skills_for / custom_agents feature on SubagentsAppConfig; removed singleton getter. registry.py now reads app_config.subagents directly.
- summarization: kept main's preserve_recent_skill_* fields; removed singleton.
- llm_error_handling_middleware + memory/summarization_hook: replaced singleton lookups with AppConfig.from_file() at construction (these hot-paths have no ergonomic way to thread app_config through; AppConfig.from_file is a pure load).
- worker.py + thread_data_middleware.py: DeerFlowContext.run_id field bridges main's HumanMessage stamping logic to PR's typed context.

Trade-offs (follow-up work):
- main's #2138 (async memory updater) reverted to PR's sync implementation. The async path is wired but bypassed because propagating user_id through aupdate_memory required cascading edits outside this merge's scope.
- tests/test_subagent_skills_config.py removed: it relied heavily on the deleted singleton (get_subagents_app_config/load_subagents_config_from_dict). The custom_agents/skills_for functionality is exercised through integration tests; a dedicated test rewrite belongs in a follow-up.

Verification: backend test suite: 2560 passed, 4 skipped, 84 failures. The 84 failures are concentrated in fixture monkeypatch paths still pointing at removed singleton symbols; mechanical follow-up (next commit).
@@ -0,0 +1,191 @@
# RunJournal Replacing History Messages: Evaluation and Comparison of Approaches

**Date**: 2026-04-11
**Branch**: `rayhpeng/fix-persistence-new`
**Related plan**: [`docs/superpowers/plans/2026-04-10-event-store-history.md`](../plans/2026-04-10-event-store-history.md) (not yet landed)

---

## 1. Problem and data verification

**Symptom**: After SummarizationMiddleware fires, the frontend history can no longer show the real user messages from before the summarization.

**Reproduction data** (thread `6d30913e-dcd4-41c8-8941-f66c716cf359`):

| Data source | Message at seq=1 | Total messages | Original human message preserved? |
|---|---|---:|---|
| `run_events` (SQLite) | human `"最新伊美局势"` | 9 (1 human + 7 ai_tool_call + 9 tool_result + 1 ai_message) | ✅ |
| `/history` response (`docs/resp.json`) | type=human, content=`"Here is a summary of the conversation to date:…"` | varies | ❌ (already replaced by the summary) |

**Root cause**: `get_thread_history` at `backend/app/gateway/routers/threads.py:587-589` reads from `checkpoint.channel_values["messages"]`, and LangChain's SummarizationMiddleware rewrites that list in place.

---

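## 2. Candidate approaches

| Approach | Description | Recommended here? |
|---|---|---|
| **A. event_store overrides messages** (existing plan) | `/history` and `/state` read from `RunEventStore.list_messages()` instead and override `channel_values["messages"]`; all other fields still come from the checkpoint | ✅ Primary approach |
| B. Fix SummarizationMiddleware | Make summarize stop replacing messages in place (attach the summary as an additional system message instead) | ❌ Defeats the token-budget purpose of summarize |
| C. Dual read and merge (checkpoint + event_store diff) | Merge the two segments before and after the summarize cut point | ❌ Complex merge logic with no extra benefit |
| D. Switch to the existing `/api/threads/{id}/messages` endpoint | The frontend consumes the already-existing event-store message endpoint directly (`thread_runs.py:285-323`) | ⚠️ Cleaner, but requires frontend changes |

---
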
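## 3. Claude's self-assessment vs. Codex's independent evaluation

Both parties analyzed the same plan independently. Their conclusions largely agree, but **Codex found a critical bug that I missed**.

### 3.1 Shared conclusions

| Dimension | Conclusion |
|---|---|
| Direction of correctness | event_store is append-only and unaffected by summarize, so the direction is right |
| ID backfill | `uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")` is stable and deterministic; safe |
| Frontend schema | Zero changes |
| Non-message fields (artifacts/todos/title/thread_data) | summarize only affects messages; the other fields need no override |
| Multi-checkpoint semantics | The frontend's `useStream` only fetches `limit: 1` (`frontend/src/core/threads/hooks.ts:203-210`) and does no time travel; latest-only is acceptable but should be spelled out in comments/docs |
| Scope | Gateway mode only; Standard mode connects directly to LangGraph Server, so the bug still exists on the default deployment path |
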
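### 3.2 Claude's independent observations

1. Data alignment verified: the real-data alignment table at lines 15-28 of the plan document matches this `run_events` export (id distribution across the 9 messages: AI ids come from the LLM as `lc_run--*`; human/tool ids are None).
2. Concern that the `run_end` / `run_error` / `cancel` paths might not all flush. Codex actually checked the code and reached a definite conclusion on this (see below).
3. Approach A is a single-file change of roughly 60 lines; low complexity.
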
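### 3.3 Codex's key additions (missed by Claude)

> **Bug #1: the plan's `limit=1000` is not the full set**
> `RunEventStore.list_messages()` is defined to "return the latest `limit` entries" (`base.py:51-65`, `db.py:151-181`). For long conversations with more than 1000 messages, the plan as written **drops the earliest messages**, reintroducing a "lost messages" bug (only the lost segment changes).

> **Bug #2: the helper mutates the store's dicts in place**
> The plan's helper writes `id` into `content` in place; `MemoryRunEventStore` returns **live references**, so this pollutes the objects inside the store. It should deep-copy, or build a new object with a dict comprehension.

> **Flush paths verified**:
> `RunJournal` flushes at the threshold (`journal.py:360-373`), on `run_end` (`91-96`), on `run_error` (`97-106`), and in the worker's `finally` (`worker.py:280-286`); `CancelledError` also goes through `finally`. **Normal end/error/cancel all flush; only a hard kill or process crash loses the buffer.**
> So `flush_threshold 20 → 5` matters **only for the hard-crash window** and for mid-run reload visibility. It is **not a correctness fix**; it is optional tuning. The cost is more put_batch / SQLite churn. Also, `_flush_sync()` (`383-398`) already prevents concurrent flushes, so "flush every 5 entries" is best-effort rather than a strict guarantee.
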
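
Bug #2 can be reproduced in a few lines. The following is a minimal sketch, assuming a store that hands back its own dicts rather than copies. `FakeMemoryStore`, `stable_id`, `fill_ids_in_place` and `fill_ids_copied` are hypothetical names used only for illustration; the only piece taken from the plan is the `uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")` formula.

```python
from uuid import NAMESPACE_URL, uuid5


class FakeMemoryStore:
    """Hypothetical stand-in for a store that returns live references."""

    def __init__(self, events):
        self._events = events

    def list_messages(self):
        # A new list, but the dicts inside are the stored objects themselves.
        return list(self._events)


def stable_id(thread_id: str, seq: int) -> str:
    # Deterministic: the same (thread_id, seq) always yields the same id.
    return str(uuid5(NAMESPACE_URL, f"{thread_id}:{seq}"))


def fill_ids_in_place(thread_id, events):
    # Buggy variant: writes into the dict owned by the store.
    for evt in events:
        content = evt["content"]
        if content.get("id") is None:
            content["id"] = stable_id(thread_id, evt["seq"])
    return [evt["content"] for evt in events]


def fill_ids_copied(thread_id, events):
    # Fixed variant: a shallow copy suffices because only top-level `id` is written.
    out = []
    for evt in events:
        content = dict(evt["content"])
        if content.get("id") is None:
            content["id"] = stable_id(thread_id, evt["seq"])
        out.append(content)
    return out


store = FakeMemoryStore([{"seq": 1, "content": {"type": "human", "id": None}}])

copied = fill_ids_copied("t1", store.list_messages())
store_untouched = store.list_messages()[0]["content"]["id"] is None

fill_ids_in_place("t1", store.list_messages())
store_polluted = store.list_messages()[0]["content"]["id"] is not None
```

The copied variant leaves the stored dict alone, while the in-place variant changes what every later reader of the store sees.
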

### 3.4 Minor points Codex raised without objecting

- Approach D (consuming the existing `/api/threads/{id}/messages` endpoint) is cleaner but needs frontend changes.
- Once approach A changes `/history`, it is no longer strictly a "per-checkpoint snapshot" API (for the messages field); this should be written into comments and the API docs.
- The summarize bug in Standard mode deserves its own follow-up issue.

---

## 4. Final merge verdict

**Codex**: APPROVE-WITH-CHANGES
**Claude**: agrees with Codex's verdict

### Required changes before merge (Top 3)

1. **Fix the pagination bug**: a fixed `limit=1000` is not acceptable. Use one of:
   - `count = await event_store.count_messages(thread_id)`, then `list_messages(thread_id, limit=count)`
   - or loop with cursor pagination (`after_seq`) until exhausted
2. **Do not mutate the store's dicts in place**: the helper must copy before backfilling `id` on `content` (a shallow `dict(content)` copy is enough, since only the top-level `id` is written)
3. **Explicit Standard-mode follow-up**: add "Standard-mode follow-up: TODO #xxx" at the end of the plan, or state clearly in the merge PR description that this is a Gateway-only stopgap

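
The cursor option in item 1 could look roughly like the sketch below. It assumes that `list_messages` accepts `limit` and `after_seq` and returns rows in ascending `seq` order; the real `RunEventStore` signature is not shown in this document, so `FakeEventStore` and `list_all_messages` are hypothetical stand-ins.

```python
import asyncio


class FakeEventStore:
    """Hypothetical in-memory store; the real RunEventStore API may differ."""

    def __init__(self, n):
        self._rows = [{"seq": i, "content": {"type": "human"}} for i in range(1, n + 1)]

    async def list_messages(self, thread_id, *, limit, after_seq=0):
        # Assumed semantics: rows with seq > after_seq, ascending, at most `limit`.
        page = [r for r in self._rows if r["seq"] > after_seq]
        return page[:limit]


async def list_all_messages(store, thread_id, page_size=500):
    """Follow the after_seq cursor until a short page signals the end."""
    out = []
    cursor = 0
    while True:
        page = await store.list_messages(thread_id, limit=page_size, after_seq=cursor)
        out.extend(page)
        if len(page) < page_size:
            return out
        cursor = page[-1]["seq"]


all_rows = asyncio.run(list_all_messages(FakeEventStore(1234), "t1", page_size=500))
exact_multiple = asyncio.run(list_all_messages(FakeEventStore(1000), "t1", page_size=500))
```

A short (or empty) page ends the loop, so a thread whose length is an exact multiple of the page size costs one extra round trip but never drops rows.
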
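
### Optional (non-blocking)

4. Downgrade `flush_threshold 20 → 5` to "optional tuning" rather than part of the fix; or put it in a separate commit and explain that it only helps the hard-crash window
5. Add a comment to `get_thread_history` explaining that the messages field no longer follows checkpoint-snapshot semantics
6. Test coverage: simulate a post-summarize checkpoint plus a real event_store and verify end to end that `/history` returns the original human message

---
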
## 5. Recommended execution order

1. Revise `docs/superpowers/plans/2026-04-10-event-store-history.md` per §4 of this document (mainly Task 1's helper implementation and pagination)
2. Execute the revised plan (via `superpowers:executing-plans`)
3. File the Standard mode follow-up issue immediately after merging

## 6. Feedback impact analysis (added 2026-04-11)

### 6.1 Data model

The `feedback` table (`persistence/feedback/model.py`):

| Field | Notes |
|---|---|
| `feedback_id` PK | - |
| `run_id` NOT NULL | The run the feedback targets |
| `thread_id` NOT NULL | - |
| `user_id` | - |
| `message_id` nullable | The comment states explicitly: `optional RunEventStore event identifier`; already designed around event_store |
| UNIQUE(thread_id, run_id, user_id) | At most one row per run per user |

**Conclusion**: feedback is **not stored by message uuid** but by `run_id`, so the loss of checkpoint messages caused by summarize **does not affect feedback storage**. The schema is inherently compatible with event_store, and **no data migration is needed**.

### 6.2 Frontend runId mapping: a hidden bug

Frontend feedback currently runs on two parallel data paths:

| Purpose | Data source | Location |
|---|---|---|
| Render message bodies | `POST /history` (checkpoint) | `useStream` → `thread.messages` |
| Get the `runId` mapping | `GET /api/threads/{id}/messages?limit=200` (**event_store**) | `useThreadFeedback` (`hooks.ts:669-709`) |

The two are aligned by **"the ordinal of the AI message"**:

```ts
// hooks.ts:691-698
for (const msg of messages) {
  if (msg.event_type === "ai_message") {
    runIdByAiIndex.push(msg.run_id); // pushed in AI-message order only
  }
}
// message-list.tsx:70-71
runId = feedbackData.runIdByAiIndex[aiMessageIndex]
```

**Bug**: in a thread that has been summarized, the two data paths **disagree** on the number and order of AI messages:

| Data source | AI message sequence in this thread | Count |
|---|---|---:|
| `/history` (checkpoint, after summarize) | seq=19,31,37,45,53 | 5 |
| `/messages` (event_store, complete) | seq=5,13,19,31,37,45,53 | 7 |

Result: the "AI message #0" the frontend renders is seq=19, but `runIdByAiIndex[0]` points at the run of seq=5 (harmless in this example because everything is in one run, but **in a thread spanning multiple runs, a thumbs-up would land on the wrong run**).

**This bug is unrelated to this plan and already exists**; users simply may not have noticed it.

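
To make the misalignment concrete, here is a small sketch (in Python rather than the frontend's TypeScript, to match the backend examples) that replays the ordinal-based lookup on the seq values from the table above. The split into `run-1` and `run-2` is hypothetical: in the real thread all messages belong to one run, so this models the multi-run case the text warns about.

```python
# seq values of AI messages, taken from the table above.
history_ai_seqs = [19, 31, 37, 45, 53]        # /history after summarize
event_store_ai_seqs = [5, 13, 19, 31, 37, 45, 53]  # /messages, complete

# Hypothetical run assignment: everything before seq=19 belongs to run-1.
run_of = {seq: ("run-1" if seq < 19 else "run-2") for seq in event_store_ai_seqs}

# What the frontend does today: push run ids in AI-message order,
# then look them up by the rendered message's AI ordinal.
run_id_by_ai_index = [run_of[seq] for seq in event_store_ai_seqs]
by_index = {seq: run_id_by_ai_index[i] for i, seq in enumerate(history_ai_seqs)}

# Ground truth: the run each rendered message actually belongs to.
by_seq = {seq: run_of[seq] for seq in history_ai_seqs}

mismatched = [seq for seq in history_ai_seqs if by_index[seq] != by_seq[seq]]
```

Under this assumed split, the first two rendered AI messages (seq=19 and seq=31) resolve to `run-1` even though they belong to `run-2`; once both paths read the same complete list, the ordinals line up again.
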
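
### 6.3 Impact of approach A on feedback

**Negative**: none. Feedback storage is unaffected.

**Positive (unexpected benefit)**: once `/history` switches to event_store, **the AI message sequences of the two data paths align automatically**, and the hidden bug from §6.2 is fixed as a side effect.

**Preconditions** (as important as the Top 3 changes):

- The new helper must use **the same message-fetching logic** as the `/messages` endpoint (same store, same filter). Otherwise the two paths can still drift apart in edge cases
- Specifically: **both sides must paginate fully**. Today the frontend hard-codes `/messages?limit=200`, so a thread with more than 200 messages gets truncated; the plan's `limit=1000` is capped in the same way. Mismatched caps → the two orderings stop aligning → the feedback mapping shifts
- **Must fix**: the `limit=200` in `useThreadFeedback` needs to become paginated fetching of everything, or the `/messages` backend should default to returning the full set
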
### 6.4 Impact on the frontend change order

The original plan claims "zero frontend changes", but with feedback taken into account this should be corrected to:

| Change | Required | Optional |
|---|---|---|
| Backend `/history` reads from event_store | ✅ | - |
| Backend helper paginates instead of using `limit=1000` | ✅ | - |
| Frontend `useThreadFeedback` paginates or raises its limit | ✅ | - |
| Add a guard to `runIdByAiIndex`: fall back to `undefined` on an out-of-range index (already present) | - | ✅ Already in place |
| Frontend renders directly from `/messages` (approach D) | - | ✅ Cleaner long term |

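### 6.5 Feedback-related additions to the Top 3

On top of the original Top 3, add:

4. **Frontend `useThreadFeedback` must paginate or fetch everything** (`frontend/src/core/threads/hooks.ts:679`), otherwise it still misaligns with the new full-set behavior of `/history`
5. **End-to-end test**: one thread spanning more than one run + summarize triggered + a thumbs-up on a historical AI message, confirming the feedback lands on the correct run_id
6. **TanStack Query cache coordination**: the `staleTime` / invalidation of `thread-feedback` and the history query must refresh in sync when a new run ends; otherwise `runIdByAiIndex` is stale after new messages are written and a thumbs-up lands on the previous run

---
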
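## 8. Open questions

- Actual performance of `RunEventStore.count_messages()` and `list_messages(after_seq=...)` (should be fine on SQLite at the scale of a few thousand messages, but not load-tested)
- Whether `MemoryRunEventStore` and `DbRunEventStore` have the same pagination semantics (Codex only checked `db.py`; `memory.py` needs confirming)
- Whether `/api/threads/{id}/messages` should be promoted to the frontend's primary endpoint, keeping `/history` as a pure checkpoint API; architecturally cleaner but more costly
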
@@ -0,0 +1,203 @@
# Summarize Marker in History: Design & Verification

**Date**: 2026-04-11
**Branch**: `rayhpeng/fix-persistence-new`
**Status**: Design approved, implementation deferred to a follow-up PR
**Depends on**: [`2026-04-11-runjournal-history-evaluation.md`](./2026-04-11-runjournal-history-evaluation.md) (the event-store-backed history fix this builds on)

---

## 1. Goal

Display a "summarization happened here" marker in the conversation history UI when `SummarizationMiddleware` has run mid-run, so users understand why earlier messages look condensed or missing. The event-store-backed `/history` fix already recovered the original messages; this spec adds a **visible marker** at the seq position where summarization occurred, optionally showing the generated summary text.

## 2. Investigation findings

### 2.1 Today's state: zero middleware records

Full scan of `run_events` in `backend/.deer-flow/data/deerflow.db`:

| category | rows |
|---|---:|
| trace | 76 |
| message | 34 |
| lifecycle | 8 |
| **middleware** | **0** |

No row has an `event_type` containing `summariz` or `middleware`. The middleware category is dead in production.

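
A scan of this kind can be reproduced with two queries. The sketch below runs them against an in-memory SQLite table rather than the real `deerflow.db`: the table is reduced to the two columns the scan needs, and the rows (including the `human_message` and `run_start` event types) are made up for illustration.

```python
import sqlite3

# In-memory stand-in for backend/.deer-flow/data/deerflow.db.
# Only `category` and `event_type` are modelled; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_events (category TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO run_events VALUES (?, ?)",
    [
        ("trace", "llm_request"),
        ("message", "human_message"),
        ("message", "ai_message"),
        ("lifecycle", "run_start"),
    ],
)

# Row count per category (the table above).
counts = dict(
    conn.execute("SELECT category, COUNT(*) FROM run_events GROUP BY category")
)

# Any event whose type hints at summarization or middleware activity.
suspects = conn.execute(
    "SELECT COUNT(*) FROM run_events "
    "WHERE event_type LIKE '%summariz%' OR event_type LIKE '%middleware%'"
).fetchone()[0]

middleware_rows = counts.get("middleware", 0)
```

Pointing `sqlite3.connect` at the real database file and dropping the `CREATE`/`INSERT` statements should give the per-category counts directly, assuming the production table uses these column names.
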

### 2.2 Why: two dead code paths in `journal.py`

| Location | Status |
|---|---|
| `journal.py:343-362`: `on_custom_event("summarization", ...)` writes one trace event + one `category="middleware"` event. | Dead. Only fires when something calls `adispatch_custom_event("summarization", {...})`. The upstream LangChain `SummarizationMiddleware` (`.venv/.../langchain/agents/middleware/summarization.py:272`) **never emits custom events**: its `before_model`/`abefore_model` just mutate messages in place and return `{'messages': new_messages}`. The callback is never triggered. |
| `journal.py:449`: `record_middleware(tag, *, name, hook, action, changes)` helper | Dead. Grep shows zero callers in the harness. Added speculatively, never wired up. |

### 2.3 Concrete evidence of summarize running unlogged

Thread `3d5dea4a-0983-4727-a4e8-41a64428933a`:

- `run_events` seq=1 → original human `"写一份关于deer-flow的详细技术报告"` ✓ (event store is fine)
- `run_events` seq=43 → `llm_request` trace whose `messages[0]` literal contains `"Here is a summary of the conversation to date:"`, which proves that SummarizationMiddleware did inject a summary mid-run
- Zero rows with `category='middleware'` for this thread → nothing captured for the UI to render

## 3. Approaches considered

### A. Subclass `SummarizationMiddleware` and dispatch a custom event

Wrap the upstream class, override `abefore_model`, and call `await adispatch_custom_event("summarization", {...})` after `super()`. The journal's existing `on_custom_event` path captures it.

### B. Frontend-only diff heuristic

Compare `event_store.count_messages()` against the rendered count and infer from the gap that summarization happened. **Rejected**: it can't pinpoint the position in the stream and can't show the summary text. It only yields a vague badge.

### C. Hybrid A + frontend inline card rendered at the middleware event's seq position

Same backend as A, plus the frontend renders an inline `[N messages condensed]` card at the correct chronological position. **Recommended terminal state**.

## 4. Subagent's wrong claim and its rebuttal

An independent agent flagged approach A as structurally broken because:

> `RunnableCallable(trace=False)` skips `set_config_context`, therefore `var_child_runnable_config` is never set, therefore `adispatch_custom_event` raises `RuntimeError("Unable to dispatch an adhoc event without a parent run id")`.

**This is wrong.** The user's contrary intuition was correct: `trace=False` does not prevent `adispatch_custom_event` from working, as long as the middleware signature explicitly accepts `config: RunnableConfig`. The mechanism:

1. `RunnableCallable.__init__` (`langgraph/_internal/_runnable.py:293-319`) inspects the function signature. If it accepts `config: RunnableConfig`, that parameter is recorded in `self.func_accepts`.
2. Both the `trace=True` and `trace=False` branches of `ainvoke` run the same kwarg-injection loop (`_runnable.py:349-356`): `if kw == "config": kw_value = config`. The `config` passed to `ainvoke` (from Pregel's `task.proc.ainvoke(task.input, config)` at `pregel/_retry.py:138`) is the task config with callbacks already bound.
3. Inside the middleware, passing that `config` explicitly to `adispatch_custom_event(..., config=config)` means the function doesn't rely on `var_child_runnable_config.get()` at all. The LangChain docstring at `langchain_core/callbacks/manager.py:2574-2579` even says "If using python 3.10 and async, you MUST specify the config parameter", which is exactly this path.

`trace=False` only changes whether **this runnable layer creates a new child callback scope**. It does not affect whether the outer-layer config (with callbacks including `RunJournal`) is passed down to the function.

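
The distinction can be illustrated with a toy model. The sketch below is not LangGraph's implementation: `TinyCallable`, `ambient_config` and `dispatch` are hypothetical stand-ins that only mimic the two behaviors described above, namely signature-based `config` injection on every path and an ambient context variable that is set only when `trace=True`.

```python
import asyncio
import contextvars
import inspect

# Toy counterpart of an ambient "current config" context variable.
ambient_config = contextvars.ContextVar("ambient_config", default=None)


class TinyCallable:
    """Toy model of signature-based config injection; not LangGraph's code."""

    def __init__(self, func, *, trace):
        self.func = func
        self.trace = trace
        # Inspect the signature once, as the real class is described to do.
        self.accepts_config = "config" in inspect.signature(func).parameters

    async def ainvoke(self, value, config):
        # Injection happens regardless of `trace`.
        kwargs = {"config": config} if self.accepts_config else {}
        if self.trace:
            # Only the traced branch publishes the config ambiently.
            token = ambient_config.set(config)
            try:
                return await self.func(value, **kwargs)
            finally:
                ambient_config.reset(token)
        return await self.func(value, **kwargs)


def dispatch(name, config=None):
    """Toy dispatcher: an explicit config wins, ambient lookup is the fallback."""
    cfg = config if config is not None else ambient_config.get()
    if cfg is None:
        raise RuntimeError("no parent run id")
    return (name, cfg["run_id"])


async def explicit(value, config):
    return dispatch("summarization", config=config)


async def implicit(value):
    return dispatch("summarization")


task_config = {"run_id": "r-1"}
explicit_result = asyncio.run(TinyCallable(explicit, trace=False).ainvoke({}, task_config))

try:
    asyncio.run(TinyCallable(implicit, trace=False).ainvoke({}, task_config))
    implicit_failed = False
except RuntimeError:
    implicit_failed = True
```

In this model the subagent's failure mode only appears for the function that relies on the ambient lookup; the function that declares and forwards `config` works with `trace=False`, which is the path approach A takes.
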

## 5. Verification

Ran `/tmp/verify_summarize_event.py` (standalone minimal reproduction):

- Minimal `AgentMiddleware` subclass with `abefore_model(self, state, runtime, config: RunnableConfig)`
- Calls `await adispatch_custom_event("summarization", {...}, config=config)` inside
- `create_agent(model=FakeChatModel, middleware=[probe])`
- `agent.ainvoke({...}, config={"callbacks": [RecordingHandler()]})`

**Result**:

```
INFO verify: ProbeMiddleware.abefore_model called
INFO verify: config keys: ['callbacks', 'configurable', 'metadata']
INFO verify: config.callbacks type: AsyncCallbackManager
INFO verify: config.metadata: {'langgraph_step': 1, 'langgraph_node': 'probe.before_model', ...}
INFO verify: on_custom_event fired: name=summarization
  run_id=019d7d19-1727-7830-aa33-648ecbee4b95
  data={'summary': 'fake summary', 'replaced_count': 3}
SUCCESS: approach A is viable (config injection + adispatch work)
```

All five predictions held:

1. ✅ `config: RunnableConfig` signature triggers auto-injection despite `trace=False`
2. ✅ `config.callbacks` is an `AsyncCallbackManager` with `parent_run_id` set
3. ✅ `adispatch_custom_event(..., config=config)` runs without error
4. ✅ `RecordingHandler.on_custom_event` receives the event
5. ✅ The received `run_id` is a valid UUID tied to the running graph

**Bonus finding**: `config.metadata` contains `langgraph_step` and `langgraph_node`. These can be included in the middleware event's metadata to help the frontend position the marker on the timeline.

## 6. Recommended implementation (approach C)

### 6.1 Backend

**New wrapper middleware** in `backend/packages/harness/deerflow/agents/lead_agent/agent.py`:

```python
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain_core.callbacks import adispatch_custom_event
from langchain_core.runnables import RunnableConfig


class _TrackingSummarizationMiddleware(SummarizationMiddleware):
    """Wraps upstream SummarizationMiddleware to emit a ``summarization``
    custom event on every actual summarization, so RunJournal can persist
    a middleware:summarize row to the event store.

    The upstream class does not emit events of its own. Declaring
    ``config: RunnableConfig`` in the override lets LangGraph's
    ``RunnableCallable`` inject the Pregel task config (with callbacks
    and parent_run_id) regardless of ``trace=False`` on the node.
    """

    async def abefore_model(self, state, runtime, config: RunnableConfig):
        before_count = len(state.get("messages") or [])
        result = await super().abefore_model(state, runtime)
        if result is None:
            # Upstream decided not to summarize; nothing to record.
            return None

        new_messages = result.get("messages") or []
        replaced_count = max(0, before_count - len(new_messages))
        summary_text = _extract_summary_text(new_messages)

        # Pass config explicitly so dispatch does not rely on ambient context.
        await adispatch_custom_event(
            "summarization",
            {
                "summary": summary_text,
                "replaced_count": replaced_count,
            },
            config=config,
        )
        return result


def _extract_summary_text(messages: list) -> str:
    """Pull the summary string out of the HumanMessage the upstream class
    injects as ``Here is a summary of the conversation to date:...``."""
    for msg in messages:
        if getattr(msg, "type", None) == "human":
            content = getattr(msg, "content", "")
            # Non-string (multi-part) content is treated as "no summary found".
            text = content if isinstance(content, str) else ""
            if text.startswith("Here is a summary of the conversation to date"):
                return text
    return ""
```

Swap the existing `SummarizationMiddleware()` instantiation in `_build_middlewares` for `_TrackingSummarizationMiddleware(...)` with the same args.

**Journal change**: **zero**. `on_custom_event("summarization", ...)` in `journal.py:343-362` already writes both a trace and a `category="middleware"` row.

**History helper change**: extend `_get_event_store_messages` in `backend/app/gateway/routers/threads.py` to surface `category="middleware"` rows as pseudo-messages, e.g.:

```python
# In the per-event loop, after the existing message branch.
# `raw` is not defined in this fragment; it is presumably the event's
# content payload bound earlier in the same loop.
if evt.get("category") == "middleware" and evt.get("event_type") == "middleware:summarize":
    meta = evt.get("metadata") or {}
    messages.append({
        "id": f"summary-marker-{evt['seq']}",
        "type": "summary_marker",
        "replaced_count": meta.get("replaced_count", 0),
        "summary": raw.get("content", "") if isinstance(raw, dict) else "",
        "run_id": evt.get("run_id"),
    })
```

The marker uses a sentinel `type` (`summary_marker`) that doesn't collide with any LangChain message type, so downstream consumers that loop over messages can skip or render it explicitly.

### 6.2 Frontend

- `core/messages/utils.ts`: extend the message grouping to recognize `type === "summary_marker"` and yield it as its own group (`"assistant:summary-marker"`)
- `components/workspace/messages/message-list.tsx`: add a branch in the grouped render switch that renders a distinctive inline card showing `N messages condensed` and a collapsible panel with the summary text
- No changes to feedback logic: the marker has no `feedback` field so the button naturally doesn't render on it

## 7. Risks

1. **Synchronous path**. The upstream class has both `before_model` and `abefore_model`. Our wrapper only overrides the async variant. If any deer-flow code path ever uses the sync flow, those summarizations won't be captured. Mitigation: also override `before_model` and use `dispatch_custom_event` (sync variant) with the same pattern.
2. **`_extract_summary_text` fragility**. It depends on the upstream class prefix `"Here is a summary of the conversation to date"` in the injected `HumanMessage`. Any upstream template change breaks detection. Mitigation: pick the first new `HumanMessage` that wasn't in `state["messages"]` before `super()`, which is resilient to template wording changes at the cost of a small diff helper.
3. **`replaced_count` accuracy under concurrent updates**. If another middleware in the chain also modifies `state["messages"]` before `super()` returns, the naive `before_count - len(new_messages)` arithmetic is wrong. Mitigation: inspect the `RemoveMessage(id=REMOVE_ALL_MESSAGES)` that upstream emits and count from the original input list directly.
4. **History helper contract change**. Introducing a non-LangChain-typed entry (`type="summary_marker"`) in the `/history` response could break frontend code that blindly casts entries to `Message`. Mitigation: the frontend change above adds an explicit branch; type-check the frontend end-to-end before merging.

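
The mitigation for risk 2 could be a helper along the following lines. This is a sketch under two assumptions that are not confirmed by the upstream source: surviving messages are passed through as the same objects, and the injected summary is a newly created message. `Msg` and `find_injected_summary` are hypothetical names; `Msg` stands in for a LangChain message and models only `type` and `content`.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Msg:
    """Hypothetical stand-in for a LangChain message (only type/content)."""

    type: str
    content: str


def find_injected_summary(before, after):
    """Return the content of the first human message in `after` that was not
    already present (by object identity) in `before`, or "" if none is new."""
    seen = {id(m) for m in before}
    for msg in after:
        if msg.type == "human" and id(msg) not in seen:
            return msg.content if isinstance(msg.content, str) else ""
    return ""


old = [Msg("human", "q1"), Msg("ai", "a1"), Msg("human", "q2")]
# The summary message is new; the last human message is carried over as-is.
new = [Msg("human", "Summary so far: ..."), old[2]]

summary = find_injected_summary(old, new)
no_change = find_injected_summary(old, old)
```

Identity comparison avoids depending on message ids, which the companion evaluation notes are `None` for human messages; if upstream ever copies the preserved messages, the helper would need a content-based comparison instead.
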

## 8. Out of scope / deferred

- Other middleware types (Title, Guardrail, HITL) do not emit custom events either. If we want markers for those too, repeat the wrapper pattern for each. Not in this design.
- Retroactive markers for old threads (captured before this patch) are impossible without re-running the graph. Legacy threads will show the event-store-recovered messages without a marker.
- Standard mode (`make dev`): the agent runs inside LangGraph Server, not the Gateway-embedded runtime. `RunJournal` may not be wired there, so the custom event fires but is captured by no one. Tracked as a separate follow-up.

## 9. Next actions

1. Land the current summarize-message-loss fixes (journal `Command` unwrap + event-store-backed `/history` + inline feedback); implementation verified, being committed now as three commits on `rayhpeng/fix-persistence-new`
2. Summarize-marker implementation (this spec) → separate follow-up PR based on the above verified design