``get_thread_history`` and ``get_thread_state`` in Gateway mode read
messages from ``checkpoint.channel_values["messages"]``. After
SummarizationMiddleware runs mid-run, that list is rewritten in-place:
pre-summarize messages are dropped and a synthetic summary-as-human
message takes position 0. The frontend then renders a chat history that
starts with ``"Here is a summary of the conversation to date:..."``
instead of the user's original query, and all earlier turns are gone.
The event store (``RunEventStore``) is append-only and never rewritten,
so it retains the full transcript. This commit adds a helper
``_get_event_store_messages`` that loads the event store's message
stream and overrides ``values["messages"]`` in both endpoints; the
checkpoint fallback kicks in only when the event store is unavailable.
Behavior contract of the helper:
- **Full pagination.** ``list_messages`` returns the newest ``limit``
records when no cursor is given, so a fixed limit silently drops
older messages on long threads. The helper sizes the read from
``count_messages()`` and pages forward with ``after_seq`` cursors.
- **Copy-on-read.** Each content dict is copied before ``id`` is
patched so the live store object (``MemoryRunEventStore`` returns
references) is never mutated.
- **Stable ids.** Messages with ``id=None`` (human + tool_result,
which don't receive an id until checkpoint persistence) get a
deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so
React keys stay stable across requests. AI messages keep their
LLM-assigned ``lc_run--*`` ids.
- **Legacy ``Command`` repr sanitization.** Rows captured before the
``journal.py`` ``on_tool_end`` fix (previous commit) stored
``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))``
as the tool_result content. ``_sanitize_legacy_command_repr``
regex-extracts the inner text so old threads render cleanly.
- **Inline feedback.** When loading the stream, the helper also pulls
``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to
every message plus ``feedback`` to the final ``ai_message`` of each
run. This removes the frontend's need to fetch a second endpoint
and positional-index-map its way back to the right run. When the
feedback subsystem is unavailable, the ``feedback`` field is left
absent entirely so the frontend hides the button rather than
rendering it over a broken write path.
- **User context.** ``DbRunEventStore`` is user-scoped by default via
``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission``
decorator having populated the user contextvar on both callers; the
docstring documents this dependency explicitly so nobody wires it
into a CLI or migration script without passing ``user_id=None``.
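
A minimal sketch of the pagination, copy-on-read and stable-id rules above. ``FakeEventStore`` and ``load_all_messages`` are hypothetical names, and the event shape (``seq`` plus a ``content`` dict) is an assumption; only ``count_messages``, ``list_messages``, ``limit``, ``after_seq`` and the ``uuid5`` recipe come from this commit message.

```python
import uuid


class FakeEventStore:
    """Hypothetical in-memory stand-in; not the real RunEventStore API."""

    def __init__(self, events):
        self._events = sorted(events, key=lambda e: e["seq"])

    def count_messages(self, thread_id):
        return len(self._events)

    def list_messages(self, thread_id, limit, after_seq=None):
        if after_seq is None:
            # No cursor: newest `limit` records, which is what drops old turns.
            return self._events[-limit:]
        later = [e for e in self._events if e["seq"] > after_seq]
        return later[:limit]  # returns live references, not copies


def load_all_messages(store, thread_id, page_size=2):
    """Page forward from the start until every counted message is read."""
    total = store.count_messages(thread_id)
    messages, cursor = [], 0  # assumes seq numbering starts at 1
    while len(messages) < total:
        page = store.list_messages(thread_id, limit=page_size, after_seq=cursor)
        if not page:
            break
        for evt in page:
            content = dict(evt["content"])  # copy-on-read: never patch the store
            if content.get("id") is None:
                # Deterministic id so frontend keys stay stable across requests.
                content["id"] = str(
                    uuid.uuid5(uuid.NAMESPACE_URL, f"{thread_id}:{evt['seq']}")
                )
            messages.append(content)
        cursor = page[-1]["seq"]
    return messages
```

Calling ``load_all_messages`` twice on the same store should yield identical ids while leaving the stored ``id=None`` values untouched.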
Real data verification against thread
``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages
(summarize-corrupted), event store had 16. The original human message
``"最新伊美局势"`` was preserved as seq=1 in the event store and
correctly restored to position 0 in the helper output. Helper output
for AI messages was byte-identical to checkpoint for every overlapping
message; only tool_result ids differed (patched to uuid5) and the
legacy Command repr at seq=48 was sanitized.
Tests:
- ``test_thread_state_event_store.py`` — 18 tests covering
``_sanitize_legacy_command_repr`` (passthrough, single/double-quote
extraction, unparseable fallback), helper happy path (all message
types, stable uuid5, store non-mutation), multi-page pagination,
summarize regression (recovers pre-summarize messages), feedback
attachment (per-run, multi-run threads, repo failure graceful),
and dependency failure fallback to ``None``.
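
The sanitizer cases listed above (passthrough, single/double-quote extraction, unparseable fallback) can be illustrated with a small regex sketch. ``sanitize_command_repr`` and its pattern are hypothetical and are not the implementation of ``_sanitize_legacy_command_repr``; in particular, this sketch does not handle escaped quotes inside the content.

```python
import re

# Matches ToolMessage(content='X', ...) or ToolMessage(content="X", ...).
_COMMAND_CONTENT_RE = re.compile(
    r"""ToolMessage\(content=(?P<q>['"])(?P<text>.*?)(?P=q)\s*[,)]""",
    re.DOTALL,
)


def sanitize_command_repr(content):
    """Return the inner tool text if `content` looks like str(Command(...))."""
    if not isinstance(content, str) or not content.startswith("Command("):
        return content  # passthrough: normal tool results are untouched
    match = _COMMAND_CONTENT_RE.search(content)
    if match is None:
        return content  # unparseable: fall back to the raw string
    return match.group("text")
```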
Docs:
- ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the
implementation plan this commit realizes, with Task 1 revised after
the evaluation findings (pagination, copy-on-read, Command wrap
already landed in journal.py, frontend feedback pagination in the
follow-up commit, Standard-mode follow-up noted).
- ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md``
— the Claude + second-opinion evaluation document that drove the
plan revisions (pagination bug, dict-mutation bug, feedback hidden
bug, Command bug).
- ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` —
design for a follow-up PR that visually marks summarize events in
history, based on a verified ``adispatch_custom_event`` experiment
(``trace=False`` middleware nodes can still forward the Pregel task
config via explicit signature injection).
Scope: Gateway mode only (``make dev-pro``). Standard mode
(``make dev``) hits LangGraph Server directly and bypasses these
endpoints; the summarize symptom is still present there and is
tracked as a separate follow-up in the plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summarize Marker in History — Design & Verification
Date: 2026-04-11
Branch: rayhpeng/fix-persistence-new
Status: Design approved, implementation deferred to a follow-up PR
Depends on: 2026-04-11-runjournal-history-evaluation.md (the event-store-backed history fix this builds on)
1. Goal
Display a "summarization happened here" marker in the conversation history UI when SummarizationMiddleware ran mid-run, so users understand why earlier messages look condensed or missing. The event-store-backed /history fix already recovered the original messages; this spec adds a visible marker at the seq position where summarization occurred, optionally showing the generated summary text.
2. Investigation findings
2.1 Today's state: zero middleware records
Full scan of backend/.deer-flow/data/deerflow.db run_events:
| category | rows |
|---|---|
| trace | 76 |
| message | 34 |
| lifecycle | 8 |
| middleware | 0 |
No row has event_type containing summariz or middleware. The middleware category is dead in production.
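
A scan like the one above could be reproduced with a short sqlite3 sketch. The table name run_events and the columns category and event_type come from this document; the helper names and the exact SQL are assumptions, not the query that was actually run.

```python
import sqlite3


def count_events_by_category(conn):
    """Return {category: row_count} for the run_events table."""
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM run_events "
        "GROUP BY category ORDER BY COUNT(*) DESC"
    ).fetchall()
    return dict(rows)


def count_summarize_like(conn):
    """Count rows whose event_type mentions summariz or middleware."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM run_events "
        "WHERE event_type LIKE '%summariz%' OR event_type LIKE '%middleware%'"
    ).fetchone()
    return n
```

GROUP BY omits categories with no rows, so a dead category such as middleware shows up as a missing key rather than an explicit 0; use counts.get("middleware", 0) when building the table.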
2.2 Why: two dead code paths in journal.py
| Location | Status |
|---|---|
| journal.py:343-362: on_custom_event("summarization", ...) writes one trace event + one category="middleware" event. | Dead. Only fires when something calls adispatch_custom_event("summarization", {...}). The upstream LangChain SummarizationMiddleware (.venv/.../langchain/agents/middleware/summarization.py:272) never emits custom events; its before_model/abefore_model just mutate messages in place and return {'messages': new_messages}. Callback never triggered. |
| journal.py:449: record_middleware(tag, *, name, hook, action, changes) helper | Dead. Grep shows zero callers in the harness. Added speculatively, never wired up. |
2.3 Concrete evidence of summarize running unlogged
Thread 3d5dea4a-0983-4727-a4e8-41a64428933a:
- run_events seq=1 → original human "写一份关于deer-flow的详细技术报告" ✓ (event store is fine)
- run_events seq=43 → llm_request trace whose messages[0] literal contains "Here is a summary of the conversation to date:"; this is proof that SummarizationMiddleware did inject a summary mid-run
- Zero rows with category='middleware' for this thread → nothing captured for UI to render
3. Approaches considered
A. Subclass SummarizationMiddleware and dispatch a custom event
Wrap the upstream class, override abefore_model, call await adispatch_custom_event("summarization", {...}) after super(). Journal's existing on_custom_event path captures it.
B. Frontend-only diff heuristic
Compare event_store.count_messages() vs rendered count, infer summarization happened from the gap. Rejected: can't pinpoint position in the stream, can't show summary text. Only yields a vague badge.
C. Hybrid A + frontend inline card rendered at the middleware event's seq position
Same backend as A, plus frontend renders an inline [N messages condensed] card at the correct chronological position. Recommended terminal state.
4. Subagent's wrong claim and its rebuttal
An independent agent flagged approach A as structurally broken because:
RunnableCallable(trace=False) skips set_config_context, therefore var_child_runnable_config is never set, therefore adispatch_custom_event raises RuntimeError("Unable to dispatch an adhoc event without a parent run id").
This is wrong. The user's counter-intuition was correct: trace=False does not prevent adispatch_custom_event from working, as long as the middleware signature explicitly accepts config: RunnableConfig. The mechanism:
- RunnableCallable.__init__ (langgraph/_internal/_runnable.py:293-319) inspects the function signature. If it accepts config: RunnableConfig, that parameter is recorded in self.func_accepts.
- Both trace=True and trace=False branches of ainvoke run the same kwarg-injection loop (_runnable.py:349-356): if kw == "config": kw_value = config. The config passed to ainvoke (from Pregel's task.proc.ainvoke(task.input, config) at pregel/_retry.py:138) is the task config with callbacks already bound.
- Inside the middleware, passing that config explicitly to adispatch_custom_event(..., config=config) means the function doesn't rely on var_child_runnable_config.get() at all. The LangChain docstring at langchain_core/callbacks/manager.py:2574-2579 even says "If using python 3.10 and async, you MUST specify the config parameter", which is exactly this path.
trace=False only changes whether this runnable layer creates a new child callback scope. It does not affect whether the outer-layer config (with callbacks including RunJournal) is passed down to the function.
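
The mechanism can be illustrated with a toy model in plain Python. ToyRunnableCallable, accepts_config and child_scope are hypothetical names; this is a simplified sketch of signature-based injection and is not LangGraph's RunnableCallable code.

```python
import inspect


class ToyRunnableCallable:
    """Toy model: inject `config` based on the wrapped function's signature."""

    def __init__(self, func, trace=True):
        self.func = func
        self.trace = trace
        # Inspect the signature once, at construction time.
        self.accepts_config = "config" in inspect.signature(func).parameters

    def invoke(self, value, config):
        if self.trace:
            # A tracing layer would open a child callback scope here; the toy
            # only marks it. The outer callbacks are still carried along.
            config = {**config, "child_scope": True}
        # Same injection step on both branches, regardless of `trace`.
        kwargs = {"config": config} if self.accepts_config else {}
        return self.func(value, **kwargs)


def probe(state, config):
    # Declares `config`, so the wrapper passes the task config in.
    return config["callbacks"]


def plain(state):
    # Declares no `config`, so nothing is injected.
    return "no config"
```

In this toy, ToyRunnableCallable(probe, trace=False).invoke({}, cfg) returns the same callbacks object the caller supplied, which mirrors the claim that trace=False does not cut the function off from the outer config.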
5. Verification
Ran /tmp/verify_summarize_event.py (standalone minimal reproduction):
- Minimal AgentMiddleware subclass with abefore_model(self, state, runtime, config: RunnableConfig)
- Calls await adispatch_custom_event("summarization", {...}, config=config) inside
- create_agent(model=FakeChatModel, middleware=[probe])
- agent.ainvoke({...}, config={"callbacks": [RecordingHandler()]})
Result:

    INFO verify: ProbeMiddleware.abefore_model called
    INFO verify: config keys: ['callbacks', 'configurable', 'metadata']
    INFO verify: config.callbacks type: AsyncCallbackManager
    INFO verify: config.metadata: {'langgraph_step': 1, 'langgraph_node': 'probe.before_model', ...}
    INFO verify: on_custom_event fired: name=summarization
        run_id=019d7d19-1727-7830-aa33-648ecbee4b95
        data={'summary': 'fake summary', 'replaced_count': 3}
    SUCCESS: approach A is viable (config injection + adispatch work)

All five predictions held:
- ✅ config: RunnableConfig signature triggers auto-injection despite trace=False
- ✅ config.callbacks is an AsyncCallbackManager with parent_run_id set
- ✅ adispatch_custom_event(..., config=config) runs without error
- ✅ RecordingHandler.on_custom_event receives the event
- ✅ The received run_id is a valid UUID tied to the running graph
Bonus finding: config.metadata contains langgraph_step and langgraph_node. These can be included in the middleware event's metadata to help the frontend position the marker on the timeline.
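
One way to carry those two keys into the event payload is a small picker over the injected config. marker_position_metadata is a hypothetical helper name; only the langgraph_step and langgraph_node keys come from the verification output above.

```python
def marker_position_metadata(config):
    """Pick the timeline-related keys out of config['metadata'], if present."""
    metadata = (config or {}).get("metadata") or {}
    wanted = ("langgraph_step", "langgraph_node")
    return {key: metadata[key] for key in wanted if key in metadata}
```

The result could be merged into the dict passed to adispatch_custom_event, so the frontend receives the step and node alongside summary and replaced_count.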
6. Recommended implementation (approach C)
6.1 Backend
New wrapper middleware in backend/packages/harness/deerflow/agents/lead_agent/agent.py:

    from langchain.agents.middleware.summarization import SummarizationMiddleware
    from langchain_core.callbacks import adispatch_custom_event
    from langchain_core.runnables import RunnableConfig


    class _TrackingSummarizationMiddleware(SummarizationMiddleware):
        """Wraps upstream SummarizationMiddleware to emit a ``summarization``
        custom event on every actual summarization, so RunJournal can persist
        a middleware:summarize row to the event store.

        The upstream class does not emit events of its own. Declaring
        ``config: RunnableConfig`` in the override lets LangGraph's
        ``RunnableCallable`` inject the Pregel task config (with callbacks
        and parent_run_id) regardless of ``trace=False`` on the node.
        """

        async def abefore_model(self, state, runtime, config: RunnableConfig):
            before_count = len(state.get("messages") or [])
            result = await super().abefore_model(state, runtime)
            if result is None:
                return None

            new_messages = result.get("messages") or []
            replaced_count = max(0, before_count - len(new_messages))
            summary_text = _extract_summary_text(new_messages)

            await adispatch_custom_event(
                "summarization",
                {
                    "summary": summary_text,
                    "replaced_count": replaced_count,
                },
                config=config,
            )
            return result


    def _extract_summary_text(messages: list) -> str:
        """Pull the summary string out of the HumanMessage the upstream class
        injects as ``Here is a summary of the conversation to date:...``."""
        for msg in messages:
            if getattr(msg, "type", None) == "human":
                content = getattr(msg, "content", "")
                text = content if isinstance(content, str) else ""
                if text.startswith("Here is a summary of the conversation to date"):
                    return text
        return ""

Swap the existing SummarizationMiddleware() instantiation in _build_middlewares for _TrackingSummarizationMiddleware(...) with the same args.
Journal change: zero. on_custom_event("summarization", ...) in journal.py:343-362 already writes both a trace and a category="middleware" row.
History helper change: extend _get_event_store_messages in backend/app/gateway/routers/threads.py to surface category="middleware" rows as pseudo-messages, e.g.:

    # In the per-event loop, after the existing message branch:
    if evt.get("category") == "middleware" and evt.get("event_type") == "middleware:summarize":
        meta = evt.get("metadata") or {}
        messages.append({
            "id": f"summary-marker-{evt['seq']}",
            "type": "summary_marker",
            "replaced_count": meta.get("replaced_count", 0),
            "summary": (raw or {}).get("content", "") if isinstance(raw, dict) else "",
            "run_id": evt.get("run_id"),
        })

The marker uses a sentinel type (summary_marker) that doesn't collide with any LangChain message type, so downstream consumers that loop over messages can skip or render it explicitly.
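
As an illustration of how a downstream consumer might handle the sentinel, the sketch below separates markers from ordinary messages. split_markers is a hypothetical helper; it assumes each history entry is a dict with a type key, as in the snippet above.

```python
SUMMARY_MARKER_TYPE = "summary_marker"


def split_markers(history):
    """Return (messages, markers), preserving the original order in each list."""
    messages, markers = [], []
    for entry in history:
        if entry.get("type") == SUMMARY_MARKER_TYPE:
            markers.append(entry)
        else:
            messages.append(entry)
    return messages, markers
```

A consumer that only understands LangChain message types can keep the first list and ignore the second, while a renderer can use both.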
6.2 Frontend
- core/messages/utils.ts: extend the message grouping to recognize type === "summary_marker" and yield it as its own group ("assistant:summary-marker")
- components/workspace/messages/message-list.tsx: add a branch in the grouped render switch that renders a distinctive inline card showing N messages condensed and a collapsible panel with the summary text
- No changes to feedback logic: the marker has no feedback field so the button naturally doesn't render on it
7. Risks
- Synchronous path. The upstream class has both before_model and abefore_model. Our wrapper only overrides the async variant. If any deer-flow code path ever uses the sync flow, those summarizations won't be captured. Mitigation: also override before_model and use dispatch_custom_event (sync variant) with the same pattern.
- _extract_summary_text fragility. It depends on the upstream class prefix "Here is a summary of the conversation to date" in the injected HumanMessage. Any upstream template change breaks detection. Mitigation: pick the first new HumanMessage that wasn't in state["messages"] before super(); this is resilient to template wording changes at the cost of a small diff helper.
- replaced_count accuracy under concurrent updates. If another middleware in the chain also modifies state["messages"] before super() returns, the naive before_count - len(new_messages) arithmetic is wrong. Mitigation: inspect the RemoveMessage(id=REMOVE_ALL_MESSAGES) that upstream emits and count from the original input list directly.
- History helper contract change. Introducing a non-LangChain-typed entry (type="summary_marker") in the /history response could break frontend code that blindly casts entries to Message. Mitigation: the frontend change above adds an explicit branch; type-check the frontend end-to-end before merging.
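
The diff-helper mitigation for _extract_summary_text could look roughly like the sketch below. Msg is a hypothetical stand-in for a LangChain message, and find_injected_summary is an illustrative name; the sketch assumes every message in state already carries an id, which should be checked against the real state before relying on it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message (type, id, content)."""
    type: str
    id: Optional[str]
    content: str


def find_injected_summary(before: Sequence[Msg], after: Sequence[Msg]) -> Optional[Msg]:
    """Return the first human message in `after` whose id was not in `before`."""
    known_ids = {m.id for m in before if m.id is not None}
    for msg in after:
        if msg.type == "human" and msg.id not in known_ids:
            return msg  # new human message: treated as the injected summary
    return None
```

Because the match is by id rather than by text prefix, a change to the upstream summary template would not affect detection. A human message with id=None would be reported as new, which is why the id assumption matters.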
8. Out of scope / deferred
- Other middleware types (Title, Guardrail, HITL) do not emit custom events either. If we want markers for those too, repeat the wrapper pattern for each. Not in this design.
- Retroactive markers for old threads (captured before this patch) are impossible without re-running the graph. Legacy threads will show the event-store-recovered messages without a marker.
- Standard mode (make dev): agent runs inside LangGraph Server, not the Gateway-embedded runtime. RunJournal may not be wired there, so the custom event fires but is captured by no one. Tracked as a separate follow-up.
9. Next actions
- Land the current summarize-message-loss fixes (journal Command unwrap + event-store-backed /history + inline feedback): implementation verified, being committed now as three commits on rayhpeng/fix-persistence-new
- Summarize-marker implementation (this spec) → separate follow-up PR based on the above verified design