
rayhpeng 229c8095be fix(threads): load history messages from event store, immune to summarize

``get_thread_history`` and ``get_thread_state`` in Gateway mode read
messages from ``checkpoint.channel_values["messages"]``. After
SummarizationMiddleware runs mid-run, that list is rewritten in-place:
pre-summarize messages are dropped and a synthetic summary-as-human
message takes position 0. The frontend then renders a chat history that
starts with ``"Here is a summary of the conversation to date:..."``
instead of the user's original query, and all earlier turns are gone.

The event store (``RunEventStore``) is append-only and never rewritten,
so it retains the full transcript. This commit adds a helper
``_get_event_store_messages`` that loads the event store's message
stream and overrides ``values["messages"]`` in both endpoints; the
checkpoint fallback kicks in only when the event store is unavailable.

Behavior contract of the helper:

- **Full pagination.** ``list_messages`` returns the newest ``limit``
  records when no cursor is given, so a fixed limit silently drops
  older messages on long threads. The helper sizes the read from
  ``count_messages()`` and pages forward with ``after_seq`` cursors.
- **Copy-on-read.** Each content dict is copied before ``id`` is
  patched so the live store object (``MemoryRunEventStore`` returns
  references) is never mutated.
- **Stable ids.** Messages with ``id=None`` (human + tool_result,
  which don't receive an id until checkpoint persistence) get a
  deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so
  React keys stay stable across requests. AI messages keep their
  LLM-assigned ``lc_run--*`` ids.
- **Legacy ``Command`` repr sanitization.** Rows captured before the
  ``journal.py`` ``on_tool_end`` fix (previous commit) stored
  ``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))``
  as the tool_result content. ``_sanitize_legacy_command_repr``
  regex-extracts the inner text so old threads render cleanly.
- **Inline feedback.** When loading the stream, the helper also pulls
  ``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to
  every message plus ``feedback`` to the final ``ai_message`` of each
  run. This removes the frontend's need to fetch a second endpoint
  and positional-index-map its way back to the right run. When the
  feedback subsystem is unavailable, the ``feedback`` field is left
  absent entirely so the frontend hides the button rather than
  rendering it over a broken write path.
- **User context.** ``DbRunEventStore`` is user-scoped by default via
  ``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission``
  decorator having populated the user contextvar on both callers; the
  docstring documents this dependency explicitly so nobody wires it
  into a CLI or migration script without passing ``user_id=None``.
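The pagination, copy-on-read, stable-id and feedback rules above can be sketched as follows. This is a minimal illustration, not the actual ``_get_event_store_messages``. The following are all hypothetical:

- ``FakeEventStore`` and ``load_history``.
- The row shape (``seq`` / ``run_id`` / ``content``) and message ``type`` values.
- The assumption that seqs start at 1, so ``after_seq=0`` reads from the beginning.

Only ``count_messages``, ``list_messages``, ``after_seq`` and the uuid5 scheme come from the text. Passing ``feedback_by_run=None`` models "feedback subsystem unavailable".

```python
import uuid
from typing import Any, Dict, List, Optional


class FakeEventStore:
    """Hypothetical in-memory stand-in for RunEventStore (signatures assumed)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        # Row dicts are handed out by reference, like MemoryRunEventStore.
        self._rows = sorted(rows, key=lambda r: r["seq"])

    def count_messages(self, thread_id: str) -> int:
        return len(self._rows)

    def list_messages(
        self, thread_id: str, *, limit: int, after_seq: Optional[int] = None
    ) -> List[Dict[str, Any]]:
        if after_seq is None:
            return self._rows[-limit:]  # newest `limit` rows when no cursor is given
        return [r for r in self._rows if r["seq"] > after_seq][:limit]


def load_history(
    store: FakeEventStore,
    thread_id: str,
    *,
    page_size: int = 100,
    feedback_by_run: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    total = store.count_messages(thread_id)  # size the read up front
    messages: List[Dict[str, Any]] = []
    cursor = 0  # assumption: seq starts at 1, so after_seq=0 means "from the start"
    while len(messages) < total:
        page = store.list_messages(thread_id, limit=page_size, after_seq=cursor)
        if not page:
            break
        for row in page:
            msg = dict(row["content"])  # copy-on-read: never patch the live dict
            if msg.get("id") is None:
                # Deterministic id so React keys stay stable across requests.
                msg["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{thread_id}:{row['seq']}"))
            msg["run_id"] = row["run_id"]
            messages.append(msg)
        cursor = page[-1]["seq"]  # page forward with an after_seq cursor

    if feedback_by_run is not None:  # None = feedback subsystem unavailable
        last_ai: Dict[str, int] = {}
        for index, msg in enumerate(messages):
            if msg.get("type") == "ai":
                last_ai[msg["run_id"]] = index  # later AI messages overwrite earlier ones
        for run_id, index in last_ai.items():
            messages[index]["feedback"] = feedback_by_run.get(run_id)
    return messages


rows = [
    {"seq": 1, "run_id": "r1", "content": {"type": "human", "id": None, "content": "hi"}},
    {"seq": 2, "run_id": "r1", "content": {"type": "ai", "id": "lc_run--a", "content": "hello"}},
    {"seq": 3, "run_id": "r1", "content": {"type": "tool", "id": None, "content": "42"}},
    {"seq": 4, "run_id": "r1", "content": {"type": "ai", "id": "lc_run--b", "content": "done"}},
]
store = FakeEventStore(rows)
history = load_history(store, "t1", page_size=3, feedback_by_run={"r1": {"rating": 1}})
```

With ``page_size=3`` the four rows arrive over two pages. A single cursor-less ``list_messages(limit=2)`` call would have returned only seq 3 and 4.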

Real data verification against thread
``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages
(summarize-corrupted), event store had 16. The original human message
``"最新伊美局势"`` ("the latest Iran-US situation") was preserved as seq=1 in the event store and
correctly restored to position 0 in the helper output. Helper output
for AI messages was byte-identical to checkpoint for every overlapping
message; only tool_result ids differed (patched to uuid5) and the
legacy Command repr at seq=48 was sanitized.

Tests:
- ``test_thread_state_event_store.py`` — 18 tests covering
  ``_sanitize_legacy_command_repr`` (passthrough, single/double-quote
  extraction, unparseable fallback), helper happy path (all message
  types, stable uuid5, store non-mutation), multi-page pagination,
  summarize regression (recovers pre-summarize messages), feedback
  attachment (per-run, multi-run threads, repo failure graceful),
  and dependency failure fallback to ``None``.
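The sanitizer sketch below covers the behaviors listed in the tests above: passthrough, single/double-quote extraction and unparseable fallback. It is hypothetical, not the committed ``_sanitize_legacy_command_repr``. Two parts are assumptions about how the inner text should be recovered:

- The regex tolerates backslash escapes.
- ``ast.literal_eval`` is used to undo ``repr()`` escaping.

```python
import ast
import re

# Captures the quoted content argument of the first ToolMessage in the repr.
# Group 1 is the quote character; group 2 is the body, allowing backslash escapes.
_TOOL_CONTENT_RE = re.compile(r"""ToolMessage\(content=(['"])((?:\\.|(?!\1).)*)\1""", re.DOTALL)


def sanitize_legacy_command_repr(text: str) -> str:
    """Return the inner ToolMessage text for legacy Command reprs, else text unchanged."""
    if not text.startswith("Command("):
        return text  # passthrough: ordinary tool output
    match = _TOOL_CONTENT_RE.search(text)
    if match is None:
        return text  # unparseable: fall back to the raw row content
    quote, body = match.group(1), match.group(2)
    try:
        # Undo repr() escaping (\n, \', ...) by evaluating the string literal.
        return ast.literal_eval(quote + body + quote)
    except (ValueError, SyntaxError):
        return body
```

Python's ``repr`` switches to double quotes when the string contains a single quote, which is why both quote styles have to be handled.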

Docs:
- ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the
  implementation plan this commit realizes, with Task 1 revised after
  the evaluation findings (pagination, copy-on-read, Command wrap
  already landed in journal.py, frontend feedback pagination in the
  follow-up commit, Standard-mode follow-up noted).
- ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md``
  — the Claude + second-opinion evaluation document that drove the
  plan revisions (pagination bug, dict-mutation bug, feedback hidden
  bug, Command bug).
- ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` —
  design for a follow-up PR that visually marks summarize events in
  history, based on a verified ``adispatch_custom_event`` experiment
  (``trace=False`` middleware nodes can still forward the Pregel task
  config via explicit signature injection).

Scope: Gateway mode only (``make dev-pro``). Standard mode
(``make dev``) hits LangGraph Server directly and bypasses these
endpoints; the summarize symptom is still present there and is
tracked as a separate follow-up in the plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-11 23:38:53 +08:00


Summarize Marker in History — Design & Verification

- Date: 2026-04-11
- Branch: `rayhpeng/fix-persistence-new`
- Status: Design approved, implementation deferred to a follow-up PR
- Depends on: `2026-04-11-runjournal-history-evaluation.md` (the event-store-backed history fix this builds on)

1. Goal

Display a "summarization happened here" marker in the conversation history UI when SummarizationMiddleware ran mid-run, so users understand why earlier messages look condensed or missing. The event-store-backed /history fix already recovered the original messages; this spec adds a visible marker at the seq position where summarization occurred, optionally showing the generated summary text.

2. Investigation findings

2.1 Today's state: zero middleware records

Full scan of the `run_events` table in `backend/.deer-flow/data/deerflow.db`:

| category | rows |
|---|---|
| trace | 76 |
| message | 34 |
| lifecycle | 8 |
| middleware | 0 |

No row has an `event_type` containing `summariz` or `middleware`. The `middleware` category is dead in production.
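The scan can be reproduced with two SQL queries. In this sketch:

- `scan_run_events` is a hypothetical helper.
- The demo uses an in-memory table with only the two columns the text mentions (`category`, `event_type`).
- The `event_type` values in the demo rows are invented.

Passing `backend/.deer-flow/data/deerflow.db` to `sqlite3.connect` would run the same queries against real data, assuming the production table uses those column names.

```python
import sqlite3
from typing import Dict, Tuple


def scan_run_events(conn: sqlite3.Connection) -> Tuple[Dict[str, int], int]:
    """Row counts per category, plus rows whose event_type mentions summarization/middleware."""
    per_category = dict(
        conn.execute("SELECT category, COUNT(*) FROM run_events GROUP BY category").fetchall()
    )
    (suspect,) = conn.execute(
        "SELECT COUNT(*) FROM run_events "
        "WHERE event_type LIKE '%summariz%' OR event_type LIKE '%middleware%'"
    ).fetchone()
    return per_category, suspect


# Demo against a throwaway in-memory table; the two-column schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_events (category TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO run_events VALUES (?, ?)",
    [
        ("trace", "llm_request"),
        ("trace", "llm_response"),
        ("message", "ai_message"),
        ("lifecycle", "run_start"),
    ],
)
counts, suspect = scan_run_events(conn)
```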

2.2 Why: two dead code paths in `journal.py`

| Location | Status |
|---|---|
| `journal.py:343-362`: `on_custom_event("summarization", ...)` writes one trace event + one `category="middleware"` event. | Dead. Only fires when something calls `adispatch_custom_event("summarization", {...})`. The upstream LangChain `SummarizationMiddleware` (`.venv/.../langchain/agents/middleware/summarization.py:272`) never emits custom events; its `before_model`/`abefore_model` just mutate messages in place and return `{'messages': new_messages}`. The callback is never triggered. |
| `journal.py:449`: `record_middleware(tag, *, name, hook, action, changes)` helper | Dead. Grep shows zero callers in the harness. Added speculatively, never wired up. |

2.3 Concrete evidence of summarize running unlogged

Thread 3d5dea4a-0983-4727-a4e8-41a64428933a:

- `run_events` seq=1 → original human message "写一份关于deer-flow的详细技术报告" ("Write a detailed technical report on deer-flow") ✓ (the event store is fine)
- `run_events` seq=43 → `llm_request` trace whose `messages[0]` literally contains "Here is a summary of the conversation to date:". This proves that SummarizationMiddleware did inject a summary mid-run.
- Zero rows with `category='middleware'` for this thread → nothing captured for the UI to render

3. Approaches considered

A. Subclass `SummarizationMiddleware` and dispatch a custom event

Wrap the upstream class, override abefore_model, call await adispatch_custom_event("summarization", {...}) after super(). Journal's existing on_custom_event path captures it.

B. Frontend-only diff heuristic

Compare event_store.count_messages() vs rendered count, infer summarization happened from the gap. Rejected: can't pinpoint position in the stream, can't show summary text. Only yields a vague badge.

C. Hybrid A + frontend inline card rendered at the middleware event's seq position

Same backend as A, plus frontend renders an inline [N messages condensed] card at the correct chronological position. Recommended terminal state.

4. Subagent's wrong claim and its rebuttal

An independent agent flagged approach A as structurally broken because:

RunnableCallable(trace=False) skips set_config_context, therefore var_child_runnable_config is never set, therefore adispatch_custom_event raises RuntimeError("Unable to dispatch an adhoc event without a parent run id").

This is wrong. The user's counter-intuition was correct: trace=False does not prevent adispatch_custom_event from working, as long as the middleware signature explicitly accepts config: RunnableConfig. The mechanism:

- `RunnableCallable.__init__` (`langgraph/_internal/_runnable.py:293-319`) inspects the function signature. If it accepts `config: RunnableConfig`, that parameter is recorded in `self.func_accepts`.
- Both the `trace=True` and `trace=False` branches of `ainvoke` run the same kwarg-injection loop (`_runnable.py:349-356`): `if kw == "config": kw_value = config`. The config passed to `ainvoke` is the task config with callbacks already bound. It comes from Pregel's `task.proc.ainvoke(task.input, config)` at `pregel/_retry.py:138`.
- Inside the middleware, the config is passed explicitly to `adispatch_custom_event(..., config=config)`. As a result, the function doesn't rely on `var_child_runnable_config.get()` at all. The LangChain docstring at `langchain_core/callbacks/manager.py:2574-2579` even says "If using python 3.10 and async, you MUST specify the config parameter", which is exactly this path.

trace=False only changes whether this runnable layer creates a new child callback scope. It does not affect whether the outer-layer config (with callbacks including RunJournal) is passed down to the function.
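The distinction can be illustrated with a toy model. `ToyRunnableCallable` below is a deliberately simplified, hypothetical stand-in, not LangGraph's `RunnableCallable`. It records at construction time whether the function accepts `config`. It then injects the config on both the traced and the untraced path, so the outer callbacks reach the function either way.

```python
import inspect
from typing import Any, Callable, Dict


class ToyRunnableCallable:
    """Toy model of signature-based config injection. Not LangGraph's actual code."""

    def __init__(self, func: Callable[..., Any], *, trace: bool = True) -> None:
        self.func = func
        self.trace = trace
        # Decided once, from the signature, independent of `trace`.
        self.func_accepts = {"config"} & set(inspect.signature(func).parameters)

    def invoke(self, value: Any, config: Dict[str, Any]) -> Any:
        if self.trace:
            # Stand-in for opening a child callback scope; callbacks are inherited.
            config = {**config, "scope": "child"}
        # The same injection step runs on both branches.
        kwargs = {"config": config} if "config" in self.func_accepts else {}
        return self.func(value, **kwargs)


def before_model_with_config(state: Dict[str, Any], config: Dict[str, Any]) -> Any:
    return config["callbacks"]


def before_model_without_config(state: Dict[str, Any]) -> str:
    return "no config available"


outer_config = {"callbacks": ["RunJournal"]}
untraced = ToyRunnableCallable(before_model_with_config, trace=False)
traced = ToyRunnableCallable(before_model_with_config, trace=True)
```

Only the signature decides whether the config arrives; `trace` changes what scope the config describes, not whether it is delivered.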

5. Verification

Ran /tmp/verify_summarize_event.py (standalone minimal reproduction):

- Minimal `AgentMiddleware` subclass with `abefore_model(self, state, runtime, config: RunnableConfig)`
- Calls `await adispatch_custom_event("summarization", {...}, config=config)` inside
- `create_agent(model=FakeChatModel, middleware=[probe])`
- `agent.ainvoke({...}, config={"callbacks": [RecordingHandler()]})`

Result:

```text
INFO verify: ProbeMiddleware.abefore_model called
INFO verify:   config keys: ['callbacks', 'configurable', 'metadata']
INFO verify:   config.callbacks type: AsyncCallbackManager
INFO verify:   config.metadata: {'langgraph_step': 1, 'langgraph_node': 'probe.before_model', ...}
INFO verify: on_custom_event fired: name=summarization
             run_id=019d7d19-1727-7830-aa33-648ecbee4b95
             data={'summary': 'fake summary', 'replaced_count': 3}
SUCCESS: approach A is viable (config injection + adispatch work)
```

All five predictions held:

- ✅ `config: RunnableConfig` signature triggers auto-injection despite `trace=False`
- ✅ `config.callbacks` is an `AsyncCallbackManager` with `parent_run_id` set
- ✅ `adispatch_custom_event(..., config=config)` runs without error
- ✅ `RecordingHandler.on_custom_event` receives the event
- ✅ The received `run_id` is a valid UUID tied to the running graph

Bonus finding: config.metadata contains langgraph_step and langgraph_node. These can be included in the middleware event's metadata to help the frontend position the marker on the timeline.
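Here is a sketch of folding those positioning hints into the event payload, assuming a plain-dict view of `config.metadata`. The following are hypothetical:

- `build_summarization_event`.
- The flat payload layout.
- The extra `thread_id` metadata key in the demo.

Only the `summary` and `replaced_count` keys come from the verification run above.

```python
from typing import Any, Dict, Mapping

# Metadata keys observed in the verification run that help place the marker.
POSITION_KEYS = ("langgraph_step", "langgraph_node")


def build_summarization_event(
    summary: str, replaced_count: int, config_metadata: Mapping[str, Any]
) -> Dict[str, Any]:
    """Hypothetical payload for adispatch_custom_event("summarization", ...)."""
    event: Dict[str, Any] = {"summary": summary, "replaced_count": replaced_count}
    # Copy only the positioning hints; other metadata stays out of the event.
    for key in POSITION_KEYS:
        if key in config_metadata:
            event[key] = config_metadata[key]
    return event


event = build_summarization_event(
    "fake summary",
    3,
    {"langgraph_step": 1, "langgraph_node": "probe.before_model", "thread_id": "t1"},
)
```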

6. Recommended implementation (approach C)

6.1 Backend

New wrapper middleware in backend/packages/harness/deerflow/agents/lead_agent/agent.py:

```python
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain_core.callbacks import adispatch_custom_event
from langchain_core.messages import RemoveMessage
from langchain_core.runnables import RunnableConfig


class _TrackingSummarizationMiddleware(SummarizationMiddleware):
    """Wraps upstream SummarizationMiddleware to emit a ``summarization``
    custom event on every actual summarization, so RunJournal can persist
    a middleware:summarize row to the event store.

    The upstream class does not emit events of its own. Declaring
    ``config: RunnableConfig`` in the override lets LangGraph's
    ``RunnableCallable`` inject the Pregel task config (with callbacks
    and parent_run_id) regardless of ``trace=False`` on the node.
    """

    async def abefore_model(self, state, runtime, config: RunnableConfig):
        before_count = len(state.get("messages") or [])
        result = await super().abefore_model(state, runtime)
        if result is None:
            # Nothing was summarized on this step, so emit nothing.
            return None

        # Upstream prepends RemoveMessage(id=REMOVE_ALL_MESSAGES) to the update;
        # exclude it so it is not counted as a surviving message.
        new_messages = [
            m for m in (result.get("messages") or []) if not isinstance(m, RemoveMessage)
        ]
        summary_text = _extract_summary_text(new_messages)
        # The summary message is new, so only the remaining entries were carried over.
        preserved_count = len(new_messages) - (1 if summary_text else 0)
        replaced_count = max(0, before_count - preserved_count)

        # Pass config explicitly instead of relying on var_child_runnable_config.
        await adispatch_custom_event(
            "summarization",
            {
                "summary": summary_text,
                "replaced_count": replaced_count,
            },
            config=config,
        )
        return result


def _extract_summary_text(messages: list) -> str:
    """Pull the summary string out of the HumanMessage the upstream class
    injects as ``Here is a summary of the conversation to date:...``."""
    for msg in messages:
        if getattr(msg, "type", None) == "human":
            content = getattr(msg, "content", "")
            text = content if isinstance(content, str) else ""
            if text.startswith("Here is a summary of the conversation to date"):
                return text
    return ""
```

Swap the existing SummarizationMiddleware() instantiation in _build_middlewares for _TrackingSummarizationMiddleware(...) with the same args.

Journal change: zero. on_custom_event("summarization", ...) in journal.py:343-362 already writes both a trace and a category="middleware" row.

History helper change: extend _get_event_store_messages in backend/app/gateway/routers/threads.py to surface category="middleware" rows as pseudo-messages, e.g.:

```python
# In the per-event loop, after the existing message branch.
# Assumes `raw` is this event's content payload, already bound earlier in the loop.
if evt.get("category") == "middleware" and evt.get("event_type") == "middleware:summarize":
    meta = evt.get("metadata") or {}
    messages.append({
        "id": f"summary-marker-{evt['seq']}",
        "type": "summary_marker",
        "replaced_count": meta.get("replaced_count", 0),
        "summary": raw.get("content", "") if isinstance(raw, dict) else "",
        "run_id": evt.get("run_id"),
    })
```

The marker uses a sentinel type (summary_marker) that doesn't collide with any LangChain message type, so downstream consumers that loop over messages can skip or render it explicitly.
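A consumer could handle the sentinel in either of the two ways described above: skip markers, or render them explicitly. The sketch below shows both, in Python, although the real consumer is the TypeScript frontend. `chat_messages` and `render` are hypothetical helpers, and the sample entries are invented.

```python
from typing import Any, Dict, Iterable, Iterator, List

SUMMARY_MARKER = "summary_marker"


def chat_messages(entries: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only LangChain-typed entries, skipping summary markers."""
    for entry in entries:
        if entry.get("type") != SUMMARY_MARKER:
            yield entry


def render(entries: Iterable[Dict[str, Any]]) -> List[str]:
    """Render markers explicitly instead of casting them to a message."""
    lines = []
    for entry in entries:
        if entry.get("type") == SUMMARY_MARKER:
            lines.append(f"[{entry.get('replaced_count', 0)} messages condensed]")
        else:
            lines.append(f"{entry['type']}: {entry.get('content', '')}")
    return lines


history = [
    {"id": "m1", "type": "human", "content": "write a report"},
    {"id": "summary-marker-43", "type": SUMMARY_MARKER, "replaced_count": 12, "summary": "..."},
    {"id": "m2", "type": "ai", "content": "here it is"},
]
```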

6.2 Frontend

- `core/messages/utils.ts`: extend the message grouping to recognize `type === "summary_marker"` and yield it as its own group (`"assistant:summary-marker"`).
- `components/workspace/messages/message-list.tsx`: add a branch in the grouped render switch that renders a distinctive inline card showing "N messages condensed" and a collapsible panel with the summary text.
- No changes to feedback logic: the marker has no `feedback` field, so the button naturally doesn't render on it.

7. Risks

- Synchronous path. The upstream class has both `before_model` and `abefore_model`, but our wrapper only overrides the async variant. If any deer-flow code path ever uses the sync flow, those summarizations won't be captured. Mitigation: also override `before_model` and use `dispatch_custom_event` (the sync variant) with the same pattern.
- `_extract_summary_text` fragility. It depends on the upstream class's prefix "Here is a summary of the conversation to date" in the injected HumanMessage, so any upstream template change breaks detection. Mitigation: pick the first new HumanMessage that wasn't in `state["messages"]` before `super()`. This is resilient to template wording changes, at the cost of a small diff helper.
- `replaced_count` accuracy under concurrent updates. If another middleware in the chain also modifies `state["messages"]` before `super()` returns, the naive `before_count - len(new_messages)` arithmetic is wrong. Mitigation: inspect the `RemoveMessage(id=REMOVE_ALL_MESSAGES)` that upstream emits and count from the original input list directly.
- History helper contract change. Introducing a non-LangChain-typed entry (`type="summary_marker"`) in the `/history` response could break frontend code that blindly casts entries to `Message`. Mitigation: the frontend change above adds an explicit branch; type-check the frontend end-to-end before merging.
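The two mitigations for `_extract_summary_text` fragility and `replaced_count` accuracy can be sketched with stand-in types. `Msg`, `Remove` and the `REMOVE_ALL_MESSAGES` value here are hypothetical substitutes for LangChain's message classes and LangGraph's remove-all sentinel. The sketch also assumes that messages already in state carry ids.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

REMOVE_ALL_MESSAGES = "__remove_all__"  # stand-in for LangGraph's remove-all sentinel


@dataclass
class Msg:  # stand-in for a LangChain message
    type: str
    content: str
    id: Optional[str] = None


@dataclass
class Remove:  # stand-in for RemoveMessage
    id: str


Update = List[Union[Msg, Remove]]


def first_new_human(before: List[Msg], update: Update) -> Optional[Msg]:
    """First human message in the update that was not already in state (wording-agnostic)."""
    seen = {m.id for m in before if m.id is not None}
    for m in update:
        if isinstance(m, Msg) and m.type == "human" and m.id not in seen:
            return m
    return None


def count_replaced(before: List[Msg], update: Update) -> int:
    """Count original input messages dropped by a remove-all update."""
    if not any(isinstance(m, Remove) and m.id == REMOVE_ALL_MESSAGES for m in update):
        return 0
    kept = {m.id for m in update if isinstance(m, Msg) and m.id is not None}
    return sum(1 for m in before if m.id not in kept)


before = [
    Msg("human", "write a report", "h1"),
    Msg("ai", "searching", "a1"),
    Msg("tool", "results", "t1"),
    Msg("ai", "drafting", "a2"),
]
update = [
    Remove(REMOVE_ALL_MESSAGES),
    Msg("human", "Here is a summary of the conversation to date: ..."),
    Msg("ai", "drafting", "a2"),
]
```

Counting from the original input list makes the result independent of how many messages the update re-emits.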

8. Out of scope / deferred

- Other middleware types (Title, Guardrail, HITL) do not emit custom events either. If we want markers for those too, repeat the wrapper pattern for each. Not in this design.
- Retroactive markers for old threads (captured before this patch) are impossible without re-running the graph. Legacy threads will show the event-store-recovered messages without a marker.
- Standard mode (`make dev`): the agent runs inside LangGraph Server, not the Gateway-embedded runtime. RunJournal may not be wired there, so the custom event fires but is captured by no one. Tracked as a separate follow-up.

9. Next actions

- Land the current summarize-message-loss fixes (journal Command unwrap + event-store-backed /history + inline feedback). The implementation is verified and is being committed now as three commits on `rayhpeng/fix-persistence-new`.
- Summarize-marker implementation (this spec) → separate follow-up PR based on the above verified design.
