fix(threads): load history messages from event store, immune to summarize

``get_thread_history`` and ``get_thread_state`` in Gateway mode read messages from ``checkpoint.channel_values["messages"]``. After SummarizationMiddleware runs mid-run, that list is rewritten in-place: pre-summarize messages are dropped and a synthetic summary-as-human message takes position 0. The frontend then renders a chat history that starts with ``"Here is a summary of the conversation to date:..."`` instead of the user's original query, and all earlier turns are gone. The event store (``RunEventStore``) is append-only and never rewritten, so it retains the full transcript. This commit adds a helper ``_get_event_store_messages`` that loads the event store's message stream and overrides ``values["messages"]`` in both endpoints; the checkpoint fallback kicks in only when the event store is unavailable. Behavior contract of the helper: - **Full pagination.** ``list_messages`` returns the newest ``limit`` records when no cursor is given, so a fixed limit silently drops older messages on long threads. The helper sizes the read from ``count_messages()`` and pages forward with ``after_seq`` cursors. - **Copy-on-read.** Each content dict is copied before ``id`` is patched so the live store object (``MemoryRunEventStore`` returns references) is never mutated. - **Stable ids.** Messages with ``id=None`` (human + tool_result, which don't receive an id until checkpoint persistence) get a deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so React keys stay stable across requests. AI messages keep their LLM-assigned ``lc_run--*`` ids. - **Legacy ``Command`` repr sanitization.** Rows captured before the ``journal.py`` ``on_tool_end`` fix (previous commit) stored ``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))`` as the tool_result content. ``_sanitize_legacy_command_repr`` regex-extracts the inner text so old threads render cleanly. - **Inline feedback.** When loading the stream, the helper also pulls ``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to every message plus ``feedback`` to the final ``ai_message`` of each run. This removes the frontend's need to fetch a second endpoint and positional-index-map its way back to the right run. When the feedback subsystem is unavailable, the ``feedback`` field is left absent entirely so the frontend hides the button rather than rendering it over a broken write path. - **User context.** ``DbRunEventStore`` is user-scoped by default via ``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission`` decorator having populated the user contextvar on both callers; the docstring documents this dependency explicitly so nobody wires it into a CLI or migration script without passing ``user_id=None``. Real data verification against thread ``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages (summarize-corrupted), event store had 16. The original human message ``"最新伊美局势"`` was preserved as seq=1 in the event store and correctly restored to position 0 in the helper output. Helper output for AI messages was byte-identical to checkpoint for every overlapping message; only tool_result ids differed (patched to uuid5) and the legacy Command repr at seq=48 was sanitized. Tests: - ``test_thread_state_event_store.py`` — 18 tests covering ``_sanitize_legacy_command_repr`` (passthrough, single/double-quote extraction, unparseable fallback), helper happy path (all message types, stable uuid5, store non-mutation), multi-page pagination, summarize regression (recovers pre-summarize messages), feedback attachment (per-run, multi-run threads, repo failure graceful), and dependency failure fallback to ``None``. Docs: - ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the implementation plan this commit realizes, with Task 1 revised after the evaluation findings (pagination, copy-on-read, Command wrap already landed in journal.py, frontend feedback pagination in the follow-up commit, Standard-mode follow-up noted). - ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md`` — the Claude + second-opinion evaluation document that drove the plan revisions (pagination bug, dict-mutation bug, feedback hidden bug, Command bug). - ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` — design for a follow-up PR that visually marks summarize events in history, based on a verified ``adispatch_custom_event`` experiment (``trace=False`` middleware nodes can still forward the Pregel task config via explicit signature injection). Scope: Gateway mode only (``make dev-pro``). Standard mode (``make dev``) hits LangGraph Server directly and bypasses these endpoints; the summarize symptom is still present there and is tracked as a separate follow-up in the plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-23 00:16:48 +00:00 · 2026-04-11 23:21:15 +08:00
parent ce24424449
commit 229c8095be
5 changed files with 1488 additions and 6 deletions
@@ -13,6 +13,7 @@ matching the LangGraph Platform wire format expected by the
 from __future__ import annotations

 import logging
+import re
 import time
 import uuid
 from typing import Any
@@ -21,7 +22,7 @@ from fastapi import APIRouter, HTTPException, Request
 from pydantic import BaseModel, Field, field_validator

 from app.gateway.authz import require_permission
-from app.gateway.deps import get_checkpointer
+from app.gateway.deps import get_checkpointer, get_current_user, get_feedback_repo, get_run_event_store
 from app.gateway.utils import sanitize_log_param
 from deerflow.config.paths import Paths, get_paths
 from deerflow.runtime import serialize_channel_values
@@ -402,6 +403,165 @@ async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
    )


+# ---------------------------------------------------------------------------
+# Event-store-backed message loader
+# ---------------------------------------------------------------------------
+
+_LEGACY_CMD_INNER_CONTENT_RE = re.compile(
+    r"ToolMessage\(content=(?P<q>['\"])(?P<inner>.*?)(?P=q)",
+    re.DOTALL,
+)
+
+
+def _sanitize_legacy_command_repr(content_field: Any) -> Any:
+    """Recover the inner ToolMessage text from a legacy ``str(Command(...))`` repr.
+
+    Runs captured before the ``on_tool_end`` fix in ``journal.py`` stored
+    ``str(Command(update={'messages':[ToolMessage(content='X', ...)]}))`` as the
+    tool_result content. New runs store ``'X'`` directly. For legacy rows, try
+    to extract ``'X'`` defensively; return the original string if extraction
+    fails (still no worse than the checkpoint fallback for summarized threads).
+    """
+    if not isinstance(content_field, str) or not content_field.startswith("Command(update="):
+        return content_field
+    match = _LEGACY_CMD_INNER_CONTENT_RE.search(content_field)
+    return match.group("inner") if match else content_field
+
+
+async def _get_event_store_messages(request: Request, thread_id: str) -> list[dict] | None:
+    """Load the full message stream for ``thread_id`` from the event store.
+
+    The event store is append-only and unaffected by summarization — the
+    checkpoint's ``channel_values["messages"]`` is rewritten in-place when the
+    SummarizationMiddleware runs, which drops all pre-summarize messages. The
+    event store retains the full transcript, so callers in Gateway mode should
+    prefer it for rendering the conversation history.
+
+    In addition to the core message content, this helper attaches two extra
+    fields to every returned dict:
+
+    - ``run_id``: the ``run_id`` of the event that produced this message.
+      Always present.
+    - ``feedback``: thumbs-up/down data. Present only on the **final
+      ``ai_message`` of each run** (matching the per-run feedback semantics
+      of ``POST /api/threads/{id}/runs/{run_id}/feedback``). The frontend uses
+      the presence of this field to decide whether to render the feedback
+      button, which sidesteps the positional-index mapping bug that an
+      out-of-band ``/messages`` fetch exhibited.
+
+    Behaviour contract:
+
+    - **Full pagination.** ``RunEventStore.list_messages`` returns the newest
+      ``limit`` records when no cursor is given, so a fixed limit silently
+      drops older messages on long threads. We size the read from
+      ``count_messages()`` and then page forward with ``after_seq`` cursors.
+    - **Copy-on-read.** Each content dict is copied before ``id`` is patched
+      so the live store object is never mutated; ``MemoryRunEventStore``
+      returns live references.
+    - **Stable ids.** Messages with ``id=None`` (human + tool_result) receive
+      a deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so React
+      keys are stable across requests without altering stored data. AI messages
+      retain their LLM-assigned ``lc_run--*`` ids.
+    - **Legacy Command repr.** Rows captured before the ``journal.py``
+      ``on_tool_end`` fix stored ``str(Command(update={...}))`` as the tool
+      result content. ``_sanitize_legacy_command_repr`` extracts the inner
+      ToolMessage text.
+    - **User context.** ``DbRunEventStore`` is user-scoped by default via
+      ``resolve_user_id(AUTO)`` in ``runtime/user_context.py``. This helper
+      must run inside a request where ``@require_permission`` has populated
+      the user contextvar. Both callers below are decorated appropriately.
+      Do not call this helper from CLI or migration scripts without passing
+      ``user_id=None`` explicitly to the underlying store methods.
+
+    Returns ``None`` when the event store is not configured or has no message
+    events for this thread, so callers fall back to checkpoint messages.
+    """
+    try:
+        event_store = get_run_event_store(request)
+    except Exception:
+        return None
+
+    try:
+        total = await event_store.count_messages(thread_id)
+    except Exception:
+        logger.exception("count_messages failed for thread %s", sanitize_log_param(thread_id))
+        return None
+    if not total:
+        return None
+
+    # Batch by page_size to keep memory bounded for very long threads.
+    page_size = 500
+    collected: list[dict] = []
+    after_seq: int | None = None
+    while True:
+        try:
+            page = await event_store.list_messages(thread_id, limit=page_size, after_seq=after_seq)
+        except Exception:
+            logger.exception("list_messages failed for thread %s", sanitize_log_param(thread_id))
+            return None
+        if not page:
+            break
+        collected.extend(page)
+        if len(page) < page_size:
+            break
+        next_cursor = page[-1].get("seq")
+        if next_cursor is None or (after_seq is not None and next_cursor <= after_seq):
+            break
+        after_seq = next_cursor
+
+    # Build the message list; track the final ``ai_message`` index per run so
+    # feedback can be attached at the right position (matches thread_runs.py).
+    messages: list[dict] = []
+    last_ai_per_run: dict[str, int] = {}
+    for evt in collected:
+        raw = evt.get("content")
+        if not isinstance(raw, dict) or "type" not in raw:
+            continue
+        content = dict(raw)
+        if content.get("id") is None:
+            content["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{thread_id}:{evt['seq']}"))
+        if content.get("type") == "tool":
+            content["content"] = _sanitize_legacy_command_repr(content.get("content"))
+        run_id = evt.get("run_id")
+        if run_id:
+            content["run_id"] = run_id
+        if evt.get("event_type") == "ai_message" and run_id:
+            last_ai_per_run[run_id] = len(messages)
+        messages.append(content)
+
+    if not messages:
+        return None
+
+    # Attach feedback to the final ai_message of each run. If the feedback
+    # subsystem is unavailable, leave the ``feedback`` field absent entirely
+    # so the frontend hides the button rather than showing it over a broken
+    # write path.
+    feedback_available = False
+    feedback_map: dict[str, dict] = {}
+    try:
+        feedback_repo = get_feedback_repo(request)
+        user_id = await get_current_user(request)
+        feedback_map = await feedback_repo.list_by_thread_grouped(thread_id, user_id=user_id)
+        feedback_available = True
+    except Exception:
+        logger.exception("feedback lookup failed for thread %s", sanitize_log_param(thread_id))
+
+    if feedback_available:
+        for run_id, idx in last_ai_per_run.items():
+            fb = feedback_map.get(run_id)
+            messages[idx]["feedback"] = (
+                {
+                    "feedback_id": fb["feedback_id"],
+                    "rating": fb["rating"],
+                    "comment": fb.get("comment"),
+                }
+                if fb
+                else None
+            )
+
+    return messages
+
+
@router.get("/{thread_id}/state", response_model=ThreadStateResponse)
@require_permission("threads", "read", owner_check=True)
 async def get_thread_state(thread_id: str, request: Request) -> ThreadStateResponse:
@@ -440,8 +600,15 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo
    next_tasks = [t.name for t in tasks_raw if hasattr(t, "name")]
    tasks = [{"id": getattr(t, "id", ""), "name": getattr(t, "name", "")} for t in tasks_raw]

+    values = serialize_channel_values(channel_values)
+
+    # Prefer event-store messages: append-only, immune to summarization.
+    es_messages = await _get_event_store_messages(request, thread_id)
+    if es_messages is not None:
+        values["messages"] = es_messages
+
    return ThreadStateResponse(
-        values=serialize_channel_values(channel_values),
+        values=values,
        next=next_tasks,
        metadata=metadata,
        checkpoint={"id": checkpoint_id, "ts": str(metadata.get("created_at", ""))},
@@ -559,6 +726,11 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
    if body.before:
        config["configurable"]["checkpoint_id"] = body.before

+    # Load the full event-store message stream once; attach to the latest
+    # checkpoint entry only (matching the prior semantics). The event store
+    # is append-only and immune to summarization.
+    es_messages = await _get_event_store_messages(request, thread_id)
+
    entries: list[HistoryEntry] = []
    is_latest_checkpoint = True
    try:
@@ -582,11 +754,17 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
            if thread_data := channel_values.get("thread_data"):
                values["thread_data"] = thread_data

-            # Attach messages from checkpointer only for the latest checkpoint
+            # Attach messages only to the latest checkpoint. Prefer the
+            # event-store stream (complete and unaffected by summarization);
+            # fall back to checkpoint channel_values when the event store is
+            # unavailable or empty.
            if is_latest_checkpoint:
-                messages = channel_values.get("messages")
-                if messages:
-                    values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
+                if es_messages is not None:
+                    values["messages"] = es_messages
+                else:
+                    messages = channel_values.get("messages")
+                    if messages:
+                        values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
            is_latest_checkpoint = False

            # Derive next tasks
@@ -0,0 +1,439 @@
+"""Tests for event-store-backed message loading in thread state/history endpoints.
+
+Covers the helper functions added to ``app/gateway/routers/threads.py``:
+
+- ``_sanitize_legacy_command_repr`` — extracts inner ToolMessage text from
+  legacy ``str(Command(...))`` strings captured before the ``journal.py``
+  fix for state-updating tools like ``present_files``.
+- ``_get_event_store_messages`` — loads the full message stream with full
+  pagination, copy-on-read id patching, legacy Command sanitization, and
+  a clean fallback to ``None`` when the event store is unavailable.
+"""
+
+from __future__ import annotations
+
+import uuid
+from types import SimpleNamespace
+from typing import Any
+
+import pytest
+
+from app.gateway.routers.threads import (
+    _get_event_store_messages,
+    _sanitize_legacy_command_repr,
+)
+from deerflow.runtime.events.store.memory import MemoryRunEventStore
+
+
+@pytest.fixture()
+def event_store() -> MemoryRunEventStore:
+    return MemoryRunEventStore()
+
+
+class _FakeFeedbackRepo:
+    """Minimal ``FeedbackRepository`` stand-in that returns a configured map."""
+
+    def __init__(self, by_run: dict[str, dict] | None = None) -> None:
+        self._by_run = by_run or {}
+
+    async def list_by_thread_grouped(self, thread_id: str, *, user_id: str | None) -> dict[str, dict]:
+        return dict(self._by_run)
+
+
+def _make_request(
+    event_store: MemoryRunEventStore,
+    feedback_repo: _FakeFeedbackRepo | None = None,
+) -> Any:
+    """Build a minimal FastAPI-like Request object.
+
+    ``get_run_event_store(request)`` reads ``request.app.state.run_event_store``.
+    ``get_feedback_repo(request)`` reads ``request.app.state.feedback_repo``.
+    ``get_current_user`` is monkey-patched separately in tests that need it.
+    """
+    state = SimpleNamespace(
+        run_event_store=event_store,
+        feedback_repo=feedback_repo or _FakeFeedbackRepo(),
+    )
+    app = SimpleNamespace(state=state)
+    return SimpleNamespace(app=app)
+
+
+@pytest.fixture(autouse=True)
+def _stub_current_user(monkeypatch):
+    """Stub out ``get_current_user`` so tests don't need real auth context."""
+    import app.gateway.routers.threads as threads_mod
+
+    async def _fake(_request):
+        return None
+
+    monkeypatch.setattr(threads_mod, "get_current_user", _fake)
+
+
+async def _seed_simple_run(store: MemoryRunEventStore, thread_id: str, run_id: str) -> None:
+    """Seed one run: human + ai_tool_call + tool_result + final ai_message, plus a trace."""
+    await store.put(
+        thread_id=thread_id, run_id=run_id,
+        event_type="human_message", category="message",
+        content={
+            "type": "human", "id": None,
+            "content": [{"type": "text", "text": "hello"}],
+            "additional_kwargs": {}, "response_metadata": {}, "name": None,
+        },
+    )
+    await store.put(
+        thread_id=thread_id, run_id=run_id,
+        event_type="ai_tool_call", category="message",
+        content={
+            "type": "ai", "id": "lc_run--tc1",
+            "content": "",
+            "tool_calls": [{"name": "search", "args": {"q": "x"}, "id": "call_1", "type": "tool_call"}],
+            "invalid_tool_calls": [],
+            "additional_kwargs": {}, "response_metadata": {}, "name": None,
+            "usage_metadata": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
+        },
+    )
+    await store.put(
+        thread_id=thread_id, run_id=run_id,
+        event_type="tool_result", category="message",
+        content={
+            "type": "tool", "id": None,
+            "content": "results",
+            "tool_call_id": "call_1", "name": "search",
+            "artifact": None, "status": "success",
+            "additional_kwargs": {}, "response_metadata": {},
+        },
+    )
+    await store.put(
+        thread_id=thread_id, run_id=run_id,
+        event_type="ai_message", category="message",
+        content={
+            "type": "ai", "id": "lc_run--final1",
+            "content": "done",
+            "tool_calls": [], "invalid_tool_calls": [],
+            "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "name": None,
+            "usage_metadata": {"input_tokens": 20, "output_tokens": 10, "total_tokens": 30},
+        },
+    )
+    # Non-message trace — must be filtered out.
+    await store.put(
+        thread_id=thread_id, run_id=run_id,
+        event_type="llm_request", category="trace",
+        content={"model": "test"},
+    )
+
+
+class TestSanitizeLegacyCommandRepr:
+    def test_passthrough_non_string(self):
+        assert _sanitize_legacy_command_repr(None) is None
+        assert _sanitize_legacy_command_repr(42) == 42
+        assert _sanitize_legacy_command_repr([{"type": "text", "text": "x"}]) == [{"type": "text", "text": "x"}]
+
+    def test_passthrough_plain_string(self):
+        assert _sanitize_legacy_command_repr("Successfully presented files") == "Successfully presented files"
+        assert _sanitize_legacy_command_repr("") == ""
+
+    def test_extracts_inner_content_single_quotes(self):
+        legacy = (
+            "Command(update={'artifacts': ['/mnt/user-data/outputs/report.md'], "
+            "'messages': [ToolMessage(content='Successfully presented files', "
+            "tool_call_id='call_abc')]})"
+        )
+        assert _sanitize_legacy_command_repr(legacy) == "Successfully presented files"
+
+    def test_extracts_inner_content_double_quotes(self):
+        legacy = 'Command(update={"messages": [ToolMessage(content="ok", tool_call_id="x")]})'
+        assert _sanitize_legacy_command_repr(legacy) == "ok"
+
+    def test_unparseable_command_returns_original(self):
+        legacy = "Command(update={'something_else': 1})"
+        assert _sanitize_legacy_command_repr(legacy) == legacy
+
+
+class TestGetEventStoreMessages:
+    @pytest.mark.anyio
+    async def test_returns_none_when_store_empty(self, event_store):
+        request = _make_request(event_store)
+        assert await _get_event_store_messages(request, "t_missing") is None
+
+    @pytest.mark.anyio
+    async def test_extracts_all_message_types_in_order(self, event_store):
+        await _seed_simple_run(event_store, "t1", "r1")
+        request = _make_request(event_store)
+        messages = await _get_event_store_messages(request, "t1")
+        assert messages is not None
+        types = [m["type"] for m in messages]
+        assert types == ["human", "ai", "tool", "ai"]
+        # Trace events must not appear
+        for m in messages:
+            assert m.get("type") in {"human", "ai", "tool"}
+
+    @pytest.mark.anyio
+    async def test_null_ids_get_deterministic_uuid5(self, event_store):
+        await _seed_simple_run(event_store, "t1", "r1")
+        request = _make_request(event_store)
+        messages = await _get_event_store_messages(request, "t1")
+        assert messages is not None
+
+        # AI messages keep their LLM ids
+        assert messages[1]["id"] == "lc_run--tc1"
+        assert messages[3]["id"] == "lc_run--final1"
+
+        # Human (seq=1) + tool (seq=3) get deterministic uuid5
+        expected_human_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "t1:1"))
+        expected_tool_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "t1:3"))
+        assert messages[0]["id"] == expected_human_id
+        assert messages[2]["id"] == expected_tool_id
+
+        # Re-running produces the same ids (stability across requests)
+        messages2 = await _get_event_store_messages(request, "t1")
+        assert [m["id"] for m in messages2] == [m["id"] for m in messages]
+
+    @pytest.mark.anyio
+    async def test_helper_does_not_mutate_store(self, event_store):
+        """Helper must copy content dicts; the live store must stay unchanged."""
+        await _seed_simple_run(event_store, "t1", "r1")
+        request = _make_request(event_store)
+        _ = await _get_event_store_messages(request, "t1")
+
+        # Raw store records still have id=None for human/tool
+        raw = await event_store.list_messages("t1", limit=500)
+        human = next(e for e in raw if e["content"]["type"] == "human")
+        tool = next(e for e in raw if e["content"]["type"] == "tool")
+        assert human["content"]["id"] is None
+        assert tool["content"]["id"] is None
+
+    @pytest.mark.anyio
+    async def test_legacy_command_repr_sanitized(self, event_store):
+        """A tool_result whose content is a legacy ``str(Command(...))`` is cleaned."""
+        legacy = (
+            "Command(update={'artifacts': ['/mnt/user-data/outputs/x.md'], "
+            "'messages': [ToolMessage(content='Successfully presented files', "
+            "tool_call_id='call_p')]})"
+        )
+        await event_store.put(
+            thread_id="t2", run_id="r1",
+            event_type="tool_result", category="message",
+            content={
+                "type": "tool", "id": None,
+                "content": legacy,
+                "tool_call_id": "call_p", "name": "present_files",
+                "artifact": None, "status": "success",
+                "additional_kwargs": {}, "response_metadata": {},
+            },
+        )
+        request = _make_request(event_store)
+        messages = await _get_event_store_messages(request, "t2")
+        assert messages is not None and len(messages) == 1
+        assert messages[0]["content"] == "Successfully presented files"
+
+    @pytest.mark.anyio
+    async def test_pagination_covers_more_than_one_page(self, event_store, monkeypatch):
+        """Simulate a long thread that exceeds a single page to exercise the loop."""
+        thread_id = "t_long"
+        # Seed 12 human messages
+        for i in range(12):
+            await event_store.put(
+                thread_id=thread_id, run_id="r1",
+                event_type="human_message", category="message",
+                content={
+                    "type": "human", "id": None,
+                    "content": [{"type": "text", "text": f"msg {i}"}],
+                    "additional_kwargs": {}, "response_metadata": {}, "name": None,
+                },
+            )
+
+        # Force small page size to exercise pagination
+        import app.gateway.routers.threads as threads_mod
+        original = threads_mod._get_event_store_messages
+
+        # Monkeypatch MemoryRunEventStore.list_messages to assert it's called with cursor pagination
+        calls: list[dict] = []
+        real_list = event_store.list_messages
+
+        async def spy_list_messages(tid, *, limit=50, before_seq=None, after_seq=None):
+            calls.append({"limit": limit, "after_seq": after_seq})
+            return await real_list(tid, limit=limit, before_seq=before_seq, after_seq=after_seq)
+
+        monkeypatch.setattr(event_store, "list_messages", spy_list_messages)
+
+        request = _make_request(event_store)
+        messages = await original(request, thread_id)
+        assert messages is not None
+        assert len(messages) == 12
+        assert [m["content"][0]["text"] for m in messages] == [f"msg {i}" for i in range(12)]
+        # At least one call was made with after_seq=None (the initial page)
+        assert any(c["after_seq"] is None for c in calls)
+
+    @pytest.mark.anyio
+    async def test_summarize_regression_recovers_pre_summarize_messages(self, event_store):
+        """The exact bug: checkpoint would have only post-summarize messages;
+        event store must surface the original pre-summarize human query."""
+        # Run 1 (pre-summarize)
+        await event_store.put(
+            thread_id="t_sum", run_id="r1",
+            event_type="human_message", category="message",
+            content={
+                "type": "human", "id": None,
+                "content": [{"type": "text", "text": "original question"}],
+                "additional_kwargs": {}, "response_metadata": {}, "name": None,
+            },
+        )
+        await event_store.put(
+            thread_id="t_sum", run_id="r1",
+            event_type="ai_message", category="message",
+            content={
+                "type": "ai", "id": "lc_run--r1",
+                "content": "first answer",
+                "tool_calls": [], "invalid_tool_calls": [],
+                "additional_kwargs": {}, "response_metadata": {}, "name": None,
+                "usage_metadata": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
+            },
+        )
+        # Run 2 (post-summarize — what the checkpoint still has)
+        await event_store.put(
+            thread_id="t_sum", run_id="r2",
+            event_type="human_message", category="message",
+            content={
+                "type": "human", "id": None,
+                "content": [{"type": "text", "text": "follow up"}],
+                "additional_kwargs": {}, "response_metadata": {}, "name": None,
+            },
+        )
+        await event_store.put(
+            thread_id="t_sum", run_id="r2",
+            event_type="ai_message", category="message",
+            content={
+                "type": "ai", "id": "lc_run--r2",
+                "content": "second answer",
+                "tool_calls": [], "invalid_tool_calls": [],
+                "additional_kwargs": {}, "response_metadata": {}, "name": None,
+                "usage_metadata": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
+            },
+        )
+
+        request = _make_request(event_store)
+        messages = await _get_event_store_messages(request, "t_sum")
+        assert messages is not None
+        # 4 messages, not 2 (which is what the summarized checkpoint would yield)
+        assert len(messages) == 4
+        assert messages[0]["content"][0]["text"] == "original question"
+        assert messages[1]["id"] == "lc_run--r1"
+        assert messages[3]["id"] == "lc_run--r2"
+
+    @pytest.mark.anyio
+    async def test_run_id_attached_to_every_message(self, event_store):
+        await _seed_simple_run(event_store, "t1", "r1")
+        request = _make_request(event_store)
+        messages = await _get_event_store_messages(request, "t1")
+        assert messages is not None
+        assert all(m.get("run_id") == "r1" for m in messages)
+
+    @pytest.mark.anyio
+    async def test_feedback_attached_only_to_final_ai_message_per_run(self, event_store):
+        await _seed_simple_run(event_store, "t1", "r1")
+        feedback_repo = _FakeFeedbackRepo(
+            {"r1": {"feedback_id": "fb1", "rating": 1, "comment": "great"}}
+        )
+        request = _make_request(event_store, feedback_repo=feedback_repo)
+        messages = await _get_event_store_messages(request, "t1")
+        assert messages is not None
+
+        # human (0), ai_tool_call (1), tool (2), ai_message (3)
+        final_ai = messages[3]
+        assert final_ai["feedback"] == {
+            "feedback_id": "fb1",
+            "rating": 1,
+            "comment": "great",
+        }
+        # Non-final messages must NOT have a feedback key at all — the
+        # frontend keys button visibility off of this.
+        assert "feedback" not in messages[0]
+        assert "feedback" not in messages[1]
+        assert "feedback" not in messages[2]
+
+    @pytest.mark.anyio
+    async def test_feedback_none_when_no_row_for_run(self, event_store):
+        await _seed_simple_run(event_store, "t1", "r1")
+        request = _make_request(event_store, feedback_repo=_FakeFeedbackRepo({}))
+        messages = await _get_event_store_messages(request, "t1")
+        assert messages is not None
+        # Final ai_message gets an explicit ``None`` — distinguishes "eligible
+        # but unrated" from "not eligible" (field absent).
+        assert messages[3]["feedback"] is None
+
+    @pytest.mark.anyio
+    async def test_feedback_per_run_for_multi_run_thread(self, event_store):
+        """A thread with two runs: each final ai_message should get its own feedback."""
+        # Run 1
+        await event_store.put(
+            thread_id="t_multi", run_id="r1",
+            event_type="human_message", category="message",
+            content={"type": "human", "id": None, "content": "q1",
+                     "additional_kwargs": {}, "response_metadata": {}, "name": None},
+        )
+        await event_store.put(
+            thread_id="t_multi", run_id="r1",
+            event_type="ai_message", category="message",
+            content={"type": "ai", "id": "lc_run--a1", "content": "a1",
+                     "tool_calls": [], "invalid_tool_calls": [],
+                     "additional_kwargs": {}, "response_metadata": {}, "name": None,
+                     "usage_metadata": None},
+        )
+        # Run 2
+        await event_store.put(
+            thread_id="t_multi", run_id="r2",
+            event_type="human_message", category="message",
+            content={"type": "human", "id": None, "content": "q2",
+                     "additional_kwargs": {}, "response_metadata": {}, "name": None},
+        )
+        await event_store.put(
+            thread_id="t_multi", run_id="r2",
+            event_type="ai_message", category="message",
+            content={"type": "ai", "id": "lc_run--a2", "content": "a2",
+                     "tool_calls": [], "invalid_tool_calls": [],
+                     "additional_kwargs": {}, "response_metadata": {}, "name": None,
+                     "usage_metadata": None},
+        )
+        feedback_repo = _FakeFeedbackRepo({
+            "r1": {"feedback_id": "fb_r1", "rating": 1, "comment": None},
+            "r2": {"feedback_id": "fb_r2", "rating": -1, "comment": "meh"},
+        })
+        request = _make_request(event_store, feedback_repo=feedback_repo)
+        messages = await _get_event_store_messages(request, "t_multi")
+        assert messages is not None
+        # human[r1], ai[r1], human[r2], ai[r2]
+        assert messages[1]["feedback"]["feedback_id"] == "fb_r1"
+        assert messages[1]["feedback"]["rating"] == 1
+        assert messages[3]["feedback"]["feedback_id"] == "fb_r2"
+        assert messages[3]["feedback"]["rating"] == -1
+        # Humans don't get feedback
+        assert "feedback" not in messages[0]
+        assert "feedback" not in messages[2]
+
+    @pytest.mark.anyio
+    async def test_feedback_repo_failure_does_not_break_helper(self, monkeypatch, event_store):
+        """If feedback lookup throws, messages still come back without feedback."""
+        await _seed_simple_run(event_store, "t1", "r1")
+
+        class _BoomRepo:
+            async def list_by_thread_grouped(self, *a, **kw):
+                raise RuntimeError("db down")
+
+        request = _make_request(event_store, feedback_repo=_BoomRepo())
+        messages = await _get_event_store_messages(request, "t1")
+        assert messages is not None
+        assert len(messages) == 4
+        for m in messages:
+            assert "feedback" not in m
+
+    @pytest.mark.anyio
+    async def test_returns_none_when_dep_raises(self, monkeypatch, event_store):
+        """When ``get_run_event_store`` is not configured, helper returns None."""
+        import app.gateway.routers.threads as threads_mod
+
+        def boom(_request):
+            raise RuntimeError("no store")
+
+        monkeypatch.setattr(threads_mod, "get_run_event_store", boom)
+        request = _make_request(event_store)
+        assert await threads_mod._get_event_store_messages(request, "t1") is None