mirror of
https://github.com/bytedance/deer-flow.git
synced 2026-05-23 00:16:48 +00:00
fix(threads): load history messages from event store, immune to summarize
``get_thread_history`` and ``get_thread_state`` in Gateway mode read
messages from ``checkpoint.channel_values["messages"]``. After
SummarizationMiddleware runs mid-run, that list is rewritten in-place:
pre-summarize messages are dropped and a synthetic summary-as-human
message takes position 0. The frontend then renders a chat history that
starts with ``"Here is a summary of the conversation to date:..."``
instead of the user's original query, and all earlier turns are gone.
The event store (``RunEventStore``) is append-only and never rewritten,
so it retains the full transcript. This commit adds a helper
``_get_event_store_messages`` that loads the event store's message
stream and overrides ``values["messages"]`` in both endpoints; the
checkpoint fallback kicks in only when the event store is unavailable.
Behavior contract of the helper:
- **Full pagination.** ``list_messages`` returns the newest ``limit``
records when no cursor is given, so a fixed limit silently drops
older messages on long threads. The helper sizes the read from
``count_messages()`` and pages forward with ``after_seq`` cursors.
- **Copy-on-read.** Each content dict is copied before ``id`` is
patched so the live store object (``MemoryRunEventStore`` returns
references) is never mutated.
- **Stable ids.** Messages with ``id=None`` (human + tool_result,
which don't receive an id until checkpoint persistence) get a
deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so
React keys stay stable across requests. AI messages keep their
LLM-assigned ``lc_run--*`` ids.
- **Legacy ``Command`` repr sanitization.** Rows captured before the
``journal.py`` ``on_tool_end`` fix (previous commit) stored
``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))``
as the tool_result content. ``_sanitize_legacy_command_repr``
regex-extracts the inner text so old threads render cleanly.
- **Inline feedback.** When loading the stream, the helper also pulls
``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to
every message plus ``feedback`` to the final ``ai_message`` of each
run. This removes the frontend's need to fetch a second endpoint
and positional-index-map its way back to the right run. When the
feedback subsystem is unavailable, the ``feedback`` field is left
absent entirely so the frontend hides the button rather than
rendering it over a broken write path.
- **User context.** ``DbRunEventStore`` is user-scoped by default via
``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission``
decorator having populated the user contextvar on both callers; the
docstring documents this dependency explicitly so nobody wires it
into a CLI or migration script without passing ``user_id=None``.
Real data verification against thread
``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages
(summarize-corrupted), event store had 16. The original human message
``"最新伊美局势"`` was preserved as seq=1 in the event store and
correctly restored to position 0 in the helper output. Helper output
for AI messages was byte-identical to checkpoint for every overlapping
message; only tool_result ids differed (patched to uuid5) and the
legacy Command repr at seq=48 was sanitized.
Tests:
- ``test_thread_state_event_store.py`` — 18 tests covering
``_sanitize_legacy_command_repr`` (passthrough, single/double-quote
extraction, unparseable fallback), helper happy path (all message
types, stable uuid5, store non-mutation), multi-page pagination,
summarize regression (recovers pre-summarize messages), feedback
attachment (per-run, multi-run threads, repo failure graceful),
and dependency failure fallback to ``None``.
Docs:
- ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the
implementation plan this commit realizes, with Task 1 revised after
the evaluation findings (pagination, copy-on-read, Command wrap
already landed in journal.py, frontend feedback pagination in the
follow-up commit, Standard-mode follow-up noted).
- ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md``
— the Claude + second-opinion evaluation document that drove the
plan revisions (pagination bug, dict-mutation bug, feedback hidden
bug, Command bug).
- ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` —
design for a follow-up PR that visually marks summarize events in
history, based on a verified ``adispatch_custom_event`` experiment
(``trace=False`` middleware nodes can still forward the Pregel task
config via explicit signature injection).
Scope: Gateway mode only (``make dev-pro``). Standard mode
(``make dev``) hits LangGraph Server directly and bypasses these
endpoints; the summarize symptom is still present there and is
tracked as a separate follow-up in the plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -13,6 +13,7 @@ matching the LangGraph Platform wire format expected by the
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
import uuid
|
||||
from typing import Any
|
||||
@@ -21,7 +22,7 @@ from fastapi import APIRouter, HTTPException, Request
|
||||
from pydantic import BaseModel, Field, field_validator
|
||||
|
||||
from app.gateway.authz import require_permission
|
||||
from app.gateway.deps import get_checkpointer
|
||||
from app.gateway.deps import get_checkpointer, get_current_user, get_feedback_repo, get_run_event_store
|
||||
from app.gateway.utils import sanitize_log_param
|
||||
from deerflow.config.paths import Paths, get_paths
|
||||
from deerflow.runtime import serialize_channel_values
|
||||
@@ -402,6 +403,165 @@ async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Event-store-backed message loader
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_LEGACY_CMD_INNER_CONTENT_RE = re.compile(
|
||||
r"ToolMessage\(content=(?P<q>['\"])(?P<inner>.*?)(?P=q)",
|
||||
re.DOTALL,
|
||||
)
|
||||
|
||||
|
||||
def _sanitize_legacy_command_repr(content_field: Any) -> Any:
|
||||
"""Recover the inner ToolMessage text from a legacy ``str(Command(...))`` repr.
|
||||
|
||||
Runs captured before the ``on_tool_end`` fix in ``journal.py`` stored
|
||||
``str(Command(update={'messages':[ToolMessage(content='X', ...)]}))`` as the
|
||||
tool_result content. New runs store ``'X'`` directly. For legacy rows, try
|
||||
to extract ``'X'`` defensively; return the original string if extraction
|
||||
fails (still no worse than the checkpoint fallback for summarized threads).
|
||||
"""
|
||||
if not isinstance(content_field, str) or not content_field.startswith("Command(update="):
|
||||
return content_field
|
||||
match = _LEGACY_CMD_INNER_CONTENT_RE.search(content_field)
|
||||
return match.group("inner") if match else content_field
|
||||
|
||||
|
||||
async def _get_event_store_messages(request: Request, thread_id: str) -> list[dict] | None:
|
||||
"""Load the full message stream for ``thread_id`` from the event store.
|
||||
|
||||
The event store is append-only and unaffected by summarization — the
|
||||
checkpoint's ``channel_values["messages"]`` is rewritten in-place when the
|
||||
SummarizationMiddleware runs, which drops all pre-summarize messages. The
|
||||
event store retains the full transcript, so callers in Gateway mode should
|
||||
prefer it for rendering the conversation history.
|
||||
|
||||
In addition to the core message content, this helper attaches two extra
|
||||
fields to every returned dict:
|
||||
|
||||
- ``run_id``: the ``run_id`` of the event that produced this message.
|
||||
Always present.
|
||||
- ``feedback``: thumbs-up/down data. Present only on the **final
|
||||
``ai_message`` of each run** (matching the per-run feedback semantics
|
||||
of ``POST /api/threads/{id}/runs/{run_id}/feedback``). The frontend uses
|
||||
the presence of this field to decide whether to render the feedback
|
||||
button, which sidesteps the positional-index mapping bug that an
|
||||
out-of-band ``/messages`` fetch exhibited.
|
||||
|
||||
Behaviour contract:
|
||||
|
||||
- **Full pagination.** ``RunEventStore.list_messages`` returns the newest
|
||||
``limit`` records when no cursor is given, so a fixed limit silently
|
||||
drops older messages on long threads. We size the read from
|
||||
``count_messages()`` and then page forward with ``after_seq`` cursors.
|
||||
- **Copy-on-read.** Each content dict is copied before ``id`` is patched
|
||||
so the live store object is never mutated; ``MemoryRunEventStore``
|
||||
returns live references.
|
||||
- **Stable ids.** Messages with ``id=None`` (human + tool_result) receive
|
||||
a deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so React
|
||||
keys are stable across requests without altering stored data. AI messages
|
||||
retain their LLM-assigned ``lc_run--*`` ids.
|
||||
- **Legacy Command repr.** Rows captured before the ``journal.py``
|
||||
``on_tool_end`` fix stored ``str(Command(update={...}))`` as the tool
|
||||
result content. ``_sanitize_legacy_command_repr`` extracts the inner
|
||||
ToolMessage text.
|
||||
- **User context.** ``DbRunEventStore`` is user-scoped by default via
|
||||
``resolve_user_id(AUTO)`` in ``runtime/user_context.py``. This helper
|
||||
must run inside a request where ``@require_permission`` has populated
|
||||
the user contextvar. Both callers below are decorated appropriately.
|
||||
Do not call this helper from CLI or migration scripts without passing
|
||||
``user_id=None`` explicitly to the underlying store methods.
|
||||
|
||||
Returns ``None`` when the event store is not configured or has no message
|
||||
events for this thread, so callers fall back to checkpoint messages.
|
||||
"""
|
||||
try:
|
||||
event_store = get_run_event_store(request)
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
try:
|
||||
total = await event_store.count_messages(thread_id)
|
||||
except Exception:
|
||||
logger.exception("count_messages failed for thread %s", sanitize_log_param(thread_id))
|
||||
return None
|
||||
if not total:
|
||||
return None
|
||||
|
||||
# Batch by page_size to keep memory bounded for very long threads.
|
||||
page_size = 500
|
||||
collected: list[dict] = []
|
||||
after_seq: int | None = None
|
||||
while True:
|
||||
try:
|
||||
page = await event_store.list_messages(thread_id, limit=page_size, after_seq=after_seq)
|
||||
except Exception:
|
||||
logger.exception("list_messages failed for thread %s", sanitize_log_param(thread_id))
|
||||
return None
|
||||
if not page:
|
||||
break
|
||||
collected.extend(page)
|
||||
if len(page) < page_size:
|
||||
break
|
||||
next_cursor = page[-1].get("seq")
|
||||
if next_cursor is None or (after_seq is not None and next_cursor <= after_seq):
|
||||
break
|
||||
after_seq = next_cursor
|
||||
|
||||
# Build the message list; track the final ``ai_message`` index per run so
|
||||
# feedback can be attached at the right position (matches thread_runs.py).
|
||||
messages: list[dict] = []
|
||||
last_ai_per_run: dict[str, int] = {}
|
||||
for evt in collected:
|
||||
raw = evt.get("content")
|
||||
if not isinstance(raw, dict) or "type" not in raw:
|
||||
continue
|
||||
content = dict(raw)
|
||||
if content.get("id") is None:
|
||||
content["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{thread_id}:{evt['seq']}"))
|
||||
if content.get("type") == "tool":
|
||||
content["content"] = _sanitize_legacy_command_repr(content.get("content"))
|
||||
run_id = evt.get("run_id")
|
||||
if run_id:
|
||||
content["run_id"] = run_id
|
||||
if evt.get("event_type") == "ai_message" and run_id:
|
||||
last_ai_per_run[run_id] = len(messages)
|
||||
messages.append(content)
|
||||
|
||||
if not messages:
|
||||
return None
|
||||
|
||||
# Attach feedback to the final ai_message of each run. If the feedback
|
||||
# subsystem is unavailable, leave the ``feedback`` field absent entirely
|
||||
# so the frontend hides the button rather than showing it over a broken
|
||||
# write path.
|
||||
feedback_available = False
|
||||
feedback_map: dict[str, dict] = {}
|
||||
try:
|
||||
feedback_repo = get_feedback_repo(request)
|
||||
user_id = await get_current_user(request)
|
||||
feedback_map = await feedback_repo.list_by_thread_grouped(thread_id, user_id=user_id)
|
||||
feedback_available = True
|
||||
except Exception:
|
||||
logger.exception("feedback lookup failed for thread %s", sanitize_log_param(thread_id))
|
||||
|
||||
if feedback_available:
|
||||
for run_id, idx in last_ai_per_run.items():
|
||||
fb = feedback_map.get(run_id)
|
||||
messages[idx]["feedback"] = (
|
||||
{
|
||||
"feedback_id": fb["feedback_id"],
|
||||
"rating": fb["rating"],
|
||||
"comment": fb.get("comment"),
|
||||
}
|
||||
if fb
|
||||
else None
|
||||
)
|
||||
|
||||
return messages
|
||||
|
||||
|
||||
@router.get("/{thread_id}/state", response_model=ThreadStateResponse)
|
||||
@require_permission("threads", "read", owner_check=True)
|
||||
async def get_thread_state(thread_id: str, request: Request) -> ThreadStateResponse:
|
||||
@@ -440,8 +600,15 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo
|
||||
next_tasks = [t.name for t in tasks_raw if hasattr(t, "name")]
|
||||
tasks = [{"id": getattr(t, "id", ""), "name": getattr(t, "name", "")} for t in tasks_raw]
|
||||
|
||||
values = serialize_channel_values(channel_values)
|
||||
|
||||
# Prefer event-store messages: append-only, immune to summarization.
|
||||
es_messages = await _get_event_store_messages(request, thread_id)
|
||||
if es_messages is not None:
|
||||
values["messages"] = es_messages
|
||||
|
||||
return ThreadStateResponse(
|
||||
values=serialize_channel_values(channel_values),
|
||||
values=values,
|
||||
next=next_tasks,
|
||||
metadata=metadata,
|
||||
checkpoint={"id": checkpoint_id, "ts": str(metadata.get("created_at", ""))},
|
||||
@@ -559,6 +726,11 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
|
||||
if body.before:
|
||||
config["configurable"]["checkpoint_id"] = body.before
|
||||
|
||||
# Load the full event-store message stream once; attach to the latest
|
||||
# checkpoint entry only (matching the prior semantics). The event store
|
||||
# is append-only and immune to summarization.
|
||||
es_messages = await _get_event_store_messages(request, thread_id)
|
||||
|
||||
entries: list[HistoryEntry] = []
|
||||
is_latest_checkpoint = True
|
||||
try:
|
||||
@@ -582,11 +754,17 @@ async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request
|
||||
if thread_data := channel_values.get("thread_data"):
|
||||
values["thread_data"] = thread_data
|
||||
|
||||
# Attach messages from checkpointer only for the latest checkpoint
|
||||
# Attach messages only to the latest checkpoint. Prefer the
|
||||
# event-store stream (complete and unaffected by summarization);
|
||||
# fall back to checkpoint channel_values when the event store is
|
||||
# unavailable or empty.
|
||||
if is_latest_checkpoint:
|
||||
messages = channel_values.get("messages")
|
||||
if messages:
|
||||
values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
|
||||
if es_messages is not None:
|
||||
values["messages"] = es_messages
|
||||
else:
|
||||
messages = channel_values.get("messages")
|
||||
if messages:
|
||||
values["messages"] = serialize_channel_values({"messages": messages}).get("messages", [])
|
||||
is_latest_checkpoint = False
|
||||
|
||||
# Derive next tasks
|
||||
|
||||
@@ -0,0 +1,439 @@
|
||||
"""Tests for event-store-backed message loading in thread state/history endpoints.
|
||||
|
||||
Covers the helper functions added to ``app/gateway/routers/threads.py``:
|
||||
|
||||
- ``_sanitize_legacy_command_repr`` — extracts inner ToolMessage text from
|
||||
legacy ``str(Command(...))`` strings captured before the ``journal.py``
|
||||
fix for state-updating tools like ``present_files``.
|
||||
- ``_get_event_store_messages`` — loads the full message stream with full
|
||||
pagination, copy-on-read id patching, legacy Command sanitization, and
|
||||
a clean fallback to ``None`` when the event store is unavailable.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import uuid
|
||||
from types import SimpleNamespace
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
|
||||
from app.gateway.routers.threads import (
|
||||
_get_event_store_messages,
|
||||
_sanitize_legacy_command_repr,
|
||||
)
|
||||
from deerflow.runtime.events.store.memory import MemoryRunEventStore
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def event_store() -> MemoryRunEventStore:
|
||||
return MemoryRunEventStore()
|
||||
|
||||
|
||||
class _FakeFeedbackRepo:
|
||||
"""Minimal ``FeedbackRepository`` stand-in that returns a configured map."""
|
||||
|
||||
def __init__(self, by_run: dict[str, dict] | None = None) -> None:
|
||||
self._by_run = by_run or {}
|
||||
|
||||
async def list_by_thread_grouped(self, thread_id: str, *, user_id: str | None) -> dict[str, dict]:
|
||||
return dict(self._by_run)
|
||||
|
||||
|
||||
def _make_request(
|
||||
event_store: MemoryRunEventStore,
|
||||
feedback_repo: _FakeFeedbackRepo | None = None,
|
||||
) -> Any:
|
||||
"""Build a minimal FastAPI-like Request object.
|
||||
|
||||
``get_run_event_store(request)`` reads ``request.app.state.run_event_store``.
|
||||
``get_feedback_repo(request)`` reads ``request.app.state.feedback_repo``.
|
||||
``get_current_user`` is monkey-patched separately in tests that need it.
|
||||
"""
|
||||
state = SimpleNamespace(
|
||||
run_event_store=event_store,
|
||||
feedback_repo=feedback_repo or _FakeFeedbackRepo(),
|
||||
)
|
||||
app = SimpleNamespace(state=state)
|
||||
return SimpleNamespace(app=app)
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _stub_current_user(monkeypatch):
|
||||
"""Stub out ``get_current_user`` so tests don't need real auth context."""
|
||||
import app.gateway.routers.threads as threads_mod
|
||||
|
||||
async def _fake(_request):
|
||||
return None
|
||||
|
||||
monkeypatch.setattr(threads_mod, "get_current_user", _fake)
|
||||
|
||||
|
||||
async def _seed_simple_run(store: MemoryRunEventStore, thread_id: str, run_id: str) -> None:
|
||||
"""Seed one run: human + ai_tool_call + tool_result + final ai_message, plus a trace."""
|
||||
await store.put(
|
||||
thread_id=thread_id, run_id=run_id,
|
||||
event_type="human_message", category="message",
|
||||
content={
|
||||
"type": "human", "id": None,
|
||||
"content": [{"type": "text", "text": "hello"}],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
},
|
||||
)
|
||||
await store.put(
|
||||
thread_id=thread_id, run_id=run_id,
|
||||
event_type="ai_tool_call", category="message",
|
||||
content={
|
||||
"type": "ai", "id": "lc_run--tc1",
|
||||
"content": "",
|
||||
"tool_calls": [{"name": "search", "args": {"q": "x"}, "id": "call_1", "type": "tool_call"}],
|
||||
"invalid_tool_calls": [],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
"usage_metadata": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
|
||||
},
|
||||
)
|
||||
await store.put(
|
||||
thread_id=thread_id, run_id=run_id,
|
||||
event_type="tool_result", category="message",
|
||||
content={
|
||||
"type": "tool", "id": None,
|
||||
"content": "results",
|
||||
"tool_call_id": "call_1", "name": "search",
|
||||
"artifact": None, "status": "success",
|
||||
"additional_kwargs": {}, "response_metadata": {},
|
||||
},
|
||||
)
|
||||
await store.put(
|
||||
thread_id=thread_id, run_id=run_id,
|
||||
event_type="ai_message", category="message",
|
||||
content={
|
||||
"type": "ai", "id": "lc_run--final1",
|
||||
"content": "done",
|
||||
"tool_calls": [], "invalid_tool_calls": [],
|
||||
"additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "name": None,
|
||||
"usage_metadata": {"input_tokens": 20, "output_tokens": 10, "total_tokens": 30},
|
||||
},
|
||||
)
|
||||
# Non-message trace — must be filtered out.
|
||||
await store.put(
|
||||
thread_id=thread_id, run_id=run_id,
|
||||
event_type="llm_request", category="trace",
|
||||
content={"model": "test"},
|
||||
)
|
||||
|
||||
|
||||
class TestSanitizeLegacyCommandRepr:
|
||||
def test_passthrough_non_string(self):
|
||||
assert _sanitize_legacy_command_repr(None) is None
|
||||
assert _sanitize_legacy_command_repr(42) == 42
|
||||
assert _sanitize_legacy_command_repr([{"type": "text", "text": "x"}]) == [{"type": "text", "text": "x"}]
|
||||
|
||||
def test_passthrough_plain_string(self):
|
||||
assert _sanitize_legacy_command_repr("Successfully presented files") == "Successfully presented files"
|
||||
assert _sanitize_legacy_command_repr("") == ""
|
||||
|
||||
def test_extracts_inner_content_single_quotes(self):
|
||||
legacy = (
|
||||
"Command(update={'artifacts': ['/mnt/user-data/outputs/report.md'], "
|
||||
"'messages': [ToolMessage(content='Successfully presented files', "
|
||||
"tool_call_id='call_abc')]})"
|
||||
)
|
||||
assert _sanitize_legacy_command_repr(legacy) == "Successfully presented files"
|
||||
|
||||
def test_extracts_inner_content_double_quotes(self):
|
||||
legacy = 'Command(update={"messages": [ToolMessage(content="ok", tool_call_id="x")]})'
|
||||
assert _sanitize_legacy_command_repr(legacy) == "ok"
|
||||
|
||||
def test_unparseable_command_returns_original(self):
|
||||
legacy = "Command(update={'something_else': 1})"
|
||||
assert _sanitize_legacy_command_repr(legacy) == legacy
|
||||
|
||||
|
||||
class TestGetEventStoreMessages:
|
||||
@pytest.mark.anyio
|
||||
async def test_returns_none_when_store_empty(self, event_store):
|
||||
request = _make_request(event_store)
|
||||
assert await _get_event_store_messages(request, "t_missing") is None
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_extracts_all_message_types_in_order(self, event_store):
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
request = _make_request(event_store)
|
||||
messages = await _get_event_store_messages(request, "t1")
|
||||
assert messages is not None
|
||||
types = [m["type"] for m in messages]
|
||||
assert types == ["human", "ai", "tool", "ai"]
|
||||
# Trace events must not appear
|
||||
for m in messages:
|
||||
assert m.get("type") in {"human", "ai", "tool"}
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_null_ids_get_deterministic_uuid5(self, event_store):
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
request = _make_request(event_store)
|
||||
messages = await _get_event_store_messages(request, "t1")
|
||||
assert messages is not None
|
||||
|
||||
# AI messages keep their LLM ids
|
||||
assert messages[1]["id"] == "lc_run--tc1"
|
||||
assert messages[3]["id"] == "lc_run--final1"
|
||||
|
||||
# Human (seq=1) + tool (seq=3) get deterministic uuid5
|
||||
expected_human_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "t1:1"))
|
||||
expected_tool_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "t1:3"))
|
||||
assert messages[0]["id"] == expected_human_id
|
||||
assert messages[2]["id"] == expected_tool_id
|
||||
|
||||
# Re-running produces the same ids (stability across requests)
|
||||
messages2 = await _get_event_store_messages(request, "t1")
|
||||
assert [m["id"] for m in messages2] == [m["id"] for m in messages]
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_helper_does_not_mutate_store(self, event_store):
|
||||
"""Helper must copy content dicts; the live store must stay unchanged."""
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
request = _make_request(event_store)
|
||||
_ = await _get_event_store_messages(request, "t1")
|
||||
|
||||
# Raw store records still have id=None for human/tool
|
||||
raw = await event_store.list_messages("t1", limit=500)
|
||||
human = next(e for e in raw if e["content"]["type"] == "human")
|
||||
tool = next(e for e in raw if e["content"]["type"] == "tool")
|
||||
assert human["content"]["id"] is None
|
||||
assert tool["content"]["id"] is None
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_legacy_command_repr_sanitized(self, event_store):
|
||||
"""A tool_result whose content is a legacy ``str(Command(...))`` is cleaned."""
|
||||
legacy = (
|
||||
"Command(update={'artifacts': ['/mnt/user-data/outputs/x.md'], "
|
||||
"'messages': [ToolMessage(content='Successfully presented files', "
|
||||
"tool_call_id='call_p')]})"
|
||||
)
|
||||
await event_store.put(
|
||||
thread_id="t2", run_id="r1",
|
||||
event_type="tool_result", category="message",
|
||||
content={
|
||||
"type": "tool", "id": None,
|
||||
"content": legacy,
|
||||
"tool_call_id": "call_p", "name": "present_files",
|
||||
"artifact": None, "status": "success",
|
||||
"additional_kwargs": {}, "response_metadata": {},
|
||||
},
|
||||
)
|
||||
request = _make_request(event_store)
|
||||
messages = await _get_event_store_messages(request, "t2")
|
||||
assert messages is not None and len(messages) == 1
|
||||
assert messages[0]["content"] == "Successfully presented files"
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_pagination_covers_more_than_one_page(self, event_store, monkeypatch):
|
||||
"""Simulate a long thread that exceeds a single page to exercise the loop."""
|
||||
thread_id = "t_long"
|
||||
# Seed 12 human messages
|
||||
for i in range(12):
|
||||
await event_store.put(
|
||||
thread_id=thread_id, run_id="r1",
|
||||
event_type="human_message", category="message",
|
||||
content={
|
||||
"type": "human", "id": None,
|
||||
"content": [{"type": "text", "text": f"msg {i}"}],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
},
|
||||
)
|
||||
|
||||
# Force small page size to exercise pagination
|
||||
import app.gateway.routers.threads as threads_mod
|
||||
original = threads_mod._get_event_store_messages
|
||||
|
||||
# Monkeypatch MemoryRunEventStore.list_messages to assert it's called with cursor pagination
|
||||
calls: list[dict] = []
|
||||
real_list = event_store.list_messages
|
||||
|
||||
async def spy_list_messages(tid, *, limit=50, before_seq=None, after_seq=None):
|
||||
calls.append({"limit": limit, "after_seq": after_seq})
|
||||
return await real_list(tid, limit=limit, before_seq=before_seq, after_seq=after_seq)
|
||||
|
||||
monkeypatch.setattr(event_store, "list_messages", spy_list_messages)
|
||||
|
||||
request = _make_request(event_store)
|
||||
messages = await original(request, thread_id)
|
||||
assert messages is not None
|
||||
assert len(messages) == 12
|
||||
assert [m["content"][0]["text"] for m in messages] == [f"msg {i}" for i in range(12)]
|
||||
# At least one call was made with after_seq=None (the initial page)
|
||||
assert any(c["after_seq"] is None for c in calls)
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_summarize_regression_recovers_pre_summarize_messages(self, event_store):
|
||||
"""The exact bug: checkpoint would have only post-summarize messages;
|
||||
event store must surface the original pre-summarize human query."""
|
||||
# Run 1 (pre-summarize)
|
||||
await event_store.put(
|
||||
thread_id="t_sum", run_id="r1",
|
||||
event_type="human_message", category="message",
|
||||
content={
|
||||
"type": "human", "id": None,
|
||||
"content": [{"type": "text", "text": "original question"}],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
},
|
||||
)
|
||||
await event_store.put(
|
||||
thread_id="t_sum", run_id="r1",
|
||||
event_type="ai_message", category="message",
|
||||
content={
|
||||
"type": "ai", "id": "lc_run--r1",
|
||||
"content": "first answer",
|
||||
"tool_calls": [], "invalid_tool_calls": [],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
"usage_metadata": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
|
||||
},
|
||||
)
|
||||
# Run 2 (post-summarize — what the checkpoint still has)
|
||||
await event_store.put(
|
||||
thread_id="t_sum", run_id="r2",
|
||||
event_type="human_message", category="message",
|
||||
content={
|
||||
"type": "human", "id": None,
|
||||
"content": [{"type": "text", "text": "follow up"}],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
},
|
||||
)
|
||||
await event_store.put(
|
||||
thread_id="t_sum", run_id="r2",
|
||||
event_type="ai_message", category="message",
|
||||
content={
|
||||
"type": "ai", "id": "lc_run--r2",
|
||||
"content": "second answer",
|
||||
"tool_calls": [], "invalid_tool_calls": [],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
"usage_metadata": {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0},
|
||||
},
|
||||
)
|
||||
|
||||
request = _make_request(event_store)
|
||||
messages = await _get_event_store_messages(request, "t_sum")
|
||||
assert messages is not None
|
||||
# 4 messages, not 2 (which is what the summarized checkpoint would yield)
|
||||
assert len(messages) == 4
|
||||
assert messages[0]["content"][0]["text"] == "original question"
|
||||
assert messages[1]["id"] == "lc_run--r1"
|
||||
assert messages[3]["id"] == "lc_run--r2"
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_run_id_attached_to_every_message(self, event_store):
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
request = _make_request(event_store)
|
||||
messages = await _get_event_store_messages(request, "t1")
|
||||
assert messages is not None
|
||||
assert all(m.get("run_id") == "r1" for m in messages)
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_feedback_attached_only_to_final_ai_message_per_run(self, event_store):
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
feedback_repo = _FakeFeedbackRepo(
|
||||
{"r1": {"feedback_id": "fb1", "rating": 1, "comment": "great"}}
|
||||
)
|
||||
request = _make_request(event_store, feedback_repo=feedback_repo)
|
||||
messages = await _get_event_store_messages(request, "t1")
|
||||
assert messages is not None
|
||||
|
||||
# human (0), ai_tool_call (1), tool (2), ai_message (3)
|
||||
final_ai = messages[3]
|
||||
assert final_ai["feedback"] == {
|
||||
"feedback_id": "fb1",
|
||||
"rating": 1,
|
||||
"comment": "great",
|
||||
}
|
||||
# Non-final messages must NOT have a feedback key at all — the
|
||||
# frontend keys button visibility off of this.
|
||||
assert "feedback" not in messages[0]
|
||||
assert "feedback" not in messages[1]
|
||||
assert "feedback" not in messages[2]
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_feedback_none_when_no_row_for_run(self, event_store):
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
request = _make_request(event_store, feedback_repo=_FakeFeedbackRepo({}))
|
||||
messages = await _get_event_store_messages(request, "t1")
|
||||
assert messages is not None
|
||||
# Final ai_message gets an explicit ``None`` — distinguishes "eligible
|
||||
# but unrated" from "not eligible" (field absent).
|
||||
assert messages[3]["feedback"] is None
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_feedback_per_run_for_multi_run_thread(self, event_store):
|
||||
"""A thread with two runs: each final ai_message should get its own feedback."""
|
||||
# Run 1
|
||||
await event_store.put(
|
||||
thread_id="t_multi", run_id="r1",
|
||||
event_type="human_message", category="message",
|
||||
content={"type": "human", "id": None, "content": "q1",
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None},
|
||||
)
|
||||
await event_store.put(
|
||||
thread_id="t_multi", run_id="r1",
|
||||
event_type="ai_message", category="message",
|
||||
content={"type": "ai", "id": "lc_run--a1", "content": "a1",
|
||||
"tool_calls": [], "invalid_tool_calls": [],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
"usage_metadata": None},
|
||||
)
|
||||
# Run 2
|
||||
await event_store.put(
|
||||
thread_id="t_multi", run_id="r2",
|
||||
event_type="human_message", category="message",
|
||||
content={"type": "human", "id": None, "content": "q2",
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None},
|
||||
)
|
||||
await event_store.put(
|
||||
thread_id="t_multi", run_id="r2",
|
||||
event_type="ai_message", category="message",
|
||||
content={"type": "ai", "id": "lc_run--a2", "content": "a2",
|
||||
"tool_calls": [], "invalid_tool_calls": [],
|
||||
"additional_kwargs": {}, "response_metadata": {}, "name": None,
|
||||
"usage_metadata": None},
|
||||
)
|
||||
feedback_repo = _FakeFeedbackRepo({
|
||||
"r1": {"feedback_id": "fb_r1", "rating": 1, "comment": None},
|
||||
"r2": {"feedback_id": "fb_r2", "rating": -1, "comment": "meh"},
|
||||
})
|
||||
request = _make_request(event_store, feedback_repo=feedback_repo)
|
||||
messages = await _get_event_store_messages(request, "t_multi")
|
||||
assert messages is not None
|
||||
# human[r1], ai[r1], human[r2], ai[r2]
|
||||
assert messages[1]["feedback"]["feedback_id"] == "fb_r1"
|
||||
assert messages[1]["feedback"]["rating"] == 1
|
||||
assert messages[3]["feedback"]["feedback_id"] == "fb_r2"
|
||||
assert messages[3]["feedback"]["rating"] == -1
|
||||
# Humans don't get feedback
|
||||
assert "feedback" not in messages[0]
|
||||
assert "feedback" not in messages[2]
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_feedback_repo_failure_does_not_break_helper(self, monkeypatch, event_store):
|
||||
"""If feedback lookup throws, messages still come back without feedback."""
|
||||
await _seed_simple_run(event_store, "t1", "r1")
|
||||
|
||||
class _BoomRepo:
|
||||
async def list_by_thread_grouped(self, *a, **kw):
|
||||
raise RuntimeError("db down")
|
||||
|
||||
request = _make_request(event_store, feedback_repo=_BoomRepo())
|
||||
messages = await _get_event_store_messages(request, "t1")
|
||||
assert messages is not None
|
||||
assert len(messages) == 4
|
||||
for m in messages:
|
||||
assert "feedback" not in m
|
||||
|
||||
@pytest.mark.anyio
|
||||
async def test_returns_none_when_dep_raises(self, monkeypatch, event_store):
|
||||
"""When ``get_run_event_store`` is not configured, helper returns None."""
|
||||
import app.gateway.routers.threads as threads_mod
|
||||
|
||||
def boom(_request):
|
||||
raise RuntimeError("no store")
|
||||
|
||||
monkeypatch.setattr(threads_mod, "get_run_event_store", boom)
|
||||
request = _make_request(event_store)
|
||||
assert await threads_mod._get_event_store_messages(request, "t1") is None
|
||||
Reference in New Issue
Block a user