fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974)

* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)

DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"] which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token-by-token. `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.

Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.

chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.

Verified end-to-end against a real LLM: a 15-number count emits 35
messages-tuple events with BPE subword boundaries clearly visible
("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across
the window, end-event usage matches the values-snapshot usage exactly
(not doubled). tests/test_client_live.py::TestLiveStreaming passes.

New unit tests:
- test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3
  delta events with correct content/id/usage, values-snapshot does not
  duplicate, usage counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds full text
  from deltas.
- test_messages_mode_tool_message: ToolMessage delivered via messages
  mode is not duplicated by the values-snapshot synthesis path.

The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).

Fixes #1969

* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)

Post-review cleanup for the token-level streaming fix. No behavior
change for correct inputs; one efficiency regression fixed.

Fix: chat() O(n²) accumulator
-----------------------------
`chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`,
which is O(n) per concat → O(n²) total over a streamed response. At
~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks
it costs roughly 100-300 ms of pure copying. Switched to
`dict[str, list[str]]` + `"".join()` once at return.

Cleanup
-------
- Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`,
  and `_tool_message_event` static helpers. The messages-mode and
  values-mode branches previously repeated four inline dict literals each;
  they now call the same builders.
- `StreamEvent.type` is now typed as `Literal["values", "messages-tuple",
  "custom", "end"]` via a `StreamEventType` alias. Makes the closed set
  explicit and catches typos at type-check time.
- Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`,
  `.tool_calls`, `.id` all have default values on the base class, so the
  `getattr(..., None)` fallbacks were dead code. Removed from the hot
  path.
- `_account_usage` parameter type loosened to `Any` so that LangChain's
  `UsageMetadata` TypedDict is accepted under strict type checking.
- Trimmed narrating comments on `seen_ids` / `streamed_ids` / the
  values-synthesis skip block; kept the non-obvious ones that document
  the cross-mode dedup invariant.

Net diff: -15 lines. All 132 unit tests + harness boundary test still
pass; ruff check and ruff format pass.

* docs(backend): add STREAMING.md design note (#1969)

Dedicated design document for the token-level streaming architecture,
prompted by the bug investigation in #1969.

Contents:
- Why two parallel streaming paths exist (Gateway HTTP/async vs
  DeerFlowClient sync/in-process) and why they cannot be merged.
- LangGraph's three-layer mode naming (Graph "messages" vs Platform
  SDK "messages-tuple" vs HTTP SSE) and why a shared string constant
  would be harmful.
- Gateway path: run_agent + StreamBridge + sse_consumer with a
  sequence diagram.
- DeerFlowClient path: sync generator + direct yield, delta semantics,
  chat() accumulator.
- Why the three id sets (seen_ids / streamed_ids / counted_usage_ids)
  each carry an independent invariant and cannot be collapsed.
- End-to-end sequence for a real conversation turn.
- Lessons from #1969: why mock-based tests missed the bug, why
  BPE subword boundaries in live output are the strongest
  correctness signal, and the regression test that locks it in.
- Source code location index.

Also:
- Link from backend/CLAUDE.md Embedded Client section.
- Link from backend/docs/README.md under Feature Documentation.

* test(backend): add refactor regression guards for stream() (#1969)

Three new tests in TestStream that lock the contract introduced by
PR #1974 so any future refactor (sync->async migration, sharing a
core with Gateway's run_agent, dedup strategy change) cannot
silently change behavior.

- test_dedup_requires_messages_before_values_invariant: canary that
  documents the order-dependence of cross-mode dedup. streamed_ids
  is populated only by the messages branch, so values-before-messages
  for the same id produces duplicate AI text events. Real LangGraph
  never inverts this order, but a refactor that does (or that makes
  dedup idempotent) must update this test deliberately.

- test_messages_mode_golden_event_sequence: locks the *exact* event
  sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1
  end) for a canonical streaming turn. List equality gives a clear
  diff on any drift in order, type, or payload shape.

- test_chat_accumulates_in_linear_time: perf canary for the O(n^2)
  fix in commit 1f11ba10. 10,000 single-char chunks must accumulate
  in under 1s; the threshold is wide enough to pass on slow CI but
  tight enough to fail if buffer = buffer + delta is restored.

All three tests pass alongside the existing 12 TestStream tests
(15/15). ruff check + ruff format clean.

* docs(backend): clarify stream() docstring on JSON serialization (#1969)

Replace the misleading "raw LangChain objects (AIMessage,
usage_metadata as dataclasses), not dicts" claim in the
"Why not reuse Gateway's run_agent?" section. The implementation
already yields plain Python dicts (StreamEvent.data is dict, and
usage_metadata is a TypedDict), so the original wording suggested
a richer return type than the API actually delivers.

The corrected wording focuses on what is actually true and
relevant: this client skips the JSON/SSE serialization layer that
Gateway adds for HTTP wire transmission, and yields stream event
payloads directly as Python data structures.

Addresses Copilot review feedback on PR #1974.

* test(backend): document none-id messages dedup limitation (#1969)

Add test_none_id_chunks_produce_duplicates_known_limitation to
TestStream that explicitly documents and asserts the current
behavior when an LLM provider emits AIMessageChunk with id=None
(vLLM, certain custom backends).

The cross-mode dedup machinery cannot record a None id in
streamed_ids (guarded by ``if msg_id:``), so the values snapshot's
reassembled AIMessage with a real id falls through and synthesizes
a duplicate AI text event. The test asserts len == 2 and locks
this as a known limitation rather than silently letting future
contributors hit it without context.

Why this is documented rather than fixed:
* Falling back to ``metadata.get("id")`` does not help — LangGraph's
  messages-mode metadata never carries the message id.
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the
  values snapshot uses the same fallback, which it does not.
* A real fix requires provider cooperation (always emit chunk ids)
  or content-based dedup (false-positive risk), neither of which
  belongs in this PR.

If a real fix lands, replace this test with a positive assertion
that dedup works for None-id chunks.

Addresses Copilot review feedback on PR #1974 (client.py:515).

* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)

- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)

Closes #1940

* docs: clarify deployment sizing guidance (#1963)

* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)

After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).

This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.

* fix(frontend): avoid using route new as thread id (#1967)

Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>

* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)

* Fix event loop conflict in SubagentExecutor.execute()

When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.

This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.

Fixes: sub-task cards not appearing in Ultra mode when using async parent agents

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(subagent): harden isolated event loop execution

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(backend): remove dead getattr in _tool_message_event

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This commit is contained in:
greatmengqi
2026-04-10 18:16:38 +08:00
committed by GitHub
parent 654354c624
commit b1aabe88b8
5 changed files with 917 additions and 56 deletions
+368 -2
View File
@@ -10,7 +10,7 @@ from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage # noqa: F401
from langchain_core.messages import AIMessage, AIMessageChunk, HumanMessage, SystemMessage, ToolMessage # noqa: F401
from app.gateway.routers.mcp import McpConfigResponse
from app.gateway.routers.memory import MemoryConfigResponse, MemoryStatusResponse
@@ -225,7 +225,9 @@ class TestStream:
agent.stream.assert_called_once()
call_kwargs = agent.stream.call_args.kwargs
assert call_kwargs["stream_mode"] == ["values", "custom"]
# ``messages`` enables token-level streaming of AI text deltas;
# see DeerFlowClient.stream() docstring and GitHub issue #1969.
assert call_kwargs["stream_mode"] == ["values", "messages", "custom"]
assert events[0].type == "custom"
assert events[0].data == {"type": "task_started", "task_id": "task-1"}
@@ -351,6 +353,123 @@ class TestStream:
# Should not raise; end event proves it completed
assert events[-1].type == "end"
def test_messages_mode_emits_token_deltas(self, client):
"""stream() forwards LangGraph ``messages`` mode chunks as delta events.
Regression for bytedance/deer-flow#1969 — before the fix the client
only subscribed to ``values`` mode, so LLM output was delivered as
a single cumulative dump after each graph node finished instead of
token-by-token deltas as the model generated them.
"""
# Three AI chunks sharing the same id, followed by a terminal
# values snapshot with the fully assembled message — this matches
# the shape LangGraph emits when ``stream_mode`` includes both
# ``messages`` and ``values``.
assembled = AIMessage(content="Hel lo world!", id="ai-1", usage_metadata={"input_tokens": 3, "output_tokens": 4, "total_tokens": 7})
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
("messages", (AIMessageChunk(content=" lo ", id="ai-1"), {})),
(
"messages",
(
AIMessageChunk(
content="world!",
id="ai-1",
usage_metadata={"input_tokens": 3, "output_tokens": 4, "total_tokens": 7},
),
{},
),
),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), assembled]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-stream"))
# Three delta messages-tuple events, all with the same id, each
# carrying only its own delta (not cumulative).
ai_text_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
assert [e.data["content"] for e in ai_text_events] == ["Hel", " lo ", "world!"]
assert all(e.data["id"] == "ai-1" for e in ai_text_events)
# The values snapshot MUST NOT re-synthesize an AI text event for
# the already-streamed id (otherwise consumers see duplicated text).
assert len(ai_text_events) == 3
# Usage metadata attached only to the chunk that actually carried
# it, and counted into cumulative usage exactly once (the values
# snapshot's duplicate usage on the assembled AIMessage must not
# be double-counted).
events_with_usage = [e for e in ai_text_events if "usage_metadata" in e.data]
assert len(events_with_usage) == 1
assert events_with_usage[0].data["usage_metadata"] == {"input_tokens": 3, "output_tokens": 4, "total_tokens": 7}
end_event = events[-1]
assert end_event.type == "end"
assert end_event.data["usage"] == {"input_tokens": 3, "output_tokens": 4, "total_tokens": 7}
# The values snapshot itself is still emitted.
assert any(e.type == "values" for e in events)
# stream_mode includes ``messages`` — the whole point of this fix.
call_kwargs = agent.stream.call_args.kwargs
assert "messages" in call_kwargs["stream_mode"]
def test_chat_accumulates_streamed_deltas(self, client):
"""chat() concatenates per-id deltas from messages mode."""
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
("messages", (AIMessageChunk(content="lo ", id="ai-1"), {})),
("messages", (AIMessageChunk(content="world!", id="ai-1"), {})),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), AIMessage(content="Hello world!", id="ai-1")]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("hi", thread_id="t-chat-stream")
assert result == "Hello world!"
def test_messages_mode_tool_message(self, client):
"""stream() forwards ToolMessage chunks from messages mode."""
agent = MagicMock()
agent.stream.return_value = iter(
[
(
"messages",
(
ToolMessage(content="file.txt", id="tm-1", tool_call_id="tc-1", name="bash"),
{},
),
),
("values", {"messages": [HumanMessage(content="ls", id="h-1"), ToolMessage(content="file.txt", id="tm-1", tool_call_id="tc-1", name="bash")]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("ls", thread_id="t-tool-stream"))
tool_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "tool"]
# The tool result must be delivered exactly once (from messages
# mode), not duplicated by the values-snapshot synthesis path.
assert len(tool_events) == 1
assert tool_events[0].data["content"] == "file.txt"
assert tool_events[0].data["name"] == "bash"
assert tool_events[0].data["tool_call_id"] == "tc-1"
def test_list_content_blocks(self, client):
"""stream() handles AIMessage with list-of-blocks content."""
ai = AIMessage(
@@ -373,6 +492,253 @@ class TestStream:
assert len(msg_events) == 1
assert msg_events[0].data["content"] == "result"
# ------------------------------------------------------------------
# Refactor regression guards (PR #1974 follow-up safety)
#
# The three tests below are not bug-fix tests — they exist to lock
# the *exact* contract of stream() so a future refactor (e.g. moving
# to ``agent.astream()``, sharing a core with Gateway's run_agent,
# changing the dedup strategy) cannot silently change behavior.
# ------------------------------------------------------------------
def test_dedup_requires_messages_before_values_invariant(self, client):
"""Canary: locks the order-dependence of cross-mode dedup.
``streamed_ids`` is populated only by the ``messages`` branch.
If a ``values`` snapshot arrives BEFORE its corresponding
``messages`` chunks for the same id, the values path falls
through and synthesizes its own AI text event, then the
messages chunk emits another delta — consumers see the same
id twice.
Under normal LangGraph operation this never happens (messages
chunks are emitted during LLM streaming, the values snapshot
after the node completes), so the implicit invariant is safe
in production. This test exists as a tripwire for refactors
that switch to ``agent.astream()`` or share a core with
Gateway: if the ordering ever changes, this test fails and
forces the refactor to either (a) preserve the ordering or
(b) deliberately re-baseline to a stronger order-independent
dedup contract — and document the new contract here.
"""
agent = MagicMock()
agent.stream.return_value = iter(
[
# values arrives FIRST — streamed_ids still empty.
("values", {"messages": [HumanMessage(content="hi", id="h-1"), AIMessage(content="Hello", id="ai-1")]}),
# messages chunk for the same id arrives SECOND.
("messages", (AIMessageChunk(content="Hello", id="ai-1"), {})),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-order-canary"))
ai_text_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
# Current behavior: 2 events (values synthesis + messages delta).
# If a refactor makes dedup order-independent, this becomes 1 —
# update the assertion AND the docstring above to record the
# new contract, do not silently fix this number.
assert len(ai_text_events) == 2
assert all(e.data["id"] == "ai-1" for e in ai_text_events)
assert [e.data["content"] for e in ai_text_events] == ["Hello", "Hello"]
def test_messages_mode_golden_event_sequence(self, client):
"""Locks the **exact** event sequence for a canonical streaming turn.
This is a strong regression guard: any future refactor that
changes the order, type, or shape of emitted events fails this
test with a clear list-equality diff, forcing either a
preserved sequence or a deliberate re-baseline.
Input shape:
messages chunk 1 — text "Hel", no usage
messages chunk 2 — text "lo", with cumulative usage
values snapshot — assembled AIMessage with same usage
Locked behavior:
* Two messages-tuple AI text events (one per chunk), each
carrying ONLY its own delta — not cumulative.
* ``usage_metadata`` attached only to the chunk that
delivered it (not the first chunk).
* The values event is still emitted, but its embedded
``messages`` list is the *serialized* form — no
synthesized messages-tuple events for the already-
streamed id.
* ``end`` event carries cumulative usage counted exactly
once across both modes.
"""
# Inline the usage literal at construction sites so Pyright can
# narrow ``dict[str, int]`` to ``UsageMetadata`` (TypedDict
# narrowing only works on literals, not on bound variables).
# The local ``usage`` is reused only for assertion comparisons
# below, where structural dict equality is sufficient.
usage = {"input_tokens": 3, "output_tokens": 2, "total_tokens": 5}
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
("messages", (AIMessageChunk(content="lo", id="ai-1", usage_metadata={"input_tokens": 3, "output_tokens": 2, "total_tokens": 5}), {})),
(
"values",
{
"messages": [
HumanMessage(content="hi", id="h-1"),
AIMessage(content="Hello", id="ai-1", usage_metadata={"input_tokens": 3, "output_tokens": 2, "total_tokens": 5}),
]
},
),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-golden"))
actual = [(e.type, e.data) for e in events]
expected = [
("messages-tuple", {"type": "ai", "content": "Hel", "id": "ai-1"}),
("messages-tuple", {"type": "ai", "content": "lo", "id": "ai-1", "usage_metadata": usage}),
(
"values",
{
"title": None,
"messages": [
{"type": "human", "content": "hi", "id": "h-1"},
{"type": "ai", "content": "Hello", "id": "ai-1", "usage_metadata": usage},
],
"artifacts": [],
},
),
("end", {"usage": usage}),
]
assert actual == expected
def test_chat_accumulates_in_linear_time(self, client):
"""``chat()`` must use a non-quadratic accumulation strategy.
PR #1974 commit 2 replaced ``buffer = buffer + delta`` with
``list[str].append`` + ``"".join`` to fix an O(n²) regression
introduced in commit 1. This test guards against a future
refactor accidentally restoring the quadratic path.
Threshold rationale (10,000 single-char chunks, 1 second):
* Current O(n) implementation: ~50-200 ms total, including
all mock + event yield overhead.
* O(n²) regression at n=10,000: chat accumulation alone
becomes ~500 ms-2 s (50 M character copies), reliably
over the bound on any reasonable CI.
If this test ever flakes on slow CI, do NOT raise the threshold
blindly — first confirm the implementation still uses
``"".join``, then consider whether the test should move to a
benchmark suite that excludes mock overhead.
"""
import time
n = 10_000
chunks: list = [("messages", (AIMessageChunk(content="x", id="ai-1"), {})) for _ in range(n)]
chunks.append(
(
"values",
{
"messages": [
HumanMessage(content="go", id="h-1"),
AIMessage(content="x" * n, id="ai-1"),
]
},
)
)
agent = MagicMock()
agent.stream.return_value = iter(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
start = time.monotonic()
result = client.chat("go", thread_id="t-perf")
elapsed = time.monotonic() - start
assert result == "x" * n
assert elapsed < 1.0, f"chat() took {elapsed:.3f}s for {n} chunks — possible O(n^2) regression (see PR #1974 commit 2 for the original fix)"
def test_none_id_chunks_produce_duplicates_known_limitation(self, client):
"""Documents a known dedup limitation: ``messages`` chunks with ``id=None``.
Some LLM providers (vLLM, certain custom backends) emit
``AIMessageChunk`` instances without an ``id``. In that case
the cross-mode dedup machinery cannot record the chunk in
``streamed_ids`` (the implementation guards on ``if msg_id``
before adding), and a subsequent ``values`` snapshot whose
reassembled ``AIMessage`` carries a real id will fall through
the dedup check and synthesize a second AI text event for the
same logical message — consumers see duplicated text.
Why this is documented rather than fixed
----------------------------------------
Falling back to ``metadata.get("id")`` does **not** help:
LangGraph's messages-mode metadata never carries the message
id (it carries ``langgraph_node`` / ``langgraph_step`` /
``checkpoint_ns`` / ``tags`` etc.). Synthesizing a fallback
like ``f"_synth_{id(msg_chunk)}"`` only helps if the values
snapshot uses the same fallback, which it does not. A real
fix requires either provider cooperation (always emit chunk
ids — out of scope for this PR) or content-based dedup (risks
false positives for two distinct short messages with identical
text).
This test makes the limitation **explicit and discoverable**
so a future contributor debugging "duplicate text in vLLM
streaming" finds the answer immediately. If a real fix lands,
replace this test with a positive assertion that dedup works
for the None-id case.
See PR #1974 Copilot review comment on ``client.py:515``.
"""
agent = MagicMock()
agent.stream.return_value = iter(
[
# Realistic shape: chunk has no id (provider didn't set one),
# values snapshot's reassembled AIMessage has a fresh id
# assigned somewhere downstream (langgraph or middleware).
("messages", (AIMessageChunk(content="Hello", id=None), {})),
(
"values",
{
"messages": [
HumanMessage(content="hi", id="h-1"),
AIMessage(content="Hello", id="ai-1"),
]
},
),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-none-id-limitation"))
ai_text_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
# KNOWN LIMITATION: 2 events for the same logical message.
# 1) from messages chunk (id=None, NOT added to streamed_ids
# because of ``if msg_id:`` guard at client.py line ~522)
# 2) from values-snapshot synthesis (ai-1 not in streamed_ids,
# so the skip-branch at line ~549 doesn't trigger)
# If this becomes 1, someone fixed the limitation — update this
# test to a positive assertion and document the fix.
assert len(ai_text_events) == 2
assert ai_text_events[0].data["id"] is None
assert ai_text_events[1].data["id"] == "ai-1"
assert all(e.data["content"] == "Hello" for e in ai_text_events)
class TestChat:
def test_returns_last_message(self, client):