fix(gateway): preserve message additional_kwargs in normalize_input (#3132) (#3136)

* fix(gateway): preserve message additional_kwargs in normalize_input (#3132)

The gateway's hand-rolled dict→message coercion only forwarded `content`
and collapsed every role to `HumanMessage`, silently dropping the
frontend's `additional_kwargs.files` payload (along with `id`, `name`,
and ai/system/tool roles).

Effect on issue #3132:

- `UploadsMiddleware` saw no `files` on the last human message, so the
  just-uploaded file got bucketed under "previous messages" while the
  current turn was reported as `(empty)`.
- The persisted human message had no `files`, so the attachment chip on
  the message disappeared the moment the optimistic UI cleared.

Delegate the conversion to `langchain_core.messages.utils.convert_to_messages`
so `additional_kwargs`, `id`, `name`, and non-human roles round-trip
unchanged.

* fix(gateway): convert malformed-message ValueError into HTTP 400

normalize_input now sits at the request boundary, so a malformed
input.messages[N] dict (missing role/type/content, unsupported role,
etc.) should surface as 400 with the offending index — not bubble out
of FastAPI as 500.

Per Copilot review on #3136.
This commit is contained in:
Xinmin Zeng
2026-05-21 21:06:19 +08:00
committed by GitHub
parent 1c5c585741
commit 9c03a71a07
2 changed files with 115 additions and 12 deletions
+27 -12
View File
@@ -15,7 +15,8 @@ from collections.abc import Mapping
from typing import Any
from fastapi import HTTPException, Request
from langchain_core.messages import HumanMessage
from langchain_core.messages import BaseMessage
from langchain_core.messages.utils import convert_to_messages
from app.gateway.deps import get_run_context, get_run_manager, get_stream_bridge
from app.gateway.utils import sanitize_log_param
@@ -76,21 +77,35 @@ def normalize_stream_modes(raw: list[str] | str | None) -> list[str]:
def normalize_input(raw_input: dict[str, Any] | None) -> dict[str, Any]:
"""Convert LangGraph Platform input format to LangChain state dict."""
"""Convert LangGraph Platform input format to LangChain state dict.
Delegates dict→message coercion to ``langchain_core.messages.utils.convert_to_messages``
so that ``additional_kwargs`` (e.g. uploaded-file metadata — gh #3132), ``id``,
``name``, and non-human roles (ai/system/tool) survive unchanged. An earlier
hand-rolled version only forwarded ``content`` and collapsed every role to
``HumanMessage``, which silently stripped frontend-supplied attachments.
Malformed message dicts (missing ``role``/``type``/``content``, unsupported
role, etc.) raise ``HTTPException(400)`` with the offending index, instead
of bubbling up as a 500. The gateway is a system boundary, so per-entry
validation errors are the right shape for clients to retry against.
"""
if raw_input is None:
return {}
messages = raw_input.get("messages")
if messages and isinstance(messages, list):
converted = []
for msg in messages:
if isinstance(msg, dict):
role = msg.get("role", msg.get("type", "user"))
content = msg.get("content", "")
if role in ("user", "human"):
converted.append(HumanMessage(content=content))
else:
# TODO: handle other message types (system, ai, tool)
converted.append(HumanMessage(content=content))
converted: list[Any] = []
for index, msg in enumerate(messages):
if isinstance(msg, BaseMessage):
converted.append(msg)
elif isinstance(msg, dict):
try:
converted.extend(convert_to_messages([msg]))
except (ValueError, TypeError, NotImplementedError) as exc:
raise HTTPException(
status_code=400,
detail=f"Invalid message at input.messages[{index}]: {exc}",
) from exc
else:
converted.append(msg)
return {**raw_input, "messages": converted}