feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)

* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold

Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section.

New modules:
- deerflow.config.database_config: Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence: async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store: RunStore ABC + MemoryRunStore implementation

Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add RunEventStore ABC + MemoryRunEventStore

Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints

Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add user feedback + follow-up run association

Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config

- config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests), full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests): completion preserves fields, list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests): on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests): DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests): create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* config: move default sqlite_dir to .deer-flow/data

Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()

- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): simplify deps.py with getter factory + inline repos

- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(docker): add UV_EXTRAS build arg for optional dependencies

Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg:

  UV_EXTRAS=postgres docker compose build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(journal): fix flush, token tracking, and consolidate tests

RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore in finally block.

Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses.

Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): widen content type to str|dict in all store backends

Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(events): use metadata flag instead of heuristic for dict content detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(converters): add LangChain-to-OpenAI message format converters

Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(converters): handle empty list content as null, clean up test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): human_message content uses OpenAI user message format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): ai_message uses OpenAI format, add ai_tool_call message event

- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add tool_result message event with OpenAI tool message format

Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): summary content uses OpenAI system message format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format

Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add record_middleware method for middleware trace events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(events): add full run sequence integration test for OpenAI content format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): align message events with checkpoint format and add middleware tag injection

- Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): switch search endpoint to threads_meta table and sync title

- POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): history endpoint reads messages from event store

- POST /api/threads/{thread_id}/history now combines two data sources:
  checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate optional-dependencies header in pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(middleware): pass tagged config to TitleMiddleware ainvoke call

Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve merge conflict in .env.example

Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address review feedback on PR #1851

- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update uv.lock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality

Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search

Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply ruff format to persistence and runtime files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding 'Statement has no effect'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat

Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): move sanitize_log_param to app/gateway/utils.py

Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use SQL aggregation for feedback stats and thread token usage

Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query.
Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: annotate DbRunEventStore.put() as low-frequency path

Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable

When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata

Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(history): read messages from checkpointer instead of RunEventStore

The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address new Copilot review comments

- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Implement skill self-evolution and skill_manage flow (#1874)

* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir

resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Feature/feishu receive file (#1608)

* feat(feishu): add channel file materialization hook for inbound messages

- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance

- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues

* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup

fix(feishu): preserve rich-text attachment order and improve fallback filename handling

* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)

Two production docker-compose.yaml bugs prevent `make up` from working:

1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in fb2d99f (#1836) but accidentally reverted by ca2fb95 (#1847). Without them, gateway reads host paths from .env via env_file, causing FileNotFoundError inside the container.

2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default). Empty $${allow_blocking} inserts a bare space between flags, causing ' --no-reload' to be parsed as unexpected extra argument. Fix by building args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities

Fix `<button>` inside `<a>` invalid HTML in artifact components and add missing `noopener,noreferrer` to `window.open` calls to prevent reverse tabnabbing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(frontend): address Copilot review on tabnabbing and double-tab-open

Remove redundant parent onClick on web_fetch ChainOfThoughtStep to prevent opening two tabs on link click, and explicitly null out window.opener after window.open() for defensive tabnabbing hardening.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(persistence): organize entities into per-entity directories

Restructure the persistence layer from horizontal "models/ + repositories/" split into vertical entity-aligned directories. Each entity (thread_meta, run, feedback) now owns its ORM model, abstract interface (where applicable), and concrete implementations under a single directory with an aggregating __init__.py for one-line imports.

Layout:
  persistence/thread_meta/{base,model,sql,memory}.py
  persistence/run/{model,sql}.py
  persistence/feedback/{model,sql}.py

models/__init__.py is kept as a facade so Alembic autogenerate continues to discover all ORM tables via Base.metadata. RunEventRow remains under models/run_event.py because its storage implementation lives in runtime/events/store/db.py and has no matching repository directory.

The repositories/ directory is removed entirely. All call sites in gateway/deps.py and tests are updated to import from the new entity packages, e.g.:

  from deerflow.persistence.thread_meta import ThreadMetaRepository
  from deerflow.persistence.run import RunRepository
  from deerflow.persistence.feedback import FeedbackRepository

Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gateway): sync thread rename and delete through ThreadMetaStore

The POST /threads/{id}/state endpoint previously synced title changes only to the LangGraph Store via _store_upsert. In sqlite mode the search endpoint reads from the ThreadMetaRepository SQL table, so renames never appeared in /threads/search until the next agent run completed (worker.py syncs title from checkpoint to thread_meta in its finally block).

Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem, Store, and checkpointer but left the threads_meta row orphaned in sqlite, so deleted threads kept appearing in /threads/search.

Fix both endpoints by routing through the ThreadMetaStore abstraction which already has the correct sqlite/memory implementations wired up by deps.py. The rename path now calls update_display_name() and the delete path calls delete(); both work uniformly across backends.

Verified end-to-end with curl in gateway mode against sqlite backend. Existing test suite (1690 passed) and focused router/repo tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): route all thread metadata access through ThreadMetaStore

Following the rename/delete bug fix in PR1, migrate the remaining direct LangGraph Store reads/writes in the threads router and services to the ThreadMetaStore abstraction so that the sqlite and memory backends behave identically and the legacy dual-write paths can be removed.

Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store: redundant with the immediately following thread_meta_repo.create() call.
- _sync_thread_title_after_run: worker.py's finally block already syncs the title via thread_meta_repo.update_display_name() after each run.

Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).

New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into the thread's metadata field. Implemented in both ThreadMetaRepository (SQL, read-modify-write inside one session) and MemoryThreadMetaStore. Three new unit tests cover merge / empty / nonexistent behaviour.

Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.

Verified end-to-end with curl in gateway mode against sqlite backend (create / patch / get / rename / search / delete).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
2026-05-23 00:16:48 +00:00 · 2026-04-07 11:53:52 +08:00
parent 092bf13f5e
commit 185f5649dd
66 changed files with 6481 additions and 401 deletions
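The commit message mentions two details that are easy to get wrong: dict content is JSON-serialized without ASCII escaping, and max_trace_content truncation is measured in bytes rather than characters. The DbRunEventStore code is not part of this excerpt, so the following is only a minimal standalone sketch of that idea; `truncate_utf8` and `serialize_trace_content` are hypothetical names, not functions from the repository.

```python
import json


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text so that its UTF-8 encoding is at most max_bytes long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def serialize_trace_content(content, max_bytes: int) -> str:
    """Hypothetical helper: serialize dict content, then truncate by bytes."""
    if isinstance(content, dict):
        # ensure_ascii=False keeps non-ASCII text readable in storage.
        content = json.dumps(content, ensure_ascii=False)
    return truncate_utf8(content, max_bytes)


text = "你好世界"  # 4 characters, 12 bytes in UTF-8
short = truncate_utf8(text, 7)
payload = serialize_trace_content({"role": "user", "content": "你好"}, 1000)
```

A character-based limit of 7 would have kept the whole four-character string, while the byte-based limit keeps only the two characters that fit in 7 bytes.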
@@ -5,7 +5,7 @@ Re-exports the public API of :mod:`~deerflow.runtime.runs` and
 directly from ``deerflow.runtime``.
 """

-from .runs import ConflictError, DisconnectMode, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
+from .runs import ConflictError, DisconnectMode, RunContext, RunManager, RunRecord, RunStatus, UnsupportedStrategyError, run_agent
 from .serialization import serialize, serialize_channel_values, serialize_lc_object, serialize_messages_tuple
 from .store import get_store, make_store, reset_store, store_context
 from .stream_bridge import END_SENTINEL, HEARTBEAT_SENTINEL, MemoryStreamBridge, StreamBridge, StreamEvent, make_stream_bridge
@@ -14,6 +14,7 @@ __all__ = [
    # runs
    "ConflictError",
    "DisconnectMode",
+    "RunContext",
    "RunManager",
    "RunRecord",
    "RunStatus",
@@ -0,0 +1,134 @@
+"""Pure functions to convert LangChain message objects to OpenAI Chat Completions format.
+
+Used by RunJournal to build content dicts for event storage.
+"""
+
+from __future__ import annotations
+
+import json
+from typing import Any
+
+_ROLE_MAP = {
+    "human": "user",
+    "ai": "assistant",
+    "system": "system",
+    "tool": "tool",
+}
+
+
+def langchain_to_openai_message(message: Any) -> dict:
+    """Convert a single LangChain BaseMessage to an OpenAI message dict.
+
+    Handles:
+    - HumanMessage → {"role": "user", "content": "..."}
+    - AIMessage (text only) → {"role": "assistant", "content": "..."}
+    - AIMessage (with tool_calls) → {"role": "assistant", "content": null, "tool_calls": [...]}
+    - AIMessage (text + tool_calls) → both content and tool_calls present
+    - AIMessage (list content / multimodal) → content preserved as list
+    - SystemMessage → {"role": "system", "content": "..."}
+    - ToolMessage → {"role": "tool", "tool_call_id": "...", "content": "..."}
+    """
+    msg_type = getattr(message, "type", "")
+    role = _ROLE_MAP.get(msg_type, msg_type)
+    content = getattr(message, "content", "")
+
+    if role == "tool":
+        return {
+            "role": "tool",
+            "tool_call_id": getattr(message, "tool_call_id", ""),
+            "content": content,
+        }
+
+    if role == "assistant":
+        tool_calls = getattr(message, "tool_calls", None) or []
+        result: dict = {"role": "assistant"}
+
+        if tool_calls:
+            openai_tool_calls = []
+            for tc in tool_calls:
+                args = tc.get("args", {})
+                openai_tool_calls.append(
+                    {
+                        "id": tc.get("id", ""),
+                        "type": "function",
+                        "function": {
+                            "name": tc.get("name", ""),
+                            "arguments": json.dumps(args) if not isinstance(args, str) else args,
+                        },
+                    }
+                )
+            # If no text content, set content to null per OpenAI spec
+            result["content"] = content if (isinstance(content, list) and content) or (isinstance(content, str) and content) else None
+            result["tool_calls"] = openai_tool_calls
+        else:
+            result["content"] = content
+
+        return result
+
+    # user / system / unknown
+    return {"role": role, "content": content}
+
+
+def _infer_finish_reason(message: Any) -> str:
+    """Infer OpenAI finish_reason from an AIMessage.
+
+    Returns "tool_calls" if tool_calls present, else looks in
+    response_metadata.finish_reason, else returns "stop".
+    """
+    tool_calls = getattr(message, "tool_calls", None) or []
+    if tool_calls:
+        return "tool_calls"
+    resp_meta = getattr(message, "response_metadata", None) or {}
+    if isinstance(resp_meta, dict):
+        finish = resp_meta.get("finish_reason")
+        if finish:
+            return finish
+    return "stop"
+
+
+def langchain_to_openai_completion(message: Any) -> dict:
+    """Convert an AIMessage and its metadata to an OpenAI completion response dict.
+
+    Returns:
+        {
+            "id": message.id,
+            "model": message.response_metadata.get("model_name"),
+            "choices": [{"index": 0, "message": <openai_message>, "finish_reason": <inferred>}],
+            "usage": {"prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ...} or None,
+        }
+    """
+    resp_meta = getattr(message, "response_metadata", None) or {}
+    model_name = resp_meta.get("model_name") if isinstance(resp_meta, dict) else None
+
+    openai_msg = langchain_to_openai_message(message)
+    finish_reason = _infer_finish_reason(message)
+
+    usage_metadata = getattr(message, "usage_metadata", None)
+    if usage_metadata is not None:
+        input_tokens = usage_metadata.get("input_tokens", 0) or 0
+        output_tokens = usage_metadata.get("output_tokens", 0) or 0
+        usage: dict | None = {
+            "prompt_tokens": input_tokens,
+            "completion_tokens": output_tokens,
+            "total_tokens": input_tokens + output_tokens,
+        }
+    else:
+        usage = None
+
+    return {
+        "id": getattr(message, "id", None),
+        "model": model_name,
+        "choices": [
+            {
+                "index": 0,
+                "message": openai_msg,
+                "finish_reason": finish_reason,
+            }
+        ],
+        "usage": usage,
+    }
+
+
+def langchain_messages_to_openai(messages: list) -> list[dict]:
+    """Convert a list of LangChain BaseMessages to OpenAI message dicts."""
+    return [langchain_to_openai_message(m) for m in messages]
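To make the tool-call branch of the converter above concrete, here is a condensed, standalone restatement of the assistant case. It is a sketch, not the module itself: `to_openai_assistant` is a hypothetical name, and `SimpleNamespace` objects stand in for real LangChain `AIMessage` instances, which is enough because the original reads attributes via `getattr`.

```python
import json
from types import SimpleNamespace


def to_openai_assistant(message) -> dict:
    """Condensed sketch of the assistant branch (str/list content only)."""
    content = getattr(message, "content", "")
    tool_calls = getattr(message, "tool_calls", None) or []
    result = {"role": "assistant", "content": content}
    if tool_calls:
        # Empty string or empty list becomes None, as in the diff above.
        result["content"] = content or None
        result["tool_calls"] = [
            {
                "id": tc.get("id", ""),
                "type": "function",
                "function": {
                    "name": tc.get("name", ""),
                    # OpenAI expects arguments as a JSON string.
                    "arguments": tc["args"]
                    if isinstance(tc.get("args"), str)
                    else json.dumps(tc.get("args", {})),
                },
            }
            for tc in tool_calls
        ]
    return result


# Stand-in messages (assumption: real code passes LangChain AIMessage objects).
with_tools = SimpleNamespace(
    type="ai",
    content="",
    tool_calls=[{"id": "call_1", "name": "search", "args": {"q": "deer"}}],
)
text_only = SimpleNamespace(type="ai", content="Hello", tool_calls=[])

converted_tools = to_openai_assistant(with_tools)
converted_text = to_openai_assistant(text_only)
```

The tool-call message ends up with `content` set to `None` and `arguments` encoded as a JSON string, while the text-only message carries no `tool_calls` key at all.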
@@ -0,0 +1,4 @@
+from deerflow.runtime.events.store.base import RunEventStore
+from deerflow.runtime.events.store.memory import MemoryRunEventStore
+
+__all__ = ["MemoryRunEventStore", "RunEventStore"]
@@ -0,0 +1,26 @@
+from deerflow.runtime.events.store.base import RunEventStore
+from deerflow.runtime.events.store.memory import MemoryRunEventStore
+
+
+def make_run_event_store(config=None) -> RunEventStore:
+    """Create a RunEventStore based on run_events.backend configuration."""
+    if config is None or config.backend == "memory":
+        return MemoryRunEventStore()
+    if config.backend == "db":
+        from deerflow.persistence.engine import get_session_factory
+
+        sf = get_session_factory()
+        if sf is None:
+            # database.backend=memory but run_events.backend=db -> fallback
+            return MemoryRunEventStore()
+        from deerflow.runtime.events.store.db import DbRunEventStore
+
+        return DbRunEventStore(sf, max_trace_content=config.max_trace_content)
+    if config.backend == "jsonl":
+        from deerflow.runtime.events.store.jsonl import JsonlRunEventStore
+
+        return JsonlRunEventStore()
+    raise ValueError(f"Unknown run_events backend: {config.backend!r}")
+
+
+__all__ = ["MemoryRunEventStore", "RunEventStore", "make_run_event_store"]
@@ -0,0 +1,99 @@
+"""Abstract interface for run event storage.
+
+RunEventStore is the unified storage interface for run event streams.
+Messages (frontend display) and execution traces (debugging/audit) go
+through the same interface, distinguished by the ``category`` field.
+
+Implementations:
+- MemoryRunEventStore: in-memory dict (development, tests)
+- DbRunEventStore (SQLAlchemy ORM) and JsonlRunEventStore (per-run JSONL files)
+"""
+
+from __future__ import annotations
+
+import abc
+
+
+class RunEventStore(abc.ABC):
+    """Run event stream storage interface.
+
+    All implementations must guarantee:
+    1. Events written by put() are retrievable in subsequent queries
+    2. seq is strictly increasing within the same thread
+    3. list_messages() only returns category="message" events
+    4. list_events() returns all events for the specified run
+    5. Returned dicts match the RunEvent field structure
+    """
+
+    @abc.abstractmethod
+    async def put(
+        self,
+        *,
+        thread_id: str,
+        run_id: str,
+        event_type: str,
+        category: str,
+        content: str | dict = "",
+        metadata: dict | None = None,
+        created_at: str | None = None,
+    ) -> dict:
+        """Write an event, auto-assign seq, return the complete record."""
+
+    @abc.abstractmethod
+    async def put_batch(self, events: list[dict]) -> list[dict]:
+        """Batch-write events. Used by RunJournal flush buffer.
+
+        Each dict's keys match put()'s keyword arguments.
+        Returns complete records with seq assigned.
+        """
+
+    @abc.abstractmethod
+    async def list_messages(
+        self,
+        thread_id: str,
+        *,
+        limit: int = 50,
+        before_seq: int | None = None,
+        after_seq: int | None = None,
+    ) -> list[dict]:
+        """Return displayable messages (category=message) for a thread, ordered by seq ascending.
+
+        Supports bidirectional cursor pagination:
+        - before_seq: return the last ``limit`` records with seq < before_seq (ascending)
+        - after_seq: return the first ``limit`` records with seq > after_seq (ascending)
+        - neither: return the latest ``limit`` records (ascending)
+        """
+
+    @abc.abstractmethod
+    async def list_events(
+        self,
+        thread_id: str,
+        run_id: str,
+        *,
+        event_types: list[str] | None = None,
+        limit: int = 500,
+    ) -> list[dict]:
+        """Return the full event stream for a run, ordered by seq ascending.
+
+        Optionally filter by event_types.
+        """
+
+    @abc.abstractmethod
+    async def list_messages_by_run(
+        self,
+        thread_id: str,
+        run_id: str,
+    ) -> list[dict]:
+        """Return displayable messages (category=message) for a specific run, ordered by seq ascending."""
+
+    @abc.abstractmethod
+    async def count_messages(self, thread_id: str) -> int:
+        """Count displayable messages (category=message) in a thread."""
+
+    @abc.abstractmethod
+    async def delete_by_thread(self, thread_id: str) -> int:
+        """Delete all events for a thread. Return the number of deleted events."""
+
+    @abc.abstractmethod
+    async def delete_by_run(self, thread_id: str, run_id: str) -> int:
+        """Delete all events for a specific run. Return the number of deleted events."""
@@ -0,0 +1,185 @@
+"""SQLAlchemy-backed RunEventStore implementation.
+
+Persists events to the ``run_events`` table. Trace content is truncated
+at ``max_trace_content`` bytes to avoid bloating the database.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from datetime import UTC, datetime
+
+from sqlalchemy import delete, func, select
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
+
+from deerflow.persistence.models.run_event import RunEventRow
+from deerflow.runtime.events.store.base import RunEventStore
+
+logger = logging.getLogger(__name__)
+
+
+class DbRunEventStore(RunEventStore):
+    def __init__(self, session_factory: async_sessionmaker[AsyncSession], *, max_trace_content: int = 10240):
+        self._sf = session_factory
+        self._max_trace_content = max_trace_content
+
+    @staticmethod
+    def _row_to_dict(row: RunEventRow) -> dict:
+        d = row.to_dict()
+        d["metadata"] = d.pop("event_metadata", {})
+        val = d.get("created_at")
+        if isinstance(val, datetime):
+            d["created_at"] = val.isoformat()
+        d.pop("id", None)
+        # Restore dict content that was JSON-serialized on write
+        raw = d.get("content", "")
+        if isinstance(raw, str) and d.get("metadata", {}).get("content_is_dict"):
+            try:
+                d["content"] = json.loads(raw)
+            except (json.JSONDecodeError, ValueError):
+                # Content looked like JSON (content_is_dict flag) but failed to parse;
+                # keep the raw string as-is.
+                logger.debug("Failed to deserialize content as JSON for event seq=%s", d.get("seq"))
+        return d
+
+    def _truncate_trace(self, category: str, content: str | dict, metadata: dict | None) -> tuple[str | dict, dict]:
+        if category == "trace":
+            text = json.dumps(content, default=str, ensure_ascii=False) if isinstance(content, dict) else content
+            encoded = text.encode("utf-8")
+            if len(encoded) > self._max_trace_content:
+                # Truncate by bytes, then decode back (may cut a multi-byte char, so use errors="ignore")
+                content = encoded[: self._max_trace_content].decode("utf-8", errors="ignore")
+                metadata = {**(metadata or {}), "content_truncated": True, "original_byte_length": len(encoded)}
+        return content, metadata or {}
+
+    async def put(self, *, thread_id, run_id, event_type, category, content="", metadata=None, created_at=None):  # noqa: D401
+        """Write a single event — low-frequency path only.
+
+        This opens a dedicated transaction with a FOR UPDATE lock to
+        assign a monotonic *seq*.  For high-throughput writes use
+        :meth:`put_batch`, which acquires the lock once for the whole
+        batch.  Currently the only caller is ``worker.run_agent`` for
+        the initial ``human_message`` event (once per run).
+        """
+        content, metadata = self._truncate_trace(category, content, metadata)
+        if isinstance(content, dict):
+            db_content = json.dumps(content, default=str, ensure_ascii=False)
+            metadata = {**(metadata or {}), "content_is_dict": True}
+        else:
+            db_content = content
+        async with self._sf() as session:
+            async with session.begin():
+                # Use FOR UPDATE to serialize seq assignment within a thread.
+                # NOTE: with_for_update() on aggregates is a no-op on SQLite;
+                # the UNIQUE(thread_id, seq) constraint catches races there.
+                max_seq = await session.scalar(select(func.max(RunEventRow.seq)).where(RunEventRow.thread_id == thread_id).with_for_update())
+                seq = (max_seq or 0) + 1
+                row = RunEventRow(
+                    thread_id=thread_id,
+                    run_id=run_id,
+                    event_type=event_type,
+                    category=category,
+                    content=db_content,
+                    event_metadata=metadata,
+                    seq=seq,
+                    created_at=datetime.fromisoformat(created_at) if created_at else datetime.now(UTC),
+                )
+                session.add(row)
+            return self._row_to_dict(row)
+
+    async def put_batch(self, events):
+        if not events:
+            return []
+        async with self._sf() as session:
+            async with session.begin():
+                # Get max seq for the thread (assume all events in batch belong to same thread).
+                # NOTE: with_for_update() on aggregates is a no-op on SQLite;
+                # the UNIQUE(thread_id, seq) constraint catches races there.
+                thread_id = events[0]["thread_id"]
+                max_seq = await session.scalar(select(func.max(RunEventRow.seq)).where(RunEventRow.thread_id == thread_id).with_for_update())
+                seq = max_seq or 0
+                rows = []
+                for e in events:
+                    seq += 1
+                    content = e.get("content", "")
+                    category = e.get("category", "trace")
+                    metadata = e.get("metadata")
+                    content, metadata = self._truncate_trace(category, content, metadata)
+                    if isinstance(content, dict):
+                        db_content = json.dumps(content, default=str, ensure_ascii=False)
+                        metadata = {**(metadata or {}), "content_is_dict": True}
+                    else:
+                        db_content = content
+                    row = RunEventRow(
+                        thread_id=e["thread_id"],
+                        run_id=e["run_id"],
+                        event_type=e["event_type"],
+                        category=category,
+                        content=db_content,
+                        event_metadata=metadata,
+                        seq=seq,
+                        created_at=datetime.fromisoformat(e["created_at"]) if e.get("created_at") else datetime.now(UTC),
+                    )
+                    session.add(row)
+                    rows.append(row)
+            return [self._row_to_dict(r) for r in rows]
+
+    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
+        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.category == "message")
+        if before_seq is not None:
+            stmt = stmt.where(RunEventRow.seq < before_seq)
+        if after_seq is not None:
+            stmt = stmt.where(RunEventRow.seq > after_seq)
+
+        if after_seq is not None:
+            # Forward pagination: first `limit` records after cursor
+            stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
+            async with self._sf() as session:
+                result = await session.execute(stmt)
+                return [self._row_to_dict(r) for r in result.scalars()]
+        else:
+            # before_seq or default (latest): take last `limit` records, return ascending
+            stmt = stmt.order_by(RunEventRow.seq.desc()).limit(limit)
+            async with self._sf() as session:
+                result = await session.execute(stmt)
+                rows = list(result.scalars())
+                return [self._row_to_dict(r) for r in reversed(rows)]
+
+    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
+        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id)
+        if event_types:
+            stmt = stmt.where(RunEventRow.event_type.in_(event_types))
+        stmt = stmt.order_by(RunEventRow.seq.asc()).limit(limit)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def list_messages_by_run(self, thread_id, run_id):
+        stmt = select(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id, RunEventRow.category == "message").order_by(RunEventRow.seq.asc())
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            return [self._row_to_dict(r) for r in result.scalars()]
+
+    async def count_messages(self, thread_id):
+        stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.category == "message")
+        async with self._sf() as session:
+            return await session.scalar(stmt) or 0
+
+    async def delete_by_thread(self, thread_id):
+        async with self._sf() as session:
+            count_stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id)
+            count = await session.scalar(count_stmt) or 0
+            if count > 0:
+                await session.execute(delete(RunEventRow).where(RunEventRow.thread_id == thread_id))
+                await session.commit()
+            return count
+
+    async def delete_by_run(self, thread_id, run_id):
+        async with self._sf() as session:
+            count_stmt = select(func.count()).select_from(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id)
+            count = await session.scalar(count_stmt) or 0
+            if count > 0:
+                await session.execute(delete(RunEventRow).where(RunEventRow.thread_id == thread_id, RunEventRow.run_id == run_id))
+                await session.commit()
+            return count
@@ -0,0 +1,179 @@
+"""JSONL file-backed RunEventStore implementation.
+
+Each run's events are stored in a single file:
+``.deer-flow/threads/{thread_id}/runs/{run_id}.jsonl``
+
+All categories (message, trace, lifecycle) are in the same file.
+This backend is suitable for lightweight single-node deployments.
+
+Known trade-off: ``list_messages()`` must scan all run files for a
+thread since messages from multiple runs need unified seq ordering.
+``list_events()`` reads only one file -- the fast path.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from datetime import UTC, datetime
+from pathlib import Path
+
+from deerflow.runtime.events.store.base import RunEventStore
+
+logger = logging.getLogger(__name__)
+
+_SAFE_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")
+
+
+class JsonlRunEventStore(RunEventStore):
+    def __init__(self, base_dir: str | Path | None = None):
+        self._base_dir = Path(base_dir) if base_dir else Path(".deer-flow")
+        self._seq_counters: dict[str, int] = {}  # thread_id -> current max seq
+
+    @staticmethod
+    def _validate_id(value: str, label: str) -> str:
+        """Validate that an ID is safe for use in filesystem paths."""
+        if not value or not _SAFE_ID_PATTERN.match(value):
+            raise ValueError(f"Invalid {label}: must be alphanumeric/dash/underscore, got {value!r}")
+        return value
+
+    def _thread_dir(self, thread_id: str) -> Path:
+        self._validate_id(thread_id, "thread_id")
+        return self._base_dir / "threads" / thread_id / "runs"
+
+    def _run_file(self, thread_id: str, run_id: str) -> Path:
+        self._validate_id(run_id, "run_id")
+        return self._thread_dir(thread_id) / f"{run_id}.jsonl"
+
+    def _next_seq(self, thread_id: str) -> int:
+        self._seq_counters[thread_id] = self._seq_counters.get(thread_id, 0) + 1
+        return self._seq_counters[thread_id]
+
+    def _ensure_seq_loaded(self, thread_id: str) -> None:
+        """Load max seq from existing files if not yet cached."""
+        if thread_id in self._seq_counters:
+            return
+        max_seq = 0
+        thread_dir = self._thread_dir(thread_id)
+        if thread_dir.exists():
+            for f in thread_dir.glob("*.jsonl"):
+                for line in f.read_text(encoding="utf-8").strip().splitlines():
+                    try:
+                        record = json.loads(line)
+                        max_seq = max(max_seq, record.get("seq", 0))
+                    except json.JSONDecodeError:
+                        logger.debug("Skipping malformed JSONL line in %s", f)
+                        continue
+        self._seq_counters[thread_id] = max_seq
+
+    def _write_record(self, record: dict) -> None:
+        path = self._run_file(record["thread_id"], record["run_id"])
+        path.parent.mkdir(parents=True, exist_ok=True)
+        with open(path, "a", encoding="utf-8") as f:
+            f.write(json.dumps(record, default=str, ensure_ascii=False) + "\n")
+
+    def _read_thread_events(self, thread_id: str) -> list[dict]:
+        """Read all events for a thread, sorted by seq."""
+        events = []
+        thread_dir = self._thread_dir(thread_id)
+        if not thread_dir.exists():
+            return events
+        for f in sorted(thread_dir.glob("*.jsonl")):
+            for line in f.read_text(encoding="utf-8").strip().splitlines():
+                if not line:
+                    continue
+                try:
+                    events.append(json.loads(line))
+                except json.JSONDecodeError:
+                    logger.debug("Skipping malformed JSONL line in %s", f)
+                    continue
+        events.sort(key=lambda e: e.get("seq", 0))
+        return events
+
+    def _read_run_events(self, thread_id: str, run_id: str) -> list[dict]:
+        """Read events for a specific run file."""
+        path = self._run_file(thread_id, run_id)
+        if not path.exists():
+            return []
+        events = []
+        for line in path.read_text(encoding="utf-8").strip().splitlines():
+            if not line:
+                continue
+            try:
+                events.append(json.loads(line))
+            except json.JSONDecodeError:
+                logger.debug("Skipping malformed JSONL line in %s", path)
+                continue
+        events.sort(key=lambda e: e.get("seq", 0))
+        return events
+
+    async def put(self, *, thread_id, run_id, event_type, category, content="", metadata=None, created_at=None):
+        self._ensure_seq_loaded(thread_id)
+        seq = self._next_seq(thread_id)
+        record = {
+            "thread_id": thread_id,
+            "run_id": run_id,
+            "event_type": event_type,
+            "category": category,
+            "content": content,
+            "metadata": metadata or {},
+            "seq": seq,
+            "created_at": created_at or datetime.now(UTC).isoformat(),
+        }
+        self._write_record(record)
+        return record
+
+    async def put_batch(self, events):
+        if not events:
+            return []
+        results = []
+        for ev in events:
+            record = await self.put(**ev)
+            results.append(record)
+        return results
+
+    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
+        all_events = self._read_thread_events(thread_id)
+        messages = [e for e in all_events if e.get("category") == "message"]
+
+        if before_seq is not None:
+            messages = [e for e in messages if e["seq"] < before_seq]
+            return messages[-limit:]
+        elif after_seq is not None:
+            messages = [e for e in messages if e["seq"] > after_seq]
+            return messages[:limit]
+        else:
+            return messages[-limit:]
+
+    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
+        events = self._read_run_events(thread_id, run_id)
+        if event_types is not None:
+            events = [e for e in events if e.get("event_type") in event_types]
+        return events[:limit]
+
+    async def list_messages_by_run(self, thread_id, run_id):
+        events = self._read_run_events(thread_id, run_id)
+        return [e for e in events if e.get("category") == "message"]
+
+    async def count_messages(self, thread_id):
+        all_events = self._read_thread_events(thread_id)
+        return sum(1 for e in all_events if e.get("category") == "message")
+
+    async def delete_by_thread(self, thread_id):
+        all_events = self._read_thread_events(thread_id)
+        count = len(all_events)
+        thread_dir = self._thread_dir(thread_id)
+        if thread_dir.exists():
+            for f in thread_dir.glob("*.jsonl"):
+                f.unlink()
+        self._seq_counters.pop(thread_id, None)
+        return count
+
+    async def delete_by_run(self, thread_id, run_id):
+        events = self._read_run_events(thread_id, run_id)
+        count = len(events)
+        path = self._run_file(thread_id, run_id)
+        if path.exists():
+            path.unlink()
+        return count
@@ -0,0 +1,120 @@
+"""In-memory RunEventStore. Used when run_events.backend=memory (default) and in tests.
+
+Safe for single-process async usage: no locks are needed because
+all mutations happen within the same event loop.
+"""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+
+from deerflow.runtime.events.store.base import RunEventStore
+
+
+class MemoryRunEventStore(RunEventStore):
+    def __init__(self) -> None:
+        self._events: dict[str, list[dict]] = {}  # thread_id -> sorted event list
+        self._seq_counters: dict[str, int] = {}  # thread_id -> last assigned seq
+
+    def _next_seq(self, thread_id: str) -> int:
+        current = self._seq_counters.get(thread_id, 0)
+        next_val = current + 1
+        self._seq_counters[thread_id] = next_val
+        return next_val
+
+    def _put_one(
+        self,
+        *,
+        thread_id: str,
+        run_id: str,
+        event_type: str,
+        category: str,
+        content: str | dict = "",
+        metadata: dict | None = None,
+        created_at: str | None = None,
+    ) -> dict:
+        seq = self._next_seq(thread_id)
+        record = {
+            "thread_id": thread_id,
+            "run_id": run_id,
+            "event_type": event_type,
+            "category": category,
+            "content": content,
+            "metadata": metadata or {},
+            "seq": seq,
+            "created_at": created_at or datetime.now(UTC).isoformat(),
+        }
+        self._events.setdefault(thread_id, []).append(record)
+        return record
+
+    async def put(
+        self,
+        *,
+        thread_id,
+        run_id,
+        event_type,
+        category,
+        content="",
+        metadata=None,
+        created_at=None,
+    ):
+        return self._put_one(
+            thread_id=thread_id,
+            run_id=run_id,
+            event_type=event_type,
+            category=category,
+            content=content,
+            metadata=metadata,
+            created_at=created_at,
+        )
+
+    async def put_batch(self, events):
+        results = []
+        for ev in events:
+            record = self._put_one(**ev)
+            results.append(record)
+        return results
+
+    async def list_messages(self, thread_id, *, limit=50, before_seq=None, after_seq=None):
+        all_events = self._events.get(thread_id, [])
+        messages = [e for e in all_events if e["category"] == "message"]
+
+        if before_seq is not None:
+            messages = [e for e in messages if e["seq"] < before_seq]
+            # Take the last `limit` records
+            return messages[-limit:]
+        elif after_seq is not None:
+            messages = [e for e in messages if e["seq"] > after_seq]
+            return messages[:limit]
+        else:
+            # Return the latest `limit` records, ascending
+            return messages[-limit:]
+
+    async def list_events(self, thread_id, run_id, *, event_types=None, limit=500):
+        all_events = self._events.get(thread_id, [])
+        filtered = [e for e in all_events if e["run_id"] == run_id]
+        if event_types is not None:
+            filtered = [e for e in filtered if e["event_type"] in event_types]
+        return filtered[:limit]
+
+    async def list_messages_by_run(self, thread_id, run_id):
+        all_events = self._events.get(thread_id, [])
+        return [e for e in all_events if e["run_id"] == run_id and e["category"] == "message"]
+
+    async def count_messages(self, thread_id):
+        all_events = self._events.get(thread_id, [])
+        return sum(1 for e in all_events if e["category"] == "message")
+
+    async def delete_by_thread(self, thread_id):
+        events = self._events.pop(thread_id, [])
+        self._seq_counters.pop(thread_id, None)
+        return len(events)
+
+    async def delete_by_run(self, thread_id, run_id):
+        all_events = self._events.get(thread_id, [])
+        if not all_events:
+            return 0
+        remaining = [e for e in all_events if e["run_id"] != run_id]
+        removed = len(all_events) - len(remaining)
+        self._events[thread_id] = remaining
+        return removed
@@ -0,0 +1,471 @@
+"""Run event capture via LangChain callbacks.
+
+RunJournal sits between LangChain's callback mechanism and the pluggable
+RunEventStore. It standardizes callback data into RunEvent records and
+handles token usage accumulation.
+
+Key design decisions:
+- on_llm_new_token is NOT implemented -- only complete messages via on_llm_end
+- on_chat_model_start captures structured prompts as llm_request (OpenAI format)
+- on_llm_end emits llm_response in OpenAI Chat Completions format
+- Token usage accumulated in memory, written to RunRow on run completion
+- Caller identification via tags injection (lead_agent / subagent:{name} / middleware:{name})
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import time
+from datetime import UTC, datetime
+from typing import TYPE_CHECKING, Any
+from uuid import UUID
+
+from langchain_core.callbacks import BaseCallbackHandler
+
+if TYPE_CHECKING:
+    from deerflow.runtime.events.store.base import RunEventStore
+
+logger = logging.getLogger(__name__)
+
+
+class RunJournal(BaseCallbackHandler):
+    """LangChain callback handler that captures events to RunEventStore."""
+
+    def __init__(
+        self,
+        run_id: str,
+        thread_id: str,
+        event_store: RunEventStore,
+        *,
+        track_token_usage: bool = True,
+        flush_threshold: int = 20,
+    ):
+        super().__init__()
+        self.run_id = run_id
+        self.thread_id = thread_id
+        self._store = event_store
+        self._track_tokens = track_token_usage
+        self._flush_threshold = flush_threshold
+
+        # Write buffer
+        self._buffer: list[dict] = []
+
+        # Token accumulators
+        self._total_input_tokens = 0
+        self._total_output_tokens = 0
+        self._total_tokens = 0
+        self._llm_call_count = 0
+        self._lead_agent_tokens = 0
+        self._subagent_tokens = 0
+        self._middleware_tokens = 0
+
+        # Convenience fields
+        self._last_ai_msg: str | None = None
+        self._first_human_msg: str | None = None
+        self._msg_count = 0
+
+        # Latency tracking
+        self._llm_start_times: dict[str, float] = {}  # langchain run_id -> start time
+
+        # LLM request/response tracking
+        self._llm_call_index = 0
+        self._cached_prompts: dict[str, list[dict]] = {}  # langchain run_id -> OpenAI messages
+        self._cached_models: dict[str, str] = {}  # langchain run_id -> model name
+
+        # Tool call ID cache
+        self._tool_call_ids: dict[str, str] = {}  # langchain run_id -> tool_call_id
+
+    # -- Lifecycle callbacks --
+
+    def on_chain_start(self, serialized: dict, inputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        if kwargs.get("parent_run_id") is not None:
+            return
+        self._put(
+            event_type="run_start",
+            category="lifecycle",
+            metadata={"input_preview": str(inputs)[:500]},
+        )
+
+    def on_chain_end(self, outputs: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        if kwargs.get("parent_run_id") is not None:
+            return
+        self._put(event_type="run_end", category="lifecycle", metadata={"status": "success"})
+        self._flush_sync()
+
+    def on_chain_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
+        if kwargs.get("parent_run_id") is not None:
+            return
+        self._put(
+            event_type="run_error",
+            category="lifecycle",
+            content=str(error),
+            metadata={"error_type": type(error).__name__},
+        )
+        self._flush_sync()
+
+    # -- LLM callbacks --
+
+    def on_chat_model_start(self, serialized: dict, messages: list[list], *, run_id: UUID, **kwargs: Any) -> None:
+        """Capture structured prompt messages for llm_request event."""
+        from deerflow.runtime.converters import langchain_messages_to_openai
+
+        rid = str(run_id)
+        self._llm_start_times[rid] = time.monotonic()
+        self._llm_call_index += 1
+
+        model_name = serialized.get("name", "")
+        self._cached_models[rid] = model_name
+
+        # Convert the first message list (LangChain passes list-of-lists)
+        prompt_msgs = messages[0] if messages else []
+        openai_msgs = langchain_messages_to_openai(prompt_msgs)
+        self._cached_prompts[rid] = openai_msgs
+
+        caller = self._identify_caller(kwargs)
+        self._put(
+            event_type="llm_request",
+            category="trace",
+            content={"model": model_name, "messages": openai_msgs},
+            metadata={"caller": caller, "llm_call_index": self._llm_call_index},
+        )
+
+    def on_llm_start(self, serialized: dict, prompts: list[str], *, run_id: UUID, **kwargs: Any) -> None:
+        # Fallback: on_chat_model_start is preferred. This just tracks latency.
+        self._llm_start_times[str(run_id)] = time.monotonic()
+
+    def on_llm_end(self, response: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        from deerflow.runtime.converters import langchain_to_openai_completion
+
+        try:
+            message = response.generations[0][0].message
+        except (IndexError, AttributeError):
+            logger.debug("on_llm_end: could not extract message from response")
+            return
+
+        caller = self._identify_caller(kwargs)
+
+        # Latency
+        rid = str(run_id)
+        start = self._llm_start_times.pop(rid, None)
+        latency_ms = int((time.monotonic() - start) * 1000) if start is not None else None
+
+        # Token usage from message
+        usage = getattr(message, "usage_metadata", None)
+        usage_dict = dict(usage) if usage else {}
+
+        # Resolve call index
+        call_index = self._llm_call_index
+        if rid not in self._cached_prompts:
+            # Fallback: on_chat_model_start was not called
+            self._llm_call_index += 1
+            call_index = self._llm_call_index
+
+        # Clean up caches
+        self._cached_prompts.pop(rid, None)
+        self._cached_models.pop(rid, None)
+
+        # Trace event: llm_response (OpenAI completion format)
+        content = getattr(message, "content", "")
+        self._put(
+            event_type="llm_response",
+            category="trace",
+            content=langchain_to_openai_completion(message),
+            metadata={
+                "caller": caller,
+                "usage": usage_dict,
+                "latency_ms": latency_ms,
+                "llm_call_index": call_index,
+            },
+        )
+
+        # Message events: only lead_agent gets message-category events.
+        # Content uses message.model_dump() to align with checkpoint format.
+        tool_calls = getattr(message, "tool_calls", None) or []
+        if caller == "lead_agent":
+            resp_meta = getattr(message, "response_metadata", None) or {}
+            model_name = resp_meta.get("model_name") if isinstance(resp_meta, dict) else None
+            if tool_calls:
+                # ai_tool_call: agent decided to use tools
+                self._put(
+                    event_type="ai_tool_call",
+                    category="message",
+                    content=message.model_dump(),
+                    metadata={"model_name": model_name, "finish_reason": "tool_calls"},
+                )
+            elif isinstance(content, str) and content:
+                # ai_message: final text reply
+                self._put(
+                    event_type="ai_message",
+                    category="message",
+                    content=message.model_dump(),
+                    metadata={"model_name": model_name, "finish_reason": "stop"},
+                )
+                self._last_ai_msg = content
+                self._msg_count += 1
+
+        # Token accumulation
+        if self._track_tokens:
+            input_tk = usage_dict.get("input_tokens", 0) or 0
+            output_tk = usage_dict.get("output_tokens", 0) or 0
+            total_tk = usage_dict.get("total_tokens", 0) or 0
+            if total_tk == 0:
+                total_tk = input_tk + output_tk
+            if total_tk > 0:
+                self._total_input_tokens += input_tk
+                self._total_output_tokens += output_tk
+                self._total_tokens += total_tk
+                self._llm_call_count += 1
+                if caller.startswith("subagent:"):
+                    self._subagent_tokens += total_tk
+                elif caller.startswith("middleware:"):
+                    self._middleware_tokens += total_tk
+                else:
+                    self._lead_agent_tokens += total_tk
+
+    def on_llm_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
+        self._llm_start_times.pop(str(run_id), None)
+        self._put(event_type="llm_error", category="trace", content=str(error))
+
+    # -- Tool callbacks --
+
+    def on_tool_start(self, serialized: dict, input_str: str, *, run_id: UUID, **kwargs: Any) -> None:
+        tool_call_id = kwargs.get("tool_call_id")
+        if tool_call_id:
+            self._tool_call_ids[str(run_id)] = tool_call_id
+        self._put(
+            event_type="tool_start",
+            category="trace",
+            metadata={
+                "tool_name": serialized.get("name", ""),
+                "tool_call_id": tool_call_id,
+                "args": str(input_str)[:2000],
+            },
+        )
+
+    def on_tool_end(self, output: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        from langchain_core.messages import ToolMessage
+
+        # Pop the cached id unconditionally so _tool_call_ids cannot grow when
+        # the id is also available from the ToolMessage or from kwargs.
+        cached_id = self._tool_call_ids.pop(str(run_id), None)
+        # Prefer the ToolMessage built by LangChain's _format_output (tool_call_id, name, status, artifact) over kwargs.
+        if isinstance(output, ToolMessage):
+            tool_call_id = output.tool_call_id or kwargs.get("tool_call_id") or cached_id
+            tool_name = output.name or kwargs.get("name", "")
+            status = getattr(output, "status", "success") or "success"
+            content_str = output.content if isinstance(output.content, str) else str(output.content)
+            # Use model_dump() for checkpoint-aligned message content.
+            # Override tool_call_id if it was resolved from cache.
+            msg_content = output.model_dump()
+            if msg_content.get("tool_call_id") != tool_call_id:
+                msg_content["tool_call_id"] = tool_call_id
+        else:
+            tool_call_id = kwargs.get("tool_call_id") or cached_id
+            tool_name = kwargs.get("name", "")
+            status = "success"
+            content_str = str(output)
+            # Construct checkpoint-aligned dict when output is a plain string.
+            msg_content = ToolMessage(
+                content=content_str,
+                tool_call_id=tool_call_id or "",
+                name=tool_name,
+                status=status,
+            ).model_dump()
+
+        # Trace event (always)
+        self._put(
+            event_type="tool_end",
+            category="trace",
+            content=content_str,
+            metadata={
+                "tool_name": tool_name,
+                "tool_call_id": tool_call_id,
+                "status": status,
+            },
+        )
+
+        # Message event: tool_result (checkpoint-aligned model_dump format)
+        self._put(
+            event_type="tool_result",
+            category="message",
+            content=msg_content,
+            metadata={"tool_name": tool_name, "status": status},
+        )
+
+    def on_tool_error(self, error: BaseException, *, run_id: UUID, **kwargs: Any) -> None:
+        from langchain_core.messages import ToolMessage
+
+        tool_call_id = kwargs.get("tool_call_id") or self._tool_call_ids.pop(str(run_id), None)
+        tool_name = kwargs.get("name", "")
+
+        # Trace event
+        self._put(
+            event_type="tool_error",
+            category="trace",
+            content=str(error),
+            metadata={
+                "tool_name": tool_name,
+                "tool_call_id": tool_call_id,
+            },
+        )
+
+        # Message event: tool_result with error status (checkpoint-aligned)
+        msg_content = ToolMessage(
+            content=str(error),
+            tool_call_id=tool_call_id or "",
+            name=tool_name,
+            status="error",
+        ).model_dump()
+        self._put(
+            event_type="tool_result",
+            category="message",
+            content=msg_content,
+            metadata={"tool_name": tool_name, "status": "error"},
+        )
+
+    # -- Custom event callback --
+
+    def on_custom_event(self, name: str, data: Any, *, run_id: UUID, **kwargs: Any) -> None:
+        from deerflow.runtime.serialization import serialize_lc_object
+
+        if name == "summarization":
+            data_dict = data if isinstance(data, dict) else {}
+            self._put(
+                event_type="summarization",
+                category="trace",
+                content=data_dict.get("summary", ""),
+                metadata={
+                    "replaced_message_ids": data_dict.get("replaced_message_ids", []),
+                    "replaced_count": data_dict.get("replaced_count", 0),
+                },
+            )
+            self._put(
+                event_type="middleware:summarize",
+                category="middleware",
+                content={"role": "system", "content": data_dict.get("summary", "")},
+                metadata={"replaced_count": data_dict.get("replaced_count", 0)},
+            )
+        else:
+            event_data = serialize_lc_object(data) if not isinstance(data, dict) else data
+            self._put(
+                event_type=name,
+                category="trace",
+                metadata=event_data if isinstance(event_data, dict) else {"data": event_data},
+            )
+
+    # -- Internal methods --
+
+    def _put(self, *, event_type: str, category: str, content: str | dict = "", metadata: dict | None = None) -> None:
+        self._buffer.append(
+            {
+                "thread_id": self.thread_id,
+                "run_id": self.run_id,
+                "event_type": event_type,
+                "category": category,
+                "content": content,
+                "metadata": metadata or {},
+                "created_at": datetime.now(UTC).isoformat(),
+            }
+        )
+        if len(self._buffer) >= self._flush_threshold:
+            self._flush_sync()
+
+    def _flush_sync(self) -> None:
+        """Best-effort flush of buffer to RunEventStore.
+
+        BaseCallbackHandler methods are synchronous.  If an event loop is
+        running we schedule an async ``put_batch``; otherwise the events
+        stay in the buffer and are flushed later by the async ``flush()``
+        call in the worker's ``finally`` block.
+        """
+        if not self._buffer:
+            return
+        try:
+            loop = asyncio.get_running_loop()
+        except RuntimeError:
+            # No event loop — keep events in buffer for later async flush.
+            return
+        batch = self._buffer.copy()
+        self._buffer.clear()
+        task = loop.create_task(self._flush_async(batch))
+        task.add_done_callback(self._on_flush_done)
+
+    async def _flush_async(self, batch: list[dict]) -> None:
+        try:
+            await self._store.put_batch(batch)
+        except Exception:
+            logger.warning(
+                "Failed to flush %d events for run %s — returning to buffer",
+                len(batch),
+                self.run_id,
+                exc_info=True,
+            )
+            # Return failed events to buffer for retry on next flush
+            self._buffer = batch + self._buffer
+
+    @staticmethod
+    def _on_flush_done(task: asyncio.Task) -> None:
+        if task.cancelled():
+            return
+        exc = task.exception()
+        if exc:
+            logger.warning("Journal flush task failed: %s", exc)
+
+    def _identify_caller(self, kwargs: dict) -> str:
+        for tag in kwargs.get("tags") or []:
+            if isinstance(tag, str) and (tag.startswith("subagent:") or tag.startswith("middleware:") or tag == "lead_agent"):
+                return tag
+        # Default to lead_agent: the main agent graph does not inject
+        # callback tags, while subagents and middleware explicitly tag
+        # themselves.
+        return "lead_agent"
+
+    # -- Public methods (called by worker) --
+
+    def set_first_human_message(self, content: str) -> None:
+        """Record the first human message for convenience fields."""
+        self._first_human_msg = content[:2000] if content else None
+
+    def record_middleware(self, tag: str, *, name: str, hook: str, action: str, changes: dict) -> None:
+        """Record a middleware state-change event.
+
+        Called by middleware implementations when they perform a meaningful
+        state change (e.g., title generation, summarization, HITL approval).
+        Pure-observation middleware should not call this.
+
+        Args:
+            tag: Short identifier for the middleware (e.g., "title", "summarize",
+                 "guardrail"). Used to form event_type="middleware:{tag}".
+            name: Full middleware class name.
+            hook: Lifecycle hook that triggered the action (e.g., "after_model").
+            action: Specific action performed (e.g., "generate_title").
+            changes: Dict describing the state changes made.
+        """
+        self._put(
+            event_type=f"middleware:{tag}",
+            category="middleware",
+            content={"name": name, "hook": hook, "action": action, "changes": changes},
+        )
+
+    async def flush(self) -> None:
+        """Force flush remaining buffer. Called in worker's finally block."""
+        if self._buffer:
+            batch = self._buffer.copy()
+            self._buffer.clear()
+            await self._store.put_batch(batch)
+
+    def get_completion_data(self) -> dict:
+        """Return accumulated token and message data for run completion."""
+        return {
+            "total_input_tokens": self._total_input_tokens,
+            "total_output_tokens": self._total_output_tokens,
+            "total_tokens": self._total_tokens,
+            "llm_call_count": self._llm_call_count,
+            "lead_agent_tokens": self._lead_agent_tokens,
+            "subagent_tokens": self._subagent_tokens,
+            "middleware_tokens": self._middleware_tokens,
+            "message_count": self._msg_count,
+            "last_ai_message": self._last_ai_msg,
+            "first_human_message": self._first_human_msg,
+        }
@@ -2,11 +2,12 @@

 from .manager import ConflictError, RunManager, RunRecord, UnsupportedStrategyError
 from .schemas import DisconnectMode, RunStatus
-from .worker import run_agent
+from .worker import RunContext, run_agent

 __all__ = [
    "ConflictError",
    "DisconnectMode",
+    "RunContext",
    "RunManager",
    "RunRecord",
    "RunStatus",
@@ -1,4 +1,4 @@
-"""In-memory run registry."""
+"""In-memory run registry with optional persistent RunStore backing."""

 from __future__ import annotations

@@ -7,9 +7,13 @@ import logging
 import uuid
 from dataclasses import dataclass, field
 from datetime import UTC, datetime
+from typing import TYPE_CHECKING

 from .schemas import DisconnectMode, RunStatus

+if TYPE_CHECKING:
+    from deerflow.runtime.runs.store.base import RunStore
+
 logger = logging.getLogger(__name__)


@@ -38,11 +42,44 @@ class RunRecord:


 class RunManager:
-    """In-memory run registry.  All mutations are protected by an asyncio lock."""
+    """In-memory run registry with optional persistent RunStore backing.

-    def __init__(self) -> None:
+    All mutations are protected by an asyncio lock. When a ``store`` is
+    provided, serializable metadata is also persisted to the store so
+    that run history survives process restarts.
+    """
+
+    def __init__(self, store: RunStore | None = None) -> None:
        self._runs: dict[str, RunRecord] = {}
        self._lock = asyncio.Lock()
+        self._store = store
+
+    async def _persist_to_store(self, record: RunRecord, *, follow_up_to_run_id: str | None = None) -> None:
+        """Best-effort persist run record to backing store."""
+        if self._store is None:
+            return
+        try:
+            await self._store.put(
+                record.run_id,
+                thread_id=record.thread_id,
+                assistant_id=record.assistant_id,
+                status=record.status.value,
+                multitask_strategy=record.multitask_strategy,
+                metadata=record.metadata or {},
+                kwargs=record.kwargs or {},
+                created_at=record.created_at,
+                follow_up_to_run_id=follow_up_to_run_id,
+            )
+        except Exception:
+            logger.warning("Failed to persist run %s to store", record.run_id, exc_info=True)
+
+    async def update_run_completion(self, run_id: str, **kwargs) -> None:
+        """Persist token usage and completion data to the backing store."""
+        if self._store is not None:
+            try:
+                await self._store.update_run_completion(run_id, **kwargs)
+            except Exception:
+                logger.warning("Failed to persist run completion for %s", run_id, exc_info=True)

    async def create(
        self,
@@ -53,6 +90,7 @@ class RunManager:
        metadata: dict | None = None,
        kwargs: dict | None = None,
        multitask_strategy: str = "reject",
+        follow_up_to_run_id: str | None = None,
    ) -> RunRecord:
        """Create a new pending run and register it."""
        run_id = str(uuid.uuid4())
@@ -71,6 +109,7 @@ class RunManager:
        )
        async with self._lock:
            self._runs[run_id] = record
+        await self._persist_to_store(record, follow_up_to_run_id=follow_up_to_run_id)
        logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
        return record

@@ -96,6 +135,11 @@ class RunManager:
            record.updated_at = _now_iso()
            if error is not None:
                record.error = error
+        if self._store is not None:
+            try:
+                await self._store.update_status(run_id, status.value, error=error)
+            except Exception:
+                logger.warning("Failed to persist status update for run %s", run_id, exc_info=True)
        logger.info("Run %s -> %s", run_id, status.value)

    async def cancel(self, run_id: str, *, action: str = "interrupt") -> bool:
@@ -132,6 +176,7 @@ class RunManager:
        metadata: dict | None = None,
        kwargs: dict | None = None,
        multitask_strategy: str = "reject",
+        follow_up_to_run_id: str | None = None,
    ) -> RunRecord:
        """Atomically check for inflight runs and create a new one.

@@ -185,6 +230,7 @@ class RunManager:
            )
            self._runs[run_id] = record

+        await self._persist_to_store(record, follow_up_to_run_id=follow_up_to_run_id)
        logger.info("Run created: run_id=%s thread_id=%s", run_id, thread_id)
        return record

@@ -0,0 +1,4 @@
+from deerflow.runtime.runs.store.base import RunStore
+from deerflow.runtime.runs.store.memory import MemoryRunStore
+
+__all__ = ["MemoryRunStore", "RunStore"]
@@ -0,0 +1,96 @@
+"""Abstract interface for run metadata storage.
+
+RunManager depends on this interface. Implementations:
+- MemoryRunStore: in-memory dict (development, tests)
+- RunRepository: SQLAlchemy ORM-backed store (used when a database is available)
+
+``put`` and ``list_by_thread`` accept an optional owner_id for user isolation.
+When owner_id is None, no user filtering is applied (single-user mode).
+"""
+
+from __future__ import annotations
+
+import abc
+from typing import Any
+
+
+class RunStore(abc.ABC):
+    @abc.abstractmethod
+    async def put(
+        self,
+        run_id: str,
+        *,
+        thread_id: str,
+        assistant_id: str | None = None,
+        owner_id: str | None = None,
+        status: str = "pending",
+        multitask_strategy: str = "reject",
+        metadata: dict[str, Any] | None = None,
+        kwargs: dict[str, Any] | None = None,
+        error: str | None = None,
+        created_at: str | None = None,
+        follow_up_to_run_id: str | None = None,
+    ) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def get(self, run_id: str) -> dict[str, Any] | None:
+        pass
+
+    @abc.abstractmethod
+    async def list_by_thread(
+        self,
+        thread_id: str,
+        *,
+        owner_id: str | None = None,
+        limit: int = 100,
+    ) -> list[dict[str, Any]]:
+        pass
+
+    @abc.abstractmethod
+    async def update_status(
+        self,
+        run_id: str,
+        status: str,
+        *,
+        error: str | None = None,
+    ) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def delete(self, run_id: str) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def update_run_completion(
+        self,
+        run_id: str,
+        *,
+        status: str,
+        total_input_tokens: int = 0,
+        total_output_tokens: int = 0,
+        total_tokens: int = 0,
+        llm_call_count: int = 0,
+        lead_agent_tokens: int = 0,
+        subagent_tokens: int = 0,
+        middleware_tokens: int = 0,
+        message_count: int = 0,
+        last_ai_message: str | None = None,
+        first_human_message: str | None = None,
+        error: str | None = None,
+    ) -> None:
+        pass
+
+    @abc.abstractmethod
+    async def list_pending(self, *, before: str | None = None) -> list[dict[str, Any]]:
+        pass
+
+    @abc.abstractmethod
+    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
+        """Aggregate token usage for completed runs in a thread.
+
+        Returns a dict with keys: total_tokens, total_input_tokens,
+        total_output_tokens, total_runs, by_model (model_name → {tokens, runs}),
+        by_caller ({lead_agent, subagent, middleware}).
+        """
+        pass
@@ -0,0 +1,100 @@
+"""In-memory RunStore. Used when database.backend=memory (default) and in tests.
+
+Equivalent to the original RunManager._runs dict behavior.
+"""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+from typing import Any
+
+from deerflow.runtime.runs.store.base import RunStore
+
+
+class MemoryRunStore(RunStore):
+    def __init__(self) -> None:
+        self._runs: dict[str, dict[str, Any]] = {}
+
+    async def put(
+        self,
+        run_id,
+        *,
+        thread_id,
+        assistant_id=None,
+        owner_id=None,
+        status="pending",
+        multitask_strategy="reject",
+        metadata=None,
+        kwargs=None,
+        error=None,
+        created_at=None,
+        follow_up_to_run_id=None,
+    ):
+        now = datetime.now(UTC).isoformat()
+        self._runs[run_id] = {
+            "run_id": run_id,
+            "thread_id": thread_id,
+            "assistant_id": assistant_id,
+            "owner_id": owner_id,
+            "status": status,
+            "multitask_strategy": multitask_strategy,
+            "metadata": metadata or {},
+            "kwargs": kwargs or {},
+            "error": error,
+            "follow_up_to_run_id": follow_up_to_run_id,
+            "created_at": created_at or now,
+            "updated_at": now,
+        }
+
+    async def get(self, run_id):
+        return self._runs.get(run_id)
+
+    async def list_by_thread(self, thread_id, *, owner_id=None, limit=100):
+        results = [r for r in self._runs.values() if r["thread_id"] == thread_id and (owner_id is None or r.get("owner_id") == owner_id)]
+        results.sort(key=lambda r: r["created_at"], reverse=True)
+        return results[:limit]
+
+    async def update_status(self, run_id, status, *, error=None):
+        if run_id in self._runs:
+            self._runs[run_id]["status"] = status
+            if error is not None:
+                self._runs[run_id]["error"] = error
+            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()
+
+    async def delete(self, run_id):
+        self._runs.pop(run_id, None)
+
+    async def update_run_completion(self, run_id, *, status, **kwargs):
+        if run_id in self._runs:
+            self._runs[run_id]["status"] = status
+            for key, value in kwargs.items():
+                if value is not None:
+                    self._runs[run_id][key] = value
+            self._runs[run_id]["updated_at"] = datetime.now(UTC).isoformat()
+
+    async def list_pending(self, *, before=None):
+        now = before or datetime.now(UTC).isoformat()
+        results = [r for r in self._runs.values() if r["status"] == "pending" and r["created_at"] <= now]
+        results.sort(key=lambda r: r["created_at"])
+        return results
+
+    async def aggregate_tokens_by_thread(self, thread_id: str) -> dict[str, Any]:
+        completed = [r for r in self._runs.values() if r["thread_id"] == thread_id and r.get("status") in ("success", "error")]
+        by_model: dict[str, dict] = {}
+        for r in completed:
+            model = r.get("model_name") or "unknown"
+            entry = by_model.setdefault(model, {"tokens": 0, "runs": 0})
+            entry["tokens"] += r.get("total_tokens", 0)
+            entry["runs"] += 1
+        return {
+            "total_tokens": sum(r.get("total_tokens", 0) for r in completed),
+            "total_input_tokens": sum(r.get("total_input_tokens", 0) for r in completed),
+            "total_output_tokens": sum(r.get("total_output_tokens", 0) for r in completed),
+            "total_runs": len(completed),
+            "by_model": by_model,
+            "by_caller": {
+                "lead_agent": sum(r.get("lead_agent_tokens", 0) for r in completed),
+                "subagent": sum(r.get("subagent_tokens", 0) for r in completed),
+                "middleware": sum(r.get("middleware_tokens", 0) for r in completed),
+            },
+        }
@@ -19,7 +19,11 @@ import asyncio
 import copy
 import inspect
 import logging
-from typing import Any, Literal
+from dataclasses import dataclass, field
+from typing import TYPE_CHECKING, Any, Literal
+
+if TYPE_CHECKING:
+    from langchain_core.messages import HumanMessage

 from deerflow.runtime.serialization import serialize
 from deerflow.runtime.stream_bridge import StreamBridge
@@ -33,13 +37,29 @@ logger = logging.getLogger(__name__)
 _VALID_LG_MODES = {"values", "updates", "checkpoints", "tasks", "debug", "messages", "custom"}


+@dataclass(frozen=True)
+class RunContext:
+    """Infrastructure dependencies for a single agent run.
+
+    Groups checkpointer, store, and persistence-related singletons so that
+    ``run_agent`` (and any future callers) receive one object instead of a
+    growing list of keyword arguments.
+    """
+
+    checkpointer: Any
+    store: Any | None = field(default=None)
+    event_store: Any | None = field(default=None)
+    run_events_config: Any | None = field(default=None)
+    thread_meta_repo: Any | None = field(default=None)
+    follow_up_to_run_id: str | None = field(default=None)
+
+
 async def run_agent(
    bridge: StreamBridge,
    run_manager: RunManager,
    record: RunRecord,
    *,
-    checkpointer: Any,
-    store: Any | None = None,
+    ctx: RunContext,
    agent_factory: Any,
    graph_input: dict,
    config: dict,
@@ -50,6 +70,14 @@ async def run_agent(
 ) -> None:
    """Execute an agent in the background, publishing events to *bridge*."""

+    # Unpack infrastructure dependencies from RunContext.
+    checkpointer = ctx.checkpointer
+    store = ctx.store
+    event_store = ctx.event_store
+    run_events_config = ctx.run_events_config
+    thread_meta_repo = ctx.thread_meta_repo
+    follow_up_to_run_id = ctx.follow_up_to_run_id
+
    run_id = record.run_id
    thread_id = record.thread_id
    requested_modes: set[str] = set(stream_modes or ["values"])
@@ -57,6 +85,35 @@ async def run_agent(
    pre_run_snapshot: dict[str, Any] | None = None
    snapshot_capture_failed = False

+    # Initialize RunJournal for event capture
+    journal = None
+    if event_store is not None:
+        from deerflow.runtime.journal import RunJournal
+
+        journal = RunJournal(
+            run_id=run_id,
+            thread_id=thread_id,
+            event_store=event_store,
+            track_token_usage=getattr(run_events_config, "track_token_usage", True),
+        )
+
+        # Write human_message event (model_dump format, aligned with checkpoint)
+        human_msg = _extract_human_message(graph_input)
+        if human_msg is not None:
+            msg_metadata = {}
+            if follow_up_to_run_id:
+                msg_metadata["follow_up_to_run_id"] = follow_up_to_run_id
+            await event_store.put(
+                thread_id=thread_id,
+                run_id=run_id,
+                event_type="human_message",
+                category="message",
+                content=human_msg.model_dump(),
+                metadata=msg_metadata or None,
+            )
+            content = human_msg.content
+            journal.set_first_human_message(content if isinstance(content, str) else str(content))
+
    # Track whether "events" was requested but skipped
    if "events" in requested_modes:
        logger.info(
@@ -110,6 +167,11 @@ async def run_agent(
            config["context"].setdefault("thread_id", thread_id)
        config.setdefault("configurable", {})["__pregel_runtime"] = runtime

+        # Inject RunJournal as a LangChain callback handler.
+        # on_llm_end captures token usage; on_chain_start/end captures lifecycle.
+        if journal is not None:
+            config.setdefault("callbacks", []).append(journal)
+
        runnable_config = RunnableConfig(**config)
        agent = agent_factory(config=runnable_config)

@@ -236,6 +298,37 @@ async def run_agent(
        )

    finally:
+        # Flush any buffered journal events and persist completion data
+        if journal is not None:
+            try:
+                await journal.flush()
+            except Exception:
+                logger.warning("Failed to flush journal for run %s", run_id, exc_info=True)
+
+            # Persist token usage + convenience fields to RunStore
+            completion = journal.get_completion_data()
+            await run_manager.update_run_completion(run_id, status=record.status.value, **completion)
+
+        # Sync title from checkpoint to threads_meta.display_name
+        if checkpointer is not None and thread_meta_repo is not None:
+            try:
+                ckpt_config = {"configurable": {"thread_id": thread_id, "checkpoint_ns": ""}}
+                ckpt_tuple = await checkpointer.aget_tuple(ckpt_config)
+                if ckpt_tuple is not None:
+                    ckpt = getattr(ckpt_tuple, "checkpoint", {}) or {}
+                    title = ckpt.get("channel_values", {}).get("title")
+                    if title:
+                        await thread_meta_repo.update_display_name(thread_id, title)
+            except Exception:
+                logger.debug("Failed to sync title for thread %s (non-fatal)", thread_id)
+
+        # Update threads_meta status based on run outcome
+        try:
+            if thread_meta_repo is not None:
+                await thread_meta_repo.update_status(thread_id, "idle" if record.status == RunStatus.success else record.status.value)
+        except Exception:
+            logger.debug("Failed to update thread_meta status for %s (non-fatal)", thread_id)
+
        await bridge.publish_end(run_id)
        asyncio.create_task(bridge.cleanup(run_id, delay=60))

@@ -355,6 +448,31 @@ def _lg_mode_to_sse_event(mode: str) -> str:
    return mode


+def _extract_human_message(graph_input: dict) -> HumanMessage | None:
+    """Extract or construct a HumanMessage from graph_input for event recording.
+
+    Returns a LangChain HumanMessage so callers can use .model_dump() to get
+    the checkpoint-aligned serialization format.
+    """
+    from langchain_core.messages import HumanMessage
+
+    messages = graph_input.get("messages")
+    if not messages:
+        return None
+    last = messages[-1] if isinstance(messages, list) else messages
+    if isinstance(last, HumanMessage):
+        return last
+    if isinstance(last, str):
+        return HumanMessage(content=last) if last else None
+    if hasattr(last, "content"):
+        content = last.content
+        return HumanMessage(content=content) if content else None
+    if isinstance(last, dict):
+        content = last.get("content", "")
+        return HumanMessage(content=content) if content else None
+    return None
+
+
 def _unpack_stream_item(
    item: Any,
    lg_modes: list[str],