# deerflow.runtime Design Overview

This document describes the current implementation of `backend/packages/harness/deerflow/runtime`, including its overall design, boundary model, the collaboration between `runs` and `stream_bridge`, how it interacts with external infrastructure and the `app` layer, and how `actor_context` is dynamically injected to provide user isolation.

## 1. Overall Role

`deerflow.runtime` is the runtime kernel layer of DeerFlow.

It sits below agents / tools / middlewares and above app / gateway / infra. Its purpose is to define runtime semantics and boundary contracts, without directly owning web endpoints, ORM models, or concrete infrastructure implementations.

Its public surface is re-exported from [`__init__.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/__init__.py) and currently exposes four main capability areas:

1. `runs`
   - Run domain types, execution facade, lifecycle observers, and store protocols
2. `stream_bridge`
   - Stream event bridge contract and public stream types
3. `actor_context`
   - Request/task-scoped actor context and user-isolation bridge
4. `serialization`
   - Runtime serialization helpers for LangChain / LangGraph data and outward-facing events

Structurally, the current package looks like:

```text
runtime
├─ runs
│  ├─ facade / types / observer / store
│  ├─ internal/*
│  └─ callbacks/*
├─ stream_bridge
│  ├─ contract
│  └─ exceptions
├─ actor_context
└─ serialization / converters
```

## 2. Overall Design and Constraint Model

### 2.1 Design Goal

The core goal of `runtime` is to decouple runtime control-plane semantics from infrastructure implementations.

It only cares about:

1. What a run is and how run state changes over time
2. What lifecycle events and stream events are produced during execution
3. Which capabilities must be injected from the outside, such as checkpointer, event store, stream bridge, and durable stores
4. Who the current actor is, and how lower layers can use that for isolation

It deliberately does not care about:

1. Whether events are stored in memory, Redis, or another transport
2. How run / thread / feedback data is persisted
3. HTTP / SSE / FastAPI details
4. How the auth plugin resolves the request user

### 2.2 Boundary Rules

The current package has a fairly clear boundary model:

1. `runs` owns execution orchestration, not ORM or SQL writes
2. `stream_bridge` defines stream semantics, not app-level bridge construction
3. `actor_context` defines runtime context, not auth-plugin behavior
4. Durable data enters only through boundary protocols:
   - `RunCreateStore`
   - `RunQueryStore`
   - `RunDeleteStore`
   - `RunEventStore`
5. Lifecycle side effects enter only through `RunObserver`
6. User isolation is not implemented ad hoc in each module; it is propagated through actor context

In one sentence:

`runtime` defines semantics and contracts; `app.infra` provides implementations.

## 3. runs Subsystem Design

### 3.1 Purpose

`runtime/runs` is the run orchestration domain. It is responsible for:

1. Defining run domain objects and status transitions
2. Organizing create / stream / wait / join / cancel / delete behavior
3. Maintaining the in-process runtime control plane
4. Emitting stream events and lifecycle events during execution
5. Collecting trace, token, title, and message data through callbacks

### 3.2 Core Objects

See [`runs/types.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/types.py).

The most important types are:

1. `RunSpec`
   - Built by the app-side input layer
   - The real execution input
2. `RunRecord`
   - The runtime record managed by `RunRegistry`
3. `RunStatus`
   - `pending`, `starting`, `running`, `success`, `error`, `interrupted`, `timeout`
4. `RunScope`
   - Distinguishes stateful vs stateless execution and temporary thread behavior

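The status values listed for `RunStatus` map naturally onto a string enum. The following is a minimal sketch, not the contents of `runs/types.py`: the enum layout, the `TERMINAL_STATUSES` set, and the `is_terminal` helper are assumptions made for illustration.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Status values from the list above; the enum shape is an assumption."""

    PENDING = "pending"
    STARTING = "starting"
    RUNNING = "running"
    SUCCESS = "success"
    ERROR = "error"
    INTERRUPTED = "interrupted"
    TIMEOUT = "timeout"


# Hypothetical grouping: which statuses count as terminal is assumed here.
TERMINAL_STATUSES = frozenset(
    {RunStatus.SUCCESS, RunStatus.ERROR, RunStatus.INTERRUPTED, RunStatus.TIMEOUT}
)


def is_terminal(status: RunStatus) -> bool:
    """Return True once a run can no longer change state (assumed rule)."""
    return status in TERMINAL_STATUSES
```

Subclassing `str` keeps the values directly comparable with the plain strings that a durable store or an SSE payload would carry.
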
### 3.3 Current Constraints

The current implementation explicitly limits some parts of the problem space:

1. `multitask_strategy` currently supports only `reject` and `interrupt` on the main path
2. `enqueue`, `after_seconds`, and batch execution are not on the current primary path
3. `RunRegistry` is an in-process state source, not a durable source of truth
4. External queries may use durable stores, but the live control plane still centers on the in-memory registry

### 3.4 Facade and Internal Components

`RunsFacade` in [`runs/facade.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/facade.py) provides the unified API:

1. `create_background`
2. `create_and_stream`
3. `create_and_wait`
4. `join_stream`
5. `join_wait`
6. `cancel`
7. `get_run`
8. `list_runs`
9. `delete_run`

Internally it composes:

1. `RunRegistry`
2. `ExecutionPlanner`
3. `RunSupervisor`
4. `RunStreamService`
5. `RunWaitService`
6. `RunCreateStore` / `RunQueryStore` / `RunDeleteStore`
7. `RunObserver`

So `RunsFacade` is the public entry point, while execution and state transitions are distributed across smaller components.

## 4. stream_bridge Design and Implementation

### 4.1 Why stream_bridge Is a Separate Abstraction

`StreamBridge` is defined in [`stream_bridge/contract.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/stream_bridge/contract.py).

It exists because run execution needs an event channel that is:

1. Subscribable
2. Replayable
3. Terminal-state aware
4. Resume-capable

That behavior must not be hard-coupled to HTTP SSE, in-memory queues, or Redis-specific details.

So:

1. harness defines stream semantics
2. the app layer owns backend selection and implementation

### 4.2 Contract Contents

The abstract `StreamBridge` currently exposes:

1. `publish(run_id, event, data)`
2. `publish_end(run_id)`
3. `publish_terminal(run_id, kind, data)`
4. `subscribe(run_id, last_event_id, heartbeat_interval)`
5. `cleanup(run_id, delay=0)`
6. `cancel(run_id)`
7. `mark_awaiting_input(run_id)`
8. `start()`
9. `close()`

Public types include:

1. `StreamEvent`
2. `StreamStatus`
3. `ResumeResult`
4. `HEARTBEAT_SENTINEL`
5. `END_SENTINEL`
6. `CANCELLED_SENTINEL`

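As a rough picture of what such a contract looks like in code, the method list above can be written as an abstract base class. This is a sketch only: the method and parameter names come from the list, but the type annotations, the choice of `async def`, and the return types are assumptions and may differ from `stream_bridge/contract.py`.

```python
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator
from typing import Any, Optional


class StreamBridge(ABC):
    """Contract sketch; annotations and async-ness are assumed, not confirmed."""

    @abstractmethod
    async def publish(self, run_id: str, event: str, data: Any) -> None: ...

    @abstractmethod
    async def publish_end(self, run_id: str) -> None: ...

    @abstractmethod
    async def publish_terminal(self, run_id: str, kind: str, data: Any) -> None: ...

    @abstractmethod
    def subscribe(
        self, run_id: str, last_event_id: Optional[str], heartbeat_interval: float
    ) -> AsyncIterator[Any]: ...

    @abstractmethod
    async def cleanup(self, run_id: str, delay: float = 0) -> None: ...

    @abstractmethod
    async def cancel(self, run_id: str) -> None: ...

    @abstractmethod
    async def mark_awaiting_input(self, run_id: str) -> None: ...

    @abstractmethod
    async def start(self) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...
```

Because every method is abstract, instantiating `StreamBridge` directly raises `TypeError`; only an app-layer subclass that implements all nine methods can be constructed.
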
### 4.3 Semantic Boundary

The contract explicitly distinguishes:

1. `end` / `cancel` / `error`
   - Real business-level terminal events for a run
2. `close()`
   - Bridge-level shutdown
   - Not equivalent to run cancellation

### 4.4 Current Implementation Style

The concrete implementation currently used is the app-layer [`MemoryStreamBridge`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/stream_bridge/adapters/memory.py).

Its design is effectively “one in-memory event log per run”:

1. `_RunStream` stores the event list, offset mapping, status, subscriber count, and awaiting-input state
2. `publish()` generates increasing event IDs and appends to the per-run log
3. `subscribe()` supports replay, heartbeat, resume, and terminal exit
4. `cleanup_loop()` handles:
   - old streams
   - active streams with no publish activity
   - orphan terminal streams
   - TTL expiration
5. `mark_awaiting_input()` extends timeout behavior for HITL flows

The Redis implementation is still only a placeholder in [`RedisStreamBridge`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/stream_bridge/adapters/redis.py).

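The "one in-memory event log per run" idea can be illustrated with a deliberately small sketch. It is not `MemoryStreamBridge`: the names `TinyMemoryBridge`, `RunLog`, `Event`, and `END` are hypothetical, integer event IDs are an assumption, and heartbeats, cleanup, and awaiting-input handling are left out. It only shows increasing IDs, replay from a last-seen ID, and terminal exit.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any

END = object()  # stand-in for END_SENTINEL


@dataclass
class Event:
    id: int
    event: str
    data: Any


@dataclass
class RunLog:
    events: list = field(default_factory=list)
    ended: bool = False
    changed: asyncio.Condition = field(default_factory=asyncio.Condition)


class TinyMemoryBridge:
    def __init__(self) -> None:
        self._runs = {}

    def _log(self, run_id: str) -> RunLog:
        return self._runs.setdefault(run_id, RunLog())

    async def publish(self, run_id: str, event: str, data: Any) -> None:
        log = self._log(run_id)
        async with log.changed:
            # IDs are 1-based and strictly increasing within one run.
            log.events.append(Event(len(log.events) + 1, event, data))
            log.changed.notify_all()

    async def publish_end(self, run_id: str) -> None:
        log = self._log(run_id)
        async with log.changed:
            log.ended = True
            log.changed.notify_all()

    async def subscribe(self, run_id: str, last_event_id: int = 0):
        log = self._log(run_id)
        cursor = last_event_id  # with 1-based IDs this is also the list offset
        while True:
            async with log.changed:
                await log.changed.wait_for(
                    lambda: len(log.events) > cursor or log.ended
                )
                batch = log.events[cursor:]
                ended = log.ended
            for item in batch:
                yield item
            cursor += len(batch)
            if ended:
                yield END
                return
```

A subscriber that reconnects with `last_event_id=1` skips the first event and receives everything after it, followed by `END` once the run has ended. Keeping the whole log per run is what makes that replay possible without coordinating with the publisher.
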
### 4.5 Call Chain

The stream bridge participates in the execution chain like this:

```text
RunsFacade
  -> RunStreamService
    -> StreamBridge
      -> app route converts events to SSE
```

More concretely:

1. `_RunExecution._start()` publishes `metadata`
2. `_RunExecution._stream()` converts agent `astream()` output into bridge events
3. `_RunExecution._finish_success()` / `_finish_failed()` / `_finish_aborted()` publish terminal events
4. `RunWaitService` waits by subscribing for `values`, `error`, or terminal events
5. The app route layer converts those events into outward-facing SSE

### 4.6 Future Extensions

Likely future directions include:

1. A real Redis bridge for cross-process / multi-instance streaming
2. Stronger Last-Event-ID gap recovery behavior
3. Richer HITL state handling
4. Cross-node run coordination and more explicit dead-letter strategies

## 5. External Communication and Store Read/Write Boundaries

### 5.1 Two Main Outward Boundaries

`runtime` does not send HTTP requests directly and does not write ORM models directly, but it communicates outward through two main boundaries:

1. `StreamBridge`
   - For outward-facing stream events
2. `store` / `observer`
   - For durable data and lifecycle side effects

### 5.2 Store Boundary Protocols

Under [`runs/store`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/store), the harness layer defines:

1. `RunCreateStore`
2. `RunQueryStore`
3. `RunDeleteStore`
4. `RunEventStore`

These are not harness-internal persistence implementations. They are app-facing contracts declared by the runtime.

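One common way to declare "app-facing contracts" in Python is `typing.Protocol`, where the runtime names the methods it needs and any app-side class with matching methods qualifies. The sketch below assumes that style. The protocol names come from the list above, but the method names `create_run` and `delete_run`, their signatures, and the `DictRunStore` class are hypothetical.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class RunCreateStore(Protocol):
    # Hypothetical method; the real protocol's members are not shown here.
    async def create_run(self, record: dict) -> None: ...


@runtime_checkable
class RunDeleteStore(Protocol):
    # Hypothetical method and return type.
    async def delete_run(self, run_id: str) -> bool: ...


class DictRunStore:
    """App-side stand-in that satisfies both protocols structurally."""

    def __init__(self) -> None:
        self.rows = {}

    async def create_run(self, record: dict) -> None:
        self.rows[record["run_id"]] = record

    async def delete_run(self, run_id: str) -> bool:
        return self.rows.pop(run_id, None) is not None


async def demo() -> bool:
    store = DictRunStore()
    await store.create_run({"run_id": "r1", "status": "pending"})
    return await store.delete_run("r1")
```

`DictRunStore` never imports or subclasses the protocols, which mirrors the dependency direction described here: the harness depends on the contract, and the implementation lives elsewhere.
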
### 5.3 How the app Layer Supplies Store Implementations

The app layer currently provides:

1. [`AppRunCreateStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/store/create_store.py)
2. [`AppRunQueryStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/store/query_store.py)
3. [`AppRunDeleteStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/store/delete_store.py)
4. [`AppRunEventStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/storage/run_events.py)
5. [`JsonlRunEventStore`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/run_events/jsonl_store.py)

The shared pattern is:

1. harness depends only on protocols
2. the app layer owns session lifecycle, commit behavior, access control, and backend choice
3. durable data eventually lands in `store.repositories.*` or JSONL files

### 5.4 How Run Lifecycle Data Leaves the Runtime

The single-run executor [`_RunExecution`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/internal/execution/executor.py) does not write to the database directly.

It exports data through three paths:

1. bridge events
   - Streamed outward to subscribers
2. callback -> `RunEventStore`
   - Execution trace / message / tool / custom events are persisted in batches
3. lifecycle event -> `RunObserver`
   - Run started, completed, failed, cancelled, and thread-status updates are emitted for app observers

### 5.5 `RunEventStore` Backends

The app-side factory [`app/infra/run_events/factory.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/run_events/factory.py) currently selects:

1. `run_events.backend == "db"`
   - `AppRunEventStore`
2. `run_events.backend == "jsonl"`
   - `JsonlRunEventStore`

So the runtime does not care whether events end up in a database or in files. It only requires the event-store protocol.

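The backend selection above amounts to a small lookup from a config value to a store class. The sketch below shows that shape under stated assumptions: `build_run_event_store` is a hypothetical function name, the two classes are empty stand-ins for the real app-layer stores, and the error raised for an unknown backend is an assumption.

```python
class AppRunEventStore:
    """Stand-in for the database-backed store."""


class JsonlRunEventStore:
    """Stand-in for the JSONL-file-backed store."""


# Mapping from the `run_events.backend` config value to a store class.
_BACKENDS = {
    "db": AppRunEventStore,
    "jsonl": JsonlRunEventStore,
}


def build_run_event_store(backend: str):
    """Hypothetical factory: pick the store class named by the config value."""
    try:
        store_cls = _BACKENDS[backend]
    except KeyError:
        # Assumed behavior: fail loudly on a backend the factory does not know.
        raise ValueError(f"unsupported run_events.backend: {backend!r}") from None
    return store_cls()
```

Callers hold only the returned object and use it through the event-store protocol, so adding a third backend would touch the mapping and nothing in the runtime.
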
## 6. Run Lifecycle Data, Callbacks, Write-Back, and Query Flow

### 6.1 Main Single-Run Flow

The main `_RunExecution.run()` flow is:

1. `_start()`
2. `_prepare()`
3. `_stream()`
4. `_finish_after_stream()`
5. `finally`
   - `_emit_final_thread_status()`
   - `callbacks.flush()`
   - `bridge.cleanup(run_id)`

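The point of placing the last three steps in `finally` is that they run whether streaming succeeds or fails. The sketch below demonstrates only that control flow. `RunExecutionSketch` is a hypothetical class that records step names instead of doing real work, and the `except` branch that records `_finish_failed` is an assumption about how failures are routed.

```python
import asyncio


class RunExecutionSketch:
    """Records step names in order; no real agent, bridge, or callbacks."""

    def __init__(self, fail_in_stream: bool = False) -> None:
        self.fail_in_stream = fail_in_stream
        self.calls = []

    async def _step(self, name: str) -> None:
        self.calls.append(name)

    async def run(self) -> None:
        try:
            await self._step("_start")
            await self._step("_prepare")
            await self._step("_stream")
            if self.fail_in_stream:
                raise RuntimeError("agent stream failed")
            await self._step("_finish_after_stream")
        except RuntimeError:
            # Assumption: a failure is converted into a terminal event here.
            await self._step("_finish_failed")
        finally:
            # Always executed, matching step 5 of the flow above.
            await self._step("_emit_final_thread_status")
            await self._step("callbacks.flush")
            await self._step("bridge.cleanup")
```

Running it with `fail_in_stream=True` records `_finish_failed` in place of `_finish_after_stream`, and the same three cleanup steps still close the list.
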
### 6.2 What the Start Phase Records

`_start()`:

1. sets run status to `running`
2. emits `RUN_STARTED`
3. extracts the first human message and emits `HUMAN_MESSAGE`
4. captures the pre-run checkpoint ID
5. publishes a `metadata` stream event

### 6.3 What the Callbacks Collect

Callbacks live under [`runs/callbacks`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/callbacks).

The main ones are:

1. `RunEventCallback`
   - Records `run_start`, `run_end`, `llm_request`, `llm_response`, `tool_start`, `tool_end`, `tool_result`, `custom_event`, and more
   - Flushes batches into `RunEventStore`
2. `RunTokenCallback`
   - Aggregates token usage, LLM call counts, lead/subagent/middleware token split, message counts, first human message, and last AI message
3. `RunTitleCallback`
   - Extracts thread title from title middleware output or custom events

### 6.4 How completion_data Is Produced

`RunTokenCallback.completion_data()` yields `RunCompletionData`, including:

1. `total_input_tokens`
2. `total_output_tokens`
3. `total_tokens`
4. `llm_call_count`
5. `lead_agent_tokens`
6. `subagent_tokens`
7. `middleware_tokens`
8. `message_count`
9. `last_ai_message`
10. `first_human_message`

The executor includes this data in lifecycle payloads on success, failure, and cancellation.

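To make the aggregation concrete, here is a minimal sketch of a completion record and a tally that fills it. The field names come from the list above. The field types, the defaults, the `TokenTally` class, its `record_llm_call` method, and the `source` labels are all assumptions, and this is not the LangChain callback interface that `RunTokenCallback` presumably implements.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class RunCompletionData:
    # Field names from the document; types and defaults are assumed.
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_tokens: int = 0
    llm_call_count: int = 0
    lead_agent_tokens: int = 0
    subagent_tokens: int = 0
    middleware_tokens: int = 0
    message_count: int = 0
    last_ai_message: Optional[str] = None
    first_human_message: Optional[str] = None


# Hypothetical mapping from a call's origin to the bucket it is charged to.
_SOURCE_FIELDS = {
    "lead_agent": "lead_agent_tokens",
    "subagent": "subagent_tokens",
    "middleware": "middleware_tokens",
}


class TokenTally:
    def __init__(self) -> None:
        self._data = RunCompletionData()

    def record_llm_call(
        self, input_tokens: int, output_tokens: int, source: str = "lead_agent"
    ) -> None:
        used = input_tokens + output_tokens
        data = self._data
        data.total_input_tokens += input_tokens
        data.total_output_tokens += output_tokens
        data.total_tokens += used
        data.llm_call_count += 1
        bucket = _SOURCE_FIELDS[source]
        setattr(data, bucket, getattr(data, bucket) + used)

    def completion_data(self) -> RunCompletionData:
        # Return a copy so later calls do not mutate an emitted payload.
        return replace(self._data)
```

Returning a copy from `completion_data()` is a design choice of this sketch: a lifecycle payload that has already been emitted should not change if more usage is recorded afterwards.
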
### 6.5 How the app Layer Writes Lifecycle Results Back

The executor emits `RunLifecycleEvent` objects through [`RunEventEmitter`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/runs/internal/execution/events.py).

The app-layer [`StorageRunObserver`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/storage/runs.py) then persists durable state:

1. `RUN_STARTED`
   - Marks the run as `running`
2. `RUN_COMPLETED`
   - Writes completion data
   - Syncs thread title if present
3. `RUN_FAILED`
   - Writes error and completion data
4. `RUN_CANCELLED`
   - Writes `interrupted` state and completion data
5. `THREAD_STATUS_UPDATED`
   - Syncs thread status

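The mapping from lifecycle events to durable writes can be pictured as a dispatch on the event type. The sketch below uses plain dictionaries in place of a database and is not `StorageRunObserver`: the `DictRunObserver` class, the `on_event` method name, the shape of `RunLifecycleEvent`, the payload keys, and the `success` / `error` status strings written for completed and failed runs are assumptions. Title syncing is omitted.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RunLifecycleEvent:
    # Hypothetical shape; the real event type may carry different fields.
    type: str
    run_id: str
    payload: dict = field(default_factory=dict)


class DictRunObserver:
    """Writes lifecycle results into dictionaries instead of a database."""

    def __init__(self) -> None:
        self.runs = {}
        self.thread_status = {}

    async def on_event(self, event: RunLifecycleEvent) -> None:
        payload = event.payload
        if event.type == "THREAD_STATUS_UPDATED":
            self.thread_status[payload["thread_id"]] = payload["status"]
            return
        row = self.runs.setdefault(event.run_id, {})
        completion = payload.get("completion_data")
        if event.type == "RUN_STARTED":
            row["status"] = "running"
        elif event.type == "RUN_COMPLETED":
            row.update(status="success", completion_data=completion)
        elif event.type == "RUN_FAILED":
            row.update(
                status="error", error=payload.get("error"), completion_data=completion
            )
        elif event.type == "RUN_CANCELLED":
            row.update(status="interrupted", completion_data=completion)
```

The executor side only emits events; everything about where the row lives and how it is committed stays inside the observer, which is the separation this section describes.
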
### 6.6 Query Paths

`RunsFacade.get_run()` and `list_runs()` have two paths:

1. If a `RunQueryStore` is injected, durable state is used first
2. Otherwise, the facade falls back to `RunRegistry`

So:

1. the in-memory registry is the control plane
2. the durable store is the preferred query surface

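The two query paths can be sketched as a single branch on whether a store was injected. All names below (`FacadeQuerySketch`, `DictQueryStore`, the `get_run` signature) are hypothetical, and the registry is reduced to a dictionary. Whether a miss in the durable store falls through to the registry is not stated above; this sketch assumes it does not.

```python
import asyncio
from typing import Optional


class DictQueryStore:
    """Stand-in for an injected durable query store."""

    def __init__(self, rows: dict) -> None:
        self._rows = dict(rows)

    async def get_run(self, run_id: str) -> Optional[dict]:
        return self._rows.get(run_id)


class FacadeQuerySketch:
    def __init__(self, registry: dict, query_store: Optional[DictQueryStore] = None) -> None:
        self._registry = registry        # in-process control plane
        self._query_store = query_store  # optional durable query surface

    async def get_run(self, run_id: str) -> Optional[dict]:
        if self._query_store is not None:
            # Durable state is preferred whenever a store is injected.
            return await self._query_store.get_run(run_id)
        return self._registry.get(run_id)
```

With this shape, a deployment without persistence still answers queries from memory, while a deployment with a store gets answers that survive a process restart.
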
## 7. How actor_context Is Dynamically Injected for User Isolation

### 7.1 Design Goal

`actor_context` is defined in [`actor_context.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/packages/harness/deerflow/runtime/actor_context.py).

Its purpose is to let the runtime and lower-level infrastructure modules depend on a stable notion of “who the current actor is” without importing the auth plugin, FastAPI request objects, or a specific user model.

### 7.2 Current Implementation

The current implementation is a request/task-scoped context built on top of `ContextVar`:

1. `ActorContext`
   - Currently carries only `user_id`
2. `_current_actor`
   - A `ContextVar[ActorContext | None]`
3. `bind_actor_context(actor)`
   - Binds the current actor
4. `reset_actor_context(token)`
   - Restores the previous context
5. `get_actor_context()`
   - Returns the current actor
6. `get_effective_user_id()`
   - Returns the current user ID or `DEFAULT_USER_ID`
7. `resolve_user_id(value=AUTO | explicit | None)`
   - Resolves repository/storage-facing user IDs consistently

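A minimal sketch of the first six pieces, built on the standard-library `contextvars` module, might look like the following. The function and variable names are taken from the list above, but the bodies are assumptions rather than the contents of `actor_context.py`, and the value `"default"` for `DEFAULT_USER_ID` is a placeholder.

```python
from contextvars import ContextVar, Token
from dataclasses import dataclass
from typing import Optional

DEFAULT_USER_ID = "default"  # placeholder; the real constant's value is not shown


@dataclass(frozen=True)
class ActorContext:
    user_id: Optional[str] = None


_current_actor: ContextVar[Optional[ActorContext]] = ContextVar(
    "current_actor", default=None
)


def bind_actor_context(actor: ActorContext) -> Token:
    """Bind the actor and return a token that can undo the binding."""
    return _current_actor.set(actor)


def reset_actor_context(token: Token) -> None:
    """Restore whatever was bound before the matching bind call."""
    _current_actor.reset(token)


def get_actor_context() -> Optional[ActorContext]:
    return _current_actor.get()


def get_effective_user_id() -> str:
    actor = _current_actor.get()
    if actor is not None and actor.user_id:
        return actor.user_id
    return DEFAULT_USER_ID
```

A typical boundary would call `token = bind_actor_context(ActorContext(user_id="alice"))`, run the request inside `try`, and call `reset_actor_context(token)` in `finally`, so that the previous binding is restored even if the handler raises.
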
### 7.3 How the app Layer Injects It Dynamically

Dynamic injection currently happens at the app/auth boundary.

For HTTP request flows:

1. [`app.plugins.auth.security.middleware`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/plugins/auth/security/middleware.py)
   - Builds `ActorContext(user_id=...)` from the authenticated request user
   - Binds and resets runtime actor context around request handling
2. [`app.plugins.auth.security.actor_context`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/plugins/auth/security/actor_context.py)
   - Provides `bind_request_actor_context(request)` and `bind_user_actor_context(user_id)`
   - Allows routes and non-HTTP entry points to bind runtime actor context explicitly

For non-HTTP / external channel flows:

1. [`app/channels/manager.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/channels/manager.py)
2. [`app/channels/feishu.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/channels/feishu.py)

Those entry points also wrap execution with `bind_user_actor_context(user_id)` before they enter runtime-facing code. This matters because:

1. the runtime does not need to distinguish HTTP from Feishu or other channels
2. any entry point that can resolve a user ID can inject the same isolation semantics
3. the same runtime/store/path/memory code can stay protocol-agnostic

So the runtime itself does not know what a request is, and it does not know the auth plugin’s user model. It only knows whether an `ActorContext` is currently bound in the `ContextVar`.

### 7.4 Propagation Semantics After Injection

In practice, “dynamic injection” here does not mean manually threading `user_id` through every function signature. The app boundary binds the actor into a `ContextVar`, and runtime-facing code reads it only where isolation is actually needed.

The current semantics are:

1. an entry boundary calls `bind_actor_context(...)`
2. the async call chain created inside that context sees the same actor view
3. the boundary restores the previous value with `reset_actor_context(token)` when the request/task exits

That gives two practical outcomes:

1. most runtime interfaces do not need to carry `user_id` as an explicit parameter through every layer
2. boundaries that do need durable isolation or path isolation can still read explicitly via `resolve_user_id()` or `get_effective_user_id()`

### 7.5 How User Isolation Actually Works

User isolation is implemented through “dynamic injection + boundary-specific reads”.

The main paths are:

1. path / uploads / sandbox / memory
   - Use `get_effective_user_id()` to derive per-user directories and resource scopes
2. app storage adapters
   - Use `resolve_user_id(AUTO)` in `RunStoreAdapter`, `ThreadMetaStorage`, and related boundaries
3. run event store
   - `AppRunEventStore` reads `get_actor_context()` and decides whether the current actor may see a thread

So user isolation is not centralized in a single middleware and then forgotten. Instead:

1. the app boundary dynamically binds the actor into runtime context
2. runtime and lower layers read that context when they need isolation input
3. each boundary applies the user ID according to its own responsibility

### 7.6 Why This Approach Works Well

The current design has several practical strengths:

1. The runtime does not depend on a specific auth implementation
2. HTTP and non-HTTP entry points can reuse the same isolation mechanism
3. The same user ID propagates naturally into paths, memory, store access, and event visibility
4. Where stronger enforcement is needed, `AUTO` + `resolve_user_id()` can require a bound actor context

### 7.7 Future Extensions

`ActorContext` already contains explicit future-extension hints. The current pattern can be extended without changing the architecture:

1. `tenant_id`
   - For multi-tenant isolation
2. `subject_id`
   - For a more stable identity key
3. `scopes`
   - For finer-grained authorization
4. `auth_source`
   - To track the source channel or auth mechanism

The recommended extension model is to preserve the current shape:

1. The app/auth boundary binds a richer `ActorContext`
2. The runtime depends only on abstract context fields, never on request/user objects
3. Lower layers read only the fields they actually need
4. Store / path / sandbox / stream / memory boundaries can gradually become tenant-aware or scope-aware

More concretely, stronger isolation can be added incrementally at the boundaries:

1. store boundaries
   - add `tenant_id` filtering in `RunStoreAdapter`, `ThreadMetaStorage`, and feedback/event stores
2. path and sandbox boundaries
   - shard directories by `tenant_id/user_id` instead of `user_id` alone
3. event-visibility boundaries
   - layer `scopes` or `subject_id` checks into run-event and thread queries
4. external-channel boundaries
   - populate `auth_source` so API, channel, and internal-job traffic can be distinguished

That keeps the runtime dependent on the abstract “current actor context” concept, not on FastAPI request objects or a specific auth implementation.

## 8. Interaction with the app Layer

### 8.1 How the app Layer Wires the Runtime

The app composition root for runs is [`app/gateway/services/runs/facade_factory.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/services/runs/facade_factory.py).

It assembles:

1. `RunRegistry`
2. `ExecutionPlanner`
3. `RunSupervisor`
4. `RunStreamService`
5. `RunWaitService`
6. `RunsRuntime`
   - `bridge`
   - `checkpointer`
   - `store`
   - `event_store`
   - `agent_factory_resolver`
7. `StorageRunObserver`
8. `AppRunCreateStore`
9. `AppRunQueryStore`
10. `AppRunDeleteStore`

### 8.2 How app.state Provides Infrastructure

In [`app/gateway/registrar.py`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/gateway/registrar.py):

1. `init_persistence()` creates:
   - `persistence`
   - `checkpointer`
   - `run_store`
   - `thread_meta_storage`
   - `run_event_store`
2. `init_runtime()` creates:
   - `stream_bridge`

Those objects are then attached to `app.state` for dependency injection and facade construction.

### 8.3 The app Boundary for `stream_bridge`

Concrete stream bridge construction now belongs entirely to the app layer:

1. harness exports only the `StreamBridge` contract
2. [`app.infra.stream_bridge.build_stream_bridge`](/Users/rayhpeng/workspace/open-source/deer-flow/backend/app/infra/stream_bridge/factory.py) constructs the actual implementation

That is a very explicit boundary:

1. harness defines runtime semantics and interfaces
2. app selects and constructs infrastructure

## 9. Summary

The most accurate one-line summary of `deerflow.runtime` today is:

It is a runtime kernel built around run orchestration, a stream bridge as the streaming boundary, actor context as the dynamic isolation bridge, and store / observer protocols as the durable and side-effect boundaries.

More concretely:

1. `runs` owns orchestration and lifecycle progression
2. `stream_bridge` owns stream semantics
3. `actor_context` owns runtime-scoped user context and isolation bridging
4. `serialization` / `converters` own outward event and message formatting
5. the app layer owns real persistence, stream infrastructure, and auth-driven context injection

The main strengths of this structure are:

1. Runtime semantics are decoupled from infrastructure implementations
2. Request identity is decoupled from runtime logic
3. HTTP, CLI, and channel-worker entry points can reuse the same runtime boundaries
4. The system can grow toward multi-tenancy, cross-process stream bridges, and richer durable backends without changing the core model

The current limitations are also clear:

1. `RunRegistry` is still an in-process control plane
2. The Redis bridge is not implemented yet
3. Some multitask strategies and batch capabilities are still outside the main path
4. `ActorContext` currently carries only `user_id`, not richer fields such as tenant, scopes, or auth source

So the best way to understand the current code is not as a final platform, but as a runtime kernel with clear semantics and extension boundaries.