Mirror of https://github.com/bytedance/deer-flow.git (synced 2026-06-10 17:35:57 +00:00)
fix(docker): default Gateway to a single worker to prevent multi-worker breakage (#3475)
The default `make up` started the Gateway with `--workers 4`, but run state (RunManager and the stream bridge) is held in-process and nginx uses no sticky sessions. With the default config, same-run requests scatter across workers that each keep their own run state, breaking run cancellation (409), SSE reconnect (hangs on heartbeats), multitask de-duplication, and IM channels (duplicate replies). The shared cross-worker stream bridge does not exist yet.

Default `GATEWAY_WORKERS` to 1 so the out-of-the-box deployment is correct, document the single-worker boundary in the README, and add a regression test pinning the default while keeping it overridable.

This is a stop-gap, not a multi-worker implementation; the full fix (shared run state + stream bridge) is tracked in #3191.

Refs #3239, #3260
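The failure mode described above can be reproduced with a toy model. This is a minimal sketch under stated assumptions: `Worker`, `RoundRobinProxy`, and `start_then_cancel` are hypothetical names, not deer-flow APIs, and the round-robin cycle only stands in for "dispatch with no run affinity" (how connections are really spread across Uvicorn worker processes depends on how each process accepts them). It shows that a cancel request only succeeds when it reaches the worker that started the run.

```python
"""Toy model: per-worker in-process run state with no request affinity.

All names are hypothetical; this is not deer-flow's RunManager, only an
illustration of why state held in one process is invisible to its siblings.
"""
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One worker process with its own in-memory run registry."""

    name: str
    runs: set[str] = field(default_factory=set)

    def start_run(self, run_id: str) -> int:
        self.runs.add(run_id)
        return 200

    def cancel_run(self, run_id: str) -> int:
        # A worker only knows about runs it started itself; an unknown run
        # yields 409, mirroring the symptom reported in the commit message.
        if run_id not in self.runs:
            return 409
        self.runs.discard(run_id)
        return 200


class RoundRobinProxy:
    """Stand-in for a front end without sticky sessions."""

    def __init__(self, worker_count: int) -> None:
        self.workers = [Worker(f"w{i}") for i in range(worker_count)]
        self._next = itertools.cycle(self.workers)

    def route(self) -> Worker:
        return next(self._next)


def start_then_cancel(worker_count: int) -> int:
    """Start a run, then send a cancel for it as a separate request."""
    proxy = RoundRobinProxy(worker_count)
    proxy.route().start_run("run-1")
    return proxy.route().cancel_run("run-1")


print(start_then_cancel(1))  # 200: the only worker owns the run
print(start_then_cancel(4))  # 409: the cancel lands on a different worker
```

With one worker every request reaches the owner of the run, which is exactly the guarantee the `GATEWAY_WORKERS=1` default restores until run state is shared.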
@@ -247,6 +247,9 @@ Access: http://localhost:2026
 
 The unified nginx endpoint is same-origin by default and does not emit browser CORS headers. If you run a split-origin or port-forwarded browser client, set `GATEWAY_CORS_ORIGINS` to comma-separated exact origins such as `http://localhost:3000`; the Gateway then applies the CORS allowlist and matching CSRF origin checks.
 
+> [!IMPORTANT]
+> The Gateway holds run state (RunManager and the stream bridge) in process, so production defaults to a single Gateway worker (`GATEWAY_WORKERS=1`). Raising the worker count without a shared cross-worker stream bridge — which is not yet available — breaks run cancellation, SSE reconnects, request de-duplication, and IM channels, because nginx uses no sticky sessions and each worker keeps its own run state. Scale a single worker up with more CPU/RAM (or move the database and sandbox onto dedicated tiers) instead of raising `GATEWAY_WORKERS`.
+
 See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed Docker development guide.
 
 #### Option 2: Local Development
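The README context above describes `GATEWAY_CORS_ORIGINS` as a comma-separated list of exact origins. The sketch below illustrates what "exact" means in that sentence; it is an assumption-laden illustration, not the Gateway's code. The helper names `parse_cors_origins` and `origin_allowed` are hypothetical, and the parsing rules (trim whitespace, drop empty entries, compare whole strings) are assumed.

```python
"""Sketch: parse a comma-separated exact-origin allowlist and match Origin.

Hypothetical helpers; the trimming and empty-entry rules are assumptions,
not the Gateway's actual parsing of GATEWAY_CORS_ORIGINS.
"""
from __future__ import annotations


def parse_cors_origins(raw: str | None) -> frozenset[str]:
    """Split a value like "http://localhost:3000,https://app.example.com"."""
    if not raw:
        # Unset or empty: same-origin only, so no CORS allowlist at all.
        return frozenset()
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def origin_allowed(origin: str | None, allowlist: frozenset[str]) -> bool:
    # Exact string match only: no wildcards and no prefix/suffix matching,
    # so a look-alike host such as "localhost:3000.evil.test" is rejected.
    return origin is not None and origin in allowlist


allow = parse_cors_origins("http://localhost:3000")
print(origin_allowed("http://localhost:3000", allow))            # True
print(origin_allowed("http://localhost:3000.evil.test", allow))  # False
```

Exact matching keeps the allowlist predictable; anything looser (prefix or wildcard matching) would also loosen the CSRF origin check the README pairs it with.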
@@ -0,0 +1,45 @@
+"""Regression test for the Docker Compose default Gateway worker count.
+
+The Gateway holds run state (RunManager and the stream bridge) in process, so
+the default deployment must run a single Uvicorn worker. Running more than one
+worker without a shared cross-worker stream bridge breaks run cancellation, SSE
+reconnects, request de-duplication, and IM channels (nginx has no sticky
+sessions, so requests scatter across workers that each keep their own run
+state). This test pins the safe default so it cannot silently regress to a
+multi-worker default, while still allowing operators to override it once a
+shared stream bridge exists.
+"""
+
+from __future__ import annotations
+
+import re
+from pathlib import Path
+
+import yaml
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+COMPOSE_PATH = REPO_ROOT / "docker" / "docker-compose.yaml"
+
+
+def _gateway_command() -> str:
+    """Return the gateway service command as a single string."""
+    compose = yaml.safe_load(COMPOSE_PATH.read_text(encoding="utf-8"))
+    command = compose["services"]["gateway"]["command"]
+    # ``command`` may load as a scalar string or a list depending on YAML style.
+    if isinstance(command, list):
+        command = " ".join(str(part) for part in command)
+    return command
+
+
+def test_gateway_defaults_to_single_worker():
+    """With GATEWAY_WORKERS unset, the worker count must default to 1."""
+    command = _gateway_command()
+    match = re.search(r"GATEWAY_WORKERS:-(\d+)", command)
+    assert match is not None, f"gateway command must set a GATEWAY_WORKERS default; got: {command}"
+    assert match.group(1) == "1", f"default Gateway worker count must be 1, got {match.group(1)}"
+
+
+def test_gateway_worker_count_remains_overridable():
+    """The worker count must stay configurable, not hard-coded to 1."""
+    command = _gateway_command()
+    assert "${GATEWAY_WORKERS:-1}" in command, f"worker count must use ${{GATEWAY_WORKERS:-1}} so operators can override it; got: {command}"
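The test above checks the literal `${GATEWAY_WORKERS:-1}` string because Docker Compose interpolates that expression before the container starts, using the same `:-` rule as POSIX sh: the default applies when the variable is unset or set to an empty string. The sketch below models only that one form to show both the default and the operator override; `resolve_default` is a hypothetical helper, not Compose's interpolation engine.

```python
"""Sketch of the ${VAR:-default} rule the compose command relies on.

resolve_default is hypothetical and handles only the ${NAME:-default}
form, not the rest of Compose's interpolation syntax.
"""
from __future__ import annotations

import re
from collections.abc import Mapping

_DEFAULT_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def resolve_default(template: str, env: Mapping[str, str]) -> str:
    def substitute(match: re.Match[str]) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # ":-" treats "unset" and "set but empty" the same way.
        return value if value else default

    return _DEFAULT_RE.sub(substitute, template)


flag = "--workers ${GATEWAY_WORKERS:-1}"
print(resolve_default(flag, {}))                        # --workers 1
print(resolve_default(flag, {"GATEWAY_WORKERS": ""}))   # --workers 1
print(resolve_default(flag, {"GATEWAY_WORKERS": "2"}))  # --workers 2
```

Because the default lives in the file while the environment can still override it, the two tests can pin "defaults to 1" and "remains overridable" independently.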
@@ -72,7 +72,13 @@ services:
 UV_INDEX_URL: ${UV_INDEX_URL:-https://pypi.org/simple}
 UV_EXTRAS: ${UV_EXTRAS:-}
 container_name: deer-flow-gateway
-command: sh -c "cd backend && PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001 --workers ${GATEWAY_WORKERS:-4}"
+# Gateway hosts the agent runtime with in-process RunManager + StreamBridge
+# singletons -- run state lives in this worker's memory. Default to a single
+# worker: with >1 worker and no nginx sticky sessions, run cancel, SSE
+# reconnect, request dedup, and per-worker IM channel services all break
+# across workers until a shared (e.g. redis) stream bridge lands, which is
+# not yet implemented. Override GATEWAY_WORKERS only once that is in place.
+command: sh -c "cd backend && PYTHONPATH=. uv run uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001 --workers ${GATEWAY_WORKERS:-1}"
 volumes:
 - ${DEER_FLOW_CONFIG_PATH}:/app/backend/config.yaml:ro
 - ${DEER_FLOW_EXTENSIONS_CONFIG_PATH}:/app/backend/extensions_config.json:ro