feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)

* feat(auth): introduce backend auth module

Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)

Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index

Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).

Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.

* feat(auth): wire auth end-to-end (middleware + frontend replacement)

Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
  _ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
  register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
  get_optional_user_from_request; keep get_current_user as thin
  str|None adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
  (both metadata write and LangGraph filter dict) + test fixtures

Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
  BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
  proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
  getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic

Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
  test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS

After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.

* feat(auth): account settings page + i18n

- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
  rendered first in the section list
- Add i18n keys:
  - en-US/zh-CN: settings.sections.account ("Account" / "账号")
  - en-US/zh-CN: button.logout ("Log out" / "退出登录")
  - types.ts: matching type declarations

* feat(auth): enforce owner_id across 2.0-rc persistence layer

Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged:
isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.
Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
  - ContextVar[CurrentUser | None] with default None
  - runtime_checkable CurrentUser Protocol (structural subtype with .id)
  - set/reset/get/require helpers
  - AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
    three-state resolution: AUTO reads contextvar, explicit str
    overrides, explicit None bypasses the filter (for migration/CLI)

Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
  owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
  mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete gain
  owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
  count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
  kwarg. Write paths (put/put_batch) read contextvar softly: when a
  request-scoped user is available, owner_id is stamped; background
  worker writes without a user context pass None which is valid
  (orphan row to be bound by migration)

Schema
------
- persistence/models/run_event.py:
  RunEventRow.owner_id: Mapped[str | None] =
  mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
  picks up the new column automatically

Middleware
----------
- auth_middleware.py: after cookie check, call
  get_optional_user_from_request to load the real User, stamp it into
  request.state.user AND the contextvar via set_current_user, reset in
  a try/finally. Public paths and unauthenticated requests continue
  without contextvar, and @require_auth handles the strict 401 path

Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
  sets a default SimpleNamespace(id="test-user-autouse") on every test
  unless marked @pytest.mark.no_auto_user. Keeps existing 20+
  persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user marker
  so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
  Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=None
  explicitly where it was previously relying on the old default

Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo /
  test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
  integration tests unrelated to auth), 1 skipped

* feat(auth): extend orphan migration to 2.0-rc persistence tables

_ensure_admin_user now runs a three-step pipeline on every boot:

  Step 1 (fatal): admin user exists / is created / password is reset
  Step 2 (non-fatal): LangGraph store orphan threads → admin
  Step 3 (non-fatal): SQL persistence tables → admin
    - threads_meta
    - runs
    - run_events
    - feedback

Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).

Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500): async generator
  that cursor-paginates across LangGraph store pages. Fixes PR #1728's
  hardcoded limit=1000 bug that would silently lose orphans beyond the
  first page.
- _migrate_orphaned_threads(store, admin_user_id): Rewritten to use
  _iter_store_items. Returns the migrated count so the caller can log
  it; raises only on unhandled exceptions.
- _migrate_orphan_sql_tables(admin_user_id): Imports the 4 ORM models
  lazily, grabs the shared session factory, runs one UPDATE per table
  in a single transaction, commits once. No-op when no persistence
  backend is configured (in-memory dev).

Tests: test_ensure_admin.py (8 passed)

* test(auth): port AUTH test plan docs + lint/format pass

- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
  (4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
  "str | None | _AutoSentinel" now that from __future__ import
  annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)

Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to be
  co-located with the repository changes to keep pre-existing
  persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred:
  enforcement is already proven by the 98-test repository suite via the
  autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed in
  AUTH_TEST_PLAN.md are deferred to a follow-up PR: they are manual-QA
  test cases rather than pytest code, and the spec-level coverage is
  already met by test_user_context.py + the 98-test repository suite.

Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
  test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
  test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean

* test(auth): add cross-user isolation test suite

10 tests exercising the storage-layer owner filter by manually switching
the user_context contextvar between two users. Verifies the safety
invariant:

  After a repository write with owner_id=A, a subsequent read with
  owner_id=B must not return the row, and vice versa.

Covers all 4 tables that own user-scoped data:

  TC-API-17 threads_meta: read, search, update, delete cross-user
  TC-API-18 runs: get, list_by_thread, delete cross-user
  TC-API-19 run_events: list_messages, list_events, count_messages,
            delete_by_thread (CRITICAL: raw conversation content
            leak vector)
  TC-API-20 feedback: get, list_by_run, delete cross-user

Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)

Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:

- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
  contextvar is set to different users

Together they prove the end-to-end safety property without the ceremony
of spinning up a full TestClient + in-memory DB for every router
endpoint.

Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean

* refactor(auth): migrate user repository to SQLAlchemy ORM

Move the users table into the shared persistence engine so auth matches
the pattern of threads_meta, runs, run_events, and feedback: one engine,
one session factory, one schema init codepath.
New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow ORM
  class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
  Base.metadata.create_all() picks it up

Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy, identical
  constructor pattern to the other four repositories
  (def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field; storage is configured
  through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with the
  shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to use
  the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
  spins up a scratch engine and resets deps._cached_* singletons

* refactor(auth): remove SQL orphan migration (unused in supported scenarios)

The _migrate_orphan_sql_tables helper existed to bind NULL owner_id rows
in threads_meta, runs, run_events, and feedback to the admin on first
boot. But in every supported upgrade path, it's a no-op:

  1. Fresh install: create_all builds fresh tables, no legacy rows
  2. No-auth → with-auth (no existing persistence DB): persistence
     tables are created fresh by create_all, no legacy rows
  3. No-auth → with-auth (has existing persistence DB from #1930):
     NOT a supported upgrade path. "Existing DB to existing DB" schema
     evolution is out of scope; users wipe DB or run manual ALTER

So the SQL orphan migration never has anything to do in the supported
matrix. Delete the function, simplify _ensure_admin_user from a 3-step
pipeline to a 2-step one (admin creation + LangGraph store orphan
migration only).

LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with the
newly-created admin's id.

Tests: 284 passed (auth + persistence + isolation)
Lint: clean

* security(auth): write initial admin password to 0600 file instead of logs

CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.

New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
  helper. Takes email, password, and an "initial"/"reset" label. Creates
  .deer-flow/ if missing, writes a header comment plus the
  email+password, chmods 0o600, returns the absolute Path.

Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation +
  needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM repo
  (SQLiteUserRepository with session_factory) and the credential_file
  helper. The previous implementation was broken after the earlier ORM
  refactor: it still imported _get_users_conn and constructed
  SQLiteUserRepository() without a session factory.

No tests changed: the three password-log sites are all exercised via
existing test_ensure_admin.py which checks that startup succeeds, not
that a specific string appears in logs.

CodeQL alerts 272, 283, 284: all resolved.

* security(auth): strict JWT validation in middleware (fix junk cookie bypass)

AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with 401.
The previous middleware behaviour was "presence-only": check that some
access_token cookie exists, then pass through.
In combination with my Task-12 decision to skip @require_auth decorators
on routes, this created a gap where a request with any cookie-shaped
string (e.g. access_token=not-a-jwt) would bypass authentication on
routes that do not touch the repository (/api/models, /api/mcp/config,
/api/memory, /api/skills, …).

Fix: middleware now calls get_current_user_from_request() strictly and
catches the resulting HTTPException to render a 401 with the proper
fine-grained error code (token_invalid, token_expired, user_not_found,
…). On success it stamps request.state.user and the contextvar so
repository-layer owner filters work downstream.

The 4 old "_with_cookie_passes" tests in test_auth_middleware.py were
written for the presence-only behaviour; they asserted that a junk
cookie would make the handler return 200. They are renamed to
"_with_junk_cookie_rejected" and their assertions flipped to 401. The
negative path (no cookie → 401 not_authenticated) is unchanged.

Verified:
  no cookie → 401 not_authenticated
  junk cookie → 401 token_invalid (the fixed bug)
  expired cookie → 401 token_expired

Tests: 284 passed (auth + persistence + isolation)
Lint: clean

* security(auth): wire @require_permission(owner_check=True) on isolation routes

Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:

  Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
    cookies and stamps request.state.user
  Layer 2 (@require_permission with owner_check=True): per-resource
    ownership verification via ThreadMetaStore.check_access; returns
    404 if a different user owns the thread

The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the LangGraph
store path that PR #1728 used (_store_get / get_store in
routers/threads.py). The inject_record convenience is dropped: no caller
in 2.0 needs the LangGraph blob, and the SQL repo has a different shape.

Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait, list_runs,
  get_run, cancel_run, join_run, stream_existing_run,
  list_thread_messages, list_run_messages, list_run_events,
  thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid conflict
  with FastAPI Request)

Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__ (the
  unit tests cover parsing logic, not auth; no point spinning up a
  thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in the
  previous commit (745bf432)

Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean

* security(auth): defense-in-depth fixes from release validation pass

Eight findings caught while running the AUTH_TEST_PLAN end-to-end
against the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.

Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires
  AUTH_TRUSTED_PROXIES whitelist (CIDR/IP allowlist). Without nginx in
  front, the previous code honored arbitrary X-Real-IP, letting an
  attacker rotate the header to fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
  field_validator on RegisterRequest + ChangePasswordRequest. The shared
  _validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
  server-reserved metadata keys (owner_id, user_id) via Pydantic
  field_validator so a forged value can never round-trip back to other
  clients reading the same thread.
  The actual ownership invariant stays on the threads_meta row; this
  closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a
  require_existing flag plumbed through
  check_access(require_existing=True). Destructive routes
  (DELETE/PATCH/state-update/runs/feedback) now treat a missing
  thread_meta row as 404 instead of "untracked legacy thread, allow",
  closing the cross-user delete-idempotence gap where any user could
  successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
  on a vanished row instead of silently returning the input. Concurrent
  delete during password reset can no longer look like a successful
  update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
  str at the contextvar boundary so SQLAlchemy String(64) columns can
  bind it. The whole 2.0-rc isolation pipeline was previously broken
  end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA
  journal_mode=WAL, synchronous=NORMAL, foreign_keys=ON on every new
  SQLite connection. TC-UPG-06 in the test plan expects WAL; previous
  code shipped with the default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
  @require_permission's short-circuit fires; previously every isolation
  request did a duplicate JWT decode + users SELECT. Also unifies the
  401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
  bug where 'password' was referenced outside the branch that defined
  it. New _announce_credentials helper absorbs the duplicate log block
  in the fresh-admin and reset-admin branches.

* fix(frontend+nginx): rollout CSRF on every state-changing client path

The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently masked
the next.

LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
  interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
  every state-changing /api/langgraph-compat/* call (runs/stream,
  threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and got
  403 before reaching the auth check. UI symptom: empty thread page with
  no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
  csrf_token cookie per request. Reading the cookie per call (not at
  construction time) handles login / logout / password-change cookie
  rotation transparently. The SDK's prepareFetchOptions calls onRequest
  for both regular requests AND streaming/SSE/reconnect, so the same
  hook covers runs.stream and runs.joinStream.

Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
  account-settings change-password). The other 7 routed through raw
  fetch() with no header: suggestions, memory, agents, mcp, skills,
  uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
  POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
  Convert all 7 raw fetch() callers to fetchWithAuth so the contract is
  centrally enforced. api-client.ts and fetcher.ts share readCsrfCookie
  + STATE_CHANGING_METHODS to avoid drift.

nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
  explicit location blocks but no /api/v1/auth/, /api/feedback,
  /api/runs. The frontend's client-side fetches to
  /api/v1/auth/login/local 404'd from the Next.js side because nginx
  routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway. nginx
  longest-prefix matching keeps the explicit blocks (/api/models,
  /api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
  frontend `location /` block. Without it, nginx tries to spool large
  Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
  Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.

* test(auth): release-validation test infra and new coverage

Test fixtures and unit tests added during the validation pass.

Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
  middleware that stamps request.state.user + request.state.auth and a
  permissive thread_meta_repo mock. TestClient-based router tests
  (test_artifacts_router, test_threads_router) use it instead of bare
  FastAPI() so the new @require_permission(owner_check=True) decorators
  short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
  handler without going through the authz wrappers. Direct-call tests
  (test_uploads_router) use it. Typed with ParamSpec so the wrapped
  signature flows through.

Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no proxy
  / trusted proxy / untrusted peer / XFF rejection / invalid CIDR / no
  client). 5 tests for the password blocklist (literal,
  case-insensitive, strong password accepted, change-password binding,
  short-password length-check still fires before blocklist).
  test_update_user_raises_when_row_concurrently_deleted: closes a
  shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for
  check_access(require_existing=True): strict missing-row denial,
  strict owner match, strict owner mismatch, strict null-owner still
  allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
  _iter_store_items pagination, covering the TC-UPG-02 upgrade story
  end-to-end via mock store. Closes the gap where the cursor pagination
  was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata (owner_id
  removal, user_id removal, safe-keys passthrough, empty input,
  both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
  Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist doesn't
  reject the test data.

* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap

- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation ("admin
  password visible in docker logs") was stale after the simplify pass
  that moved credentials to a 0600 file. The grep "Password:" check
  would have silently failed and given a false sense of coverage. New
  expectation matches the actual file-based path: 0600 file in
  DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
  asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed block
  in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation host
  has no Docker daemon installed. The doc maps each Docker case to an
  already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
  TC-API-02 etc.) so the gap is auditable, and includes pre-flight
  reproduction steps for whoever has Docker available.

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-05-22 07:56:48 +00:00 · 2026-04-09 11:29:32 +08:00
parent 185f5649dd
commit e75a2ff29a
92 changed files with 9142 additions and 471 deletions
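The paginated store iteration described in the commit message (_iter_store_items) can be exercised against an in-memory stand-in. A minimal sketch: FakeStore and collect are hypothetical test helpers, and asearch(namespace, limit=..., offset=...) is assumed to behave like a plain slice; the loop itself mirrors the termination rule (empty page or short page).

```python
import asyncio


class FakeStore:
    """Hypothetical in-memory store that pages over a list and counts calls."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = 0

    async def asearch(self, namespace, *, limit, offset):
        self.calls += 1
        return self._items[offset : offset + limit]


async def iter_store_items(store, namespace, *, page_size=500):
    """Yield every item in the namespace, one page at a time."""
    offset = 0
    while True:
        batch = await store.asearch(namespace, limit=page_size, offset=offset)
        if not batch:
            return  # ran off the end: the previous page was exactly full
        for item in batch:
            yield item
        if len(batch) < page_size:
            return  # short page: nothing can follow it
        offset += page_size


async def collect(store, page_size):
    """Drain the async generator into a list."""
    return [item async for item in iter_store_items(store, ("threads",), page_size=page_size)]


if __name__ == "__main__":
    print(asyncio.run(collect(FakeStore(range(5)), page_size=2)))
```

A short final page ends the loop without an extra round trip; an exactly full final page costs one more (empty) query before stopping.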
@@ -1,15 +1,21 @@
 import logging
+import os
 from collections.abc import AsyncGenerator
 from contextlib import asynccontextmanager
+from datetime import UTC

 from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware

+from app.gateway.auth_middleware import AuthMiddleware
 from app.gateway.config import get_gateway_config
+from app.gateway.csrf_middleware import CSRFMiddleware
 from app.gateway.deps import langgraph_runtime
 from app.gateway.routers import (
    agents,
    artifacts,
    assistants_compat,
+    auth,
    channels,
    feedback,
    mcp,
@@ -34,6 +40,125 @@ logging.basicConfig(
 logger = logging.getLogger(__name__)


+async def _ensure_admin_user(app: FastAPI) -> None:
+    """Auto-create the admin user on first boot if no users exist.
+
+    After admin creation, migrate orphan threads from the LangGraph
+    store (metadata.owner_id unset) to the admin account. This is the
+    "no-auth → with-auth" upgrade path: users who ran DeerFlow without
+    authentication have existing LangGraph thread data that needs an
+    owner assigned.
+
+    No SQL persistence migration is needed: the four owner_id columns
+    (threads_meta, runs, run_events, feedback) only come into existence
+    alongside the auth module via create_all, so freshly created tables
+    never contain NULL-owner rows. "Existing persistence DB + new auth"
+    is not a supported upgrade path — fresh install or wipe-and-retry.
+
+    Multi-worker safe: relies on SQLite UNIQUE constraint to resolve
+    races during admin creation. Only the worker that successfully
+    creates/updates the admin writes the credentials file; losers silently skip.
+    """
+    import secrets
+
+    from app.gateway.auth.credential_file import write_initial_credentials
+    from app.gateway.deps import get_local_provider
+
+    def _announce_credentials(email: str, password: str, *, label: str, headline: str) -> None:
+        """Write the password to a 0600 file and log the path (never the secret)."""
+        cred_path = write_initial_credentials(email, password, label=label)
+        logger.info("=" * 60)
+        logger.info("  %s", headline)
+        logger.info("  Credentials written to: %s (mode 0600)", cred_path)
+        logger.info("  Change it after login: Settings -> Account")
+        logger.info("=" * 60)
+
+    provider = get_local_provider()
+    user_count = await provider.count_users()
+
+    admin = None
+
+    if user_count == 0:
+        password = secrets.token_urlsafe(16)
+        try:
+            admin = await provider.create_user(email="admin@deerflow.dev", password=password, system_role="admin", needs_setup=True)
+        except ValueError:
+            return  # Another worker already created the admin.
+        _announce_credentials(admin.email, password, label="initial", headline="Admin account created on first boot")
+    else:
+        # Admin exists but setup never completed: reset the password so the
+        # operator can always recover it from the credentials file without the CLI.
+        # Multi-worker guard: if admin was created less than 30s ago, another
+        # worker just created it and will write the credentials, so skip the reset.
+        admin = await provider.get_user_by_email("admin@deerflow.dev")
+        if admin and admin.needs_setup:
+            import time
+
+            age = time.time() - admin.created_at.replace(tzinfo=UTC).timestamp()
+            if age >= 30:
+                from app.gateway.auth.password import hash_password_async
+
+                password = secrets.token_urlsafe(16)
+                admin.password_hash = await hash_password_async(password)
+                admin.token_version += 1
+                await provider.update_user(admin)
+                _announce_credentials(admin.email, password, label="reset", headline="Admin account setup incomplete — password reset")
+
+    if admin is None:
+        return  # Nothing to bind orphans to.
+
+    admin_id = str(admin.id)
+
+    # LangGraph store orphan migration — non-fatal.
+    # This covers the "no-auth → with-auth" upgrade path for users
+    # whose existing LangGraph thread metadata has no owner_id set.
+    store = getattr(app.state, "store", None)
+    if store is not None:
+        try:
+            migrated = await _migrate_orphaned_threads(store, admin_id)
+            if migrated:
+                logger.info("Migrated %d orphan LangGraph thread(s) to admin", migrated)
+        except Exception:
+            logger.exception("LangGraph thread migration failed (non-fatal)")
+
+
+async def _iter_store_items(store, namespace, *, page_size: int = 500):
+    """Paginated async iterator over a LangGraph store namespace.
+
+    Replaces the old hardcoded ``limit=1000`` call with a cursor-style
+    loop so that environments with more than one page of orphans do
+    not silently lose data. Terminates when a page is empty OR when a
+    short page arrives (indicating the last page).
+    """
+    offset = 0
+    while True:
+        batch = await store.asearch(namespace, limit=page_size, offset=offset)
+        if not batch:
+            return
+        for item in batch:
+            yield item
+        if len(batch) < page_size:
+            return
+        offset += page_size
+
+
+async def _migrate_orphaned_threads(store, admin_user_id: str) -> int:
+    """Migrate LangGraph store threads with no owner_id to the given admin.
+
+    Uses cursor pagination so all orphans are migrated regardless of
+    count. Returns the number of rows migrated.
+    """
+    migrated = 0
+    async for item in _iter_store_items(store, ("threads",)):
+        metadata = item.value.get("metadata", {})
+        if not metadata.get("owner_id"):
+            metadata["owner_id"] = admin_user_id
+            item.value["metadata"] = metadata
+            await store.aput(("threads",), item.key, item.value)
+            migrated += 1
+    return migrated
+
+
 @asynccontextmanager
 async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Application lifespan handler."""
@@ -53,6 +178,10 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    async with langgraph_runtime(app):
        logger.info("LangGraph runtime initialised")

+        # Ensure admin user exists (auto-create on first boot)
+        # Must run AFTER langgraph_runtime so app.state.store is available for thread migration
+        await _ensure_admin_user(app)
+
        # Start IM channel service if any channels are configured
        try:
            from app.channels.service import start_channel_service
@@ -164,7 +293,31 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
        ],
    )

-    # CORS is handled by nginx - no need for FastAPI middleware
+    # Auth: reject unauthenticated requests to non-public paths (fail-closed safety net)
+    app.add_middleware(AuthMiddleware)
+
+    # CSRF: Double Submit Cookie pattern for state-changing requests
+    app.add_middleware(CSRFMiddleware)
+
+    # CORS: when GATEWAY_CORS_ORIGINS is set (dev without nginx), add CORS middleware.
+    # In production, nginx handles CORS and no middleware is needed.
+    cors_origins_env = os.environ.get("GATEWAY_CORS_ORIGINS", "")
+    if cors_origins_env:
+        cors_origins = [o.strip() for o in cors_origins_env.split(",") if o.strip()]
+        # Validate: wildcard origin with credentials is a security misconfiguration
+        for origin in cors_origins:
+            if origin == "*":
+                logger.error("GATEWAY_CORS_ORIGINS contains wildcard '*' with allow_credentials=True. This is a security misconfiguration — browsers will reject the response. Use explicit scheme://host:port origins instead.")
+                cors_origins = [o for o in cors_origins if o != "*"]
+                break
+        if cors_origins:
+            app.add_middleware(
+                CORSMiddleware,
+                allow_origins=cors_origins,
+                allow_credentials=True,
+                allow_methods=["*"],
+                allow_headers=["*"],
+            )
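
The origin handling above can be isolated into a small pure function for testing. This is a sketch only: `parse_cors_origins` is a hypothetical helper, not something this change adds.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and the '*' wildcard.

    The wildcard is dropped because it cannot be combined with
    allow_credentials=True.
    """
    origins = [o.strip() for o in raw.split(",") if o.strip()]
    return [o for o in origins if o != "*"]
```

An empty result means no CORS middleware should be registered at all, which matches the `if cors_origins:` guard in the diff.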

    # Include routers
    # Models API is mounted at /api/models
@@ -200,6 +353,9 @@ This gateway provides custom endpoints for models, MCP configuration, skills, an
    # Assistants compatibility API (LangGraph Platform stub)
    app.include_router(assistants_compat.router)

+    # Auth API is mounted at /api/v1/auth
+    app.include_router(auth.router)
+
    # Feedback API is mounted at /api/threads/{thread_id}/runs/{run_id}/feedback
    app.include_router(feedback.router)

@@ -0,0 +1,42 @@
+"""Authentication module for DeerFlow.
+
+This module provides:
+- JWT-based authentication
+- Provider Factory pattern for extensible auth methods
+- UserRepository interface for storage backends (SQLite)
+"""
+
+from app.gateway.auth.config import AuthConfig, get_auth_config, set_auth_config
+from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse, TokenError
+from app.gateway.auth.jwt import TokenPayload, create_access_token, decode_token
+from app.gateway.auth.local_provider import LocalAuthProvider
+from app.gateway.auth.models import User, UserResponse
+from app.gateway.auth.password import hash_password, verify_password
+from app.gateway.auth.providers import AuthProvider
+from app.gateway.auth.repositories.base import UserRepository
+
+__all__ = [
+    # Config
+    "AuthConfig",
+    "get_auth_config",
+    "set_auth_config",
+    # Errors
+    "AuthErrorCode",
+    "AuthErrorResponse",
+    "TokenError",
+    # JWT
+    "TokenPayload",
+    "create_access_token",
+    "decode_token",
+    # Password
+    "hash_password",
+    "verify_password",
+    # Models
+    "User",
+    "UserResponse",
+    # Providers
+    "AuthProvider",
+    "LocalAuthProvider",
+    # Repository
+    "UserRepository",
+]
@@ -0,0 +1,57 @@
+"""Authentication configuration for DeerFlow."""
+
+import logging
+import os
+import secrets
+
+from dotenv import load_dotenv
+from pydantic import BaseModel, Field
+
+load_dotenv()
+
+logger = logging.getLogger(__name__)
+
+
+class AuthConfig(BaseModel):
+    """JWT and auth-related configuration. Parsed once at startup.
+
+    Note: the ``users`` table now lives in the shared persistence
+    database managed by ``deerflow.persistence.engine``. The old
+    ``users_db_path`` config key has been removed — user storage is
+    configured through ``config.database`` like every other table.
+    """
+
+    jwt_secret: str = Field(
+        ...,
+        description="Secret key for JWT signing. MUST be set via AUTH_JWT_SECRET.",
+    )
+    token_expiry_days: int = Field(default=7, ge=1, le=30)
+    oauth_github_client_id: str | None = Field(default=None)
+    oauth_github_client_secret: str | None = Field(default=None)
+
+
+_auth_config: AuthConfig | None = None
+
+
+def get_auth_config() -> AuthConfig:
+    """Get the global AuthConfig instance. Parses from env on first call."""
+    global _auth_config
+    if _auth_config is None:
+        jwt_secret = os.environ.get("AUTH_JWT_SECRET")
+        if not jwt_secret:
+            jwt_secret = secrets.token_urlsafe(32)
+            os.environ["AUTH_JWT_SECRET"] = jwt_secret
+            logger.warning(
+                "⚠ AUTH_JWT_SECRET is not set — using an auto-generated ephemeral secret. "
+                "Sessions will be invalidated on restart. "
+                "For production, add AUTH_JWT_SECRET to your .env file: "
+                'python -c "import secrets; print(secrets.token_urlsafe(32))"'
+            )
+        _auth_config = AuthConfig(jwt_secret=jwt_secret)
+    return _auth_config
+
+
+def set_auth_config(config: AuthConfig) -> None:
+    """Set the global AuthConfig instance (for testing)."""
+    global _auth_config
+    _auth_config = config
@@ -0,0 +1,48 @@
+"""Write initial admin credentials to a restricted file instead of logs.
+
+Logging secrets to stdout/stderr is a well-known CodeQL finding
+(py/clear-text-logging-sensitive-data) — in production those logs
+get collected into ELK/Splunk/etc and become a secret sprawl
+source. This helper writes the credential to a 0600 file that only
+the process user can read, and returns the path so the caller can
+log **the path** (not the password) for the operator to pick up.
+"""
+
+from __future__ import annotations
+
+import os
+from pathlib import Path
+
+from deerflow.config.paths import get_paths
+
+_CREDENTIAL_FILENAME = "admin_initial_credentials.txt"
+
+
+def write_initial_credentials(email: str, password: str, *, label: str = "initial") -> Path:
+    """Write the admin email + password to ``{base_dir}/admin_initial_credentials.txt``.
+
+    The file is created **atomically** with mode 0600 via ``os.open``,
+    so the password is never world-readable; a ``write_text`` followed by
+    ``chmod`` would leave a window in which the file has default permissions.
+
+    ``label`` distinguishes "initial" (fresh creation) from "reset"
+    (password reset) in the file header so an operator picking up the
+    file after a restart can tell which event produced it.
+
+    Returns the absolute :class:`Path` to the file.
+    """
+    target = get_paths().base_dir / _CREDENTIAL_FILENAME
+    target.parent.mkdir(parents=True, exist_ok=True)
+
+    content = (
+        f"# DeerFlow admin {label} credentials\n# This file is generated on first boot or password reset.\n# Change the password after login via Settings -> Account,\n# then delete this file.\n#\nemail: {email}\npassword: {password}\n"
+    )
+
+    # 0600 create-or-truncate. O_TRUNC (not O_EXCL) lets the reset-password
+    # path rewrite an existing file; the mode argument only applies on creation.
+    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+    with os.fdopen(fd, "w", encoding="utf-8") as fh:
+        os.chmod(target, 0o600)  # tighten a pre-existing file before writing the secret
+        fh.write(content)
+
+    return target.resolve()
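
A minimal sketch of the create-or-truncate pattern used above, run against a temporary directory so the resulting permission bits can be inspected. `write_private` and `demo` are hypothetical names; the permission check is only meaningful on POSIX systems.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private(path: Path, content: str) -> int:
    """Create or truncate `path` with mode 0600 and return its permission bits."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    return stat.S_IMODE(path.stat().st_mode)


def demo() -> tuple[int, str]:
    """Write twice to show that O_TRUNC replaces the earlier content."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "credentials.txt"
        write_private(target, "first\n")
        mode = write_private(target, "second\n")
        return mode, target.read_text(encoding="utf-8")
```

The mode passed to `os.open` is applied only when the file is created and is further masked by the process umask, so a file that already exists keeps whatever permissions it had.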
@@ -0,0 +1,44 @@
+"""Typed error definitions for auth module.
+
+AuthErrorCode: exhaustive enum of all auth failure conditions.
+TokenError: exhaustive enum of JWT decode failures.
+AuthErrorResponse: structured error payload for HTTP responses.
+"""
+
+from enum import StrEnum
+
+from pydantic import BaseModel
+
+
+class AuthErrorCode(StrEnum):
+    """Exhaustive list of auth error conditions."""
+
+    INVALID_CREDENTIALS = "invalid_credentials"
+    TOKEN_EXPIRED = "token_expired"
+    TOKEN_INVALID = "token_invalid"
+    USER_NOT_FOUND = "user_not_found"
+    EMAIL_ALREADY_EXISTS = "email_already_exists"
+    PROVIDER_NOT_FOUND = "provider_not_found"
+    NOT_AUTHENTICATED = "not_authenticated"
+
+
+class TokenError(StrEnum):
+    """Exhaustive list of JWT decode failure reasons."""
+
+    EXPIRED = "expired"
+    INVALID_SIGNATURE = "invalid_signature"
+    MALFORMED = "malformed"
+
+
+class AuthErrorResponse(BaseModel):
+    """Structured error response — replaces bare `detail` strings."""
+
+    code: AuthErrorCode
+    message: str
+
+
+def token_error_to_code(err: TokenError) -> AuthErrorCode:
+    """Map TokenError to AuthErrorCode — single source of truth."""
+    if err == TokenError.EXPIRED:
+        return AuthErrorCode.TOKEN_EXPIRED
+    return AuthErrorCode.TOKEN_INVALID
@@ -0,0 +1,55 @@
+"""JWT token creation and verification."""
+
+from datetime import UTC, datetime, timedelta
+
+import jwt
+from pydantic import BaseModel
+
+from app.gateway.auth.config import get_auth_config
+from app.gateway.auth.errors import TokenError
+
+
+class TokenPayload(BaseModel):
+    """JWT token payload."""
+
+    sub: str  # user_id
+    exp: datetime
+    iat: datetime | None = None
+    ver: int = 0  # token_version — must match User.token_version
+
+
+def create_access_token(user_id: str, expires_delta: timedelta | None = None, token_version: int = 0) -> str:
+    """Create a JWT access token.
+
+    Args:
+        user_id: The user's UUID as string
+        expires_delta: Optional custom expiry; defaults to ``AuthConfig.token_expiry_days`` (7 unless overridden)
+        token_version: User's current token_version for invalidation
+
+    Returns:
+        Encoded JWT string
+    """
+    config = get_auth_config()
+    expiry = expires_delta if expires_delta is not None else timedelta(days=config.token_expiry_days)
+
+    now = datetime.now(UTC)
+    payload = {"sub": user_id, "exp": now + expiry, "iat": now, "ver": token_version}
+    return jwt.encode(payload, config.jwt_secret, algorithm="HS256")
+
+
+def decode_token(token: str) -> TokenPayload | TokenError:
+    """Decode and validate a JWT token.
+
+    Returns:
+        TokenPayload if valid, or a specific TokenError variant.
+    """
+    config = get_auth_config()
+    try:
+        payload = jwt.decode(token, config.jwt_secret, algorithms=["HS256"])
+        return TokenPayload(**payload)
+    except jwt.ExpiredSignatureError:
+        return TokenError.EXPIRED
+    except jwt.InvalidSignatureError:
+        return TokenError.INVALID_SIGNATURE
+    except jwt.PyJWTError:
+        return TokenError.MALFORMED
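
Because `decode_token` returns either a payload or an error value instead of raising, callers have to branch on the result type. The sketch below shows one way a caller might do that and compare `ver` against the user's current `token_version`. `Payload` and `resolve_user_id` are hypothetical stand-ins, and the real check lives in code outside this excerpt.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TokenError(str, Enum):
    """Stand-in mirroring the three decode failure reasons."""

    EXPIRED = "expired"
    INVALID_SIGNATURE = "invalid_signature"
    MALFORMED = "malformed"


@dataclass
class Payload:
    """Hypothetical stand-in for TokenPayload (only the fields used here)."""

    sub: str
    ver: int = 0


def resolve_user_id(result: Payload | TokenError, current_version: int) -> str | None:
    """Return the user id for a valid, current token; None otherwise."""
    if isinstance(result, TokenError):
        return None  # decode failed: expired, bad signature, or malformed
    if result.ver != current_version:
        return None  # token was issued before the last password change
    return result.sub
```

Returning the error as a value keeps the failure reason available to the caller, which can then map it to an `AuthErrorCode`.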
@@ -0,0 +1,87 @@
+"""Local email/password authentication provider."""
+
+from app.gateway.auth.models import User
+from app.gateway.auth.password import hash_password_async, verify_password_async
+from app.gateway.auth.providers import AuthProvider
+from app.gateway.auth.repositories.base import UserRepository
+
+
+class LocalAuthProvider(AuthProvider):
+    """Email/password authentication provider using local database."""
+
+    def __init__(self, repository: UserRepository):
+        """Initialize with a UserRepository.
+
+        Args:
+            repository: UserRepository implementation (SQLite)
+        """
+        self._repo = repository
+
+    async def authenticate(self, credentials: dict) -> User | None:
+        """Authenticate with email and password.
+
+        Args:
+            credentials: dict with 'email' and 'password' keys
+
+        Returns:
+            User if authentication succeeds, None otherwise
+        """
+        email = credentials.get("email")
+        password = credentials.get("password")
+
+        if not email or not password:
+            return None
+
+        user = await self._repo.get_user_by_email(email)
+        if user is None:
+            return None
+
+        if user.password_hash is None:
+            # OAuth user without local password
+            return None
+
+        if not await verify_password_async(password, user.password_hash):
+            return None
+
+        return user
+
+    async def get_user(self, user_id: str) -> User | None:
+        """Get user by ID."""
+        return await self._repo.get_user_by_id(user_id)
+
+    async def create_user(self, email: str, password: str | None = None, system_role: str = "user", needs_setup: bool = False) -> User:
+        """Create a new local user.
+
+        Args:
+            email: User email address
+            password: Plain text password (will be hashed)
+            system_role: Role to assign ("admin" or "user")
+            needs_setup: If True, user must complete setup on first login
+
+        Returns:
+            Created User instance
+        """
+        password_hash = await hash_password_async(password) if password else None
+        user = User(
+            email=email,
+            password_hash=password_hash,
+            system_role=system_role,
+            needs_setup=needs_setup,
+        )
+        return await self._repo.create_user(user)
+
+    async def get_user_by_oauth(self, provider: str, oauth_id: str) -> User | None:
+        """Get user by OAuth provider and ID."""
+        return await self._repo.get_user_by_oauth(provider, oauth_id)
+
+    async def count_users(self) -> int:
+        """Return total number of registered users."""
+        return await self._repo.count_users()
+
+    async def update_user(self, user: User) -> User:
+        """Update an existing user."""
+        return await self._repo.update_user(user)
+
+    async def get_user_by_email(self, email: str) -> User | None:
+        """Get user by email."""
+        return await self._repo.get_user_by_email(email)
@@ -0,0 +1,41 @@
+"""User Pydantic models for authentication."""
+
+from datetime import UTC, datetime
+from typing import Literal
+from uuid import UUID, uuid4
+
+from pydantic import BaseModel, ConfigDict, EmailStr, Field
+
+
+def _utc_now() -> datetime:
+    """Return current UTC time (timezone-aware)."""
+    return datetime.now(UTC)
+
+
+class User(BaseModel):
+    """Internal user representation."""
+
+    model_config = ConfigDict(from_attributes=True)
+
+    id: UUID = Field(default_factory=uuid4, description="Primary key")
+    email: EmailStr = Field(..., description="Unique email address")
+    password_hash: str | None = Field(None, description="bcrypt hash, nullable for OAuth users")
+    system_role: Literal["admin", "user"] = Field(default="user")
+    created_at: datetime = Field(default_factory=_utc_now)
+
+    # OAuth linkage (optional)
+    oauth_provider: str | None = Field(None, description="e.g. 'github', 'google'")
+    oauth_id: str | None = Field(None, description="User ID from OAuth provider")
+
+    # Auth lifecycle
+    needs_setup: bool = Field(default=False, description="True for auto-created admin until setup completes")
+    token_version: int = Field(default=0, description="Incremented on password change to invalidate old JWTs")
+
+
+class UserResponse(BaseModel):
+    """Response model for user info endpoint."""
+
+    id: str
+    email: str
+    system_role: Literal["admin", "user"]
+    needs_setup: bool = False
@@ -0,0 +1,33 @@
+"""Password hashing utilities using bcrypt directly."""
+
+import asyncio
+
+import bcrypt
+
+
+def hash_password(password: str) -> str:
+    """Hash a password using bcrypt."""
+    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt()).decode("utf-8")
+
+
+def verify_password(plain_password: str, hashed_password: str) -> bool:
+    """Verify a password against its hash."""
+    return bcrypt.checkpw(plain_password.encode("utf-8"), hashed_password.encode("utf-8"))
+
+
+async def hash_password_async(password: str) -> str:
+    """Hash a password using bcrypt (non-blocking).
+
+    Wraps the blocking bcrypt operation in a thread pool to avoid
+    blocking the event loop during password hashing.
+    """
+    return await asyncio.to_thread(hash_password, password)
+
+
+async def verify_password_async(plain_password: str, hashed_password: str) -> bool:
+    """Verify a password against its hash (non-blocking).
+
+    Wraps the blocking bcrypt operation in a thread pool to avoid
+    blocking the event loop during password verification.
+    """
+    return await asyncio.to_thread(verify_password, plain_password, hashed_password)
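
The thread-offloading pattern above can be tried without installing bcrypt. In this sketch PBKDF2 from the standard library is only a stand-in for the CPU-bound hash, not a suggested replacement, and all function names are hypothetical.

```python
import asyncio
import hashlib
import hmac


def slow_hash(password: str, salt: bytes) -> bytes:
    """CPU-bound key derivation; PBKDF2 is a stdlib stand-in for bcrypt here."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 50_000)


async def slow_hash_async(password: str, salt: bytes) -> bytes:
    """Run the blocking hash in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(slow_hash, password, salt)


async def verify_async(password: str, salt: bytes, expected: bytes) -> bool:
    """Hash the candidate off-loop and compare in constant time."""
    candidate = await slow_hash_async(password, salt)
    return hmac.compare_digest(candidate, expected)
```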
@@ -0,0 +1,24 @@
+"""Auth provider abstraction."""
+
+from abc import ABC, abstractmethod
+
+
+class AuthProvider(ABC):
+    """Abstract base class for authentication providers."""
+
+    @abstractmethod
+    async def authenticate(self, credentials: dict) -> "User | None":
+        """Authenticate user with given credentials.
+
+        Returns User if authentication succeeds, None otherwise.
+        """
+        ...
+
+    @abstractmethod
+    async def get_user(self, user_id: str) -> "User | None":
+        """Retrieve user by ID."""
+        ...
+
+
+# Import User at runtime to avoid circular imports
+from app.gateway.auth.models import User  # noqa: E402
@@ -0,0 +1,97 @@
+"""User repository interface for abstracting database operations."""
+
+from abc import ABC, abstractmethod
+
+from app.gateway.auth.models import User
+
+
+class UserNotFoundError(LookupError):
+    """Raised when a user repository operation targets a non-existent row.
+
+    Subclass of :class:`LookupError` so callers that already catch
+    ``LookupError`` for "missing entity" can keep working unchanged,
+    while specific call sites can pin to this class to distinguish
+    "concurrent delete during update" from other lookups.
+    """
+
+
+class UserRepository(ABC):
+    """Abstract interface for user data storage.
+
+    Implement this interface to support different storage backends
+    (currently SQLite, via ``SQLiteUserRepository``).
+    """
+
+    @abstractmethod
+    async def create_user(self, user: User) -> User:
+        """Create a new user.
+
+        Args:
+            user: User object to create
+
+        Returns:
+            Created User with ID assigned
+
+        Raises:
+            ValueError: If email already exists
+        """
+        ...
+
+    @abstractmethod
+    async def get_user_by_id(self, user_id: str) -> User | None:
+        """Get user by ID.
+
+        Args:
+            user_id: User UUID as string
+
+        Returns:
+            User if found, None otherwise
+        """
+        ...
+
+    @abstractmethod
+    async def get_user_by_email(self, email: str) -> User | None:
+        """Get user by email.
+
+        Args:
+            email: User email address
+
+        Returns:
+            User if found, None otherwise
+        """
+        ...
+
+    @abstractmethod
+    async def update_user(self, user: User) -> User:
+        """Update an existing user.
+
+        Args:
+            user: User object with updated fields
+
+        Returns:
+            Updated User
+
+        Raises:
+            UserNotFoundError: If no row exists for ``user.id``. This is
+                a hard failure (not a no-op) so callers cannot mistake a
+                concurrent-delete race for a successful update.
+        """
+        ...
+
+    @abstractmethod
+    async def count_users(self) -> int:
+        """Return total number of registered users."""
+        ...
+
+    @abstractmethod
+    async def get_user_by_oauth(self, provider: str, oauth_id: str) -> User | None:
+        """Get user by OAuth provider and ID.
+
+        Args:
+            provider: OAuth provider name (e.g. 'github', 'google')
+            oauth_id: User ID from the OAuth provider
+
+        Returns:
+            User if found, None otherwise
+        """
+        ...
@@ -0,0 +1,122 @@
+"""SQLAlchemy-backed UserRepository implementation.
+
+Uses the shared async session factory from
+``deerflow.persistence.engine`` — the ``users`` table lives in the
+same database as ``threads_meta``, ``runs``, ``run_events``, and
+``feedback``.
+
+Constructor takes the session factory directly (same pattern as the
+other four repositories in ``deerflow.persistence.*``). Callers
+construct this after ``init_engine_from_config()`` has run.
+"""
+
+from __future__ import annotations
+
+from datetime import UTC
+from uuid import UUID
+
+from sqlalchemy import func, select
+from sqlalchemy.exc import IntegrityError
+from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
+
+from app.gateway.auth.models import User
+from app.gateway.auth.repositories.base import UserNotFoundError, UserRepository
+from deerflow.persistence.user.model import UserRow
+
+
+class SQLiteUserRepository(UserRepository):
+    """Async user repository backed by the shared SQLAlchemy engine."""
+
+    def __init__(self, session_factory: async_sessionmaker[AsyncSession]) -> None:
+        self._sf = session_factory
+
+    # ── Converters ────────────────────────────────────────────────────
+
+    @staticmethod
+    def _row_to_user(row: UserRow) -> User:
+        return User(
+            id=UUID(row.id),
+            email=row.email,
+            password_hash=row.password_hash,
+            system_role=row.system_role,  # type: ignore[arg-type]
+            # SQLite loses tzinfo on read; reattach UTC so downstream
+            # code can compare timestamps reliably.
+            created_at=row.created_at if row.created_at.tzinfo else row.created_at.replace(tzinfo=UTC),
+            oauth_provider=row.oauth_provider,
+            oauth_id=row.oauth_id,
+            needs_setup=row.needs_setup,
+            token_version=row.token_version,
+        )
+
+    @staticmethod
+    def _user_to_row(user: User) -> UserRow:
+        return UserRow(
+            id=str(user.id),
+            email=user.email,
+            password_hash=user.password_hash,
+            system_role=user.system_role,
+            created_at=user.created_at,
+            oauth_provider=user.oauth_provider,
+            oauth_id=user.oauth_id,
+            needs_setup=user.needs_setup,
+            token_version=user.token_version,
+        )
+
+    # ── CRUD ──────────────────────────────────────────────────────────
+
+    async def create_user(self, user: User) -> User:
+        """Insert a new user. Raises ``ValueError`` on duplicate email."""
+        row = self._user_to_row(user)
+        async with self._sf() as session:
+            session.add(row)
+            try:
+                await session.commit()
+            except IntegrityError as exc:
+                await session.rollback()
+                raise ValueError(f"Email already registered: {user.email}") from exc
+        return user
+
+    async def get_user_by_id(self, user_id: str) -> User | None:
+        async with self._sf() as session:
+            row = await session.get(UserRow, user_id)
+            return self._row_to_user(row) if row is not None else None
+
+    async def get_user_by_email(self, email: str) -> User | None:
+        stmt = select(UserRow).where(UserRow.email == email)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            row = result.scalar_one_or_none()
+            return self._row_to_user(row) if row is not None else None
+
+    async def update_user(self, user: User) -> User:
+        async with self._sf() as session:
+            row = await session.get(UserRow, str(user.id))
+            if row is None:
+                # Hard fail on concurrent delete: callers (reset_admin,
+                # password change handlers, _ensure_admin_user) all
+                # fetched the user just before this call, so a missing
+                # row here means the row vanished underneath us. Silent
+                # success would let the caller log "password reset" for
+                # a row that no longer exists.
+                raise UserNotFoundError(f"User {user.id} no longer exists")
+            row.email = user.email
+            row.password_hash = user.password_hash
+            row.system_role = user.system_role
+            row.oauth_provider = user.oauth_provider
+            row.oauth_id = user.oauth_id
+            row.needs_setup = user.needs_setup
+            row.token_version = user.token_version
+            await session.commit()
+        return user
+
+    async def count_users(self) -> int:
+        stmt = select(func.count()).select_from(UserRow)
+        async with self._sf() as session:
+            return await session.scalar(stmt) or 0
+
+    async def get_user_by_oauth(self, provider: str, oauth_id: str) -> User | None:
+        stmt = select(UserRow).where(UserRow.oauth_provider == provider, UserRow.oauth_id == oauth_id)
+        async with self._sf() as session:
+            result = await session.execute(stmt)
+            row = result.scalar_one_or_none()
+            return self._row_to_user(row) if row is not None else None
@@ -0,0 +1,91 @@
+"""CLI tool to reset an admin password.
+
+Usage:
+    python -m app.gateway.auth.reset_admin
+    python -m app.gateway.auth.reset_admin --email admin@example.com
+
+Writes the new password to ``.deer-flow/admin_initial_credentials.txt``
+(mode 0600) instead of printing it, so CI / log aggregators never see
+the cleartext secret.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import secrets
+import sys
+
+from sqlalchemy import select
+
+from app.gateway.auth.credential_file import write_initial_credentials
+from app.gateway.auth.password import hash_password
+from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
+from deerflow.persistence.user.model import UserRow
+
+
+async def _run(email: str | None) -> int:
+    from deerflow.config import get_app_config
+    from deerflow.persistence.engine import (
+        close_engine,
+        get_session_factory,
+        init_engine_from_config,
+    )
+
+    config = get_app_config()
+    await init_engine_from_config(config.database)
+    try:
+        sf = get_session_factory()
+        if sf is None:
+            print("Error: persistence engine not available (check config.database).", file=sys.stderr)
+            return 1
+
+        repo = SQLiteUserRepository(sf)
+
+        if email:
+            user = await repo.get_user_by_email(email)
+        else:
+            # Find first admin via direct SELECT — repository does not
+            # expose a "first admin" helper and we do not want to add
+            # one just for this CLI.
+            async with sf() as session:
+                stmt = select(UserRow).where(UserRow.system_role == "admin").limit(1)
+                row = (await session.execute(stmt)).scalar_one_or_none()
+            if row is None:
+                user = None
+            else:
+                user = await repo.get_user_by_id(row.id)
+
+        if user is None:
+            if email:
+                print(f"Error: user '{email}' not found.", file=sys.stderr)
+            else:
+                print("Error: no admin user found.", file=sys.stderr)
+            return 1
+
+        new_password = secrets.token_urlsafe(16)
+        user.password_hash = hash_password(new_password)
+        user.token_version += 1
+        user.needs_setup = True
+        await repo.update_user(user)
+
+        cred_path = write_initial_credentials(user.email, new_password, label="reset")
+        print(f"Password reset for: {user.email}")
+        print(f"Credentials written to: {cred_path} (mode 0600)")
+        print("Next login will require setup (new email + password).")
+        return 0
+    finally:
+        await close_engine()
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Reset admin password")
+    parser.add_argument("--email", help="Admin email (default: first admin found)")
+    args = parser.parse_args()
+
+    exit_code = asyncio.run(_run(args.email))
+    sys.exit(exit_code)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,117 @@
+"""Global authentication middleware — fail-closed safety net.
+
+Rejects unauthenticated requests to non-public paths with 401. When a
+request passes the cookie check, resolves the JWT payload to a real
+``User`` object and stamps it into both ``request.state.user`` and the
+``deerflow.runtime.user_context`` contextvar so that repository-layer
+owner filtering works automatically via the sentinel pattern.
+
+Fine-grained permission checks remain in authz.py decorators.
+"""
+
+from collections.abc import Callable
+
+from fastapi import HTTPException, Request, Response
+from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.responses import JSONResponse
+from starlette.types import ASGIApp
+
+from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse
+from app.gateway.authz import _ALL_PERMISSIONS, AuthContext
+from deerflow.runtime.user_context import reset_current_user, set_current_user
+
+# Paths that never require authentication.
+_PUBLIC_PATH_PREFIXES: tuple[str, ...] = (
+    "/health",
+    "/docs",
+    "/redoc",
+    "/openapi.json",
+)
+
+# Exact auth paths that are public (login, register, logout, setup-status).
+# /api/v1/auth/me, /api/v1/auth/change-password etc. are NOT public.
+_PUBLIC_EXACT_PATHS: frozenset[str] = frozenset(
+    {
+        "/api/v1/auth/login/local",
+        "/api/v1/auth/register",
+        "/api/v1/auth/logout",
+        "/api/v1/auth/setup-status",
+    }
+)
+
+
+def _is_public(path: str) -> bool:
+    stripped = path.rstrip("/")
+    if stripped in _PUBLIC_EXACT_PATHS:
+        return True
+    return any(path.startswith(prefix) for prefix in _PUBLIC_PATH_PREFIXES)
+
+
+class AuthMiddleware(BaseHTTPMiddleware):
+    """Strict auth gate: reject requests without a valid session.
+
+    Two-stage check for non-public paths:
+
+    1. Cookie presence: return 401 NOT_AUTHENTICATED if missing
+    2. Strict JWT validation via ``get_current_user_from_request``: if it
+       raises ``HTTPException`` (token malformed, expired, or stale; user
+       missing), render that error as JSON with its own status and code
+
+    On success, stamps ``request.state.user`` and the
+    ``deerflow.runtime.user_context`` contextvar so that repository-layer
+    owner filters work downstream without every route needing a
+    ``@require_auth`` decorator. Routes that need per-resource
+    authorization (e.g. "user A cannot read user B's thread by guessing
+    the URL") should additionally use ``@require_permission(...,
+    owner_check=True)`` for explicit enforcement — but authentication
+    itself is fully handled here.
+    """
+
+    def __init__(self, app: ASGIApp) -> None:
+        super().__init__(app)
+
+    async def dispatch(self, request: Request, call_next: Callable) -> Response:
+        if _is_public(request.url.path):
+            return await call_next(request)
+
+        # Non-public path: require session cookie
+        if not request.cookies.get("access_token"):
+            return JSONResponse(
+                status_code=401,
+                content={
+                    "detail": AuthErrorResponse(
+                        code=AuthErrorCode.NOT_AUTHENTICATED,
+                        message="Authentication required",
+                    ).model_dump()
+                },
+            )
+
+        # Strict JWT validation: reject junk/expired tokens with 401
+        # right here instead of silently passing through. This closes
+        # the "junk cookie bypass" gap (AUTH_TEST_PLAN test 7.5.8):
+        # without this, non-isolation routes like /api/models would
+        # accept any cookie-shaped string as authentication.
+        #
+        # We call the *strict* resolver so that fine-grained error
+        # codes (token_expired, token_invalid, user_not_found, …)
+        # propagate from AuthErrorCode, not get flattened into one
+        # generic code. BaseHTTPMiddleware doesn't let HTTPException
+        # bubble up, so we catch and render it as JSONResponse here.
+        from app.gateway.deps import get_current_user_from_request
+
+        try:
+            user = await get_current_user_from_request(request)
+        except HTTPException as exc:
+            return JSONResponse(status_code=exc.status_code, content={"detail": exc.detail})
+
+        # Stamp both request.state.user (for the contextvar pattern)
+        # and request.state.auth (so @require_permission finds an
+        # AuthContext and skips its "auth is None" fallback, which would
+        # run the JWT-decode + DB-lookup pipeline a second time per request).
+        request.state.user = user
+        request.state.auth = AuthContext(user=user, permissions=_ALL_PERMISSIONS)
+        token = set_current_user(user)
+        try:
+            return await call_next(request)
+        finally:
+            reset_current_user(token)
@@ -0,0 +1,262 @@
+"""Authorization decorators and context for DeerFlow.
+
+Inspired by LangGraph Auth system: https://github.com/langchain-ai/langgraph/blob/main/libs/sdk-py/langgraph_sdk/auth/__init__.py
+
+**Usage:**
+
+1. Use ``@require_auth`` on routes that need authentication
+2. Use ``@require_permission("resource", "action", owner_check=...)`` for permission checks
+3. Decorators are applied bottom to top, so the topmost one runs first on each request
+
+**Example:**
+
+    @router.get("/{thread_id}")
+    @require_auth
+    @require_permission("threads", "read", owner_check=True)
+    async def get_thread(thread_id: str, request: Request):
+        # User is authenticated and has threads:read permission
+        ...
+
+**Permission Model:**
+
+- threads:read   - View thread
+- threads:write  - Create/update thread
+- threads:delete - Delete thread
+- runs:create   - Run agent
+- runs:read     - View run
+- runs:cancel   - Cancel run
+"""
+
+from __future__ import annotations
+
+import functools
+from collections.abc import Callable
+from typing import TYPE_CHECKING, Any, ParamSpec, TypeVar
+
+from fastapi import HTTPException, Request
+
+if TYPE_CHECKING:
+    from app.gateway.auth.models import User
+
+P = ParamSpec("P")
+T = TypeVar("T")
+
+
+# Permission constants
+class Permissions:
+    """Permission constants for resource:action format."""
+
+    # Threads
+    THREADS_READ = "threads:read"
+    THREADS_WRITE = "threads:write"
+    THREADS_DELETE = "threads:delete"
+
+    # Runs
+    RUNS_CREATE = "runs:create"
+    RUNS_READ = "runs:read"
+    RUNS_CANCEL = "runs:cancel"
+
+
+class AuthContext:
+    """Authentication context for the current request.
+
+    Stored in ``request.state.auth`` by ``AuthMiddleware``, ``@require_auth``, or ``@require_permission``.
+
+    Attributes:
+        user: The authenticated user, or None if anonymous
+        permissions: List of permission strings (e.g., "threads:read")
+    """
+
+    __slots__ = ("user", "permissions")
+
+    def __init__(self, user: User | None = None, permissions: list[str] | None = None):
+        self.user = user
+        self.permissions = permissions or []
+
+    @property
+    def is_authenticated(self) -> bool:
+        """Check if user is authenticated."""
+        return self.user is not None
+
+    def has_permission(self, resource: str, action: str) -> bool:
+        """Check if context has permission for resource:action.
+
+        Args:
+            resource: Resource name (e.g., "threads")
+            action: Action name (e.g., "read")
+
+        Returns:
+            True if user has permission
+        """
+        permission = f"{resource}:{action}"
+        return permission in self.permissions
+
+    def require_user(self) -> User:
+        """Get user or raise 401.
+
+        Raises:
+            HTTPException 401 if not authenticated
+        """
+        if not self.user:
+            raise HTTPException(status_code=401, detail="Authentication required")
+        return self.user
+
+
+def get_auth_context(request: Request) -> AuthContext | None:
+    """Get AuthContext from request state."""
+    return getattr(request.state, "auth", None)
+
+
+_ALL_PERMISSIONS: list[str] = [
+    Permissions.THREADS_READ,
+    Permissions.THREADS_WRITE,
+    Permissions.THREADS_DELETE,
+    Permissions.RUNS_CREATE,
+    Permissions.RUNS_READ,
+    Permissions.RUNS_CANCEL,
+]
+
+
+async def _authenticate(request: Request) -> AuthContext:
+    """Authenticate request and return AuthContext.
+
+    Delegates to deps.get_optional_user_from_request() for the JWT→User pipeline.
+    Returns AuthContext with user=None for anonymous requests.
+    """
+    from app.gateway.deps import get_optional_user_from_request
+
+    user = await get_optional_user_from_request(request)
+    if user is None:
+        return AuthContext(user=None, permissions=[])
+
+    # In future, permissions could be stored in user record
+    return AuthContext(user=user, permissions=_ALL_PERMISSIONS)
+
+
+def require_auth[**P, T](func: Callable[P, T]) -> Callable[P, T]:
+    """Decorator that authenticates the request and sets AuthContext.
+
+    Must be placed ABOVE other auth decorators so that it wraps them and runs first.
+
+    Usage:
+        @router.get("/{thread_id}")
+        @require_auth  # Outermost auth decorator: runs before the permission check
+        @require_permission("threads", "read")
+        async def get_thread(thread_id: str, request: Request):
+            auth: AuthContext = request.state.auth
+            ...
+
+    Raises:
+        ValueError: If 'request' is not passed as a keyword argument
+    """
+
+    @functools.wraps(func)
+    async def wrapper(*args: Any, **kwargs: Any) -> Any:
+        request = kwargs.get("request")
+        if request is None:
+            raise ValueError("require_auth decorator requires 'request' parameter")
+
+        # Authenticate and set context
+        auth_context = await _authenticate(request)
+        request.state.auth = auth_context
+
+        return await func(*args, **kwargs)
+
+    return wrapper
+
+
+def require_permission(
+    resource: str,
+    action: str,
+    owner_check: bool = False,
+    require_existing: bool = False,
+) -> Callable[[Callable[P, T]], Callable[P, T]]:
+    """Decorator that checks permission for resource:action.
+
+    Place BELOW ``@require_auth``. If no AuthContext is set on the request, this decorator authenticates it itself.
+
+    Args:
+        resource: Resource name (e.g., "threads", "runs")
+        action: Action name (e.g., "read", "write", "delete")
+        owner_check: If True, validates that the current user owns the resource.
+                     Requires 'thread_id' path parameter and performs ownership check.
+        require_existing: Only meaningful with ``owner_check=True``. If True, a
+                          missing ``threads_meta`` row counts as a denial (404)
+                          instead of "untracked legacy thread, allow". Use on
+                          **destructive / mutating** routes (DELETE, PATCH,
+                          state-update) so a deleted thread can't be re-targeted
+                          by another user via the missing-row code path.
+
+    Usage:
+        # Read-style: legacy untracked threads are allowed
+        @require_permission("threads", "read", owner_check=True)
+        async def get_thread(thread_id: str, request: Request):
+            ...
+
+        # Destructive: thread row MUST exist and be owned by caller
+        @require_permission("threads", "delete", owner_check=True, require_existing=True)
+        async def delete_thread(thread_id: str, request: Request):
+            ...
+
+    Raises:
+        HTTPException 401: If authentication required but user is anonymous
+        HTTPException 403: If user lacks permission
+        HTTPException 404: If owner_check=True but user doesn't own the thread
+        ValueError: If owner_check=True but 'thread_id' parameter is missing
+    """
+
+    def decorator(func: Callable[P, T]) -> Callable[P, T]:
+        @functools.wraps(func)
+        async def wrapper(*args: Any, **kwargs: Any) -> Any:
+            request = kwargs.get("request")
+            if request is None:
+                raise ValueError("require_permission decorator requires 'request' parameter")
+
+            auth: AuthContext | None = getattr(request.state, "auth", None)
+            if auth is None:
+                auth = await _authenticate(request)
+                request.state.auth = auth
+
+            if not auth.is_authenticated:
+                raise HTTPException(status_code=401, detail="Authentication required")
+
+            # Check permission
+            if not auth.has_permission(resource, action):
+                raise HTTPException(
+                    status_code=403,
+                    detail=f"Permission denied: {resource}:{action}",
+                )
+
+            # Owner check for thread-specific resources.
+            #
+            # 2.0-rc moved thread metadata into the SQL persistence layer
+            # (``threads_meta`` table). We verify ownership via
+            # ``ThreadMetaStore.check_access``: by default it returns True
+            # for missing rows (untracked legacy thread) and for rows whose
+            # ``owner_id`` is NULL (shared / pre-auth data), so the check is
+            # allow-by-default: only an *existing* row with a *different*
+            # owner_id triggers 404. ``require_existing=True`` also denies missing rows.
+            if owner_check:
+                thread_id = kwargs.get("thread_id")
+                if thread_id is None:
+                    raise ValueError("require_permission with owner_check=True requires 'thread_id' parameter")
+
+                from app.gateway.deps import get_thread_meta_repo
+
+                thread_meta_repo = get_thread_meta_repo(request)
+                allowed = await thread_meta_repo.check_access(
+                    thread_id,
+                    str(auth.user.id),
+                    require_existing=require_existing,
+                )
+                if not allowed:
+                    raise HTTPException(
+                        status_code=404,
+                        detail=f"Thread {thread_id} not found",
+                    )
+
+            return await func(*args, **kwargs)
+
+        return wrapper
+
+    return decorator
@@ -0,0 +1,112 @@
+"""CSRF protection middleware for FastAPI.
+
+Per RFC-001:
+State-changing operations require CSRF protection.
+"""
+
+import secrets
+from collections.abc import Callable
+
+from fastapi import Request, Response
+from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.responses import JSONResponse
+from starlette.types import ASGIApp
+
+CSRF_COOKIE_NAME = "csrf_token"
+CSRF_HEADER_NAME = "X-CSRF-Token"
+CSRF_TOKEN_LENGTH = 64  # bytes
+
+
+def is_secure_request(request: Request) -> bool:
+    """Detect whether the original client request was made over HTTPS."""
+    return request.headers.get("x-forwarded-proto", request.url.scheme) == "https"
+
+
+def generate_csrf_token() -> str:
+    """Generate a secure random CSRF token."""
+    return secrets.token_urlsafe(CSRF_TOKEN_LENGTH)
+
+
+def should_check_csrf(request: Request) -> bool:
+    """Determine if a request needs CSRF validation.
+
+    CSRF is checked for state-changing methods (POST, PUT, DELETE, PATCH).
+    GET, HEAD, OPTIONS, and TRACE are exempt per RFC 7231.
+    """
+    if request.method not in ("POST", "PUT", "DELETE", "PATCH"):
+        return False
+
+    path = request.url.path.rstrip("/")
+    # Exempt /api/v1/auth/me endpoint
+    if path == "/api/v1/auth/me":
+        return False
+    return True
+
+
+_AUTH_EXEMPT_PATHS: frozenset[str] = frozenset(
+    {
+        "/api/v1/auth/login/local",
+        "/api/v1/auth/logout",
+        "/api/v1/auth/register",
+    }
+)
+
+
+def is_auth_endpoint(request: Request) -> bool:
+    """Check if the request is to an auth endpoint.
+
+    These endpoints are exempt from CSRF validation: on the first call the client has no token yet.
+    """
+    return request.url.path.rstrip("/") in _AUTH_EXEMPT_PATHS
+
+
+class CSRFMiddleware(BaseHTTPMiddleware):
+    """Middleware that implements CSRF protection using Double Submit Cookie pattern."""
+
+    def __init__(self, app: ASGIApp) -> None:
+        super().__init__(app)
+
+    async def dispatch(self, request: Request, call_next: Callable) -> Response:
+        _is_auth = is_auth_endpoint(request)
+
+        if should_check_csrf(request) and not _is_auth:
+            cookie_token = request.cookies.get(CSRF_COOKIE_NAME)
+            header_token = request.headers.get(CSRF_HEADER_NAME)
+
+            if not cookie_token or not header_token:
+                return JSONResponse(
+                    status_code=403,
+                    content={"detail": "CSRF token missing. Include X-CSRF-Token header."},
+                )
+
+            if not secrets.compare_digest(cookie_token, header_token):
+                return JSONResponse(
+                    status_code=403,
+                    content={"detail": "CSRF token mismatch."},
+                )
+
+        response = await call_next(request)
+
+        # For auth endpoints that set up session, also set CSRF cookie
+        if _is_auth and request.method == "POST":
+            # Generate a new CSRF token for the session
+            csrf_token = generate_csrf_token()
+            is_https = is_secure_request(request)
+            response.set_cookie(
+                key=CSRF_COOKIE_NAME,
+                value=csrf_token,
+                httponly=False,  # Must be JS-readable for Double Submit Cookie pattern
+                secure=is_https,
+                samesite="strict",
+            )
+
+        return response
+
+
+def get_csrf_token(request: Request) -> str | None:
+    """Get the CSRF token from the current request's cookies.
+
+    This is useful for server-side rendering where you need to embed
+    token in forms or headers.
+    """
+    return request.cookies.get(CSRF_COOKIE_NAME)
@@ -11,11 +11,16 @@ from __future__ import annotations

 from collections.abc import AsyncGenerator
 from contextlib import AsyncExitStack, asynccontextmanager
+from typing import TYPE_CHECKING

 from fastapi import FastAPI, HTTPException, Request

 from deerflow.runtime import RunContext, RunManager

+if TYPE_CHECKING:
+    from app.gateway.auth.local_provider import LocalAuthProvider
+    from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
+

@asynccontextmanager
 async def langgraph_runtime(app: FastAPI) -> AsyncGenerator[None, None]:
@@ -127,10 +132,94 @@ def get_run_context(request: Request) -> RunContext:
    )


-async def get_current_user(request: Request) -> str | None:
-    """Extract user identity from request.
+# ---------------------------------------------------------------------------
+# Auth helpers (used by authz.py and auth middleware)
+# ---------------------------------------------------------------------------

-    Phase 2: always returns None (no authentication).
-    Phase 3: extract user_id from JWT / session / API key header.
+# Cached singletons to avoid repeated instantiation per request
+_cached_local_provider: LocalAuthProvider | None = None
+_cached_repo: SQLiteUserRepository | None = None
+
+
+def get_local_provider() -> LocalAuthProvider:
+    """Get or create the cached LocalAuthProvider singleton.
+
+    Must be called after ``init_engine_from_config()`` — the shared
+    session factory is required to construct the user repository.
    """
-    return None
+    global _cached_local_provider, _cached_repo
+    if _cached_repo is None:
+        from app.gateway.auth.repositories.sqlite import SQLiteUserRepository
+        from deerflow.persistence.engine import get_session_factory
+
+        sf = get_session_factory()
+        if sf is None:
+            raise RuntimeError("get_local_provider() called before init_engine_from_config(); cannot access users table")
+        _cached_repo = SQLiteUserRepository(sf)
+    if _cached_local_provider is None:
+        from app.gateway.auth.local_provider import LocalAuthProvider
+
+        _cached_local_provider = LocalAuthProvider(repository=_cached_repo)
+    return _cached_local_provider
+
+
+async def get_current_user_from_request(request: Request):
+    """Get the current authenticated user from the request cookie.
+
+    Raises HTTPException 401 if not authenticated.
+    """
+    from app.gateway.auth import decode_token
+    from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse, TokenError, token_error_to_code
+
+    access_token = request.cookies.get("access_token")
+    if not access_token:
+        raise HTTPException(
+            status_code=401,
+            detail=AuthErrorResponse(code=AuthErrorCode.NOT_AUTHENTICATED, message="Not authenticated").model_dump(),
+        )
+
+    payload = decode_token(access_token)
+    if isinstance(payload, TokenError):
+        raise HTTPException(
+            status_code=401,
+            detail=AuthErrorResponse(code=token_error_to_code(payload), message=f"Token error: {payload.value}").model_dump(),
+        )
+
+    provider = get_local_provider()
+    user = await provider.get_user(payload.sub)
+    if user is None:
+        raise HTTPException(
+            status_code=401,
+            detail=AuthErrorResponse(code=AuthErrorCode.USER_NOT_FOUND, message="User not found").model_dump(),
+        )
+
+    # Token version mismatch → password was changed, token is stale
+    if user.token_version != payload.ver:
+        raise HTTPException(
+            status_code=401,
+            detail=AuthErrorResponse(code=AuthErrorCode.TOKEN_INVALID, message="Token revoked (password changed)").model_dump(),
+        )
+
+    return user
+
+
+async def get_optional_user_from_request(request: Request):
+    """Get optional authenticated user from request.
+
+    Returns None if not authenticated.
+    """
+    try:
+        return await get_current_user_from_request(request)
+    except HTTPException:
+        return None
+
+
+async def get_current_user(request: Request) -> str | None:
+    """Extract user_id from request cookie, or None if not authenticated.
+
+    Thin adapter that returns the string id for callers that only need
+    identification (e.g., ``feedback.py``). Full-user callers should use
+    ``get_current_user_from_request`` or ``get_optional_user_from_request``.
+    """
+    user = await get_optional_user_from_request(request)
+    return str(user.id) if user else None
@@ -0,0 +1,106 @@
+"""LangGraph Server auth handler — shares JWT logic with Gateway.
+
+Loaded by LangGraph Server via langgraph.json ``auth.path``.
+Reuses the same ``decode_token`` / ``get_auth_config`` as Gateway,
+so both modes validate tokens with the same secret and rules.
+
+Two layers:
+  1. @auth.authenticate — validates JWT cookie, extracts user_id,
+     and enforces CSRF on state-changing methods (POST/PUT/DELETE/PATCH)
+  2. @auth.on — returns metadata filter so each user only sees own threads
+"""
+
+import secrets
+
+from langgraph_sdk import Auth
+
+from app.gateway.auth.errors import TokenError
+from app.gateway.auth.jwt import decode_token
+from app.gateway.deps import get_local_provider
+
+auth = Auth()
+
+# Methods that require CSRF validation (state-changing; PATCH is defined in RFC 5789, the others in RFC 7231).
+_CSRF_METHODS = frozenset({"POST", "PUT", "DELETE", "PATCH"})
+
+
+def _check_csrf(request) -> None:
+    """Enforce Double Submit Cookie CSRF check for state-changing requests.
+
+    Mirrors Gateway's CSRFMiddleware logic so that LangGraph routes
+    proxied directly by nginx have the same CSRF protection.
+    """
+    method = getattr(request, "method", "") or ""
+    if method.upper() not in _CSRF_METHODS:
+        return
+
+    cookie_token = request.cookies.get("csrf_token")
+    header_token = request.headers.get("x-csrf-token")
+
+    if not cookie_token or not header_token:
+        raise Auth.exceptions.HTTPException(
+            status_code=403,
+            detail="CSRF token missing. Include X-CSRF-Token header.",
+        )
+
+    if not secrets.compare_digest(cookie_token, header_token):
+        raise Auth.exceptions.HTTPException(
+            status_code=403,
+            detail="CSRF token mismatch.",
+        )
+
+
+@auth.authenticate
+async def authenticate(request):
+    """Validate the session cookie, decode JWT, and check token_version.
+
+    Same validation chain as Gateway's get_current_user_from_request:
+      cookie → decode JWT → DB lookup → token_version match
+    Also enforces CSRF on state-changing methods.
+    """
+    # CSRF check before authentication so forged cross-site requests
+    # are rejected early, even if the cookie carries a valid JWT.
+    _check_csrf(request)
+
+    token = request.cookies.get("access_token")
+    if not token:
+        raise Auth.exceptions.HTTPException(
+            status_code=401,
+            detail="Not authenticated",
+        )
+
+    payload = decode_token(token)
+    if isinstance(payload, TokenError):
+        raise Auth.exceptions.HTTPException(
+            status_code=401,
+            detail=f"Token error: {payload.value}",
+        )
+
+    user = await get_local_provider().get_user(payload.sub)
+    if user is None:
+        raise Auth.exceptions.HTTPException(
+            status_code=401,
+            detail="User not found",
+        )
+    if user.token_version != payload.ver:
+        raise Auth.exceptions.HTTPException(
+            status_code=401,
+            detail="Token revoked (password changed)",
+        )
+
+    return payload.sub
+
+
+@auth.on
+async def add_owner_filter(ctx: Auth.types.AuthContext, value: dict):
+    """Inject owner_id metadata on writes; filter by owner_id on reads.
+
+    Gateway stores thread ownership as ``metadata.owner_id``.
+    This handler ensures LangGraph Server enforces the same isolation.
+    """
+    # On create/update: stamp owner_id into metadata
+    metadata = value.setdefault("metadata", {})
+    metadata["owner_id"] = ctx.user.identity
+
+    # Return filter dict — LangGraph applies it to search/read/delete
+    return {"owner_id": ctx.user.identity}
@@ -7,6 +7,7 @@ from urllib.parse import quote
 from fastapi import APIRouter, HTTPException, Request
 from fastapi.responses import FileResponse, PlainTextResponse, Response

+from app.gateway.authz import require_permission
 from app.gateway.path_utils import resolve_thread_virtual_path

 logger = logging.getLogger(__name__)
@@ -81,6 +82,7 @@ def _extract_file_from_skill_archive(zip_path: Path, internal_path: str) -> byte
    summary="Get Artifact File",
    description="Retrieve an artifact file generated by the AI agent. Text and binary files can be viewed inline, while active web content is always downloaded.",
 )
+@require_permission("threads", "read", owner_check=True)
 async def get_artifact(thread_id: str, path: str, request: Request, download: bool = False) -> Response:
    """Get an artifact file by its path.

@@ -0,0 +1,418 @@
+"""Authentication endpoints."""
+
+import logging
+import os
+import time
+from ipaddress import ip_address, ip_network
+
+from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
+from fastapi.security import OAuth2PasswordRequestForm
+from pydantic import BaseModel, EmailStr, Field, field_validator
+
+from app.gateway.auth import (
+    UserResponse,
+    create_access_token,
+)
+from app.gateway.auth.config import get_auth_config
+from app.gateway.auth.errors import AuthErrorCode, AuthErrorResponse
+from app.gateway.csrf_middleware import is_secure_request
+from app.gateway.deps import get_current_user_from_request, get_local_provider
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter(prefix="/api/v1/auth", tags=["auth"])
+
+
+# ── Request/Response Models ──────────────────────────────────────────────
+
+
+class LoginResponse(BaseModel):
+    """Response model for login — token only lives in HttpOnly cookie."""
+
+    expires_in: int  # seconds
+    needs_setup: bool = False
+
+
+# Top common-password blocklist. Drawn from the public SecLists "10k worst
+# passwords" set, lowercased + length>=8 only (shorter ones already fail
+# the min_length check). Kept tight on purpose: this is the **lower bound**
+# defense, not a full HIBP / passlib check, and runs in-process per request.
+_COMMON_PASSWORDS: frozenset[str] = frozenset(
+    {
+        "password",
+        "password1",
+        "password12",
+        "password123",
+        "password1234",
+        "12345678",
+        "123456789",
+        "1234567890",
+        "qwerty12",
+        "qwertyui",
+        "qwerty123",
+        "abc12345",
+        "abcd1234",
+        "iloveyou",
+        "letmein1",
+        "welcome1",
+        "welcome123",
+        "admin123",
+        "administrator",
+        "passw0rd",
+        "p@ssw0rd",
+        "monkey12",
+        "trustno1",
+        "sunshine",
+        "princess",
+        "football",
+        "baseball",
+        "superman",
+        "batman123",
+        "starwars",
+        "dragon123",
+        "master123",
+        "shadow12",
+        "michael1",
+        "jennifer",
+        "computer",
+    }
+)
+
+
+def _password_is_common(password: str) -> bool:
+    """Case-insensitive blocklist check.
+
+    Lowercases the input so trivial mutations like ``Password`` /
+    ``PASSWORD`` are also rejected. Does not normalize digit substitutions
+    (``p@ssw0rd`` is included as a literal entry instead) — keeping the
+    rule cheap and predictable.
+    """
+    return password.lower() in _COMMON_PASSWORDS
+
+
+def _validate_strong_password(value: str) -> str:
+    """Pydantic field-validator body shared by Register + ChangePassword.
+
+    The constraint is a plain function, not a type-level mixin. The two request models
+    have no "is-a" relationship; they only share the password-strength
+    rule. Lifting it into a free function lets each model bind it via
+    ``@field_validator(field_name)`` without inheritance gymnastics.
+    """
+    if _password_is_common(value):
+        raise ValueError("Password is too common; choose a stronger password.")
+    return value
+
+
+class RegisterRequest(BaseModel):
+    """Request model for user registration."""
+
+    email: EmailStr
+    password: str = Field(..., min_length=8)
+
+    _strong_password = field_validator("password")(classmethod(lambda cls, v: _validate_strong_password(v)))
+
+
+class ChangePasswordRequest(BaseModel):
+    """Request model for password change (also handles setup flow)."""
+
+    current_password: str
+    new_password: str = Field(..., min_length=8)
+    new_email: EmailStr | None = None
+
+    _strong_password = field_validator("new_password")(classmethod(lambda cls, v: _validate_strong_password(v)))
+
+
+class MessageResponse(BaseModel):
+    """Generic message response."""
+
+    message: str
+
+
+# ── Helpers ───────────────────────────────────────────────────────────────
+
+
+def _set_session_cookie(response: Response, token: str, request: Request) -> None:
+    """Set the access_token HttpOnly cookie on the response."""
+    config = get_auth_config()
+    is_https = is_secure_request(request)
+    response.set_cookie(
+        key="access_token",
+        value=token,
+        httponly=True,
+        secure=is_https,
+        samesite="lax",
+        max_age=config.token_expiry_days * 24 * 3600 if is_https else None,
+    )
+
+
+# ── Rate Limiting ────────────────────────────────────────────────────────
+# In-process dict — not shared across workers. Sufficient for single-worker deployments.
+
+_MAX_LOGIN_ATTEMPTS = 5
+_LOCKOUT_SECONDS = 300  # 5 minutes
+
+# ip → (fail_count, lock_until_timestamp)
+_login_attempts: dict[str, tuple[int, float]] = {}
+
+
+def _trusted_proxies() -> list:
+    """Parse ``AUTH_TRUSTED_PROXIES`` env var into a list of ip_network objects.
+
+    Comma-separated CIDR or single-IP entries. Empty / unset = no proxy is
+    trusted (direct mode). Invalid entries are skipped with a logger warning.
+    Read live so env-var overrides take effect immediately and tests can
+    ``monkeypatch.setenv`` without poking a module-level cache.
+    """
+    raw = os.getenv("AUTH_TRUSTED_PROXIES", "").strip()
+    if not raw:
+        return []
+    nets = []
+    for entry in raw.split(","):
+        entry = entry.strip()
+        if not entry:
+            continue
+        try:
+            nets.append(ip_network(entry, strict=False))
+        except ValueError:
+            logger.warning("AUTH_TRUSTED_PROXIES: ignoring invalid entry %r", entry)
+    return nets
+
+
+def _get_client_ip(request: Request) -> str:
+    """Extract the real client IP for rate limiting.
+
+    Trust model:
+
+    - The TCP peer (``request.client.host``) is always the baseline. It is
+      whatever the kernel reports as the connecting socket — unforgeable
+      by the client itself.
+    - ``X-Real-IP`` is **only** honored if the TCP peer is in the
+      ``AUTH_TRUSTED_PROXIES`` allowlist (set via env var, comma-separated
+      CIDR or single IPs). When set, the gateway is assumed to be behind a
+      reverse proxy (nginx, Cloudflare, ALB, …) that overwrites
+      ``X-Real-IP`` with the original client address.
+    - With no ``AUTH_TRUSTED_PROXIES`` set, ``X-Real-IP`` is silently
+      ignored — closing the bypass where any client could rotate the
+      header to dodge per-IP rate limits in dev / direct-gateway mode.
+
+    ``X-Forwarded-For`` is intentionally NOT used because it is naturally
+    client-controlled at the *first* hop and the trust chain is harder to
+    audit per-request.
+    """
+    peer_host = request.client.host if request.client else None
+
+    trusted = _trusted_proxies()
+    if trusted and peer_host:
+        try:
+            peer_ip = ip_address(peer_host)
+            if any(peer_ip in net for net in trusted):
+                real_ip = request.headers.get("x-real-ip", "").strip()
+                if real_ip:
+                    return real_ip
+        except ValueError:
+            # peer_host wasn't a parseable IP (e.g. "unknown") — fall through
+            pass
+
+    return peer_host or "unknown"
+
+
+def _check_rate_limit(ip: str) -> None:
+    """Raise 429 if the IP is currently locked out."""
+    record = _login_attempts.get(ip)
+    if record is None:
+        return
+    fail_count, lock_until = record
+    if fail_count >= _MAX_LOGIN_ATTEMPTS:
+        if time.time() < lock_until:
+            raise HTTPException(
+                status_code=429,
+                detail="Too many login attempts. Try again later.",
+            )
+        del _login_attempts[ip]
+
+
+_MAX_TRACKED_IPS = 10000
+
+
+def _record_login_failure(ip: str) -> None:
+    """Record a failed login attempt for the given IP."""
+    # Evict expired lockouts when dict grows too large
+    if len(_login_attempts) >= _MAX_TRACKED_IPS:
+        now = time.time()
+        expired = [k for k, (c, t) in _login_attempts.items() if c >= _MAX_LOGIN_ATTEMPTS and now >= t]
+        for k in expired:
+            del _login_attempts[k]
+        # If still too large, evict cheapest-to-lose half: below-threshold
+        # IPs (lock_until=0.0) sort first, then earliest-expiring lockouts.
+        if len(_login_attempts) >= _MAX_TRACKED_IPS:
+            by_time = sorted(_login_attempts.items(), key=lambda kv: kv[1][1])
+            for k, _ in by_time[: len(by_time) // 2]:
+                del _login_attempts[k]
+
+    record = _login_attempts.get(ip)
+    if record is None:
+        _login_attempts[ip] = (1, 0.0)
+    else:
+        new_count = record[0] + 1
+        lock_until = time.time() + _LOCKOUT_SECONDS if new_count >= _MAX_LOGIN_ATTEMPTS else 0.0
+        _login_attempts[ip] = (new_count, lock_until)
+
+
+def _record_login_success(ip: str) -> None:
+    """Clear failure counter for the given IP on successful login."""
+    _login_attempts.pop(ip, None)
+
+
+# ── Endpoints ─────────────────────────────────────────────────────────────
+
+
+@router.post("/login/local", response_model=LoginResponse)
+async def login_local(
+    request: Request,
+    response: Response,
+    form_data: OAuth2PasswordRequestForm = Depends(),
+):
+    """Local email/password login."""
+    client_ip = _get_client_ip(request)
+    _check_rate_limit(client_ip)
+
+    user = await get_local_provider().authenticate({"email": form_data.username, "password": form_data.password})
+
+    if user is None:
+        _record_login_failure(client_ip)
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="Incorrect email or password").model_dump(),
+        )
+
+    _record_login_success(client_ip)
+    token = create_access_token(str(user.id), token_version=user.token_version)
+    _set_session_cookie(response, token, request)
+
+    return LoginResponse(
+        expires_in=get_auth_config().token_expiry_days * 24 * 3600,
+        needs_setup=user.needs_setup,
+    )
+
+
+@router.post("/register", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
+async def register(request: Request, response: Response, body: RegisterRequest):
+    """Register a new user account (always 'user' role).
+
+    The admin account is auto-created on first boot; this endpoint only creates regular users.
+    The new user is logged in automatically by setting the session cookie.
+    """
+    try:
+        user = await get_local_provider().create_user(email=body.email, password=body.password, system_role="user")
+    except ValueError as exc:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail=AuthErrorResponse(code=AuthErrorCode.EMAIL_ALREADY_EXISTS, message="Email already registered").model_dump(),
+        ) from exc
+
+    token = create_access_token(str(user.id), token_version=user.token_version)
+    _set_session_cookie(response, token, request)
+
+    return UserResponse(id=str(user.id), email=user.email, system_role=user.system_role)
+
+
+@router.post("/logout", response_model=MessageResponse)
+async def logout(request: Request, response: Response):
+    """Log out the current user by clearing the session cookie."""
+    response.delete_cookie(key="access_token", secure=is_secure_request(request), samesite="lax")
+    return MessageResponse(message="Successfully logged out")
+
+
+@router.post("/change-password", response_model=MessageResponse)
+async def change_password(request: Request, response: Response, body: ChangePasswordRequest):
+    """Change password for the currently authenticated user.
+
+    Also handles the first-boot setup flow:
+    - If new_email is provided, updates email (checks uniqueness)
+    - If user.needs_setup is True and new_email is given, clears needs_setup
+    - Always increments token_version to invalidate old sessions
+    - Re-issues session cookie with new token_version
+    """
+    from app.gateway.auth.password import hash_password_async, verify_password_async
+
+    user = await get_current_user_from_request(request)
+
+    if user.password_hash is None:
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="OAuth users cannot change password").model_dump())
+
+    if not await verify_password_async(body.current_password, user.password_hash):
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.INVALID_CREDENTIALS, message="Current password is incorrect").model_dump())
+
+    provider = get_local_provider()
+
+    # Update email if provided
+    if body.new_email is not None:
+        existing = await provider.get_user_by_email(body.new_email)
+        if existing and str(existing.id) != str(user.id):
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=AuthErrorResponse(code=AuthErrorCode.EMAIL_ALREADY_EXISTS, message="Email already in use").model_dump())
+        user.email = body.new_email
+
+    # Update password + bump version
+    user.password_hash = await hash_password_async(body.new_password)
+    user.token_version += 1
+
+    # Clear setup flag if this is the setup flow
+    if user.needs_setup and body.new_email is not None:
+        user.needs_setup = False
+
+    await provider.update_user(user)
+
+    # Re-issue cookie with new token_version
+    token = create_access_token(str(user.id), token_version=user.token_version)
+    _set_session_cookie(response, token, request)
+
+    return MessageResponse(message="Password changed successfully")
+
+
+@router.get("/me", response_model=UserResponse)
+async def get_me(request: Request):
+    """Get current authenticated user info."""
+    user = await get_current_user_from_request(request)
+    return UserResponse(id=str(user.id), email=user.email, system_role=user.system_role, needs_setup=user.needs_setup)
+
+
+@router.get("/setup-status")
+async def setup_status():
+    """Report whether first-boot setup is still needed (no users exist yet). ``needs_setup`` is always False after first boot."""
+    user_count = await get_local_provider().count_users()
+    return {"needs_setup": user_count == 0}
+
+
+# ── OAuth Endpoints (Future/Placeholder) ─────────────────────────────────
+
+
+@router.get("/oauth/{provider}")
+async def oauth_login(provider: str):
+    """Initiate OAuth login flow.
+
+    Redirects to the OAuth provider's authorization URL.
+    Currently a placeholder - requires OAuth provider implementation.
+    """
+    if provider not in ["github", "google"]:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail=f"Unsupported OAuth provider: {provider}",
+        )
+
+    raise HTTPException(
+        status_code=status.HTTP_501_NOT_IMPLEMENTED,
+        detail="OAuth login not yet implemented",
+    )
+
+
+@router.get("/callback/{provider}")
+async def oauth_callback(provider: str, code: str, state: str):
+    """OAuth callback endpoint.
+
+    Handles the OAuth provider's callback after user authorization.
+    Currently a placeholder.
+    """
+    raise HTTPException(
+        status_code=status.HTTP_501_NOT_IMPLEMENTED,
+        detail="OAuth callback not yet implemented",
+    )
@@ -12,6 +12,7 @@ from typing import Any
 from fastapi import APIRouter, HTTPException, Request
 from pydantic import BaseModel, Field

+from app.gateway.authz import require_permission
 from app.gateway.deps import get_current_user, get_feedback_repo, get_run_store

 logger = logging.getLogger(__name__)
@@ -53,6 +54,7 @@ class FeedbackStatsResponse(BaseModel):


@router.post("/{thread_id}/runs/{run_id}/feedback", response_model=FeedbackResponse)
+@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def create_feedback(
    thread_id: str,
    run_id: str,
@@ -85,6 +87,7 @@ async def create_feedback(


@router.get("/{thread_id}/runs/{run_id}/feedback", response_model=list[FeedbackResponse])
+@require_permission("threads", "read", owner_check=True)
 async def list_feedback(
    thread_id: str,
    run_id: str,
@@ -96,6 +99,7 @@ async def list_feedback(


@router.get("/{thread_id}/runs/{run_id}/feedback/stats", response_model=FeedbackStatsResponse)
+@require_permission("threads", "read", owner_check=True)
 async def feedback_stats(
    thread_id: str,
    run_id: str,
@@ -107,6 +111,7 @@ async def feedback_stats(


@router.delete("/{thread_id}/runs/{run_id}/feedback/{feedback_id}")
+@require_permission("threads", "delete", owner_check=True, require_existing=True)
 async def delete_feedback(
    thread_id: str,
    run_id: str,
@@ -1,10 +1,11 @@
 import json
 import logging

-from fastapi import APIRouter
+from fastapi import APIRouter, Request
 from langchain_core.messages import HumanMessage, SystemMessage
 from pydantic import BaseModel, Field

+from app.gateway.authz import require_permission
 from deerflow.models import create_chat_model

 logger = logging.getLogger(__name__)
@@ -98,12 +99,13 @@ def _format_conversation(messages: list[SuggestionMessage]) -> str:
    summary="Generate Follow-up Questions",
    description="Generate short follow-up questions a user might ask next, based on recent conversation context.",
 )
-async def generate_suggestions(thread_id: str, request: SuggestionsRequest) -> SuggestionsResponse:
-    if not request.messages:
+@require_permission("threads", "read", owner_check=True)
+async def generate_suggestions(thread_id: str, body: SuggestionsRequest, request: Request) -> SuggestionsResponse:
+    if not body.messages:
        return SuggestionsResponse(suggestions=[])

-    n = request.n
-    conversation = _format_conversation(request.messages)
+    n = body.n
+    conversation = _format_conversation(body.messages)
    if not conversation:
        return SuggestionsResponse(suggestions=[])

@@ -120,7 +122,7 @@ async def generate_suggestions(thread_id: str, request: SuggestionsRequest) -> S
    user_content = f"Conversation Context:\n{conversation}\n\nGenerate {n} follow-up questions"

    try:
-        model = create_chat_model(name=request.model_name, thinking_enabled=False)
+        model = create_chat_model(name=body.model_name, thinking_enabled=False)
        response = await model.ainvoke([SystemMessage(content=system_instruction), HumanMessage(content=user_content)])
        raw = _extract_response_text(response.content)
        suggestions = _parse_json_string_list(raw) or []
@@ -19,6 +19,7 @@ from fastapi import APIRouter, HTTPException, Query, Request
 from fastapi.responses import Response, StreamingResponse
 from pydantic import BaseModel, Field

+from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer, get_run_event_store, get_run_manager, get_run_store, get_stream_bridge
 from app.gateway.services import sse_consumer, start_run
 from deerflow.runtime import RunRecord, serialize_channel_values
@@ -93,6 +94,7 @@ def _record_to_response(record: RunRecord) -> RunResponse:


@router.post("/{thread_id}/runs", response_model=RunResponse)
+@require_permission("runs", "create", owner_check=True, require_existing=True)
 async def create_run(thread_id: str, body: RunCreateRequest, request: Request) -> RunResponse:
    """Create a background run (returns immediately)."""
    record = await start_run(body, thread_id, request)
@@ -100,6 +102,7 @@ async def create_run(thread_id: str, body: RunCreateRequest, request: Request) -


@router.post("/{thread_id}/runs/stream")
+@require_permission("runs", "create", owner_check=True, require_existing=True)
 async def stream_run(thread_id: str, body: RunCreateRequest, request: Request) -> StreamingResponse:
    """Create a run and stream events via SSE.

@@ -127,6 +130,7 @@ async def stream_run(thread_id: str, body: RunCreateRequest, request: Request) -


@router.post("/{thread_id}/runs/wait", response_model=dict)
+@require_permission("runs", "create", owner_check=True, require_existing=True)
 async def wait_run(thread_id: str, body: RunCreateRequest, request: Request) -> dict:
    """Create a run and block until it completes, returning the final state."""
    record = await start_run(body, thread_id, request)
@@ -152,6 +156,7 @@ async def wait_run(thread_id: str, body: RunCreateRequest, request: Request) ->


@router.get("/{thread_id}/runs", response_model=list[RunResponse])
+@require_permission("runs", "read", owner_check=True)
 async def list_runs(thread_id: str, request: Request) -> list[RunResponse]:
    """List all runs for a thread."""
    run_mgr = get_run_manager(request)
@@ -160,6 +165,7 @@ async def list_runs(thread_id: str, request: Request) -> list[RunResponse]:


@router.get("/{thread_id}/runs/{run_id}", response_model=RunResponse)
+@require_permission("runs", "read", owner_check=True)
 async def get_run(thread_id: str, run_id: str, request: Request) -> RunResponse:
    """Get details of a specific run."""
    run_mgr = get_run_manager(request)
@@ -170,6 +176,7 @@ async def get_run(thread_id: str, run_id: str, request: Request) -> RunResponse:


@router.post("/{thread_id}/runs/{run_id}/cancel")
+@require_permission("runs", "cancel", owner_check=True, require_existing=True)
 async def cancel_run(
    thread_id: str,
    run_id: str,
@@ -207,6 +214,7 @@ async def cancel_run(


@router.get("/{thread_id}/runs/{run_id}/join")
+@require_permission("runs", "read", owner_check=True)
 async def join_run(thread_id: str, run_id: str, request: Request) -> StreamingResponse:
    """Join an existing run's SSE stream."""
    bridge = get_stream_bridge(request)
@@ -227,6 +235,7 @@ async def join_run(thread_id: str, run_id: str, request: Request) -> StreamingRe


@router.api_route("/{thread_id}/runs/{run_id}/stream", methods=["GET", "POST"], response_model=None)
+@require_permission("runs", "read", owner_check=True)
 async def stream_existing_run(
    thread_id: str,
    run_id: str,
@@ -274,6 +283,7 @@ async def stream_existing_run(


@router.get("/{thread_id}/messages")
+@require_permission("runs", "read", owner_check=True)
 async def list_thread_messages(
    thread_id: str,
    request: Request,
@@ -287,6 +297,7 @@ async def list_thread_messages(


@router.get("/{thread_id}/runs/{run_id}/messages")
+@require_permission("runs", "read", owner_check=True)
 async def list_run_messages(thread_id: str, run_id: str, request: Request) -> list[dict]:
    """Return displayable messages for a specific run."""
    event_store = get_run_event_store(request)
@@ -294,6 +305,7 @@ async def list_run_messages(thread_id: str, run_id: str, request: Request) -> li


@router.get("/{thread_id}/runs/{run_id}/events")
+@require_permission("runs", "read", owner_check=True)
 async def list_run_events(
    thread_id: str,
    run_id: str,
@@ -308,6 +320,7 @@ async def list_run_events(


@router.get("/{thread_id}/token-usage")
+@require_permission("threads", "read", owner_check=True)
 async def thread_token_usage(thread_id: str, request: Request) -> dict:
    """Thread-level token usage aggregation."""
    run_store = get_run_store(request)
@@ -18,8 +18,9 @@ import uuid
 from typing import Any

 from fastapi import APIRouter, HTTPException, Request
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator

+from app.gateway.authz import require_permission
 from app.gateway.deps import get_checkpointer
 from app.gateway.utils import sanitize_log_param
 from deerflow.config.paths import Paths, get_paths
@@ -29,6 +30,22 @@ logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/threads", tags=["threads"])


+# Metadata keys that the server controls; clients are not allowed to set
+# them. Pydantic ``@field_validator("metadata")`` strips them on every
+# inbound model below so a malicious client cannot reflect a forged
+# owner identity through the API surface. Defense-in-depth — the
+# row-level invariant is still ``threads_meta.owner_id`` populated from
+# the auth contextvar; this list closes the metadata-blob echo gap.
+_SERVER_RESERVED_METADATA_KEYS: frozenset[str] = frozenset({"owner_id", "user_id"})
+
+
+def _strip_reserved_metadata(metadata: dict[str, Any] | None) -> dict[str, Any]:
+    """Return a copy of ``metadata`` without server-controlled keys (an empty dict for ``None`` or empty input)."""
+    if not metadata:
+        return metadata or {}
+    return {k: v for k, v in metadata.items() if k not in _SERVER_RESERVED_METADATA_KEYS}
+
+
 # ---------------------------------------------------------------------------
 # Response / request models
 # ---------------------------------------------------------------------------
@@ -60,6 +77,8 @@ class ThreadCreateRequest(BaseModel):
    assistant_id: str | None = Field(default=None, description="Associate thread with an assistant")
    metadata: dict[str, Any] = Field(default_factory=dict, description="Initial metadata")

+    _strip_reserved = field_validator("metadata")(classmethod(lambda cls, v: _strip_reserved_metadata(v)))
+

 class ThreadSearchRequest(BaseModel):
    """Request body for searching threads."""
@@ -88,6 +107,8 @@ class ThreadPatchRequest(BaseModel):

    metadata: dict[str, Any] = Field(default_factory=dict, description="Metadata to merge")

+    _strip_reserved = field_validator("metadata")(classmethod(lambda cls, v: _strip_reserved_metadata(v)))
+

 class ThreadStateUpdateRequest(BaseModel):
    """Request body for updating thread state (human-in-the-loop resume)."""
@@ -165,6 +186,7 @@ def _derive_thread_status(checkpoint_tuple) -> str:


@router.delete("/{thread_id}", response_model=ThreadDeleteResponse)
+@require_permission("threads", "delete", owner_check=True, require_existing=True)
 async def delete_thread_data(thread_id: str, request: Request) -> ThreadDeleteResponse:
    """Delete local persisted filesystem data for a thread.

@@ -211,6 +233,8 @@ async def create_thread(body: ThreadCreateRequest, request: Request) -> ThreadRe
    thread_meta_repo = get_thread_meta_repo(request)
    thread_id = body.thread_id or str(uuid.uuid4())
    now = time.time()
+    # ``body.metadata`` is already stripped of server-reserved keys by
+    # ``ThreadCreateRequest._strip_reserved`` — see the model definition.

    # Idempotency: return existing record when already present
    existing_record = await thread_meta_repo.get(thread_id)
@@ -293,6 +317,7 @@ async def search_threads(body: ThreadSearchRequest, request: Request) -> list[Th


@router.patch("/{thread_id}", response_model=ThreadResponse)
+@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def patch_thread(thread_id: str, body: ThreadPatchRequest, request: Request) -> ThreadResponse:
    """Merge metadata into a thread record."""
    from app.gateway.deps import get_thread_meta_repo
@@ -302,6 +327,7 @@ async def patch_thread(thread_id: str, body: ThreadPatchRequest, request: Reques
    if record is None:
        raise HTTPException(status_code=404, detail=f"Thread {thread_id} not found")

+    # ``body.metadata`` already stripped by ``ThreadPatchRequest._strip_reserved``.
    try:
        await thread_meta_repo.update_metadata(thread_id, body.metadata)
    except Exception:
@@ -320,6 +346,7 @@ async def patch_thread(thread_id: str, body: ThreadPatchRequest, request: Reques


@router.get("/{thread_id}", response_model=ThreadResponse)
+@require_permission("threads", "read", owner_check=True)
 async def get_thread(thread_id: str, request: Request) -> ThreadResponse:
    """Get thread info.

@@ -376,6 +403,7 @@ async def get_thread(thread_id: str, request: Request) -> ThreadResponse:


@router.get("/{thread_id}/state", response_model=ThreadStateResponse)
+@require_permission("threads", "read", owner_check=True)
 async def get_thread_state(thread_id: str, request: Request) -> ThreadStateResponse:
    """Get the latest state snapshot for a thread.

@@ -425,6 +453,7 @@ async def get_thread_state(thread_id: str, request: Request) -> ThreadStateRespo


@router.post("/{thread_id}/state", response_model=ThreadStateResponse)
+@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, request: Request) -> ThreadStateResponse:
    """Update thread state (e.g. for human-in-the-loop resume or title rename).

@@ -514,6 +543,7 @@ async def update_thread_state(thread_id: str, body: ThreadStateUpdateRequest, re


@router.post("/{thread_id}/history", response_model=list[HistoryEntry])
+@require_permission("threads", "read", owner_check=True)
 async def get_thread_history(thread_id: str, body: ThreadHistoryRequest, request: Request) -> list[HistoryEntry]:
    """Get checkpoint history for a thread.

@@ -4,9 +4,10 @@ import logging
 import os
 import stat

-from fastapi import APIRouter, File, HTTPException, UploadFile
+from fastapi import APIRouter, File, HTTPException, Request, UploadFile
 from pydantic import BaseModel

+from app.gateway.authz import require_permission
 from deerflow.config.paths import get_paths
 from deerflow.sandbox.sandbox_provider import get_sandbox_provider
 from deerflow.uploads.manager import (
@@ -54,8 +55,10 @@ def _make_file_sandbox_writable(file_path: os.PathLike[str] | str) -> None:


@router.post("", response_model=UploadResponse)
+@require_permission("threads", "write", owner_check=True, require_existing=True)
 async def upload_files(
    thread_id: str,
+    request: Request,
    files: list[UploadFile] = File(...),
 ) -> UploadResponse:
    """Upload multiple files to a thread's uploads directory."""
@@ -133,7 +136,8 @@ async def upload_files(


@router.get("/list", response_model=dict)
-async def list_uploaded_files(thread_id: str) -> dict:
+@require_permission("threads", "read", owner_check=True)
+async def list_uploaded_files(thread_id: str, request: Request) -> dict:
    """List all files in a thread's uploads directory."""
    try:
        uploads_dir = get_uploads_dir(thread_id)
@@ -151,7 +155,8 @@ async def list_uploaded_files(thread_id: str) -> dict:


@router.delete("/{filename}")
-async def delete_uploaded_file(thread_id: str, filename: str) -> dict:
+@require_permission("threads", "delete", owner_check=True, require_existing=True)
+async def delete_uploaded_file(thread_id: str, filename: str, request: Request) -> dict:
    """Delete a file from a thread's uploads directory."""
    try:
        uploads_dir = get_uploads_dir(thread_id)