Files
deer-flow/backend/packages/harness/deerflow/community/jina_ai/jina_client.py
T
Ryker_Feng f92a26d56f fix(web_fetch): support proxy for Jina reader in restricted networks (#3418) (#3430)
* fix(web_fetch): support proxy for Jina reader in restricted networks

The web_fetch tool built a bare httpx.AsyncClient() with no proxy
awareness, so users behind a corporate proxy / in Docker / WSL could
not reach https://r.jina.ai and web_fetch timed out.

- Add optional `proxy` / `trust_env` params to JinaClient.crawl and
  wire them from the `web_fetch` tool config (with type coercion for
  YAML string values).
- Pass internal service hostnames through NO_PROXY in both compose
  files so proxy env inherited via env_file does not break in-cluster
  calls (gateway/provisioner/etc).
- Load proxy vars from .env into the shell in scripts/docker.sh so the
  NO_PROXY interpolation can merge user-provided values on `make` path.
- Document proxy/trust_env options in config.example.yaml.

Closes #3418

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 23:25:29 +08:00

47 lines
1.8 KiB
Python

import logging
import os
import httpx
logger = logging.getLogger(__name__)
_api_key_warned = False
class JinaClient:
async def crawl(self, url: str, return_format: str = "html", timeout: int = 10, proxy: str | None = None, trust_env: bool = True) -> str:
global _api_key_warned
headers = {
"Content-Type": "application/json",
"X-Return-Format": return_format,
"X-Timeout": str(timeout),
}
if os.getenv("JINA_API_KEY"):
headers["Authorization"] = f"Bearer {os.getenv('JINA_API_KEY')}"
elif not _api_key_warned:
_api_key_warned = True
logger.warning("Jina API key is not set. Provide your own key to access a higher rate limit. See https://jina.ai/reader for more information.")
data = {"url": url}
try:
client_kwargs: dict[str, object] = {"trust_env": trust_env}
if proxy:
client_kwargs["proxy"] = proxy
async with httpx.AsyncClient(**client_kwargs) as client:
response = await client.post("https://r.jina.ai/", headers=headers, json=data, timeout=timeout)
if response.status_code != 200:
error_message = f"Jina API returned status {response.status_code}: {response.text}"
logger.error(error_message)
return f"Error: {error_message}"
if not response.text or not response.text.strip():
error_message = "Jina API returned empty response"
logger.error(error_message)
return f"Error: {error_message}"
return response.text
except Exception as e:
error_message = f"Request to Jina API failed: {type(e).__name__}: {e}"
logger.warning(error_message)
return f"Error: {error_message}"