
feat: SearXNG self-hosted search + OPENAI_EMBEDDING_BASE_URL for custom embeddings#1644

Open
mareurs wants to merge 1 commit into assafelovic:main from mareurs:searxng-upstream

Conversation

@mareurs
Contributor

@mareurs mareurs commented Feb 26, 2026

Summary

Two additions for users who want to avoid paid external APIs:

1. SearXNG as a self-hosted search backend (no API key required)

GPT-Researcher already supports RETRIEVER=searx, but running SearXNG alongside required manual Docker setup. This PR adds it as a first-class --profile searxng option in docker-compose.yml.

# Start with self-hosted search (no Tavily key needed)
docker compose --profile searxng up -d

Then in .env:

RETRIEVER=searx
SEARX_URL=http://searxng:8080

What's included:

  • searxng service in docker-compose.yml under --profile searxng
  • searxng/settings.yml with JSON format enabled (required by the searx retriever) and sensible defaults for Google, Bing, and DuckDuckGo
  • RETRIEVER and SEARX_URL forwarded to the gpt-researcher service
  • Default remains RETRIEVER=tavily — no breaking change for existing users

SearXNG aggregates results from multiple engines without any per-query API costs, making it ideal for local/private deployments.
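For reference, a profile-gated service of this shape could look roughly like the following in docker-compose.yml. This is a sketch under stated assumptions, not the exact file from this PR: the image tag and host-port mapping are assumptions (the 4000→8080 mapping follows the test plan below).

```yaml
# Sketch only — the PR's actual compose entry may differ.
services:
  searxng:
    image: searxng/searxng:latest      # assumed image tag
    profiles: ["searxng"]              # started only with `docker compose --profile searxng up`
    ports:
      - "4000:8080"                    # host 4000 -> container 8080 (assumed, per the test plan)
    volumes:
      - ./searxng/settings.yml:/etc/searxng/settings.yml:ro
```

With `profiles` set, the service is skipped by a plain `docker compose up`, which is what keeps the default Tavily path unchanged for existing users.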

2. OPENAI_EMBEDDING_BASE_URL for the custom embedding provider

When running a dedicated embedding service (e.g. HuggingFace TEI, Infinity) alongside a separate LLM API, a single OPENAI_BASE_URL can't address both endpoints.

The custom embedding provider now checks OPENAI_EMBEDDING_BASE_URL first, falling back to OPENAI_BASE_URL and then the LM Studio default:

OPENAI_BASE_URL=http://localhost:8000/v1        # LLM endpoint
OPENAI_EMBEDDING_BASE_URL=http://localhost:8080/v1  # Embedding endpoint
EMBEDDING=custom:BAAI/bge-large-en-v1.5

No behaviour change when OPENAI_EMBEDDING_BASE_URL is unset.
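The lookup order described above can be sketched as a small helper. This is an illustrative reconstruction, not the PR's actual code: the function name and the LM Studio default URL shown here are assumptions.

```python
import os

# LM Studio's conventional local endpoint (assumed default; the PR may differ).
LM_STUDIO_DEFAULT = "http://localhost:1234/v1"

def resolve_embedding_base_url(env=None):
    """Hypothetical helper mirroring the fallback chain described above:
    OPENAI_EMBEDDING_BASE_URL -> OPENAI_BASE_URL -> LM Studio default."""
    if env is None:
        env = os.environ
    return (
        env.get("OPENAI_EMBEDDING_BASE_URL")
        or env.get("OPENAI_BASE_URL")
        or LM_STUDIO_DEFAULT
    )
```

When only `OPENAI_BASE_URL` is set, the helper returns it unchanged, which is the "no behaviour change" guarantee above.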

Test plan

  • docker compose --profile searxng up -d starts SearXNG on port 4000
  • curl "http://localhost:4000/search?q=test&format=json" returns JSON results
  • RETRIEVER=searx SEARX_URL=http://localhost:4000 produces research results
  • RETRIEVER=tavily (default) still works — no regression
  • Setting OPENAI_EMBEDDING_BASE_URL routes custom embeddings to the specified endpoint
  • Omitting OPENAI_EMBEDDING_BASE_URL falls back to OPENAI_BASE_URL (no regression)

🤖 Generated with Claude Code

@assafelovic
Owner

@mareurs can you please add a file under the docs directory explaining how to use this?
