feat: SearXNG self-hosted search + OPENAI_EMBEDDING_BASE_URL for custom embeddings #1644
Open
mareurs wants to merge 1 commit into assafelovic:main from
Adds two zero-cost alternatives for users who prefer not to depend on
paid external APIs:
1. **SearXNG as a self-hosted search backend** (`--profile searxng`)
- New `searxng` Docker Compose service behind `--profile searxng`
- Bundles `searxng/settings.yml` with JSON format enabled (required
by the `searx` retriever) and sensible engine defaults
- `gpt-researcher` service now forwards `RETRIEVER` and `SEARX_URL`
env vars so the profile switch is self-contained
- Default remains `RETRIEVER=tavily`, so there is no breaking change for
existing users
2. **`OPENAI_EMBEDDING_BASE_URL` for the `custom` embedding provider**
- When running a dedicated embedding service (e.g. HuggingFace TEI,
Infinity, Ollama) alongside a separate LLM endpoint, a single
`OPENAI_BASE_URL` cannot address both
- `custom` now checks `OPENAI_EMBEDDING_BASE_URL` first, falling
back to `OPENAI_BASE_URL` and then the LM Studio default
- No behaviour change when `OPENAI_EMBEDDING_BASE_URL` is unset
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
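The fallback order for the `custom` provider described above can be sketched as a small resolver. This is a minimal illustration, not the PR's code: `resolve_embedding_base_url` is a hypothetical helper, and `http://localhost:1234/v1` is assumed to be "the LM Studio default" (LM Studio's local server listens on port 1234 and serves OpenAI-compatible routes under `/v1`).

```python
import os
from typing import Mapping, Optional

# Assumption: LM Studio's local server default (port 1234, OpenAI-compatible
# routes under /v1). The PR only says "the LM Studio default".
LM_STUDIO_DEFAULT_URL = "http://localhost:1234/v1"


def resolve_embedding_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the base URL for the `custom` embedding provider (hypothetical helper).

    Resolution order:
      1. OPENAI_EMBEDDING_BASE_URL (dedicated embedding server, e.g. TEI)
      2. OPENAI_BASE_URL (shared with the LLM endpoint)
      3. the LM Studio default
    Empty values are treated as unset (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    return (
        env.get("OPENAI_EMBEDDING_BASE_URL")
        or env.get("OPENAI_BASE_URL")
        or LM_STUDIO_DEFAULT_URL
    )
```

For example, with a TEI server for embeddings and a separate LLM endpoint (both URLs hypothetical), passing `{"OPENAI_BASE_URL": "http://llm:8000/v1", "OPENAI_EMBEDDING_BASE_URL": "http://tei:8080/v1"}` resolves to the TEI URL, while the chat model keeps using `OPENAI_BASE_URL`. With only `OPENAI_BASE_URL` set, the result is that URL, matching the "no behaviour change" case.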
Owner

@mareurs can you please add an explanation of how to use this to the relevant file under the docs directory?
Summary
Two additions for users who want to avoid paid external APIs:
1. SearXNG as a self-hosted search backend (no API key required)
GPT-Researcher already supports `RETRIEVER=searx`, but running SearXNG alongside it required manual Docker setup. This PR adds it as a first-class `--profile searxng` option in `docker-compose.yml`:

```bash
# Start with self-hosted search (no Tavily key needed)
docker compose --profile searxng up -d
```

Then set `RETRIEVER=searx` and `SEARX_URL` in `.env`.

What's included:

- `searxng` service in `docker-compose.yml` under `--profile searxng`
- `searxng/settings.yml` with JSON format enabled (required by the `searx` retriever) and sensible defaults for Google, Bing, and DuckDuckGo
- `RETRIEVER` and `SEARX_URL` forwarded to the `gpt-researcher` service
- Default remains `RETRIEVER=tavily`, so there is no breaking change for existing users

SearXNG aggregates results from multiple engines without any per-query API costs, which makes it well suited to local or private deployments.
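To exercise the JSON endpoint from Python (the equivalent of the `curl` step in the test plan), a minimal standard-library client might look like this. It assumes the profile exposes SearXNG at `http://localhost:4000`, as in the test plan; `build_search_url`, `parse_results`, and `search` are hypothetical helpers, not the `searx` retriever's implementation. The `title`, `url`, and `content` fields are those SearXNG returns per result in its JSON output.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the searxng profile publishes the service on localhost:4000,
# as in the test plan. Adjust for your deployment.
DEFAULT_SEARX_URL = "http://localhost:4000"


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG /search URL that requests JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def parse_results(payload: str, max_results: int = 5) -> list:
    """Keep title, url and content from the first `max_results` hits."""
    data = json.loads(payload)
    return [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "content": item.get("content", ""),
        }
        for item in data.get("results", [])[:max_results]
    ]


def search(query: str, base_url: str = DEFAULT_SEARX_URL, timeout: float = 10.0) -> list:
    """Query a running SearXNG instance and return parsed hits."""
    with urllib.request.urlopen(build_search_url(base_url, query), timeout=timeout) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Against a running instance, `search("test")` returns up to five dicts. If `json` is not enabled under `search.formats` in `settings.yml`, SearXNG typically answers with HTTP 403, which `urlopen` raises as `urllib.error.HTTPError`; that is why the bundled settings enable JSON.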
2. `OPENAI_EMBEDDING_BASE_URL` for the `custom` embedding provider

When running a dedicated embedding service (e.g. HuggingFace TEI, Infinity) alongside a separate LLM API, a single `OPENAI_BASE_URL` can't address both endpoints.

The `custom` embedding provider now checks `OPENAI_EMBEDDING_BASE_URL` first, falling back to `OPENAI_BASE_URL` and then the LM Studio default.

No behaviour change when `OPENAI_EMBEDDING_BASE_URL` is unset.

Test plan
- `docker compose --profile searxng up -d` starts SearXNG on port 4000
- `curl "http://localhost:4000/search?q=test&format=json"` returns JSON results
- `RETRIEVER=searx SEARX_URL=http://localhost:4000` produces research results
- Default `RETRIEVER=tavily` still works (no regression)
- `OPENAI_EMBEDDING_BASE_URL` routes custom embeddings to the specified endpoint
- With `OPENAI_EMBEDDING_BASE_URL` unset, custom embeddings fall back to `OPENAI_BASE_URL` (no regression)

🤖 Generated with Claude Code