Production-Grade Hybrid RAG on Cloudflare Edge - Deploy for $5/month instead of $200+
A complete semantic search system with hybrid search (Vector + BM25), reranking, auto-chunking, one-time licensing, and MCP integration.
Traditional AI search setups are expensive and slow:
- 💸 Cost: $50-200+/month for Pinecone, Weaviate, or hosted solutions
- 🐌 Latency: 200-500ms response times from centralized servers
- 🔒 Privacy: Your data gets sent to third-party services
This worker changes the game:
- 💰 Cost: ~$5/month on Cloudflare's generous free tier
- ⚡ Speed: <1s hybrid search with reranking
- 🔐 Privacy: Your data stays on your Cloudflare account
| Feature | Pinecone | This Project |
|---|---|---|
| Monthly Cost | $50+ minimum | ~$5/month |
| Edge Deployment | ❌ Cloud-only | ✅ Cloudflare Edge (210% faster than Lambda@Edge) |
| Hybrid Search | Requires workarounds | ✅ Native Vector + BM25 |
| Cross-Encoder Reranking | Basic | ✅ bge-reranker-base (+9.7 pts MRR@5) |
| MCP Integration | ❌ None | ✅ Native (4,400+ MCP tools compatible) |
| Licensing | Recurring SaaS | ✅ One-time option available |
| Vendor Lock-in | High (proprietary) | Low (open source) |
"Pinecone is struggling with customer churn largely driven by cost concerns" — VentureBeat
Cost at scale: Pinecone costs 10-30x more at 60-80M+ monthly queries. Self-hosted alternatives become dramatically cheaper at scale.
Accuracy: Hybrid search with cross-encoder reranking achieves 66.43% MRR@5 vs 56.72% for semantic-only search — a +9.7 percentage point (roughly +17% relative) improvement.
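The fusion step merges the vector and keyword result lists with Reciprocal Rank Fusion. A minimal sketch of the idea (k = 60 is the conventional constant from the RRF paper; the worker's exact parameters may differ):

```typescript
// Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document,
// so items ranked highly in BOTH lists float to the top.
type Ranked = { id: string }[];

function rrfFuse(
  vector: Ranked,
  keyword: Ranked,
  k = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of [vector, keyword]) {
    list.forEach((doc, i) => {
      // rank is 1-based: position 0 contributes 1/(k + 1)
      scores.set(doc.id, (scores.get(doc.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With k = 60, fused scores land in the ~0.01–0.03 range, which is consistent with the small `score` values in the example responses further down.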
- ✅ Hybrid Search - Vector similarity + BM25 keyword matching
- ✅ Reciprocal Rank Fusion (RRF) - Intelligent result merging
- ✅ Cross-Encoder Reranking - Precision scoring with `bge-reranker-base`
- ✅ Recursive Chunking - Semantic boundary-aware with 15% overlap
- ✅ Image Ingestion - Upload screenshots, diagrams, photos
- ✅ Llama 4 Scout Vision - AI-powered image understanding
- ✅ OCR Extraction - Automatic text extraction from images (700+ chars)
- ✅ Reverse Image Search - Find similar images by uploading a query image
- ✅ Unified Index - Text and images searchable together (384 dims)
- ✅ 60s Cache - 0ms response time for repeated queries
- ✅ Interactive Dashboard - Visual playground at `/dashboard`
- ✅ MCP Integration - Works with Claude Desktop and AI agents
- ✅ One-Time Licensing - Pay-once validation system
- ✅ AI SEO Ready - JSON-LD schema + `llms.txt`
- ✅ API Key Authentication - Production-ready security
- ✅ Performance Monitoring - Real-time timing breakdown
- ✅ CORS Support - Ready for web applications
- ✅ Edge Deployment - Runs globally on Cloudflare's network
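The boundary-aware chunking with 15% overlap can be sketched as below. This is an illustrative version, not the worker's exact implementation — the 1000-character window and the paragraph/sentence break preference are assumptions:

```typescript
// Split text into chunks of at most maxLen characters, preferring to cut at a
// paragraph ("\n\n") or sentence (". ") boundary, with ~15% overlap between
// neighbouring chunks so context straddling a cut is not lost.
function chunkText(text: string, maxLen = 1000, overlapRatio = 0.15): string[] {
  if (text.length <= maxLen) return [text.trim()];
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxLen, text.length);
    if (end < text.length) {
      // Look for a semantic boundary in the current window
      const slice = text.slice(start, end);
      const cut = Math.max(slice.lastIndexOf("\n\n"), slice.lastIndexOf(". "));
      if (cut > maxLen / 2) end = start + cut + 1; // only cut past the midpoint
    }
    chunks.push(text.slice(start, end).trim());
    if (end >= text.length) break;
    // Step back ~15% of this chunk so consecutive chunks overlap
    start = end - Math.floor((end - start) * overlapRatio);
  }
  return chunks;
}
```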
User Query
│
├──► Vector Search (Vectorize) ──┐
│ └─ BGE embeddings (384d) │
│ ├──► RRF Fusion ──► Reranker ──► Results
└──► Keyword Search (D1 BM25) ───┘
└─ Term frequency/IDF
Images
│
├──► Llama 4 Scout ──► Description (semantic)
│ └─► OCR Text (keywords)
│
└──► BGE Embeddings ──► Same 384d index
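The keyword leg scores matches with BM25 over the term statistics stored in D1. A sketch of the per-term scoring formula (k1 = 1.2 and b = 0.75 are the usual defaults, assumed here rather than taken from the worker's code):

```typescript
// BM25 score of one query term for one document.
// idf rewards rare terms; the norm saturates term frequency and penalises
// documents that are longer than the index average.
function bm25Score(
  tf: number,      // term frequency in the document
  df: number,      // number of documents containing the term
  docLen: number,  // this document's length in tokens
  avgLen: number,  // average document length in the index
  N: number,       // total documents in the index
  k1 = 1.2,
  b = 0.75
): number {
  const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
  const norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgLen)));
  return idf * norm;
}
```

Summing this over the query's terms gives the raw keyword scores (e.g. the `"keyword": 12.4` in the response examples), which are then rank-fused with the vector results rather than compared on the same scale.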
Performance:
- Text search: ~900ms (first), 0ms (cached)
- Image ingest: ~7.7s (vision + OCR + embedding)
- Reverse search: ~8s (vision + search)
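The 0ms repeated-query number comes from the 60s cache. A minimal in-memory TTL sketch of that behaviour (the worker may instead use Cloudflare's Cache API or KV; names here are illustrative):

```typescript
// Cache search responses keyed by query for 60 seconds.
const TTL_MS = 60_000;
const cache = new Map<string, { value: unknown; expires: number }>();

function cacheGet(key: string): unknown | undefined {
  const hit = cache.get(key);
  if (!hit) return undefined;
  if (Date.now() > hit.expires) {
    cache.delete(key); // evict stale entry lazily on read
    return undefined;
  }
  return hit.value;
}

function cacheSet(key: string, value: unknown): void {
  cache.set(key, { value, expires: Date.now() + TTL_MS });
}
```

Note that in Workers an in-memory map only survives for the lifetime of an isolate; a production version would lean on the platform cache for cross-request hits.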
Tech Stack:
- Runtime: Cloudflare Workers
- Vector DB: Cloudflare Vectorize (384 dims, single unified index)
- SQL DB: Cloudflare D1 (BM25 keywords)
- Embedding: `@cf/baai/bge-small-en-v1.5`
- Reranker: `@cf/baai/bge-reranker-base`
- Vision: `@cf/meta/llama-4-scout-17b-16e-instruct`
- Cloudflare account (free tier works)
- Node.js v18+
- Wrangler CLI
git clone https://github.com/dannwaneri/vectorize-mcp-worker.git
cd vectorize-mcp-worker
npm install

wrangler vectorize create mcp-knowledge-base --dimensions=384 --metric=cosine

# Copy the example configuration file
cp wrangler.toml.example wrangler.toml
# Windows PowerShell: copy wrangler.toml.example wrangler.toml
# Create your D1 database
wrangler d1 create mcp-knowledge-db

Copy the database_id from the output and update wrangler.toml (line 15):
[[d1_databases]]
binding = "DB"
database_name = "mcp-knowledge-db"
database_id = "YOUR_DATABASE_ID"  # ← Replace with your actual database_id

Note: wrangler.toml contains sensitive configuration and is in .gitignore. Never commit it to git.
wrangler d1 execute mcp-knowledge-db --remote --file=./schema.sql

wrangler deploy

wrangler secret put API_KEY

Open your dashboard: https://your-worker.workers.dev/dashboard
Or via CLI:
# Ingest a document
curl -X POST https://your-worker.workers.dev/ingest \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"id": "doc-1", "content": "Your document content here", "category": "docs"}'
# Search
curl -X POST https://your-worker.workers.dev/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"query": "your search query", "topK": 5}'

curl -X POST https://your-worker.workers.dev/ingest-image \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "id=dashboard-001" \
-F "image=@dashboard.png" \
  -F "imageType=screenshot"

curl -X POST https://your-worker.workers.dev/search \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{"query": "dashboard with navigation", "topK": 5}'

Result: Your image appears in search results! 🎉
curl -X POST https://your-worker.workers.dev/find-similar-images \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "image=@similar-dashboard.png" \
  -F "topK=5"

Result: System finds visually similar dashboards you've indexed.
- ✅ DO commit: `wrangler.toml.example`, `.gitignore`
- ❌ DON'T commit: `wrangler.toml`, `.dev.vars`, `.env` files
- Store API keys using Wrangler secrets (never in code): `wrangler secret put API_KEY`
- An API key is required for all endpoints except:
  - `GET /` (API docs)
  - `GET /dashboard` (UI playground)
  - `GET /test` (health check)
  - `GET /llms.txt` (search engine info)
- Each developer should create their own D1 database
- Never share database IDs publicly
- Use `wrangler.toml.example` as a template
- Copy `wrangler.toml.example` → `wrangler.toml`
- Create your own D1 database
- Update `wrangler.toml` with YOUR database ID
- Set API key: `wrangler secret put API_KEY`
- Never commit `wrangler.toml`
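The authentication rule above (Bearer key required everywhere except a few public GET routes) can be sketched as Worker middleware. The route list mirrors the public endpoints documented here; the `API_KEY` binding name matches the secret set via `wrangler secret put API_KEY`, and the rest is an assumption about the worker's internals:

```typescript
// Public GET routes that skip authentication, per the docs above.
const PUBLIC_GETS = new Set(["/", "/dashboard", "/test", "/llms.txt"]);

function isAuthorized(req: Request, apiKey: string | undefined): boolean {
  const path = new URL(req.url).pathname;
  if (req.method === "GET" && PUBLIC_GETS.has(path)) return true;
  if (!apiKey) return true; // dev mode: no secret set, auth disabled
  return req.headers.get("Authorization") === `Bearer ${apiKey}`;
}
```

A handler would call `isAuthorized(request, env.API_KEY)` before routing and return a 401 on failure.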
| Endpoint | Method | Description |
|---|---|---|
| `/ingest-image` | POST | Upload and index images with AI description + OCR |
| `/find-similar-images` | POST | Reverse image search - find visually similar images |
curl -X POST https://your-worker.workers.dev/ingest-image \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "id=screenshot-001" \
-F "image=@dashboard.png" \
-F "category=ui-screenshots" \
  -F "imageType=screenshot"

Response:
{
"success": true,
"documentId": "screenshot-001",
"description": "Dashboard interface with metrics cards showing...",
"extractedText": "API Key\nEnter your API key\nTest\nServer Online...",
"performance": {
"multimodalProcessing": "4852ms",
"totalTime": "7737ms"
}
}

curl -X POST https://your-worker.workers.dev/find-similar-images \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "image=@query.png" \
  -F "topK=5"

Response:
{
"query": "Dashboard interface with metrics cards...",
"results": [
{
"id": "screenshot-001",
"score": 0.0156,
"content": "Dashboard interface...",
"category": "ui-screenshots",
"isImage": true
}
],
"performance": {
"totalTime": "1135ms"
}
}

Image Types:
- `screenshot` - UI screenshots (extracts button labels, form fields)
- `diagram` - Flowcharts, architecture diagrams
- `document` - Scanned documents, forms
- `chart` - Data visualizations
- `photo` - General photographs
- `auto` - Let AI determine (default)
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API documentation |
| `/test` | GET | Health check |
| `/dashboard` | GET | Interactive UI |
| `/llms.txt` | GET | AI search engine info |
| `/mcp/tools` | GET | List MCP tools |
| Endpoint | Method | Description |
|---|---|---|
| `/search` | POST | Hybrid semantic search |
| `/ingest` | POST | Ingest document with auto-chunking |
| `/stats` | GET | Index statistics |
| `/documents/:id` | DELETE | Delete document |
| `/mcp/call` | POST | Execute MCP tool |
| Endpoint | Method | Description |
|---|---|---|
| `/license/create` | POST | Create new license |
| `/license/validate` | POST | Validate license key |
| `/license/list` | GET | List all licenses |
| `/license/revoke` | POST | Revoke a license |
{
"query": "How does hybrid search work?",
"topK": 5,
"rerank": true
}

Response:

{
"query": "How does hybrid search work?",
"topK": 5,
"resultsCount": 5,
"results": [
{
"id": "doc-1-chunk-0",
"score": 0.0142,
"content": "Hybrid search combines vector similarity...",
"category": "technical",
"scores": {
"vector": 0.85,
"keyword": 12.4,
"reranker": 0.92
}
}
],
"performance": {
"embeddingTime": "94ms",
"vectorSearchTime": "650ms",
"keywordSearchTime": "50ms",
"rerankerTime": "118ms",
"totalTime": "912ms"
}
}

Ingest request body:

{
"id": "unique-doc-id",
"content": "Your document content. Can be multiple paragraphs.\n\nWill be automatically chunked.",
"title": "Optional Title",
"category": "optional-category"
}

Response:

{
"success": true,
"documentId": "unique-doc-id",
"chunksCreated": 3,
"performance": {
"embeddingTime": "450ms",
"totalTime": "1200ms"
}
}

Use this worker as a tool for Claude Desktop, Gemini CLI, or any MCP-compatible AI agent.
This server is compatible with mcp-cli for efficient tool discovery:
// Add to mcp_servers.json
{
"mcpServers": {
"vectorize": {
"url": "https://your-worker.workers.dev",
"headers": {
"Authorization": "Bearer ${VECTORIZE_API_KEY}"
}
}
}
}

# Discover tools
mcp-cli vectorize
# Get tool schema
mcp-cli vectorize/search
# Search your knowledge base
mcp-cli vectorize/search '{"query": "cloudflare workers", "topK": 5}'
# Ingest a document
mcp-cli vectorize/ingest '{"id": "doc-1", "content": "Your content here"}'

List Available Tools:
curl https://your-worker.workers.dev/mcp/tools

Call a Tool:
curl -X POST https://your-worker.workers.dev/mcp/call \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"tool": "search",
"arguments": {
"query": "cloudflare workers",
"topK": 3
}
  }'

Available tools:
- `search` - Search the knowledge base (hybrid vector + BM25)
- `ingest` - Add text documents with auto-chunking
- `ingest_image` - Add images with AI description + OCR (NEW!)
- `stats` - Get index statistics
- `delete` - Remove documents
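From TypeScript, the same `/mcp/call` request can be built programmatically. A sketch, with the worker URL and API key as placeholders:

```typescript
// Build a POST /mcp/call request matching the curl example above.
// baseUrl and apiKey are placeholders for your deployment's values.
function buildToolCall(
  baseUrl: string,
  apiKey: string,
  tool: string,
  args: Record<string, unknown>
): Request {
  return new Request(`${baseUrl}/mcp/call`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ tool, arguments: args }),
  });
}
```

Dispatch it with `await fetch(buildToolCall(url, key, "search", { query: "cloudflare workers", topK: 3 }))` and parse the JSON response.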
## License System
### Create a License
curl -X POST https://your-worker.workers.dev/license/create \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"email": "customer@example.com", "plan": "pro"}'
| Plan | Max Documents | Max Queries/Day |
|---|---|---|
| standard | 10,000 | 1,000 |
| pro | 50,000 | 5,000 |
| enterprise | 100,000 | 10,000 |
curl -X POST https://your-worker.workers.dev/license/validate \
-H "Content-Type: application/json" \
  -d '{"license_key": "lic_xxxxx"}'

Real-world benchmarks from production:
| Operation | Time | Notes |
|---|---|---|
| Text Embedding | ~90ms | BGE-small-en |
| Vector Search | ~650ms | Vectorize query |
| Keyword Search (BM25) | ~50ms | D1 database |
| Reranking | ~120ms | Cross-encoder |
| Hybrid Text Search | ~900ms | Full pipeline |
| Cached Search | 0ms | 60s TTL cache |
| Image Vision + OCR | ~5s | Llama 4 Scout (2 calls) |
| Image Embedding | ~2s | BGE-small-en |
| Image Ingestion | ~7.7s | Vision + OCR + embedding |
| Reverse Image Search | ~8s | Vision + hybrid search |
| Solution | Monthly Cost | Edge Native | Hybrid Search | MCP |
|---|---|---|---|---|
| This Project | ~$5 | ✅ | ✅ | ✅ |
| Pinecone | $50-200+ | ❌ | Partial | ❌ |
| Weaviate Cloud | $25-150+ | ❌ | ✅ | ❌ |
| Qdrant Cloud | $25-100+ | ❌ | ❌ | ❌ |
| pgvector (self-hosted) | $40-60+ | ❌ | ❌ | ❌ |
Access the interactive dashboard at /dashboard:
- 📊 Stats Panel - Vector count, document count
- 📥 Ingest Form - Add documents with auto-chunking
- 🔎 Search Interface - Test queries with latency monitor
- 🔑 Auth Management - Enter API key for protected endpoints
vectorize-mcp-worker/
├── src/
│ └── index.ts # Main worker code
├── schema.sql # D1 database schema
├── wrangler.toml # Cloudflare configuration
├── package.json
└── README.md
# Start dev server
wrangler dev
# Note: Vectorize requires deployment to test
# D1 works locally with --local flag

wrangler d1 create mcp-knowledge-db
# Update wrangler.toml with database_id
wrangler d1 execute mcp-knowledge-db --remote --file=./schema.sql

wrangler vectorize create mcp-knowledge-base --dimensions=384 --metric=cosine

Either set an API key or remove it for dev mode:
wrangler secret put API_KEY # Set for production
wrangler secret delete API_KEY  # Remove for dev

Ingest some documents first:
curl -X POST https://your-worker.workers.dev/ingest \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{"id": "test", "content": "Your content here"}'

Before deploying:
- `wrangler.toml` exists locally (copied from example)
- `wrangler.toml` has YOUR database ID (not placeholder)
- `wrangler.toml` is NOT tracked by git
- Vectorize index created: `mcp-knowledge-base`
- D1 database created and migrated
- API_KEY secret set (if using authentication)
- Test deployment: `wrangler deploy`
- Visit dashboard: `https://your-worker.workers.dev/dashboard`
Check if wrangler.toml is properly ignored:
git check-ignore wrangler.toml  # Should output: wrangler.toml

Thanks to these people for improving the project:
- @luojiyin1987 - Bug reports and testing
Areas that need help:
- Multi-modal Support - Images, tables, code
- Batch Ingestion - Bulk document upload
- Analytics Dashboard - Usage metrics
- Testing Suite - Comprehensive tests
- Documentation - More examples
Need help deploying?
- Setup: $2,500 (one-time)
- Timeline: 48 hours
- Includes: Full configuration, up to 10K docs indexed, 2 weeks support
MIT License - see LICENSE file.
Daniel Nwaneri - Cloudflare Workers AI Specialist
- 🐦 Twitter: @dannwaneri
- 💼 Upwork: Profile
- 📝 Blog: DEV.to
- 🔗 GitHub: @dannwaneri
⭐ Star the repo if this saved you money!
💬 Questions? Open an issue