English | 简体中文 | 繁體中文 | 日本語
Intelligent OCR System · Vue 3 Modern UI · Batch Processing · Multi-Mode Support
Features • Quick Start • Screenshots • Contributors
v4.1: UI improvements and model version display.
- OCR-2 Model Badge - The header now shows a prominent `OCR-2` badge so users instantly know the model version
- Table Rendering Fix - OCR-detected tables now display with white backgrounds, dark text, and zebra striping for clear readability (they previously appeared as dark, unreadable blocks)
- Health API `model_version` - The `/health` endpoint now returns `"model_version": "DeepSeek-OCR-2"` for programmatic version detection
- Footer Version - Updated to `v4.1 · OCR-2`
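A minimal sketch of the programmatic version check, assuming the service listens on `http://localhost:8001` as in the Quick Start. `fetch_health`, `model_version_from_health`, and `is_ocr2` are hypothetical helper names; only the `/health` endpoint and its `model_version` field come from the notes above, and the rest of the response shape is not assumed.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8001", timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body (needs a running server)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def model_version_from_health(payload: dict) -> Optional[str]:
    """Return the reported model version, or None if the field is absent."""
    return payload.get("model_version")


def is_ocr2(payload: dict) -> bool:
    """True when the server reports the DeepSeek-OCR-2 model."""
    return model_version_from_health(payload) == "DeepSeek-OCR-2"
```

Against a live instance, `is_ocr2(fetch_health())` would tell a client whether it is talking to a v4.1 deployment that reports the OCR-2 model.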
v4.0: Major model upgrade to DeepSeek-OCR-2 (Visual Causal Flow), with better accuracy and higher resolution!
- DeepSeek-OCR-2 Model - Upgraded to the latest DeepSeek-OCR-2 with Visual Causal Flow architecture
- Higher Resolution - Dynamic resolution up to (0-6)×768×768 + 1×1024×1024 (was 640×640)
- Flash Attention 2 - Native `flash_attention_2` support on CUDA for optimal inference speed
- Improved Accuracy - Better document understanding, chart parsing, and text recognition
- Full Backward Compatibility - All 7 recognition modes, REST API, and frontend unchanged
- Docker v4.0 - New all-in-one image with pre-downloaded OCR-2 model (`Dockerfile.v4.0`)
- Unified Tokenizer - Switched from `AutoProcessor` to `AutoTokenizer` (aligned with official OCR-2 API)
| Component | v3.6 (OCR v1) | v4.0 (OCR-2) |
|---|---|---|
| Model | `deepseek-ai/DeepSeek-OCR` | `deepseek-ai/DeepSeek-OCR-2` |
| `image_size` | 640 | 768 |
| Attention | `eager` | `flash_attention_2` (CUDA) |
| Tokenizer | `AutoProcessor` | `AutoTokenizer` |
| Resolution | Fixed crops | Dynamic (0-6)×768 + 1×1024 |

All existing features from v3.6 (concurrency, rate limiting, queue management, Vue 3 frontend) are fully preserved.
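As a hedged illustration of the v4.0 column, the sketch below shows how the attention backend might be chosen at load time. `ocr2_load_kwargs` is a hypothetical helper, not the project's code. `attn_implementation` and `trust_remote_code` are standard Hugging Face `from_pretrained` arguments, and falling back to `eager` when CUDA is unavailable is an assumption based on the v3.6 column.

```python
MODEL_ID = "deepseek-ai/DeepSeek-OCR-2"  # model name from the comparison table


def ocr2_load_kwargs(cuda_available: bool) -> dict:
    """Keyword arguments for from_pretrained() (hypothetical helper)."""
    return {
        # The model repository is assumed to ship custom modeling code.
        "trust_remote_code": True,
        # Flash Attention 2 only on CUDA; otherwise fall back to eager attention.
        "attn_implementation": "flash_attention_2" if cuda_available else "eager",
    }


# Usage (needs torch and transformers, plus flash-attn on CUDA machines):
#   import torch
#   from transformers import AutoModel, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
#   model = AutoModel.from_pretrained(
#       MODEL_ID, **ocr2_load_kwargs(torch.cuda.is_available())
#   )
```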
v3.6: Performance optimization with smart queue management and rate limiting!
- Backend Concurrency Optimization - Non-blocking inference with ThreadPoolExecutor
- Rate Limiting - Per-client and per-IP request limits (X-Client-ID header support)
- Queue Management - Real-time queue status with position tracking
- Enhanced Health API - Queue depth, status (healthy/busy/full), and rate limit info
- New Languages - Added Traditional Chinese (zh-TW) and Japanese (ja-JP)
- 429 Error Handling - Graceful handling when the queue is full or a client is rate limited
Contributors: @cloudman6 (PR #41)
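On the client side, the 429 responses described above can be handled with a simple retry loop. The following is a sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, the exponential delays are illustrative rather than server-mandated, and the `X-Client-ID` value can be any stable per-client string (a UUID is used here as an example).

```python
import time
import uuid

# Stable identifier sent with every request so per-client limits apply to us
# rather than to everyone behind the same IP (format is an assumption).
CLIENT_ID = str(uuid.uuid4())


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, such as a requests.Response.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
    return response


# Usage with requests (needs a running server):
#   import requests
#   def send():
#       with open("image.png", "rb") as f:
#           return requests.post(
#               "http://localhost:8001/ocr",
#               files={"file": f},
#               data={"prompt_type": "ocr"},
#               headers={"X-Client-ID": CLIENT_ID},
#           )
#   response = call_with_backoff(send)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.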
v3.5: Complete UI Overhaul with Modern Vue 3 + TypeScript Architecture!
- Brand New Vue 3 UI - Modern, responsive design with Naive UI components
- TypeScript Support - Full type safety and better developer experience
- Dexie.js Database - Local IndexedDB for offline page management
- Real-time Processing Queue - Visual OCR progress with queue management
- Health Check System - Backend status monitoring with visual indicators
- Enhanced PDF Support - Smooth PDF rendering with page-by-page processing
- i18n Ready - Built-in internationalization (EN/CN/TW/JP)
- E2E Testing - Comprehensive Playwright test coverage
This project is the result of an outstanding collaboration. The Vue 3 frontend was merged in PR #34.

| Contributor | Role | Contribution |
|---|---|---|
| CloudMan | Vue 3 Frontend Lead Developer | 164 commits · Complete UI Rewrite |
| neosun100 | Project Maintainer | Backend · Docker · Integration |

About the Vue 3 Frontend: @cloudman6 contributed an exceptional Vue 3 + TypeScript frontend with 164 commits, including comprehensive E2E tests, modern UI components, and production-ready architecture. This collaboration transformed DeepSeek-OCR-WebUI into a professional-grade application!
DeepSeek-OCR-WebUI is an intelligent document recognition web application powered by the DeepSeek-OCR model. It provides a modern, intuitive interface for converting images and PDFs to structured text with high accuracy.
| Feature | Description |
|---|---|
| 7 Recognition Modes | Document, OCR, Chart, Find, Freeform, and more |
| Bounding Box Visualization | Find mode with automatic position annotation |
| Batch Processing | Process multiple images/pages sequentially |
| PDF Support | Upload PDFs, auto-convert to images |
| Modern Vue 3 UI | Responsive design with Naive UI |
| Multilingual | EN, 简体中文, 繁體中文, 日本語 |
| Apple Silicon | Native MPS acceleration for M1/M2/M3/M4 |
| Docker Ready | One-command deployment |
| GPU Acceleration | NVIDIA CUDA support |
| Mode | Description | Use Cases |
|---|---|---|
| Doc to Markdown | Preserve format and layout | Contracts, papers, reports |
| General OCR | Extract all visible text | Image text extraction |
| Plain Text | Pure text without format | Simple text recognition |
| Chart Parser | Recognize charts and formulas | Data charts, math formulas |
| Image Description | Generate detailed descriptions | Image understanding |
| Find & Locate | Find and annotate positions | Invoice field locating |
| Custom Prompt | Customize recognition needs | Flexible tasks |
```
+--------------------------+-------------------------------+
| Page Sidebar             | Document Viewer               |
|  - Thumbnail List        |  - High-res Image Display     |
|  - Drag & Drop Reorder   |  - OCR Overlay Toggle         |
|  - Batch Selection       |  - Zoom Controls              |
|  - Quick Actions         |  - Status Indicators          |
+--------------------------+-------------------------------+
| Processing Queue         | Result Panel                  |
|  - Real-time Progress    |  - Markdown Preview           |
|  - Cancel/Retry          |  - Word/PDF Export            |
|  - Health Monitoring     |  - Copy to Clipboard          |
+--------------------------+-------------------------------+
```
```bash
# Pull and run
docker pull neosun/deepseek-ocr:v4.1
docker run -d \
  --name deepseek-ocr \
  --gpus all \
  -p 8001:8001 \
  --shm-size=8g \
  neosun/deepseek-ocr:v4.1
# Access: http://localhost:8001
```

| Tag | Description |
|---|---|
| `latest` | Latest stable (= v4.1) |
| `v4.1` | UI improvements & model version display |
| `v4.0` | DeepSeek-OCR-2 model upgrade |
| `v3.6` | Backend concurrency & rate limiting |
| `v3.5` | Vue 3 frontend version |
| `v3.3.1-fix-bfloat16` | BFloat16 compatibility fix |
```bash
# Clone and setup
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI
# Create conda environment
conda create -n deepseek-ocr python=3.11
conda activate deepseek-ocr
# Install dependencies
pip install -r requirements-mac.txt
# Start service
./start.sh
# Access: http://localhost:8001
```

```bash
# With NVIDIA GPU
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
./start.sh
```

```python
import requests

# Single image OCR
with open("image.png", "rb") as f:
    response = requests.post(
        "http://localhost:8001/ocr",
        files={"file": f},
        data={"prompt_type": "ocr"},
    )
print(response.json()["text"])

# PDF OCR (all pages)
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8001/ocr-pdf",
        files={"file": f},
        data={"prompt_type": "document"},
    )
print(response.json()["merged_text"])
```

Endpoints:
- `GET /health` - Health check
- `POST /ocr` - Single image OCR
- `POST /ocr-pdf` - PDF OCR (all pages)
- `POST /pdf-to-images` - Convert PDF to images

Full API Documentation: API.md
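Batch processing of several images can be sketched as a sequential loop over the `/ocr` endpoint. `ocr_files` is a hypothetical helper, not part of the project; the `post` parameter exists only so the function can be exercised without a server, and the sketch assumes every successful response carries the same `text` field as the single-image example.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def ocr_files(
    paths: Iterable[str],
    base_url: str = "http://localhost:8001",
    prompt_type: str = "ocr",
    post: Optional[Callable] = None,
) -> Dict[str, str]:
    """Send each image to POST /ocr in turn and collect the recognized text."""
    if post is None:
        import requests  # imported lazily so the helper can be tested offline

        post = requests.post
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            response = post(
                f"{base_url}/ocr",
                files={"file": f},
                data={"prompt_type": prompt_type},
            )
        # Surface HTTP errors (including 429) instead of parsing an error body.
        response.raise_for_status()
        results[Path(path).name] = response.json()["text"]
    return results
```

Results are keyed by file name, so two inputs with the same name in different folders would overwrite each other; key by full path if that matters.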
Enable AI assistants like Claude Desktop to use OCR:
```json
{
  "mcpServers": {
    "deepseek-ocr": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"]
    }
  }
}
```

MCP Setup Guide: MCP_SETUP.md
| Language | Code | Status |
|---|---|---|
| English | en-US | Supported (default) |
| 简体中文 | zh-CN | Supported |
| 繁體中文 | zh-TW | Supported |
| 日本語 | ja-JP | Supported |
Switch language via the selector in the top-right corner.
v4.1 - UI & API Enhancements:
- OCR-2 model badge in header for instant version recognition
- Table rendering fix: white background, dark text, zebra striping
- Health API returns `model_version: "DeepSeek-OCR-2"`
- Footer updated to `v4.1 · OCR-2`
v4.0 - Major Model Upgrade:
- Upgraded to DeepSeek-OCR-2 (Visual Causal Flow)
- Dynamic resolution: (0-6)×768×768 + 1×1024×1024
- Flash Attention 2 on CUDA for optimal inference speed
- Switched from `AutoProcessor` to `AutoTokenizer`
- `image_size` upgraded from 640 to 768
- New `Dockerfile.v4.0` with pre-downloaded OCR-2 model
- Full backward compatibility with all v3.6 features
v3.6 - Performance Optimization:
- Non-blocking inference with ThreadPoolExecutor
- Concurrency control with asyncio.Semaphore (OCR: 1, PDF: 2)
- Queue system with MAX_OCR_QUEUE_SIZE and dynamic status
- Per-IP and per-Client-ID rate limiting (X-Client-ID header)
- 429 error handling (queue full, client limit, IP limit)
- Health indicator with 3 status colors (green/yellow/red)
- OCR queue popover with real-time position display
Contributors: @cloudman6 (PR #41)
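The concurrency pattern listed above (a thread pool for blocking inference, a semaphore to cap parallel jobs, and a bounded queue that rejects overflow) can be sketched roughly as follows. This is an illustration, not the project's implementation: `OcrGate` and `QueueFull` are hypothetical names, the `MAX_OCR_QUEUE_SIZE` value is assumed, and in a real web handler `QueueFull` would be translated into an HTTP 429 response.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

MAX_OCR_QUEUE_SIZE = 2  # name from the changelog; the value here is assumed


class QueueFull(Exception):
    """Raised when no more jobs are accepted (would map to HTTP 429)."""


class OcrGate:
    """Runs blocking inference off the event loop with bounded admission.

    Create instances inside a running event loop.
    """

    def __init__(self, concurrency: int = 1, max_queue: int = MAX_OCR_QUEUE_SIZE):
        self._sem = asyncio.Semaphore(concurrency)
        self._pool = ThreadPoolExecutor(max_workers=concurrency)
        self._max_queue = max_queue
        self.pending = 0  # jobs waiting for or holding the semaphore

    async def run(self, blocking_fn, *args):
        if self.pending >= self._max_queue:
            raise QueueFull("OCR queue is full")
        self.pending += 1
        try:
            async with self._sem:
                loop = asyncio.get_running_loop()
                # The event loop stays responsive while the thread does the work.
                return await loop.run_in_executor(self._pool, blocking_fn, *args)
        finally:
            self.pending -= 1

    def status(self) -> str:
        """Coarse health status: healthy, busy, or full."""
        if self.pending >= self._max_queue:
            return "full"
        return "busy" if self.pending else "healthy"
```

With `concurrency=1` and `max_queue=2`, a third simultaneous request is rejected immediately instead of waiting, which is the behavior a client sees as a 429.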
v3.5 - Complete UI Overhaul:
- Vue 3 + TypeScript + Naive UI
- Dexie.js local database
- Real-time processing queue
- Health check monitoring
- E2E test coverage (Playwright)
- GitHub links in header
Contributors: @cloudman6 (164 commits)
- Fixed GPU compatibility for RTX 20xx, GTX 10xx
- Auto-detect compute capability
- Native MPS backend for Mac M1/M2/M3/M4
- Multi-platform architecture
- PDF upload and conversion
- ModelScope auto-fallback
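The device and compute-capability detection mentioned in these entries might look something like the sketch below. It is an assumption-laden illustration: `pick_device` and `pick_cuda_dtype` are hypothetical helpers, and falling back to `float16` on older GPUs is a guess at the fix, not confirmed by the changelog. What is certain is that native bfloat16 arrived with compute capability 8.0 (Ampere), while RTX 20xx cards are 7.5 and GTX 10xx cards are 6.x.

```python
from typing import Tuple


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple Silicon MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def pick_cuda_dtype(capability: Tuple[int, int]) -> str:
    """Choose a dtype name from the GPU's (major, minor) compute capability."""
    major, _minor = capability
    # bfloat16 is natively supported from compute capability 8.0 onward.
    return "bfloat16" if major >= 8 else "float16"


# Usage with PyTorch (all three calls exist in torch):
#   import torch
#   device = pick_device(
#       torch.cuda.is_available(), torch.backends.mps.is_available()
#   )
#   if device == "cuda":
#       dtype = getattr(torch, pick_cuda_dtype(torch.cuda.get_device_capability()))
```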
| Document | Description |
|---|---|
| API.md | REST API reference |
| MCP_SETUP.md | MCP integration guide |
| DOCKER_HUB.md | Docker deployment |
| CHANGELOG.md | Version history |
Contributions welcome! Please:
- Fork this repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License.
- DeepSeek-AI - DeepSeek-OCR model
- @cloudman6 - Vue 3 frontend development
- All contributors and users



