🎨 Ready-to-use DeepSeek-OCR Web UI | Modern Interface | 7 Recognition Modes | Batch Processing | Real-time Logging | Fully Responsive

neosun100/DeepSeek-OCR-WebUI


🔍 DeepSeek-OCR-WebUI

Visit Application →

🌐 English | 简体中文 | 繁體中文 | 日本語

Intelligent OCR System · Vue 3 Modern UI · Batch Processing · Multi-Mode Support

Features • Quick Start • Screenshots • Contributors


🎉 v4.1 Update: UI Improvements & Model Version Display

v4.1 OCR-2 UI

Header shows OCR-2 model badge · Footer displays v4.1 · OCR-2

  • 🏷️ OCR-2 Model Badge - The header now shows a prominent OCR-2 badge so users instantly know the model version
  • 🎨 Table Rendering Fix - OCR-detected tables now display with white backgrounds, dark text, and zebra striping for clear readability (previously they appeared as dark, unreadable blocks)
  • 📡 Health API model_version - The /health endpoint now returns "model_version": "DeepSeek-OCR-2" for programmatic version detection
  • 🔖 Footer Version - Updated to v4.1 · OCR-2
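
For programmatic version detection, a client could read the model_version field from the /health response. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and any other fields in the health payload are not assumed.

```python
import json
from typing import Optional
from urllib.request import urlopen


def parse_model_version(body: str) -> Optional[str]:
    """Return the model_version field from a /health JSON body, or None if absent."""
    return json.loads(body).get("model_version")


def fetch_model_version(base_url: str = "http://localhost:8001") -> Optional[str]:
    """Query GET /health on a running instance and extract model_version."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return parse_model_version(resp.read().decode("utf-8"))
```

Calling fetch_model_version() against a v4.1 instance should yield "DeepSeek-OCR-2"; older versions that do not report the field would yield None.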

🎉 v4.0 Update: DeepSeek-OCR-2 Model Upgrade!

🚀 Major model upgrade to DeepSeek-OCR-2 (Visual Causal Flow) - better accuracy, higher resolution!

✨ What's New in v4.0

  • 🧠 DeepSeek-OCR-2 Model - Upgraded to the latest DeepSeek-OCR-2 with Visual Causal Flow architecture
  • 🔬 Higher Resolution - Dynamic resolution up to (0-6)×768×768 + 1×1024×1024 (was 640×640)
  • ⚡ Flash Attention 2 - Native flash_attention_2 support on CUDA for optimal inference speed
  • 🎯 Improved Accuracy - Better document understanding, chart parsing, and text recognition
  • 🔄 Full Backward Compatibility - All 7 recognition modes, REST API, and frontend unchanged
  • 🐳 Docker v4.0 - New all-in-one image with pre-downloaded OCR-2 model (Dockerfile.v4.0)
  • 📦 Unified Tokenizer - Switched from AutoProcessor to AutoTokenizer (aligned with official OCR-2 API)

🔧 Technical Changes

| Component | v3.6 (OCR v1) | v4.0 (OCR-2) |
| --- | --- | --- |
| Model | deepseek-ai/DeepSeek-OCR | deepseek-ai/DeepSeek-OCR-2 |
| image_size | 640 | 768 |
| Attention | eager | flash_attention_2 (CUDA) |
| Tokenizer | AutoProcessor | AutoTokenizer |
| Resolution | Fixed crops | Dynamic (0-6)×768 + 1×1024 |
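
The attention and tokenizer rows above could translate into loading code roughly like the sketch below. It is an illustration, not the project's actual loader: the function names are hypothetical, the "eager" fallback for non-CUDA devices and the dtype choice are assumptions, and load_ocr2 needs the transformers and torch packages (plus flash-attn on CUDA) to run.

```python
def pick_attention(device: str) -> str:
    """Return the attention backend name for a device string such as "cuda"."""
    # Assumption: anything that is not CUDA falls back to "eager".
    return "flash_attention_2" if device.startswith("cuda") else "eager"


def load_ocr2(device: str = "cuda", model_name: str = "deepseek-ai/DeepSeek-OCR-2"):
    """Load tokenizer and model via the Hugging Face transformers API."""
    # Imported lazily so pick_attention stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Flash Attention 2 requires half precision; bfloat16 on CUDA is an assumption.
    dtype = torch.bfloat16 if device.startswith("cuda") else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_name,
        trust_remote_code=True,
        torch_dtype=dtype,
        attn_implementation=pick_attention(device),
    )
    return tokenizer, model.eval().to(device)
```

Keeping the backend choice in a small pure function makes it easy to check the CUDA and non-CUDA paths without downloading the model.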

💡 All existing features from v3.6 (concurrency, rate limiting, queue management, Vue 3 frontend) are fully preserved.


🎉 v3.6 Update: Backend Concurrency & Rate Limiting!

🚀 Performance optimization with smart queue management and rate limiting!

✨ What's New in v3.6

  • ⚡ Backend Concurrency Optimization - Non-blocking inference with ThreadPoolExecutor
  • 🔒 Rate Limiting - Per-client and per-IP request limits (X-Client-ID header support)
  • 📊 Queue Management - Real-time queue status with position tracking
  • 🏥 Enhanced Health API - Queue depth, status (healthy/busy/full), and rate limit info
  • 🌐 New Languages - Added Traditional Chinese (zh-TW) and Japanese (ja-JP)
  • 🎯 429 Error Handling - Graceful handling when queue is full or rate limited

🙏 Contributors: @cloudman6 (PR #41)
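
On the client side, a 429 response can be handled by retrying with a growing delay while sending a stable X-Client-ID header. The sketch below illustrates one way to do that; post_with_retry and its parameters are hypothetical, and the HTTP call itself is passed in as a callable so the retry logic can be used with any HTTP library.

```python
import time
from typing import Callable, Dict, Tuple

# (status_code, body) as returned by whatever performs the HTTP request.
Response = Tuple[int, str]


def post_with_retry(
    send: Callable[[Dict[str, str]], Response],
    client_id: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(headers) until it stops returning 429 or attempts run out."""
    headers = {"X-Client-ID": client_id}
    status, body = send(headers)
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Exponential backoff: base_delay, 2x, 4x, ... (the schedule is an assumption).
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send(headers)
    return status, body
```

With the requests library, send could wrap a call such as requests.post(url, files=..., data=..., headers=headers) and return (response.status_code, response.text).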


🎉 v3.5 Major Update: Brand New Vue 3 Frontend!

🚀 Complete UI Overhaul with Modern Vue 3 + TypeScript Architecture!

| Home Page | Processing Page |
| --- | --- |
| Vue3 Home | Vue3 Processing |

✨ What's New in v3.5

  • 🎨 Brand New Vue 3 UI - Modern, responsive design with Naive UI components
  • ⚡ TypeScript Support - Full type safety and better developer experience
  • 📦 Dexie.js Database - Local IndexedDB for offline page management
  • 🔄 Real-time Processing Queue - Visual OCR progress with queue management
  • 🏥 Health Check System - Backend status monitoring with visual indicators
  • 📄 Enhanced PDF Support - Smooth PDF rendering with page-by-page processing
  • 🌐 i18n Ready - Built-in internationalization (EN/CN/TW/JP)
  • 🧪 E2E Testing - Comprehensive Playwright test coverage

👥 Contributors

🌟 Special Thanks to Our Amazing Contributors! 🌟

This project is the result of an outstanding collaboration. The Vue 3 frontend was developed through a successful merge of PR #34.

CloudMan
🏆 Vue 3 Frontend Lead Developer
164 commits · Complete UI Rewrite

neosun100
🎯 Project Maintainer
Backend · Docker · Integration

💡 About the Vue 3 Frontend: @cloudman6 contributed an exceptional Vue 3 + TypeScript frontend with 164 commits, including comprehensive E2E tests, modern UI components, and production-ready architecture. This collaboration transformed DeepSeek-OCR-WebUI into a professional-grade application!


📖 Introduction

DeepSeek-OCR-WebUI is an intelligent document recognition web application powered by the DeepSeek-OCR model. It provides a modern, intuitive interface for converting images and PDFs to structured text with high accuracy.

✨ Core Highlights

| Feature | Description |
| --- | --- |
| 🎯 7 Recognition Modes | Document, OCR, Chart, Find, Freeform, and more |
| 🖼️ Bounding Box Visualization | Find mode with automatic position annotation |
| 📦 Batch Processing | Process multiple images/pages sequentially |
| 📄 PDF Support | Upload PDFs, auto-convert to images |
| 🎨 Modern Vue 3 UI | Responsive design with Naive UI |
| 🌐 Multilingual | EN, 简体中文, 繁體中文, 日本語 |
| 🍎 Apple Silicon | Native MPS acceleration for M1/M2/M3/M4 |
| 🐳 Docker Ready | One-command deployment |
| ⚡ GPU Acceleration | NVIDIA CUDA support |

🚀 Features

7 Recognition Modes

| Mode | Icon | Description | Use Cases |
| --- | --- | --- | --- |
| Doc to Markdown | 📄 | Preserve format and layout | Contracts, papers, reports |
| General OCR | 📝 | Extract all visible text | Image text extraction |
| Plain Text | 📋 | Pure text without format | Simple text recognition |
| Chart Parser | 📊 | Recognize charts and formulas | Data charts, math formulas |
| Image Description | 🖼️ | Generate detailed descriptions | Image understanding |
| Find & Locate | 🔍 | Find and annotate positions | Invoice field locating |
| Custom Prompt | ✨ | Customize recognition needs | Flexible tasks |

🆕 Vue 3 Frontend Features

┌─────────────────────────────────────────────────────────────┐
│  📁 Page Sidebar          │  📄 Document Viewer             │
│  ├─ Thumbnail List        │  ├─ High-res Image Display      │
│  ├─ Drag & Drop Reorder   │  ├─ OCR Overlay Toggle          │
│  ├─ Batch Selection       │  ├─ Zoom Controls               │
│  └─ Quick Actions         │  └─ Status Indicators           │
├─────────────────────────────────────────────────────────────┤
│  🔄 Processing Queue      │  📝 Result Panel                │
│  ├─ Real-time Progress    │  ├─ Markdown Preview            │
│  ├─ Cancel/Retry          │  ├─ Word/PDF Export             │
│  └─ Health Monitoring     │  └─ Copy to Clipboard           │
└─────────────────────────────────────────────────────────────┘

🖼️ Screenshots

Home Page

Vue3 Home Page

Clean, modern landing page with quick access to all features

Processing Interface

Vue3 Processing Page

Full-featured document processing with sidebar, viewer, and results panel

Quick Start Guide

Step-by-step guide: Import files → Select pages → Choose OCR mode → Get results


📦 Quick Start

🐳 Docker (Recommended)

# Pull and run
docker pull neosun/deepseek-ocr:v4.1
docker run -d \
  --name deepseek-ocr \
  --gpus all \
  -p 8001:8001 \
  --shm-size=8g \
  neosun/deepseek-ocr:v4.1

# Access: http://localhost:8001

Available Docker Tags

| Tag | Description |
| --- | --- |
| latest | Latest stable (= v4.1) |
| v4.1 | UI improvements & model version display |
| v4.0 | DeepSeek-OCR-2 model upgrade |
| v3.6 | Backend concurrency & rate limiting |
| v3.5 | Vue 3 frontend version |
| v3.3.1-fix-bfloat16 | BFloat16 compatibility fix |

🍎 Mac (Apple Silicon)

# Clone and setup
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# Create conda environment
conda create -n deepseek-ocr python=3.11
conda activate deepseek-ocr

# Install dependencies
pip install -r requirements-mac.txt

# Start service
./start.sh
# Access: http://localhost:8001

🐧 Linux (Native)

# With NVIDIA GPU
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
./start.sh

🔌 API & Integration

REST API

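import requests

# Single image OCR: upload the file as multipart form data
with open("image.png", "rb") as f:
    response = requests.post(
        "http://localhost:8001/ocr",
        files={"file": f},
        data={"prompt_type": "ocr"},
    )
response.raise_for_status()  # surface HTTP errors such as 429 (queue full or rate limited)
print(response.json()["text"])

# PDF OCR (all pages): the merged text of every page is returned in one field
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8001/ocr-pdf",
        files={"file": f},
        data={"prompt_type": "document"},
    )
response.raise_for_status()
print(response.json()["merged_text"])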

Endpoints:

  • GET /health - Health check
  • POST /ocr - Single image OCR
  • POST /ocr-pdf - PDF OCR (all pages)
  • POST /pdf-to-images - Convert PDF to images

📖 Full API Documentation: API.md

MCP (Model Context Protocol)

Enable AI assistants like Claude Desktop to use OCR:

{
  "mcpServers": {
    "deepseek-ocr": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"]
    }
  }
}

📖 MCP Setup Guide: MCP_SETUP.md
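
Because the config above needs the full path to mcp_server.py, it can be convenient to generate the entry from a script. This is a small sketch with a hypothetical helper name; where the resulting JSON must be saved depends on the MCP client and is not assumed here.

```python
import json
from pathlib import Path


def mcp_config(server_script: str, python_cmd: str = "python") -> dict:
    """Build the mcpServers entry with an absolute path to the server script."""
    script = Path(server_script).expanduser().resolve()
    return {
        "mcpServers": {
            "deepseek-ocr": {"command": python_cmd, "args": [str(script)]}
        }
    }


def mcp_config_json(server_script: str) -> str:
    """Render the entry as indented JSON, ready to paste into a client config."""
    return json.dumps(mcp_config(server_script), indent=2)
```

Running mcp_config_json("mcp_server.py") from the repository root would produce the same structure as the snippet above, with the placeholder path replaced by the resolved one.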


🌐 Multilingual Support

| Language | Code | Status |
| --- | --- | --- |
| 🇺🇸 English | en-US | ✅ Default |
| 🇨🇳 简体中文 | zh-CN | ✅ |
| 🇹🇼 繁體中文 | zh-TW | ✅ |
| 🇯🇵 日本語 | ja-JP | ✅ |

Switch language via the selector in the top-right corner.


📊 Version History

v4.1 (2026-02-20) - UI Improvements & Model Version Display

🏷️ UI & API Enhancements:

  • ✅ OCR-2 model badge in header for instant version recognition
  • ✅ Table rendering fix: white background, dark text, zebra striping
  • ✅ Health API returns model_version: "DeepSeek-OCR-2"
  • ✅ Footer updated to v4.1 · OCR-2

v4.0 (2026-02-20) - DeepSeek-OCR-2 Model Upgrade

🧠 Major Model Upgrade:

  • ✅ Upgraded to DeepSeek-OCR-2 (Visual Causal Flow)
  • ✅ Dynamic resolution: (0-6)×768×768 + 1×1024×1024
  • ✅ Flash Attention 2 on CUDA for optimal inference speed
  • ✅ Switched from AutoProcessor to AutoTokenizer
  • ✅ image_size upgraded from 640 to 768
  • ✅ New Dockerfile.v4.0 with pre-downloaded OCR-2 model
  • ✅ Full backward compatibility with all v3.6 features

v3.6 (2026-01-20) - Backend Concurrency & Rate Limiting

⚡ Performance Optimization:

  • ✅ Non-blocking inference with ThreadPoolExecutor
  • ✅ Concurrency control with asyncio.Semaphore (OCR: 1, PDF: 2)
  • ✅ Queue system with MAX_OCR_QUEUE_SIZE and dynamic status
  • ✅ Per-IP and per-Client-ID rate limiting (X-Client-ID header)
  • ✅ 429 error handling (queue full, client limit, IP limit)
  • ✅ Health indicator with 3 status colors (green/yellow/red)
  • ✅ OCR queue popover with real-time position display

🙏 Contributors: @cloudman6 (PR #41)
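
The v3.6 combination of ThreadPoolExecutor, asyncio.Semaphore, and a bounded queue can be illustrated with the sketch below. It is not the project's implementation: OcrGate and QueueFull are hypothetical names, the queue size of 8 is an assumed value, and infer stands in for the blocking model call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

MAX_OCR_QUEUE_SIZE = 8  # assumed value; the project's default is not stated here


class QueueFull(Exception):
    """Raised when the queue is at capacity (a server would answer HTTP 429)."""


class OcrGate:
    """Run a blocking inference function without blocking the event loop."""

    def __init__(self, infer, concurrency=1, max_queue=MAX_OCR_QUEUE_SIZE):
        self._infer = infer
        self._sem = asyncio.Semaphore(concurrency)  # limits concurrent inferences
        self._pool = ThreadPoolExecutor(max_workers=concurrency)
        self._max_queue = max_queue
        self.pending = 0  # requests currently waiting or running

    async def submit(self, image):
        if self.pending >= self._max_queue:
            raise QueueFull("OCR queue is full")
        self.pending += 1
        try:
            async with self._sem:
                loop = asyncio.get_running_loop()
                # The blocking call runs in a worker thread, so the loop stays free.
                return await loop.run_in_executor(self._pool, self._infer, image)
        finally:
            self.pending -= 1
```

In a web handler, catching QueueFull and returning status 429 would give clients the signal to back off, and pending could feed a queue-depth field in the health response.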

v3.5 (2026-01-17) - Vue 3 Frontend

🎨 Complete UI Overhaul:

  • ✅ Vue 3 + TypeScript + Naive UI
  • ✅ Dexie.js local database
  • ✅ Real-time processing queue
  • ✅ Health check monitoring
  • ✅ E2E test coverage (Playwright)
  • ✅ GitHub links in header

🙏 Contributors: @cloudman6 (164 commits)

v3.3.1 (2025-12-16) - BFloat16 Fix

  • ✅ Fixed GPU compatibility for RTX 20xx, GTX 10xx
  • ✅ Auto-detect compute capability
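
Auto-detecting compute capability matters because native bfloat16 support starts with compute capability 8.0 (Ampere), while RTX 20xx cards report 7.5 and GTX 10xx cards report 6.x. A minimal sketch of such a check follows; pick_dtype is a hypothetical name, and falling back to float16 (rather than float32) is an assumption about the fix.

```python
from typing import Tuple


def pick_dtype(capability: Tuple[int, int]) -> str:
    """Choose a dtype name from a CUDA compute capability (major, minor)."""
    major, _minor = capability
    # bfloat16 is natively supported from compute capability 8.0 onward.
    # Assumption: older GPUs fall back to float16.
    return "bfloat16" if major >= 8 else "float16"
```

With PyTorch installed, the capability tuple can be obtained from torch.cuda.get_device_capability(), and the returned name maps to torch.bfloat16 or torch.float16.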

v3.3 (2025-11-05) - Apple Silicon

  • ✅ Native MPS backend for Mac M1/M2/M3/M4
  • ✅ Multi-platform architecture
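
A multi-platform setup typically selects the compute backend at startup. The sketch below shows one plausible selection order; pick_device is a hypothetical helper, and preferring CUDA over MPS over CPU is an assumption rather than the project's documented behavior.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the backend name to use, given which accelerators are present."""
    if cuda_available:
        return "cuda"  # NVIDIA GPU
    if mps_available:
        return "mps"   # Apple Silicon via Metal Performance Shaders
    return "cpu"
```

With PyTorch, the two flags can come from torch.cuda.is_available() and torch.backends.mps.is_available().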

v3.2 (2025-11-04) - PDF Support

  • ✅ PDF upload and conversion
  • ✅ ModelScope auto-fallback

📖 Documentation

| Document | Description |
| --- | --- |
| API.md | REST API reference |
| MCP_SETUP.md | MCP integration guide |
| DOCKER_HUB.md | Docker deployment |
| CHANGELOG.md | Version history |

📈 Star History

Star History Chart

⭐ If this project helps you, please give it a Star! ⭐


🤝 Contributing

Contributions are welcome! Please:

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License.


🙏 Acknowledgments


Made with ❤️ by neosun100 & cloudman6

DeepSeek-OCR-WebUI v3.5 | © 2026
