USAGE.md

Client-Side OCR - Usage Guide

A powerful, privacy-focused OCR (Optical Character Recognition) web application that runs entirely in your browser. No server uploads, no data tracking - your documents stay on your device.

Features

🚀 Core Features

100% Client-Side Processing: All OCR processing happens in your browser
Multi-Language Support: Supports 100+ languages including English, Chinese, Japanese, Korean, Tamil, Hindi, Arabic, and more
High Performance: Uses ONNX Runtime with WebAssembly for fast processing
Multiple Input Methods:
- Drag & drop files
- File selection
- Camera capture (mobile)
- URL input
- Clipboard paste
File Format Support:
- Images: JPG, PNG, WebP, GIF, BMP, TIFF
- Documents: PDF (with text layer detection)
Progressive Web App (PWA): Install and use offline

🎨 Advanced Features

Image Preprocessing:
- Auto-enhancement detection
- Grayscale conversion
- Noise reduction
- Contrast adjustment
- Image sharpening
- Auto-deskew
- Background removal
Performance Monitoring:
- Real-time processing metrics
- Stage-by-stage progress tracking
- Resource usage monitoring
Export Options:
- Copy to clipboard
- Export as TXT
- Export as JSON (with coordinates)
Dark Mode Support
Responsive Design: Works on desktop, tablet, and mobile

Installation

Using npm

npm install client-side-ocr

Using yarn

yarn add client-side-ocr

Using pnpm

pnpm add client-side-ocr

CDN Usage

<!-- Add to your HTML -->
<script type="module">
  import { RapidOCREngine } from 'https://unpkg.com/client-side-ocr@latest/dist/index.js';
</script>

Usage

Basic OCR Processing

import { RapidOCREngine } from 'client-side-ocr';

// Initialize the OCR engine
const ocr = new RapidOCREngine({
  lang: 'en',  // Language code
  version: 'PP-OCRv4',  // Model version
  modelType: 'mobile'  // 'mobile' for speed, 'server' for accuracy
});

// Initialize models (one-time setup)
await ocr.initialize();

// Process an image
const imageElement = document.getElementById('myImage');
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');

canvas.width = imageElement.width;
canvas.height = imageElement.height;
ctx.drawImage(imageElement, 0, 0);

const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
const results = await ocr.process(
  imageData.data,
  imageData.width,
  imageData.height
);

// Results contain text and bounding boxes
results.forEach(result => {
  console.log('Text:', result.text);
  console.log('Confidence:', result.confidence);
  console.log('Bounding box:', result.box);
});

Image Preprocessing

import { ImagePreprocessor } from 'client-side-ocr';

// Auto-preprocess for best results
const { processed, appliedOptions } = await ImagePreprocessor.autoPreprocess(imageData);

// Or manually configure preprocessing
const preprocessed = await ImagePreprocessor.preprocess(imageData, {
  grayscale: true,
  denoise: true,
  contrast: true,
  contrastAlpha: 1.5,
  sharpen: true,
  deskew: true,
  threshold: true,
  thresholdValue: 127
});

Language Support

// Get available languages
const languages = RapidOCREngine.getAvailableLanguages();

// Check if a language is supported
if (RapidOCREngine.isLanguageSupported('ta', 'PP-OCRv4')) {
  const tamilOCR = new RapidOCREngine({ lang: 'ta' });
}

// Multi-language text (Chinese example)
const chineseOCR = new RapidOCREngine({ 
  lang: 'ch',
  modelType: 'server' // Better for complex scripts
});

PDF Processing

import { pdfProcessor } from 'client-side-ocr';

// Load and process PDF
const pdfFile = document.getElementById('pdfInput').files[0];
const { pages, pdf } = await pdfProcessor.extractTextFromPDF(pdfFile, {
  scale: 2.0,  // Higher scale for better OCR
  maxPages: 10,  // Limit pages
  onProgress: (progress) => {
    console.log(`Processing page ${progress.currentPage}/${progress.totalPages}`);
  }
});

// Check if PDF has selectable text
const hasText = await pdfProcessor.hasSelectableText(pdf);
if (hasText) {
  const nativeText = await pdfProcessor.extractNativeText(pdf);
  console.log('Extracted text:', nativeText);
} else {
  // Process with OCR
  for (const page of pages) {
    const results = await ocr.process(
      page.imageData.data,
      page.width,
      page.height
    );
  }
}

// Clean up
pdfProcessor.destroy(pdf);

Performance Monitoring

// Set progress callback
ocr.setProgressCallback((progress) => {
  console.log(`${progress.stage}: ${Math.round(progress.progress * 100)}%`);
});

// Monitor download progress for models
ocr.setDownloadProgressCallback((progress) => {
  console.log(`Downloading ${progress.file}: ${progress.progress}%`);
});

// Check if models are cached
const modelsAvailable = await ocr.areModelsAvailable();
if (!modelsAvailable) {
  await ocr.downloadModels();
}

API Documentation

RapidOCREngine

Constructor Options

interface OCREngineOptions {
  lang?: LangType;         // Language code (default: 'en')
  version?: OCRVersion;    // Model version (default: 'PP-OCRv4')
  modelType?: ModelType;   // 'mobile' | 'server' (default: 'mobile')
  config?: Partial<OCRConfig>;  // Advanced configuration
  modelBasePath?: string;  // Custom model path
  enableWordBoxes?: boolean;  // Enable word-level boxes
}

Methods

initialize(): Promise<void> - Initialize the OCR engine
process(imageData: Uint8ClampedArray, width: number, height: number): Promise<OCRResult[]> - Process an image
setProgressCallback(callback: (progress: OCRProgress) => void): void - Set progress callback
setDownloadProgressCallback(callback: (progress: DownloadProgress) => void): void - Set download progress callback
areModelsAvailable(): Promise<boolean> - Check if models are cached
downloadModels(): Promise<void> - Download models if not cached
dispose(): void - Clean up resources

ImagePreprocessor

Methods

preprocess(imageData: ImageData, options: PreprocessingOptions): Promise<ImageData> - Apply preprocessing
autoPreprocess(imageData: ImageData): Promise<{ processed: ImageData, appliedOptions: PreprocessingOptions }> - Auto-detect best preprocessing

Preprocessing Options

interface PreprocessingOptions {
  grayscale?: boolean;         // Convert to grayscale
  threshold?: boolean;         // Apply binary threshold
  thresholdValue?: number;     // Threshold value (0-255)
  denoise?: boolean;           // Remove noise
  denoiseStrength?: number;    // Noise reduction strength
  contrast?: boolean;          // Enhance contrast
  contrastAlpha?: number;      // Contrast factor
  contrastBeta?: number;       // Brightness adjustment
  sharpen?: boolean;           // Sharpen image
  deskew?: boolean;            // Auto-straighten text
  removeBackground?: boolean;  // Simple background removal
  scale?: number;              // Image scale factor
}

Configuration

Environment Variables

Create a .env file in your project root:

# Base URL for deployment
VITE_BASE_URL=/client-ocr/

# Model CDN (optional, defaults to jsDelivr)
VITE_MODEL_CDN=https://cdn.jsdelivr.net/npm/

# Enable debug mode
VITE_DEBUG=false

Language Configuration

Languages are configured in src/core/language-models.ts. Each language includes:

{
  name: string;           // Display name
  nativeName: string;     // Native script name
  direction: 'ltr'|'rtl'; // Text direction
  models: {
    det: {...},  // Detection models
    rec: {...},  // Recognition models
    cls: {...}   // Classification models
  }
}

Development

Prerequisites

Node.js 18+
npm/yarn/pnpm

Setup

# Clone the repository
git clone https://github.com/yourusername/client-side-ocr.git
cd client-side-ocr

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

Project Structure

client-side-ocr/
├── src/
│   ├── core/                 # Core OCR functionality
│   │   ├── rapid-ocr-engine.ts
│   │   ├── language-models.ts
│   │   ├── preprocessing/
│   │   └── postprocessing/
│   ├── workers/              # Web Workers for processing
│   ├── ui/                   # React components
│   └── types/                # TypeScript types
├── public/                   # Static assets
└── dist/                     # Build output

Testing

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

# Run e2e tests
npm run test:e2e

Deployment

GitHub Pages

# Build and deploy to GitHub Pages
npm run build
npm run deploy

Self-Hosting

Build the project:

npm run build

Serve the dist folder with any static file server:

npx serve dist -p 3000

Docker

FROM nginx:alpine
COPY dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Build and run:

docker build -t client-ocr .
docker run -p 8080:80 client-ocr

Performance Tips

Use appropriate model type:
- mobile: Faster, good for real-time processing
- server: More accurate, better for complex documents
Preprocess images:
- Use auto-preprocessing for best results
- Enable specific options based on your input
Optimize for your use case:
- For mobile: Use lower resolution images
- For accuracy: Use higher resolution and server models
Cache models:
- Models are cached in IndexedDB after first download
- Check areModelsAvailable() before processing

Browser Support

Chrome 90+
Firefox 88+
Safari 15.4+
Edge 90+

WebAssembly and Web Workers are required.

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

RapidOCR for the OCR models
ONNX Runtime Web for inference
OpenCV.js for image processing
PDF.js for PDF handling

Support

📧 Email: [email protected]
🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions
📖 Docs: Documentation

FilesExpand file tree

USAGE.md

Latest commit

History

USAGE.md

File metadata and controls

Client-Side OCR - Usage Guide

Table of Contents

Features

🚀 Core Features

🎨 Advanced Features

Installation

Using npm

Using yarn

Using pnpm

CDN Usage

Usage

Basic OCR Processing

Image Preprocessing

Language Support

PDF Processing

Performance Monitoring

API Documentation

RapidOCREngine

Constructor Options

Methods

ImagePreprocessor

Methods

Preprocessing Options

Configuration

Environment Variables

Language Configuration

Development

Prerequisites

Setup

Project Structure

Testing

Deployment

GitHub Pages

Self-Hosting

Docker

Performance Tips

Browser Support

Contributing

Development Workflow

License

Acknowledgments

Support