borismus/supernote-ocr-bench

Supernote OCR Benchmark Tool

A comprehensive benchmarking tool to evaluate OCR performance of Ollama vision models against Supernote's on-device transcription. This tool measures Word Error Rate (WER) and latency to help identify the best model for handwriting recognition on e-ink notes.

Features

  • Auto-detection: Automatically discovers all available Ollama vision models
  • Flexible Input: Supports both image files (PNG/JPG) and PDF files
  • Comprehensive Metrics: Primary WER measurement with detailed latency statistics
  • Export Options: Results exported as JSON and CSV for analysis
  • Caching: Caches transcriptions to avoid redundant API calls
  • Robust Error Handling: Detailed logging and graceful error recovery

Quick Start

  1. Install uv (if not already installed):

     curl -LsSf https://astral.sh/uv/install.sh | sh

  2. Install the package and dependencies:

     uv pip install -e .

     Or sync dependencies:

     uv sync

  3. Install system dependencies for PDF processing:

     # macOS
     brew install poppler

     # Ubuntu/Debian
     sudo apt-get install poppler-utils

     # Windows
     # Download poppler from https://github.com/oschwartz10612/poppler-windows/releases

  4. Prepare your dataset following the structure described below.

  5. Run the benchmark:

     uv run python -m src.cli run --dataset-path ./dataset --output ./results

Note: All commands should be prefixed with uv run to use the virtual environment, or activate it manually:

source .venv/bin/activate  # On macOS/Linux
python -m src.cli run --dataset-path ./dataset --output ./results

Installation

From Source

  1. Clone this repository:

     git clone <repository-url>
     cd supernote-ocr

  2. Install uv (if not already installed):

     curl -LsSf https://astral.sh/uv/install.sh | sh

  3. Install the package and dependencies:

     uv pip install -e .

     Or use uv sync (creates the virtual environment automatically):

     uv sync

     This creates a .venv directory with all dependencies installed.

  4. Install traditional OCR engines (optional but recommended):

     For Tesseract (most common, fast):

     # macOS
     brew install tesseract

     # Ubuntu/Debian
     sudo apt-get install tesseract-ocr

     # The Python wrapper is already included in the dependencies.

     For EasyOCR (better accuracy, slower):

     uv pip install easyocr

     For PaddleOCR (often the most accurate, but a heavier install):

     uv pip install paddlepaddle paddleocr

     Or install all traditional OCR engines:

     uv pip install -e ".[traditional-ocr]"

  5. Configure the tool:

     cp config.yaml.example config.yaml
     # Edit config.yaml with your settings

Using uv (if published)

uv pip install supernote-ocr

Using pip (if published)

pip install supernote-ocr

Dataset Structure

The tool expects a flat directory structure with matching file names:

dataset/
├── note1.png
├── note1_groundtruth.txt
├── note1_supernote.txt
├── note2.jpg
├── note2_groundtruth.txt
├── note2_supernote.txt
├── note3.pdf
├── note3_groundtruth.txt
└── note3_supernote.txt

File naming convention:

  • Note files: {name}.{ext} (e.g., note1.png, note2.pdf)
  • Ground truth: {name}_groundtruth.txt
  • Supernote transcription: {name}_supernote.txt

Supported image formats: PNG, JPG, JPEG
Supported document formats: PDF
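The pairing logic implied by this convention can be sketched as follows. This is an illustration of the naming rules above, not the tool's actual API; the helper name collect_notes is invented for the example.

```python
from pathlib import Path

NOTE_EXTS = {".png", ".jpg", ".jpeg", ".pdf"}

def collect_notes(dataset_dir: str):
    """Pair each note file with its ground truth and Supernote transcription."""
    root = Path(dataset_dir)
    notes = []
    for f in sorted(root.iterdir()):
        if f.suffix.lower() in NOTE_EXTS:
            gt = root / f"{f.stem}_groundtruth.txt"   # {name}_groundtruth.txt
            sn = root / f"{f.stem}_supernote.txt"     # {name}_supernote.txt
            if gt.exists() and sn.exists():
                notes.append((f, gt, sn))
    return notes
```

Notes missing either companion file are skipped here; the tool's validate-dataset command reports them instead.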

Configuration

Edit config.yaml to configure:

  • Dataset path
  • Supernote API credentials (if using API)
  • Output directory for results
  • Model selection (optional, defaults to all vision models)
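A configuration might look like the following. The key names here are illustrative assumptions; check config.yaml.example in the repository for the actual schema.

```yaml
# Illustrative config.yaml -- key names are examples only
dataset_path: ./dataset
output_dir: ./results
models:              # optional; omit to benchmark all detected vision models
  - llama3.2-vision
  - llava
```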

Usage

List available Ollama vision models

uv run python -m src.cli list-models

This command will automatically detect all vision-capable models installed in your Ollama instance.
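One plausible heuristic for this detection is filtering model names against known vision-model families. This is a hypothetical sketch; the tool's actual logic may instead inspect model metadata from Ollama's /api/tags endpoint.

```python
# Hypothetical name-based filter; VISION_HINTS is an assumed list,
# not the tool's actual detection criteria.
VISION_HINTS = ("llava", "vision", "bakllava", "moondream", "minicpm-v")

def filter_vision_models(model_names):
    """Keep models whose names suggest vision capability."""
    return [m for m in model_names if any(h in m.lower() for h in VISION_HINTS)]

print(filter_vision_models(["llama3.2-vision:11b", "llava:13b", "mistral:7b"]))
# → ['llama3.2-vision:11b', 'llava:13b']
```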

Validate your dataset

uv run python -m src.cli validate-dataset --dataset-path ./dataset

Validates that all note files have corresponding ground truth and Supernote transcription files.

Run benchmark

uv run python -m src.cli run --dataset-path ./dataset --output ./results

Runs the full benchmark across all detected models. You can also limit the run to particular models:

uv run python -m src.cli run --dataset-path ./dataset --output ./results --models llama3.2-vision --models llava

Command-line options

uv run python -m src.cli run --help

Available options:

  • --dataset-path: Override dataset path from config
  • --output: Override output directory from config
  • --models: Select specific models to test (can be used multiple times)
  • --verbose: Enable debug logging
  • --config: Specify custom config file path

Output

Results are exported in two formats:

  1. JSON (results/results.json): Detailed per-note and per-model results
  2. CSV (results/summary.csv): Summary table with aggregated metrics
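The CSV summary is convenient for quick analysis. A minimal sketch, assuming illustrative column names ("model", "wer", "mean_latency_s") -- check the actual headers in your summary.csv:

```python
import csv
import io

# Illustrative rows; real summary.csv columns may differ.
SAMPLE = (
    "model,wer,mean_latency_s\n"
    "llava:13b,0.21,4.8\n"
    "llama3.2-vision:11b,0.15,6.2\n"
)

def best_model(csv_text: str) -> str:
    """Return the model with the lowest WER."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return min(rows, key=lambda r: float(r["wer"]))["model"]

print(best_model(SAMPLE))  # → llama3.2-vision:11b
```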

Metrics

  • WER (Word Error Rate): Primary metric, calculated using the jiwer library
  • Latency: Mean, median, p95, and p99 inference times per model
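For intuition, WER is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. The tool uses jiwer for this; the sketch below is a minimal stand-alone illustration of the same definition.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # → 0.25
```

One substituted word out of four reference words gives a WER of 0.25; a WER above 1.0 is possible when the hypothesis contains many insertions.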

Supernote API Integration

If you're using the Supernote API to fetch transcriptions, configure your API credentials in config.yaml. The tool will cache transcriptions locally to avoid repeated API calls.
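The caching idea can be sketched as follows. This is a hypothetical illustration, not the tool's actual implementation (which presumably persists its cache to disk between runs).

```python
import hashlib

def get_or_fetch(image_bytes: bytes, fetch, cache: dict) -> str:
    """Return a cached transcription, calling fetch() only on a cache miss."""
    # Key on a content hash so renamed files still hit the cache.
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in cache:
        cache[key] = fetch(image_bytes)  # e.g. a Supernote API call
    return cache[key]
```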

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Development

Running Tests

uv run python -m pytest tests/

Or using unittest:

uv run python -m unittest discover tests

Project Structure

supernote-ocr/
├── src/              # Main source code
│   ├── dataset.py    # Dataset loading
│   ├── ollama_client.py  # Ollama API client
│   ├── supernote_client.py  # Supernote API client
│   ├── metrics.py    # WER and latency calculations
│   ├── benchmark.py  # Benchmark orchestrator
│   ├── config.py     # Configuration management
│   └── cli.py        # Command-line interface
├── tests/            # Unit tests
├── examples/         # Example datasets
└── config.yaml       # Configuration file

Blog Post

This tool was created for benchmarking OCR models on Supernote handwriting. Results and analysis will be published in a blog post.

Acknowledgments

  • Built for comparing Ollama vision models against Supernote's on-device transcription
  • Uses jiwer for WER calculations
  • Designed to run locally on Apple Silicon (M4) for optimal performance
