A comprehensive benchmarking tool to evaluate OCR performance of Ollama vision models against Supernote's on-device transcription. This tool measures Word Error Rate (WER) and latency to help identify the best model for handwriting recognition on e-ink notes.
- Auto-detection: Automatically discovers all available Ollama vision models
- Flexible Input: Supports both image files (PNG/JPG) and PDF files
- Comprehensive Metrics: Primary WER measurement with detailed latency statistics
- Export Options: Results exported as JSON and CSV for analysis
- Caching: Caches transcriptions to avoid redundant API calls
- Robust Error Handling: Detailed logging and graceful error recovery
- Install uv (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install the package and dependencies:

  ```bash
  uv pip install -e .
  ```

  Or sync dependencies:

  ```bash
  uv sync
  ```

- Install system dependencies for PDF processing:

  ```bash
  # macOS
  brew install poppler

  # Ubuntu/Debian
  sudo apt-get install poppler-utils

  # Windows
  # Download poppler from https://github.com/oschwartz10612/poppler-windows/releases
  ```
- Prepare your dataset following the structure described below
- Run the benchmark:
  ```bash
  uv run python -m src.cli run --dataset-path ./dataset --output ./results
  ```

Note: All commands should be prefixed with `uv run` to use the virtual environment, or activate it manually:

```bash
source .venv/bin/activate  # On macOS/Linux
python -m src.cli run --dataset-path ./dataset --output ./results
```

- Clone this repository:
  ```bash
  git clone <repository-url>
  cd supernote-ocr
  ```

- Install uv (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install the package and dependencies:

  ```bash
  uv pip install -e .
  ```

  Or use `uv sync` (creates a virtual environment automatically):

  ```bash
  uv sync
  ```

  This will create a `.venv` directory with all dependencies installed.
- Install traditional OCR engines (optional but recommended):

  For Tesseract (most common, fast):

  ```bash
  # macOS
  brew install tesseract

  # Ubuntu/Debian
  sudo apt-get install tesseract-ocr

  # The Python wrapper is already included in the dependencies
  ```

  For EasyOCR (better accuracy, slower):

  ```bash
  uv pip install easyocr
  ```

  For PaddleOCR (state-of-the-art, best accuracy):

  ```bash
  uv pip install paddlepaddle paddleocr
  ```

  Or install all traditional OCR engines:

  ```bash
  uv pip install -e ".[traditional-ocr]"
  ```

- Configure the tool:
  ```bash
  cp config.yaml.example config.yaml
  # Edit config.yaml with your settings
  ```

Alternatively, install from PyPI:

```bash
uv pip install supernote-ocr
# or
pip install supernote-ocr
```

The tool expects a flat directory structure with matching file names:
```
dataset/
├── note1.png
├── note1_groundtruth.txt
├── note1_supernote.txt
├── note2.jpg
├── note2_groundtruth.txt
├── note2_supernote.txt
├── note3.pdf
├── note3_groundtruth.txt
└── note3_supernote.txt
```
File naming convention:

- Note files: `{name}.{ext}` (e.g., `note1.png`, `note2.pdf`)
- Ground truth: `{name}_groundtruth.txt`
- Supernote transcription: `{name}_supernote.txt`

Supported image formats: PNG, JPG, JPEG

Supported document formats: PDF
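The pairing logic implied by this naming convention can be sketched in Python. Note that `find_samples` is a hypothetical helper for illustration; the actual loader lives in `src/dataset.py`:

```python
from pathlib import Path

NOTE_EXTS = {".png", ".jpg", ".jpeg", ".pdf"}

def find_samples(dataset_dir):
    """Pair each note file with its ground-truth and Supernote transcripts.

    Illustrative sketch of the naming convention above; notes missing
    either companion file are skipped.
    """
    samples = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.suffix.lower() not in NOTE_EXTS:
            continue
        gt = path.with_name(f"{path.stem}_groundtruth.txt")
        sn = path.with_name(f"{path.stem}_supernote.txt")
        if gt.exists() and sn.exists():
            samples.append((path, gt, sn))
    return samples
```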
Edit `config.yaml` to configure:
- Dataset path
- Supernote API credentials (if using API)
- Output directory for results
- Model selection (optional, defaults to all vision models)
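A minimal `config.yaml` might look like the following. The key names here are illustrative assumptions; consult `config.yaml.example` for the authoritative schema:

```yaml
# Hypothetical layout -- key names are illustrative
dataset_path: ./dataset
output_dir: ./results

supernote:
  api_key: "your-api-key"   # only needed when fetching transcriptions via the API

models: []                  # empty list = benchmark all detected vision models
```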
```bash
uv run python -m src.cli list-models
```

This command will automatically detect all vision-capable models installed in your Ollama instance.
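One way such detection can work: Ollama's `GET /api/tags` endpoint lists installed models, and vision-capable ones typically report a vision family (e.g. `clip` for llava-style models, `mllama` for llama3.2-vision) in their `details.families` metadata. A sketch under those assumptions, with hypothetical helper names:

```python
import json
from urllib.request import urlopen

# Assumption: these family tags indicate vision support in Ollama metadata.
VISION_FAMILIES = {"clip", "mllama"}

def filter_vision_models(tags):
    """Return names of vision-capable models from an Ollama /api/tags payload."""
    names = []
    for model in tags.get("models", []):
        families = set(model.get("details", {}).get("families") or [])
        if families & VISION_FAMILIES:
            names.append(model["name"])
    return names

def list_vision_models(host="http://localhost:11434"):
    """Query a running Ollama instance for vision models (hypothetical helper)."""
    with urlopen(f"{host}/api/tags") as resp:
        return filter_vision_models(json.load(resp))
```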
```bash
uv run python -m src.cli validate-dataset --dataset-path ./dataset
```

Validates that all note files have corresponding ground truth and Supernote transcription files.
```bash
uv run python -m src.cli run --dataset-path ./dataset --output ./results
```

Run the full benchmark comparing all models. You can also specify particular models:
```bash
uv run python -m src.cli run --dataset-path ./dataset --output ./results --models llama3.2-vision --models llava
```

```bash
uv run python -m src.cli run --help
```

Available options:

- `--dataset-path`: Override dataset path from config
- `--output`: Override output directory from config
- `--models`: Specify models to test (can be used multiple times)
- `--verbose`: Enable debug logging
- `--config`: Specify custom config file path
Results are exported in two formats:

- JSON (`results/results.json`): Detailed per-note and per-model results
- CSV (`results/summary.csv`): Summary table with aggregated metrics
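Writing the CSV summary can be sketched as follows; the column set is an assumption based on the metrics described in this README, not the tool's exact schema:

```python
import csv

def write_summary(rows, path):
    """Write aggregated per-model metrics to a CSV summary.

    `rows` is a list of dicts keyed by the (assumed) column names below.
    """
    fields = ["model", "mean_wer", "mean_latency_s", "p95_latency_s", "notes"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```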
- WER (Word Error Rate): Primary metric, calculated using the jiwer library
- Latency: Mean, median, p95, and p99 inference times per model
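The tool computes WER with jiwer, but the underlying idea is plain word-level edit distance. A minimal pure-Python sketch (jiwer also applies text normalization that this illustration omits):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (r != h)))     # substitution
        prev = cur
    return prev[-1] / len(ref)
```

For example, transcribing "the cat sat" as "the bat sat" is one substitution out of three reference words, i.e. a WER of 1/3.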
If you're using the Supernote API to fetch transcriptions, configure your API credentials in config.yaml. The tool will cache transcriptions locally to avoid repeated API calls.
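The caching behavior might look like the sketch below. The cache location and `fetch` callback are hypothetical; keying on a hash of the note's bytes means editing a note invalidates its cached transcription:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".cache/transcriptions")  # location is an assumption

def cached_transcription(note_path, fetch):
    """Return a cached Supernote transcription, calling `fetch` only on a miss."""
    digest = hashlib.sha256(Path(note_path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())["text"]
    text = fetch(note_path)  # e.g. a Supernote API call
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"text": text}))
    return text
```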
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
```bash
python -m pytest tests/
```

Or using unittest:
```bash
python -m unittest discover tests
```

```
supernote-ocr/
├── src/                    # Main source code
│   ├── dataset.py          # Dataset loading
│   ├── ollama_client.py    # Ollama API client
│   ├── supernote_client.py # Supernote API client
│   ├── metrics.py          # WER and latency calculations
│   ├── benchmark.py        # Benchmark orchestrator
│   ├── config.py           # Configuration management
│   └── cli.py              # Command-line interface
├── tests/                  # Unit tests
├── examples/               # Example datasets
└── config.yaml             # Configuration file
```
This tool was created for benchmarking OCR models on Supernote handwriting. Results and analysis will be published in a blog post.
- Built for comparing Ollama vision models against Supernote's on-device transcription
- Uses jiwer for WER calculations
- Designed to run locally on Apple Silicon (M4) for optimal performance