A comprehensive benchmarking tool to evaluate OCR performance of Ollama vision models against Supernote's on-device transcription. This tool measures Word Error Rate (WER) and latency to help identify the best model for handwriting recognition on e-ink notes.
- Auto-detection: Automatically discovers all available Ollama vision models
- Flexible Input: Supports both image files (PNG/JPG) and PDF files
- Comprehensive Metrics: Primary WER measurement with detailed latency statistics
- Export Options: Results exported as JSON and CSV for analysis
- Caching: Caches transcriptions to avoid redundant API calls
- Robust Error Handling: Detailed logging and graceful error recovery
- Install uv (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install the package and dependencies:

  ```bash
  uv pip install -e .
  ```

  Or sync dependencies:

  ```bash
  uv sync
  ```

- Install system dependencies for PDF processing:

  ```bash
  # macOS
  brew install poppler

  # Ubuntu/Debian
  sudo apt-get install poppler-utils

  # Windows
  # Download poppler from https://github.com/oschwartz10612/poppler-windows/releases
  ```
- Prepare your dataset following the structure described below
- Run the benchmark:
  ```bash
  uv run python -m src.cli run --dataset-path ./dataset --output ./results
  ```

Note: All commands should be prefixed with `uv run` to use the virtual environment, or activate it manually:

```bash
source .venv/bin/activate  # On macOS/Linux
python -m src.cli run --dataset-path ./dataset --output ./results
```

- Clone this repository:
  ```bash
  git clone <repository-url>
  cd supernote-ocr
  ```

- Install uv (if not already installed):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install the package and dependencies:

  ```bash
  uv pip install -e .
  ```

  Or use `uv sync` (creates a virtual environment automatically):

  ```bash
  uv sync
  ```

  This will create a `.venv` directory with all dependencies installed.
- Install traditional OCR engines (optional but recommended):

  For Tesseract (most common, fast):

  ```bash
  # macOS
  brew install tesseract

  # Ubuntu/Debian
  sudo apt-get install tesseract-ocr

  # The Python wrapper is already included in the dependencies
  ```

  For EasyOCR (better accuracy, slower):

  ```bash
  uv pip install easyocr
  ```

  For PaddleOCR (state-of-the-art, best accuracy):

  ```bash
  uv pip install paddlepaddle paddleocr
  ```

  Or install all traditional OCR engines:

  ```bash
  uv pip install -e ".[traditional-ocr]"
  ```

- Configure the tool:
  ```bash
  cp config.yaml.example config.yaml
  # Edit config.yaml with your settings
  ```

Alternatively, install from PyPI:

```bash
uv pip install supernote-ocr
# or
pip install supernote-ocr
```

The tool expects a flat directory structure with matching file names:
```
dataset/
├── note1.png
├── note1_groundtruth.txt
├── note1_supernote.txt
├── note2.jpg
├── note2_groundtruth.txt
├── note2_supernote.txt
├── note3.pdf
├── note3_groundtruth.txt
└── note3_supernote.txt
```
File naming convention:

- Note files: `{name}.{ext}` (e.g., `note1.png`, `note2.pdf`)
- Ground truth: `{name}_groundtruth.txt`
- Supernote transcription: `{name}_supernote.txt`

Supported image formats: PNG, JPG, JPEG

Supported document formats: PDF
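The pairing logic implied by this naming convention can be sketched in Python. Note that `find_samples` is a hypothetical helper for illustration; the actual loader lives in `src/dataset.py`:

```python
from pathlib import Path

NOTE_EXTS = {".png", ".jpg", ".jpeg", ".pdf"}

def find_samples(dataset_dir):
    """Pair each note file with its ground-truth and Supernote transcripts.

    Illustrative sketch of the naming convention above; notes missing
    either companion file are skipped.
    """
    samples = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.suffix.lower() not in NOTE_EXTS:
            continue
        gt = path.with_name(f"{path.stem}_groundtruth.txt")
        sn = path.with_name(f"{path.stem}_supernote.txt")
        if gt.exists() and sn.exists():
            samples.append((path, gt, sn))
    return samples
```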
Edit `config.yaml` to configure:
- Dataset path
- Supernote API credentials (if using API)
- Output directory for results
- Model selection (optional, defaults to all vision models)
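A minimal `config.yaml` might look like the following. The key names here are illustrative assumptions; consult `config.yaml.example` for the authoritative schema:

```yaml
# Hypothetical layout -- key names are illustrative
dataset_path: ./dataset
output_dir: ./results

supernote:
  api_key: "your-api-key"   # only needed when fetching transcriptions via the API

models: []                  # empty list = benchmark all detected vision models
```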
```bash
uv run python -m src.cli list-models
```

This command will automatically detect all vision-capable models installed in your Ollama instance.
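One way such detection can work: Ollama's `GET /api/tags` endpoint lists installed models, and vision-capable ones typically report a vision family (e.g. `clip` for llava-style models, `mllama` for llama3.2-vision) in their `details.families` metadata. A sketch under those assumptions, with hypothetical helper names:

```python
import json
from urllib.request import urlopen

# Assumption: these family tags indicate vision support in Ollama metadata.
VISION_FAMILIES = {"clip", "mllama"}

def filter_vision_models(tags):
    """Return names of vision-capable models from an Ollama /api/tags payload."""
    names = []
    for model in tags.get("models", []):
        families = set(model.get("details", {}).get("families") or [])
        if families & VISION_FAMILIES:
            names.append(model["name"])
    return names

def list_vision_models(host="http://localhost:11434"):
    """Query a running Ollama instance for vision models (hypothetical helper)."""
    with urlopen(f"{host}/api/tags") as resp:
        return filter_vision_models(json.load(resp))
```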
```bash
uv run python -m src.cli validate-dataset --dataset-path ./dataset
```

Validates that all note files have corresponding ground truth and Supernote transcription files.
```bash
uv run python -m src.cli run --dataset-path ./dataset --output ./results
```

Run the full benchmark comparing all models. You can also specify particular models:
```bash
uv run python -m src.cli run --dataset-path ./dataset --output ./results --models llama3.2-vision --models llava
```

```bash
uv run python -m src.cli run --help
```

Available options:

- `--dataset-path`: Override dataset path from config
- `--output`: Override output directory from config
- `--models`: Specify models to test (can be used multiple times)
- `--verbose`: Enable debug logging
- `--config`: Specify custom config file path
Results are exported in two formats:

- JSON (`results/results.json`): Detailed per-note and per-model results
- CSV (`results/summary.csv`): Summary table with aggregated metrics
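Writing the CSV summary can be sketched as follows; the column set is an assumption based on the metrics described in this README, not the tool's exact schema:

```python
import csv

def write_summary(rows, path):
    """Write aggregated per-model metrics to a CSV summary.

    `rows` is a list of dicts keyed by the (assumed) column names below.
    """
    fields = ["model", "mean_wer", "mean_latency_s", "p95_latency_s", "notes"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```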
- WER (Word Error Rate): Primary metric, calculated using the jiwer library
- Latency: Mean, median, p95, and p99 inference times per model
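The tool computes WER with jiwer, but the underlying idea is plain word-level edit distance. A minimal pure-Python sketch (jiwer also applies text normalization that this illustration omits):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (r != h)))     # substitution
        prev = cur
    return prev[-1] / len(ref)
```

For example, transcribing "the cat sat" as "the bat sat" is one substitution out of three reference words, i.e. a WER of 1/3.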
If you're using the Supernote API to fetch transcriptions, configure your API credentials in config.yaml. The tool will cache transcriptions locally to avoid repeated API calls.
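The caching behavior might look like the sketch below. The cache location and `fetch` callback are hypothetical; keying on a hash of the note's bytes means editing a note invalidates its cached transcription:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".cache/transcriptions")  # location is an assumption

def cached_transcription(note_path, fetch):
    """Return a cached Supernote transcription, calling `fetch` only on a miss."""
    digest = hashlib.sha256(Path(note_path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{digest}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())["text"]
    text = fetch(note_path)  # e.g. a Supernote API call
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"text": text}))
    return text
```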
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
```bash
python -m pytest tests/
```

Or using unittest:
```bash
python -m unittest discover tests
```

```
supernote-ocr/
├── src/                    # Main source code
│   ├── dataset.py          # Dataset loading
│   ├── ollama_client.py    # Ollama API client
│   ├── supernote_client.py # Supernote API client
│   ├── metrics.py          # WER and latency calculations
│   ├── benchmark.py        # Benchmark orchestrator
│   ├── config.py           # Configuration management
│   └── cli.py              # Command-line interface
├── tests/                  # Unit tests
├── examples/               # Example datasets
└── config.yaml             # Configuration file
```
This tool was created for benchmarking OCR models on Supernote handwriting. Results and analysis will be published in a blog post.
- Built for comparing Ollama vision models against Supernote's on-device transcription
- Uses jiwer for WER calculations
- Designed to run locally on Apple Silicon (M4) for optimal performance