A smart and privacy-focused Optical Character Recognition (OCR) tool to recognize, save, and query data from a folder full of images with lots of text. It runs locally and is fast, multi-platform, and multi-threaded.
Born as a personal tool. Not meant to be famous.
- folders with documents and recipes
- work environments and workflows where sending images to third-party services is not allowed
- screenshots taken during calls to remember things
- Image text extraction from multiple image formats (PNG, JPG, BMP, etc.)
- Local processing for enhanced privacy
- OCR engine selection: choose `tesseract` for CPU computation or `ollama` for AI-powered OCR via models provided by Ollama
- Multi-threaded for performance
- Optional custom database path via config
- Simple GUI and CLI interfaces
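As a rough illustration of the multi-threaded processing mentioned above, the sketch below fans OCR work out over a thread pool. It is not POCR2's actual code: `find_images`, `ocr_folder`, `fake_ocr`, and the suffix set are hypothetical names chosen for this example, and the OCR call is a stand-in so the snippet runs without any engine installed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Assumed subset of supported formats, for illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def find_images(folder: Path) -> list[Path]:
    """Return image files directly inside `folder`, sorted for stable output."""
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def ocr_folder(folder: Path, ocr: Callable[[Path], str], max_workers: int = 4) -> dict[str, str]:
    """Run `ocr` over every image in `folder` using a thread pool."""
    images = find_images(folder)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `images`.
        texts = list(pool.map(ocr, images))
    return {image.name: text for image, text in zip(images, texts)}


def fake_ocr(path: Path) -> str:
    """Hypothetical stand-in for a real OCR call."""
    return f"text from {path.name}"
```

Passing the OCR function as a parameter keeps the pool logic independent of the engine, so the same loop could serve either `tesseract` or `ollama`.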
- Python 3.10 and above. Tested on Python 3.10, 3.11, 3.12, and 3.13.
- Dependencies are managed in `pyproject.toml`.
- Clone the repository or download the source code.
- Install it:
pip install .
or via uv:
uv pip install .

- Clone the repository or download the source code.
- Set up a virtual environment and install dependencies via `just`:
just prepare
just setup
or manually:
python -m pip install virtualenv
python -m virtualenv .venv
source .venv/bin/activate # On Windows use `.venv\Scripts\activate.ps1`
pip install -e .

Configuration is managed via the config.toml file in known locations. See config.toml.example for reference.
You can also provide a custom config file path at runtime:
pocr2 index --config C:/path/to/config.toml
If `--config` is not provided, POCR2 uses the default known config locations.
Key options in config.toml:
- `screenshots_dir`: directory with images to index.
- `db_path` (optional): custom SQLite path. If omitted, POCR2 uses the default data directory.
- `ocr_engine`: choose `tesseract` or `ollama`.
- `ollama_host`, `ollama_model`, `ollama_prompt`: used when `ocr_engine = "ollama"`.
- `max_workers`: OCR parallelism.
- `fuzzy_threshold`: default threshold for fuzzy search.
POCR2 uses a unified entrypoint:
pocr2 <command> [--config C:/path/to/config.toml]
Commands:
- `index` runs OCR processing and updates the database.
- `search` runs CLI search mode.
- `--gui` launches the graphical interface.
Examples:
pocr2 index
pocr2 search
pocr2 --gui
pocr2 index --config C:/path/to/config.toml
`just` commands are still available for convenience. Check justfile for details.
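For readers curious how a unified entrypoint of this shape can be built, the sketch below uses `argparse` subcommands plus a top-level `--gui` flag. It mirrors the command names shown above but is not POCR2's actual parser; `build_parser` and `parse` are hypothetical names, and the real tool may handle arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting `index`, `search`, or a bare `--gui` flag."""
    # Options shared by every subcommand live in a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--config", default=None, help="path to a config.toml file")

    parser = argparse.ArgumentParser(prog="pocr2")
    parser.add_argument("--gui", action="store_true", help="launch the graphical interface")
    parser.set_defaults(config=None)  # keeps args.config defined when no command is given

    commands = parser.add_subparsers(dest="command")
    commands.add_parser("index", parents=[common], help="run OCR and update the database")
    commands.add_parser("search", parents=[common], help="run CLI search mode")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse `argv`, requiring either a command or --gui."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.gui and args.command is None:
        parser.error("expected a command (index, search) or --gui")
    return args
```

With this layout, `parse(["index", "--config", "C:/path/to/config.toml"])` yields `command="index"` with the given path, while `parse(["--gui"])` leaves `command` unset.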
Alternative module invocation (without script wrapper):
python -m src.main index
just run
Run OCR processing on the configured folder to initialize or update the database:
just process
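The "init or update" behaviour can be pictured as an idempotent write into SQLite. The sketch below is an assumption-heavy illustration: the table name `ocr_text`, its columns, and the helpers `init_db` and `store_text` are hypothetical and do not reflect POCR2's real schema.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_text ("
        "path TEXT PRIMARY KEY, "
        "text TEXT NOT NULL)"
    )


def store_text(conn: sqlite3.Connection, path: str, text: str) -> None:
    """Insert a new row, or replace the existing row for the same path."""
    conn.execute(
        "INSERT OR REPLACE INTO ocr_text (path, text) VALUES (?, ?)",
        (path, text),
    )
    conn.commit()
```

Because `path` is the primary key, processing the same image twice leaves a single row holding the latest text, so repeated runs are safe.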
Query the database for text:
just search
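To show what a threshold-based fuzzy query might look like, the sketch below scores a query against sliding word windows of each stored text. It uses the standard library's `difflib` as a stand-in, which may not be what POCR2 uses. The 0 to 100 scale, the default of 80, and the helper names `best_window_score` and `fuzzy_search` are assumptions.

```python
from difflib import SequenceMatcher


def best_window_score(query: str, text: str) -> int:
    """Best similarity (0-100) between the query and any word window of equal length."""
    query_words = query.lower().split()
    words = text.lower().split()
    if not query_words or not words:
        return 0
    target = " ".join(query_words)
    size = len(query_words)
    best = 0.0
    # Slide a window of `size` words across the text and keep the best ratio.
    for start in range(max(1, len(words) - size + 1)):
        window = " ".join(words[start:start + size])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return round(best * 100)


def fuzzy_search(query: str, documents: dict[str, str], threshold: int = 80) -> list[tuple[str, int]]:
    """Return (name, score) pairs scoring at or above `threshold`, best first."""
    scored = [(name, best_window_score(query, text)) for name, text in documents.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A misspelled query such as "recipie" would still match a document containing "recipe", which is the practical benefit of a threshold below 100.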
- Code architecture
- Tesseract vs Ollama OCR, including GPU usage context and screenshots.
POCR stands for "Python OCR". The "2" is there because this is the second iteration. The first one was based on Visual LLMs out of curiosity, but proved too cumbersome to run and use. Not every problem needs an LLM solution.
See LICENSE file for details.
Contributions are welcome. Please open an issue or submit a pull request.
This project is provided "as is" without any warranties. Use at your own risk.
