Keyvanhardani/german-ocr

German-OCR

High-performance German document OCR - Local & Cloud

🚀 Supported Backends

Hugging Face • Ollama • llama.cpp


✨ Features

| Feature | Local | Cloud (v1) | Cloud (v2) |
| --- | --- | --- | --- |
| German Documents | Invoices, contracts, forms | All document types | Structured extraction |
| Output Formats | Markdown, JSON, text | JSON, Markdown, text, n8n | Typed JSON fields |
| PDF Support | Images only | Up to 50 pages | Up to 50 pages |
| Privacy | 100% local | GDPR-compliant (Frankfurt) | GDPR-compliant (Frankfurt) |
| Speed | ~5s/page | ~2-3s/page (async) | Instant (synchronous) |
| Backends | Ollama, llama.cpp, HuggingFace | Cloud API | Cloud API |
| Hardware | CPU, GPU, NPU (CUDA/Metal/Vulkan/OpenVINO) | Managed | Managed |

📦 Installation

Python

pip install german-ocr

Node.js

npm install german-ocr

PHP

composer require keyvan/german-ocr

⚡ Quick Start

Option 1: ☁️ Cloud API (Recommended)

No GPU required. Get your API credentials at app.german-ocr.de

from german_ocr import CloudClient

# API Key + Secret (Secret is only shown once at creation!)
client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret_here"
)

# Simple extraction
result = client.analyze("invoice.pdf")
print(result.text)

# Structured JSON output
result = client.analyze(
    "invoice.pdf",
    prompt="Extrahiere Rechnungsnummer und Gesamtbetrag",
    output_format="json"
)
print(result.text)
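
With `output_format="json"`, the snippet above prints `result.text` unchanged. Below is a minimal sketch of turning that string into a Python dict. It assumes `result.text` holds a JSON object, possibly wrapped in a Markdown code fence as vision models sometimes emit. The helper name `parse_json_text` is hypothetical and not part of the german-ocr package.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def parse_json_text(text: str) -> dict:
    """Parse a JSON string, tolerating an optional Markdown fence around it."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()
        # Drop the opening fence line and, if present, the closing fence line.
        if lines[-1].strip() == FENCE:
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        cleaned = "\n".join(lines)
    return json.loads(cleaned)


# A hand-written string stands in for result.text here.
sample = '{"rechnungsnummer": "2024-001", "gesamtbetrag": "1.499,99"}'
fields = parse_json_text(sample)
print(fields["rechnungsnummer"])
```

If the service already returns clean JSON, `json.loads(result.text)` alone is enough; the fence handling is only a defensive measure.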

Node.js

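const { GermanOCR } = require('german-ocr');

const client = new GermanOCR(
    process.env.GERMAN_OCR_API_KEY,
    process.env.GERMAN_OCR_API_SECRET
);

// CommonJS modules have no top-level await, so run the call inside an async function
async function main() {
    const result = await client.analyze('invoice.pdf', {
        model: 'german-ocr-ultra'
    });
    console.log(result.text);
}

main().catch(console.error);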

PHP

<?php
// Load Composer's autoloader so the GermanOCR class can be found
require __DIR__ . '/vendor/autoload.php';

use GermanOCR\GermanOCR;

$client = new GermanOCR(
    getenv('GERMAN_OCR_API_KEY'),
    getenv('GERMAN_OCR_API_SECRET')
);

$result = $client->analyze('invoice.pdf', [
    'model' => GermanOCR::MODEL_ULTRA
]);
echo $result['text'];

Option 2: 🦙 Local (Ollama)

Requires a local Ollama installation.

# Install model
ollama pull Keyvan/german-ocr-turbo

# Run OCR from Python
from german_ocr import GermanOCR

ocr = GermanOCR()
text = ocr.extract("invoice.png")
print(text)

Option 3: 🔧 Local (llama.cpp)

For maximum control and edge deployment with GGUF models.

# Install with GPU support (CUDA); the quotes keep shells such as zsh from expanding the brackets
CMAKE_ARGS="-DGGML_CUDA=on" pip install "german-ocr[llamacpp]"

# Or CPU only
pip install "german-ocr[llamacpp]"

# Run OCR from Python
from german_ocr import GermanOCR

# Auto-detect best device (GPU/CPU)
ocr = GermanOCR(backend="llamacpp")
text = ocr.extract("invoice.png")

# Force CPU only
ocr = GermanOCR(backend="llamacpp", n_gpu_layers=0)

# Full GPU acceleration
ocr = GermanOCR(backend="llamacpp", n_gpu_layers=-1)

☁️ Cloud Models

| Model | Parameter | Best For |
| --- | --- | --- |
| German-OCR Ultra | german-ocr-ultra | Maximum precision, structure recognition |
| German-OCR Pro | german-ocr-pro | Balance of speed and quality |
| German-OCR Turbo | german-ocr | GDPR-compliant, local processing in Germany |
| Privacy Shield | privacy-shield | PII detection and anonymization |

Model Selection

from german_ocr import CloudClient

client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret_here"
)

# German-OCR Ultra - maximum precision
result = client.analyze("dokument.pdf", model="german-ocr-ultra")

# German-OCR Pro - fast cloud model (default)
result = client.analyze("dokument.pdf", model="german-ocr-pro")

# German-OCR Turbo - GDPR-compliant, local processing in Germany
result = client.analyze("dokument.pdf", model="german-ocr")

# Privacy Shield - PII detection and anonymization
result = client.analyze("dokument.pdf", model="privacy-shield")

🆕 German-OCR v2: Premium Structured Extraction

v2 is a synchronous premium API that returns structured JSON directly in the response, so no job polling is needed.

Base URL: https://api.german-ocr.de/v2/analyze | Price: €0.10/page
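
The per-page price above makes it easy to estimate the cost of a document before uploading it. The sketch below assumes billing is strictly per page with no minimum fee or volume discount, which this README does not confirm. The 50-page limit comes from the feature table, and `estimate_v2_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_PAGE_EUR = Decimal("0.10")  # v2 price quoted above
MAX_PAGES_PER_PDF = 50                # PDF page limit from the feature table


def estimate_v2_cost(page_count: int) -> Decimal:
    """Return the estimated v2 cost in EUR for one PDF (assumes flat per-page billing)."""
    if not 1 <= page_count <= MAX_PAGES_PER_PDF:
        raise ValueError(f"page_count must be between 1 and {MAX_PAGES_PER_PDF}")
    return PRICE_PER_PAGE_EUR * page_count


print(estimate_v2_cost(12))  # 1.20
```

`Decimal` is used instead of `float` so that currency arithmetic stays exact.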

v2 Templates

| Template | Use Case | Key Fields |
| --- | --- | --- |
| general | Auto-detect document type | document_type, sender, amounts, iban, full_text |
| invoice | German invoices | rechnungssteller, rechnungsnummer, positionen, gesamtbetrag, iban |
| delivery-notes | Delivery notes | belegnummer, belegdatum, empfaenger, positionen |
| document-intelligence | Bounding box extraction | Field coordinates for visual annotation |

v2 Quick Start (Python)

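import os

import httpx

# Read the credentials from the environment variables used in the CLI section
api_key = os.environ["GERMAN_OCR_API_KEY"]
api_secret = os.environ["GERMAN_OCR_API_SECRET"]

with open("invoice.pdf", "rb") as f:
    response = httpx.post(
        "https://api.german-ocr.de/v2/analyze",
        headers={"Authorization": f"Bearer {api_key}:{api_secret}"},
        files={"file": f},
        data={"template": "invoice"},
        timeout=60.0,  # httpx defaults to 5 s, which may be too short for a synchronous OCR call
    )
response.raise_for_status()
result = response.json()
print(result["result"]["rechnungsnummer"])  # "2024-001"
print(result["result"]["gesamtbetrag"])     # "1.499,99"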

Note: v2 uses template (not model!) as the parameter name.
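
The example value `"1.499,99"` above uses German number notation (a dot as thousands separator, a comma as decimal separator). Here is a minimal sketch of converting such a string for calculations. It assumes amounts arrive as strings in this notation, as the example output suggests, and `parse_german_amount` is a hypothetical helper.

```python
from decimal import Decimal


def parse_german_amount(value: str) -> Decimal:
    """Convert a German-formatted amount such as '1.499,99' to a Decimal."""
    # Strip an optional currency marker and surrounding whitespace.
    cleaned = value.replace("€", "").replace("EUR", "").strip()
    # Remove thousands separators, then turn the decimal comma into a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


print(parse_german_amount("1.499,99"))  # 1499.99
```

The helper only handles German notation. A value that already uses a dot as the decimal separator would be misread, so check the format first if your documents mix locales.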


💻 CLI Usage

Cloud

# Set API credentials (Secret shown only once at creation!)
export GERMAN_OCR_API_KEY="gocr_xxxxxxxx"
export GERMAN_OCR_API_SECRET="your_64_char_secret_here"

# Extract text (uses German-OCR Pro by default)
german-ocr --cloud invoice.pdf

# Use German-OCR Turbo (GDPR-compliant, local processing in Germany)
german-ocr --cloud --model german-ocr invoice.pdf

# JSON output with German-OCR Ultra
german-ocr --cloud --model german-ocr-ultra --output-format json invoice.pdf

# With custom prompt
german-ocr --cloud --prompt "Extrahiere alle Betraege" invoice.pdf
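
The environment variables exported above can also be reused from Python instead of hard-coding credentials. This is a minimal sketch: `load_credentials` is a hypothetical helper, and whether `CloudClient` reads these variables on its own is not stated in this README, so they are passed explicitly.

```python
import os


def load_credentials(env=os.environ) -> tuple[str, str]:
    """Return (api_key, api_secret), raising if either variable is unset or empty."""
    names = ("GERMAN_OCR_API_KEY", "GERMAN_OCR_API_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Usage (requires the german-ocr package; CloudClient is shown in the Quick Start):
# from german_ocr import CloudClient
# api_key, api_secret = load_credentials()
# client = CloudClient(api_key=api_key, api_secret=api_secret)
```

Failing early with a clear message is helpful here because the secret is shown only once at creation and cannot be looked up again.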

Local

# Single image
german-ocr invoice.png

# Batch processing
german-ocr --batch ./invoices/

# JSON output
german-ocr --format json invoice.png
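
Batch processing can also be driven from Python with the `extract()` call from the Quick Start. The sketch below is not a description of how `--batch` works internally. `find_images` and `extract_folder` are hypothetical helpers, and the list of file suffixes is an assumption to adjust to your data.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: extend as needed


def find_images(folder: str) -> list[Path]:
    """Return the image files directly inside the folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def extract_folder(ocr, folder: str) -> dict[str, str]:
    """Run ocr.extract() on every image and map file name to extracted text."""
    return {path.name: ocr.extract(str(path)) for path in find_images(folder)}


# Usage (requires german-ocr and a local backend such as Ollama):
# from german_ocr import GermanOCR
# texts = extract_folder(GermanOCR(), "./invoices/")
```

Passing the OCR object in as a parameter keeps the helper independent of the backend, so the same loop works with Ollama or llama.cpp.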

🔌 API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /v1/analyze | POST | OCR analysis (async, needs polling) |
| /v1/jobs/{id} | GET | Job status + result |
| /v1/jobs/{id} | DELETE | Cancel job |
| /v1/models | GET | List available models |
| /v1/balance | GET | Account balance |
| /v1/usage | GET | Usage statistics |
| /v2/analyze | POST | Premium analysis (sync, instant) |
| /v2/models | GET | List v2 templates |

Full API documentation: german-ocr.de/docs
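
Because `/v1/analyze` is asynchronous, a client that calls the REST API directly has to poll `/v1/jobs/{id}` until the job finishes. The sketch below shows one way to structure such a loop. The response shape and the status names `"completed"` and `"failed"` are assumptions, not taken from the API documentation, and `poll_job` is a hypothetical helper.

```python
import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):  # assumed status names
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)


# fetch_status would wrap the GET request for one job, for example an
# httpx.get(...) call against /v1/jobs/{id} that returns response.json().
```

The Python SDK's `client.wait_for_result()` already covers this case, so the loop is only relevant when you are not using the SDK.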

Output Formats

| Format | Description |
| --- | --- |
| text | Plain text (default) |
| json | Structured JSON |
| markdown | Formatted Markdown |
| n8n | n8n-compatible format |

Progress Tracking

from german_ocr import CloudClient

client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret"
)

def on_progress(status):
    print(f"Page {status.current_page}/{status.total_pages}")

result = client.analyze(
    "large_document.pdf",
    on_progress=on_progress
)
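
The callback above can be extended to report a percentage. This sketch uses only the `current_page` and `total_pages` attributes shown in the example. The guard for an unknown page count is a defensive assumption, and `format_progress` is a hypothetical helper.

```python
def format_progress(current_page: int, total_pages: int) -> str:
    """Format a progress line, guarding against an unknown page count."""
    if total_pages <= 0:
        return "Page ?/?"
    percent = 100 * current_page // total_pages
    return f"Page {current_page}/{total_pages} ({percent}%)"


def on_progress(status):
    # status.current_page and status.total_pages are the attributes used above
    print(format_progress(status.current_page, status.total_pages))


print(format_progress(3, 12))  # Page 3/12 (25%)
```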

Async Processing (v1)

# Submit job with German-OCR Pro
job = client.submit("document.pdf", model="german-ocr-pro", output_format="json")
print(f"Job ID: {job.job_id}")

# Check status
status = client.get_job(job.job_id)
print(f"Status: {status.status}")

# Wait for result
result = client.wait_for_result(job.job_id)

# Alternatively, cancel a job you no longer need
client.cancel_job(job.job_id)

Account Info

# Check balance
balance = client.get_balance()
print(f"Balance: {balance}")

# Usage statistics
usage = client.get_usage()
print(f"Usage: {usage}")

🔗 Integrations

| Category | Platforms |
| --- | --- |
| Automation | Zapier, Make.com, n8n |
| CMS | WordPress Plugin, Magento 2, TYPO3, Shopify |
| Frameworks | Laravel, Symfony, Django, Flask, Spring Boot, .NET, Ruby on Rails |

🏠 Local Models

🦙 Ollama Models

| Model | Size | Speed | Best For |
| --- | --- | --- | --- |
| german-ocr-turbo | 1.9 GB | ~5s | Recommended |
| german-ocr | 3.2 GB | ~7s | Standard |

🤗 GGUF Models (llama.cpp / Hugging Face)

| Model | Size | Speed | Best For |
| --- | --- | --- | --- |
| german-ocr-2b | 1.5 GB | ~5s (GPU) / ~25s (CPU) | Edge/Embedded |
| german-ocr-turbo | 1.9 GB | ~5s (GPU) / ~20s (CPU) | Best accuracy |

Hardware Support:

  • CUDA (NVIDIA GPUs)
  • Metal (Apple Silicon)
  • Vulkan (AMD/Intel/NVIDIA)
  • OpenVINO (Intel NPU)
  • CPU (all platforms)

💰 Pricing

See current pricing at app.german-ocr.de

📄 License

Apache 2.0 - See LICENSE for details.

👤 Author

Keyvan Hardani - keyvan.ai


Made with ❤️ in Germany 🇩🇪

⭐ Star us on GitHub!

About

German-OCR is specifically trained to extract text from German documents including invoices, receipts, forms, and other business documents.
