High-performance German document OCR - Local & Cloud
Hugging Face • Ollama • llama.cpp
| Feature | Local | Cloud (v1) | Cloud (v2) |
|---|---|---|---|
| German Documents | Invoices, contracts, forms | All document types | Structured extraction |
| Output Formats | Markdown, JSON, text | JSON, Markdown, text, n8n | Typed JSON fields |
| PDF Support | Images only | Up to 50 pages | Up to 50 pages |
| Privacy | 100% local | GDPR-compliant (Frankfurt) | GDPR-compliant (Frankfurt) |
| Speed | ~5s/page | ~2-3s/page (async) | Instant (synchronous) |
| Backends | Ollama, llama.cpp, HuggingFace | Cloud API | Cloud API |
| Hardware | CPU, GPU, NPU (CUDA/Metal/Vulkan/OpenVINO) | Managed | Managed |
pip install german-ocr
npm install german-ocr
composer require keyvan/german-ocr

No GPU required. Get your API credentials at app.german-ocr.de
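The Node.js and PHP examples in this README read credentials from `GERMAN_OCR_API_KEY` and `GERMAN_OCR_API_SECRET`. A minimal sketch of the same approach in Python follows; `load_credentials` is a hypothetical helper, not part of the `german_ocr` package, and it only assumes those two variable names.

```python
import os


def load_credentials(env=None):
    """Return (api_key, api_secret) from the environment.

    Hypothetical helper: raises RuntimeError if either variable is
    missing or empty, so a misconfigured deployment fails early.
    """
    env = os.environ if env is None else env
    names = ("GERMAN_OCR_API_KEY", "GERMAN_OCR_API_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair can then be passed on, for example `api_key, api_secret = load_credentials()` followed by `CloudClient(api_key=api_key, api_secret=api_secret)`.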
from german_ocr import CloudClient

# API Key + Secret (Secret is only shown once at creation!)
client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret_here"
)

# Simple extraction
result = client.analyze("invoice.pdf")
print(result.text)

# Structured JSON output
result = client.analyze(
    "invoice.pdf",
    prompt="Extrahiere Rechnungsnummer und Gesamtbetrag",
    output_format="json"
)
print(result.text)

const { GermanOCR } = require('german-ocr');

const client = new GermanOCR(
  process.env.GERMAN_OCR_API_KEY,
  process.env.GERMAN_OCR_API_SECRET
);

// CommonJS has no top-level await, so run the call inside an async function
(async () => {
  const result = await client.analyze('invoice.pdf', {
    model: 'german-ocr-ultra'
  });
  console.log(result.text);
})();

<?php
// Load the Composer autoloader so the GermanOCR class can be resolved
require __DIR__ . '/vendor/autoload.php';

use GermanOCR\GermanOCR;

$client = new GermanOCR(
    getenv('GERMAN_OCR_API_KEY'),
    getenv('GERMAN_OCR_API_SECRET')
);
$result = $client->analyze('invoice.pdf', [
    'model' => GermanOCR::MODEL_ULTRA
]);
echo $result['text'];

Requires Ollama to be installed.
# Install model
ollama pull Keyvan/german-ocr-turbo

from german_ocr import GermanOCR

ocr = GermanOCR()
text = ocr.extract("invoice.png")
print(text)

For maximum control and edge deployment with GGUF models.
# Install with GPU support (CUDA); the quotes keep shells like zsh from expanding the brackets
CMAKE_ARGS="-DGGML_CUDA=on" pip install "german-ocr[llamacpp]"
# Or CPU only
pip install "german-ocr[llamacpp]"

from german_ocr import GermanOCR
# Auto-detect best device (GPU/CPU)
ocr = GermanOCR(backend="llamacpp")
text = ocr.extract("invoice.png")
# Force CPU only
ocr = GermanOCR(backend="llamacpp", n_gpu_layers=0)
# Full GPU acceleration
ocr = GermanOCR(backend="llamacpp", n_gpu_layers=-1)

| Model | Parameter | Best For |
|---|---|---|
| German-OCR Ultra | `german-ocr-ultra` | Maximum precision, structure recognition |
| German-OCR Pro | `german-ocr-pro` | Balance of speed and quality |
| German-OCR Turbo | `german-ocr` | GDPR-compliant, local processing in Germany |
| Privacy Shield | `privacy-shield` | PII detection and anonymization |
from german_ocr import CloudClient

client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret_here"
)

# German-OCR Ultra - maximum precision
result = client.analyze("dokument.pdf", model="german-ocr-ultra")
# German-OCR Pro - fast cloud model (default)
result = client.analyze("dokument.pdf", model="german-ocr-pro")
# German-OCR Turbo - local, GDPR-compliant
result = client.analyze("dokument.pdf", model="german-ocr")
# Privacy Shield - PII detection & anonymization
result = client.analyze("dokument.pdf", model="privacy-shield")

v2 is a synchronous premium API that returns structured JSON in the same request, so no job polling is needed.
Base URL: https://api.german-ocr.de/v2/analyze | Price: €0.10/page
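As a rough illustration of the per-page price, the sketch below estimates the cost of one v2 request. It assumes the €0.10/page figure quoted here and the 50-page PDF limit from the feature table; `estimate_v2_cost` is a hypothetical helper, and actual billing may differ, so check the current pricing before relying on it.

```python
from decimal import Decimal

PRICE_PER_PAGE_EUR = Decimal("0.10")  # v2 price as quoted in this README
MAX_PAGES_PER_PDF = 50                # page limit from the feature table


def estimate_v2_cost(page_count: int) -> Decimal:
    """Estimate the cost in EUR of analyzing one PDF with the v2 API."""
    if not 1 <= page_count <= MAX_PAGES_PER_PDF:
        raise ValueError(f"page_count must be between 1 and {MAX_PAGES_PER_PDF}")
    # Decimal avoids the rounding noise of binary floats in money math
    return PRICE_PER_PAGE_EUR * page_count


print(estimate_v2_cost(50))  # 5.00
```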
| Template | Use Case | Key Fields |
|---|---|---|
| `general` | Auto-detect document type | document_type, sender, amounts, iban, full_text |
| `invoice` | German invoices | rechnungssteller, rechnungsnummer, positionen, gesamtbetrag, iban |
| `delivery-notes` | Delivery notes | belegnummer, belegdatum, empfaenger, positionen |
| `document-intelligence` | Bounding box extraction | Field coordinates for visual annotation |
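import os
import httpx

# Credentials from the environment (same variable names as the CLI uses)
api_key = os.environ["GERMAN_OCR_API_KEY"]
api_secret = os.environ["GERMAN_OCR_API_SECRET"]

# Open the file in a context manager so the handle is closed after the upload
with open("invoice.pdf", "rb") as f:
    response = httpx.post(
        "https://api.german-ocr.de/v2/analyze",
        headers={"Authorization": f"Bearer {api_key}:{api_secret}"},
        files={"file": f},
        data={"template": "invoice"},
    )
response.raise_for_status()
result = response.json()
print(result["result"]["rechnungsnummer"])  # "2024-001"
print(result["result"]["gesamtbetrag"])     # "1.499,99"

Note: v2 uses `template` (not `model`!) as the parameter name.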
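The example value for `gesamtbetrag` above is a string in German number notation (`"1.499,99"`). If amounts do come back in that form, they need converting before any arithmetic. The sketch below shows one way to do that; `parse_german_amount` is a hypothetical helper and assumes a period as the thousands separator and a comma as the decimal mark.

```python
from decimal import Decimal, InvalidOperation


def parse_german_amount(text: str) -> Decimal:
    """Convert a German-formatted amount such as "1.499,99" to a Decimal."""
    # Drop an optional currency marker and surrounding whitespace
    cleaned = text.strip().replace("€", "").replace("EUR", "").strip()
    # German notation: "." groups thousands, "," marks the decimals
    normalized = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(normalized)
    except InvalidOperation as exc:
        raise ValueError(f"Not a German-formatted amount: {text!r}") from exc


print(parse_german_amount("1.499,99"))  # 1499.99
```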
# Set API credentials (Secret shown only once at creation!)
export GERMAN_OCR_API_KEY="gocr_xxxxxxxx"
export GERMAN_OCR_API_SECRET="your_64_char_secret_here"
# Extract text (uses German-OCR Pro by default)
german-ocr --cloud invoice.pdf
# Use German-OCR Turbo (GDPR-compliant, local)
german-ocr --cloud --model german-ocr invoice.pdf
# JSON output with German-OCR Ultra
german-ocr --cloud --model german-ocr-ultra --output-format json invoice.pdf
# With custom prompt
german-ocr --cloud --prompt "Extrahiere alle Betraege" invoice.pdf

# Single image
german-ocr invoice.png
# Batch processing
german-ocr --batch ./invoices/
# JSON output
german-ocr --format json invoice.png

| Endpoint | Method | Description |
|---|---|---|
| `/v1/analyze` | POST | OCR analysis (async, needs polling) |
| `/v1/jobs/{id}` | GET | Job status + result |
| `/v1/jobs/{id}` | DELETE | Cancel job |
| `/v1/models` | GET | List available models |
| `/v1/balance` | GET | Account balance |
| `/v1/usage` | GET | Usage statistics |
| `/v2/analyze` | POST | Premium analysis (sync, instant) |
| `/v2/models` | GET | List v2 templates |
Full API documentation: german-ocr.de/docs
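Because `/v1/analyze` is asynchronous, a caller using the raw endpoints has to poll `/v1/jobs/{id}` until the job finishes. The loop below is a minimal sketch of that pattern, independent of any HTTP library. The status values `"completed"` and `"failed"` and the dict-shaped response are assumptions for illustration, not documented API behavior; `fetch_status` is a caller-supplied function that performs the GET.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until the job completes or fails.

    fetch_status must return a dict with a "status" key (assumed shape).
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        state = job.get("status")
        if state == "completed":   # assumed terminal status name
            return job
        if state == "failed":      # assumed terminal status name
            raise RuntimeError(f"Job failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {state!r} after {timeout} seconds")
        sleep(interval)
```

With the Python SDK this loop should not be necessary, since `client.wait_for_result(job.job_id)` is shown elsewhere in this README for the same purpose.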
| Format | Description |
|---|---|
| `text` | Plain text (default) |
| `json` | Structured JSON |
| `markdown` | Formatted Markdown |
| `n8n` | n8n-compatible format |
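When `json` is requested, the earlier examples still read the output from `result.text`. Assuming that attribute then holds a serialized JSON string (an assumption, not confirmed here), a defensive parse might look like the sketch below; `parse_json_result` is a hypothetical helper.

```python
import json


def parse_json_result(text):
    """Parse OCR output requested as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a ValueError; TypeError covers None input
        return None


fields = parse_json_result('{"rechnungsnummer": "2024-001"}')
print(fields["rechnungsnummer"])  # 2024-001
```

Returning `None` instead of raising lets the caller fall back to the plain-text output when the model response is not well-formed JSON.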
from german_ocr import CloudClient

client = CloudClient(
    api_key="gocr_xxxxxxxx",
    api_secret="your_64_char_secret"
)

def on_progress(status):
    print(f"Page {status.current_page}/{status.total_pages}")

result = client.analyze(
    "large_document.pdf",
    on_progress=on_progress
)

# Submit job with German-OCR Pro
job = client.submit("document.pdf", model="german-ocr-pro", output_format="json")
print(f"Job ID: {job.job_id}")
# Check status
status = client.get_job(job.job_id)
print(f"Status: {status.status}")
# Wait for result
result = client.wait_for_result(job.job_id)
# Cancel job
client.cancel_job(job.job_id)

# Check balance
balance = client.get_balance()
print(f"Balance: {balance}")
# Usage statistics
usage = client.get_usage()
print(f"Usage: {usage}")

| Category | Platforms |
|---|---|
| Automation | Zapier, Make.com, n8n |
| CMS | WordPress Plugin, Magento 2, TYPO3, Shopify |
| Frameworks | Laravel, Symfony, Django, Flask, Spring Boot, .NET, Ruby on Rails |
| Model | Size | Speed | Best For |
|---|---|---|---|
| german-ocr-turbo | 1.9 GB | ~5s | Recommended |
| german-ocr | 3.2 GB | ~7s | Standard |

| Model | Size | Speed | Best For |
|---|---|---|---|
| german-ocr-2b | 1.5 GB | ~5s (GPU) / ~25s (CPU) | Edge/Embedded |
| german-ocr-turbo | 1.9 GB | ~5s (GPU) / ~20s (CPU) | Best accuracy |
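To get a feel for what the GPU and CPU figures mean for a whole document, the sketch below multiplies them out. It assumes the speeds in the table above are per page and roughly constant, which real documents will not match exactly; `estimate_seconds` is a hypothetical helper.

```python
# Approximate seconds per page, copied from the table above (assumed per page)
SECONDS_PER_PAGE = {
    ("german-ocr-2b", "gpu"): 5,
    ("german-ocr-2b", "cpu"): 25,
    ("german-ocr-turbo", "gpu"): 5,
    ("german-ocr-turbo", "cpu"): 20,
}


def estimate_seconds(model: str, device: str, pages: int) -> int:
    """Rough processing time for a document of the given page count."""
    return SECONDS_PER_PAGE[(model, device)] * pages


print(estimate_seconds("german-ocr-turbo", "gpu", 50))  # 250
print(estimate_seconds("german-ocr-turbo", "cpu", 50))  # 1000
```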
Hardware Support:
- CUDA (NVIDIA GPUs)
- Metal (Apple Silicon)
- Vulkan (AMD/Intel/NVIDIA)
- OpenVINO (Intel NPU)
- CPU (all platforms)
See current pricing at app.german-ocr.de
Apache 2.0 - See LICENSE for details.
Keyvan Hardani - keyvan.ai
Made with ❤️ in Germany 🇩🇪