Extracting data from images using ICR
Use ICR to extract structured document data from images with local models.
Common use cases include:
- Air-gapped document processing
- Privacy-sensitive workflows with local-only processing
- High-volume extraction with predictable runtime cost
- Pipelines that need layout and semantic structure
ICR returns more than plain text. It detects layout and semantic elements such as tables, key-value regions, headings, and equations.
How Nutrient helps
Nutrient Python SDK handles local model loading, layout analysis, and JSON output generation.
The SDK handles:
- Local model deployment and loading details
- Table detection and cell boundary extraction
- Semantic element classification and hierarchy parsing
- Bounding box and reading-order calculation
Prerequisites
Before following this guide, ensure you have:
- Python 3.8 or higher installed
- Nutrient Python SDK installed (pip install nutrient-sdk)
- An image file to process (PNG, JPEG, or other supported formats)
- Basic familiarity with Python context managers and the with statement
For initial SDK setup and configuration, refer to the getting started guide.
Complete implementation
This example extracts structured JSON from an image using the ICR engine:
    from nutrient_sdk import Document, Vision, VisionEngine
Configuring ICR mode
Open the image and set the vision engine to ICR.
In this sample:
- The document opens in a context manager.
- document.settings.vision_settings.engine = VisionEngine.ICR sets local ICR mode.
- ICR is the default engine, so this assignment is optional. It is shown here for illustration.
    with Document.open("input_ocr_multiple_languages.png") as document:
        # Configure ICR engine for local processing (this is the default)
        document.settings.vision_settings.engine = VisionEngine.ICR
Creating a vision instance and extracting content
Create a vision instance and call extract_content().
In this sample:
- Vision.set(document) binds extraction to the opened document.
- extract_content() returns structured JSON as a string.
- Processing runs locally when the engine is ICR.
        # Still inside the with block, while the document is open
        vision = Vision.set(document)
        content_json = vision.extract_content()
Write the JSON string to a file for downstream use.
Use the output for storage, indexing, or custom analysis:
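        # Write as UTF-8 so non-ASCII text from the image is preserved on every platform
        with open("output.json", "w", encoding="utf-8") as f:
            f.write(content_json)
Understanding the output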
extract_content() returns structured JSON with layout and semantic information.
ICR output includes:
- Document elements — Paragraphs, headings, tables, figures, and equations
- Bounding boxes — Pixel coordinates for detected regions
- Reading order — Element order for content flow reconstruction
- Element classification — Semantic labels such as paragraph, table, and heading
- Hierarchical structure — Parent-child relationships across sections and blocks
Use this JSON for extraction pipelines, structured storage, and search indexing.
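As a rough sketch of downstream use, the snippet below parses a JSON string and counts how many elements of each semantic type it contains. The exact schema of the extract_content() output is not described in this guide, so the sample data, the "type" key, and the helper names iter_nodes and count_element_types are all hypothetical. The walker visits every nested object, so it should adapt to a different nesting layout by changing only the key name.

```python
import json
from collections import Counter

# Hypothetical sample standing in for the string returned by extract_content().
# The real schema may differ; the key names used here are assumptions.
sample_json = """
{
  "elements": [
    {"type": "heading", "text": "Invoice", "children": []},
    {"type": "table", "children": [
      {"type": "paragraph", "text": "Total: 42"}
    ]}
  ]
}
"""


def iter_nodes(node):
    """Yield every dict found anywhere in a parsed JSON value, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def count_element_types(content_json, type_key="type"):
    """Count the string values stored under `type_key` across all nested objects."""
    parsed = json.loads(content_json)
    return Counter(
        node[type_key]
        for node in iter_nodes(parsed)
        if isinstance(node.get(type_key), str)
    )


counts = count_element_types(sample_json)
print(dict(counts))
```

With real output, pass content_json in place of sample_json and adjust type_key to whatever field the JSON actually uses for element classification.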
Error handling
The Vision API raises VisionException when extraction fails.
Common failure scenarios include:
- The image file can’t be read because of path or permission issues.
- Image data is corrupted or truncated.
- ICR models are missing or inaccessible.
- Available memory is insufficient for model loading.
- Image format or encoding is unsupported.
In production code:
- Catch VisionException.
- Return a clear error message.
- Log failure details for debugging.
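The three practices above could be combined along these lines. This is a minimal sketch, not the SDK's own pattern: safe_extract is a hypothetical helper, and because this guide does not show where VisionException is imported from, the exception class is passed in as a parameter. A stand-in exception and a stand-in extractor are used so the example runs without the SDK installed.

```python
import logging

logger = logging.getLogger("icr_extraction")


def safe_extract(extract, error_type=Exception):
    """Run `extract()` and return a (content, error_message) pair.

    `extract` is any zero-argument callable, for example a function that
    wraps vision.extract_content(). `error_type` is the exception class to
    catch; in real code this would be VisionException (an assumption here,
    since its import path is not shown in the guide).
    """
    try:
        return extract(), None
    except error_type as exc:
        # Log the full failure details, including the traceback, for debugging.
        logger.exception("ICR extraction failed")
        # Return a short, clear message for the caller.
        return None, f"Extraction failed: {exc}"


# Stand-ins so the sketch runs without the SDK (both are hypothetical).
class FakeVisionError(Exception):
    pass


def failing_extract():
    raise FakeVisionError("ICR models are missing or inaccessible")


content, error = safe_extract(failing_extract, FakeVisionError)
print(content, error)
```

Returning a pair keeps the caller free to decide whether a failed image should stop the batch or be skipped, while exceptions of other types still propagate unchanged.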
Conclusion
Use this workflow for ICR-based extraction:
- Open the image document using a context manager for automatic resource cleanup.
- Configure the vision settings with the engine property assigned to VisionEngine.ICR for local AI processing.
- ICR is the default engine, making this configuration optional but useful for explicit control.
- Create a vision instance with Vision.set() to bind content extraction operations to the document.
- Call extract_content() to invoke local AI models for document layout analysis.
- The ICR engine loads AI models, detects semantic elements (tables, equations, headings), and determines reading order.
- The method returns a JSON-formatted string containing complete document structure with bounding boxes in pixel coordinates.
- All processing occurs locally without external API calls, ensuring data privacy and offline capability.
- Write the JSON content to a file using Python’s built-in file handling with context manager syntax.
- Handle VisionException errors for robust error recovery in production environments.
- The JSON output enables integration with downstream pipelines, including data extraction, database storage, and search indexing.
- ICR mode is ideal for air-gapped environments, sensitive document processing, and high-volume workflows.
For related image extraction workflows, refer to the Python SDK guides.
Download this ready-to-use sample package to explore the Vision API capabilities with preconfigured ICR settings.