Use ICR to extract structured document data from images with local models.

Common use cases include:

  • Air-gapped document processing
  • Privacy-sensitive workflows with local-only processing
  • High-volume extraction with predictable runtime cost
  • Pipelines that need layout and semantic structure

ICR returns more than plain text. It detects layout and semantic elements such as tables, key-value regions, headings, and equations.

How Nutrient helps

The Nutrient Python SDK handles local model loading, layout analysis, and JSON output generation.

The SDK handles:

  • Local model deployment and loading details
  • Table detection and cell boundary extraction
  • Semantic element classification and hierarchy parsing
  • Bounding box and reading-order calculation

Prerequisites

Before following this guide, ensure you have:

  • Python 3.8 or higher installed
  • Nutrient Python SDK installed (pip install nutrient-sdk)
  • An image file to process (PNG, JPEG, or other supported formats)
  • Basic familiarity with Python context managers and the with statement

For initial SDK setup and configuration, refer to the getting started guide.

Complete implementation

This example extracts structured JSON from an image using the ICR engine. The sections below build it step by step, starting with the imports:

from nutrient_sdk import Document, Vision, VisionEngine

Configuring ICR mode

Open the image and set the vision engine to ICR.

In this sample:

  • The document opens in a context manager.
  • document.settings.vision_settings.engine = VisionEngine.ICR sets local ICR mode.
  • ICR is the default engine, so this assignment is optional. It is shown here to make the choice explicit.

with Document.open("input_ocr_multiple_languages.png") as document:
    # Configure ICR engine for local processing (this is the default)
    document.settings.vision_settings.engine = VisionEngine.ICR

Creating a vision instance and extracting content

Create a vision instance and call extract_content().

In this sample:

  • Vision.set(document) binds extraction to the opened document.
  • extract_content() returns structured JSON as a string.
  • Processing runs locally when the engine is ICR.

    # Still inside the with block, while the document is open
    vision = Vision.set(document)
    content_json = vision.extract_content()

Write the JSON string to a file for downstream use.

Use the output for storage, indexing, or custom analysis:

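# content_json is a plain string, so it can be written after the document is closed.
# Explicit UTF-8 avoids platform-dependent encoding errors for non-ASCII text.
with open("output.json", "w", encoding="utf-8") as f:
    f.write(content_json)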
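
If the file feeds other tools, it can help to confirm that the string parses as JSON before writing it. The following is a minimal sketch using only the standard library. The helper name save_json_string is an illustration and not part of the SDK.

```python
import json
from pathlib import Path


def save_json_string(content_json: str, path: str) -> int:
    """Validate a JSON string and write it to disk, pretty-printed.

    Returns the number of characters written.
    (Hypothetical helper for illustration; not an SDK function.)
    """
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input,
    # so a broken payload never reaches the output file.
    parsed = json.loads(content_json)
    text = json.dumps(parsed, indent=2, ensure_ascii=False)
    # Explicit UTF-8 keeps non-ASCII recognized text portable across platforms.
    Path(path).write_text(text, encoding="utf-8")
    return len(text)
```

Call it as save_json_string(content_json, "output.json") in place of the plain write when validation and readable formatting are wanted.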

Understanding the output

extract_content() returns structured JSON with layout and semantic information.

ICR output includes:

  • Document elements — Paragraphs, headings, tables, figures, and equations
  • Bounding boxes — Pixel coordinates for detected regions
  • Reading order — Element order for content flow reconstruction
  • Element classification — Semantic labels such as paragraph, table, and heading
  • Hierarchical structure — Parent-child relationships across sections and blocks

Use this JSON for extraction pipelines, structured storage, and search indexing.
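
This guide does not show the exact JSON schema, so the sketch below makes no assumptions about nesting. It walks every object in the tree and counts the values found under a label key. The key name "type" and the sample payload are assumptions for illustration only. Inspect real output and adjust the key to match.

```python
import json
from collections import Counter
from typing import Any, Iterator


def iter_nodes(node: Any) -> Iterator[dict]:
    """Yield every JSON object (dict) in the tree, parents before children."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def count_labels(content_json: str, label_key: str = "type") -> Counter:
    """Count how often each string label appears under `label_key`.

    `label_key` defaults to "type", which is an assumed key name.
    """
    root = json.loads(content_json)
    return Counter(
        obj[label_key]
        for obj in iter_nodes(root)
        if isinstance(obj.get(label_key), str)
    )


# Hypothetical payload for illustration only; not real SDK output.
sample = json.dumps({
    "elements": [
        {"type": "heading", "bbox": [10, 10, 200, 40]},
        {"type": "table", "children": [{"type": "paragraph"}]},
        {"type": "paragraph"},
    ]
})
print(count_labels(sample))
```

A summary like this is a quick way to check that tables or headings were detected before building a full extraction pipeline on the output.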

Error handling

The Vision API raises VisionException when extraction fails.

Common failure scenarios include:

  • The image file can’t be read because of path or permission issues.
  • Image data is corrupted or truncated.
  • ICR models are missing or inaccessible.
  • Available memory is insufficient for model loading.
  • Image format or encoding is unsupported.

In production code:

  • Catch VisionException.
  • Return a clear error message.
  • Log failure details for debugging.
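
The three points above can be sketched as a small wrapper. This guide does not show where VisionException is imported from, so the sketch takes the exception type and the extraction call as parameters instead of importing them. The function name try_extract and the logger name are illustrative assumptions.

```python
import logging
from typing import Callable, Optional, Tuple, Type

logger = logging.getLogger("icr_extraction")  # logger name is an arbitrary choice


def try_extract(
    extract: Callable[[], str],
    error_type: Type[Exception],
    source: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Run `extract` and return (content, None), or (None, message) on failure.

    Hypothetical helper: pass the SDK's VisionException as `error_type`.
    """
    try:
        return extract(), None
    except error_type as exc:
        # logger.exception records the traceback for debugging.
        logger.exception("Extraction failed for %s", source)
        # The caller receives a short, readable message instead of a traceback.
        return None, f"Could not extract content from {source}: {exc}"
```

With the objects from this guide, a call might look like try_extract(vision.extract_content, VisionException, "input_ocr_multiple_languages.png"), made while the document is still open.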

Conclusion

Use this workflow for ICR-based extraction:

  1. Open the image document using a context manager for automatic resource cleanup.
  2. Optionally assign VisionEngine.ICR to the engine property in the vision settings. ICR is the default engine, so this step only makes the choice explicit.
  3. Create a vision instance with Vision.set() to bind content extraction to the document.
  4. Call extract_content(). The ICR engine loads its local models, detects semantic elements (tables, equations, headings), and determines reading order.
  5. Write the returned JSON string, which contains the document structure with bounding boxes in pixel coordinates, to a file using Python’s built-in file handling and a context manager.
  6. Handle VisionException so that failures in production are caught and reported.

All processing occurs locally without external API calls, which suits air-gapped environments, sensitive documents, and high-volume workflows. The JSON output integrates with downstream pipelines such as data extraction, database storage, and search indexing.
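
For high-volume workflows, the same steps can be wrapped in a loop over a folder. The sketch below is one possible arrangement, not an SDK feature. The extract_one parameter is a hypothetical callable that you would implement with the Document, Vision, and extract_content() calls from this guide. The suffix list is also an assumption.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def extract_folder(
    input_dir: str,
    output_dir: str,
    extract_one: Callable[[Path], str],
    suffixes: Tuple[str, ...] = (".png", ".jpg", ".jpeg"),
) -> Dict[str, str]:
    """Run `extract_one` on each image in `input_dir` and write <stem>.json files.

    Returns a mapping of failed file names to error text.
    `extract_one` is a hypothetical wrapper around the SDK calls.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures: Dict[str, str] = {}
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in suffixes:
            continue  # skip files that are not images
        try:
            content_json = extract_one(image)
        except Exception as exc:  # narrow this to VisionException in real code
            failures[image.name] = str(exc)
            continue  # one bad file should not stop the batch
        (out / f"{image.stem}.json").write_text(content_json, encoding="utf-8")
    return failures
```

Collecting failures instead of raising keeps a long batch running and leaves a list of files to retry or inspect afterward.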

For related image extraction workflows, refer to the Python SDK guides.

Download this ready-to-use sample package to explore the Vision API capabilities with preconfigured ICR settings.