Extracting data from images using OCR

Use OCR to extract text from images in high-throughput workflows.

Common use cases include:

Invoice and receipt processing
Search indexing pipelines
Real-time text capture
Large-scale document digitization

OCR focuses on text extraction and word-level coordinates. It doesn't provide the full semantic layout analysis that ICR provides.


How Nutrient helps

Nutrient Java SDK handles OCR configuration, extraction, and JSON output.

The SDK handles:

Configuring OCR engines and selecting language models for character recognition
Calculating word-level bounding boxes and transforming coordinates
Detecting text lines and determining reading order
Detecting languages and processing multi-language text
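The SDK determines reading order internally, and this guide doesn't document its algorithm. To illustrate what the task involves, the following sketch groups word boxes into lines by vertical position and then orders them top to bottom and left to right. The `WordBox` class and the line-grouping heuristic are hypothetical. They are not the SDK's types or method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReadingOrderSketch {

    // Hypothetical word box in pixel coordinates (origin at top-left). Not an SDK type.
    static final class WordBox {
        final String text;
        final double x, y, width, height;

        WordBox(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        double centerY() {
            return y + height / 2.0;
        }
    }

    // Sorts words by vertical center; a word starts a new line when its center
    // lies below the bottom edge of the first word of the current line.
    static List<List<WordBox>> groupIntoLines(List<WordBox> words) {
        List<WordBox> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingDouble(WordBox::centerY));

        List<List<WordBox>> lines = new ArrayList<>();
        List<WordBox> current = null;
        WordBox anchor = null;
        for (WordBox w : sorted) {
            if (current == null || w.centerY() > anchor.y + anchor.height) {
                current = new ArrayList<>();
                lines.add(current);
                anchor = w;
            }
            current.add(w);
        }
        // Within each line, read left to right.
        for (List<WordBox> line : lines) {
            line.sort(Comparator.comparingDouble(b -> b.x));
        }
        return lines;
    }

    // Joins words with spaces and lines with newlines.
    static String toText(List<List<WordBox>> lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) sb.append('\n');
            List<WordBox> line = lines.get(i);
            for (int j = 0; j < line.size(); j++) {
                if (j > 0) sb.append(' ');
                sb.append(line.get(j).text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<WordBox> words = new ArrayList<>();
        words.add(new WordBox("world", 60, 12, 50, 20));
        words.add(new WordBox("Second", 10, 50, 60, 20));
        words.add(new WordBox("Hello", 10, 10, 45, 20));
        words.add(new WordBox("line", 80, 52, 35, 20));
        System.out.println(toText(groupIntoLines(words)));
    }
}
```

Running `main` prints `Hello world` on the first line and `Second line` on the second. This simple heuristic is only a starting point. It can misorder skewed scans or multi-column layouts, which is one reason to rely on the SDK's own line detection.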

Complete implementation

This example extracts text from an image with OCR and writes the output to a JSON file.

Prerequisites

Before following this guide, ensure you have:

Java 8 or higher installed
Nutrient Java SDK installed (via Maven, Gradle, or manual JAR installation)
An image file to process (PNG, JPEG, or other supported formats)
Basic familiarity with Java try-with-resources statements

For initial SDK setup and configuration, refer to the getting started guide.

Preparing the project

Start by specifying a package name:

package io.nutrient.Sample;

Import the required classes from the SDK and the Java standard library, then declare a new class:

import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VisionEngine;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractDataFromImageOcr {

Create the main method and declare the exceptions it can throw:

    public static void main(String[] args) throws NutrientException, IOException {

Configuring OCR mode

Open the image and set the vision engine to OCR.

In this sample:

The document opens in try-with-resources.
setEngine(VisionEngine.Ocr) enables OCR mode.
OCR mode prioritizes extraction speed.

        try (Document document = Document.open("input_ocr_multiple_languages.png")) {
            // Configure OCR engine for fast text extraction
            document.getSettings().getVisionSettings().setEngine(VisionEngine.Ocr);

Creating a vision instance and extracting content

Create a vision instance and call extractContent().

In this sample:

Vision.set(document) binds OCR extraction to the opened document.
extractContent() returns OCR results as a JSON string.
The output includes extracted text and coordinates.

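            // Bind a Vision instance to the opened document
            Vision vision = Vision.set(document);
            // Run OCR and receive the results as a JSON string
            String contentJson = vision.extractContent();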

Write the JSON string to a file for downstream processing, such as indexing, analytics, or storage:

            // Save the OCR results; FileWriter(String) encodes with the platform default charset
            try (FileWriter writer = new FileWriter("output.json")) {
                writer.write(contentJson);
            }
        }
    }
}
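`FileWriter(String)` encodes text with the platform's default charset. On Java 8 through 17, that charset depends on the operating system and locale. JDK 18 made UTF-8 the default (JEP 400). The sample input contains multiple languages, so writing the JSON as explicit UTF-8 prevents mangled characters on systems whose default isn't UTF-8.

The following Java 8-compatible sketch uses only the standard library. The `writeJsonUtf8` helper is a hypothetical name, and `contentJson` stands in for the string returned by `extractContent()`.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8JsonWriter {

    // Writes the JSON string as UTF-8, creating the file or truncating an existing one.
    static void writeJsonUtf8(Path target, String json) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(json);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by vision.extractContent();
        // Unicode escapes keep this source file ASCII-only.
        String contentJson = "{\"text\":\"Gr\u00fc\u00dfe, \u3053\u3093\u306b\u3061\u306f\"}";
        writeJsonUtf8(Paths.get("output.json"), contentJson);
    }
}
```

In the complete implementation, this would replace the `FileWriter` block. It keeps the same try-with-resources structure.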

Understanding the output

In OCR mode, extractContent() returns JSON that focuses on text and word-level positions.

OCR output includes:

Text content — Extracted text with line structure
Bounding boxes — Pixel coordinates for text regions
Word-level data — Per-word positions for highlighting or targeting
Language detection — Detected language metadata

Unlike ICR output, OCR output focuses on text and positions instead of semantic document structure.
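This guide doesn't show the exact JSON schema, so the following sketch doesn't parse it. It assumes the per-word entries have already been mapped into a hypothetical `Word` class holding the text and a pixel bounding box. Its field names are assumptions, not the SDK's schema.

The sketch shows two typical uses of word-level data:

- finding every occurrence of a search term
- merging the boxes of a phrase into one highlight rectangle

It also converts pixels to PDF points (1 point = 1/72 inch). That conversion requires the image resolution, and the 300 DPI value below is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class WordHighlightSketch {

    // Hypothetical per-word record; not the SDK's JSON schema.
    static final class Word {
        final String text;
        final double x, y, width, height; // pixels, origin at top-left

        Word(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Returns the words whose text equals the query, ignoring case.
    static List<Word> findMatches(List<Word> words, String query) {
        List<Word> matches = new ArrayList<>();
        String q = query.toLowerCase(Locale.ROOT);
        for (Word w : words) {
            if (w.text.toLowerCase(Locale.ROOT).equals(q)) {
                matches.add(w);
            }
        }
        return matches;
    }

    // Smallest rectangle {x, y, width, height} that covers all given words.
    static double[] union(List<Word> words) {
        if (words.isEmpty()) throw new IllegalArgumentException("no words");
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (Word w : words) {
            minX = Math.min(minX, w.x);
            minY = Math.min(minY, w.y);
            maxX = Math.max(maxX, w.x + w.width);
            maxY = Math.max(maxY, w.y + w.height);
        }
        return new double[] {minX, minY, maxX - minX, maxY - minY};
    }

    // Converts a pixel length to PDF points (1 pt = 1/72 inch) at the given DPI.
    static double pixelsToPoints(double pixels, double dpi) {
        return pixels * 72.0 / dpi;
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        words.add(new Word("Total", 100, 400, 80, 30));
        words.add(new Word("due", 190, 400, 50, 30));
        words.add(new Word("total", 100, 900, 80, 30));

        // Case-insensitive search finds both occurrences of "total".
        System.out.println(findMatches(words, "TOTAL").size()); // 2
        // Merge the words of the phrase "Total due" into one highlight box.
        double[] phrase = union(words.subList(0, 2));
        System.out.println(Arrays.toString(phrase)); // [100.0, 400.0, 140.0, 30.0]
        // 100 px at an assumed 300 DPI is 24 pt.
        System.out.println(pixelsToPoints(phrase[0], 300)); // 24.0
    }
}
```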

Error handling

The Vision API throws VisionException when OCR extraction fails.

Common failure scenarios include:

The image file can’t be read because of path or permission issues.
Image data is corrupted or uses unsupported encoding.
OCR models are missing or inaccessible.
Available memory is insufficient for large images.
Image format or resolution is unsupported.
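You can catch several of these failures before the OCR engine runs. The following sketch uses only the standard library and is independent of the SDK. It checks that the path:

- exists
- is a readable regular file
- isn't empty
- starts with a PNG or JPEG signature

Limiting the check to PNG and JPEG is an assumption for illustration. The SDK may accept other formats, so extend or relax the check to match your inputs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class ImagePreflight {

    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };
    private static final byte[] JPEG_SIGNATURE = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // Returns null when the file looks usable; otherwise, a human-readable reason.
    static String check(Path image) throws IOException {
        if (!Files.exists(image)) return "File not found: " + image;
        if (!Files.isRegularFile(image)) return "Not a regular file: " + image;
        if (!Files.isReadable(image)) return "File is not readable (check permissions): " + image;
        if (Files.size(image) == 0) return "File is empty: " + image;

        byte[] header = new byte[8];
        int read = 0;
        try (InputStream in = Files.newInputStream(image)) {
            // Read up to 8 bytes; a single read() call may return fewer.
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) break;
                read += n;
            }
        }
        if (startsWith(header, read, PNG_SIGNATURE) || startsWith(header, read, JPEG_SIGNATURE)) {
            return null;
        }
        return "Unrecognized image signature (expected PNG or JPEG): " + image;
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        return length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        String problem = check(Paths.get("input_ocr_multiple_languages.png"));
        System.out.println(problem == null ? "Input looks OK" : problem);
    }
}
```

A passing check doesn't guarantee successful extraction. Corrupted image data or missing OCR models still surface as VisionException.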

In production code:

Catch VisionException.
Return a clear error message.
Log failure details for debugging.
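A minimal sketch of this pattern follows. To keep it compilable without the SDK, a hypothetical `OcrCall` interface wraps the OCR call, and the handler catches `Exception`. In real code, catch `VisionException` (and `IOException` for file errors) instead. Logging uses `java.util.logging` only as an assumption, so substitute your application's logging framework.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class OcrErrorHandling {

    private static final Logger LOG = Logger.getLogger(OcrErrorHandling.class.getName());

    // Hypothetical abstraction over the SDK call, for example () -> vision.extractContent().
    interface OcrCall {
        String run() throws Exception;
    }

    // Outcome of one extraction: either JSON or a user-facing error message.
    static final class OcrResult {
        final String json;
        final String error;

        private OcrResult(String json, String error) {
            this.json = json;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    static OcrResult extract(String imageName, OcrCall call) {
        try {
            return new OcrResult(call.run(), null);
        } catch (Exception e) { // In production code, catch VisionException here.
            // Full details, including the stack trace, go to the log for debugging.
            LOG.log(Level.WARNING, "OCR extraction failed for " + imageName, e);
            // The caller receives a short, clear message instead of a raw stack trace.
            return new OcrResult(null, "Could not extract text from " + imageName + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        OcrResult good = extract("receipt.png", () -> "{\"text\":\"ok\"}");
        OcrResult bad = extract("broken.png", () -> {
            throw new IllegalStateException("image data is corrupted");
        });
        System.out.println(good.ok() + " / " + bad.error);
    }
}
```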

Conclusion

Use this workflow for OCR-based text extraction:

Open the image with Document.open() in a try-with-resources statement so the document is closed automatically.
Enable OCR mode by calling getSettings().getVisionSettings().setEngine(VisionEngine.Ocr). OCR mode focuses on character and word recognition and doesn't perform the semantic structure analysis that ICR provides.
Create a vision instance with Vision.set() to bind extraction to the document.
Call extractContent() to run OCR. The engine detects words, calculates their bounding boxes, and returns a JSON string containing the extracted text and word-level bounding boxes in pixel coordinates.
Write the JSON to a file with FileWriter in a try-with-resources statement so the writer is closed automatically.
Handle VisionException in production code to recover from extraction failures.

The JSON output integrates with search indexing (for example, Elasticsearch or Solr), text analysis, and database storage. Because OCR mode is optimized for speed, it suits invoice processing, receipt scanning, search indexing, and document digitization where throughput matters.
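In high-throughput scenarios, the same steps typically run once per image. The following sequential sketch:

- walks a directory of images
- derives an output name such as `invoice.json` from `invoice.png`
- records failures without stopping the batch

The `Extractor` interface is a hypothetical stand-in for the open, configure, and extract sequence from the complete implementation. It keeps the sketch compilable without the SDK. The `.png`/`.jpg`/`.jpeg` filter is also an assumption. This guide doesn't say whether SDK calls can safely run in parallel, so the sketch doesn't use threads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class OcrBatch {

    // Hypothetical stand-in for: open document, set VisionEngine.Ocr, call extractContent().
    interface Extractor {
        String extract(Path image) throws Exception;
    }

    // Maps "scan.PNG" to "scan.json" inside the output directory.
    static Path outputPathFor(Path image, Path outputDir) {
        String name = image.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".json");
    }

    static boolean isImage(Path p) {
        String n = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return n.endsWith(".png") || n.endsWith(".jpg") || n.endsWith(".jpeg");
    }

    // Processes every image in inputDir and returns failures keyed by file name.
    static Map<String, String> run(Path inputDir, Path outputDir, Extractor extractor) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> images = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && isImage(p)) images.add(p);
            }
        }
        Collections.sort(images); // deterministic processing order
        Map<String, String> failures = new LinkedHashMap<>();
        for (Path image : images) {
            try {
                String json = extractor.extract(image);
                Files.write(outputPathFor(image, outputDir), json.getBytes(StandardCharsets.UTF_8));
            } catch (Exception e) {
                // Record the failure and continue with the next image.
                failures.put(image.getFileName().toString(), String.valueOf(e.getMessage()));
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder extractor; replace it with the SDK calls from the complete implementation.
        Map<String, String> failures = run(Paths.get("images"), Paths.get("json"), image -> "{}");
        failures.forEach((file, reason) -> System.err.println(file + ": " + reason));
    }
}
```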

For related image extraction workflows, refer to the Java SDK guides.

Download this ready-to-use sample package to explore the Vision API capabilities with preconfigured OCR settings.