Extracting data from images using OCR
Use OCR to extract text from images in high-throughput workflows.
Common use cases include:
- Invoice and receipt processing
- Search indexing pipelines
- Real-time text capture
- Large-scale document digitization
OCR focuses on text extraction and word-level coordinates. Unlike ICR, it doesn’t provide full semantic layout analysis.
How Nutrient helps
Nutrient Java SDK handles OCR configuration, extraction, and JSON output.
The SDK handles:
- Configuring OCR engines and language model selection for character recognition
- Implementing word-level bounding box calculation and coordinate transformation
- Handling text line detection and reading order determination
- Running language detection and processing multi-language text
Complete implementation
This example extracts OCR text and writes the output as JSON.
Prerequisites
Before following this guide, ensure you have:
- Java 8 or higher installed
- Nutrient Java SDK installed (via Maven, Gradle, or manual JAR installation)
- An image file to process (PNG, JPEG, or other supported formats)
- Basic familiarity with Java try-with-resources statements
For initial SDK setup and configuration, refer to the getting started guide.
Preparing the project
Start by specifying a package name and creating a new class:

    package io.nutrient.Sample;

Import the required classes from the SDK:

    import io.nutrient.sdk.Document;
    import io.nutrient.sdk.Vision;
    import io.nutrient.sdk.enums.VisionEngine;
    import io.nutrient.sdk.exceptions.NutrientException;

    import java.io.FileWriter;
    import java.io.IOException;

    public class ExtractDataFromImageOcr {

Create the main method and declare thrown exceptions:

        public static void main(String[] args) throws NutrientException, IOException {

Configuring OCR mode
Open the image and set the vision engine to OCR.
In this sample:
- The document opens in try-with-resources.
- setEngine(VisionEngine.Ocr) enables OCR mode.
- OCR mode prioritizes extraction speed.

            try (Document document = Document.open("input_ocr_multiple_languages.png")) {
                // Configure OCR engine for fast text extraction
                document.getSettings().getVisionSettings().setEngine(VisionEngine.Ocr);

Creating a vision instance and extracting content
Create a vision instance and call extractContent().
In this sample:
- Vision.set(document) binds OCR extraction to the opened document.
- extractContent() returns OCR results as a JSON string.
- The output includes extracted text and coordinates.

                Vision vision = Vision.set(document);
                String contentJson = vision.extractContent();

Write the JSON string to a file for downstream processing.
Use this output for indexing, analytics, or storage:

                try (FileWriter writer = new FileWriter("output.json")) {
                    writer.write(contentJson);
                }
            }
        }
    }

Understanding the output
extractContent() in OCR mode returns JSON optimized for text and word-level positions.
OCR output includes:
- Text content — Extracted text with line structure
- Bounding boxes — Pixel coordinates for text regions
- Word-level data — Per-word positions for highlighting or targeting
- Language detection — Detected language metadata
Unlike ICR output, OCR output focuses on text and positions instead of semantic document structure.
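Word-level positions are typically used to draw highlights over a rendered copy of the image, which is often displayed at a different size than the source. The sketch below assumes a word box has already been parsed from the JSON into x, y, width, and height values in source-image pixels. The `WordBox` class and its field names are hypothetical and aren't taken from the SDK's output schema, so map them from the actual JSON before use.

```java
// Hypothetical holder for one word's bounding box, in source-image pixels.
// The field names are assumptions; map them from the actual JSON output.
final class WordBox {
    final double x;
    final double y;
    final double width;
    final double height;

    WordBox(double x, double y, double width, double height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // Scale the box from the source image size to a rendered view size.
    WordBox scaleTo(double imageWidth, double imageHeight,
                    double viewWidth, double viewHeight) {
        double sx = viewWidth / imageWidth;
        double sy = viewHeight / imageHeight;
        return new WordBox(x * sx, y * sy, width * sx, height * sy);
    }

    public static void main(String[] args) {
        // A word at (200, 100) sized 80x20 in a 2000x1000 image,
        // displayed in a 1000x500 view.
        WordBox scaled = new WordBox(200, 100, 80, 20)
                .scaleTo(2000, 1000, 1000, 500);
        System.out.println(scaled.x + ", " + scaled.y + ", "
                + scaled.width + ", " + scaled.height);
    }
}
```

Keeping the horizontal and vertical scale factors separate lets the same method handle views that don't preserve the image's aspect ratio.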
Error handling
The Vision API throws VisionException when OCR extraction fails.
Common failure scenarios include:
- The image file can’t be read because of path or permission issues.
- Image data is corrupted or uses unsupported encoding.
- OCR models are missing or inaccessible.
- Available memory is insufficient for large images.
- Image format or resolution is unsupported.
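Some of these failures, such as path and permission problems, can be caught before the SDK is invoked. The following sketch uses only `java.nio.file` to check that the input exists, is a regular file, is readable, and isn't empty. The `InputPreflight` class and its messages are illustrative and not part of the SDK, and the check says nothing about whether the image encoding is valid, so exception handling is still required.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Illustrative helper (not part of the SDK): cheap checks to run before OCR.
final class InputPreflight {

    // Returns a description of the problem, or an empty Optional
    // if the file looks usable.
    static Optional<String> check(Path image) {
        if (!Files.exists(image)) {
            return Optional.of("File not found: " + image);
        }
        if (!Files.isRegularFile(image)) {
            return Optional.of("Not a regular file: " + image);
        }
        if (!Files.isReadable(image)) {
            return Optional.of("File is not readable: " + image);
        }
        try {
            if (Files.size(image) == 0) {
                return Optional.of("File is empty: " + image);
            }
        } catch (IOException e) {
            return Optional.of("Cannot read file size: " + image
                    + " (" + e.getMessage() + ")");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path image = Paths.get("input_ocr_multiple_languages.png");
        Optional<String> problem = check(image);
        System.out.println(problem.orElse("Input looks usable: " + image));
    }
}
```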
In production code:
- Catch VisionException.
- Return a clear error message.
- Log failure details for debugging.
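The three steps above can be sketched as a small wrapper. To keep the example compilable without the SDK, `ExtractionException` stands in for `VisionException`, and the `Extraction` interface stands in for the open, configure, and `extractContent()` calls. Both are placeholders for this sketch, so swap in the real types in your own code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class SafeExtraction {
    private static final Logger LOG =
            Logger.getLogger(SafeExtraction.class.getName());

    // Placeholder for the SDK's VisionException (assumption for this sketch).
    static class ExtractionException extends Exception {
        ExtractionException(String message) {
            super(message);
        }
    }

    // Placeholder for the code that opens the document and calls extractContent().
    interface Extraction {
        String run(String imagePath) throws ExtractionException;
    }

    // Holds either the JSON result or a clear error message, never both.
    static final class Result {
        final String json;
        final String error;

        private Result(String json, String error) {
            this.json = json;
            this.error = error;
        }

        boolean isSuccess() {
            return error == null;
        }
    }

    static Result extract(Extraction extraction, String imagePath) {
        try {
            return new Result(extraction.run(imagePath), null);
        } catch (ExtractionException e) {
            // Log the full failure details, including the stack trace.
            LOG.log(Level.SEVERE, "OCR extraction failed for " + imagePath, e);
            // Return a short message that is safe to show to a caller.
            return new Result(null, "Could not extract text from "
                    + imagePath + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        Result result = extract(path -> {
            throw new ExtractionException("OCR models are missing");
        }, "scan.png");
        System.out.println(result.isSuccess() ? result.json : result.error);
    }
}
```

Returning a result object instead of rethrowing keeps one failed image from aborting a larger batch, while the log entry preserves the details needed for debugging.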
Conclusion
Use this workflow for OCR-based text extraction:
- Open the image document using a try-with-resources statement for automatic resource cleanup.
- Configure the vision settings by calling getSettings().getVisionSettings().setEngine(VisionEngine.Ocr) to enable fast text extraction.
- OCR mode focuses on character recognition and word extraction without semantic analysis or layout detection.
- Create a vision instance with Vision.set() to bind text extraction operations to the document.
- Call extractContent() to invoke the OCR engine for character recognition.
- The OCR engine performs word detection, calculates bounding boxes, and generates JSON output with text and coordinates.
- The method returns a JSON-formatted string containing extracted text with word-level bounding boxes in pixel coordinates.
- OCR processing is optimized for speed, minimizing computational overhead for high-throughput scenarios.
- Write the JSON content to a file using a try-with-resources statement with FileWriter for automatic resource management.
- Handle VisionException errors for robust error recovery in production environments.
- The JSON output enables integration with search indexing (Elasticsearch, Solr), text analysis, and database storage.
- OCR mode is ideal for invoice processing, receipt scanning, search indexing, and document digitization where speed is critical.
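When digitizing many images, each input needs a predictable output file. The sketch below derives a `.json` path from the image name and writes the string as UTF-8 with `java.nio.file.Files`. This avoids relying on the platform default charset, which `FileWriter` uses and which isn't guaranteed to be UTF-8 on Java versions before 18, a detail that matters for multi-language text. The `JsonOutput` class and its method names are illustrative, and the JSON string is assumed to come from `extractContent()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (not part of the SDK) for naming and writing OCR output.
final class JsonOutput {

    // Maps, for example, "scans/invoice_01.png" to "<outDir>/invoice_01.json".
    static Path outputPathFor(Path image, Path outDir) {
        String name = image.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name if there is no extension or it starts with a dot.
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".json");
    }

    // Writes the JSON as UTF-8, creating the output directory if needed.
    static Path write(Path image, Path outDir, String contentJson)
            throws IOException {
        Files.createDirectories(outDir);
        Path target = outputPathFor(image, outDir);
        Files.write(target, contentJson.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // The JSON literal is a stand-in for the string returned by extractContent().
        Path written = write(Paths.get("scans", "invoice_01.png"),
                Paths.get("out"), "{}");
        System.out.println("Wrote " + written);
    }
}
```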
For related image extraction workflows, refer to the Java SDK guides.
Download this ready-to-use sample package to explore the Vision API capabilities with preconfigured OCR settings.