Extract structured data from any document
Send any of 40+ file formats — get structured JSON back. Define the fields you need, and the API extracts them with confidence scores.
No credit card required — start with free trial credits
One output feeds the next
Document Extraction is part of a complete content pipeline. One key, one credit pool, and structured JSON responses designed to chain together.
Mix and match freely
Extract data from a document, generate visuals from the results, then compile everything into a finished report. Mix, match, and build your own pipeline.
Three steps to your first extraction
Define a schema
Describe the fields you want to extract using our schema format. Each field has a name, a type, and an optional description to guide the extraction.
- 17 field types including text, currency, date, IBAN, and address
- Nested arrays for line items, tables, and repeating sections
- Optional descriptions to clarify ambiguous fields
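To make the format concrete, here is a sketch of an invoice schema in the shape used by the code samples further down the page. TEXT, CURRENCY_AMOUNT, and ARRAY appear in those samples; the DATE type name is an assumption inferred from the field-type list above, and the field names and descriptions are illustrative.

```python
# Illustrative invoice schema. Every field has a name and a type;
# descriptions are optional hints that guide the extraction.
invoice_schema = {
    "fields": [
        {"name": "invoice_number", "type": "TEXT",
         "description": "The invoice number"},
        {"name": "issue_date", "type": "DATE",  # type name assumed
         "description": "Date the invoice was issued"},
        {"name": "total_amount", "type": "CURRENCY_AMOUNT"},
        {
            # Nested arrays cover line items and other repeating sections.
            "name": "line_items",
            "type": "ARRAY",
            "description": "One entry per invoice line",
            "fields": [
                {"name": "description", "type": "TEXT"},
                {"name": "amount", "type": "CURRENCY_AMOUNT"},
            ],
        },
    ],
}

# Top-level field names, in declaration order.
names = [f["name"] for f in invoice_schema["fields"]]
```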
Send your documents
Upload any of 40+ file formats via URL or base64. Send up to 20 files per request — they are combined into a single extraction result.
- 40+ formats: PDF, Office, EPUB, LaTeX, email, Jupyter, images, and more
- Up to 20 files combined into one structured result
- Built-in OCR for scanned pages and photos
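As a sketch, a request payload can mix a URL file (the shape shown in this page's code samples) with an inline upload. The base64 input shape used here, a "type" of "base64" plus a "data" field, is an assumption; check the API reference for the exact field names.

```python
import base64

# Stand-in for the raw bytes of a real file.
file_bytes = b"example scanned receipt bytes"

payload = {
    "files": [
        {
            # URL input, as shown in this page's examples.
            "type": "url",
            "name": "invoice.pdf",
            "url": "https://example.com/invoice.pdf",
        },
        {
            # Inline input; the "base64" discriminator and "data" field
            # are assumptions, not confirmed field names.
            "type": "base64",
            "name": "receipt.jpg",
            "data": base64.b64encode(file_bytes).decode("ascii"),
        },
    ],
}
```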
Get structured data
Receive JSON with extracted fields, confidence scores, and source citations. Every field includes provenance so you know exactly where the value came from.
- Confidence scores between 0 and 1 for every field
- Source citations linking each value to its location in the document
- Missing fields return null with a confidence score of 0
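A sketch of post-processing a response: separate fields the document did not contain (null value, confidence 0) from low-confidence fields worth routing to human review. The sample response below mirrors the JSON examples on this page; the field names and threshold are illustrative.

```python
# Sample response in the shape shown on this page (abridged).
response = {
    "success": True,
    "data": {
        "invoice_number": {"value": "INV-2024-0042", "confidence": 0.97,
                           "citations": ["Invoice #INV-2024-0042"]},
        "total_amount": {"value": 1250.00, "confidence": 0.62,
                         "citations": ["Total: $1,250.00"]},
        # A field the document did not contain: null value, confidence 0.
        "po_number": {"value": None, "confidence": 0, "citations": []},
    },
}

THRESHOLD = 0.80  # illustrative cutoff for human review

# Fields the document simply did not contain.
missing = [n for n, f in response["data"].items() if f["value"] is None]

# Extracted fields whose confidence is too low to auto-accept.
needs_review = [n for n, f in response["data"].items()
                if f["value"] is not None and f["confidence"] < THRESHOLD]
```

Anything in `needs_review` can be queued for a person to check against the cited source text; everything else flows straight through.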
Intelligent Extraction
The API automatically selects the best extraction approach for your schema and documents. Complex schemas, dense tables, and nested structures are handled without any configuration.
Schema-Driven Results
Choose from 17 typed field types — dates, IBANs, currencies, addresses, nested arrays — and get structured JSON back. No prompt engineering, no output parsing.
Deep Content Understanding
Images and scanned documents aren't treated as pixel grids to OCR. The API understands what they depict — product photos, charts, handwritten notes — and extracts field values from that visual meaning.
Built-In Trust Scores
Every extracted value includes a confidence score and a verbatim source citation from the document. Route low-confidence results to human review.
Multi-File Merge
Send up to 20 files per request and get one unified extraction across all of them. Mix formats freely — a PDF invoice, a DOCX contract, and a JPEG receipt in the same call.
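As a sketch, a mixed-format request can be assembled like this. `build_extract_request` is a hypothetical helper, the file entries use the URL shape from this page's code samples, and the 20-file cap is the per-request limit stated above.

```python
MAX_FILES_PER_REQUEST = 20  # per-request limit stated on this page

def build_extract_request(files, schema):
    """Combine several files (any mix of formats) into one request body."""
    if len(files) > MAX_FILES_PER_REQUEST:
        raise ValueError(f"send at most {MAX_FILES_PER_REQUEST} files per request")
    return {"files": files, "schema": schema}

# One call, three formats, one unified result.
request = build_extract_request(
    files=[
        {"type": "url", "name": "invoice.pdf",
         "url": "https://example.com/invoice.pdf"},
        {"type": "url", "name": "contract.docx",
         "url": "https://example.com/contract.docx"},
        {"type": "url", "name": "receipt.jpg",
         "url": "https://example.com/receipt.jpg"},
    ],
    schema={"fields": [{"name": "counterparty", "type": "TEXT"}]},
)
```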
40+ File Formats
PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, LaTeX, EML, Jupyter notebooks, images (PNG, JPEG, TIFF, BMP, WebP, JP2, and more), plus text and markup formats like YAML, TOML, RST, and Org — all in the same endpoint.
No Model Training
Your documents are never used to train or improve AI models. This is guaranteed for all plans — not gated behind an enterprise contract.
Real-world pipelines, ready to ship
Each recipe chains multiple APIs into a complete workflow. Pick one, tweak it, and deploy — or use it as a starting point for your own pipeline.
Extract Academic Paper Metadata
Extract title, authors, abstract, and citation info from academic papers.
Extract Article Text
Extract clean article content — title, author, date, and body text — from PDFs, Word docs, and web pages.
Extract Contract Clause Data
Extract parties, dates, and clauses from a contract into structured JSON for legal review workflows.
Extract Court Filing Data
Extract case numbers, parties, filing dates, court details, and relief sought from court filing documents and legal pleadings.
Extract Customs Declaration
Merge a commercial invoice, packing list, and bill of lading into a unified customs declaration.
Extract Delivery Note Data
Extract shipment details, item quantities, and delivery confirmation data from warehouse delivery notes and goods received notes.
Extract Fleet Vehicle Registration Data
Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.
Extract Invoice Data
Extract vendor name, line items, totals, and dates from invoice documents.
Extract KPI Data
Extract campaign or business KPIs from report documents — metrics, values, periods, and targets.
Extract KYC Onboarding Data
Extract client identity verification details, company information, and beneficial ownership data from KYC onboarding documents.
Extract Legal Invoice Data
Extract timekeeper entries, disbursements, matter references, and billing summaries from law firm invoices.
Extract Medical Record
Extract patient details, diagnoses, and medications from a medical record into structured JSON for healthcare workflows.
Extract Multi-Invoice Data
Extract structured data from multiple invoice files in a single API call using an array schema.
Extract NDA Terms
Extract parties, obligations, restrictions, permitted disclosures, and expiry dates from non-disclosure agreements.
Extract Product Catalog Entry
Extract product name, SKU, price, and specifications from a catalog document into structured JSON for e-commerce workflows.
Extract Property Appraisal
Extract appraised value, property details, and comparable sales from a property appraisal report into structured JSON.
Extract Property Deed Data
Extract property ownership, legal descriptions, encumbrances, and recording details from property deeds and land registry documents.
Extract Purchase Order Data
Extract line items, quantities, unit prices, delivery dates, and supplier details from purchase order documents.
Extract Real Estate Listing
Extract property address, price, room count, and features from a listing document into structured JSON for MLS and property platforms.
Extract Receipt Data
Extract merchant, date, line items, tax, and total from receipts.
Extract Rental Application
Extract applicant details, employment history, income, and references from a rental application form into structured JSON for tenant screening.
Extract Resume Data
Extract candidate name, contact details, work history, and skills from resumes.
Extract Supplier Invoice Data for ERP Import
Extract supplier invoice details structured for direct import into ERP systems like SAP, Oracle, or Microsoft Dynamics.
Extract Terms and Conditions
Extract clause types, obligations, limitations, and governing law from terms and conditions documents.
Extract Traffic Fine Data
Extract violation details, fine amounts, vehicle information, and payment deadlines from traffic fine notices.
One n8n node for your entire pipeline
Most n8n document workflows chain three or four separate services. The Iteration Layer community node covers extraction, transformation, and generation in a single install — wire up multi-step pipelines visually instead of writing glue code.
Start building right now
One API call, one credit deducted. Chains naturally with our other APIs — pipe the output of one into the next without glue code. You'll be up and running in minutes.
- Full OpenAPI 3.1 specification available for code generation and IDE integration.
- MCP server support for seamless integration with AI agents and tools.
- Comprehensive documentation with examples for every field type and edge case.
curl -X POST \
  https://api.iterationlayer.com/document-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [{
      "type": "url",
      "name": "invoice.pdf",
      "url": "https://example.com/invoice.pdf"
    }],
    "schema": {
      "fields": [
        {
          "name": "invoice_number",
          "type": "TEXT",
          "description": "The invoice number"
        },
        {
          "name": "total_amount",
          "type": "CURRENCY_AMOUNT",
          "description": "The total amount"
        },
        {
          "name": "line_items",
          "type": "ARRAY",
          "description": "Line items",
          "fields": [
            { "name": "description", "type": "TEXT" },
            { "name": "amount", "type": "CURRENCY_AMOUNT" }
          ]
        }
      ]
    }
  }'
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: $1,250.00"],
      "source": "invoice.pdf"
    },
    "line_items": {
      "type": "ARRAY",
      "value": [
        {
          "description": {
            "value": "Consulting (10h)",
            "confidence": 0.98,
            "citations": ["Consulting (10h)"]
          },
          "amount": {
            "value": 1000.00,
            "confidence": 0.96,
            "citations": ["$1,000.00"]
          }
        }
      ],
      "confidence": 0.97,
      "citations": [],
      "source": "invoice.pdf"
    }
  }
}
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.extract({
  files: [{
    type: "url",
    name: "invoice.pdf",
    url: "https://example.com/invoice.pdf",
  }],
  schema: {
    fields: [
      {
        type: "TEXT",
        name: "invoice_number",
        description: "The invoice number",
      },
      {
        type: "CURRENCY_AMOUNT",
        name: "total_amount",
        description: "The total amount",
      },
      {
        type: "ARRAY",
        name: "line_items",
        description: "Line items",
        fields: [
          { type: "TEXT", name: "description" },
          { type: "CURRENCY_AMOUNT", name: "amount" },
        ],
      },
    ],
  },
});
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: $1,250.00"],
      "source": "invoice.pdf"
    },
    "line_items": {
      "type": "ARRAY",
      "value": [
        {
          "description": {
            "value": "Consulting (10h)",
            "confidence": 0.98,
            "citations": ["Consulting (10h)"]
          },
          "amount": {
            "value": 1000.00,
            "confidence": 0.96,
            "citations": ["$1,000.00"]
          }
        }
      ],
      "confidence": 0.97,
      "citations": [],
      "source": "invoice.pdf"
    }
  }
}
from iterationlayer import IterationLayer

client = IterationLayer(
    api_key="YOUR_API_KEY"
)

result = client.extract(
    files=[{
        "type": "url",
        "name": "invoice.pdf",
        "url": "https://example.com/invoice.pdf",
    }],
    schema={
        "fields": [
            {
                "type": "TEXT",
                "name": "invoice_number",
                "description": "The invoice number",
            },
            {
                "type": "CURRENCY_AMOUNT",
                "name": "total_amount",
                "description": "The total amount",
            },
            {
                "type": "ARRAY",
                "name": "line_items",
                "description": "Line items",
                "fields": [
                    {"type": "TEXT", "name": "description"},
                    {"type": "CURRENCY_AMOUNT", "name": "amount"},
                ],
            },
        ],
    },
)
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: $1,250.00"],
      "source": "invoice.pdf"
    },
    "line_items": {
      "type": "ARRAY",
      "value": [
        {
          "description": {
            "value": "Consulting (10h)",
            "confidence": 0.98,
            "citations": ["Consulting (10h)"]
          },
          "amount": {
            "value": 1000.00,
            "confidence": 0.96,
            "citations": ["$1,000.00"]
          }
        }
      ],
      "confidence": 0.97,
      "citations": [],
      "source": "invoice.pdf"
    }
  }
}
import il "github.com/iterationlayer/sdk-go"

client := il.NewClient("YOUR_API_KEY")

result, err := client.Extract(il.ExtractRequest{
    Files: []il.FileInput{
        il.NewFileFromURL(
            "invoice.pdf",
            "https://example.com/invoice.pdf",
        ),
    },
    Schema: il.ExtractionSchema{
        "invoice_number": il.NewTextFieldConfig(
            "invoice_number",
            "The invoice number",
        ),
        "total_amount": il.NewCurrencyAmountFieldConfig(
            "total_amount",
            "The total amount",
        ),
    },
})
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: $1,250.00"],
      "source": "invoice.pdf"
    }
  }
}
Official SDKs for every major language
Install the SDK, set your API key, and start chaining requests. Full type safety, automatic retries, and idiomatic error handling included.
Your data stays in the EU
Your data is processed on EU servers and never stored beyond temporary logs. Zero retention, GDPR-compliant by design, with a Data Processing Agreement available for every customer. Learn more about our security practices.
No data storage, no model training
We don't store your files or processing results, and your data is never used to train or improve AI models. Logs are automatically deleted after 90 days.
EU-hosted infrastructure
All processing runs on servers located in the European Union. Your data never leaves the EU.
GDPR-compliant by design
Full compliance with EU data protection regulations. Data Processing Agreement available for all customers.
Pricing
Start with free trial credits. No credit card required.
Developer
For individuals & small projects
Startup
Save 40%
For growing teams
Business
Save 47%
For high-volume workloads
Or pay as you go from $0.022/credit with automatic volume discounts.
Frequently asked questions
What file formats are supported?
How does schema-based extraction work?
What are confidence scores?
How many files can I send per request?
Does it handle scanned documents?
What happens when a field isn't found?
Still evaluating?
See how we compare — and where the competition still wins. Choosing the right tool shouldn't require a week of research.
Reducto
Reducto uses JSON Schema for field definitions — verbose compared to simple typed field declarations.
DocuPipe
DocuPipe does zero-shot extraction, but returns generic string values — not typed, validated fields.
Nanonets
Nanonets uses open-source OCR models with generic schema definitions — not purpose-built typed fields.
LlamaParse
LlamaParse outputs markdown — your code still needs to parse it into typed fields.
Mistral OCR
Mistral has best-in-class OCR, but returns raw text — not structured data with typed fields.
AWS Textract
Textract needs five API calls per document and returns raw strings — not typed, structured data.
Azure Document Intelligence
Azure requires training custom models before you can extract data from new document types.
Google Document AI
Document AI is powerful, but requires a GCP project, service account, and storage bucket to get started.
Kreuzberg
Kreuzberg is fast and open source, but you own the deployment, scaling, and monitoring.
Regex & Templates
Regex templates break the moment a document layout changes — even slightly.
Built for how you work
Whether you're building pipelines in code, automating workflows, orchestrating AI agents, or shipping client projects — Iteration Layer fits your process.
Developers
One vendor, one credit pool — stop maintaining five libraries for document and image processing.
Operations Teams
Automate the manual document and image tasks that eat hours every week — no custom code required.
AI Agents
Give your AI agents a complete content processing toolkit via a single MCP server.
Agencies
One account, one credit pool — deploy the same processing pipeline across every client project.