Iteration Layer
Document Extraction

Extract structured data from any document

Send any of 40+ file formats — get structured JSON back. Define the fields you need, and the API extracts them with confidence scores.

No credit card required — start with free trial credits

Zero data retention · GDPR · Made & hosted in the EU · $60 free trial credits · No credit card required · 14-day money-back guarantee

One output feeds the next

Document Extraction is part of a complete content pipeline. One key, one credit pool, and structured JSON responses designed to chain together.

Fits into your existing stack

Native SDKs for Node, Python, and Go. OpenAPI spec for everything else. MCP server for AI agents and Claude Code skills. n8n community node for visual workflows.

Mix and match freely

Extract data from a document, generate visuals from the results, then compile everything into a finished report. Mix, match, and build your own pipeline.

Three steps to your first extraction

01

Define a schema

Describe the fields you want to extract using our schema format. Each field has a name, a type, and an optional description to guide the extraction.

  • 17 field types including text, currency, date, IBAN, and address
  • Nested arrays for line items, tables, and repeating sections
  • Optional descriptions to clarify ambiguous fields
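
As a rough illustration of the schema format described above, the following Python sketch builds the same schema used in the request examples further down this page. The `field` helper is hypothetical (it is not part of any SDK), and only the three type names that appear on this page are used, since the identifiers of the other field types are not shown here.

```python
from typing import Optional

# Type names taken from the request examples on this page. The identifiers
# of the remaining field types are not shown here, so they are not guessed.
KNOWN_TYPES = {"TEXT", "CURRENCY_AMOUNT", "ARRAY"}


def field(name: str, type_: str, description: Optional[str] = None,
          fields: Optional[list] = None) -> dict:
    """Build one schema field (hypothetical helper, not an SDK function)."""
    if type_ not in KNOWN_TYPES:
        raise ValueError(f"unknown field type: {type_}")
    # Nested fields are expected for ARRAY and for no other type.
    if (type_ == "ARRAY") != (fields is not None):
        raise ValueError("nested fields belong to ARRAY fields only")
    spec: dict = {"name": name, "type": type_}
    if description is not None:
        spec["description"] = description
    if fields is not None:
        spec["fields"] = fields
    return spec


schema = {
    "fields": [
        field("invoice_number", "TEXT", "The invoice number"),
        field("total_amount", "CURRENCY_AMOUNT", "The total amount"),
        field("line_items", "ARRAY", "Line items", fields=[
            field("description", "TEXT"),
            field("amount", "CURRENCY_AMOUNT"),
        ]),
    ]
}
```

Building the schema through a small helper like this catches a misspelled type or a missing nested field list before any request is sent.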

02

Send your documents

Upload any of 40+ file formats via URL or base64. Send up to 20 files per request — they are combined into a single extraction result.

  • 40+ formats: PDF, Office, EPUB, LaTeX, email, Jupyter, images, and more
  • Up to 20 files combined into one structured result
  • Built-in OCR for scanned pages and photos
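
The two upload modes mentioned above could be prepared along these lines. The URL shape matches the request examples on this page; the base64 shape is an assumption, because this page only shows a `"type": "url"` input, so the `"base64"` type value and key name below are hypothetical and should be checked against the API reference.

```python
import base64


def file_from_url(name: str, url: str) -> dict:
    """URL input, same shape as the request examples on this page."""
    return {"type": "url", "name": name, "url": url}


def file_from_bytes(name: str, data: bytes) -> dict:
    """Inline input. ASSUMPTION: the "base64" type and key are hypothetical."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "base64", "name": name, "base64": encoded}


files = [
    file_from_url("invoice.pdf", "https://example.com/invoice.pdf"),
    file_from_bytes("receipt.txt", b"Total: $12.50"),
]
```

Both helpers return plain dictionaries, so the resulting list can be passed as the `files` value of a request body.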

03

Get structured data

Receive JSON with extracted fields, confidence scores, and source citations. Every field includes provenance so you know exactly where the value came from.

  • Confidence scores between 0 and 1 for every field
  • Source citations linking each value to its location in the document
  • Missing fields return null with a confidence score of 0
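
A minimal sketch of reading such a result in Python, using a trimmed copy of the response shown further down this page. The `due_date` entry is made up to illustrate a missing field; the page only states that a missing field has a null value and a confidence of 0, so the rest of its shape is an assumption.

```python
import json

# Trimmed copy of the response example on this page, plus one missing field.
# ASSUMPTION: "due_date" and its exact shape are illustrative only.
RESPONSE = json.loads("""
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "invoice.pdf"
    },
    "due_date": {
      "type": "TEXT",
      "value": null,
      "confidence": 0,
      "citations": []
    }
  }
}
""")


def describe(name: str, field: dict) -> str:
    """Return one human-readable line for an extracted field."""
    if field["value"] is None:
        return f"{name}: not found (confidence {field['confidence']})"
    citations = field.get("citations") or ["<no citation>"]
    return (f"{name}: {field['value']!r} "
            f"(confidence {field['confidence']:.2f}, cited as {citations[0]!r})")


lines = [describe(name, field) for name, field in RESPONSE["data"].items()]
```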

Intelligent Extraction

The API automatically selects the best extraction approach for your schema and documents. Complex schemas, dense tables, and nested structures are handled without any configuration.

Schema-Driven Results

Define fields using 17 types, including dates, IBANs, currencies, addresses, and nested arrays, and get structured JSON back. No prompt engineering, no output parsing.

Deep Content Understanding

Images and scanned documents aren't treated as pixel grids to OCR. The API understands what they depict — product photos, charts, handwritten notes — and extracts field values from that visual meaning.

Built-In Trust Scores

Every extracted value includes a confidence score and a verbatim source citation from the document. Route low-confidence results to human review.
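
One way to route low-confidence results to review is to walk the response and collect every field below a cut-off, including fields inside array rows. The sketch below uses the values from the response example on this page; the 0.97 threshold and the path notation are arbitrary choices for illustration, not recommendations from the API.

```python
# Field values copied from the response example on this page.
DATA = {
    "invoice_number": {"value": "INV-2024-0042", "confidence": 0.97},
    "total_amount": {"value": 1250.00, "confidence": 0.95},
    "line_items": {
        "confidence": 0.97,
        "value": [
            {
                "description": {"value": "Consulting (10h)", "confidence": 0.98},
                "amount": {"value": 1000.00, "confidence": 0.96},
            }
        ],
    },
}


def low_confidence_paths(data: dict, threshold: float, prefix: str = "") -> list:
    """Collect paths of fields whose confidence is below the threshold."""
    paths = []
    for name, field in data.items():
        path = f"{prefix}{name}"
        if field["confidence"] < threshold:
            paths.append(path)
        # Array fields carry a list of rows, each a dict of nested fields.
        if isinstance(field["value"], list):
            for index, row in enumerate(field["value"]):
                paths.extend(
                    low_confidence_paths(row, threshold, f"{path}[{index}].")
                )
    return paths


# Hypothetical cut-off; tune it against your own review capacity.
flagged = low_confidence_paths(DATA, threshold=0.97)
print(flagged)  # ['total_amount', 'line_items[0].amount']
```

A document with an empty result can be processed automatically, while anything else goes to a reviewer together with the flagged paths.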

Multi-File Merge

Send up to 20 files per request and get one unified extraction across all of them. Mix formats freely — a PDF invoice, a DOCX contract, and a JPEG receipt in the same call.
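
In the single-file response example on this page, each top-level field carries a `source` file name. Assuming the same holds for multi-file requests, a merged result could be broken down by originating file as sketched here. The field names and file names in the sample are made up for illustration.

```python
from collections import defaultdict

# Illustrative merged result. ASSUMPTION: multi-file responses report
# "source" per field the same way the single-file example does.
merged = {
    "invoice_number": {"value": "INV-2024-0042", "source": "invoice.pdf"},
    "total_amount": {"value": 1250.00, "source": "invoice.pdf"},
    "contract_party": {"value": "Example GmbH", "source": "contract.docx"},
}


def fields_by_source(data: dict) -> dict:
    """Group field names by the file each value was taken from."""
    grouped = defaultdict(list)
    for name, field in data.items():
        grouped[field.get("source")].append(name)
    return dict(grouped)


by_file = fields_by_source(merged)
```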

40+ File Formats

PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, LaTeX, EML, Jupyter notebooks, images (PNG, JPEG, TIFF, BMP, WebP, JP2, and more), plus text and markup formats like YAML, TOML, RST, and Org — all in the same endpoint.

No Model Training

Your documents are never used to train or improve AI models. This is guaranteed for all plans — not gated behind an enterprise contract.

Real-world pipelines, ready to ship

Each recipe chains multiple APIs into a complete workflow. Pick one, tweak it, and deploy — or use it as a starting point for your own pipeline.

Extract Academic Paper Metadata

Extract title, authors, abstract, and citation info from academic papers.

Extract Article Text

Extract clean article content — title, author, date, and body text — from PDFs, Word docs, and web pages.

Extract Contract Clause Data

Extract parties, dates, and clauses from a contract into structured JSON for legal review workflows.

Extract Court Filing Data

Extract case numbers, parties, filing dates, court details, and relief sought from court filing documents and legal pleadings.

Extract Customs Declaration

Merge a commercial invoice, packing list, and bill of lading into a unified customs declaration.

Extract Delivery Note Data

Extract shipment details, item quantities, and delivery confirmation data from warehouse delivery notes and goods received notes.

Extract Fleet Vehicle Registration Data

Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.

Extract Invoice Data

Extract vendor name, line items, totals, and dates from invoice documents.

Extract KPI Data

Extract campaign or business KPIs from report documents — metrics, values, periods, and targets.

Extract KYC Onboarding Data

Extract client identity verification details, company information, and beneficial ownership data from KYC onboarding documents.

Extract Legal Invoice Data

Extract timekeeper entries, disbursements, matter references, and billing summaries from law firm invoices.

Extract Medical Record

Extract patient details, diagnoses, and medications from a medical record into structured JSON for healthcare workflows.

Extract Multi-Invoice Data

Extract structured data from multiple invoice files in a single API call using an array schema.
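
A plausible reading of "array schema" here is a single top-level array field with one row per invoice. The sketch below shows what such a schema might look like, reusing only the type names from the request examples on this page; the field names are hypothetical and the actual recipe may differ.

```python
# ASSUMPTION: one top-level ARRAY field, one row per invoice file.
multi_invoice_schema = {
    "fields": [
        {
            "name": "invoices",
            "type": "ARRAY",
            "description": "One entry per invoice file",
            "fields": [
                {"name": "invoice_number", "type": "TEXT"},
                {"name": "vendor_name", "type": "TEXT"},
                {"name": "total_amount", "type": "CURRENCY_AMOUNT"},
            ],
        }
    ]
}

# Names of the columns each invoice row is expected to contain.
row_fields = [f["name"] for f in multi_invoice_schema["fields"][0]["fields"]]
```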

Extract NDA Terms

Extract parties, obligations, restrictions, permitted disclosures, and expiry dates from non-disclosure agreements.

Extract Product Catalog Entry

Extract product name, SKU, price, and specifications from a catalog document into structured JSON for e-commerce workflows.

Extract Property Appraisal

Extract appraised value, property details, and comparable sales from a property appraisal report into structured JSON.

Extract Property Deed Data

Extract property ownership, legal descriptions, encumbrances, and recording details from property deeds and land registry documents.

Extract Purchase Order Data

Extract line items, quantities, unit prices, delivery dates, and supplier details from purchase order documents.

Extract Real Estate Listing

Extract property address, price, room count, and features from a listing document into structured JSON for MLS and property platforms.

Extract Receipt Data

Extract merchant, date, line items, tax, and total from receipts.

Extract Rental Application

Extract applicant details, employment history, income, and references from a rental application form into structured JSON for tenant screening.

Extract Resume Data

Extract candidate name, contact details, work history, and skills from resumes.

Extract Supplier Invoice Data for ERP Import

Extract supplier invoice details structured for direct import into ERP systems like SAP, Oracle, or Microsoft Dynamics.

Extract Terms and Conditions

Extract clause types, obligations, limitations, and governing law from terms and conditions documents.

Extract Traffic Fine Data

Extract violation details, fine amounts, vehicle information, and payment deadlines from traffic fine notices.

One n8n node for your entire pipeline

Most n8n document workflows chain three or four separate services. The Iteration Layer community node covers extraction, transformation, and generation in a single install — wire up multi-step pipelines visually instead of writing glue code.

Start building right now

One API call, one credit deducted. Chains naturally with our other APIs — pipe the output of one into the next without glue code. You'll be up and running in minutes.

  • Full OpenAPI 3.1 specification available for code generation and IDE integration.
  • MCP server support for seamless integration with AI agents and tools.
  • Comprehensive documentation with examples for every field type and edge case.

Request (cURL)
curl -X POST \
  https://api.iterationlayer.com/document-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "files": [{
    "type": "url",
    "name": "invoice.pdf",
    "url": "https://example.com/invoice.pdf"
  }],
  "schema": {
    "fields": [
      {
        "name": "invoice_number",
        "type": "TEXT",
        "description": "The invoice number"
      },
      {
        "name": "total_amount",
        "type": "CURRENCY_AMOUNT",
        "description": "The total amount"
      },
      {
        "name": "line_items",
        "type": "ARRAY",
        "description": "Line items",
        "fields": [
          {
            "name": "description",
            "type": "TEXT"
          },
          {
            "name": "amount",
            "type": "CURRENCY_AMOUNT"
          }
        ]
      }
    ]
  }
}'
Response
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: $1,250.00"],
      "source": "invoice.pdf"
    },
    "line_items": {
      "type": "ARRAY",
      "value": [
        {
          "description": {
            "value": "Consulting (10h)",
            "confidence": 0.98,
            "citations": ["Consulting (10h)"]
          },
          "amount": {
            "value": 1000.00,
            "confidence": 0.96,
            "citations": ["$1,000.00"]
          }
        }
      ],
      "confidence": 0.97,
      "citations": [],
      "source": "invoice.pdf"
    }
  }
}
Request (Node)
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.extract({
  files: [{
    type: "url",
    name: "invoice.pdf",
    url: "https://example.com/invoice.pdf",
  }],
  schema: {
    fields: [
      {
        type: "TEXT",
        name: "invoice_number",
        description: "The invoice number",
      },
      {
        type: "CURRENCY_AMOUNT",
        name: "total_amount",
        description: "The total amount",
      },
      {
        type: "ARRAY",
        name: "line_items",
        description: "Line items",
        fields: [
          { type: "TEXT", name: "description" },
          { type: "CURRENCY_AMOUNT", name: "amount" },
        ],
      },
    ],
  },
});
Request (Python)
from iterationlayer import IterationLayer

client = IterationLayer(
    api_key="YOUR_API_KEY"
)

result = client.extract(
    files=[{
        "type": "url",
        "name": "invoice.pdf",
        "url": "https://example.com/invoice.pdf",
    }],
    schema={
        "fields": [
            {
                "type": "TEXT",
                "name": "invoice_number",
                "description": "The invoice number",
            },
            {
                "type": "CURRENCY_AMOUNT",
                "name": "total_amount",
                "description": "The total amount",
            },
            {
                "type": "ARRAY",
                "name": "line_items",
                "description": "Line items",
                "fields": [
                    {"type": "TEXT", "name": "description"},
                    {"type": "CURRENCY_AMOUNT", "name": "amount"},
                ],
            },
        ],
    },
)
Request (Go)
package main

import (
  "log"

  il "github.com/iterationlayer/sdk-go"
)

func main() {
  client := il.NewClient("YOUR_API_KEY")

  // This example requests two top-level fields only.
  result, err := client.Extract(il.ExtractRequest{
    Files: []il.FileInput{
      il.NewFileFromURL(
        "invoice.pdf",
        "https://example.com/invoice.pdf",
      ),
    },
    Schema: il.ExtractionSchema{
      "invoice_number": il.NewTextFieldConfig(
        "invoice_number",
        "The invoice number",
      ),
      "total_amount": il.NewCurrencyAmountFieldConfig(
        "total_amount",
        "The total amount",
      ),
    },
  })
  if err != nil {
    log.Fatalf("extraction failed: %v", err)
  }

  // Print the extracted fields.
  log.Printf("%+v", result)
}

Official SDKs for every major language

Install the SDK, set your API key, and start chaining requests. Full type safety, automatic retries, and idiomatic error handling included.

Your data stays in the EU

Your data is processed on EU servers and never stored beyond temporary logs. Zero retention, GDPR-compliant by design, with a Data Processing Agreement available for every customer. Learn more about our security practices.

No data storage, no model training

We don't store your files or processing results, and your data is never used to train or improve AI models. Logs are automatically deleted after 90 days.

EU-hosted infrastructure

All processing runs on servers located in the European Union. Your data never leaves the EU.

GDPR-compliant by design

Full compliance with EU data protection regulations. Data Processing Agreement available for all customers.

Pricing

Start with free trial credits. No credit card required.

Developer

For individuals & small projects

$29.99/month
1,000 credits included
Most Popular

Startup

Save 40%

For growing teams

$119.99/month
5,000 credits included

Business

Save 47%

For high-volume workloads

$319.99/month
15,000 credits included

Or pay as you go from $0.022/credit with automatic volume discounts.
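
To compare the plans, it can help to work out the price per included credit. The short Python sketch below does this arithmetic with the figures listed above; it ignores overage billing and volume discounts, whose details are not given on this page. The page also does not state what the "Save 40%" and "Save 47%" labels are measured against, so they are left out.

```python
# Monthly price in USD and included credits, as listed on this page.
PLANS = {
    "Developer": (29.99, 1_000),
    "Startup": (119.99, 5_000),
    "Business": (319.99, 15_000),
}


def price_per_credit(monthly_price: float, credits: int) -> float:
    """Effective price of one included credit."""
    return monthly_price / credits


for name, (price, credits) in PLANS.items():
    print(f"{name}: ${price_per_credit(price, credits):.4f} per included credit")
# Developer: $0.0300 per included credit
# Startup: $0.0240 per included credit
# Business: $0.0213 per included credit
```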

All APIs included · Free trial credits per API · Project-based budget caps · Auto overage billing

Frequently asked questions

What file formats are supported?
The API accepts 40+ file formats including PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, CSV, TSV, HTML, LaTeX, EML, Jupyter notebooks, and all common image formats. Scanned documents are processed with built-in OCR.

How does schema-based extraction work?
You define a schema describing the fields you want (name, type, description). The API uses AI to locate and extract those fields from the document.

What are confidence scores?
Every extracted field includes a confidence score between 0 and 1, indicating how certain the API is about the result. Use these to build human review flows.

How many files can I send per request?
You can send up to 20 files per request. All files are combined into a single extraction result, so the API pulls fields from across all documents. Each file can be up to 50 MB, and the total request size is limited to 200 MB.

Does it handle scanned documents?
Yes. The API includes built-in OCR for scanned documents and images. No separate OCR step is needed.

What happens when a field isn't found?
Missing fields return null with a confidence score of 0. You can use confidence thresholds to decide when to flag documents for manual review.
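
The per-request limits from the FAQ above (20 files, 50 MB per file, 200 MB in total) can be checked on the client before uploading. This is a minimal sketch; the function name is hypothetical, and it assumes binary megabytes (1 MB = 1024 × 1024 bytes), which this page does not specify.

```python
MAX_FILES = 20
MAX_FILE_BYTES = 50 * 1024 * 1024    # ASSUMPTION: binary megabytes
MAX_TOTAL_BYTES = 200 * 1024 * 1024  # ASSUMPTION: binary megabytes


def check_limits(sizes: dict) -> list:
    """Return a list of problems for a {file name: size in bytes} mapping."""
    problems = []
    if len(sizes) > MAX_FILES:
        problems.append(f"{len(sizes)} files exceed the limit of {MAX_FILES}")
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} exceeds the 50 MB per-file limit")
    if sum(sizes.values()) > MAX_TOTAL_BYTES:
        problems.append("total size exceeds the 200 MB request limit")
    return problems


ok = check_limits({"invoice.pdf": 2_000_000, "receipt.jpg": 500_000})
too_big = check_limits({"scan.tiff": 51 * 1024 * 1024})
```

An empty list means the batch fits within the documented limits; otherwise the messages say which limit was hit.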

Still evaluating?

See how we compare — and where the competition still wins. Choosing the right tool shouldn't require a week of research.

Built for how you work

Whether you're building pipelines in code, automating workflows, orchestrating AI agents, or shipping client projects — Iteration Layer fits your process.