Document to Markdown
Convert any document or image to clean markdown with a single API call. Send one file and receive the markdown output — no schema, no extraction, just clean text ready for LLM pipelines or downstream processing.
Key Features
- Multi-Format Support — Convert PDF, DOCX, XLSX, images, HTML, Markdown, CSV, JSON, and plain text.
- Built-In OCR — Scanned PDFs and image files are processed through OCR automatically. No separate step required.
- Image Descriptions — For image files, the response includes a plain-language description of the visual content alongside the extracted markdown.
- LLM-Grade Output — The markdown format is the same used internally by the Document Extraction API. Tables, structure, and layout are preserved for reliable LLM consumption.
Overview
The Document to Markdown API converts a document to clean markdown. You send one file (base64 or URL) and receive a JSON object with the result.
Endpoint: POST /document-to-markdown/v1/convert
Limits:
- Max file size: 50 MB
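As a rough illustration of the call shape, the sketch below builds (but does not send) the POST request using only Python's standard library. The host is taken from the cURL example on this page. The helper names `check_size` and `build_convert_request` are made up for this example, and reading "50 MB" as 50 × 1024 × 1024 bytes of the raw file is an assumption, since the page does not say whether the limit applies before or after base64 encoding.

```python
import json
import urllib.request

ENDPOINT = "https://api.iterationlayer.com/document-to-markdown/v1/convert"
# Assumption: "50 MB" is treated as 50 MiB of raw (pre-base64) bytes.
MAX_FILE_BYTES = 50 * 1024 * 1024


def check_size(data: bytes) -> None:
    """Raise before uploading if the file exceeds the documented limit."""
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"file is {len(data)} bytes; limit is {MAX_FILE_BYTES}")


def build_convert_request(api_key: str, file_input: dict) -> urllib.request.Request:
    """Build the POST request; pass it to urllib.request.urlopen to send it."""
    body = json.dumps({"file": file_input}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent.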
Supported File Formats
- Documents: PDF, DOCX, PPTX, ODT, EPUB, RTF
- Spreadsheets: XLSX, XLS, ODS, CSV, TSV
- Email: EML, MSG (headers, body, and attachment extraction)
- Notebooks: Jupyter (.ipynb)
- Academic & Publishing: LaTeX (.tex, .latex), BibTeX (.bib), Typst (.typst, .typ)
- Markup & Text: HTML, Markdown, JSON, XML, YAML, TOML, RST, Org, Djot, MDX, TXT
- Images: PNG, JPEG, GIF, WebP, AVIF, HEIF, BMP, TIFF, JP2, PNM/PBM/PGM/PPM, SVG
How It Works
Every conversion runs the same ingestion pipeline used by Document Extraction:
- Parse: the file format is detected and validated.
- Ingest: the file is converted to markdown using the appropriate processor:
  - PDF: pages are rendered to images and run through OCR.
  - Images (PNG, JPEG, GIF, WebP, AVIF, HEIF, BMP, TIFF, JP2, PNM): OCR extracts text and a vision model generates a description of the visual content.
  - Office documents (DOCX, PPTX, ODT, ODS, XLSX/XLS): content is extracted and normalized to markdown with formatting, tables, lists, and footnotes preserved.
  - EPUB: chapters are extracted and converted to markdown via the HTML pipeline.
  - LaTeX: converted to markdown, covering headings, formatting, lists, tables, math equations, and code blocks.
  - Jupyter Notebooks: code and markdown cells are extracted with outputs.
  - RTF: converted to markdown with bold, italic, strikethrough, Unicode, and special characters.
  - Email (EML, MSG): headers and body are parsed into structured markdown. Attachments are extracted, ingested through the pipeline, and returned separately.
  - CSV/TSV: converted to markdown tables with auto-detected delimiters.
  - HTML: converted to markdown preserving structure.
  - Text and markup formats (Markdown, JSON, XML, YAML, TOML, RST, Org, Djot, MDX, BibTeX, Typst, TXT): returned as-is for direct LLM consumption.
- Return: the result is returned as a JSON object.
There is no LLM extraction step. The API stops after ingestion.
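To predict on the client side which processor a file is likely to hit, the list above can be summarized as a lookup table. This is only a reading aid, not the service's implementation: the route names are labels invented for this sketch, and the extension spellings (for example `jpg` or `md`) are assumptions about how the listed formats are usually named.

```python
from pathlib import PurePosixPath

# Route label -> extensions, transcribed from the processor list above.
# Route labels are hypothetical; extension spellings are assumptions.
ROUTES = {
    "ocr": ["pdf"],
    "ocr_and_vision": ["png", "jpg", "jpeg", "gif", "webp", "avif",
                       "heif", "bmp", "tiff", "jp2", "pnm"],
    "office": ["docx", "pptx", "odt", "ods", "xlsx", "xls"],
    "epub": ["epub"],
    "latex": ["tex", "latex"],
    "notebook": ["ipynb"],
    "rtf": ["rtf"],
    "email": ["eml", "msg"],
    "delimited_table": ["csv", "tsv"],
    "html": ["html"],
    "passthrough": ["md", "json", "xml", "yaml", "toml", "rst", "org",
                    "mdx", "bib", "typst", "typ", "txt"],
}

# Invert the table once so lookups are a single dict access.
EXTENSION_TO_ROUTE = {ext: route for route, exts in ROUTES.items() for ext in exts}


def route_for(filename: str):
    """Return the route label for a file name, or None if it is not listed."""
    suffix = PurePosixPath(filename).suffix.lower().lstrip(".")
    return EXTENSION_TO_ROUTE.get(suffix)
```

A lookup like this can help estimate latency up front, since the OCR and vision routes do more work than the passthrough route.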
How Nested Files Work
Some file formats contain other files — emails have attachments, archives have entries. When the API encounters a container format, it extracts the nested files and ingests each one through the same pipeline.
Currently supported containers: EML and MSG (email attachments).
The response includes:
- `markdown`: the container’s own content (email headers and body), plus an “Attachments” section listing filenames
- `nested_files`: an array of ingested nested files, each with `name` and `markdown` (and `description` for image files)
Each nested file is billed as its own document. An email with a 3-page PDF attachment costs 1 credit (email) + 3 credits (PDF pages) = 4 credits total.
For Document Extraction, nested file markdown is appended to the container’s markdown so the LLM sees the full content — email body and all attachments — as one combined context.
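If you want the same single-context view from this API's output, the container markdown and the nested files can be joined on the client. The sketch below assumes `data` is the result object carrying `markdown` and `nested_files`. The `## Attachment:` heading and the blank-line separator are choices made for this example; the page does not specify how Document Extraction formats the combined text.

```python
def combined_markdown(data: dict) -> str:
    """Append each nested file's content to the container's markdown."""
    parts = [data.get("markdown", "")]
    for nested in data.get("nested_files", []):
        # Hypothetical heading format, chosen only for this sketch.
        section = [f"## Attachment: {nested['name']}"]
        if nested.get("description"):  # present for image files only
            section.append(nested["description"])
        if nested.get("markdown"):
            section.append(nested["markdown"])
        parts.append("\n\n".join(section))
    # Drop empty parts so the output has no stray blank sections.
    return "\n\n".join(part for part in parts if part)
```

Files without attachments pass through unchanged, so the helper can be applied to every response regardless of format.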
Request Format
cURL

```bash
curl -X POST \
  https://api.iterationlayer.com/document-to-markdown/v1/convert \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": {
      "type": "base64",
      "name": "invoice.pdf",
      "base64": "<base64-encoded-file>"
    }
  }'
```

Response

```json
{
  "success": true,
  "data": {
    "name": "invoice.pdf",
    "mime_type": "application/pdf",
    "markdown": "# Invoice\n\n**Invoice Number:** INV-2024-0042\n\n**Date:** 2024-03-15\n\n| Description | Qty | Unit Price | Total |\n|---|---|---|---|\n| Consulting | 10h | $100.00 | $1,000.00 |\n| Support | 5h | $80.00 | $400.00 |\n\n**Total: $1,400.00**"
  }
}
```

TypeScript

```typescript
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.convertToMarkdown({
  file: {
    type: "base64",
    name: "invoice.pdf",
    base64: "<base64-encoded-file>",
  },
});
```

Python

```python
from iterationlayer import IterationLayer

client = IterationLayer(api_key="YOUR_API_KEY")

result = client.convert_to_markdown(
    file={
        "type": "base64",
        "name": "invoice.pdf",
        "base64": "<base64-encoded-file>",
    }
)
```

Go

```go
import il "github.com/iterationlayer/sdk-go"

client := il.NewClient("YOUR_API_KEY")

result, err := client.ConvertToMarkdown(il.ConvertRequest{
	File: il.NewFileFromBase64(
		"invoice.pdf",
		"<base64-encoded-file>",
	),
})
```

Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `file` | FileInput | Yes | The file to convert. |
| `webhook_url` | string | No | HTTPS URL to receive results asynchronously. If provided, returns 201 immediately. See Webhooks. |
Async Mode
Add a webhook_url parameter to process the request in the background. The API returns a 201 response immediately and delivers the result to your webhook URL when processing completes. See Webhooks for payload format and retry behavior.
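A small client-side guard can catch a non-HTTPS webhook URL before the request is sent. The sketch below is an assumption-laden helper, not part of any SDK: `with_webhook` is a hypothetical name, and whether the server itself rejects plain `http` URLs is not stated on this page.

```python
from urllib.parse import urlparse


def with_webhook(body: dict, webhook_url: str) -> dict:
    """Return a copy of the request body with webhook_url set."""
    parsed = urlparse(webhook_url)
    # The parameter is documented as an HTTPS URL, so check scheme and host.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("webhook_url must be an absolute HTTPS URL")
    return {**body, "webhook_url": webhook_url}
```

Returning a copy leaves the original body untouched, so the same payload can be reused for a synchronous call.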
FileInput
The file is either a base64-encoded binary or a URL reference.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | "base64" \| "url" | Yes | Input method. |
| `name` | string | Yes | File name including extension. Used to detect the format. |
| `base64` | string | When type = "base64" | Base64-encoded file content. |
| `url` | string | When type = "url" | Public URL to fetch the file from. |
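The two input methods in the table above map onto two small constructors. This is a minimal sketch using only the standard library; the function names are hypothetical and chosen for this example.

```python
import base64


def file_from_bytes(name: str, data: bytes) -> dict:
    """Build a FileInput that carries the file content inline."""
    return {
        "type": "base64",
        "name": name,  # the extension is what the API uses to detect the format
        "base64": base64.b64encode(data).decode("ascii"),
    }


def file_from_url(name: str, url: str) -> dict:
    """Build a FileInput that points at a publicly reachable URL."""
    return {"type": "url", "name": name, "url": url}
```

For example, `file_from_bytes("invoice.pdf", open("invoice.pdf", "rb").read())` produces the `file` object used in the request examples. The URL form keeps the request body small when the file is already hosted somewhere public.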
Response Format
The response is a JSON object with the conversion result.
| Field | Type | Description |
|---|---|---|
| `name` | string | File name from the request. |
| `mime_type` | string | Detected MIME type of the file. |
| `markdown` | string | Extracted markdown content. Empty string if no text was found. |
| `description` | string | Plain-language description of the image content. Present only for image files (PNG, JPEG, GIF, WebP). |
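Because `description` is only present for image files, client code should treat it as optional. The sketch below assumes the `success` / `data` envelope shown in the request examples. The shape of a failed response body is not documented here, so the error branch is a guess, and `parse_conversion` is a hypothetical helper name.

```python
import json


def parse_conversion(raw: str) -> dict:
    """Extract the documented fields from a JSON response body."""
    payload = json.loads(raw)
    if not payload.get("success"):
        # Assumption: a falsy or missing "success" flag means the call failed.
        raise RuntimeError("conversion did not succeed")
    data = payload["data"]
    return {
        "name": data["name"],
        "mime_type": data["mime_type"],
        "markdown": data["markdown"],  # may be an empty string
        "description": data.get("description"),  # None for non-image files
    }
```

Normalizing the missing field to `None` lets downstream code branch on a single check instead of testing for key presence.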
Image Files
For image files, the response includes both markdown (OCR output) and description (vision model output). The description field describes what the image depicts — suitable for use as alt text, for downstream search indexing, or as context in LLM prompts.
```json
{
  "name": "product-photo.png",
  "mime_type": "image/png",
  "markdown": "Sale — 30% off all items",
  "description": "A product photograph of a white ceramic mug on a wooden table. The mug has a minimalist design with no text or logo. Natural lighting from the left."
}
```

Recipes
For complete, runnable examples see the Recipes page.
- Convert Invoice to Markdown — Convert a PDF invoice to structured markdown.
- Convert Contract to Markdown — Extract contract text and clauses as clean markdown.
- Convert Resume to Markdown — Convert a resume PDF to structured markdown for downstream processing.
Error Responses
| Status | Description |
|---|---|
| 400 | Invalid request (missing or invalid file parameter) |
| 401 | Missing or invalid API key |
| 422 | Processing error (file could not be parsed or ingested) |
| 429 | Rate limit exceeded |
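One way to act on these status codes in client code is to map each one to a next step and to back off before retrying a 429. The action labels and the exponential backoff schedule below are assumptions made for this sketch; the page does not document a recommended retry policy or a retry header.

```python
# Documented error statuses mapped to hypothetical client-side actions.
ACTIONS = {
    400: "fix_request",    # missing or invalid file parameter
    401: "check_api_key",  # missing or invalid API key
    422: "inspect_file",   # file could not be parsed or ingested
    429: "retry_later",    # rate limit exceeded
}


def action_for_status(status: int) -> str:
    """Return the suggested next step for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "unexpected")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```

Only the 429 case is worth retrying automatically. The other errors describe a problem with the request or the file and will fail the same way again.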