Introducing the New PyMuPDF4LLM: Now Including Layout

The first document intelligence library that reads PDF structure natively — no image rendering, no GPU, no reconstruction loss.

Headline metrics: speed relative to vision models, cost reduction at scale, table accuracy on financial documents, and parameter count (millions, versus billions for VLMs).

GNN trained on PDF internals — not pixels

A Graph Neural Network (GNN) reads the PDF's vector structure directly, with no image rendering pipeline and no OCR uncertainty. By parsing primitives instead of pixels, it preserves 100% of the table semantics and document hierarchy.

One Foundation. Multiple Extensions.

From low-level PDF manipulation to LLM-ready extraction.
Choose what your workflow needs.

01 PYMUPDF

The Fastest PDF Processing Library in Python

Fast PDF processing with minimal dependencies, powered by the MuPDF engine.

02 PYMUPDF4LLM

Seamless PDF Integration for LLMs

Connect your PDF documents directly to Large Language Models with optimized text extraction.

03 PYMUPDF PRO

Advanced PDF Capabilities for Enterprise

Enhanced features for complex document workflows and enterprise-grade performance.

Compare Features

Not sure which product is right for you? Here's how they stack up.

Products compared: PyMuPDF, PyMuPDF4LLM, and PyMuPDF Pro.

LICENSE

PyMuPDF and PyMuPDF4LLM: AGPL or Commercial. Free to use under the GNU Affero General Public License, which requires open-sourcing your application if distributed. A paid Commercial license covers proprietary or commercial use without AGPL obligations.

PyMuPDF Pro: Commercial only. No AGPL option is available; it is designed for proprietary and enterprise use.

SOURCE CODE

PyMuPDF and PyMuPDF4LLM: Open source. Source code is publicly available and can be inspected, modified, and contributed to.

PyMuPDF Pro: Source code is not publicly available.

INPUT FILES

PyMuPDF: PDF, XPS, EPUB, CBZ, MOBI, FB2, SVG, TXT, and image formats are natively supported. DOC/DOCX, XLS/XLSX, PPT/PPTX, and HWP/HWPX are not supported.

PyMuPDF4LLM: PDF, XPS, EPUB, CBZ, MOBI, FB2, SVG, TXT, and image formats are natively supported. DOC/DOCX, XLS/XLSX, PPT/PPTX, and HWP/HWPX are not supported.

PyMuPDF Pro: PDF, XPS, EPUB, CBZ, MOBI, FB2, SVG, TXT, and image formats are natively supported. DOC/DOCX, XLS/XLSX, PPT/PPTX, and HWP/HWPX are supported via a conversion layer.

OUTPUT FILES

PyMuPDF: Generates PDF, SVG, and image formats directly. Markdown, JSON, and TXT are not available.

PyMuPDF4LLM: Generates PDF, SVG, and image formats directly. Markdown, JSON, and TXT are supported, which is ideal for structured or AI-ready output.

PyMuPDF Pro: Generates PDF, SVG, and image formats directly. Markdown, JSON, and TXT are available for structured or AI-ready output.

PAGE ANALYSIS

PyMuPDF: Basic page analysis. Returns document structure including layout and element positions.

PyMuPDF4LLM: Advanced page analysis. Uses trained data for enhanced structural recognition and superior layout results.

PyMuPDF Pro: All included. Covers both basic document structure detection and advanced trained-data analysis.

TEXT EXTRACTION

PyMuPDF: Basic text extraction. Extracts text with structured layout information and bounding box data. Includes basic table extraction.

PyMuPDF4LLM: Advanced text extraction. Extracts text with structure tags (headings, lists, tables), page layout analysis, and semantic understanding. Includes superior table extraction with full cell structure and data type recognition.

PyMuPDF Pro: All included. Covers both basic structured extraction and advanced semantic text extraction with superior table extraction.

IMAGE EXTRACTION

PyMuPDF: Basic image extraction. Extracts embedded images from PDF pages.

PyMuPDF4LLM: Advanced image extraction. Detects and renders image areas on the page, then saves them to disk or embeds them in Markdown output.

PyMuPDF Pro: All included. Covers both basic image extraction and advanced image area detection and rendering.

VECTOR EXTRACTION

PyMuPDF: Basic vector extraction. Extracts and clusters vector graphics from PDF pages.

PyMuPDF4LLM: Advanced vector extraction. Superior detection of picture areas with precise vector element identification.

PyMuPDF Pro: All included. Covers both basic vector extraction and clustering and superior picture area detection.

OCR

PyMuPDF: On-demand. Invokes the built-in Tesseract support on request for text detection on pages or images.

PyMuPDF4LLM: Automatic OCR. Applies OCR automatically based on page content analysis, with no manual trigger needed.

PyMuPDF Pro: All included. Covers both on-demand Tesseract invocation and automatic OCR based on page content analysis.

Everything you need for PDF workflows

01 EXTRACTION

Extract text, images, tables, and metadata. Pull structured data from any PDF with precision. Get raw text, formatted tables, embedded images, fonts, annotations, and document metadata, all with simple Python commands.
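As a rough illustration of the table and metadata extraction described above, the sketch below collects both from an open document. It assumes PyMuPDF 1.24 or later (imported as `pymupdf`; older releases use `import fitz`) and relies on `Document.metadata` and `Page.find_tables()`. The helper names `collect_tables_and_metadata` and `collect_from_file` are illustrative, not part of the library.

```python
from typing import Any, Dict


def collect_tables_and_metadata(doc: Any) -> Dict[str, Any]:
    """Gather document metadata and the cell contents of every detected table.

    `doc` is expected to behave like a pymupdf.Document: it has a `metadata`
    dict and yields pages that provide find_tables().
    """
    tables_by_page = []
    for page in doc:
        # find_tables() returns a finder object; its .tables list holds the
        # detected tables, and extract() returns each table as rows of cells.
        tables_by_page.append([t.extract() for t in page.find_tables().tables])
    return {"metadata": dict(doc.metadata or {}), "tables": tables_by_page}


def collect_from_file(path: str) -> Dict[str, Any]:
    # Imported lazily so the helper above can be used without PyMuPDF installed.
    import pymupdf  # assumption: PyMuPDF >= 1.24

    with pymupdf.open(path) as doc:
        return collect_tables_and_metadata(doc)
```

With PyMuPDF installed, `collect_from_file("report.pdf")` (a hypothetical file name) would return one list of tables per page alongside the metadata dictionary.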

02 ANALYSIS

Understand document structure and layout. Analyze reading order, detect document elements, identify tables and columns, and preserve visual hierarchy. Perfect for building RAG pipelines or processing complex documents.
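To make the idea of reading order concrete, here is a deliberately simple sketch that orders the text blocks of one page from top to bottom, then left to right. It is a basic heuristic and not the trained layout analysis described on this page; on multi-column pages it will interleave columns. It assumes the tuple shape returned by `page.get_text("blocks")` in PyMuPDF; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

# One entry from page.get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type); block_type 0 means text.
Block = Tuple[float, float, float, float, str, int, int]


def simple_reading_order(blocks: Sequence[Block]) -> List[str]:
    """Return block texts sorted by top edge, then by left edge."""
    text_blocks = [b for b in blocks if b[6] == 0]  # drop image blocks
    ordered = sorted(text_blocks, key=lambda b: (round(b[1]), b[0]))
    return [b[4].strip() for b in ordered]


def page_reading_order(path: str, page_number: int = 0) -> List[str]:
    # Imported lazily so simple_reading_order works without PyMuPDF installed.
    import pymupdf  # assumption: PyMuPDF >= 1.24

    with pymupdf.open(path) as doc:
        return simple_reading_order(doc[page_number].get_text("blocks"))
```

Rounding the top coordinate groups blocks that sit on nearly the same line, so small vertical jitter does not reorder them.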

03 CONVERSION

Convert PDFs to any format you need. Transform PDFs into Markdown, HTML, images, or text while maintaining formatting.
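A minimal sketch of PDF-to-Markdown conversion, assuming the `pymupdf4llm` package is installed and using its `to_markdown` function with a file path. The wrapper `pdf_to_markdown` and its `converter` parameter are hypothetical conveniences added here so the conversion step can be swapped out; they are not part of the library.

```python
from pathlib import Path
from typing import Callable, Optional


def pdf_to_markdown(
    pdf_path: str,
    out_path: Optional[str] = None,
    converter: Optional[Callable[[str], str]] = None,
) -> str:
    """Convert a PDF to Markdown text and optionally write it to a file."""
    if converter is None:
        # Imported lazily; assumption: the pymupdf4llm package is installed.
        import pymupdf4llm

        converter = pymupdf4llm.to_markdown
    markdown = converter(str(pdf_path))
    if out_path is not None:
        Path(out_path).write_text(markdown, encoding="utf-8")
    return markdown
```

Calling `pdf_to_markdown("input.pdf", "output.md")` (hypothetical file names) would return the Markdown string and save a copy that can be fed to an LLM or a RAG indexer.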

04 MANIPULATION

Create, edit, and transform PDFs programmatically. Merge, split, rotate, crop, and watermark PDFs. Add annotations, modify pages, insert images, and generate new PDFs from scratch. Full programmatic control over every element.
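The merge and rotate operations mentioned above could look roughly like this. The sketch assumes PyMuPDF 1.24 or later and uses `Document.insert_pdf`, `Page.set_rotation`, and `Document.save`; the function name `merge_and_rotate` is illustrative.

```python
from typing import Sequence


def merge_and_rotate(paths: Sequence[str], out_path: str, rotation: int = 0) -> int:
    """Merge PDFs in order, rotate every page, save, and return the page count."""
    if rotation % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")

    # Imported lazily; assumption: PyMuPDF >= 1.24 (older: `import fitz`).
    import pymupdf

    merged = pymupdf.open()  # no arguments creates a new, empty PDF
    try:
        for path in paths:
            with pymupdf.open(path) as src:
                merged.insert_pdf(src)  # append all pages of src
        for page in merged:
            page.set_rotation(rotation)
        merged.save(out_path)
        return merged.page_count
    finally:
        merged.close()
```

For example, `merge_and_rotate(["a.pdf", "b.pdf"], "merged.pdf", rotation=90)` (hypothetical file names) would write one file containing the pages of both inputs, each rotated by 90 degrees.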

Start building in minutes

Real Python examples, straight from the docs.

EXTRACTION

How to Extract All Document Text

Leverage advanced algorithms for precise data extraction, ensuring your LLMs receive structured, context-rich information. Transform complex documents into useful data.
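A minimal sketch of extracting all document text, assuming PyMuPDF 1.24 or later (imported as `pymupdf`). It joins the plain text of each page with a form feed character as a page separator. The helper names are illustrative.

```python
from typing import Any, Iterable

PAGE_BREAK = chr(12)  # form feed, a conventional page delimiter in plain text


def extract_all_text(pages: Iterable[Any]) -> str:
    """Join the plain text of every page, separated by form feeds.

    `pages` can be an open pymupdf.Document, which yields its pages when
    iterated; each page only needs to provide get_text().
    """
    return PAGE_BREAK.join(page.get_text() for page in pages)


def extract_text_from_file(path: str) -> str:
    # Imported lazily so extract_all_text works without PyMuPDF installed.
    import pymupdf  # assumption: PyMuPDF >= 1.24

    with pymupdf.open(path) as doc:
        return extract_all_text(doc)
```

The resulting string can be written to a `.txt` file or split on the form feed to recover per-page text.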