NotebookLM Tools

Scripts and tools for preparing documents for NotebookLM. They handle PDF, DJVU, EPUB, FB2, MOBI, DOC/DOCX, and many other formats: converting them, running OCR where needed, and splitting them into word-count-limited chunks suitable for upload.

prepare_all.sh

The main entry point. It converts documents in the supported formats, runs OCR where needed, and splits the results into word-count-limited chunks ready for NotebookLM.

Supported formats

Format	Processing
PDF	OCR if no text layer, then split into chunks
DJVU, DJV	Extract text layer or page-by-page OCR, then split as text
EPUB, FB2, MOBI, AZW, AZW3, DOC, DOCX, RTF, ODT, HTML, HTM, LIT, PDB, LRF	Convert to PDF via `ebook-convert`, then split
TXT	Split directly as text (no PDF conversion)
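
The table above amounts to a dispatch on file extension. A minimal sketch of that idea follows; `route_for` and its route names are hypothetical and not taken from `prepare_all.sh`, and case-insensitive matching is an assumption.

```shell
# Hypothetical helper: map a filename to one of the processing routes in the table.
# The route names (pdf, djvu, convert, text) are illustrative only.
route_for() {
  local name ext
  name="${1##*/}"                      # strip any directory part
  case "$name" in
    *.*) ext="${name##*.}" ;;          # text after the last dot
    *)   echo unsupported; return 0 ;; # no extension at all
  esac
  # Assumption: extensions are matched case-insensitively
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)      echo pdf ;;
    djvu|djv) echo djvu ;;
    epub|fb2|mobi|azw|azw3|doc|docx|rtf|odt|html|htm|lit|pdb|lrf)
              echo convert ;;
    txt)      echo text ;;
    *)        echo unsupported ;;
  esac
}

for f in book.pdf scan.DJVU novel.fb2 notes.txt image.png; do
  printf '%s -> %s\n' "$f" "$(route_for "$f")"
done
```

Keeping the mapping in one function makes it easy to see which formats go through `ebook-convert` and which are handled directly.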

Usage

./prepare_all.sh [input_dir]

input_dir defaults to the current directory. The script finds all supported files, converts them or runs OCR as needed, and writes the resulting chunks to the output directory (OUTPUT_DIR, out by default).

Examples

# Process all documents in the current directory
./prepare_all.sh

# Process a specific directory
./prepare_all.sh /path/to/books

# Customize chunk size and output
MAX_CHUNK_WORDS=300000 OUTPUT_DIR=chunks ./prepare_all.sh /path/to/books

# Set OCR language and keep intermediate PDFs
OCR_LANG=rus+eng KEEP_PDF=1 ./prepare_all.sh /path/to/books

# Skip OCR entirely
SKIP_OCR=1 ./prepare_all.sh /path/to/books

Environment variables

MAX_CHUNK_WORDS - Maximum words per output chunk (default: 400000)
OUTPUT_DIR - Directory for output files (default: out)
OUTPUT_PREFIX - Prefix added to output filenames (default: none)
KEEP_PDF - Set to 1 to keep intermediate PDFs from format conversions (default: 0)
OCR_LANG - Tesseract language(s) for OCR (default: rus+eng). Supports + syntax for multiple languages.
SKIP_OCR - Set to 1 to disable OCR for all formats (default: 0)
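
One common way for a shell script to honor such variables is default-value parameter expansion. The sketch below assumes that pattern; `load_settings` is a hypothetical name, and the numeric check on MAX_CHUNK_WORDS is an added assumption rather than documented behavior.

```shell
# Hypothetical helper: apply the documented defaults to any unset or empty variable.
load_settings() {
  : "${MAX_CHUNK_WORDS:=400000}"
  : "${OUTPUT_DIR:=out}"
  : "${OUTPUT_PREFIX:=}"
  : "${KEEP_PDF:=0}"
  : "${OCR_LANG:=rus+eng}"
  : "${SKIP_OCR:=0}"
  # Assumption: reject a non-numeric chunk size early instead of failing later
  case "$MAX_CHUNK_WORDS" in
    *[!0-9]*)
      echo "MAX_CHUNK_WORDS must be a whole number, got: $MAX_CHUNK_WORDS" >&2
      return 1 ;;
  esac
}

load_settings && printf 'chunks of up to %s words -> %s/\n' "$MAX_CHUNK_WORDS" "$OUTPUT_DIR"
```

With this pattern, a value set on the command line (as in the examples above) wins, and everything else falls back to its default.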

Dependencies

prepare_all.sh relies on scripts from the subdirectories below, plus:

pdftotext, pdfinfo, qpdf - PDF manipulation (poppler-utils, qpdf)
ocrmypdf - PDF OCR
ebook-convert - Format conversion (install Calibre)

For DJVU support: ddjvu, djvutxt, djvused, tesseract (djvulibre, tesseract-ocr).
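
Before a long run it can help to confirm that these tools are on the PATH. The following is a small sketch using the standard `command -v` builtin; `missing_tools` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Core tools from the list above; add ddjvu djvutxt djvused tesseract for DJVU support
missing=$(missing_tools pdftotext pdfinfo qpdf ocrmypdf ebook-convert)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```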

Tool directories

pdf-tools/

Scripts for working with PDF files -- splitting, combining, OCR-ing.

ocr_pdf.sh - Adds an OCR text layer to image-only PDFs in place, with a progress bar
split_pdf.sh - Splits large PDFs into word-count-limited chunks
combine_pdf.sh - Combines multiple PDFs into chunks by word count
pdf_process.sh - All-in-one: split, combine, and generate a manifest
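
Deciding whether a PDF is image-only can be approximated by counting the words that `pdftotext` extracts. The sketch below illustrates that check; the function names and the MIN_WORDS threshold are assumptions, and the actual scripts may detect a text layer differently.

```shell
# Count words in a PDF's text layer. pdftotext writes to stdout when the
# output file is given as "-".
pdf_word_count() {
  pdftotext "$1" - 2>/dev/null | wc -w | tr -d ' '
}

# Assumption: fewer than MIN_WORDS extractable words means "no usable text layer".
# MIN_WORDS is a hypothetical knob, not a variable the scripts document.
needs_ocr() {
  [ "$1" -lt "${MIN_WORDS:-10}" ]
}

# Usage (requires poppler-utils and ocrmypdf):
#   if needs_ocr "$(pdf_word_count book.pdf)"; then
#     ocrmypdf -l rus+eng book.pdf book.pdf
#   fi
```

A word-count threshold rather than a strict zero check tolerates scans that carry only a few stray characters of text, such as a stamped page number.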

djvu-tools/

Scripts for extracting text from DJVU files and optionally converting to PDF.

ocr_djvu.sh - OCRs DJVU files with ocrodjvu, embeds text layer in place, extracts .txt
djvu_to_txt.sh - Extracts text from DJVU via page-by-page tesseract (does not modify originals)
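
The "text layer first, page-by-page OCR otherwise" approach can be sketched with the djvulibre and tesseract command-line tools. This is an illustration under assumptions, not the code of `djvu_to_txt.sh`; `djvu_text` is a hypothetical name, and the original file is only read, never modified.

```shell
# Hypothetical helper: print the text of a DJVU file to stdout.
djvu_text() {
  local djvu="$1" lang="${OCR_LANG:-rus+eng}"
  local text pages page tmp

  # Prefer the embedded text layer when it has any non-whitespace content
  text=$(djvutxt "$djvu" 2>/dev/null)
  if [ -n "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
    printf '%s\n' "$text"
    return 0
  fi

  # Otherwise render each page to TIFF and OCR it with tesseract
  pages=$(djvused -e n "$djvu") || return 1   # "n" prints the page count
  tmp=$(mktemp -d) || return 1
  page=1
  while [ "$page" -le "$pages" ]; do
    ddjvu -format=tiff -page="$page" "$djvu" "$tmp/page.tif" &&
      tesseract "$tmp/page.tif" stdout -l "$lang" 2>/dev/null
    page=$((page + 1))
  done
  rm -rf "$tmp"
}

# Usage (requires djvulibre and tesseract):
#   OCR_LANG=rus+eng djvu_text scan.djvu > scan.txt
```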

text-tools/

Scripts for working with text and ebook formats.

split_text.sh - Splits large text files into word-count-limited chunks
split_epub.py - Splits EPUB files into smaller parts
epub-to-text.py / epub-to-text2.py - Converts EPUB to plain text
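
Word-count-limited splitting of plain text can be sketched with awk. The version below is illustrative rather than the code of `split_text.sh`: it breaks only at line boundaries, so a single line longer than the limit still lands in one chunk, and the `_partNNN.txt` naming is an assumption.

```shell
# Hypothetical helper: split_by_words INPUT MAX_WORDS PREFIX
# Writes PREFIX_part001.txt, PREFIX_part002.txt, ... with at most MAX_WORDS
# words each, breaking only between lines.
split_by_words() {
  local input="$1" max_words="$2" prefix="$3"
  awk -v max="$max_words" -v prefix="$prefix" '
    BEGIN { part = 1; count = 0 }
    {
      # Start a new chunk when this line would push the current one over the limit
      if (count > 0 && count + NF > max) {
        close(out)
        part++
        count = 0
      }
      out = sprintf("%s_part%03d.txt", prefix, part)
      print > out
      count += NF   # NF is the number of whitespace-separated words on the line
    }
  ' "$input"
}

# Usage:
#   split_by_words book.txt 400000 out/book
```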

automation/

Browser automation for NotebookLM.

add_links_script.py - Automates adding links as sources to a NotebookLM notebook via Playwright

Installation

System packages

# Fedora
sudo dnf install poppler-utils qpdf ocrmypdf djvulibre tesseract tesseract-langpack-rus calibre

# Debian/Ubuntu
sudo apt install poppler-utils qpdf ocrmypdf djvulibre-bin tesseract-ocr tesseract-ocr-rus calibre

# macOS
brew install poppler qpdf ocrmypdf djvulibre tesseract calibre

Python packages

pip install -r text-tools/requirements.txt

For browser automation, also run:

playwright install chromium
