steventine/pdf-comparer

PDF Comparer

Compares two versions of a PDF document and outputs a structured markdown report of all textual differences, page by page. Changed words are highlighted in bold. Nothing is summarized: every change is reported exactly as written.

This tool was created to identify the exact differences between incremental revisions of the VEX IQ Robotics Game Manual (for example, https://link.vex.com/docs/25-26/viqrc-mixandmatch-manual).

This project was written primarily using Claude Code.

Requirements

  • Python 3
  • pandoc and a LaTeX distribution (optional, for PDF export)

Setup

Create a virtual environment and install dependencies (one-time):

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

Usage

.venv/bin/python compare.py old.pdf new.pdf

This prints the diff to the terminal. To save it as a markdown file:

.venv/bin/python compare.py old.pdf new.pdf > diff.md

Converting the Output to PDF

Install pandoc and a LaTeX distribution if you haven't already:

# Ubuntu/Debian
sudo apt install pandoc texlive

# macOS (installs pandoc only; a LaTeX distribution such as MacTeX must be installed separately)
brew install pandoc

Then convert the markdown diff to a landscape PDF with minimal margins in one step:

.venv/bin/python compare.py old.pdf new.pdf | \
  pandoc --from gfm -o diff.pdf \
    -V geometry:"landscape,margin=0.5in" \
    -V fontsize=10pt \
    --lua-filter=table-widths.lua

Or from a saved markdown file:

pandoc --from gfm diff.md -o diff.pdf \
  -V geometry:"landscape,margin=0.5in" \
  -V fontsize=10pt \
  --lua-filter=table-widths.lua

The table-widths.lua filter (included in this repo) ensures table columns wrap correctly instead of overflowing the page.
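
For scripted use, the pandoc invocation above can be wrapped in a small Python helper. This is a minimal sketch, not part of the repository: the function names `pandoc_command` and `convert_to_pdf` are hypothetical, and it assumes `pandoc`, a LaTeX distribution, and `table-widths.lua` are available in the working directory.

```python
import subprocess


def pandoc_command(markdown_path: str, pdf_path: str,
                   lua_filter: str = "table-widths.lua") -> list[str]:
    """Build the argument list for the pandoc call shown above (hypothetical helper)."""
    return [
        "pandoc", "--from", "gfm", markdown_path, "-o", pdf_path,
        # Landscape page with 0.5 inch margins, passed to the LaTeX geometry package.
        "-V", "geometry:landscape,margin=0.5in",
        "-V", "fontsize=10pt",
        f"--lua-filter={lua_filter}",
    ]


def convert_to_pdf(markdown_path: str, pdf_path: str) -> None:
    """Run pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(markdown_path, pdf_path), check=True)
```

Calling `convert_to_pdf("diff.md", "diff.pdf")` would be equivalent to the saved-file command above. Passing the arguments as a list avoids shell quoting issues around the `geometry` value.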

Output Format

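Changes are grouped by page number. Each changed paragraph appears as a table row with the original text on the left and the new text on the right. Modified words are shown in bold. A paragraph that exists in only one version is shown entirely in bold, with [DELETED] or [ADDED] in the opposite column.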

**Page 12:**

| Original | New |
|----------|-----|
| The robot must **stop** within the zone. | The robot must **halt** within the zone. |
| **This rule applies to all matches.** | [DELETED] |
| [ADDED] | **A five-second grace period applies.** |
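
The rows above can be reproduced with a word-level diff. The following is a minimal sketch using `difflib` from the Python standard library; it illustrates the output format only and is not necessarily how `compare.py` works. The function names `bold_changed_words` and `table_row` are hypothetical.

```python
from difflib import SequenceMatcher


def bold_changed_words(old: str, new: str) -> tuple[str, str]:
    """Return (old, new) with each run of differing words wrapped in ** markers."""
    old_words, new_words = old.split(), new.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_chunk = " ".join(old_words[i1:i2])
        new_chunk = " ".join(new_words[j1:j2])
        if tag == "equal":
            old_out.append(old_chunk)
            new_out.append(new_chunk)
        else:
            # replace, delete, or insert: bold whichever side has words.
            if old_chunk:
                old_out.append(f"**{old_chunk}**")
            if new_chunk:
                new_out.append(f"**{new_chunk}**")
    return " ".join(old_out), " ".join(new_out)


def table_row(old: str, new: str) -> str:
    """Format one paragraph pair as a markdown table row."""
    if not old:
        return f"| [ADDED] | **{new}** |"
    if not new:
        return f"| **{old}** | [DELETED] |"
    left, right = bold_changed_words(old, new)
    return f"| {left} | {right} |"


print(table_row("The robot must stop within the zone.",
                "The robot must halt within the zone."))
```

This sketch does not escape `|` characters inside paragraph text, which a real implementation would need to do to keep the table well-formed.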

Page footers and whitespace-only differences are ignored.
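
One way to get this behavior is to normalize text before comparing it. The sketch below is an assumption about how such filtering could work, not a description of `compare.py`: the footer pattern (`FOOTER_RE`) is a hypothetical heuristic that matches bare page numbers and "Page N of M" lines, and the actual footer detection in this project may differ.

```python
import re

# Hypothetical footer heuristic: a bare page number or "Page N of M".
FOOTER_RE = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return " ".join(text.split())


def strip_footers(lines: list[str]) -> list[str]:
    """Drop lines that look like page footers under the heuristic above."""
    return [line for line in lines if not FOOTER_RE.match(line)]


def is_real_change(old: str, new: str) -> bool:
    """True only if the paragraphs differ after whitespace normalization."""
    return normalize_whitespace(old) != normalize_whitespace(new)
```

With this approach, a paragraph that was merely re-wrapped between revisions compares as unchanged, so it never reaches the report.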
