Compares two versions of a PDF document and outputs a structured markdown report of all textual differences, page by page. Changed words are highlighted in bold. No summarization — every change is reported exactly as written.
This tool was created to help identify the exact differences between incremental revisions of the VEX IQ Robotics Game Manual (for example, https://link.vex.com/docs/25-26/viqrc-mixandmatch-manual).
This project was written primarily using Claude Code.
Requirements:

- Python 3
- pandoc (optional, for PDF export)
Create a virtual environment and install dependencies (one-time):
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

Then compare two PDFs:
.venv/bin/python compare.py old.pdf new.pdf

This prints the diff to the terminal. To save it as a markdown file:
.venv/bin/python compare.py old.pdf new.pdf > diff.md

To export the diff as a PDF, install pandoc and a LaTeX distribution if you haven't already:
# Ubuntu/Debian
sudo apt install pandoc texlive
# macOS
brew install pandoc

Then convert the markdown diff to a landscape PDF with minimal margins in one step:
.venv/bin/python compare.py old.pdf new.pdf | \
pandoc --from gfm -o diff.pdf \
-V geometry:"landscape,margin=0.5in" \
-V fontsize=10pt \
--lua-filter=table-widths.lua

Or from a saved markdown file:
pandoc --from gfm diff.md -o diff.pdf \
-V geometry:"landscape,margin=0.5in" \
-V fontsize=10pt \
--lua-filter=table-widths.lua

The table-widths.lua filter (included in this repo) ensures table columns wrap correctly instead of overflowing the page.
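
If you would rather drive the conversion from a script, the pandoc call above can be assembled as an argument list. This is only a sketch: `build_pandoc_command` and `export_pdf` are hypothetical helper names that are not part of this repo, and the flags are copied from the command shown above.

```python
from __future__ import annotations

import subprocess


def build_pandoc_command(markdown_path: str, pdf_path: str,
                         lua_filter: str = "table-widths.lua") -> list[str]:
    """Return the pandoc argument list used in this README (hypothetical helper)."""
    return [
        "pandoc", "--from", "gfm", markdown_path, "-o", pdf_path,
        # The shell quotes around the geometry value are not needed in a list.
        "-V", "geometry:landscape,margin=0.5in",
        "-V", "fontsize=10pt",
        f"--lua-filter={lua_filter}",
    ]


def export_pdf(markdown_path: str, pdf_path: str) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(markdown_path, pdf_path), check=True)
```

Calling `export_pdf("diff.md", "diff.pdf")` would then be equivalent to the saved-file command above, assuming pandoc and a LaTeX distribution are installed.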
Changes are grouped by page number. Each changed paragraph appears as a table row with the original text on the left and the new text on the right. Modified words are shown in bold.
**Page 12:**
| Original | New |
|----------|-----|
| The robot must **stop** within the zone. | The robot must **halt** within the zone. |
| **This rule applies to all matches.** | [DELETED] |
| [ADDED] | **A five-second grace period applies.** |
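
The word-level bolding in the rows above can be approximated with Python's standard `difflib`. The sketch below illustrates the idea and is not necessarily how compare.py does it; `bold_changes` is a hypothetical name, and splitting on whitespace is an assumption about what counts as a word.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def bold_changes(old: str, new: str) -> tuple[str, str]:
    """Return (old, new) with the words that differ wrapped in ** **."""
    old_words, new_words = old.split(), new.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_chunk = " ".join(old_words[i1:i2])
        new_chunk = " ".join(new_words[j1:j2])
        if tag == "equal":
            old_out.append(old_chunk)
            new_out.append(new_chunk)
        else:
            # replace / delete / insert: bold whichever side has words
            if old_chunk:
                old_out.append(f"**{old_chunk}**")
            if new_chunk:
                new_out.append(f"**{new_chunk}**")
    return " ".join(old_out), " ".join(new_out)


old_md, new_md = bold_changes(
    "The robot must stop within the zone.",
    "The robot must halt within the zone.",
)
print(f"| {old_md} | {new_md} |")
# | The robot must **stop** within the zone. | The robot must **halt** within the zone. |
```
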
Page footers and whitespace-only differences are ignored.
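
One way to get this behavior is to normalize text before comparing it. The following is a minimal sketch under stated assumptions, not the tool's actual code: the helper names are hypothetical, and the `FOOTER` pattern ("Page N" alone on the last line) is an invented example of what a footer might look like.

```python
from __future__ import annotations

import re

# Assumed footer shape, for illustration only; real manuals may differ.
FOOTER = re.compile(r"^\s*Page \d+\s*$")


def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return " ".join(text.split())


def strip_footer(page_lines: list[str], footer: re.Pattern = FOOTER) -> list[str]:
    """Drop the last line of a page if it matches the footer pattern."""
    if page_lines and footer.search(page_lines[-1]):
        return page_lines[:-1]
    return page_lines


def differs(old: str, new: str) -> bool:
    """True only when the texts differ by more than whitespace."""
    return normalize(old) != normalize(new)
```

With these helpers, `differs("The  robot\nmust stop.", "The robot must stop.")` is false, so a paragraph that was merely re-wrapped between revisions would not produce a table row.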