
📄 docx2md

Convert Word documents to clean Markdown from the command line.


One dependency · Zero config · Works everywhere

Install · Usage · Why · Contributing


✨ What it does

docx2md wraps pandoc with sensible defaults so you can convert .docx files to Markdown without memorizing flags.

docx2md report.docx
# ✓ report.docx → report.md

That's it. No config files, no build step, no Node.js runtime. One script, one dependency.

🚀 Install

One-liner (macOS, Linux, WSL)

curl -fsSL https://raw.githubusercontent.com/jschof1/docx2md/main/install.sh | bash

Or with git

git clone https://github.com/jschof1/docx2md.git
cd docx2md
sudo make install

Or just download it

# /usr/local/bin is usually root-owned, so both steps need sudo
sudo curl -fsSL https://raw.githubusercontent.com/jschof1/docx2md/main/docx2md -o /usr/local/bin/docx2md
sudo chmod +x /usr/local/bin/docx2md

Prerequisite

pandoc must be installed:

| Platform | Install |
| --- | --- |
| macOS | brew install pandoc |
| Ubuntu/Debian | sudo apt install pandoc |
| Fedora | sudo dnf install pandoc |
| Windows (WSL) | sudo apt install pandoc |
| Windows (native) | choco install pandoc |
| Arch | sudo pacman -S pandoc |
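
To confirm the prerequisite before running anything, a small check like the one below can help. This is a minimal sketch, not part of docx2md itself; `have_cmd` and `check_pandoc` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: report where pandoc lives, or fail with a hint
# instead of hitting "command not found" later.
check_pandoc() {
    if have_cmd pandoc; then
        echo "pandoc found: $(command -v pandoc)"
    else
        echo "pandoc not found; install it with your platform's package manager" >&2
        return 1
    fi
}
```

Calling `check_pandoc` at the top of a script lets it stop early with a readable message.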

📖 Usage

Basic

docx2md report.docx                    # → report.md
docx2md report.docx notes.md           # → notes.md (custom output name)
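
The default output name shown above can be derived with plain shell parameter expansion. The sketch below is one way to do it and is an assumption about the approach, not the script's actual code; `default_output_name` is a hypothetical name.

```shell
# Hypothetical helper: derive the default output name from an input path.
default_output_name() {
    # ${1%.docx} strips a trailing ".docx" (shortest suffix match).
    printf '%s\n' "${1%.docx}.md"
}

default_output_name report.docx          # prints report.md
default_output_name drafts/ch1.docx      # prints drafts/ch1.md
```

Whether the real script writes the result next to the input or into the current directory is not stated here, so treat the path handling as illustrative.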

Extract images

docx2md --images report.docx           # images extracted to ./images/
docx2md -i assets report.docx          # images extracted to ./assets/
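
For readers curious what image extraction looks like in raw pandoc, the sketch below uses pandoc's `--extract-media` option. How docx2md maps `-i` onto pandoc internally is an assumption here, and `convert_with_images` is a hypothetical helper.

```shell
# Hypothetical helper: convert one file and extract its images into a directory.
# Usage: convert_with_images <image-dir> <input.docx>
convert_with_images() {
    img_dir=$1
    input=$2
    # --extract-media is a pandoc option; it also rewrites image references
    # in the output so they point at the extracted files.
    pandoc -f docx -t markdown --extract-media="$img_dir" \
        "$input" -o "${input%.docx}.md"
}
```

For example, `convert_with_images assets report.docx` would write report.md and place the media under assets/.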

Batch convert

docx2md chapter1.docx chapter2.docx chapter3.docx
docx2md *.docx
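
Batch conversion amounts to a loop over the arguments. The following is a minimal sketch of such a loop around pandoc; `convert_all` is a hypothetical name, and continuing past a failed file is a design choice assumed here, not documented behavior of docx2md.

```shell
# Hypothetical helper: convert every argument, continuing past failures.
convert_all() {
    failed=0
    for f in "$@"; do
        if pandoc -f docx -t markdown --wrap=none "$f" -o "${f%.docx}.md"; then
            echo "ok: $f"
        else
            echo "failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Exit status is non-zero if any file failed.
    [ "$failed" -eq 0 ]
}
```

Invoked as `convert_all *.docx`, it reports each file and still returns a failing status if any conversion went wrong, which is useful in scripts and CI.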

Pipe to other tools

docx2md -s report.docx | head -20      # preview first 20 lines
docx2md -s report.docx | wc -w         # word count
docx2md -s report.docx > output.md     # redirect to file

All options

USAGE
    docx2md [OPTIONS] <input.docx...>
    docx2md <input.docx> [output.md]

OPTIONS
    -i, --images [DIR]    Extract images into DIR (default: images/)
    -s, --stdout          Write Markdown to stdout instead of a file
    -q, --quiet           Suppress all output except errors
    -w, --wrap MODE       Line wrapping: none (default), auto, or preserve
    -h, --help            Show this help message
    -V, --version         Show version number
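
The three wrap modes listed above match the values accepted by pandoc's own `--wrap` option. A wrapper could validate the mode before passing it through, roughly as sketched below; `wrap_flag` is a hypothetical helper, and the actual validation in docx2md may differ.

```shell
# Hypothetical helper: map a wrap MODE to the matching pandoc flag.
wrap_flag() {
    case "$1" in
        none|auto|preserve)
            printf '%s\n' "--wrap=$1"
            ;;
        *)
            echo "invalid wrap mode: $1 (expected none, auto, or preserve)" >&2
            return 1
            ;;
    esac
}

wrap_flag auto        # prints --wrap=auto
```

Rejecting unknown modes up front gives a clearer error than letting pandoc fail partway through a batch.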

🤔 Why

There are plenty of docx-to-markdown tools. Here's why this one exists:

  • Zero config: no config files, no presets, no decisions to make
  • One dependency: only pandoc, which you probably already have
  • Batch mode: convert 50 docs in one command
  • Image extraction: pull embedded images out with one flag
  • Pipes: stdout mode works with head, grep, wc, and everything else
  • Portable: pure bash, runs on macOS, Linux, WSL, anywhere with a shell
  • Fast: no runtime, no daemon, no overhead

🗺️ How it compares

| | docx2md | mattn/docx2md | microsoft/markitdown |
| --- | --- | --- | --- |
| Dependencies | pandoc | None (Go binary) | Python + packages |
| Install size | ~5 KB | ~3 MB | ~50 MB |
| Batch mode | ✅ | ❌ | ✅ |
| Image extraction | ✅ | ✅ | ✅ |
| Stdout / pipes | ✅ | ❌ | ✅ |
| Config needed | None | None | None |
| Language | Bash | Go | Python |

🧪 Development

git clone https://github.com/jschof1/docx2md.git
cd docx2md

# Run tests
make test

# Lint
make lint

# Install locally
make install

# Uninstall
make uninstall

❓ FAQ

Does it convert .doc (old Word format)?
Not directly. Convert to .docx first with libreoffice --convert-to docx file.doc, then use docx2md.

What about tables, footnotes, and math?
Pandoc handles all of these. Complex tables may need manual cleanup, but most convert cleanly.

Why wrap pandoc? Isn't this just pandoc -f docx -t markdown?
Yes, and that's the point. Nobody remembers those flags. docx2md report.docx is easier to type, easier to remember, and handles batch conversion and image extraction without reaching for the pandoc manual.
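
For the legacy .doc case from the first question, the two steps can be chained in a small function. This is a sketch under assumptions: `doc_to_md` is a hypothetical helper, the LibreOffice binary is assumed to be named `libreoffice` (on some systems it is `soffice`), and `--headless` and `--outdir` are added so the conversion runs without a GUI and writes next to the source file.

```shell
# Hypothetical helper: convert a legacy .doc to Markdown in two steps.
doc_to_md() {
    doc=$1
    out_dir=$(dirname "$doc")
    # Step 1: LibreOffice writes <name>.docx into the --outdir directory.
    libreoffice --headless --convert-to docx --outdir "$out_dir" "$doc" || return 1
    # Step 2: hand the resulting .docx to docx2md.
    docx2md "${doc%.doc}.docx"
}
```

Running `doc_to_md old-report.doc` would leave old-report.docx beside the original and then convert it with docx2md.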

Contributing

Contributions welcome. Please:

  1. Fork the repo
  2. Create a feature branch (git checkout -b my-feature)
  3. Commit your changes
  4. Open a pull request

Keep it simple: this tool's value is its simplicity.

License

MIT. Use it however you like.


If this saved you time, consider giving it a ⭐
