A CLI utility that converts MHT/MHTML files into clean Markdown, extracting embedded images and resources to a local assets folder with relative links.
- Python >= 3.12
Install from source:

    git clone https://github.com/kgoldberg/md-convert.git
    cd md-convert
    python -m venv .venv
    source .venv/bin/activate
    pip install -e .

For development (includes pytest):

    pip install -e ".[dev]"

To convert a file:

    md-convert INPUT.mht

This parses the MHT file, extracts embedded resources to an assets/ directory, and writes the Markdown output to a file with the same name as the input and a .md extension (for example, page.mht becomes page.md).

| Option | Description |
|---|---|
| `INPUT` | Path to the MHT/MHTML file to convert (required) |
| `-o, --output FILE` | Write Markdown to a specific file (default: input filename with .md extension) |
| `-a, --assets-folder DIR` | Directory for extracted assets (default: assets) |

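The options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the project's actual entry point: the function names `build_parser` and `resolve_output` are hypothetical, and the assumption that the default output sits alongside the input file is inferred from the table, not confirmed by the source.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="md-convert",
        description="Convert an MHT/MHTML file to Markdown.",
    )
    parser.add_argument(
        "input", metavar="INPUT", type=Path,
        help="Path to the MHT/MHTML file to convert",
    )
    parser.add_argument(
        "-o", "--output", metavar="FILE", type=Path, default=None,
        help="Write Markdown to a specific file",
    )
    parser.add_argument(
        "-a", "--assets-folder", metavar="DIR", type=Path, default=Path("assets"),
        help="Directory for extracted assets",
    )
    return parser


def resolve_output(args: argparse.Namespace) -> Path:
    """Fall back to the input filename with a .md extension (assumed behavior)."""
    if args.output is not None:
        return args.output
    return args.input.with_suffix(".md")
```

Keeping the default for `--output` as `None` and resolving it afterwards lets the fallback depend on the value of `INPUT`, which a static `default=` cannot do.
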
Convert an MHT file (output written to page.md):

    md-convert page.mht

Convert to a specific output file:

    md-convert page.mht -o notes/page.md

Convert with a custom assets folder:

    md-convert page.mht -o page.md -a images

The conversion runs in five stages:

- Parse -- Reads the MHT/MHTML file using Python's `email` module and validates the `multipart/related` MIME structure. Handles non-standard charsets like `unicode` (UTF-16) from Microsoft Word/Outlook.
- Extract -- Writes embedded images and resources to the assets folder, deduplicating files and handling filename collisions.
- Rewrite -- Updates `<img>` and `<link>` references in the HTML to point to the extracted local assets.
- Clean -- Pre-processes Word/Outlook HTML by converting CSS-class headings to proper HTML headings, unwrapping layout tables, and stripping MSO conditional comments.
- Convert -- Transforms the cleaned HTML into Markdown using `markdownify` with ATX-style headings and dash-style bullets.

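The Parse, Extract, and Rewrite stages can be illustrated with the standard library alone. The sketch below is an assumption-laden approximation, not the tool's real code: every function name is hypothetical, resources are keyed by their `Content-Location` header, duplicates are detected with a SHA-256 hash, collisions get a numeric suffix, and references are rewritten with a regular expression where the actual implementation may well use an HTML parser.

```python
import email
import hashlib
import posixpath
import re
from pathlib import Path
from urllib.parse import urlsplit


def parse_mht(raw: bytes):
    """Parse raw MHT bytes and check for a multipart/related container."""
    msg = email.message_from_bytes(raw)
    if msg.get_content_type() != "multipart/related":
        raise ValueError(f"expected multipart/related, got {msg.get_content_type()}")
    return msg


def split_parts(msg):
    """Return (html_text, {content_location: bytes}) for the message parts."""
    html, resources = None, {}
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        if part.get_content_type() == "text/html" and html is None:
            charset = part.get_content_charset() or "utf-8"
            if charset == "unicode":  # Word/Outlook label for UTF-16
                charset = "utf-16"
            html = payload.decode(charset, errors="replace")
        else:
            location = part.get("Content-Location")
            if location:
                resources[location.strip()] = payload
    return html, resources


def save_assets(resources, assets_dir: Path) -> dict:
    """Write each distinct payload once; map original URL -> relative path."""
    assets_dir.mkdir(parents=True, exist_ok=True)
    by_digest, used, mapping = {}, set(), {}
    for location, data in resources.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in by_digest:  # identical content is stored only once
            name = posixpath.basename(urlsplit(location).path) or "resource"
            stem, suffix = Path(name).stem, Path(name).suffix
            candidate, n = name, 1
            while candidate in used:  # same name, different content
                candidate = f"{stem}_{n}{suffix}"
                n += 1
            used.add(candidate)
            (assets_dir / candidate).write_bytes(data)
            by_digest[digest] = candidate
        mapping[location] = f"{assets_dir.name}/{by_digest[digest]}"
    return mapping


_REF = re.compile(
    r'(<(?:img|link)\b[^>]*?\b(?:src|href)=)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_refs(html: str, mapping: dict) -> str:
    """Point <img>/<link> URLs that were extracted at their local copies."""
    def swap(m):
        url = m.group(3)
        return f"{m.group(1)}{m.group(2)}{mapping.get(url, url)}{m.group(2)}"
    return _REF.sub(swap, html)
```

The Clean and Convert stages are left out here because they depend on Word-specific markup and on `markdownify`, neither of which can be reproduced faithfully from the description alone.
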
To run the tests:

    source .venv/bin/activate
    pip install -e ".[dev]"
    pytest

See LICENSE for details.