mail-parser is a production-grade email parsing library for Python that transforms raw email messages into structured Python objects. Originally built as the foundation for SpamScope, it excels at security analysis, forensics, and RFC-compliant email processing.
Always use factory functions instead of direct MailParser() instantiation:
import mailparser
mail = mailparser.parse_from_file(filepath) # Standard email files
mail = mailparser.parse_from_string(raw_email) # Email as string
mail = mailparser.parse_from_bytes(email_bytes) # Email as bytes
mail = mailparser.parse_from_file_msg(msg_file) # Outlook .msg files
Every parsed component offers three access patterns (src/mailparser/core.py:550-570):
mail.subject # Python object (decoded string)
mail.subject_raw # Raw header value (JSON list)
mail.subject_json # JSON-serialized version
This pattern applies to all properties via __getattr__ magic in core.py.
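As a rough illustration of how suffix-based access can be built on __getattr__, here is a minimal sketch. It is not the code in core.py: the HeaderView class, its _lookup and _decoded helpers, and the choice to treat the first stored value as the decoded form are all assumptions made for the example.

```python
import json


class HeaderView:
    """Hypothetical stand-in for a parsed message (not the real MailParser)."""

    def __init__(self, values):
        # Maps a lower-case header name to its list of raw values.
        self._values = values

    def _lookup(self, key):
        try:
            return self._values[key]
        except KeyError:
            raise AttributeError(key) from None

    def _decoded(self, key):
        # Assumption for this sketch: the first value is the decoded form.
        return self._lookup(key)[0]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name.endswith("_raw"):
            return json.dumps(self._lookup(name[: -len("_raw")]))
        if name.endswith("_json"):
            return json.dumps(self._decoded(name[: -len("_json")]))
        return self._decoded(name)


view = HeaderView({"subject": ["Hello world"]})
print(view.subject)       # decoded Python string
print(view.subject_raw)   # JSON list of raw values
print(view.subject_json)  # JSON-serialized decoded value
```

The guard on names starting with an underscore keeps internal attributes out of the dynamic path, which avoids accidental recursion when _values has not been set yet.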
Headers with hyphens use underscore substitution (core.py:__getattr__):
mail.X_MSMail_Priority # Accesses "X-MSMail-Priority" header
mail.Content_Type # Accesses "Content-Type" header
The project uses uv (modern pip/virtualenv replacement) exclusively:
uv sync # Install all dev/test dependencies (defined in pyproject.toml)
make install # Alias for uv sync
Never use pip directly; all commands in the Makefile use the uv run prefix.
make test # pytest with coverage (generates coverage.xml, junit.xml, htmlcov/)
make lint # ruff check .
make format # ruff format .
make check # lint + test
make pre-commit # Run all pre-commit hooks
When adding features or fixing bugs you MUST follow these steps:
- Add a relevant test email to tests/mails/ if demonstrating a new case
- Write tests in the corresponding test file under tests/, following existing patterns
- Run make test to verify all tests pass before committing
- Run uv run mail-parser -f tests/mails/mail_test_11 -j to manually verify JSON output and that new changes work as expected
- Run make pre-commit to ensure code style compliance before pushing
Test data location: tests/mails/ contains malformed emails, Outlook files, and various encodings
(mail_test_1 through mail_test_17, mail_malformed_1-3, mail_outlook_1).
Critical testing rule: When modifying parsing logic, test against malformed emails to ensure security defect detection still works.
make build # uv build → creates dist/*.tar.gz and dist/*.whl
make release # build + twine upload to PyPI
Version is dynamically loaded from src/mailparser/version.py (see pyproject.toml:tool.hatch.version).
The parser identifies RFC violations that could indicate malicious intent (core.py:240-268):
mail.has_defects # Boolean flag
mail.defects # List of defect dicts by content type
mail.defects_categories # Set of defect class names (e.g., "StartBoundaryNotFoundDefect")
Epilogue defect handling (core.py:320-335): When EPILOGUE_DEFECTS are detected, the parser extracts hidden content between MIME boundaries that could contain malicious payloads.
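To show where defect class names such as StartBoundaryNotFoundDefect come from, the sketch below uses only Python's standard email package, which records RFC violations on each message part. It illustrates the underlying mechanism and is not mail-parser's own code; the sample message and the variable names are made up for the example.

```python
import email

# A multipart message that declares a boundary but never uses it.
RAW = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: broken multipart\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "This body never opens the declared boundary.\n"
)

msg = email.message_from_string(RAW)

# Collect the class name of every defect recorded on any part.
defect_names = {
    type(defect).__name__
    for part in msg.walk()
    for defect in part.defects
}
has_defects = bool(defect_names)

print(has_defects)
print(sorted(defect_names))
```

A set of class names like this is the kind of value the defects_categories property is described as exposing.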
The get_server_ipaddress(trust) method (core.py:487-528) extracts sender IPs with trust-level validation:
# Finds first non-private IP in trusted headers
mail.get_server_ipaddress(trust="Received")
It filters out private IP ranges using Python's ipaddress module.
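The private-range filtering can be sketched with the standard library alone. The function below is a simplified illustration, not the method in core.py: first_public_ip, the IPv4-only regex, and the sample headers are assumptions made for the example.

```python
import ipaddress
import re

# Loose IPv4 candidate pattern; ipaddress does the real validation.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def first_public_ip(received_headers):
    """Return the first non-private IPv4 address found, or None."""
    for header in received_headers:
        for candidate in IPV4_CANDIDATE.findall(header):
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. "999.1.1.1" matches the regex but is invalid
            if not ip.is_private:
                return str(ip)
    return None


headers = [
    "from relay.internal (relay.internal [192.168.1.10]) by mx.example.com",
    "from mail.sender.example (mail.sender.example [8.8.8.8]) by relay.internal",
]
result = first_public_ip(headers)
print(result)
```

Note that is_private also covers loopback and the documentation ranges, so test fixtures need a genuinely public address to exercise the positive path.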
Complex regex-based parsing (utils.py:302-360, patterns in const.py:24-73) extracts hop-by-hop routing:
# Returns list of dicts with: by, from, date, date_utc, delay, envelope_from, hop, with
mail.received
Key pattern: RECEIVED_COMPILED_LIST contains pre-compiled regexes for "from", "by", "with", "id", "for", "via", "envelope-from", "envelope-sender", and date patterns. Recent fixes addressed IBM gateway duplicate matches (see comments in const.py:26-38).
If parsing fails, the parser falls back to receiveds_not_parsed(), which returns a {"raw": <header>, "hop": <n>} structure.
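The parse-then-fall-back idea can be sketched as follows. The patterns here are deliberately simplistic and are not the ones in const.py; CLAUSE_PATTERNS and parse_hop are hypothetical names used only for illustration.

```python
import re

# Pre-compiled once at import time, in the spirit of a compiled pattern list.
CLAUSE_PATTERNS = [
    ("from", re.compile(r"\bfrom\s+([^\s;]+)", re.IGNORECASE)),
    ("by", re.compile(r"\bby\s+([^\s;]+)", re.IGNORECASE)),
    ("with", re.compile(r"\bwith\s+([^\s;]+)", re.IGNORECASE)),
]


def parse_hop(header, hop):
    """Extract known clauses; fall back to the raw header if none match."""
    parsed = {}
    for name, pattern in CLAUSE_PATTERNS:
        match = pattern.search(header)
        if match:
            parsed[name] = match.group(1)
    if not parsed:
        # Fallback shape mirrors the {"raw": ..., "hop": ...} structure.
        return {"raw": header, "hop": hop}
    parsed["hop"] = hop
    return parsed


good = parse_hop(
    "from a.example by b.example with ESMTP; Mon, 1 Jan 2024 00:00:00 +0000", 1
)
bad = parse_hop("unparseable header content", 2)
print(good)
print(bad)
```

Keeping the raw header in the fallback means no hop is silently dropped, which matters when the routing chain itself is the evidence under analysis.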
Package uses modern src-layout (src/mailparser/) for cleaner imports and testing isolation:
src/mailparser/
├── __init__.py # Exports factory functions
├── __main__.py # CLI entry point (mail-parser command)
├── core.py # MailParser class (760 lines)
├── utils.py # Parsing utilities (582 lines)
├── const.py # Regex patterns and constants
├── exceptions.py # Exception hierarchy
└── version.py # Version string
Outlook .msg file parsing requires a system-level Perl module:
apt-get install libemail-outlook-message-perl # Debian/Ubuntu
Conversion is triggered via the msgconvert() function in utils.py, which shells out to the Perl script. It raises MailParserOutlookError if the tool is unavailable.
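A shell-out with an availability check might look like the sketch below. It is an assumption-laden illustration rather than the function in utils.py: convert_msg and OutlookToolMissing are hypothetical names, and the --outfile flag is assumed to be how the msgconvert script selects its output path.

```python
import os
import shutil
import subprocess
import tempfile


class OutlookToolMissing(RuntimeError):
    """Hypothetical stand-in for the library's Outlook error type."""


def convert_msg(msg_path, tool="msgconvert"):
    """Convert a .msg file to .eml with an external tool; return the new path."""
    if shutil.which(tool) is None:
        raise OutlookToolMissing(f"{tool} not found on PATH")
    fd, out_path = tempfile.mkstemp(suffix=".eml")
    os.close(fd)  # the external tool writes the file, not this process
    # Assumed invocation: <tool> --outfile <out> <in>
    subprocess.run([tool, "--outfile", out_path, msg_path], check=True)
    return out_path
```

Checking PATH before spawning the process turns a confusing FileNotFoundError into a domain-specific error that callers can catch.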
__main__.py provides a production CLI with mutually exclusive input modes (-f, -s, -k), JSON output (-j), and selective printing (-b, -a, -r, -t).
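Mutually exclusive input modes are typically expressed with an argparse group. The sketch below reuses the short flags named above, but the dest names, help strings, and program name are assumptions, and it is not the parser defined in __main__.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mail-parser-sketch")
    # Exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="file", help="path to a raw email file")
    source.add_argument("-s", dest="string", help="raw email as a string")
    source.add_argument("-k", dest="stdin", action="store_true",
                        help="read the email from standard input")
    parser.add_argument("-j", dest="json", action="store_true",
                        help="print the parsed email as JSON")
    return parser


args = build_parser().parse_args(["-f", "tests/mails/mail_test_11", "-j"])
print(args.file, args.json)
```

Passing two sources at once, such as -f together with -s, makes argparse exit with a usage error instead of silently picking one.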
Entry point defined in pyproject.toml:project.scripts:
[project.scripts]
mail-parser = "mailparser.__main__:main"
Single linter/formatter (replaces black, isort, flake8):
[tool.ruff.lint]
select = ["E", "F", "I"] # pycodestyle, pyflakes, isort
# "UP", "B", "SIM", "S", "PT" commented out in pyproject.toml
Key markers in pyproject.toml:tool.pytest.ini_options:
- integration: marks integration tests
- Coverage outputs: XML (for CI), HTML (for local), terminal
- JUnit XML for CI integration
- Don't instantiate MailParser() directly; use factory functions from __init__.py
- Don't use pip; always use uv or Makefile targets
- Don't ignore defects; they're critical for security analysis
- Don't assume headers exist; use the .get() pattern or handle None
- Test against malformed emails; tests/mails/mail_malformed_* files exist for this reason
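The advice about missing headers can be illustrated with the standard library's message object, whose get() returns None (or a supplied default) instead of raising. This is an analogy using email.message, not mail-parser's own accessors; the sample message and fallback text are made up.

```python
import email

msg = email.message_from_string("Subject: hi\n\nbody\n")

# Present header: get() returns its value.
subject = msg.get("Subject", "")

# Absent header: get() returns None unless a default is supplied.
reply_to = msg.get("Reply-To")
if reply_to is None:
    reply_to = "(no Reply-To header)"

print(subject)
print(reply_to)
```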
Dockerfile uses Python 3.10-slim-bookworm with Outlook dependencies pre-installed. Container runs as non-root
mailparser user.
docker build -t mail-parser .
docker run mail-parser -f /path/to/email
- Property implementation: core.py:540-730 (all @property decorators)
- Attachment extraction: core.py:355-475 (walks multipart, handles encoding)
- Received parsing logic: utils.py:302-455 + const.py:24-73 (regex patterns)
- CLI implementation: __main__.py:30-347 (argparse + output formatting)
- Exception hierarchy: exceptions.py:20-60 (5 exception types)
When adding features:
- Add a test email to tests/mails/ if demonstrating a new case
- Write tests in tests/test_mail_parser.py following existing patterns
- Test both normal and _raw/_json property variants
- Verify defect detection for security-relevant changes
- Run make check before committing