This directory contains automated validation scripts for converting PDFs to accessible HTML with MathML.
All scripts use Python 3 standard library only - no external dependencies required.
validate_conversion.py - Comprehensive validation suite that runs all checks
python3 scripts/validate_conversion.py path/to/file.html
Outputs a complete report with an overall PASS/FAIL verdict.
verify_wcag.py - WCAG 2.1 Level AA compliance
- Unicode violations (U+1D400-U+1D7FF mathematical alphanumeric symbols)
- H1 tag presence
- <main> landmark element
- MathML role="math" attributes
- Breadcrumb navigation
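As a rough illustration of the first check in the list above, the sketch below scans HTML text for characters in the U+1D400-U+1D7FF range. The function name `find_unicode_violations` and the shape of its return value are assumptions made for this example; the actual logic in verify_wcag.py may differ.

```python
import re

# Mathematical Alphanumeric Symbols block, U+1D400 through U+1D7FF.
MATH_ALNUM = re.compile("[\U0001D400-\U0001D7FF]")


def find_unicode_violations(html):
    """Return (line_number, character, code_point) for each offending character.

    Hypothetical helper for illustration only.
    """
    violations = []
    for line_number, line in enumerate(html.splitlines(), start=1):
        for match in MATH_ALNUM.finditer(line):
            char = match.group()
            violations.append((line_number, char, f"U+{ord(char):04X}"))
    return violations
```

Reporting the line number alongside the code point keeps the message actionable: the reader knows where to look and which character to replace.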
python3 scripts/verify_wcag.py path/to/file.html

check_heading_hierarchy.py - Semantic heading structure
- Exactly one H1 heading
- No skipped heading levels (H1→H2→H3, never H1→H3)
- All headings have unique IDs
- No duplicate IDs
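The heading rules above can be sketched with the standard library's `html.parser`. This is a minimal sketch, not the code in check_heading_hierarchy.py: the names `HeadingCollector` and `check_headings` are hypothetical, and the duplicate-ID rule is applied to heading IDs only, which is an assumption.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id) for every h1-h6 start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "h1".."h6" is enough here.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.headings.append((int(tag[1]), dict(attrs).get("id")))


def check_headings(html):
    """Return (passed, details) for the heading rules (hypothetical helper)."""
    collector = HeadingCollector()
    collector.feed(html)
    details = []
    levels = [level for level, _ in collector.headings]
    if levels.count(1) != 1:
        details.append(f"expected exactly one h1, found {levels.count(1)}")
    # A heading may go at most one level deeper than the one before it.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            details.append(f"skipped level: h{previous} followed by h{current}")
    ids = [heading_id for _, heading_id in collector.headings]
    if any(heading_id is None for heading_id in ids):
        details.append("heading without an id attribute")
    present = [heading_id for heading_id in ids if heading_id is not None]
    if len(present) != len(set(present)):
        details.append("duplicate heading id")
    return (not details, details)
```

Comparing each heading only with its predecessor allows jumps back up (H3 followed by H2) while still catching a jump down that skips a level.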
python3 scripts/check_heading_hierarchy.py path/to/file.html

check_links_images.py - Links and images validation
- Broken internal anchor links (#id references)
- Missing image files (relative paths)
- Images without alt attributes
- Empty links without text or aria-label
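The broken-anchor check from the list above might look roughly like the following. It is a sketch under assumptions: `AnchorCollector` and `find_broken_anchors` are hypothetical names, and only same-page `#id` links are considered.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Record every id in the page and every same-page link target."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        href = attributes.get("href")
        # Only links of the form href="#something" are internal anchors.
        if tag == "a" and href and href.startswith("#") and len(href) > 1:
            self.targets.append(href[1:])


def find_broken_anchors(html):
    """Return the link targets that match no id in the document."""
    collector = AnchorCollector()
    collector.feed(html)
    return [target for target in collector.targets if target not in collector.ids]
```

The comparison runs only after the whole document has been parsed, so a link may legitimately point at an element that appears later in the page.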
python3 scripts/check_links_images.py path/to/file.html

check_mathml.py - MathML accessibility
- All <math> elements have role="math"
- All <math> elements have aria-label with LaTeX
- Proper <semantics> wrappers
- LaTeX <annotation> elements
- Statistics: inline vs block display mode
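To illustrate how the attribute checks and statistics above could be gathered, here is a small sketch. The names `MathCollector` and `math_statistics` are hypothetical. The sketch also assumes that block equations carry `display="block"` (as Pandoc's MathML output does) and that anything else counts as inline.

```python
from html.parser import HTMLParser


class MathCollector(HTMLParser):
    """Collect the attributes of every <math> start tag."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self.elements.append(dict(attrs))


def math_statistics(html):
    """Count <math> elements by accessibility attributes and display mode."""
    collector = MathCollector()
    collector.feed(html)
    elements = collector.elements
    block = sum(1 for element in elements if element.get("display") == "block")
    return {
        "total": len(elements),
        "with_role": sum(1 for e in elements if e.get("role") == "math"),
        "with_aria_label": sum(1 for e in elements if e.get("aria-label")),
        "inline": len(elements) - block,
        "block": block,
    }
```

A file passes the first two checks in this sketch when `with_role` and `with_aria_label` both equal `total`.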
python3 scripts/check_mathml.py path/to/file.html

fix_mathml.py - Post-processing for accessibility
- Replace Unicode mathematical characters with HTML entities
- Fix heading hierarchy
- Add ARIA attributes to math elements
- Add breadcrumb and back button navigation
- Wrap content in <main> landmark
- Extract title and set page title
python3 scripts/fix_mathml.py input.html output.html

fix_unicode_violations.py - Batch fix Unicode violations
- Converts Unicode mathematical characters to HTML entities
- Designed for fixing multiple files
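One possible way to perform the conversion described above is to replace each offending character with a hexadecimal numeric character reference. This is a sketch only: the name `to_entities` is hypothetical, and whether fix_unicode_violations.py emits numeric references or some other entity form is an assumption.

```python
import re

# Mathematical Alphanumeric Symbols block, U+1D400 through U+1D7FF.
MATH_ALNUM = re.compile("[\U0001D400-\U0001D7FF]")


def to_entities(html):
    """Replace each character in the block with a reference such as &#x1D465;."""
    return MATH_ALNUM.sub(lambda match: f"&#x{ord(match.group()):X};", html)
```

Because the replacement is purely textual, the same function can be applied to each file in a batch without parsing the HTML.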
python3 scripts/fix_unicode_violations.py

- Upload the PDF to the Mathpix API
- Download the result in TeX format
- Convert with Pandoc:
  pandoc input.tex -f latex -t html --mathml --standalone -o output.html
- Post-process:
  python3 scripts/fix_mathml.py output.html fixed.html
- Validate:
  python3 scripts/validate_conversion.py fixed.html
- Fix violations if needed
- Re-validate until PASS
================================================================================
PDF TO HTML CONVERSION VALIDATION REPORT
================================================================================
File: graduate/exams/analysis/2025Jan_complex.html
1. WCAG 2.1 LEVEL AA COMPLIANCE
✅ PASS
2. HEADING HIERARCHY
✅ PASS
3. LINKS AND IMAGES
✅ PASS
4. MATHML ACCESSIBILITY
✅ PASS
Statistics:
Total math elements: 47
With role="math": 47
With aria-label: 47
Inline/Block: 32/15
================================================================================
OVERALL VERDICT
================================================================================
✅ PRODUCTION READY - All checks passed
✅ WCAG 2.1 Level AA compliant
✅ ADA Title II & III compliant
✅ Lawsuit risk: MINIMAL
These scripts are designed to support:
- WCAG 2.1 Level AA compliance
- ADA Title II & III compliance (public entities, places of public accommodation)
- Section 508 compliance (federal accessibility standards)
- Minimized lawsuit risk from accessibility violations
All scripts require Python 3.6+ (they use f-strings and type hints).
All scripts use only the Python standard library:
- re - Regular expressions
- os - File system operations
- sys - System-specific parameters
- Built-in data structures (dict, list, set)
No pip install required!
While not required, you can use a virtual environment:
# Create venv (already gitignored)
python3 -m venv venv
# Activate (optional - scripts work without activation)
source venv/bin/activate
# No packages to install - all standard library!

All scripts follow standard Unix exit codes:
- 0 = All checks passed (success)
- 1 = One or more checks failed (failure)
This allows integration with CI/CD pipelines.
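The exit-code convention above can be sketched as a small helper that a script calls just before exiting. The name `exit_code` and the idea of passing it a list of (passed, details) results are assumptions made for this example.

```python
def exit_code(results):
    """Map a list of (passed, details) tuples to a Unix exit status.

    Hypothetical helper: 0 when every check passed, 1 otherwise.
    """
    return 0 if all(passed for passed, _ in results) else 1
```

A script would then finish with something like `sys.exit(exit_code(results))`, so a CI job fails as soon as any single check fails.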
When adding new validation checks:
- Use only the Python standard library (no external dependencies)
- Return a (passed: bool, details: list) tuple
- Provide clear, actionable error messages
- Add the check to the validate_conversion.py master script
- Document it in this README
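As an illustration of these guidelines, a new check might take the shape below. The check itself (`check_html_lang`, which looks for a `lang` attribute on the `<html>` tag) is hypothetical and not one of the existing scripts. It is shown only to demonstrate the (passed, details) return shape and actionable messages.

```python
import re


def check_html_lang(html):
    """Hypothetical check: the <html> tag declares a non-empty lang attribute."""
    details = []
    match = re.search(r"<html\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        details.append("No <html> tag found; add one with a lang attribute.")
    elif not re.search(
        r"""\blang\s*=\s*["'][^"']+["']""", match.group(), flags=re.IGNORECASE
    ):
        details.append('<html> tag has no lang attribute; add e.g. lang="en".')
    # An empty details list means the check passed.
    return (not details, details)
```

Each message names both the problem and a suggested fix, so it reads as actionable on its own.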

- See /.claude/commands/mathml-general-exam.md for the exam conversion workflow
- See /.claude/commands/mathml-any-pdf.md for the general PDF conversion workflow
- See /CLAUDE.md for website content management guidelines