forked from python-openxml/python-docx
Establish comprehensive testing strategy for new features #43
Open
Labels
agent (Triggers the developer agent)
Description
Feature Description
Set up a multi-layered testing strategy for validating new python-docx features (starting with comments support). The approach ensures correctness without requiring Microsoft Word for every test run, while still producing files that Word will accept.
Testing Layers
Layer 1: XML Structure Tests (every PR)
- Validate that python-docx produces the correct OOXML elements
- For comments:
  - Assert `word/comments.xml` contains `w:comment` elements with correct attributes (`w:id`, `w:author`, `w:date`, `w:initials`)
  - Assert `word/document.xml` contains matching `w:commentRangeStart` and `w:commentRangeEnd` markers
  - Assert `[Content_Types].xml` includes the comments content type
  - Assert `word/_rels/document.xml.rels` includes the comments relationship
- Follow existing test patterns in the codebase (unit tests with pytest)
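The Layer 1 assertions above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's existing test style. The function `check_comment_structure` is a hypothetical name, and the sketch assumes every comment range in the body refers to a comment that exists in `word/comments.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
COMMENTS_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"
)
COMMENTS_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"
)


def extract_xml_part(docx, part_name):
    """Return the parsed root element of one XML part of a .docx package."""
    with zipfile.ZipFile(docx) as package:
        return ET.fromstring(package.read(part_name))


def _ids(root, local_name):
    """Collect the w:id values of all w:<local_name> elements under root."""
    return {e.get(f"{{{W_NS}}}id") for e in root.iter(f"{{{W_NS}}}{local_name}")}


def check_comment_structure(docx):
    """Hypothetical helper: run the four comment structure assertions."""
    comments = extract_xml_part(docx, "word/comments.xml")
    for comment in comments.iter(f"{{{W_NS}}}comment"):
        for attr in ("id", "author", "date", "initials"):
            assert f"{{{W_NS}}}{attr}" in comment.attrib, f"w:comment lacks w:{attr}"
    comment_ids = _ids(comments, "comment")
    assert comment_ids, "no w:comment elements found"

    document = extract_xml_part(docx, "word/document.xml")
    starts = _ids(document, "commentRangeStart")
    ends = _ids(document, "commentRangeEnd")
    assert starts == ends, "commentRangeStart/End markers do not match"
    assert starts <= comment_ids, "range marker refers to an unknown comment"

    content_types = extract_xml_part(docx, "[Content_Types].xml")
    overrides = {
        e.get("ContentType") for e in content_types.iter(f"{{{CT_NS}}}Override")
    }
    assert COMMENTS_CONTENT_TYPE in overrides, "comments content type missing"

    rels = extract_xml_part(docx, "word/_rels/document.xml.rels")
    rel_types = {e.get("Type") for e in rels.iter(f"{{{REL_NS}}}Relationship")}
    assert COMMENTS_REL_TYPE in rel_types, "comments relationship missing"
```

Because `zipfile.ZipFile` accepts both paths and file-like objects, the same helper works on a saved file or on a document written to an in-memory buffer.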
Layer 2: OOXML Schema Validation (every PR)
- Validate output XML against the official ECMA-376 OOXML XSD schemas
- Use `lxml.etree.XMLSchema` for validation
- Create a pytest fixture or helper that validates a generated `.docx` against the schemas
- Schemas are publicly available from ECMA; download and include them in test fixtures or fetch them during CI setup
- Catches: missing required elements, invalid attributes, wrong element ordering
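A schema validation helper along these lines might look like the sketch below. It assumes `lxml` is installed and that the caller supplies a mapping from part names to XSD file paths. The mapping, the function names, and the error format are illustrative assumptions, not an existing API.

```python
import zipfile

from lxml import etree


def load_schema(xsd_path):
    """Compile an XSD file; parsing from a path lets relative imports resolve."""
    return etree.XMLSchema(etree.parse(xsd_path))


def validate_part(docx, part_name, schema):
    """Validate one XML part; return a list of error strings (empty if valid)."""
    with zipfile.ZipFile(docx) as package:
        root = etree.fromstring(package.read(part_name))
    if schema.validate(root):
        return []
    return [f"{part_name}:{err.line}: {err.message}" for err in schema.error_log]


def validate_ooxml(docx, schemas_by_part):
    """Validate each listed part against its schema.

    schemas_by_part maps a part name such as "word/document.xml" to a
    compiled etree.XMLSchema. Returns {part_name: [errors]} for failing parts.
    """
    failures = {}
    for part_name, schema in schemas_by_part.items():
        errors = validate_part(docx, part_name, schema)
        if errors:
            failures[part_name] = errors
    return failures
```

A pytest fixture could compile the schemas once per session and hand `schemas_by_part` to each test, since compiling the full WordprocessingML schema set is comparatively slow.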
Layer 3: Round-Trip Tests (every PR)
- Write a feature with python-docx → save → re-open with python-docx → assert data reads back correctly
- For comments:
  - Create document with comments → re-read → assert comment text, author, date match
  - Create threaded comments → re-read → assert reply threading is preserved
  - Create comments on specific text ranges → re-read → assert ranges are correct
  - Modify comments → save → re-read → assert modifications persisted
  - Delete comments → save → re-read → assert comments removed
Layer 4: Reference File Comparison (every PR)
- Maintain a set of reference `.docx` files created in Microsoft Word (one-time manual step)
- For comments: create a Word doc with various comment scenarios:
  - Simple comment on a word
  - Comment on a paragraph
  - Threaded reply chain (2-3 levels)
  - Comment by multiple authors
  - Comment with formatted text (bold, italic)
- Test fixtures read these reference files and assert python-docx parses them correctly
- Optionally compare XML output of python-docx against reference file XML for structural equivalence
- Store reference files in `tests/fixtures/` or `tests/ref-docs/`
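One possible way to compare generated XML against a reference file for structural equivalence is to reduce both trees to element names and attribute names, ignoring text and attribute values. The sketch below takes that approach; the function names and the `ignore_attrs` parameter are illustrative assumptions, and "structural equivalence" may need a looser or stricter definition in practice.

```python
import xml.etree.ElementTree as ET


def structure_signature(element, ignore_attrs=()):
    """Reduce a tree to nested (tag, attribute names, child signatures) tuples.

    Text content and attribute values are dropped, so two documents with
    different comment text or ids still compare equal if their shape matches.
    """
    attrs = tuple(sorted(a for a in element.attrib if a not in ignore_attrs))
    children = tuple(structure_signature(child, ignore_attrs) for child in element)
    return (element.tag, attrs, children)


def assert_same_structure(generated_xml, reference_xml, ignore_attrs=()):
    """Fail if the two XML documents differ in element or attribute structure."""
    generated = structure_signature(ET.fromstring(generated_xml), ignore_attrs)
    reference = structure_signature(ET.fromstring(reference_xml), ignore_attrs)
    assert generated == reference, "XML structure differs from the reference file"
```

Attributes that Word adds but python-docx is not expected to emit (revision ids, for example) can be passed in `ignore_attrs` using their namespace-qualified names.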
Layer 5: LibreOffice Headless Validation (CI, optional)
- Install LibreOffice in CI (`sudo apt-get install libreoffice-writer`)
- After generating a `.docx`, convert to PDF headlessly: `libreoffice --headless --convert-to pdf output.docx`
- If LibreOffice rejects the file or errors during conversion, the test fails
- Optionally, as a future enhancement: render to images and run pixel-based visual regression tests for critical features
- Add as a separate CI job or test marker so it doesn't slow down the main test suite
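The headless conversion step could be wrapped for use from tests roughly as follows. This is a sketch, not a tested CI recipe: the helper names and the timeout are assumptions, and it checks for the output PDF explicitly because the exit code alone may not reliably signal a rejected file.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(binary, docx_path, out_dir):
    """Build the LibreOffice command line for a headless PDF conversion."""
    return [
        binary,
        "--headless",
        "--convert-to",
        "pdf",
        "--outdir",
        str(out_dir),
        str(docx_path),
    ]


def convert_to_pdf(docx_path, out_dir, timeout=120):
    """Convert a .docx to PDF and return the PDF path; raise on failure."""
    binary = shutil.which("libreoffice") or shutil.which("soffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice is not installed or not on PATH")
    subprocess.run(
        build_convert_command(binary, docx_path, out_dir),
        check=True,  # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        timeout=timeout,
    )
    pdf_path = Path(out_dir) / (Path(docx_path).stem + ".pdf")
    if not pdf_path.exists():
        raise RuntimeError(f"LibreOffice produced no PDF for {docx_path}")
    return pdf_path
```

Tests that call `convert_to_pdf` could carry a custom pytest marker (for example a hypothetical `libreoffice` marker) so the main suite can deselect them with `-m "not libreoffice"`.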
Implementation
New test helpers (`tests/helpers/`)
- `validate_ooxml(docx_path)`: unzips and validates all XML parts against OOXML schemas
- `extract_xml_part(docx_path, part_name)`: extracts a specific XML part for assertion
- `assert_round_trip(create_fn, assert_fn)`: creates a doc, saves, re-opens, runs assertions
CI changes
- Add `lxml` to dev dependencies if not already present (for schema validation)
- Optionally add a LibreOffice validation job in GitHub Actions
Reference file creation (one-time manual)
- Create reference `.docx` files in Microsoft Word for each feature area
- Document what each reference file contains in a `tests/ref-docs/README.md`
- Commit to the repo as test fixtures
Acceptance Criteria
- XML structure test helpers created and documented
- OOXML schema validation helper created (with schema files or download script)
- Round-trip test pattern established with at least one example test
- Reference file comparison pattern established
- LibreOffice headless validation working in CI (can be a separate optional job)
- All test helpers are usable for the upcoming comments feature
Dependencies
None — this should be implemented before or alongside the comments feature.
Out of Scope
- The comments feature itself (separate issue)
- Visual regression testing with screenshot comparison (future enhancement)
- Testing with actual Microsoft Word (manual only, not in CI)