Establish comprehensive testing strategy for new features #43

@loadfix

Feature Description

Set up a multi-layered testing strategy for validating new python-docx features (starting with comments support). The approach ensures correctness without requiring Microsoft Word for every test run, while still producing files that Word will accept.

Testing Layers

Layer 1: XML Structure Tests (every PR)

  • Validate that python-docx produces the correct OOXML elements
  • For comments: assert word/comments.xml contains w:comment elements with correct attributes (w:id, w:author, w:date, w:initials)
  • Assert word/document.xml contains matching w:commentRangeStart and w:commentRangeEnd markers
  • Assert [Content_Types].xml includes the comments content type
  • Assert word/_rels/document.xml.rels includes the comments relationship
  • Follow existing test patterns in the codebase (unit tests with pytest)
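
A minimal sketch of what a Layer 1 check could look like, using only the standard library. Because the comments API does not exist yet, build_stub_package() writes a small hand-made package as a stand-in for python-docx output; it and check_comment_structure() are hypothetical names. extract_xml_part() mirrors the helper listed under Implementation, though accepting a file-like object as well as a path is an assumption.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def extract_xml_part(docx, part_name):
    """Return the parsed root element of one XML part inside a .docx package."""
    with zipfile.ZipFile(docx) as package:
        return ET.fromstring(package.read(part_name))


def build_stub_package():
    """Hand-written stand-in for python-docx output (hypothetical content)."""
    comments = (
        f'<w:comments xmlns:w="{W_NS}">'
        '<w:comment w:id="0" w:author="Ada" w:date="2024-01-01T00:00:00Z" w:initials="A">'
        "<w:p><w:r><w:t>Check this</w:t></w:r></w:p>"
        "</w:comment></w:comments>"
    )
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:commentRangeStart w:id="0"/>'
        "<w:r><w:t>text</w:t></w:r>"
        '<w:commentRangeEnd w:id="0"/>'
        "</w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/comments.xml", comments)
        package.writestr("word/document.xml", document)
    return buf


def ids_of(root, tag):
    """Collect the w:id values of every element named w:<tag> under root."""
    return {el.get(f"{{{W_NS}}}id") for el in root.iter(f"{{{W_NS}}}{tag}")}


def check_comment_structure(docx):
    """Assert comment attributes exist and range markers match comment ids."""
    comments = extract_xml_part(docx, "word/comments.xml")
    document = extract_xml_part(docx, "word/document.xml")
    for comment in comments.findall("w:comment", NS):
        for attr in ("id", "author", "date", "initials"):
            assert comment.get(f"{{{W_NS}}}{attr}") is not None, attr
    ids = ids_of(comments, "comment")
    assert ids_of(document, "commentRangeStart") == ids
    assert ids_of(document, "commentRangeEnd") == ids
    return ids
```

The same extract_xml_part() call could be pointed at [Content_Types].xml and word/_rels/document.xml.rels for the content type and relationship assertions.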

Layer 2: OOXML Schema Validation (every PR)

  • Validate output XML against the official ECMA-376 OOXML XSD schemas
  • Use lxml.etree.XMLSchema for validation
  • Create a pytest fixture or helper that validates a generated .docx against the schemas
  • The schemas are published by Ecma International alongside the ECMA-376 standard; either commit them to the test fixtures or fetch them during CI setup
  • Catches: missing required elements, invalid attributes, wrong element ordering
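
One way the validation helper might be shaped, as a sketch only. The signature (a mapping from part name to compiled schema) is an assumption, and TOY_XSD is a deliberately tiny stand-in so the example runs without the real ECMA-376 schema files; in practice each schema would be compiled from the downloaded XSDs with lxml.etree.XMLSchema.

```python
import io
import zipfile

from lxml import etree

# Toy schema used only for illustration; NOT part of ECMA-376.
TOY_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="comments">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="comment" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="id" type="xs:integer" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def validate_ooxml(docx, schemas):
    """Validate selected parts of a .docx and return a list of error strings.

    `schemas` maps a part name to a compiled lxml.etree.XMLSchema
    (an assumed signature, not an existing python-docx API).
    """
    errors = []
    with zipfile.ZipFile(docx) as package:
        for part_name, schema in schemas.items():
            root = etree.fromstring(package.read(part_name))
            if not schema.validate(root):
                errors.extend(f"{part_name}: {e.message}" for e in schema.error_log)
    return errors


def make_package(xml):
    """Build an in-memory zip holding one part (hypothetical test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/comments.xml", xml)
    return buf


toy_schema = etree.XMLSchema(etree.fromstring(TOY_XSD))
good = make_package('<comments><comment id="0">ok</comment></comments>')
bad = make_package("<comments><comment>missing id</comment></comments>")
```

Returning the error list rather than raising lets a pytest fixture print every violation at once, which tends to be easier to debug than stopping at the first one.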

Layer 3: Round-Trip Tests (every PR)

  • Write a feature with python-docx → save → re-open with python-docx → assert data reads back correctly
  • For comments:
    • Create document with comments → re-read → assert comment text, author, date match
    • Create threaded comments → re-read → assert reply threading is preserved
    • Create comments on specific text ranges → re-read → assert ranges are correct
    • Modify comments → save → re-read → assert modifications persisted
    • Delete comments → save → re-read → assert comments removed
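
The round-trip pattern above can be sketched with the existing python-docx API (Document(), add_paragraph(), save() to a stream). Paragraph text stands in for comments here because the comments feature is not yet available. The new_document and open_document parameters and the add_sample/check_sample functions are assumptions added for illustration.

```python
import io


def assert_round_trip(create_fn, assert_fn, new_document=None, open_document=None):
    """Create a document, save it to memory, re-open it, and run assertions.

    new_document / open_document default to python-docx's Document; they are
    injectable (an assumed design choice) so the helper can be exercised
    with a stub.
    """
    if new_document is None or open_document is None:
        from docx import Document  # python-docx, imported lazily

        new_document = new_document or Document
        open_document = open_document or Document

    doc = new_document()
    create_fn(doc)

    buf = io.BytesIO()
    doc.save(buf)
    buf.seek(0)

    reopened = open_document(buf)
    assert_fn(reopened)
    return reopened


def add_sample(doc):
    """Write step: add one paragraph (stand-in for adding a comment)."""
    doc.add_paragraph("hello")


def check_sample(doc):
    """Read step: the text written before saving must read back unchanged."""
    assert doc.paragraphs[-1].text == "hello"
```

A pytest test would then reduce to a single call, assert_round_trip(add_sample, check_sample), with one create/assert pair per scenario in the list above.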

Layer 4: Reference File Comparison (every PR)

  • Maintain a set of reference .docx files created in Microsoft Word (one-time manual step)
  • For comments: create a Word doc with various comment scenarios:
    • Simple comment on a word
    • Comment on a paragraph
    • Threaded reply chain (2-3 levels)
    • Comments by multiple authors
    • Comment with formatted text (bold, italic)
  • Test fixtures read these reference files and assert python-docx parses them correctly
  • Optionally compare XML output of python-docx against reference file XML for structural equivalence
  • Store reference files in tests/fixtures/ or tests/ref-docs/
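
For the optional structural comparison, a rough sketch using the standard library is shown below. "Structural equivalence" is interpreted here as: same element names, attributes, and nesting, ignoring namespace prefixes, attribute order, and whitespace-only text. That interpretation, the function names, and the ignore_attrs parameter are all assumptions.

```python
import xml.etree.ElementTree as ET


def canonical(element, ignore_attrs=()):
    """Reduce an element to a nested tuple that is insensitive to attribute
    order, namespace prefix choice, and surrounding whitespace.

    Tail text between elements is ignored entirely.
    """
    attrs = tuple(
        sorted((k, v) for k, v in element.attrib.items() if k not in ignore_attrs)
    )
    text = (element.text or "").strip()
    children = tuple(canonical(child, ignore_attrs) for child in element)
    return (element.tag, attrs, text, children)


def structurally_equal(xml_a, xml_b, ignore_attrs=()):
    """Compare two XML strings (or bytes) for structural equivalence.

    ignore_attrs takes fully qualified names such as "{namespace-uri}name"
    for attributes that are expected to differ between producers.
    """
    return canonical(ET.fromstring(xml_a), ignore_attrs) == canonical(
        ET.fromstring(xml_b), ignore_attrs
    )


# Same structure, different prefix, attribute order, and indentation.
left = '<w:p xmlns:w="urn:example" w:a="1" w:b="2"><w:r/></w:p>'
right = '<x:p xmlns:x="urn:example" x:b="2" x:a="1">\n  <x:r/>\n</x:p>'
```

ElementTree expands prefixes into {uri}name form while parsing, so documents that differ only in prefix spelling compare equal without extra work.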

Layer 5: LibreOffice Headless Validation (CI, optional)

  • Install LibreOffice in CI (sudo apt-get install libreoffice-writer)
  • After generating a .docx, convert to PDF headlessly:
    libreoffice --headless --convert-to pdf output.docx
  • The test fails if LibreOffice rejects the file, errors during conversion, or produces no PDF
  • Rendering to images for pixel-based visual regression testing of critical features is a possible later extension (out of scope for this issue)
  • Add as a separate CI job or test marker so it doesn't slow down the main test suite
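
A sketch of how the headless conversion might be wrapped for a test, under a few assumptions: the function name and its three-way return value are hypothetical, and the binary is assumed to be on PATH as libreoffice (some installations expose it as soffice). The --outdir flag keeps the generated PDF out of the working directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def convert_with_libreoffice(docx_path, binary="libreoffice", timeout=120):
    """Try to convert docx_path to PDF with headless LibreOffice.

    Returns None if the binary is not installed (the caller can skip the
    test), True if a non-empty PDF was produced, and False otherwise.
    """
    executable = shutil.which(binary)
    if executable is None:
        return None

    with tempfile.TemporaryDirectory() as out_dir:
        result = subprocess.run(
            [
                executable,
                "--headless",
                "--convert-to",
                "pdf",
                "--outdir",
                out_dir,
                str(docx_path),
            ],
            capture_output=True,
            timeout=timeout,
        )
        pdf = Path(out_dir) / (Path(docx_path).stem + ".pdf")
        # Check the output file as well as the exit status, since a zero
        # exit code alone may not guarantee that a PDF was written.
        return result.returncode == 0 and pdf.is_file() and pdf.stat().st_size > 0
```

In a pytest test, a None result could map to pytest.skip() and a False result to a failure, with the test carrying a custom marker so the main suite can deselect it.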

Implementation

New test helpers (tests/helpers/)

  • validate_ooxml(docx_path) — unzips and validates all XML parts against OOXML schemas
  • extract_xml_part(docx_path, part_name) — extracts a specific XML part for assertion
  • assert_round_trip(create_fn, assert_fn) — creates a doc, saves, re-opens, runs assertions

CI changes

  • Confirm lxml is available in the test environment for schema validation (python-docx already depends on it at runtime, so no new package should be needed)
  • Optionally add a LibreOffice validation job in GitHub Actions

Reference file creation (one-time manual)

  • Create reference .docx files in Microsoft Word for each feature area
  • Document what each reference file contains in a tests/ref-docs/README.md
  • Commit to the repo as test fixtures

Acceptance Criteria

  • XML structure test helpers created and documented
  • OOXML schema validation helper created (with schema files or download script)
  • Round-trip test pattern established with at least one example test
  • Reference file comparison pattern established
  • LibreOffice headless validation working in CI (can be a separate optional job)
  • All test helpers are usable for the upcoming comments feature

Dependencies

None — this should be implemented before or alongside the comments feature.

Out of Scope

  • The comments feature itself (separate issue)
  • Visual regression testing with screenshot comparison (future enhancement)
  • Testing with actual Microsoft Word (manual only, not in CI)
