Page break insert and delete API #20

@loadfix

Feature Description

python-docx currently supports adding a page break only by appending a w:br element to a run (run.add_break(WD_BREAK.PAGE)). This is low-level, requires the caller to manage run/paragraph structure manually, and provides no way to insert a page break at an arbitrary position within the document or to delete an existing one.

This issue adds a clean high-level API for inserting and deleting page breaks anywhere in the document, consistent with the existing Document.add_paragraph() / Document.add_section() patterns.

Acceptance Criteria

Insertion

  • paragraph.add_page_break() inserts a page break at the end of the given paragraph (appends a run containing <w:br w:type="page"/>)
  • document.add_page_break() inserts a new paragraph containing a page break at the end of the document body and returns the paragraph
  • Both methods return the Paragraph object containing the break so the caller can chain further operations
  • Inserted page breaks round-trip correctly — save and reload produces the same structure
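
The XML structure these criteria describe can be sketched without python-docx at all. The example below is only an illustration using the standard library's xml.etree.ElementTree; the helper names w() and append_page_break() are hypothetical and not part of any library, and serialising then re-parsing the element merely stands in for a real save/reload of a .docx file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return `tag` in Clark notation for the WordprocessingML namespace."""
    return "{%s}%s" % (W_NS, tag)


def append_page_break(p: ET.Element) -> ET.Element:
    """Append <w:r><w:br w:type="page"/></w:r> to paragraph `p`; return `p`."""
    r = ET.SubElement(p, w("r"))
    ET.SubElement(r, w("br"), {w("type"): "page"})
    return p


# Starting from an empty paragraph, the result is a single run holding the break.
p = append_page_break(ET.Element(w("p")))

# Serialising and re-parsing stands in for a save/reload round trip.
reloaded = ET.fromstring(ET.tostring(p))
breaks = reloaded.findall("./%s/%s" % (w("r"), w("br")))
```

Returning the paragraph from the helper mirrors the chaining requirement above; in python-docx itself the equivalent structure is what run.add_break(WD_BREAK.PAGE) is expected to produce.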

Deletion

  • paragraph.clear_page_breaks() removes all <w:br w:type="page"/> elements from the paragraph's runs
  • A utility paragraph.has_page_break (bool property) allows callers to detect whether a paragraph contains a page break before deciding to delete it
  • If a run contains only a page break (no text), the run is removed entirely; if it contains other content, only the <w:br> element is removed
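
The removal rule can be illustrated with a standard-library sketch. This is an assumption-laden model, not the python-docx implementation: the function below is a hypothetical free function operating on a bare ElementTree paragraph, it only looks at runs that are direct children of the paragraph, and it treats a run as empty when nothing but run properties (<w:rPr>) is left after the break is removed.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
W_TYPE = "{%s}type" % W_NS
W_RPR = "{%s}rPr" % W_NS


def clear_page_breaks(p: ET.Element) -> None:
    """Remove page-break <w:br> elements from the direct runs of `p`."""
    for r in p.findall("w:r", NS):
        page_breaks = [br for br in r.findall("w:br", NS) if br.get(W_TYPE) == "page"]
        for br in page_breaks:
            r.remove(br)
        # Drop the run only if this call emptied it: nothing but optional
        # run properties (<w:rPr>) is left.
        if page_breaks and all(child.tag == W_RPR for child in r):
            p.remove(r)


XML = (
    '<w:p xmlns:w="%s">'
    '<w:r><w:rPr><w:b/></w:rPr><w:br w:type="page"/></w:r>'  # break only: run removed
    '<w:r><w:t>kept</w:t><w:br w:type="page"/></w:r>'        # text survives
    '<w:r><w:br/></w:r>'                                      # plain line break: untouched
    "</w:p>" % W_NS
)
p = ET.fromstring(XML)
clear_page_breaks(p)
clear_page_breaks(p)  # second call finds nothing to remove and is a no-op
remaining = p.findall("w:r", NS)
```

Counting <w:rPr> as "no content" matters because a formatted run that held only a page break would otherwise be left behind as an empty, styled run.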

Edge cases

  • Inserting a page break into an empty paragraph works correctly
  • Deleting a page break from a paragraph that has none is a no-op (no exception raised)
  • Page breaks within table cells are handled without error

Suggested Implementation

docx/text/paragraph.py — add to Paragraph:

from docx.enum.text import WD_BREAK
from docx.oxml.ns import qn

@property
def has_page_break(self) -> bool:
    """True if this paragraph contains at least one page break."""
    return bool(self._p.xpath('.//w:br[@w:type="page"]'))

def add_page_break(self) -> "Paragraph":
    """Append a run containing a page break to this paragraph. Returns self."""
    run = self.add_run()
    run.add_break(WD_BREAK.PAGE)
    return self

def clear_page_breaks(self) -> None:
    """Remove all page break elements from this paragraph."""
    for br in self._p.xpath('.//w:br[@w:type="page"]'):
        r = br.getparent()
        r.remove(br)
        # Drop the run if only run properties (w:rPr) remain. A run's content
        # lives in child elements, so checking r.text or len(r) == 0 would
        # leave behind formatted runs that held nothing but the break.
        if all(child.tag == qn("w:rPr") for child in r):
            r.getparent().remove(r)

docx/document.py — add to Document:

def add_page_break(self) -> Paragraph:
    """Add a page break paragraph at the end of the document. Returns the paragraph."""
    paragraph = self.add_paragraph()
    paragraph.add_page_break()
    return paragraph

Tests — add to tests/unit/text/test_paragraph.py and tests/unit/test_document.py:

  • Test insertion appends correct XML structure
  • Test has_page_break returns True/False correctly
  • Test clear_page_breaks removes only <w:br w:type="page"/> elements, leaves text runs and other break types intact
  • Test round-trip (save → reload → check)

Dependencies

None.

Out of Scope

  • Column breaks (WD_BREAK.COLUMN) — separate issue
  • Text wrapping breaks (WD_BREAK.TEXT_WRAPPING) — separate issue
  • Inserting a page break before/after a specific paragraph by index — can be addressed in a follow-up
