forked from python-openxml/python-docx
Page break insert and delete API #20
Closed
Labels
agent: Triggers the developer agent
Description
Feature Description
python-docx currently supports adding a page break only by appending a w:br element to a run (run.add_break(WD_BREAK.PAGE)). This is low-level, requires the caller to manage run/paragraph structure manually, and provides no way to insert a page break at an arbitrary position within the document or to delete an existing one.
This issue adds a clean high-level API for inserting and deleting page breaks anywhere in the document, consistent with the existing Document.add_paragraph() / Document.add_section() patterns.
Acceptance Criteria
Insertion
- paragraph.add_page_break() inserts a page break at the end of the given paragraph (appends a run containing <w:br w:type="page"/>)
- document.add_page_break() inserts a new paragraph containing a page break at the end of the document body and returns the paragraph
- Both methods return the Paragraph object containing the break so the caller can chain further operations
- Inserted page breaks round-trip correctly: saving and reloading produces the same structure
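The XML structure the insertion criteria describe can be sketched without python-docx at all. The following is a minimal stand-alone model using only the standard library; the helper names `w` and `append_page_break` are hypothetical and exist only for this illustration, not in python-docx.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, the one python-docx maps to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return the Clark-notation name for a w:-prefixed tag or attribute."""
    return f"{{{W_NS}}}{tag}"


def append_page_break(p: ET.Element) -> ET.Element:
    """Append <w:r><w:br w:type="page"/></w:r> to paragraph p and return p."""
    run = ET.SubElement(p, w("r"))
    ET.SubElement(run, w("br"), {w("type"): "page"})
    return p  # returning the paragraph allows chaining, as the criteria ask


# A paragraph with one text run, followed by the appended page-break run.
paragraph = ET.Element(w("p"))
text_run = ET.SubElement(paragraph, w("r"))
ET.SubElement(text_run, w("t")).text = "End of chapter one."
append_page_break(paragraph)

print(ET.tostring(paragraph, encoding="unicode"))
```

The break lands in its own run after the existing text run, so existing content is left untouched.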
Deletion
- paragraph.clear_page_breaks() removes every <w:br w:type="page"/> element from the paragraph
- A utility paragraph.has_page_break (bool property) allows callers to detect whether a paragraph contains a page break before deciding to delete it
- If a run contains only a page break (no text), the run is removed entirely; if it contains other content, only the <w:br> element is removed
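The deletion rule above (drop the whole run when the break was its only content, otherwise drop just the break) can be illustrated with a small standard-library model. This is a sketch under assumptions, not python-docx code: the helpers `w`, `is_page_break`, `has_page_break` and `clear_page_breaks` are hypothetical stand-ins, a run is treated as empty when only `w:rPr` formatting remains, and only runs that are direct children of the paragraph are considered.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag: str) -> str:
    """Return the Clark-notation name for a w:-prefixed tag or attribute."""
    return f"{{{W_NS}}}{tag}"


def is_page_break(el: ET.Element) -> bool:
    return el.tag == w("br") and el.get(w("type")) == "page"


def has_page_break(p: ET.Element) -> bool:
    """True if any direct-child run of p holds a page break."""
    return any(is_page_break(c) for run in p.findall(w("r")) for c in run)


def clear_page_breaks(p: ET.Element) -> None:
    """Remove page breaks; drop runs that held nothing else but formatting."""
    for run in list(p.findall(w("r"))):
        breaks = [c for c in run if is_page_break(c)]
        if not breaks:
            continue  # runs without a page break are left untouched
        for br in breaks:
            run.remove(br)
        # Only run properties (or nothing) left: the run carried just the break.
        if all(c.tag == w("rPr") for c in run):
            p.remove(run)


SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:rPr><w:b/></w:rPr><w:br w:type="page"/></w:r>'  # break only: run dropped
    '<w:r><w:t>Intro</w:t><w:br w:type="page"/></w:r>'  # text + break: run kept
    '<w:r><w:br/></w:r>'  # plain line break: not a page break, untouched
    '</w:p>'
)

paragraph = ET.fromstring(SAMPLE)
clear_page_breaks(paragraph)
print(len(paragraph.findall(w("r"))))  # two runs remain
```

Calling `clear_page_breaks` a second time finds no page breaks and changes nothing.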
Edge cases
- Inserting a page break into an empty paragraph works correctly
- Deleting a page break from a paragraph that has none is a no-op (no exception raised)
- Page breaks within table cells are handled without error
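The three edge cases can be exercised against the same kind of stand-alone model. As before, this is only a sketch: `add_page_break`, `clear_page_breaks` and `page_breaks` here are hypothetical functions over plain `xml.etree` elements, not the proposed python-docx methods, and the table is a bare `w:tbl/w:tr/w:tc/w:p` skeleton rather than a complete Word table.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix for w: names


def page_breaks(run: ET.Element) -> list:
    """Return the page-break children of a run."""
    return [c for c in run if c.tag == W + "br" and c.get(W + "type") == "page"]


def add_page_break(p: ET.Element) -> ET.Element:
    run = ET.SubElement(p, W + "r")
    ET.SubElement(run, W + "br", {W + "type": "page"})
    return p


def clear_page_breaks(p: ET.Element) -> None:
    for run in list(p.findall(W + "r")):
        found = page_breaks(run)
        for br in found:
            run.remove(br)
        # Drop the run only if it held a break and nothing but w:rPr is left.
        if found and all(c.tag == W + "rPr" for c in run):
            p.remove(run)


# Edge case 1: an empty paragraph gains exactly one run holding the break.
empty = add_page_break(ET.Element(W + "p"))

# Edge case 2: clearing a paragraph without page breaks changes nothing.
plain = ET.Element(W + "p")
ET.SubElement(ET.SubElement(plain, W + "r"), W + "t").text = "No breaks here"
before = ET.tostring(plain)
clear_page_breaks(plain)
unchanged = ET.tostring(plain) == before

# Edge case 3: a paragraph nested inside a table cell behaves the same way.
table = ET.Element(W + "tbl")
cell = ET.SubElement(ET.SubElement(table, W + "tr"), W + "tc")
cell_paragraph = add_page_break(ET.SubElement(cell, W + "p"))
clear_page_breaks(cell_paragraph)
```

In the table case the paragraph itself stays in the cell after its only run is removed; only the run is dropped.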
Suggested Implementation
In docx/text/paragraph.py, add to Paragraph:
```python
# Assumes these module-level imports:
#   from docx.enum.text import WD_BREAK
#   from docx.oxml.ns import nsmap, qn

@property
def has_page_break(self) -> bool:
    """True if this paragraph contains at least one page break."""
    return bool(self._p.findall('.//{%s}br[@{%s}type="page"]' % (nsmap['w'], nsmap['w'])))

def add_page_break(self) -> "Paragraph":
    """Append a page break run to this paragraph. Returns self."""
    run = self.add_run()
    run.add_break(WD_BREAK.PAGE)
    return self

def clear_page_breaks(self) -> None:
    """Remove all page break elements from this paragraph."""
    for br in self._p.findall('.//{%s}br[@{%s}type="page"]' % (nsmap['w'], nsmap['w'])):
        r = br.getparent()
        r.remove(br)
        # Drop the run if nothing but run properties (w:rPr) remains.
        # `len(r) == 0` would miss a formatted run, and `r.text` on a
        # python-docx run element is a str, so it is never None.
        if all(child.tag == qn('w:rPr') for child in r):
            r.getparent().remove(r)
```
In docx/document.py, add to Document:
```python
def add_page_break(self) -> Paragraph:
    """Add a page break paragraph at the end of the document. Returns the paragraph."""
    paragraph = self.add_paragraph()
    paragraph.add_page_break()
    return paragraph
```
Tests: add to tests/unit/text/test_paragraph.py and tests/unit/test_document.py:
- Test insertion appends correct XML structure
- Test has_page_break returns True/False correctly
- Test clear_page_breaks removes only <w:br w:type="page"/> elements, leaves text runs intact
- Test round-trip (save → reload → check)
Dependencies
None.
Out of Scope
- Column breaks (WD_BREAK.COLUMN): separate issue
- Text wrapping breaks (WD_BREAK.TEXT_WRAPPING): separate issue
- Inserting a page break before/after a specific paragraph by index: can be addressed in a follow-up