Skip to content

feat: Page break insert and delete API#42

Merged
citconv-agents[bot] merged 2 commits intomasterfrom
agent/issue-20
Apr 3, 2026
Merged

feat: Page break insert and delete API#42
citconv-agents[bot] merged 2 commits intomasterfrom
agent/issue-20

Conversation

@citconv-agents
Copy link
Copy Markdown

@citconv-agents citconv-agents bot commented Apr 3, 2026

Summary

Implements #20

This PR was automatically generated by the Developer Agent.

Original Issue

Feature Description

python-docx currently supports adding a page break only by appending a w:br element to a run (run.add_break(WD_BREAK.PAGE)). This is low-level, requires the caller to manage run/paragraph structure manually, and provides no way to insert a page break at an arbitrary position within the document or to delete an existing one.

This issue adds a clean high-level API for inserting and deleting page breaks anywhere in the document, consistent with the existing Document.add_paragraph() / Document.add_section() patterns.

Acceptance Criteria

Insertion

  • paragraph.add_page_break() inserts a page break at the end of the given paragraph (appends a run containing <w:br w:type="page"/>)
  • document.add_page_break() inserts a new paragraph containing a page break at the end of the document body and returns the paragraph
  • Both methods return the Paragraph object containing the break so the caller can chain further operations
  • Inserted page breaks round-trip correctly — save and reload produces the same structure

Deletion

  • paragraph.clear_page_breaks() removes all <w:br w:type="page"/> runs from the paragraph
  • A utility paragraph.has_page_break (bool property) allows callers to detect whether a paragraph contains a page break before deciding to delete it
  • If a run contains only a page break (no text), the run is removed entirely; if it contains other content, only the <w:br> element is removed

Edge cases

  • Inserting a page break into an empty paragraph works correctly
  • Deleting a page break from a paragraph that has none is a no-op (no exception raised)
  • Page breaks within table cells are handled without error

Suggested Implementation

docx/text/paragraph.py — add to Paragraph:

@property
def has_page_break(self) -> bool:
    """True if this paragraph contains at least one page break."""
    return bool(self._p.findall('.//{%s}br[@{%s}type="page"]' % (nsmap['w'], nsmap['w'])))

def add_page_break(self) -> "Paragraph":
    """Append a page break run to this paragraph. Returns self."""
    run = self.add_run()
    run.add_break(WD_BREAK.PAGE)
    return self

def clear_page_breaks(self) -> None:
    """Remove all page break elements from this paragraph."""
    for br in self._p.findall('.//{%s}br[@{%s}type="page"]' % (nsmap['w'], nsmap['w'])):
        r = br.getparent()
        r.remove(br)
        if len(r) == 0 and r.text is None:  # run is now empty
            r.getparent().remove(r)

docx/document.py — add to Document:

def add_page_break(self) -> Paragraph:
    """Add a page break paragraph at the end of the document. Returns the paragraph."""
    paragraph = self.add_paragraph()
    paragraph.add_page_break()
    return paragraph

Tests — add to tests/unit/text/test_paragraph.py and tests/unit/test_document.py:

  • Test insertion appends correct XML structure
  • Test has_page_break returns True/False correctly
  • Test clear_page_breaks removes only <w:br type="page"> elements, leaves text runs intact
  • Test round-trip (save → reload → check)

Dependencies

None.

Out of Scope

  • Column breaks (WD_BREAK.COLUMN) — separate issue
  • Text wrapping breaks (WD_BREAK.TEXT_WRAPPING) — separate issue
  • Inserting a page break before/after a specific paragraph by index — can be addressed in a follow-up

Generated by Developer Agent using Claude Code

Add `Paragraph.add_page_break()`, `Paragraph.has_page_break`, and
`Paragraph.clear_page_breaks()` methods for high-level page break
management. Update `Document.add_page_break()` to delegate to the
new paragraph method.

Closes #20

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@citconv-agents citconv-agents bot added the agent-pr PR created by an agent label Apr 3, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 3, 2026

Security Agent Report

SECURITY_PASS

Security Review: PR #42 — Page Break Insert/Delete API

Branch: agent/issue-20
Base: origin/master
Date: 2026-04-03

Summary

No security issues found. All changes are internal refactoring and new paragraph-level page break methods with no exposure to external input.


Files Reviewed

File Change Type
src/docx/document.py Refactor: delegates add_page_break() to Paragraph.add_page_break()
src/docx/text/paragraph.py New methods: add_page_break(), clear_page_breaks(), has_page_break
tests/test_document.py Test update (no production code)
tests/text/test_paragraph.py New tests (no production code)

Checks

Injection Risks (XPath / XXE / XML injection)

The two new XPath queries in src/docx/text/paragraph.py are:

self._p.xpath('.//w:br[@w:type="page"]')  # lines 83, 98
  • Hardcoded strings — no user-controlled input is interpolated.
  • self._p is a BaseOxmlElement whose .xpath() override (xmlchemy.py:692) passes the project-wide nsmap, so the w: prefix is safely bound.
  • No external XML is parsed in these changes; no XXE surface is introduced.

Path Traversal / File Handling

No file paths or file I/O in the changed code.

New Dependencies

No new dependencies added (requirements*.txt, pyproject.toml, setup.cfg unchanged in diff).

Secrets / Credentials

No API keys, tokens, passwords, or other secrets present.

Data Exposure

No sensitive data is read, logged, or returned. The methods operate entirely on the in-memory lxml element tree.


Notes (Non-Security)

One pre-existing pattern (outside this PR's scope) in src/docx/oxml/comments.py:65 interpolates an integer comment_id directly into an XPath string:

comment_elms = self.xpath(f"(./w:comment[@w:id='{comment_id}'])[1]")

This is not part of PR #42 but may warrant a follow-up review: if comment_id ever originates from untrusted input, XPath injection is theoretically possible. Because comment_id is typed as int and all call-sites appear to source it from internal document state, the practical risk is currently low.


Result: PASS — no security issues in this PR.

@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 3, 2026

Security Agent Report

SECURITY_PASS

Security Review — PR #42

Summary

PR #42 adds Paragraph.add_page_break(), Paragraph.clear_page_breaks(), and Paragraph.has_page_break, and refactors Document.add_page_break() to delegate to the new paragraph method.

Files Reviewed

  • src/docx/document.py
  • src/docx/text/paragraph.py
  • tests/test_document.py
  • tests/text/test_paragraph.py

Findings

No issues found.

XPath expressions (paragraph.py:86, paragraph.py:100)
Both xpath('.//w:br[@w:type="page"]') calls use fully hardcoded expressions with no user-controlled input — no XPath injection risk.

XML manipulation (paragraph.py:87-91)
r.remove(br) and r.getparent().remove(r) operate on already-parsed lxml element objects from the document's own tree. No raw XML strings are constructed or parsed here — no XML injection or XXE risk.

No new dependencies introduced.

No file I/O or path handling added.

No secrets, credentials, or sensitive data present in any changed file.

Conclusion

All changes are safe, well-scoped DOM manipulations with hardcoded XPath expressions. No security concerns identified.

@citconv-agents citconv-agents bot added the security-passed Security agent passed label Apr 3, 2026
@citconv-agents
Copy link
Copy Markdown
Author

citconv-agents bot commented Apr 3, 2026

Review Agent

REVIEW_APPROVED

Summary

Clean, correct refactoring that moves page-break logic to the right abstraction level (Paragraph) and adds two genuinely useful new methods.

What's good

  • Document.add_page_break() delegation is the right call. The page-break concern belongs on Paragraph, not Document. The refactor eliminates the WD_BREAK import from document.py.
  • Return type annotation added to Document.add_page_break() -> Paragraph — a small but worthwhile improvement that was missing before.
  • clear_page_breaks correctly uses a pre-materialized xpath() list (not a lazy iterator), so mutating the tree during iteration is safe. The logic to remove the entire run when it becomes empty (len(r) == 0 and not r.text) is correct and handles all the tested edge cases including: no-op, run-only-break, run-with-text, multiple page breaks, and non-page line breaks preserved.
  • has_page_break vs contains_page_break: These are semantically distinct (has_page_break looks for explicit <w:br w:type="page"/> elements; contains_page_break looks for rendered page breaks via lastRenderedPageBreaks). The naming is clear and the distinction is real.
  • Namespace handling in xpath: self._p.xpath(...) calls BaseOxmlElement.xpath() which injects namespaces=nsmap (xmlchemy.py:692), so @w:type="page" is correctly resolved. Consistent with other xpath calls in the codebase (e.g. comments.py: [@w:id=...]).
  • Test coverage is thorough: parametrized tests cover the no-op, single-break, mixed-content, multi-break, and preserved-line-break cases for clear_page_breaks, and correct cases for has_page_break.
  • Updated document test correctly drops the run_ mock and asserts on paragraph_.add_page_break() instead.

Minor observations (not blocking)

  • clear_page_breaks does not remove a run that contains only <w:rPr> after the page break is stripped (since len(r) == 1 in that case). This is a defensible design choice — preserving run properties avoids silent style loss — but it's an untested edge case worth noting. If this scenario can arise in real documents, a test and docstring note would be worthwhile.

@citconv-agents citconv-agents bot added the review-approved Review agent approved label Apr 3, 2026
@citconv-agents citconv-agents bot merged commit 50e2dc2 into master Apr 3, 2026
7 of 8 checks passed
@citconv-agents citconv-agents bot deleted the agent/issue-20 branch April 3, 2026 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-pr PR created by an agent review-approved Review agent approved security-passed Security agent passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant