feat: Page break insert and delete API by citconv-agents[bot] · Pull Request #42 · loadfix/python-docx

citconv-agents · 2026-04-03T08:56:23Z

Summary

Implements #20

This PR was automatically generated by the Developer Agent.

Original Issue

Feature Description

python-docx currently supports adding a page break only by appending a w:br element to a run (run.add_break(WD_BREAK.PAGE)). This is low-level, requires the caller to manage run/paragraph structure manually, and provides no way to insert a page break at an arbitrary position within the document or to delete an existing one.

This issue adds a clean high-level API for inserting and deleting page breaks anywhere in the document, consistent with the existing Document.add_paragraph() / Document.add_section() patterns.

Acceptance Criteria

Insertion

paragraph.add_page_break() inserts a page break at the end of the given paragraph (appends a run containing <w:br w:type="page"/>)
document.add_page_break() inserts a new paragraph containing a page break at the end of the document body and returns the paragraph
Both methods return the Paragraph object containing the break so the caller can chain further operations
Inserted page breaks round-trip correctly — save and reload produces the same structure

Deletion

paragraph.clear_page_breaks() removes all <w:br w:type="page"/> runs from the paragraph
A utility paragraph.has_page_break (bool property) allows callers to detect whether a paragraph contains a page break before deciding to delete it
If a run contains only a page break (no text), the run is removed entirely; if it contains other content, only the <w:br> element is removed

Edge cases

Inserting a page break into an empty paragraph works correctly
Deleting a page break from a paragraph that has none is a no-op (no exception raised)
Page breaks within table cells are handled without error

Suggested Implementation

docx/text/paragraph.py — add to Paragraph:

@property
def has_page_break(self) -> bool:
    """True if this paragraph contains at least one page break."""
    return bool(self._p.findall('.//{%s}br[@{%s}type="page"]' % (nsmap['w'], nsmap['w'])))

def add_page_break(self) -> "Paragraph":
    """Append a page break run to this paragraph. Returns self."""
    run = self.add_run()
    run.add_break(WD_BREAK.PAGE)
    return self

def clear_page_breaks(self) -> None:
    """Remove all page break elements from this paragraph."""
    for br in self._p.findall('.//{%s}br[@{%s}type="page"]' % (nsmap['w'], nsmap['w'])):
        r = br.getparent()
        r.remove(br)
        if len(r) == 0 and r.text is None:  # run is now empty
            r.getparent().remove(r)

docx/document.py — add to Document:

def add_page_break(self) -> Paragraph:
    """Add a page break paragraph at the end of the document. Returns the paragraph."""
    paragraph = self.add_paragraph()
    paragraph.add_page_break()
    return paragraph

Tests — add to tests/unit/text/test_paragraph.py and tests/unit/test_document.py:

Test insertion appends correct XML structure
Test has_page_break returns True/False correctly
Test clear_page_breaks removes only <w:br type="page"> elements, leaves text runs intact
Test round-trip (save → reload → check)

Dependencies

None.

Out of Scope

Column breaks (WD_BREAK.COLUMN) — separate issue
Text wrapping breaks (WD_BREAK.TEXT_WRAPPING) — separate issue
Inserting a page break before/after a specific paragraph by index — can be addressed in a follow-up

Generated by Developer Agent using Claude Code

Add `Paragraph.add_page_break()`, `Paragraph.has_page_break`, and `Paragraph.clear_page_breaks()` methods for high-level page break management. Update `Document.add_page_break()` to delegate to the new paragraph method. Closes #20 Co-Authored-By: Claude Opus 4.6 <[email protected]>

citconv-agents · 2026-04-03T08:57:45Z

Security Agent Report

SECURITY_PASS

Security Review: PR #42 — Page Break Insert/Delete API

Branch: agent/issue-20
Base: origin/master
Date: 2026-04-03

Summary

No security issues found. All changes are internal refactoring and new paragraph-level page break methods with no exposure to external input.

Files Reviewed

File	Change Type
`src/docx/document.py`	Refactor: delegates `add_page_break()` to `Paragraph.add_page_break()`
`src/docx/text/paragraph.py`	New methods: `add_page_break()`, `clear_page_breaks()`, `has_page_break`
`tests/test_document.py`	Test update (no production code)
`tests/text/test_paragraph.py`	New tests (no production code)

Checks

Injection Risks (XPath / XXE / XML injection)

The two new XPath queries in src/docx/text/paragraph.py are:

self._p.xpath('.//w:br[@w:type="page"]')  # lines 83, 98

Hardcoded strings — no user-controlled input is interpolated.
self._p is a BaseOxmlElement whose .xpath() override (xmlchemy.py:692) passes the project-wide nsmap, so the w: prefix is safely bound.
No external XML is parsed in these changes; no XXE surface is introduced.

Path Traversal / File Handling

No file paths or file I/O in the changed code.

New Dependencies

No new dependencies added (requirements*.txt, pyproject.toml, setup.cfg unchanged in diff).

Secrets / Credentials

No API keys, tokens, passwords, or other secrets present.

Data Exposure

No sensitive data is read, logged, or returned. The methods operate entirely on the in-memory lxml element tree.

Notes (Non-Security)

One pre-existing pattern (outside this PR's scope) in src/docx/oxml/comments.py:65 interpolates an integer comment_id directly into an XPath string:

comment_elms = self.xpath(f"(./w:comment[@w:id='{comment_id}'])[1]")

This is not part of PR #42 but may warrant a follow-up review: if comment_id ever originates from untrusted input, XPath injection is theoretically possible. Because comment_id is typed as int and all call-sites appear to source it from internal document state, the practical risk is currently low.

Result: PASS — no security issues in this PR.

citconv-agents · 2026-04-03T12:14:54Z

Security Agent Report

SECURITY_PASS

Security Review — PR #42

Summary

PR #42 adds Paragraph.add_page_break(), Paragraph.clear_page_breaks(), and Paragraph.has_page_break, and refactors Document.add_page_break() to delegate to the new paragraph method.

Files Reviewed

src/docx/document.py
src/docx/text/paragraph.py
tests/test_document.py
tests/text/test_paragraph.py

Findings

No issues found.

XPath expressions (paragraph.py:86, paragraph.py:100)
Both xpath('.//w:br[@w:type="page"]') calls use fully hardcoded expressions with no user-controlled input — no XPath injection risk.

XML manipulation (paragraph.py:87-91)
r.remove(br) and r.getparent().remove(r) operate on already-parsed lxml element objects from the document's own tree. No raw XML strings are constructed or parsed here — no XML injection or XXE risk.

No new dependencies introduced.

No file I/O or path handling added.

No secrets, credentials, or sensitive data present in any changed file.

Conclusion

All changes are safe, well-scoped DOM manipulations with hardcoded XPath expressions. No security concerns identified.

citconv-agents · 2026-04-03T12:19:26Z

Review Agent

REVIEW_APPROVED

Summary

Clean, correct refactoring that moves page-break logic to the right abstraction level (Paragraph) and adds two genuinely useful new methods.

What's good

Document.add_page_break() delegation is the right call. The page-break concern belongs on Paragraph, not Document. The refactor eliminates the WD_BREAK import from document.py.
Return type annotation added to Document.add_page_break() -> Paragraph — a small but worthwhile improvement that was missing before.
clear_page_breaks correctly uses a pre-materialized xpath() list (not a lazy iterator), so mutating the tree during iteration is safe. The logic to remove the entire run when it becomes empty (len(r) == 0 and not r.text) is correct and handles all the tested edge cases including: no-op, run-only-break, run-with-text, multiple page breaks, and non-page line breaks preserved.
has_page_break vs contains_page_break: These are semantically distinct (has_page_break looks for explicit <w:br w:type="page"/> elements; contains_page_break looks for rendered page breaks via lastRenderedPageBreaks). The naming is clear and the distinction is real.
Namespace handling in xpath: self._p.xpath(...) calls BaseOxmlElement.xpath() which injects namespaces=nsmap (xmlchemy.py:692), so @w:type="page" is correctly resolved. Consistent with other xpath calls in the codebase (e.g. comments.py: [@w:id=...]).
Test coverage is thorough: parametrized tests cover the no-op, single-break, mixed-content, multi-break, and preserved-line-break cases for clear_page_breaks, and correct cases for has_page_break.
Updated document test correctly drops the run_ mock and asserts on paragraph_.add_page_break() instead.

Minor observations (not blocking)

clear_page_breaks does not remove a run that contains only <w:rPr> after the page break is stripped (since len(r) == 1 in that case). This is a defensible design choice — preserving run properties avoids silent style loss — but it's an untested edge case worth noting. If this scenario can arise in real documents, a test and docstring note would be worthwhile.

citconv-agents bot added the agent-pr PR created by an agent label Apr 3, 2026

citconv-agents bot mentioned this pull request Apr 3, 2026

Page break insert and delete API #20

Closed

trigger security agent

3577d3a

citconv-agents bot added the security-passed Security agent passed label Apr 3, 2026

citconv-agents bot added the review-approved Review agent approved label Apr 3, 2026

citconv-agents bot merged commit 50e2dc2 into master Apr 3, 2026
7 of 8 checks passed

citconv-agents bot deleted the agent/issue-20 branch April 3, 2026 12:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Page break insert and delete API#42

feat: Page break insert and delete API#42
citconv-agents[bot] merged 2 commits intomasterfrom
agent/issue-20

citconv-agents bot commented Apr 3, 2026

Uh oh!

citconv-agents bot commented Apr 3, 2026

Uh oh!

citconv-agents bot commented Apr 3, 2026

Uh oh!

citconv-agents bot commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

citconv-agents bot commented Apr 3, 2026

Summary

Original Issue

Feature Description

Acceptance Criteria

Insertion

Deletion

Edge cases

Suggested Implementation

Dependencies

Out of Scope

Uh oh!

citconv-agents bot commented Apr 3, 2026

Security Review: PR #42 — Page Break Insert/Delete API

Summary

Files Reviewed

Checks

Injection Risks (XPath / XXE / XML injection)

Path Traversal / File Handling

New Dependencies

Secrets / Credentials

Data Exposure

Notes (Non-Security)

Uh oh!

citconv-agents bot commented Apr 3, 2026

Security Review — PR #42

Summary

Files Reviewed

Findings

No issues found.

Conclusion

Uh oh!

citconv-agents bot commented Apr 3, 2026

Summary

What's good

Minor observations (not blocking)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant