v0.2.0: Interned tags, batch APIs, honest benchmarks #1
Merged
cigrainger merged 7 commits into main on Mar 27, 2026
Performance:
- Intern unique tag names as Python strings at parse time (~20-200 unique names). Element.tag is now a refcount bump, not a string copy.
- Eagerly cache tag on Element creation (zero FFI on .tag access).
- New batch APIs: child_tags(), descendant_tags(). Each makes a single FFI call for all results using interned strings. 25x faster than lxml on large-document traversal.
- Eliminate double-borrow in make_element when callers already hold a Document reference.

Benchmarks:
- GC disabled, 3 warmup + 20 timed iterations, median reported.
- Three corpus types: catalog (data), PubMed (document), POM (config).
- XPath benchmarks compare elements-to-elements (fair).
- Fixed POM namespace artifact that made lxml appear faster (it was returning 0 results due to xmlns mismatch).
- Traversal section honestly shows per-element FFI overhead alongside the batch API that eliminates it.
- Removed POM xmlns from benchmark corpus for fair cross-library comparison.

README:
- Updated all benchmark tables with v0.2.0 numbers.
- Documents batch APIs in API section.
- Notes that parse() includes index construction.
- Honest framing of traversal trade-offs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
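The benchmark method described above (GC disabled, 3 warmup + 20 timed iterations, median reported) and the xmlns artifact can be illustrated together. This is a minimal sketch using only the standard library's xml.etree.ElementTree, not the project's actual benchmark script; the `bench` helper and the sample POM fragment are hypothetical. It shows why a timing harness should also check result counts: an unqualified query against a namespaced document matches nothing.

```python
import gc
import statistics
import time
import xml.etree.ElementTree as ET

# Hypothetical POM-like fragment with a default namespace (illustration only).
POM = b"""<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency><artifactId>a</artifactId></dependency>
    <dependency><artifactId>b</artifactId></dependency>
  </dependencies>
</project>"""

NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def bench(fn, warmup=3, iterations=20):
    """Return (median_seconds, last_result) for fn, timed with GC disabled."""
    for _ in range(warmup):
        fn()
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        samples = []
        result = None
        for _ in range(iterations):
            start = time.perf_counter()
            result = fn()
            samples.append(time.perf_counter() - start)
    finally:
        # Restore the collector only if it was on before we started.
        if gc_was_enabled:
            gc.enable()
    return statistics.median(samples), result


root = ET.fromstring(POM)

# Unqualified query: every element is namespaced, so this matches nothing.
t_bad, bad = bench(lambda: root.findall(".//dependency"))
# Namespace-aware query: matches both <dependency> elements.
t_ok, ok = bench(lambda: root.findall(".//m:dependency", NS))

print(len(bad), len(ok))
```

A query that returns an empty list does less work than one that builds results, so comparing timings without comparing `len(result)` can make the wrong side look faster.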
Review fixes:
- build_meta returns unique_names directly (was built then dropped)
- build_interned_names reduced from 27 lines to 3 (takes &[String])
- name_map borrows &str from index instead of cloning per entry
- Mutation methods documented as raising TypeError
- All user-facing docstrings cleaned: no implementation details, no FFI/Rust/interning/refcount language
- Rich .pyi stubs with docstrings for IDE hover

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Performance optimizations addressing PyO3 overhead analysis:

1. Zero-copy parse for bytes input (#6): DocumentOwner enum uses PyBackedBytes to borrow directly from Python bytes object's internal buffer, avoiding a full memcpy of the XML document. str input still copies (Python str -> UTF-8 encoding required).
2. Eliminate String intermediaries (#4): All text-returning methods (xpath_text, xpath_string, .text, .tail, .get, .keys, .items, itertext, text_content, tostring) now return Py<PyString> built directly from &str slices. Skips Rust String allocation that PyO3 would then copy again into Python.
3. interned_tag_fast (#3): Hot paths (child_tags, descendant_tags, make_element_borrowed, make_elements) now accept &IndexWithMeta directly, avoiding redundant borrow_dependent() calls in tight loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
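The bytes-versus-str distinction in item 1 has a rough Python-level analogue. The sketch below is only an analogy, not the PyBackedBytes mechanism itself: a memoryview borrows the buffer of an existing bytes object, while a str must first be encoded into a new bytes object.

```python
data = b"<root><item/></root>"

# A memoryview borrows the bytes object's buffer; nothing is copied.
view = memoryview(data)
borrowed = view[6:13]  # slicing a memoryview is also copy-free

# A str has to be encoded to UTF-8 first, which builds a new bytes object.
text = "<root><item/></root>"
encoded = text.encode("utf-8")

print(view.obj is data, bytes(borrowed), encoded == data)
```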
#1 Iterator pre-caching: ElementIterator pre-builds all (tag_idx, cached_tag) pairs in a single Document borrow at creation time. __next__ no longer borrows Document; it only calls clone_ref on pre-cached values.

#2 Lazy ElementList: Element.xpath() and CompiledXPath.eval() now return ElementList, which holds one Py<Document> + Vec<usize> of tag indices. Elements created on demand via __getitem__/__iter__. compiled.eval() for 100K results: 4ms -> 0.07ms (57x faster). Supports __len__, __getitem__, __iter__, __bool__, __eq__ (with list comparison).

#7 O(1) sibling lookup: child_positions[i] stored in IndexWithMeta at parse time. getnext/getprevious use direct index instead of linear scan over siblings. O(1) instead of O(siblings).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
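The lazy ElementList idea in #2 can be sketched in pure Python. This is a hypothetical analogue, not the Rust implementation: `Doc`, `Elem`, and `LazyElementList` are made-up names, and the slice handling is an assumption the commit message does not mention. The list stores only a document reference and integer indices, and builds element handles when they are accessed.

```python
class Doc:
    """Hypothetical stand-in for a parsed document: one tag name per node."""

    def __init__(self, tags):
        self.tags = tags


class Elem:
    """Hypothetical lightweight handle: a document plus a node index."""

    def __init__(self, doc, idx):
        self.doc = doc
        self.idx = idx

    @property
    def tag(self):
        return self.doc.tags[self.idx]

    def __eq__(self, other):
        if not isinstance(other, Elem):
            return NotImplemented
        return self.doc is other.doc and self.idx == other.idx

    def __hash__(self):
        return hash((id(self.doc), self.idx))


class LazyElementList:
    """Holds one document reference and a list of node indices."""

    def __init__(self, doc, indices):
        self._doc = doc
        self._indices = list(indices)

    def __len__(self):
        return len(self._indices)

    def __bool__(self):
        return bool(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):  # assumption: slices stay lazy
            return LazyElementList(self._doc, self._indices[i])
        return Elem(self._doc, self._indices[i])

    def __iter__(self):
        # Handles are created one at a time, only when iterated.
        return (Elem(self._doc, i) for i in self._indices)

    def __eq__(self, other):
        if isinstance(other, (list, LazyElementList)):
            return len(self) == len(other) and all(
                a == b for a, b in zip(self, other)
            )
        return NotImplemented


doc = Doc(["root", "item", "name", "item"])
hits = LazyElementList(doc, [1, 3])
print(len(hits), hits[0].tag, [e.tag for e in hits])
```

Constructing the list costs one index vector regardless of how many results are later touched, which is the property the 100K-result timing reflects.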
…gation

Major refactor to use simdxml's python-bindings-api branch directly:
- Drop IndexWithMeta entirely: no more custom parents, name_ids, child_positions, or unique_names. self_cell dependent is now just XmlIndex directly.
- .text uses upstream direct_text_first() (zero allocation)
- .tail uses upstream tail_text(): O(log n) binary search instead of O(n) substring search through raw XML
- .getparent() uses upstream parent() (direct array lookup)
- .getnext()/.getprevious() use upstream child_position() + child_at()
- __len__ uses upstream child_count() (zero allocation)
- __getitem__ uses upstream child_at() (zero allocation)
- child_tags/descendant_tags use upstream child_slice() (zero allocation)
- attrib/keys/items use upstream attributes() (single-pass parsing)
- Tag interning built from upstream name_ids/name_table (no rebuild)
- Cargo.toml: simdxml dependency points to git branch (temporary, will switch to crates.io release)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
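Tag interning from a name table can be pictured with a small Python analogue. The sketch below assumes a layout of one integer id per element (`name_ids`) and a table of unique names (`name_table`); the sample data and the helpers `tag_of` and `child_tags_batch` are hypothetical, and the real work happens in Rust. Each unique name becomes a Python string once, and every tag lookup afterwards hands back a reference to that same object.

```python
import sys

# Hypothetical parse output: one small integer id per element,
# plus a table holding each unique tag name exactly once.
name_table = ["catalog", "item", "name"]
name_ids = [0, 1, 2, 1, 2, 1, 2]

# Build each Python string once, up front.
interned = [sys.intern(name) for name in name_table]


def tag_of(node):
    """Per-element lookup: returns a shared string object, no copy."""
    return interned[name_ids[node]]


def child_tags_batch(nodes):
    """Batch lookup: one call produces the tags for many nodes."""
    return [interned[name_ids[n]] for n in nodes]


tags = child_tags_batch(range(1, 7))
print(tags[0] is tags[2], len(set(map(id, tags))))
```

Six results share two string objects here; with roughly 20-200 unique names per document, the same sharing applies to much larger traversals.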
All numbers improved from upstream API integration:
- Parse: 2.2-3.1x vs lxml (was 1.4-1.8x)
- XPath text: 10-23x vs lxml (was 1.8-14x)
- XPath predicates: up to 42x vs lxml
- Traversal: 3-17x vs lxml via batch API
- No regressions vs lxml on any benchmark

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Summary
- Unique tag names are interned as Python str once at parse time. .tag is now a refcount bump (zero copy, zero FFI lookup)
- child_tags() and descendant_tags() return all tag names in a single FFI call: 25x faster than lxml on large documents
- Benchmark fix: the POM xmlns was causing lxml to return 0 results, making it appear faster

Benchmark highlights (Apple Silicon, Python 3.14, lxml 6.0)
Benchmark (corpus size):
- //item[@cat="5"] (17MB)
- //name (17MB)
- child_tags() (17MB)
- //PubmedArticle (17MB)

* lxml comparison is [e.tag for e in root]

Test plan
🤖 Generated with Claude Code