v0.2.0: Interned tags, batch APIs, honest benchmarks by cigrainger · Pull Request #1 · simdxml/simdxml-python

cigrainger · 2026-03-27T10:17:24Z

Summary

- Tag name interning: ~20-200 unique names created as Python str once at parse time. .tag is now a refcount bump (zero copy, zero FFI lookup)
- Batch traversal APIs: child_tags() and descendant_tags() return all tag names in a single FFI call, 25x faster than lxml on large documents
- Honest benchmarks: GC disabled, warmup, three corpus types (catalog/PubMed/POM), elements-to-elements comparison, traversal weakness documented
- Fixed namespace artifact: POM benchmark had xmlns causing lxml to return 0 results, making it appear faster
- Eliminated double-borrows in Element construction when callers already hold Document reference
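
The namespace artifact can be reproduced with the standard library alone. The sketch below uses a made-up miniature POM (the real benchmark corpus is not shown in this PR) and `xml.etree.ElementTree` rather than lxml. The point it illustrates is that an unprefixed path does not match elements that live in a default namespace, so a query can "finish quickly" simply because it finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature POM; the actual benchmark document is an assumption.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency><artifactId>a</artifactId></dependency>
    <dependency><artifactId>b</artifactId></dependency>
  </dependencies>
</project>"""

NS = {"m": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM)

# An unprefixed name refers to elements in no namespace, so nothing matches.
unqualified = root.findall(".//dependency")

# Binding a prefix to the document's default namespace finds both elements.
qualified = root.findall(".//m:dependency", NS)

print(len(unqualified), len(qualified))  # prints: 0 2
```

Checking result counts before comparing timings is a cheap guard against this kind of artifact.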

Benchmark highlights (Apple Silicon, Python 3.14, lxml 6.0)

| Operation | simdxml | lxml | Speedup |
| --- | --- | --- | --- |
| Parse 17MB catalog | 59 ms | 87 ms | 1.5x |
| XPath `//item[@cat="5"]` 17MB | 1.8 ms | 74 ms | 41x |
| XPath text `//name` 17MB | 3.3 ms | 40 ms | 12x |
| `child_tags()` 17MB | 0.44 ms | 11.8 ms* | 27x |
| `//PubmedArticle` 17MB | 0.37 ms | 9.4 ms | 25x |

* lxml comparison is [e.tag for e in root]
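
As a quick arithmetic check, the speedup column is the lxml time divided by the simdxml time. The snippet below recomputes it from the timings quoted in the table; nothing here is measured, the numbers are copied as-is.

```python
# Timings in milliseconds, copied from the table above: (simdxml, lxml).
timings_ms = {
    "Parse 17MB catalog": (59, 87),
    'XPath //item[@cat="5"] 17MB': (1.8, 74),
    "XPath text //name 17MB": (3.3, 40),
    "child_tags() 17MB": (0.44, 11.8),
    "//PubmedArticle 17MB": (0.37, 9.4),
}

# Speedup = baseline time / candidate time.
speedups = {op: lxml / simd for op, (simd, lxml) in timings_ms.items()}

for op, ratio in speedups.items():
    print(f"{op}: {ratio:.1f}x")
```

The ratios round to the values in the Speedup column (1.5x, 41x, 12x, 27x, 25x).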

Test plan

- 191 tests passing
- ruff clean
- pyright 0 errors
- Benchmarks run and README updated with fresh numbers

🤖 Generated with Claude Code

Performance:
- Intern unique tag names as Python strings at parse time (~20-200 unique names). Element.tag is now a refcount bump, not a string copy.
- Eagerly cache tag on Element creation (zero FFI on .tag access).
- New batch APIs: child_tags(), descendant_tags(). Single FFI call for all results using interned strings. 25x faster than lxml on large-document traversal.
- Eliminate double-borrow in make_element when callers already hold a Document reference.

Benchmarks:
- GC disabled, 3 warmup + 20 timed iterations, median reported.
- Three corpus types: catalog (data), PubMed (document), POM (config).
- XPath benchmarks compare elements-to-elements (fair).
- Fixed POM namespace artifact that made lxml appear faster (it was returning 0 results due to xmlns mismatch).
- Traversal section honestly shows per-element FFI overhead alongside the batch API that eliminates it.
- Removed POM xmlns from benchmark corpus for fair cross-library comparison.

README:
- Updated all benchmark tables with v0.2.0 numbers.
- Documents batch APIs in API section.
- Notes that parse() includes index construction.
- Honest framing of traversal trade-offs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
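
The benchmark protocol described above (GC disabled, 3 warmup runs, 20 timed runs, median reported) can be sketched in a few lines of standard-library Python. This is not the PR's benchmark script: the function name `bench` and the choice to disable the collector only around the timed loop are assumptions made for illustration.

```python
import gc
import statistics
import time


def bench(fn, warmup=3, iterations=20):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup runs are executed but not timed

    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed region
    try:
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        if gc_was_enabled:
            gc.enable()  # restore the caller's GC setting

    return statistics.median(samples)


# Placeholder workload; a real run would time a parse or an XPath query.
median_ms = bench(lambda: sum(range(10_000)))
print(f"median: {median_ms:.3f} ms")
```

Reporting the median rather than the mean keeps a single slow iteration from skewing the result.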

Review fixes:
- build_meta returns unique_names directly (was built then dropped)
- build_interned_names reduced from 27 lines to 3 (takes &[String])
- name_map borrows &str from index instead of cloning per entry
- Mutation methods documented as raising TypeError
- All user-facing docstrings cleaned: no implementation details, no FFI/Rust/interning/refcount language
- Rich .pyi stubs with docstrings for IDE hover

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
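
To make the interning idea concrete, here is a pure-Python sketch of a tag-name table. The real implementation lives in Rust and is not shown in this PR; the class `InternedNames` is hypothetical, and the attribute names `name_table` and `name_ids` are borrowed from a later commit message purely as labels.

```python
class InternedNames:
    """Hypothetical stand-in for a tag-name table built once at parse time."""

    def __init__(self, raw_tags):
        self.name_table = []  # one str object per unique tag name
        self.name_ids = []    # per-element index into name_table
        seen = {}
        for tag in raw_tags:
            idx = seen.get(tag)
            if idx is None:
                idx = seen[tag] = len(self.name_table)
                self.name_table.append(tag)
            self.name_ids.append(idx)

    def tag(self, element_index):
        # Hands back the shared object; no string is created per access.
        return self.name_table[self.name_ids[element_index]]


names = InternedNames(["catalog", "item", "name", "item", "name"])
print(len(names.name_table))           # 3 unique names for 5 elements
print(names.tag(1) is names.tag(3))    # True: both are the same str object
```

With only a few hundred unique names at most, the table stays small no matter how many elements the document contains.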

Performance optimizations addressing PyO3 overhead analysis:

1. Zero-copy parse for bytes input (#6): DocumentOwner enum uses PyBackedBytes to borrow directly from Python bytes object's internal buffer, avoiding a full memcpy of the XML document. str input still copies (Python str -> UTF-8 encoding required).
2. Eliminate String intermediaries (#4): All text-returning methods (xpath_text, xpath_string, .text, .tail, .get, .keys, .items, itertext, text_content, tostring) now return Py<PyString> built directly from &str slices. Skips Rust String allocation that PyO3 would then copy again into Python.
3. interned_tag_fast (#3): Hot paths (child_tags, descendant_tags, make_element_borrowed, make_elements) now accept &IndexWithMeta directly, avoiding redundant borrow_dependent() calls in tight loops.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
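
The bytes versus str distinction in item 1 has a rough Python-level analogue. The sketch below is only an analogy: it uses `memoryview` to show borrowing a bytes buffer without copying, and `str.encode` to show that text must be converted into a new bytes object first. It does not exercise PyBackedBytes or any simdxml code.

```python
data = b"<root><item/></root>"

# A memoryview borrows the bytes object's buffer; no bytes are copied.
view = memoryview(data)
borrowed = view.obj is data

# A str has to be encoded to UTF-8 first, which produces a new bytes object.
text = data.decode("utf-8")
encoded = text.encode("utf-8")
copied = (encoded is not data) and (encoded == data)

print(borrowed, copied)  # prints: True True
```

In practice this suggests passing bytes (for example, the result of reading a file in binary mode) when the copy matters.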

#1 Iterator pre-caching: ElementIterator pre-builds all (tag_idx, cached_tag) pairs in a single Document borrow at creation time. __next__ no longer borrows Document; it only calls clone_ref on pre-cached values.

#2 Lazy ElementList: Element.xpath() and CompiledXPath.eval() now return ElementList, which holds one Py<Document> + Vec<usize> of tag indices. Elements created on demand via __getitem__/__iter__. compiled.eval() for 100K results: 4ms -> 0.07ms (57x faster). Supports __len__, __getitem__, __iter__, __bool__, __eq__ (with list comparison).

#7 O(1) sibling lookup: child_positions[i] stored in IndexWithMeta at parse time. getnext/getprevious use direct index instead of linear scan over siblings. O(1) instead of O(siblings).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
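
The lazy result list in #2 can be sketched in pure Python. The class below is hypothetical and is not the ElementList shipped in this PR: it keeps a document handle plus a list of indices and only builds an element object when one is requested. The `make_element` callable and the tuple used as a stand-in element are assumptions for illustration.

```python
class LazyElementList:
    """Hypothetical sketch: store indices, build element objects on demand."""

    def __init__(self, doc, indices, make_element):
        self._doc = doc
        self._indices = list(indices)
        self._make = make_element  # callable(doc, index) -> element

    def __len__(self):
        return len(self._indices)

    def __bool__(self):
        return bool(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return LazyElementList(self._doc, self._indices[i], self._make)
        return self._make(self._doc, self._indices[i])

    def __iter__(self):
        return (self._make(self._doc, idx) for idx in self._indices)

    def __eq__(self, other):
        if isinstance(other, (list, LazyElementList)):
            return len(self) == len(other) and all(
                a == b for a, b in zip(self, other)
            )
        return NotImplemented


created = []  # records which indices were materialized


def make(doc, idx):
    created.append(idx)
    return (doc, idx)  # a tuple stands in for a real Element


results = LazyElementList("doc", range(100_000), make)
n = len(results)      # answered from the index list; nothing is built
first = results[0]    # builds exactly one element
print(n, first, created)  # prints: 100000 ('doc', 0) [0]
```

Constructing the list costs one pass over the indices, which is why returning it can be far cheaper than materializing every result up front.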

…gation

Major refactor to use simdxml's python-bindings-api branch directly:
- Drop IndexWithMeta entirely: no more custom parents, name_ids, child_positions, or unique_names. self_cell dependent is now just XmlIndex directly.
- .text uses upstream direct_text_first(): zero allocation
- .tail uses upstream tail_text(): O(log n) binary search instead of O(n) substring search through raw XML
- .getparent() uses upstream parent(): direct array lookup
- .getnext()/.getprevious() use upstream child_position() + child_at()
- __len__ uses upstream child_count(): zero allocation
- __getitem__ uses upstream child_at(): zero allocation
- child_tags/descendant_tags use upstream child_slice(): zero alloc
- attrib/keys/items use upstream attributes(): single-pass parsing
- Tag interning built from upstream name_ids/name_table (no rebuild)
- Cargo.toml: simdxml dependency points to git branch (temporary, will switch to crates.io release)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
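
The sibling navigation described in the list above (parent lookup, then child_position() and child_at()) can be illustrated with a small flat index in Python. The class `TreeIndex` is hypothetical: its method names mirror the upstream calls mentioned in the commit message, but the storage layout and signatures are assumptions, not the simdxml API.

```python
class TreeIndex:
    """Hypothetical flat index; method names mirror the calls listed above."""

    def __init__(self, parents):
        # parents[i] is the parent's element index, or None for the root.
        self.parents = parents
        self.children = [[] for _ in parents]
        self.positions = [0] * len(parents)  # position of i among its siblings
        for i, p in enumerate(parents):
            if p is not None:
                self.positions[i] = len(self.children[p])
                self.children[p].append(i)

    def parent(self, i):
        return self.parents[i]

    def child_count(self, i):
        return len(self.children[i])

    def child_position(self, i):
        return self.positions[i]

    def child_at(self, i, pos):
        kids = self.children[i]
        return kids[pos] if 0 <= pos < len(kids) else None

    def getnext(self, i):
        # Constant-time work: one parent lookup plus one indexed child lookup.
        p = self.parent(i)
        return None if p is None else self.child_at(p, self.child_position(i) + 1)

    def getprevious(self, i):
        p = self.parent(i)
        return None if p is None else self.child_at(p, self.child_position(i) - 1)


# Element 0 is the root with children 1, 2, 3; element 4 is a child of 2.
index = TreeIndex([None, 0, 0, 0, 2])
print(index.getnext(1), index.getprevious(1), index.getnext(3))  # prints: 2 None None
```

Storing each element's position among its siblings at build time is what removes the linear scan from getnext and getprevious.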

All numbers improved from upstream API integration:
- Parse: 2.2-3.1x vs lxml (was 1.4-1.8x)
- XPath text: 10-23x vs lxml (was 1.8-14x)
- XPath predicates: up to 42x vs lxml
- Traversal: 3-17x vs lxml via batch API
- No regressions vs lxml on any benchmark

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

cigrainger and others added 7 commits March 27, 2026 21:16

Switch to simdxml 0.2.0 from crates.io

e5470fa

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

cigrainger merged commit 39508b5 into main Mar 27, 2026
10 checks passed

cigrainger merged 7 commits into main from v0.2.0
