v0.2.0: Interned tags, batch APIs, honest benchmarks #1

Merged
cigrainger merged 7 commits into main from v0.2.0
Mar 27, 2026

@cigrainger

Summary

  • Tag name interning: ~20-200 unique names created as Python str once at parse time. .tag is now a refcount bump (zero copy, zero FFI lookup)
  • Batch traversal APIs: child_tags() and descendant_tags() return all tag names in a single FFI call — 25x faster than lxml on large documents
  • Honest benchmarks: GC disabled, warmup, three corpus types (catalog/PubMed/POM), elements-to-elements comparison, traversal weakness documented
  • Fixed namespace artifact: POM benchmark had xmlns causing lxml to return 0 results, making it appear faster
  • Eliminated double-borrows in Element construction when callers already hold Document reference
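
The namespace artifact from the fourth bullet is easy to reproduce. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (both expand a default `xmlns` into the element name, so an unqualified path does not match). The sample POM and the helper name `count_dependencies` are made up for illustration and are not taken from the benchmark suite.

```python
import xml.etree.ElementTree as ET

MAVEN_NS = "http://maven.apache.org/POM/4.0.0"

# Hypothetical miniature POM; the real benchmark corpus is not shown in this PR.
POM_WITH_NS = f"""<project xmlns="{MAVEN_NS}">
  <dependencies>
    <dependency><artifactId>a</artifactId></dependency>
    <dependency><artifactId>b</artifactId></dependency>
  </dependencies>
</project>"""

# Same document with the default namespace declaration stripped.
POM_NO_NS = POM_WITH_NS.replace(f' xmlns="{MAVEN_NS}"', "")


def count_dependencies(xml_text: str) -> int:
    """Count <dependency> elements using an unqualified path."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//dependency"))


# With xmlns present, the real tag is "{ns}dependency", so nothing matches.
with_ns = count_dependencies(POM_WITH_NS)
without_ns = count_dependencies(POM_NO_NS)

# A namespace-aware query finds the elements again.
qualified = len(
    ET.fromstring(POM_WITH_NS).findall(".//m:dependency", {"m": MAVEN_NS})
)

print(with_ns, without_ns, qualified)
```

A query that matches nothing does very little work, which is how an empty result set can look like a fast one in a timing table.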

Benchmark highlights (Apple Silicon, Python 3.14, lxml 6.0)

| Operation | simdxml | lxml | Speedup |
|---|---|---|---|
| Parse 17MB catalog | 59 ms | 87 ms | 1.5x |
| XPath //item[@cat="5"] 17MB | 1.8 ms | 74 ms | 41x |
| XPath text //name 17MB | 3.3 ms | 40 ms | 12x |
| child_tags() 17MB | 0.44 ms | 11.8 ms* | 27x |
| //PubmedArticle 17MB | 0.37 ms | 9.4 ms | 25x |

* lxml comparison is [e.tag for e in root]

Test plan

  • 191 tests passing
  • ruff clean
  • pyright 0 errors
  • Benchmarks run and README updated with fresh numbers

🤖 Generated with Claude Code

cigrainger and others added 7 commits March 27, 2026 21:16
Performance:
- Intern unique tag names as Python strings at parse time (~20-200
  unique names). Element.tag is now a refcount bump, not a string copy.
- Eagerly cache tag on Element creation (zero FFI on .tag access).
- New batch APIs: child_tags(), descendant_tags() — single FFI call
  for all results using interned strings. 25x faster than lxml on
  large-document traversal.
- Eliminate double-borrow in make_element when callers already hold
  a Document reference.
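
The interning idea above can be modelled in a few lines of pure Python. This is only a toy sketch of the concept, not the Rust/PyO3 code in this PR; the class name `InternedTags` and its attributes are hypothetical.

```python
class InternedTags:
    """Toy model: one str object per unique tag name, shared by all elements."""

    def __init__(self, tag_sequence):
        self._table = []      # unique name objects, each stored once
        self._ids = {}        # name -> index into _table
        self.name_ids = []    # per-element index into _table
        for name in tag_sequence:
            idx = self._ids.get(name)
            if idx is None:
                idx = len(self._table)
                self._ids[name] = idx
                self._table.append(name)
            self.name_ids.append(idx)

    @property
    def unique_count(self):
        return len(self._table)

    def tag(self, element_index):
        # Returns the shared object: no new string is built per access.
        return self._table[self.name_ids[element_index]]


# Build the names at runtime so equal names start out as distinct objects.
raw_tags = ["".join(["it", "em"]), "name", "".join(["it", "em"]), "name"]
tags = InternedTags(raw_tags)
print(tags.unique_count, tags.tag(0) is tags.tag(2))
```

Every element with the same name gets the same `str` object back, which is the property that turns `.tag` into a reference-count increment rather than a copy.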

Benchmarks:
- GC disabled, 3 warmup + 20 timed iterations, median reported.
- Three corpus types: catalog (data), PubMed (document), POM (config).
- XPath benchmarks compare elements-to-elements (fair).
- Fixed POM namespace artifact that made lxml appear faster (it was
  returning 0 results due to xmlns mismatch).
- Traversal section honestly shows per-element FFI overhead alongside
  the batch API that eliminates it.
- Removed POM xmlns from benchmark corpus for fair cross-library
  comparison.
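
The measurement protocol described above (GC disabled, 3 warmup runs, 20 timed iterations, median reported) could be sketched as follows. This is a minimal illustration, not the project's actual benchmark script; the function name `bench` and its parameters are assumptions.

```python
import gc
import statistics
import time


def bench(fn, warmup=3, iterations=20):
    """Return the median wall-clock time of fn() in milliseconds."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed region
    try:
        for _ in range(warmup):
            fn()  # untimed runs to warm caches and allocators
        samples = []
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        if gc_was_enabled:
            gc.enable()  # restore the caller's GC state
    return statistics.median(samples)


calls = []
median_ms = bench(lambda: calls.append(1))
print(len(calls), median_ms >= 0)
```

Reporting the median rather than the mean keeps a single slow iteration from skewing the result.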

README:
- Updated all benchmark tables with v0.2.0 numbers.
- Documents batch APIs in API section.
- Notes that parse() includes index construction.
- Honest framing of traversal trade-offs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Review fixes:
- build_meta returns unique_names directly (was built then dropped)
- build_interned_names reduced from 27 lines to 3 (takes &[String])
- name_map borrows &str from index instead of cloning per entry
- Mutation methods documented as raising TypeError
- All user-facing docstrings cleaned: no implementation details,
  no FFI/Rust/interning/refcount language
- Rich .pyi stubs with docstrings for IDE hover

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Performance optimizations addressing PyO3 overhead analysis:

1. Zero-copy parse for bytes input (#6): DocumentOwner enum uses
   PyBackedBytes to borrow directly from Python bytes object's
   internal buffer, avoiding a full memcpy of the XML document.
   str input still copies (Python str -> UTF-8 encoding required).

2. Eliminate String intermediaries (#4): All text-returning methods
   (xpath_text, xpath_string, .text, .tail, .get, .keys, .items,
   itertext, text_content, tostring) now return Py<PyString> built
   directly from &str slices. Skips Rust String allocation that
   PyO3 would then copy again into Python.

3. interned_tag_fast (#3): Hot paths (child_tags, descendant_tags,
   make_element_borrowed, make_elements) now accept &IndexWithMeta
   directly, avoiding redundant borrow_dependent() calls in tight
   loops.
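
As a rough Python-level analogy for item 1 (not the PyO3 `PyBackedBytes` mechanism itself), a `memoryview` over a `bytes` object refers to the existing buffer, while a `str` has to be encoded into a new `bytes` object before a byte-oriented parser can read it. The sample document is made up.

```python
data = b"<root><item/></root>"

# bytes input: a view refers to the original object's buffer, no copy made.
view = memoryview(data)
shares_buffer = view.obj is data

# str input: encoding to UTF-8 produces a separate bytes object.
text = "<root><item/></root>"
encoded = text.encode("utf-8")

print(shares_buffer, view.nbytes, encoded == data)
```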

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
#1 Iterator pre-caching: ElementIterator pre-builds all (tag_idx,
   cached_tag) pairs in a single Document borrow at creation time.
   __next__ no longer borrows Document — just clone_ref on pre-cached
   values.

#2 Lazy ElementList: Element.xpath() and CompiledXPath.eval() now
   return ElementList — holds one Py<Document> + Vec<usize> of tag
   indices. Elements created on demand via __getitem__/__iter__.
   compiled.eval() for 100K results: 4ms -> 0.07ms (57x faster).
   Supports __len__, __getitem__, __iter__, __bool__, __eq__ (with
   list comparison).
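
The lazy-list idea in #2 can be sketched in pure Python: keep one document reference plus a list of indices, and build element wrappers only when an item is accessed. The class `LazyElementList` and the `make_element` callback below are hypothetical stand-ins, not the Rust `ElementList` from this PR.

```python
class LazyElementList:
    """Holds a document and node indices; creates elements on demand."""

    def __init__(self, document, indices, make_element):
        self._document = document
        self._indices = list(indices)
        self._make = make_element  # callable(document, index) -> element

    def __len__(self):
        return len(self._indices)

    def __bool__(self):
        return bool(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self._make(self._document, j) for j in self._indices[i]]
        return self._make(self._document, self._indices[i])

    def __iter__(self):
        for j in self._indices:
            yield self._make(self._document, j)

    def __eq__(self, other):
        # Compare element-wise against plain lists or other lazy lists.
        if isinstance(other, (list, LazyElementList)):
            return len(self) == len(other) and all(
                a == b for a, b in zip(self, other)
            )
        return NotImplemented


created = []  # records which elements were actually materialised


def make_element(document, index):
    created.append(index)
    return document[index]


doc = ["a", "b", "c", "d"]
results = LazyElementList(doc, [1, 3], make_element)
size_before_access = (len(results), list(created))
first = results[0]
```

Constructing the list and calling `len()` touch no elements at all, which is where the saving for large result sets comes from.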

#7 O(1) sibling lookup: child_positions[i] stored in IndexWithMeta
   at parse time. getnext/getprevious use direct index instead of
   linear scan over siblings. O(1) instead of O(siblings).
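
The sibling lookup in #7 amounts to recording each node's position within its parent once, then indexing directly. A minimal sketch, assuming a tree given as per-node child lists; the function names and data layout are illustrative, not the crate's actual structures.

```python
def build_child_positions(children):
    """children[p] is the ordered list of child node ids of node p."""
    n = len(children)
    parent = [-1] * n    # -1 marks the root (no parent)
    position = [0] * n   # index of each node within its parent's child list
    for p, kids in enumerate(children):
        for pos, child in enumerate(kids):
            parent[child] = p
            position[child] = pos
    return parent, position


def getnext(node, children, parent, position):
    """Next sibling via direct index, or None if there is none."""
    p = parent[node]
    if p < 0:
        return None
    siblings = children[p]
    i = position[node] + 1
    return siblings[i] if i < len(siblings) else None


def getprevious(node, children, parent, position):
    """Previous sibling via direct index, or None if there is none."""
    p = parent[node]
    if p < 0 or position[node] == 0:
        return None
    return children[p][position[node] - 1]


# Node 0 is the root with children 1, 2, 3; node 2 has one child, node 4.
tree = [[1, 2, 3], [], [4], [], []]
parent, position = build_child_positions(tree)
print(getnext(1, tree, parent, position), getprevious(3, tree, parent, position))
```

The positions are computed once in a single pass over the tree, so each later `getnext` or `getprevious` call is a constant-time array access.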

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…gation

Major refactor to use simdxml's python-bindings-api branch directly:

- Drop IndexWithMeta entirely — no more custom parents, name_ids,
  child_positions, or unique_names. self_cell dependent is now just
  XmlIndex directly.
- .text uses upstream direct_text_first() — zero allocation
- .tail uses upstream tail_text() — O(log n) binary search instead
  of O(n) substring search through raw XML
- .getparent() uses upstream parent() — direct array lookup
- .getnext()/.getprevious() use upstream child_position() + child_at()
- __len__ uses upstream child_count() — zero allocation
- __getitem__ uses upstream child_at() — zero allocation
- child_tags/descendant_tags use upstream child_slice() — zero alloc
- attrib/keys/items use upstream attributes() — single-pass parsing
- Tag interning built from upstream name_ids/name_table (no rebuild)
- Cargo.toml: simdxml dependency points to git branch (temporary,
  will switch to crates.io release)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
All numbers improved from upstream API integration:
- Parse: 2.2-3.1x vs lxml (was 1.4-1.8x)
- XPath text: 10-23x vs lxml (was 1.8-14x)
- XPath predicates: up to 42x vs lxml
- Traversal: 3-17x vs lxml via batch API
- No regressions vs lxml on any benchmark

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@cigrainger cigrainger merged commit 39508b5 into main Mar 27, 2026
10 checks passed