perf: Vectorize get_chunk_slice for faster sharded writes (zarr-developers#3713)
* perf: Skip bounds check for initial elements in 2^n hypercube
* lint: Use a list comprehension rather than a for loop
* perf: Add decode_morton_vectorized
* perf: Replace math.log2() with bit_length()
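Swapping `math.log2()` for `int.bit_length()` works when the operand is known to be a power of two, which holds for the hypercube sizes used in Morton ordering. A minimal sketch of the substitution (the helper name is illustrative, not from the PR):

```python
import math

def log2_exact(n: int) -> int:
    # For a power-of-two n, bit_length() - 1 equals log2(n) exactly,
    # avoiding the float round-trip that math.log2() incurs.
    return n.bit_length() - 1

assert log2_exact(8) == int(math.log2(8)) == 3
```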
* perf: Use magic numbers for 2D and 3D
* perf: Add 4D Morton magic numbers
* perf: Add Morton magic numbers for 5D
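Magic-number Morton decoding de-interleaves the bits of a code with a fixed sequence of shifts and masks instead of a per-bit loop, and it vectorizes naturally over a NumPy array of codes. Below is a sketch of the standard 2D case only (3D through 5D use different mask constants); the function names are illustrative and the actual implementation in this PR may differ:

```python
import numpy as np

def _compact_bits_2d(v: np.ndarray) -> np.ndarray:
    # Keep the even-position bits of each 64-bit code and pack them
    # into the low bits, using the classic shift-and-mask cascade.
    v = v & np.uint64(0x5555555555555555)
    v = (v | (v >> np.uint64(1))) & np.uint64(0x3333333333333333)
    v = (v | (v >> np.uint64(2))) & np.uint64(0x0F0F0F0F0F0F0F0F)
    v = (v | (v >> np.uint64(4))) & np.uint64(0x00FF00FF00FF00FF)
    v = (v | (v >> np.uint64(8))) & np.uint64(0x0000FFFF0000FFFF)
    v = (v | (v >> np.uint64(16))) & np.uint64(0x00000000FFFFFFFF)
    return v

def decode_morton_2d(codes: np.ndarray) -> np.ndarray:
    # Decode a whole batch of Morton codes at once: x from the even
    # bits, y from the odd bits. Returns an (n, 2) array of coordinates.
    codes = codes.astype(np.uint64)
    x = _compact_bits_2d(codes)
    y = _compact_bits_2d(codes >> np.uint64(1))
    return np.stack([x, y], axis=-1)
```

Because every operation is an elementwise NumPy op, decoding n codes costs a handful of array passes rather than n Python-level function calls.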
* perf: Remove singleton dimensions to reduce ndims
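A size-1 axis contributes no bits to a Morton code, so dropping singleton dimensions before decoding lets a nominally 4D shard shape use the cheaper 2D path. A minimal sketch with a hypothetical helper name:

```python
def drop_singletons(chunks_per_shard: tuple[int, ...]) -> tuple[int, ...]:
    # Size-1 axes are removed before Morton decoding and can be
    # reinserted (as coordinate 0) afterwards.
    return tuple(s for s in chunks_per_shard if s > 1)

assert drop_singletons((1, 8, 1, 8)) == (8, 8)
```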
* Add changes
* fix: Address type annotation and linting issues
* perf: Remove magic number functions
* test: Add power of 2 sharding indexing tests
* test: Add Morton order benchmarks with cache clearing
Add benchmarks that clear the _morton_order LRU cache before each
iteration to measure the full Morton computation cost:
- test_sharded_morton_indexing: 512-4096 chunks per shard
- test_sharded_morton_indexing_large: 32768 chunks per shard
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* fix: Bound LRU cache of _morton_order to 16
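Bounding the cache keeps large cached Morton-order arrays from accumulating without limit. The pattern is a standard `functools.lru_cache(maxsize=16)`; the function body below is a stand-in, not the real computation:

```python
from functools import lru_cache

@lru_cache(maxsize=16)  # evict least-recently-used entries past 16 shapes
def expensive_order(shape: tuple[int, ...]) -> tuple[int, ...]:
    # Stand-in for the Morton-order computation keyed on shard shape.
    return tuple(range(len(shape)))

expensive_order((4, 4))
assert expensive_order.cache_info().maxsize == 16
```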
* test: Add a single chunk test for a large shard
* test:Add indexing benchmarks for writing
* tests: Add single chunk write test for sharding
* perf: Vectorize get_chunk_slice for faster sharded writes
Add vectorized methods to _ShardIndex and _ShardReader for batch
chunk slice lookups, reducing per-chunk function call overhead
when writing to shards.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: Return ndarray from _morton_order, simplify to_dict_vectorized
_morton_order now returns a read-only npt.NDArray[np.intp] (annotated as
Iterable[Sequence[int]]) instead of a tuple of tuples, eliminating the
intermediate list-of-tuples allocation. morton_order_iter converts rows to
tuples on the fly. to_dict_vectorized no longer requires a redundant
chunk_coords_tuples argument; tuple conversion happens inline during dict
population. get_chunk_slices_vectorized accepts any integer array dtype
(npt.NDArray[np.integer[Any]]) and casts to uint64 internally.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* perf: Cache tuple keys separately from ndarray in _morton_order_keys
Add _morton_order_keys() as a second lru_cache that converts the ndarray
returned by _morton_order into a tuple of tuples. This restores cached
access to hashable chunk coordinate keys without reverting to the old
dual-argument interface. morton_order_iter now uses _morton_order_keys,
and to_dict_vectorized derives its keys from _morton_order_keys internally
using the shard index shape, keeping the call site single-argument.
Result: test_sharded_morton_write_single_chunk[(32,32,32)] improves from
~33ms to ~7ms (a ~5x speedup over the pre-PR baseline).
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* tests: Clear _morton_order_keys cache alongside _morton_order in benchmarks
All benchmark functions that call _morton_order.cache_clear() now also
call _morton_order_keys.cache_clear() to ensure both caches are reset
before each benchmark iteration.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
* refactor: Use npt.NDArray[np.intp] as return type for _morton_order
More precise than Iterable[Sequence[int]] and accurately reflects the
actual return value. Remove the now-unused Iterable import.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
Co-authored-by: Davis Bennett <[email protected]>