Tags: ml-rust/boostr

v0.1.0

feat(quant/cpu): add NEON fused dequant+dot kernels for aarch64

Implement fused dequantization and dot product kernels for Q4_K,
Q5_K, and Q6_K formats using ARM NEON intrinsics. Adds a shared
horizontal sum helper (dot_f32.rs) used across all three kernels.

These kernels are the aarch64 counterpart to the existing x86 AVX2
implementations and are wired into the dispatch logic added in the
previous commit.
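
For readers unfamiliar with the pattern, here is a minimal sketch of what "fused dequant+dot" with a shared horizontal-sum helper can look like. It is not the code from this commit: `ToyBlock`, `hsum4`, and `fused_dot` are hypothetical names, and the block layout (eight 4-bit weights sharing one scale and one min) is a deliberately simplified stand-in, not the real Q4_K, Q5_K, or Q6_K layout. Only the horizontal sum uses NEON intrinsics, with a scalar fallback so the sketch also builds on other targets.

```rust
/// Hypothetical toy block: 8 weights stored as 4-bit nibbles (two per byte),
/// sharing one scale and one min. NOT the actual Q4_K layout.
struct ToyBlock {
    scale: f32,
    min: f32,
    nibbles: [u8; 4],
}

/// Horizontal sum of four f32 lanes (assumed shape of a shared helper).
#[cfg(target_arch = "aarch64")]
fn hsum4(v: [f32; 4]) -> f32 {
    use std::arch::aarch64::{vaddvq_f32, vld1q_f32};
    // SAFETY: `v` is valid for reading four f32 values, and NEON is part of
    // the baseline feature set on standard aarch64 targets.
    unsafe { vaddvq_f32(vld1q_f32(v.as_ptr())) }
}

/// Scalar fallback for non-aarch64 targets.
#[cfg(not(target_arch = "aarch64"))]
fn hsum4(v: [f32; 4]) -> f32 {
    (v[0] + v[1]) + (v[2] + v[3])
}

/// Fused dequantize + dot product: each weight is reconstructed as
/// `scale * q - min` and multiplied into an accumulator immediately,
/// so no dequantized buffer is ever written to memory.
fn fused_dot(block: &ToyBlock, x: &[f32; 8]) -> f32 {
    let mut acc = [0.0f32; 4];
    for (i, &byte) in block.nibbles.iter().enumerate() {
        let lo = (byte & 0x0F) as f32; // low nibble  -> weight 2*i
        let hi = (byte >> 4) as f32; // high nibble -> weight 2*i + 1
        acc[i] += (block.scale * lo - block.min) * x[2 * i];
        acc[i] += (block.scale * hi - block.min) * x[2 * i + 1];
    }
    // Reduce the four partial sums to a single scalar.
    hsum4(acc)
}
```

The point of fusing is that the weights are expanded only in registers; a real kernel would additionally vectorize the nibble unpacking and the multiply-accumulate, which this sketch leaves scalar for clarity.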