Tags: ml-rust/boostr
feat(quant/cpu): add NEON fused dequant+dot kernels for aarch64

Implement fused dequantization and dot product kernels for the Q4_K, Q5_K, and Q6_K formats using ARM NEON intrinsics. Add a shared horizontal sum helper (dot_f32.rs) used by all three kernels.

These kernels are the aarch64 counterpart to the existing x86 AVX2 implementations and are wired into the dispatch logic added in the previous commit.
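
For readers unfamiliar with the pattern, the following is a minimal sketch of a fused dequant+dot step with a NEON horizontal sum, not the code from this commit. It assumes a hypothetical 32-weight 4-bit block (`Block4`) with a single scale and minimum, which is much simpler than the real Q4_K layout, and it unpacks nibbles into a small stack buffer for readability where a production kernel would likely unpack in registers. The names `Block4`, `unpack`, `fused_dot_scalar`, and `fused_dot` are illustrative only. On non-aarch64 targets the sketch falls back to the scalar path.

```rust
/// Hypothetical simplified block: 32 weights sharing one scale and one minimum.
/// This is NOT the real Q4_K layout; it only illustrates the fusion idea.
const BLOCK: usize = 32;

struct Block4 {
    scale: f32,
    min: f32,
    /// Two 4-bit values per byte: low nibbles hold weights 0..16,
    /// high nibbles hold weights 16..32 (an assumed packing).
    quants: [u8; BLOCK / 2],
}

/// Expand the packed nibbles to f32 without applying scale or minimum.
fn unpack(b: &Block4) -> [f32; BLOCK] {
    let mut q = [0.0f32; BLOCK];
    for i in 0..BLOCK / 2 {
        q[i] = (b.quants[i] & 0x0F) as f32;
        q[i + BLOCK / 2] = (b.quants[i] >> 4) as f32;
    }
    q
}

/// Scalar reference: dequantize each weight as `scale * q - min`
/// and immediately multiply-accumulate against the activation.
fn fused_dot_scalar(b: &Block4, x: &[f32; BLOCK]) -> f32 {
    let q = unpack(b);
    let mut acc = 0.0f32;
    for i in 0..BLOCK {
        acc += (b.scale * q[i] - b.min) * x[i];
    }
    acc
}

/// NEON path: uses sum((s*q - m) * x) = s * sum(q*x) - m * sum(x),
/// so scale and minimum are applied once per block instead of per weight.
#[cfg(target_arch = "aarch64")]
fn fused_dot(b: &Block4, x: &[f32; BLOCK]) -> f32 {
    use core::arch::aarch64::*;
    let q = unpack(b);
    // SAFETY: assumes a target where NEON is available (true for the common
    // aarch64 targets); every load reads 4 in-bounds f32 values.
    unsafe {
        let mut acc_qx = vdupq_n_f32(0.0);
        let mut acc_x = vdupq_n_f32(0.0);
        for i in (0..BLOCK).step_by(4) {
            let qv = vld1q_f32(q.as_ptr().add(i));
            let xv = vld1q_f32(x.as_ptr().add(i));
            acc_qx = vfmaq_f32(acc_qx, qv, xv); // acc_qx += qv * xv
            acc_x = vaddq_f32(acc_x, xv); // acc_x += xv
        }
        // Horizontal sums: reduce the four lanes of each accumulator.
        b.scale * vaddvq_f32(acc_qx) - b.min * vaddvq_f32(acc_x)
    }
}

/// Fallback so the sketch also builds on non-aarch64 targets.
#[cfg(not(target_arch = "aarch64"))]
fn fused_dot(b: &Block4, x: &[f32; BLOCK]) -> f32 {
    fused_dot_scalar(b, x)
}
```

Because the NEON path reorders the floating-point additions, its result can differ from the scalar reference in the last few bits, so comparisons between the two should use a tolerance rather than exact equality.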