Sort arrays with SIMD vqsort in Python
VQSort supports 16,32,64 bit numeric types (signed/unsigned integer and floats).
import vqsort
import numpy as np
arr = np.random.randint(0, 1000, size=[100000], dtype=np.uint32)
# still need to assign in case type is not supported by vqsort such as 8-bit
arr = vqsort.sort(arr, in_place=True)After testing on Apple Silicon M3, floats show an advantage, integers do not.
uint64
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 0.53 | 0.63 | 1.21x
300000 | 0.49 | 0.60 | 1.22x
1000000 | 0.46 | 0.56 | 1.22x
3000000 | 0.43 | 0.53 | 1.23x
10000000 | 0.41 | 0.49 | 1.22x
30000000 | 0.38 | 0.47 | 1.22x
uint32
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 1.87 | 1.80 | 0.96x
300000 | 1.72 | 1.70 | 0.99x
1000000 | 1.55 | 1.54 | 0.99x
3000000 | 1.43 | 1.43 | 1.00x
10000000 | 1.31 | 1.33 | 1.01x
30000000 | 1.23 | 1.24 | 1.01x
uint16
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 0.20 | 1.43 | 7.20x
300000 | 0.19 | 1.27 | 6.84x
1000000 | 0.18 | 1.13 | 6.31x
3000000 | 0.18 | 1.03 | 5.70x
10000000 | 0.18 | 1.04 | 5.69x
30000000 | 0.18 | 1.15 | 6.33x
float64
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 0.88 | 0.67 | 0.77x
300000 | 0.83 | 0.64 | 0.77x
1000000 | 0.71 | 0.55 | 0.78x
3000000 | 0.70 | 0.56 | 0.80x
10000000 | 0.62 | 0.50 | 0.80x
30000000 | 0.58 | 0.47 | 0.81x
float32
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 1.71 | 1.63 | 0.95x
300000 | 1.59 | 1.53 | 0.96x
1000000 | 1.41 | 1.41 | 1.00x
3000000 | 1.32 | 1.32 | 1.00x
10000000 | 1.16 | 1.17 | 1.02x
30000000 | 1.08 | 1.10 | 1.02x
float16
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
Abort at vqsort_f16a.cc:37: Assert 0:
Aborted (core dumped)
uint64
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 0.86 | 0.87 | 1.01x
300000 | 0.79 | 0.79 | 1.01x
1000000 | 0.74 | 0.74 | 1.01x
3000000 | 0.69 | 0.70 | 1.02x
10000000 | 0.66 | 0.66 | 1.01x
30000000 | 0.62 | 0.62 | 1.00x
uint32
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 2.03 | 2.02 | 0.99x
300000 | 1.86 | 1.85 | 0.99x
1000000 | 1.64 | 1.63 | 0.99x
3000000 | 1.51 | 1.50 | 0.99x
10000000 | 1.45 | 1.45 | 1.00x
30000000 | 1.36 | 1.35 | 1.00x
uint16
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 3.76 | 3.72 | 0.99x
300000 | 3.37 | 3.38 | 1.00x
1000000 | 3.08 | 3.07 | 1.00x
3000000 | 2.81 | 2.82 | 1.00x
10000000 | 3.64 | 3.69 | 1.01x
30000000 | 4.02 | 4.07 | 1.01x
float64
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 0.69 | 0.92 | 1.34x
300000 | 0.64 | 0.86 | 1.35x
1000000 | 0.58 | 0.79 | 1.37x
3000000 | 0.53 | 0.73 | 1.38x
10000000 | 0.49 | 0.69 | 1.38x
30000000 | 0.46 | 0.64 | 1.38x
float32
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 1.40 | 1.98 | 1.41x
300000 | 1.26 | 1.81 | 1.43x
1000000 | 1.14 | 1.66 | 1.46x
3000000 | 1.04 | 1.55 | 1.48x
10000000 | 0.96 | 1.42 | 1.49x
30000000 | 0.88 | 1.26 | 1.43x
float16
N | numpy GB/s | vqsort GB/s | advantage
--------------------------------------------------
100000 | 2.21 | 3.47 | 1.57x
300000 | 2.10 | 3.78 | 1.80x
1000000 | 2.21 | 4.24 | 1.91x
3000000 | 2.40 | 5.04 | 2.10x
10000000 | 2.58 | 5.41 | 2.09x
30000000 | 2.59 | 5.31 | 2.05x