Avoid overallocating arrays in coalesce primitives / views#9132

Merged
Dandandan merged 10 commits into apache:main from Dandandan:avoid_over_allocating
Jan 10, 2026

Conversation

@Dandandan
Contributor

@Dandandan Dandandan commented Jan 10, 2026

Which issue does this PR close?

Closes #9135.

Rationale for this change

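The code was calling .reserve(batch_size), which reserves capacity for at least batch_size additional elements on top of the current length (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve). A buffer that already held elements could therefore end up with more capacity than a single batch needs.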
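To make the `Vec::reserve` semantics concrete, here is a minimal sketch using only the standard library. The helper name `capacity_after_reserve` and the element counts are illustrative assumptions, not code from this PR.

```rust
/// Build a Vec holding `len` elements, call `reserve(additional)`,
/// and report the resulting capacity. (Hypothetical helper for illustration.)
fn capacity_after_reserve(len: usize, additional: usize) -> usize {
    let mut values: Vec<u64> = (0..len as u64).collect();
    // `reserve` guarantees room for `additional` elements beyond `len()`,
    // so the capacity ends up at least `len + additional`.
    values.reserve(additional);
    values.capacity()
}

fn main() {
    let batch_size = 8192;
    // The buffer already holds 1000 elements of the batch being built.
    let cap = capacity_after_reserve(1000, batch_size);
    assert!(cap >= 1000 + batch_size);
    println!("capacity after reserve(batch_size) with 1000 elements: {cap}");
}
```

The exact capacity is up to the allocator and the growth policy, so the sketch only checks the lower bound that the documentation guarantees.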

This also improves performance a bit:

filter: primitive, 8192, nulls: 0, selectivity: 0.001
                        time:   [59.509 ms 59.660 ms 59.856 ms]
                        change: [−3.0781% −2.7917% −2.4795%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.01
                        time:   [6.0072 ms 6.0226 ms 6.0428 ms]
                        change: [−8.7042% −7.1161% −6.0455%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0, selectivity: 0.1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.5s, enable flat sampling, or reduce sample count to 50.
filter: primitive, 8192, nulls: 0, selectivity: 0.1
                        time:   [1.8664 ms 1.8709 ms 1.8772 ms]
                        change: [−15.187% −14.905% −14.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.8
                        time:   [2.5191 ms 2.5444 ms 2.5717 ms]
                        change: [−13.064% −11.414% −10.003%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0.1, selectivity: 0.001: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, or reduce sample count to 60.
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001
                        time:   [76.422 ms 76.671 ms 76.973 ms]
                        change: [−5.5096% −4.0229% −2.8048%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.01
                        time:   [10.197 ms 10.228 ms 10.262 ms]
                        change: [−3.6627% −3.0569% −2.4919%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

filter: primitive, 8192, nulls: 0.1, selectivity: 0.1
                        time:   [4.6635 ms 4.6750 ms 4.6915 ms]
                        change: [−9.4939% −8.5908% −7.8383%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.8
                        time:   [4.7777 ms 4.8115 ms 4.8467 ms]
                        change: [−9.9226% −9.1384% −8.3813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

What changes are included in this PR?

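Changes the code to call self.views.reserve(self.batch_size - self.views.len()) instead, so that only the capacity still missing for the current batch is reserved. This avoids allocating more than necessary (i.e. up to 2x the amount).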
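The idea can be sketched as follows. `ViewBuffer` and `ensure_capacity` are hypothetical names standing in for the real in-progress array type, and `saturating_sub` is an assumption of this sketch (the PR text uses a plain subtraction).

```rust
/// Hypothetical stand-in for an in-progress output buffer; not the arrow-rs type.
struct ViewBuffer {
    views: Vec<u128>,
    batch_size: usize,
}

impl ViewBuffer {
    fn new(batch_size: usize) -> Self {
        Self { views: Vec::new(), batch_size }
    }

    /// Reserve only what is still missing to reach `batch_size` elements.
    fn ensure_capacity(&mut self) {
        // Assumption of this sketch: saturate so that len > batch_size cannot underflow.
        let missing = self.batch_size.saturating_sub(self.views.len());
        self.views.reserve(missing);
    }
}

fn main() {
    let mut buf = ViewBuffer::new(8192);
    buf.views.extend(0..100u128);
    buf.ensure_capacity();
    let cap = buf.views.capacity();
    assert!(cap >= buf.batch_size);

    // Filling the rest of the batch stays within the reserved capacity,
    // so no further reallocation happens.
    while buf.views.len() < buf.batch_size {
        buf.views.push(0);
    }
    assert_eq!(buf.views.capacity(), cap);
    println!("len = {}, capacity = {}", buf.views.len(), cap);
}
```

Compared with `reserve(batch_size)`, the requested total is `batch_size` rather than `len + batch_size`, which is where the "up to 2x" difference comes from.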

Are these changes tested?

Existing tests

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 10, 2026
@Dandandan
Contributor Author

run benchmark coalesce_kernels

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing avoid_over_allocating (974232e) to 96637fc diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=avoid_over_allocating
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                avoid_over_allocating                  main
-----                                                                                ---------------------                  ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.00    255.1±5.13ms        ? ?/sec    1.00    255.1±4.95ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.00      8.5±0.14ms        ? ?/sec    1.00      8.5±0.08ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      3.9±0.11ms        ? ?/sec    1.04      4.1±0.16ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.00      3.4±0.04ms        ? ?/sec    1.03      3.5±0.03ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.28    309.6±6.07ms        ? ?/sec    1.00    241.0±4.45ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.00      9.1±0.14ms        ? ?/sec    1.02      9.4±0.13ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.00      4.4±0.11ms        ? ?/sec    1.04      4.6±0.14ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.6±0.08ms        ? ?/sec    1.03      4.7±0.14ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.00     57.4±0.45ms        ? ?/sec    1.02     58.6±0.51ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.00     11.3±0.17ms        ? ?/sec    1.02     11.5±0.18ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.00      9.3±0.29ms        ? ?/sec    1.00      9.3±0.38ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.23      9.9±0.21ms        ? ?/sec    1.00      8.0±0.23ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.00     67.6±1.70ms        ? ?/sec    1.02     69.3±0.57ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     12.4±0.24ms        ? ?/sec    1.01     12.5±0.11ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.01      9.9±0.32ms        ? ?/sec    1.00      9.8±0.31ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00      9.5±0.25ms        ? ?/sec    1.02      9.7±0.23ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.00     47.2±0.35ms        ? ?/sec    1.02     48.0±0.42ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.00      5.7±0.04ms        ? ?/sec    1.04      5.9±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.00      4.3±0.16ms        ? ?/sec    1.05      4.6±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.00      2.8±0.07ms        ? ?/sec    1.12      3.1±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.00     57.6±1.23ms        ? ?/sec    1.01     58.4±0.88ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.00      7.8±0.08ms        ? ?/sec    1.03      8.0±0.08ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.01      5.6±0.18ms        ? ?/sec    1.00      5.5±0.09ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.07ms        ? ?/sec    1.04      4.0±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.00     41.0±0.30ms        ? ?/sec    1.01     41.6±0.40ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.4±0.02ms        ? ?/sec    1.05      4.6±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.3±0.20ms        ? ?/sec    1.08      2.5±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00  1344.5±22.48µs        ? ?/sec    1.13  1520.9±23.11µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.00     50.5±0.23ms        ? ?/sec    1.02     51.4±1.10ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.00      6.8±0.04ms        ? ?/sec    1.01      6.9±0.10ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.00      3.4±0.22ms        ? ?/sec    1.11      3.8±0.22ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      3.8±0.02ms        ? ?/sec    1.06      4.0±0.03ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.00     94.4±1.58ms        ? ?/sec    1.01     95.6±1.98ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.00      8.8±0.08ms        ? ?/sec    1.03      9.1±0.04ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.4±0.26ms        ? ?/sec    1.09      3.8±0.06ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.00      2.8±0.02ms        ? ?/sec    1.10      3.1±0.03ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.00    122.3±2.16ms        ? ?/sec    1.02    124.4±0.76ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.00     14.5±0.10ms        ? ?/sec    1.02     14.8±0.22ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.00      6.7±0.39ms        ? ?/sec    1.06      7.1±0.43ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      9.0±0.33ms        ? ?/sec    1.04      9.4±0.07ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.00     64.8±1.60ms        ? ?/sec    1.00     64.5±0.73ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.01      7.4±0.08ms        ? ?/sec    1.00      7.3±0.16ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.00      4.0±0.40ms        ? ?/sec    1.03      4.2±0.23ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.00  1383.6±25.90µs        ? ?/sec    1.12  1553.5±100.89µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.00     82.0±1.26ms        ? ?/sec    1.01     82.5±2.08ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.00     10.8±0.23ms        ? ?/sec    1.07     11.5±0.23ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.00      5.1±0.15ms        ? ?/sec    1.03      5.2±0.15ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.9±0.03ms        ? ?/sec    1.04      4.1±0.03ms        ? ?/sec
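
A note on reading the table above: the number in front of each timing appears to be that run's time relative to the faster of the two runs, so 1.00 marks the faster side. This is an interpretation of the output, not something the bot states. A small sketch with a hypothetical helper `relative_time`, checked against two rows of the table:

```rust
/// Ratio of a measured time to the fastest time for the same benchmark.
/// (Hypothetical helper; both arguments must be in the same unit.)
fn relative_time(time: f64, fastest: f64) -> f64 {
    time / fastest
}

fn main() {
    // mixed_utf8view (max_string_len=20), nulls: 0, selectivity: 0.8 (times in µs):
    // the branch is faster, main is shown as 1.13.
    let main_ratio = relative_time(1520.9, 1344.5);
    // mixed_dict, nulls: 0.1, selectivity: 0.001 (times in ms):
    // main is faster, the branch is shown as 1.28.
    let branch_ratio = relative_time(309.6, 241.0);
    println!("main vs branch: {main_ratio:.2}, branch vs main: {branch_ratio:.2}");
}
```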

@Dandandan
Contributor Author

Dandandan commented Jan 10, 2026

Small improvement (~8% on my machine) on the low-selectivity bench (selectivity 0.8), from avoiding one reallocation.

Contributor

@alamb alamb left a comment

Nice find @Dandandan ❤️

I am slightly worried about this being a bomb for future developers, but I have a suggestion for how to fix it.

@Dandandan
Contributor Author

run benchmark coalesce_kernels

@Dandandan
Contributor Author

Results on my machine in the PR description

@Dandandan Dandandan requested a review from alamb January 10, 2026 13:01
Contributor

@alamb alamb left a comment

🚀

@alamb
Contributor

alamb commented Jan 10, 2026

show benchmark queue

@alamb-ghbot

🤖 Hi @alamb, you asked to view the benchmark queue (#9132 (comment)).

Job User Benchmarks Comment
arrow-9121-3732535494.sh alamb arrow_reader https://github.com/apache/arrow-rs/pull/9121#issuecomment-3732535494
arrow-9079-3732578277.sh alamb row_format https://github.com/apache/arrow-rs/pull/9079#issuecomment-3732578277
arrow-9080-3732579469.sh alamb row_format https://github.com/apache/arrow-rs/pull/9080#issuecomment-3732579469
arrow-9078-3732580631.sh alamb row_format https://github.com/apache/arrow-rs/pull/9078#issuecomment-3732580631
arrow-9132-3732631349.sh Dandandan coalesce_kernels https://github.com/apache/arrow-rs/pull/9132#issuecomment-3732631349
arrow-9091-3732700110.sh alamb json-reader https://github.com/apache/arrow-rs/pull/9091#issuecomment-3732700110
arrow-9086-3732582609.sh alamb json-reader https://github.com/apache/arrow-rs/pull/9086#issuecomment-3732582609

@Dandandan
Contributor Author

Thanks, next up push_batch_with_filter!

@Dandandan
Contributor Author

show benchmark queue

@alamb-ghbot

🤖 Hi @Dandandan, you asked to view the benchmark queue (#9132 (comment)).

Job User Benchmarks Comment
arrow-9121-3732535494.sh alamb arrow_reader https://github.com/apache/arrow-rs/pull/9121#issuecomment-3732535494
arrow-9079-3732578277.sh alamb row_format https://github.com/apache/arrow-rs/pull/9079#issuecomment-3732578277
arrow-9080-3732579469.sh alamb row_format https://github.com/apache/arrow-rs/pull/9080#issuecomment-3732579469
arrow-9078-3732580631.sh alamb row_format https://github.com/apache/arrow-rs/pull/9078#issuecomment-3732580631
arrow-9132-3732631349.sh Dandandan coalesce_kernels https://github.com/apache/arrow-rs/pull/9132#issuecomment-3732631349
arrow-9091-3732700110.sh alamb json-reader https://github.com/apache/arrow-rs/pull/9091#issuecomment-3732700110
arrow-9086-3732582609.sh alamb json-reader https://github.com/apache/arrow-rs/pull/9086#issuecomment-3732582609

@Dandandan
Contributor Author

show benchmark queue

@alamb-ghbot

🤖 Hi @Dandandan, you asked to view the benchmark queue (#9132 (comment)).

Job User Benchmarks Comment
arrow-9080-3732579469.sh alamb row_format https://github.com/apache/arrow-rs/pull/9080#issuecomment-3732579469
arrow-9078-3732580631.sh alamb row_format https://github.com/apache/arrow-rs/pull/9078#issuecomment-3732580631
arrow-9132-3732631349.sh Dandandan coalesce_kernels https://github.com/apache/arrow-rs/pull/9132#issuecomment-3732631349
arrow-9091-3732700110.sh alamb json-reader https://github.com/apache/arrow-rs/pull/9091#issuecomment-3732700110
arrow-9086-3732582609.sh alamb json-reader https://github.com/apache/arrow-rs/pull/9086#issuecomment-3732582609
19728_3732883138.sh Dandandan default https://github.com/apache/datafusion/pull/19728#issuecomment-3732883138
19390_3732922549.sh geoffreyclaude in_list https://github.com/apache/datafusion/pull/19390#issuecomment-3732922549

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing avoid_over_allocating (36149ff) to 96637fc diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=avoid_over_allocating
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                avoid_over_allocating                  main
-----                                                                                ---------------------                  ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.00    254.7±3.91ms        ? ?/sec    1.00    255.7±4.99ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.00      8.5±0.12ms        ? ?/sec    1.01      8.6±0.14ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      3.9±0.12ms        ? ?/sec    1.05      4.1±0.20ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.00      3.4±0.09ms        ? ?/sec    1.03      3.5±0.05ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.31    314.0±7.84ms        ? ?/sec    1.00    239.9±3.88ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.00      9.2±0.22ms        ? ?/sec    1.02      9.4±0.13ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.00      4.4±0.15ms        ? ?/sec    1.02      4.5±0.05ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.5±0.03ms        ? ?/sec    1.04      4.7±0.03ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.00     57.9±2.05ms        ? ?/sec    1.00     57.9±0.97ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.01     11.3±0.20ms        ? ?/sec    1.00     11.2±0.15ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.00      9.1±0.36ms        ? ?/sec    1.08      9.9±0.28ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.21      9.9±0.40ms        ? ?/sec    1.00      8.2±0.56ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.00     67.8±0.78ms        ? ?/sec    1.01     68.5±1.33ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.00     12.4±0.26ms        ? ?/sec    1.01     12.6±0.16ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.00      9.5±0.40ms        ? ?/sec    1.02      9.7±0.28ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00      9.6±0.39ms        ? ?/sec    1.03      9.9±0.51ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.00     46.9±0.40ms        ? ?/sec    1.02     47.9±0.59ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.00      5.6±0.09ms        ? ?/sec    1.05      5.9±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.00      4.3±0.16ms        ? ?/sec    1.06      4.5±0.26ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.00      2.8±0.06ms        ? ?/sec    1.12      3.1±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.00     56.4±0.62ms        ? ?/sec    1.03     58.4±0.44ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.00      7.8±0.11ms        ? ?/sec    1.04      8.1±0.08ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.00      5.4±0.24ms        ? ?/sec    1.00      5.4±0.21ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.8±0.07ms        ? ?/sec    1.05      3.9±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.00     40.9±0.44ms        ? ?/sec    1.03     42.0±0.50ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.00      4.4±0.02ms        ? ?/sec    1.04      4.6±0.10ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.00      2.2±0.14ms        ? ?/sec    1.14      2.5±0.22ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00   1325.4±9.09µs        ? ?/sec    1.17  1544.3±18.08µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.00     50.4±0.21ms        ? ?/sec    1.03     51.7±0.50ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.00      6.9±0.14ms        ? ?/sec    1.01      7.0±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.00      3.4±0.12ms        ? ?/sec    1.05      3.6±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.00      3.8±0.03ms        ? ?/sec    1.06      4.0±0.03ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.00     94.6±0.73ms        ? ?/sec    1.02     96.2±2.38ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.00      8.9±0.37ms        ? ?/sec    1.02      9.1±0.13ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.6±0.35ms        ? ?/sec    1.03      3.7±0.09ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.00      2.8±0.03ms        ? ?/sec    1.10      3.1±0.03ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.00    123.1±2.66ms        ? ?/sec    1.01    124.5±2.24ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.00     14.5±0.22ms        ? ?/sec    1.01     14.7±0.33ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.00      6.5±0.18ms        ? ?/sec    1.10      7.1±0.39ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.00      8.8±0.04ms        ? ?/sec    1.04      9.2±0.14ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.00     64.1±0.85ms        ? ?/sec    1.01     64.7±0.67ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.03      7.5±0.25ms        ? ?/sec    1.00      7.3±0.04ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.00      4.0±0.38ms        ? ?/sec    1.01      4.0±0.25ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.00   1378.2±7.55µs        ? ?/sec    1.00   1378.2±8.63µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.00     81.9±0.97ms        ? ?/sec    1.01     82.7±0.82ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.01     11.0±0.16ms        ? ?/sec    1.00     10.9±0.14ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.02      5.3±0.32ms        ? ?/sec    1.00      5.2±0.14ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.8±0.02ms        ? ?/sec    1.04      3.9±0.02ms        ? ?/sec

@Dandandan Dandandan merged commit 200fb06 into apache:main Jan 10, 2026
27 checks passed
@alamb
Contributor

alamb commented Jan 10, 2026

looks like a nice set of improvements

Dandandan added a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
# Which issue does this PR close?

- Closes apache#9135 

# Rationale for this change
The code was calling `.reserve(batch_size)`, which reserves capacity for at
least `batch_size` additional elements on top of the current length
(https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve).

This also improves performance a bit:

```
filter: primitive, 8192, nulls: 0, selectivity: 0.001
                        time:   [59.509 ms 59.660 ms 59.856 ms]
                        change: [−3.0781% −2.7917% −2.4795%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.01
                        time:   [6.0072 ms 6.0226 ms 6.0428 ms]
                        change: [−8.7042% −7.1161% −6.0455%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0, selectivity: 0.1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.5s, enable flat sampling, or reduce sample count to 50.
filter: primitive, 8192, nulls: 0, selectivity: 0.1
                        time:   [1.8664 ms 1.8709 ms 1.8772 ms]
                        change: [−15.187% −14.905% −14.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.8
                        time:   [2.5191 ms 2.5444 ms 2.5717 ms]
                        change: [−13.064% −11.414% −10.003%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0.1, selectivity: 0.001: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, or reduce sample count to 60.
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001
                        time:   [76.422 ms 76.671 ms 76.973 ms]
                        change: [−5.5096% −4.0229% −2.8048%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.01
                        time:   [10.197 ms 10.228 ms 10.262 ms]
                        change: [−3.6627% −3.0569% −2.4919%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

filter: primitive, 8192, nulls: 0.1, selectivity: 0.1
                        time:   [4.6635 ms 4.6750 ms 4.6915 ms]
                        change: [−9.4939% −8.5908% −7.8383%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.8
                        time:   [4.7777 ms 4.8115 ms 4.8467 ms]
                        change: [−9.9226% −9.1384% −8.3813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

# What changes are included in this PR?
Changes the code to call `self.views.reserve(self.batch_size -
self.views.len())` instead, to avoid allocating more than necessary (i.e. up
to 2x the amount).

# Are these changes tested?

Existing tests

# Are there any user-facing changes?

Labels

arrow (Changes to the arrow crate), performance

Development

Successfully merging this pull request may close these issues.

Overallocating arrays in coalesce primitives / views

3 participants