Avoid overallocating arrays in coalesce primitives / views#9132
Avoid overallocating arrays in coalesce primitives / views#9132Dandandan merged 10 commits intoapache:mainfrom
Conversation
|
run benchmark coalesce_kernels |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Small improvement (~8% on my machine) on low selectivity bench (80%) (avoiding one reallocation). |
alamb
left a comment
There was a problem hiding this comment.
Nice find @Dandandan ❤️
I am slightly worried about this being a bomb for future developers but I have an suggestion of how to fix it
|
run benchmark coalesce_kernels |
|
Results on my machine in the PR description |
|
show benchmark queue |
|
🤖 Hi @alamb, you asked to view the benchmark queue (#9132 (comment)).
|
|
Thanks, next up |
|
show benchmark queue |
|
🤖 Hi @Dandandan, you asked to view the benchmark queue (#9132 (comment)).
|
|
show benchmark queue |
|
🤖 Hi @Dandandan, you asked to view the benchmark queue (#9132 (comment)).
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
looks like a nice set of improvements |
# Which issue does this PR close? - Closes apache#9135 # Rationale for this change The code was calling `.reserve(batch_size)` which reserves space to at least `batch_size` additional elements (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve). This also improves performance a bit: ``` filter: primitive, 8192, nulls: 0, selectivity: 0.001 time: [59.509 ms 59.660 ms 59.856 ms] change: [−3.0781% −2.7917% −2.4795%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe filter: primitive, 8192, nulls: 0, selectivity: 0.01 time: [6.0072 ms 6.0226 ms 6.0428 ms] change: [−8.7042% −7.1161% −6.0455%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe Benchmarking filter: primitive, 8192, nulls: 0, selectivity: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.5s, enable flat sampling, or reduce sample count to 50. filter: primitive, 8192, nulls: 0, selectivity: 0.1 time: [1.8664 ms 1.8709 ms 1.8772 ms] change: [−15.187% −14.905% −14.632%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe filter: primitive, 8192, nulls: 0, selectivity: 0.8 time: [2.5191 ms 2.5444 ms 2.5717 ms] change: [−13.064% −11.414% −10.003%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe Benchmarking filter: primitive, 8192, nulls: 0.1, selectivity: 0.001: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, or reduce sample count to 60. filter: primitive, 8192, nulls: 0.1, selectivity: 0.001 time: [76.422 ms 76.671 ms 76.973 ms] change: [−5.5096% −4.0229% −2.8048%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe filter: primitive, 8192, nulls: 0.1, selectivity: 0.01 time: [10.197 ms 10.228 ms 10.262 ms] change: [−3.6627% −3.0569% −2.4919%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild filter: primitive, 8192, nulls: 0.1, selectivity: 0.1 time: [4.6635 ms 4.6750 ms 4.6915 ms] change: [−9.4939% −8.5908% −7.8383%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe filter: primitive, 8192, nulls: 0.1, selectivity: 0.8 time: [4.7777 ms 4.8115 ms 4.8467 ms] change: [−9.9226% −9.1384% −8.3813%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ``` # What changes are included in this PR? Changes it to use `self.views.reserve(self.batch_size - self.views.len())` to avoid allocating more than necessary (i.e. 2x the amount). # Are these changes tested? Existing tests # Are there any user-facing changes?
Which issue does this PR close?
Rationale for this change
The code was calling
.reserve(batch_size)which reserves space to at leastbatch_sizeadditional elements (https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve).This also improves performance a bit:
What changes are included in this PR?
Changes it to use
self.views.reserve(self.batch_size - self.views.len())to avoid allocating more than necessary (i.e. 2x the amount).Are these changes tested?
Existing tests
Are there any user-facing changes?