perf: improve calculating length performance for nested arrays in row conversion #9079

Merged
Dandandan merged 4 commits into apache:main from
rluvaton:improve-performance-of-rows-encoding-for-calculating-rows-length
Jan 14, 2026

Conversation

@rluvaton
Member

@rluvaton rluvaton commented Dec 31, 2025

Which issue does this PR close?

N/A

Rationale for this change

Make the row length calculation faster, which results in faster row conversion.

What changes are included in this PR?

  1. Instead of iterating over the rows and getting each length from the row's byte slice, we use the offsets directly
  2. Added 3 new APIs for Rows (explained below)
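
To illustrate the first point, here is a minimal sketch of the two ways to compute row lengths. It is not the code from this PR: `RowBuffer`, its fields, and both method names are hypothetical, and it assumes a layout where all rows sit back to back in one buffer and row `i` spans `offsets[i]..offsets[i + 1]`.

```rust
/// Hypothetical stand-in for a row container (not the arrow-row type).
/// Row `i` occupies `buffer[offsets[i]..offsets[i + 1]]`, so `offsets`
/// holds `num_rows + 1` entries and starts with 0.
pub struct RowBuffer {
    pub buffer: Vec<u8>,
    pub offsets: Vec<usize>,
}

impl RowBuffer {
    /// Slice-based approach: build each row's byte slice, then ask for its length.
    /// Every slice construction bounds-checks against `buffer`.
    pub fn lengths_via_slices(&self) -> Vec<usize> {
        (0..self.offsets.len().saturating_sub(1))
            .map(|i| self.buffer[self.offsets[i]..self.offsets[i + 1]].len())
            .collect()
    }

    /// Offset-based approach: a row's length is the difference between two
    /// adjacent offsets, so the data buffer is never touched.
    pub fn lengths_via_offsets(&self) -> Vec<usize> {
        self.offsets.windows(2).map(|w| w[1] - w[0]).collect()
    }
}
```

Both methods return the same values; the offset-based one only reads the offsets array, which is the idea described in item 1.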

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, added 3 functions to Rows:

  • row_len - get the length of the row at the given index
  • row_len_unchecked - get the length of the row at the given index without bounds checks
  • lengths - get an iterator over the lengths of all rows
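
The shape of these three functions can be sketched on a toy type. Only the function names come from this PR; `MiniRows`, its `new` and `num_rows` helpers, the exact signatures, and the panic behavior are assumptions made for illustration, so check the `Rows` documentation for the real API.

```rust
/// Hypothetical toy type: only the offsets are kept, since that is all
/// the length functions need. Row `i` spans `offsets[i]..offsets[i + 1]`.
pub struct MiniRows {
    offsets: Vec<usize>,
}

impl MiniRows {
    /// Assumed constructor: offsets must start the first row and never decrease.
    pub fn new(offsets: Vec<usize>) -> Self {
        assert!(!offsets.is_empty(), "offsets needs at least one entry");
        assert!(offsets.windows(2).all(|w| w[0] <= w[1]), "offsets must be non-decreasing");
        Self { offsets }
    }

    pub fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Length of the row at `row`; panics if `row` is out of bounds.
    pub fn row_len(&self, row: usize) -> usize {
        assert!(row < self.num_rows(), "row index out of bounds");
        self.offsets[row + 1] - self.offsets[row]
    }

    /// Length of the row at `row` without bounds checks.
    ///
    /// # Safety
    /// The caller must guarantee `row < self.num_rows()`.
    pub unsafe fn row_len_unchecked(&self, row: usize) -> usize {
        let end = unsafe { *self.offsets.get_unchecked(row + 1) };
        let start = unsafe { *self.offsets.get_unchecked(row) };
        end - start
    }

    /// Iterator over the lengths of all rows, in order.
    pub fn lengths(&self) -> impl Iterator<Item = usize> + '_ {
        self.offsets.windows(2).map(|w| w[1] - w[0])
    }
}
```

In this sketch the unchecked variant is `unsafe` because an out-of-range index would read past the offsets array; the checked variant and the iterator need no such contract.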

Related to:

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 31, 2025
@rluvaton
Member Author

run benchmark row_format

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-performance-of-rows-encoding-for-calculating-rows-length (314bccb) to 843bee2 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-performance-of-rows-encoding-for-calculating-rows-length
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                                                         improve-performance-of-rows-encoding-for-calculating-rows-length    main
-----                                                                                                                         ----------------------------------------------------------------    ----
append_rows 10 large_list(0) of u64(0)                                                                                        1.06    679.0±9.64ns        ? ?/sec                                 1.00    641.6±3.59ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                                                                              1.03    724.9±8.12ns        ? ?/sec                                 1.00    701.2±6.21ns        ? ?/sec
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.00    370.0±2.66µs        ? ?/sec                                 1.00    368.2±2.37µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.00      8.6±0.15µs        ? ?/sec                                 1.00      8.6±0.04µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     17.0±0.16µs        ? ?/sec                                 1.00     17.0±0.16µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.00      7.7±0.23µs        ? ?/sec                                 1.01      7.8±0.20µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.00     15.3±0.16µs        ? ?/sec                                 1.01     15.4±0.64µs        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                                                                                      1.04    169.2±1.98µs        ? ?/sec                                 1.00    163.0±1.18µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                         1.05   960.2±14.65ns        ? ?/sec                                 1.00    916.8±9.70ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                                                                            1.00    165.0±0.67µs        ? ?/sec                                 1.01    166.2±1.54µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                                                                               1.02   1032.2±4.55ns        ? ?/sec                                 1.00   1014.1±6.97ns        ? ?/sec
append_rows 4096 string view(1..100, 0)                                                                                       1.00    114.4±1.49µs        ? ?/sec                                 1.00    114.9±1.59µs        ? ?/sec
append_rows 4096 string view(1..100, 0.5)                                                                                     1.01    103.3±4.52µs        ? ?/sec                                 1.00    102.6±0.69µs        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.00     52.0±0.36µs        ? ?/sec                                 1.00     52.1±1.26µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.00     75.9±1.26µs        ? ?/sec                                 1.01     76.5±0.81µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.00     85.3±1.02µs        ? ?/sec                                 1.00     85.5±0.61µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.00     54.1±0.24µs        ? ?/sec                                 1.01     54.4±1.76µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.00     48.2±0.64µs        ? ?/sec                                 1.01     48.6±0.79µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.00     71.5±0.79µs        ? ?/sec                                 1.01     72.0±0.78µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     81.6±0.55µs        ? ?/sec                                 1.00     81.5±0.42µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.00    219.9±1.87µs        ? ?/sec                                 1.00    220.2±1.38µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.00     49.3±0.15µs        ? ?/sec                                 1.00     49.4±0.38µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.01     75.3±0.62µs        ? ?/sec                                 1.00     74.7±0.52µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.01    145.5±3.81µs        ? ?/sec                                 1.00    144.1±1.76µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.01    108.8±1.26µs        ? ?/sec                                 1.00    108.1±1.73µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.01     78.5±0.95µs        ? ?/sec                                 1.00     78.1±0.17µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.01     27.2±0.52µs        ? ?/sec                                 1.00     26.9±0.51µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.00     46.5±0.29µs        ? ?/sec                                 1.02     47.5±1.05µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.00     27.2±0.24µs        ? ?/sec                                 1.01     27.3±0.17µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.00      7.6±0.11µs        ? ?/sec                                 1.01      7.7±0.11µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     14.6±0.11µs        ? ?/sec                                 1.00     14.7±0.16µs        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                                                                                    1.02   964.5±34.57ns        ? ?/sec                                 1.00    945.3±4.36ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                                                                          1.00  1002.4±12.15ns        ? ?/sec                                 1.00  1002.3±16.59ns        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.01    373.4±2.62µs        ? ?/sec                                 1.00    370.5±3.57µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.00      8.9±0.11µs        ? ?/sec                                 1.01      8.9±0.05µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     17.3±0.14µs        ? ?/sec                                 1.00     17.2±0.11µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.02      8.1±0.24µs        ? ?/sec                                 1.00      8.0±0.15µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.00     15.5±0.30µs        ? ?/sec                                 1.00     15.5±0.23µs        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                                                                                  1.03    169.1±0.53µs        ? ?/sec                                 1.00    163.4±4.11µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)                                                                     1.05  1261.6±37.35ns        ? ?/sec                                 1.00   1206.5±6.41ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                                                                        1.00    166.4±2.33µs        ? ?/sec                                 1.00    166.5±1.80µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                                                                           1.01  1336.2±17.33ns        ? ?/sec                                 1.00   1325.0±6.66ns        ? ?/sec
convert_columns 4096 string view(1..100, 0)                                                                                   1.00    114.7±0.51µs        ? ?/sec                                 1.00    114.9±0.50µs        ? ?/sec
convert_columns 4096 string view(1..100, 0.5)                                                                                 1.00    103.0±0.83µs        ? ?/sec                                 1.00    103.2±1.40µs        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.01     53.0±0.23µs        ? ?/sec                                 1.00     52.5±0.24µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.00     77.1±0.88µs        ? ?/sec                                 1.01     77.9±1.87µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     86.2±1.09µs        ? ?/sec                                 1.00     86.1±0.41µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.01     55.3±2.25µs        ? ?/sec                                 1.00     54.7±1.14µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.00     48.1±0.38µs        ? ?/sec                                 1.01     48.8±0.74µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.00     72.5±1.56µs        ? ?/sec                                 1.00     72.2±0.68µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     82.2±0.41µs        ? ?/sec                                 1.00     82.0±0.38µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.00    220.9±1.82µs        ? ?/sec                                 1.01    223.3±3.11µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.00     49.7±0.20µs        ? ?/sec                                 1.00     49.8±0.51µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.00     76.8±1.09µs        ? ?/sec                                 1.00     76.7±0.82µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.01    147.5±1.48µs        ? ?/sec                                 1.00    145.8±1.12µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.00    109.8±0.74µs        ? ?/sec                                 1.00    109.3±1.28µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.00     79.4±0.41µs        ? ?/sec                                 1.00     79.5±0.38µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.01     28.3±0.33µs        ? ?/sec                                 1.00     27.9±0.28µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.00     47.5±0.15µs        ? ?/sec                                 1.00     47.5±1.34µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.01     28.3±0.45µs        ? ?/sec                                 1.00     28.1±0.67µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.00      7.8±0.13µs        ? ?/sec                                 1.00      7.9±0.12µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     14.9±0.08µs        ? ?/sec                                 1.00     14.8±0.13µs        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                                                                           1.08    759.0±8.64ns        ? ?/sec                                 1.00   700.4±11.06ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                                                                                 1.06   802.5±21.76ns        ? ?/sec                                 1.00    756.5±7.68ns        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    370.8±3.83µs        ? ?/sec                                 1.00   370.1±16.68µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.00      8.8±0.05µs        ? ?/sec                                 1.00      8.7±0.08µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     17.2±0.22µs        ? ?/sec                                 1.00     17.2±0.12µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.00      7.8±0.14µs        ? ?/sec                                 1.00      7.8±0.13µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.01     15.5±0.15µs        ? ?/sec                                 1.00     15.4±0.23µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                                                                         1.04    169.5±0.54µs        ? ?/sec                                 1.00    162.7±0.48µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)                                                            1.05  1049.8±22.05ns        ? ?/sec                                 1.00  1001.7±38.20ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                                                                               1.00    165.4±0.67µs        ? ?/sec                                 1.00    166.2±0.84µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)                                                                  1.04  1153.0±12.41ns        ? ?/sec                                 1.00  1106.6±29.55ns        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0)                                                                          1.00    114.8±1.38µs        ? ?/sec                                 1.00    114.6±0.41µs        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0.5)                                                                        1.00    103.0±0.85µs        ? ?/sec                                 1.00    102.9±0.90µs        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.01     52.6±0.42µs        ? ?/sec                                 1.00     52.2±0.96µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.00     76.1±0.73µs        ? ?/sec                                 1.00     76.1±1.04µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     85.9±0.40µs        ? ?/sec                                 1.00     85.6±0.41µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.00     54.0±1.03µs        ? ?/sec                                 1.01     54.4±2.40µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.00     47.8±0.19µs        ? ?/sec                                 1.01     48.5±0.22µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     71.9±0.71µs        ? ?/sec                                 1.00     71.7±0.87µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.01     82.4±3.36µs        ? ?/sec                                 1.00     81.9±0.30µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.00    220.9±3.14µs        ? ?/sec                                 1.02   224.2±13.62µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.00     49.7±0.74µs        ? ?/sec                                 1.00     49.6±0.50µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.00     75.3±0.73µs        ? ?/sec                                 1.00     75.2±1.23µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    144.4±1.75µs        ? ?/sec                                 1.00    144.5±2.05µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.02    109.3±0.57µs        ? ?/sec                                 1.00    107.6±1.28µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.00     78.8±0.26µs        ? ?/sec                                 1.00     78.4±0.66µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.02     27.4±0.68µs        ? ?/sec                                 1.00     27.0±0.28µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.00     46.8±0.29µs        ? ?/sec                                 1.00     46.6±0.21µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.00     27.3±0.46µs        ? ?/sec                                 1.01     27.5±0.80µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.8±0.14µs        ? ?/sec                                 1.01      7.8±0.17µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     14.7±0.13µs        ? ?/sec                                 1.00     14.8±0.26µs        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                                                                                       1.03  1562.8±27.98ns        ? ?/sec                                 1.00   1520.5±8.11ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                                                                             1.02  1734.5±30.12ns        ? ?/sec                                 1.00  1706.9±37.56ns        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    298.8±5.58µs        ? ?/sec                                 1.03   306.3±14.82µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.00     16.5±0.46µs        ? ?/sec                                 1.01     16.6±1.30µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.00     16.5±0.34µs        ? ?/sec                                 1.00     16.5±0.29µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.00     34.8±0.34µs        ? ?/sec                                 1.00     34.7±0.13µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.01     34.9±1.65µs        ? ?/sec                                 1.00     34.7±0.38µs        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                                                                                     1.00    268.0±9.28µs        ? ?/sec                                 1.01    269.7±1.45µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                        1.06      2.1±0.03µs        ? ?/sec                                 1.00  1960.1±24.29ns        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                                                                           1.00    269.7±1.08µs        ? ?/sec                                 1.00    269.6±3.09µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                                                                              1.03      2.2±0.04µs        ? ?/sec                                 1.00      2.2±0.02µs        ? ?/sec
convert_rows 4096 string view(1..100, 0)                                                                                      1.01    176.5±4.01µs        ? ?/sec                                 1.00    175.4±0.73µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)                                                                                    1.00    140.6±1.06µs        ? ?/sec                                 1.00    141.0±0.80µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.00     83.3±0.60µs        ? ?/sec                                 1.01     84.3±3.60µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.01    130.1±7.97µs        ? ?/sec                                 1.00    128.6±1.45µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.01    119.1±0.80µs        ? ?/sec                                 1.00    117.9±1.11µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.01     95.1±5.43µs        ? ?/sec                                 1.00     94.0±0.44µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.00     60.3±0.30µs        ? ?/sec                                 1.00     60.3±0.98µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.00    110.3±2.16µs        ? ?/sec                                 1.01    111.1±3.71µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.00    103.3±0.84µs        ? ?/sec                                 1.00    103.8±3.52µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    302.0±3.21µs        ? ?/sec                                 1.02    307.4±7.47µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.00     72.6±2.06µs        ? ?/sec                                 1.01     73.4±3.75µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.00     60.4±0.76µs        ? ?/sec                                 1.00     60.7±0.65µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.00    110.4±2.89µs        ? ?/sec                                 1.01    111.4±1.99µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.00    103.6±1.39µs        ? ?/sec                                 1.00    104.1±2.15µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.00     72.9±1.13µs        ? ?/sec                                 1.00     72.9±1.67µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.00     60.3±0.73µs        ? ?/sec                                 1.00     60.3±0.19µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    110.0±1.84µs        ? ?/sec                                 1.00    110.5±2.12µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.01     73.2±4.36µs        ? ?/sec                                 1.00     72.5±0.43µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.00     32.0±0.23µs        ? ?/sec                                 1.00     32.0±0.26µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.00     32.0±0.13µs        ? ?/sec                                 1.00     32.0±0.33µs        ? ?/sec
iterate rows                                                                                                                  1.00      2.6±0.07µs        ? ?/sec                                 1.00      2.6±0.01µs        ? ?/sec

@alamb
Contributor

alamb commented Jan 10, 2026

run benchmark row_format

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-performance-of-rows-encoding-for-calculating-rows-length (4b4f41c) to 5a1e482 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-performance-of-rows-encoding-for-calculating-rows-length
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                                                         improve-performance-of-rows-encoding-for-calculating-rows-length    main
-----                                                                                                                         ----------------------------------------------------------------    ----
append_rows 10 large_list(0) of u64(0)                                                                                        1.01   629.0±15.21ns        ? ?/sec                                 1.00    625.2±8.78ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                                                                              1.00    681.2±2.25ns        ? ?/sec                                 1.00    683.6±4.38ns        ? ?/sec
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.01   372.7±17.83µs        ? ?/sec                                 1.00    370.7±2.72µs        ? ?/sec
append_rows 4096 53 columns                                                                                                   1.00   1751.0±8.31µs        ? ?/sec                                 1.00  1755.4±12.84µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.00      8.6±0.14µs        ? ?/sec                                 1.00      8.6±0.03µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.01     17.1±0.30µs        ? ?/sec                                 1.00     16.9±0.12µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.00      7.9±0.18µs        ? ?/sec                                 1.00      7.9±0.11µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.00     17.6±0.11µs        ? ?/sec                                 1.00     17.7±0.21µs        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                                                                                      1.00    165.6±1.79µs        ? ?/sec                                 1.00    165.8±5.82µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                         1.01   899.8±13.45ns        ? ?/sec                                 1.00   888.5±40.06ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                                                                            1.07    172.0±3.15µs        ? ?/sec                                 1.00    161.1±0.66µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                                                                               1.01   1006.4±7.54ns        ? ?/sec                                 1.00   991.7±48.07ns        ? ?/sec
append_rows 4096 string view(1..100, 0)                                                                                       1.01    117.1±3.83µs        ? ?/sec                                 1.00    115.8±1.67µs        ? ?/sec
append_rows 4096 string view(1..100, 0.5)                                                                                     1.00    104.4±0.92µs        ? ?/sec                                 1.00    104.7±2.35µs        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.00     53.7±0.56µs        ? ?/sec                                 1.00     53.7±0.65µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.01     76.9±0.82µs        ? ?/sec                                 1.00     76.1±0.90µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.01     86.9±0.53µs        ? ?/sec                                 1.00     86.4±0.56µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.01     56.7±0.76µs        ? ?/sec                                 1.00     56.1±0.23µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.00     47.8±0.38µs        ? ?/sec                                 1.00     47.8±0.37µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.02     73.0±1.61µs        ? ?/sec                                 1.00     71.7±1.73µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     83.5±0.57µs        ? ?/sec                                 1.01     83.9±0.63µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.00    228.6±1.58µs        ? ?/sec                                 1.00    229.2±1.56µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.00     49.5±0.44µs        ? ?/sec                                 1.00     49.5±0.35µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.00     75.3±1.02µs        ? ?/sec                                 1.00     75.5±0.69µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.00    144.7±1.75µs        ? ?/sec                                 1.00    144.3±1.09µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.00    107.9±1.63µs        ? ?/sec                                 1.00    108.2±0.46µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.00     77.4±1.19µs        ? ?/sec                                 1.00     77.4±1.44µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.00     27.0±0.62µs        ? ?/sec                                 1.01     27.2±0.53µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.00     46.4±0.65µs        ? ?/sec                                 1.01     46.8±0.37µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.00     27.4±0.57µs        ? ?/sec                                 1.02     27.8±0.33µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.00      7.6±0.24µs        ? ?/sec                                 1.02      7.7±0.06µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     13.9±0.65µs        ? ?/sec                                 1.00     13.9±0.22µs        ? ?/sec
append_rows 8192 53 columns                                                                                                   1.00      3.7±0.10ms        ? ?/sec                                 1.02      3.8±0.07ms        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                                                                                    1.00    895.8±9.88ns        ? ?/sec                                 1.03   924.0±12.32ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                                                                          1.00    948.4±5.09ns        ? ?/sec                                 1.03    977.8±4.85ns        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.00    375.2±6.29µs        ? ?/sec                                 1.00    374.7±3.55µs        ? ?/sec
convert_columns 4096 53 columns                                                                                               1.00  1751.6±11.56µs        ? ?/sec                                 1.00  1753.7±13.93µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.00      8.8±0.10µs        ? ?/sec                                 1.01      8.9±0.13µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     17.2±0.32µs        ? ?/sec                                 1.00     17.3±0.16µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.00      7.9±0.14µs        ? ?/sec                                 1.01      8.0±0.13µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.01     18.0±0.29µs        ? ?/sec                                 1.00     17.8±0.13µs        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                                                                                  1.00    166.2±0.83µs        ? ?/sec                                 1.00    166.0±0.82µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)                                                                     1.00  1181.0±14.29ns        ? ?/sec                                 1.00  1181.5±25.82ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                                                                        1.06    172.4±1.41µs        ? ?/sec                                 1.00    161.9±2.21µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                                                                           1.00  1282.2±17.66ns        ? ?/sec                                 1.01   1298.0±6.43ns        ? ?/sec
convert_columns 4096 string view(1..100, 0)                                                                                   1.00    116.7±1.51µs        ? ?/sec                                 1.00    116.5±3.54µs        ? ?/sec
convert_columns 4096 string view(1..100, 0.5)                                                                                 1.00    105.2±1.61µs        ? ?/sec                                 1.01    106.1±6.98µs        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.01     55.1±1.49µs        ? ?/sec                                 1.00     54.6±0.67µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.03     78.4±0.84µs        ? ?/sec                                 1.00     76.0±1.00µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     87.3±0.52µs        ? ?/sec                                 1.00     87.3±1.96µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.00     57.9±1.11µs        ? ?/sec                                 1.00     57.7±1.50µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.00     48.5±0.39µs        ? ?/sec                                 1.01     48.7±1.19µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.01     73.0±1.15µs        ? ?/sec                                 1.00     72.1±1.11µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     84.0±1.40µs        ? ?/sec                                 1.00     84.1±0.94µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.01    231.3±1.31µs        ? ?/sec                                 1.00    228.7±2.39µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.00     50.0±1.29µs        ? ?/sec                                 1.00     49.9±0.24µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.00     76.4±0.83µs        ? ?/sec                                 1.00     76.4±0.74µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.00    145.6±1.26µs        ? ?/sec                                 1.01    147.1±1.72µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.00    110.4±1.82µs        ? ?/sec                                 1.00    110.2±1.93µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.00     78.2±0.83µs        ? ?/sec                                 1.00     78.1±0.42µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.00     27.9±0.54µs        ? ?/sec                                 1.00     28.0±0.93µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.00     48.0±0.38µs        ? ?/sec                                 1.01     48.3±0.10µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.00     28.1±0.39µs        ? ?/sec                                 1.00     28.2±0.36µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.00      7.9±0.06µs        ? ?/sec                                 1.00      7.9±0.13µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     14.0±0.08µs        ? ?/sec                                 1.00     14.1±0.18µs        ? ?/sec
convert_columns 8192 53 columns                                                                                               1.00      3.8±0.04ms        ? ?/sec                                 1.02      3.8±0.12ms        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                                                                           1.00    686.1±6.25ns        ? ?/sec                                 1.01    692.0±7.21ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                                                                                 1.00    738.6±3.71ns        ? ?/sec                                 1.01    745.1±4.03ns        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    373.2±4.90µs        ? ?/sec                                 1.00   373.6±18.87µs        ? ?/sec
convert_columns_prepared 4096 53 columns                                                                                      1.00  1757.1±41.44µs        ? ?/sec                                 1.00  1751.0±14.15µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.00      8.7±0.09µs        ? ?/sec                                 1.00      8.7±0.06µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     17.4±1.18µs        ? ?/sec                                 1.00     17.4±1.16µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.00      8.0±0.16µs        ? ?/sec                                 1.03      8.3±1.19µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.00     17.9±0.49µs        ? ?/sec                                 1.00     17.9±0.54µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                                                                         1.01    166.4±1.97µs        ? ?/sec                                 1.00    165.4±1.74µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)                                                            1.00   969.3±13.03ns        ? ?/sec                                 1.01   979.3±22.02ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                                                                               1.07    172.4±2.12µs        ? ?/sec                                 1.00    161.4±0.56µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)                                                                  1.01  1090.9±18.64ns        ? ?/sec                                 1.00  1079.6±11.15ns        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0)                                                                          1.00    116.2±0.61µs        ? ?/sec                                 1.00    115.9±0.83µs        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0.5)                                                                        1.00    105.0±1.25µs        ? ?/sec                                 1.00    104.6±0.45µs        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.01     54.3±0.41µs        ? ?/sec                                 1.00     53.7±0.30µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.02     77.9±6.86µs        ? ?/sec                                 1.00     76.0±1.18µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     87.2±0.32µs        ? ?/sec                                 1.00     86.8±0.36µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.01     57.1±0.44µs        ? ?/sec                                 1.00     56.5±1.01µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.00     48.2±0.25µs        ? ?/sec                                 1.00     48.1±0.45µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     72.2±0.95µs        ? ?/sec                                 1.01     72.7±0.48µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.00     83.9±2.18µs        ? ?/sec                                 1.00     84.0±0.64µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.00    227.9±4.15µs        ? ?/sec                                 1.01    230.6±6.52µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.00     49.7±0.19µs        ? ?/sec                                 1.00     49.7±0.20µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.00     75.3±0.53µs        ? ?/sec                                 1.01     75.8±1.40µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    144.8±1.30µs        ? ?/sec                                 1.00    145.0±1.67µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.00    107.9±0.86µs        ? ?/sec                                 1.00    108.4±1.16µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.01     78.4±0.98µs        ? ?/sec                                 1.00     77.3±2.50µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.01     27.3±1.19µs        ? ?/sec                                 1.00     27.1±0.12µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.00     46.2±0.50µs        ? ?/sec                                 1.01     46.7±0.41µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.00     27.3±0.17µs        ? ?/sec                                 1.02     27.9±0.72µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.7±0.14µs        ? ?/sec                                 1.02      7.8±0.26µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     13.9±0.17µs        ? ?/sec                                 1.00     13.9±0.10µs        ? ?/sec
convert_columns_prepared 8192 53 columns                                                                                      1.00      3.8±0.07ms        ? ?/sec                                 1.01      3.8±0.06ms        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                                                                                       1.01  1537.6±94.57ns        ? ?/sec                                 1.00  1528.3±17.50ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                                                                             1.00  1709.1±14.15ns        ? ?/sec                                 1.00  1706.3±33.49ns        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    302.8±2.93µs        ? ?/sec                                 1.01    306.6±3.38µs        ? ?/sec
convert_rows 4096 53 columns                                                                                                  1.01      3.0±0.03ms        ? ?/sec                                 1.00      3.0±0.09ms        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.00     16.5±0.13µs        ? ?/sec                                 1.00     16.4±0.16µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.00     16.5±0.51µs        ? ?/sec                                 1.00     16.5±0.22µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.00     33.2±0.39µs        ? ?/sec                                 1.00     33.2±0.24µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.01     33.4±0.85µs        ? ?/sec                                 1.00     33.2±0.33µs        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                                                                                     1.00    269.2±2.82µs        ? ?/sec                                 1.01    271.7±5.42µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                        1.05      2.1±0.02µs        ? ?/sec                                 1.00  1973.9±16.39ns        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                                                                           1.02    275.4±2.85µs        ? ?/sec                                 1.00    269.0±1.44µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                                                                              1.01      2.2±0.03µs        ? ?/sec                                 1.00      2.1±0.03µs        ? ?/sec
convert_rows 4096 string view(1..100, 0)                                                                                      1.00    175.9±1.53µs        ? ?/sec                                 1.00    175.5±2.24µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)                                                                                    1.01    141.5±1.92µs        ? ?/sec                                 1.00    139.8±0.56µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.01     84.3±0.38µs        ? ?/sec                                 1.00     83.4±0.63µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.01    129.0±2.65µs        ? ?/sec                                 1.00    128.3±0.79µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.02    120.0±1.50µs        ? ?/sec                                 1.00    117.6±0.37µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.02     94.7±1.66µs        ? ?/sec                                 1.00     93.3±1.28µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.01     60.8±1.24µs        ? ?/sec                                 1.00     60.4±0.40µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.02    111.7±1.63µs        ? ?/sec                                 1.00    109.8±1.24µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.00    103.7±0.29µs        ? ?/sec                                 1.00    103.2±0.63µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    301.9±6.27µs        ? ?/sec                                 1.00    300.7±6.49µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.02     74.6±3.05µs        ? ?/sec                                 1.00     73.5±0.62µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.02     61.3±4.62µs        ? ?/sec                                 1.00     60.3±0.26µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.01    111.3±1.08µs        ? ?/sec                                 1.00    110.5±3.46µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.00    104.1±0.86µs        ? ?/sec                                 1.00    103.8±1.52µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.01     74.2±0.53µs        ? ?/sec                                 1.00     73.8±1.76µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.00     60.8±0.92µs        ? ?/sec                                 1.00     60.5±0.77µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.01    111.3±1.56µs        ? ?/sec                                 1.00    110.1±1.65µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.00     74.2±0.60µs        ? ?/sec                                 1.00     73.9±1.76µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.00     32.6±0.28µs        ? ?/sec                                 1.00     32.7±0.32µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.01     32.8±0.93µs        ? ?/sec                                 1.00     32.6±0.14µs        ? ?/sec
convert_rows 8192 53 columns                                                                                                  1.00      7.2±0.12ms        ? ?/sec                                 1.01      7.3±0.23ms        ? ?/sec
iterate rows                                                                                                                  1.00      2.6±0.01µs        ? ?/sec                                 1.00      2.6±0.01µs        ? ?/sec

alamb added a commit that referenced this pull request Jan 13, 2026
…ow conversion (#9080)

# Which issue does this PR close?

N/A

# Rationale for this change

Make the row length calculation faster, which results in faster row
conversion

# What changes are included in this PR?

1. Instead of iterating over the bytes and getting the length from the
byte slice, we use the offsets directly. This is faster because it avoids
going to the buffer
2. Added new API for `GenericByteViewArray` (explained below)
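
As a rough illustration of item 1 (a minimal sketch, not the arrow-rs implementation): assume a hypothetical layout with a flat `values` byte buffer and an `offsets` slice with one more entry than there are items. The function names below are made up for this example. Both functions return the same lengths, but the second never reads `values`.

```rust
/// Lengths obtained by slicing the value buffer and calling `len()` on each slice.
/// (Hypothetical helper for illustration only.)
fn lengths_via_slices(values: &[u8], offsets: &[usize]) -> Vec<usize> {
    offsets
        .windows(2)
        .map(|w| values[w[0]..w[1]].len())
        .collect()
}

/// Lengths obtained from adjacent offsets only; the value buffer is never touched.
/// (Hypothetical helper for illustration only.)
fn lengths_via_offsets(offsets: &[usize]) -> Vec<usize> {
    offsets.windows(2).map(|w| w[1] - w[0]).collect()
}
```

With `offsets = [0, 5, 5, 11]` over an 11-byte buffer, both helpers report the lengths 5, 0 and 6.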

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, added `lengths` function to `GenericByteViewArray` to get an
iterator over the lengths of the items in the array

-----

Related to:
- #9078 
- #9079

---------

Co-authored-by: Andrew Lamb <[email protected]>
…culating-rows-length

# Conflicts:
#	arrow-row/src/lib.rs
@rluvaton
Member Author

@alamb can you please review, and hopefully merge if approved with no comments?

Contributor

@alamb alamb left a comment

Thank you for this PR @rluvaton

I have some concerns about the introduction of unsafe APIs and it wasn't clear how this would improve performance (and it doesn't seem like the benchmarks show any improvements)

Did you see improvements in any of your internal benchmarks?

}
}

/// Returns the length of the row at index `row` in bytes
Contributor

Can you please document that "length" means bytes here (not, for example, columns)

Member Author

Rephrased

///
/// # Safety
/// Caller must ensure that `index` is less than the number of offsets (#rows + 1)
pub unsafe fn row_len_unchecked(&self, index: usize) -> usize {
Contributor

Why do we need this new pub unsafe API?

In terms of pub, it doesn't seem to be used anywhere other than row_len, so we could at least make it non-pub

In terms of unsafe, given that row_len does an assert, what is the rationale for using unsafe rather than just normal array access?

As in, why not

    pub fn row_len(&self, row: usize) -> usize {
        self.offsets[row + 1] - self.offsets[row]
    }

That would be simpler and not use unsafe

In theory it may incur one extra bounds check, but unless the performance gains are compelling, I think we should avoid adding new unsafe when possible
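
For illustration, here is a self-contained sketch of the safe, offsets-based approach suggested above. `RowsSketch` is a hypothetical, simplified stand-in (a flat byte buffer plus `rows + 1` offsets), not the actual `Rows` type in arrow-row; the `lengths` iterator is one possible way to expose all lengths and is an assumption about shape, not a copy of the merged code.

```rust
/// Hypothetical, simplified row container: a flat buffer plus `rows + 1` offsets.
pub struct RowsSketch {
    buffer: Vec<u8>,
    offsets: Vec<usize>,
}

impl RowsSketch {
    /// Concatenates the given rows and records where each one ends.
    pub fn new(rows: &[&[u8]]) -> Self {
        let mut buffer = Vec::new();
        let mut offsets = vec![0];
        for row in rows {
            buffer.extend_from_slice(row);
            offsets.push(buffer.len());
        }
        Self { buffer, offsets }
    }

    /// Returns the bytes of row `row`. Panics if `row` is out of bounds.
    pub fn row(&self, row: usize) -> &[u8] {
        &self.buffer[self.offsets[row]..self.offsets[row + 1]]
    }

    /// Returns the length of row `row` in bytes, using only the offsets.
    /// Plain indexing keeps this safe: an out-of-range `row` panics.
    pub fn row_len(&self, row: usize) -> usize {
        self.offsets[row + 1] - self.offsets[row]
    }

    /// Iterates over all row lengths in bytes without touching the buffer.
    pub fn lengths(&self) -> impl ExactSizeIterator<Item = usize> + '_ {
        self.offsets.windows(2).map(|w| w[1] - w[0])
    }
}
```

The bounds checks here come from ordinary slice indexing, which is the trade-off discussed in the comment: slightly more checking in exchange for no `unsafe`.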

Member Author

Removed the unchecked version and kept only row_len

@rluvaton
Member Author

> and it wasn't clear how this would improve performance

Although the benchmarks show no improvement, there isn't really a benchmark that goes through that path extensively.

This should improve the performance nonetheless as we are now doing less computation than before.

@rluvaton
Member Author

show benchmark queue

Dandandan pushed a commit that referenced this pull request Jan 14, 2026
…n row conversion (#9078)

# Which issue does this PR close?

N/A

# Rationale for this change

Making the row length calculation faster, which results in faster row
conversion

# What changes are included in this PR?

Instead of iterating over the items in the array and getting the length
from the byte slice, we use the offsets directly and zip with nulls if
necessary
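
As a rough illustration of "use the offsets directly and zip with nulls", the sketch below derives per-item lengths of a list-like array from its offsets and pairs them with validity. `list_lengths` is a hypothetical helper, and representing validity as `&[bool]` is a simplification (Arrow stores it as a packed bitmap); what the caller does with a null item is left open.

```rust
/// Computes the length of each list item from consecutive offsets.
/// When a validity slice is given, null items are reported as `None`.
pub fn list_lengths(offsets: &[i32], validity: Option<&[bool]>) -> Vec<Option<usize>> {
    // Each adjacent pair of offsets delimits one item; no child values are read.
    let lens = offsets.windows(2).map(|w| (w[1] - w[0]) as usize);
    match validity {
        // No null information: every item is valid.
        None => lens.map(Some).collect(),
        // Zip lengths with validity so null items carry no length.
        Some(valid) => lens
            .zip(valid.iter())
            .map(|(len, &is_valid)| is_valid.then_some(len))
            .collect(),
    }
}
```

Splitting on `validity` up front keeps the common no-nulls path free of a per-item validity check.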

# Are these changes tested?

Existing tests

# Are there any user-facing changes?

Faster encoding

------

Split into 2 more PRs, as the other 2 change the public API

Related to:
- #9079
- #9080

---------

Co-authored-by: Andrew Lamb <[email protected]>
@Dandandan
Contributor

> Although the benchmarks show no improvement, there isn't really a benchmark that goes through that path extensively.

I see, perhaps we can add a benchmark case?
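
One possible shape for such a case, sketched with only the standard library: it times a slice-based length computation against an offsets-only one. This is not the project's `row_format` benchmark and does not use its harness; all function names are hypothetical, and a simple `Instant` loop like this is far less rigorous than a real benchmark framework.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Sums row lengths by slicing each row out of the buffer and calling `len()`.
pub fn total_len_via_slices(buffer: &[u8], offsets: &[usize]) -> usize {
    offsets.windows(2).map(|w| buffer[w[0]..w[1]].len()).sum()
}

/// Sums row lengths from the offsets alone, never touching the buffer.
pub fn total_len_via_offsets(offsets: &[usize]) -> usize {
    offsets.windows(2).map(|w| w[1] - w[0]).sum()
}

/// Runs `f` `iterations` times and returns the last result with the elapsed time.
/// `black_box` discourages the optimizer from discarding the work.
pub fn time_it(iterations: u32, mut f: impl FnMut() -> usize) -> (usize, Duration) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iterations {
        last = black_box(f());
    }
    (last, start.elapsed())
}
```

Both variants must return the same total; only the elapsed times are expected to differ, and by how much depends on the data and on compiler optimizations.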

@Dandandan Dandandan merged commit 13d497a into apache:main Jan 14, 2026
13 checks passed
@rluvaton rluvaton deleted the improve-performance-of-rows-encoding-for-calculating-rows-length branch January 14, 2026 13:10
@alamb
Contributor

alamb commented Jan 14, 2026

show benchmark queue

BTW the benchmark runner was rebooted for some reason. I have fixed it

Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
…ow conversion (apache#9080)

Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
…n row conversion (apache#9078)

Dandandan pushed a commit to Dandandan/arrow-rs that referenced this pull request Jan 15, 2026
… conversion (apache#9079)


Labels

arrow Changes to the arrow crate performance

4 participants