[SPARK-55520][TESTS] Regenerate benchmark results #54313

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-55520

@dongjoon-hyun dongjoon-hyun commented Feb 13, 2026

What changes were proposed in this pull request?

This PR aims to regenerate benchmark results to check the intermediate status as a part of Apache Spark 4.2.0 preparation.

Please note that V2FunctionBenchmark is excluded because it is currently broken by a NumericEvalContext.evalMode() error. It's good to identify this kind of bug as early as possible via this PR.

Why are the changes needed?

Apache Spark 4.2.0 introduced many improvements on top of the key dependency differences from Spark 4.1.0:

We last updated the benchmark results 4 months ago, so it's time to bring them up to date with our actual code and the current infra.

- OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.14.0-1017-azure
- OpenJDK 64-Bit Server VM 21.0.8+9-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.14.0-1017-azure

Does this PR introduce any user-facing change?

No. This is a change to benchmark result files only.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

ZooKeeperPersistenceEngine with JavaSerializer 6638 6863 209 0.0 6637519.7 1.0X
FileSystemPersistenceEngine with JavaSerializer 3161 3168 11 0.0 3160685.6 2.1X
FileSystemPersistenceEngine with JavaSerializer (lz4) 873 899 33 0.0 873026.8 7.6X
FileSystemPersistenceEngine with JavaSerializer (lzf) 3286 3302 16 0.0 3286382.2 2.0X

This could be a regression worth investigating. Java 17 and 21 show the same result.

Indexed 11 12 0 0.1 10698.4 1.0X
No Index 8 8 0 0.1 7430.8 1.4X
Indexed 12 12 0 0.1 11413.1 1.0X
No Index 13 14 1 0.1 12820.2 0.9X

This could be a regression because the ratio is reversed. However, the Java 17 result looks fine.
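
For readers less familiar with these tables: the Relative column appears to be the baseline (first) row's time divided by each row's time, so a value above 1.0X means faster than the baseline. The sketch below recomputes it from the Per Row(ns) values of the four rows quoted above. The helper name `relative` is hypothetical and only for illustration; it is not part of Spark's benchmark utilities.

```python
def relative(baseline_ns: float, case_ns: float) -> float:
    """Speed of a case relative to the baseline row; above 1.0 means faster."""
    return baseline_ns / case_ns


# Per Row(ns) values copied from the two Indexed / No Index pairs quoted above.
first_pair = relative(10698.4, 7430.8)
second_pair = relative(11413.1, 12820.2)

# The two pairs disagree when they fall on opposite sides of 1.0.
ratio_flipped = (first_pair > 1.0) != (second_pair > 1.0)

print(f"{first_pair:.1f}X {second_pair:.1f}X flipped={ratio_flipped}")
```

Under this assumption the two pairs come out as 1.4X and 0.9X, matching the Relative values shown in the rows.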

Indexed 11 12 0 0.1 10694.7 1.0X
No Index 8 8 0 0.1 7443.8 1.4X
Indexed 12 12 0 0.1 11461.0 1.0X
No Index 13 14 0 0.1 12881.5 0.9X

Ditto. The ratio is reversed, but the Java 17 result looks fine.

Compression 10000 times at level 1 without buffer pool 648 653 6 0.0 64756.8 1.0X
Compression 10000 times at level 2 without buffer pool 688 688 1 0.0 68788.7 0.9X
Compression 10000 times at level 3 without buffer pool 804 810 10 0.0 80354.7 0.8X
Compression 10000 times at level 1 with buffer pool 580 582 2 0.0 58024.5 1.1X

These are expected improvements due to the buffer pool.

Decompression 10000 times from level 1 without buffer pool 595 597 3 0.0 59453.4 1.0X
Decompression 10000 times from level 2 without buffer pool 594 595 1 0.0 59429.2 1.0X
Decompression 10000 times from level 3 without buffer pool 595 596 1 0.0 59501.2 1.0X
Decompression 10000 times from level 1 with buffer pool 542 543 1 0.0 54194.0 1.1X

Ditto. These are expected improvements due to the buffer pool.

------------------------------------------------------------------------------------------------------------------------
Use HashSet 1 1 0 1530.2 0.7 1.0X
Use EnumSet 2 2 0 461.2 2.2 0.3X
Use HashSet 1 1 0 761.6 1.3 1.0X

HashSet seems to have improved by 2x in both Java 17 and 21.
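
As a rough cross-check of the 2x figure, the Rate(M/s) and Per Row(ns) columns appear to be reciprocals of each other (nanoseconds per row is about 1000 divided by the rate in millions per second). The sketch below applies that assumption to the two "Use HashSet" rows quoted above. The helper name `per_row_ns` is hypothetical, not a Spark API.

```python
def per_row_ns(rate_m_per_s: float) -> float:
    """Convert a rate in millions of rows per second to nanoseconds per row."""
    return 1000.0 / rate_m_per_s


# Rate(M/s) values of the two "Use HashSet" rows quoted above.
hashset_rates = [1530.2, 761.6]

for rate in hashset_rates:
    print(f"{rate} M/s -> {per_row_ns(rate):.1f} ns per row")

# Ratio between the faster and the slower of the two rows.
speedup = max(hashset_rates) / min(hashset_rates)
print(f"{speedup:.1f}x")
```

The converted values (0.7 ns and 1.3 ns) agree with the Per Row(ns) column, and the ratio of the two rates is about 2.0x.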

Murmur3_x86_32 35 35 3 60.6 16.5 1.0X
xxHash 64-bit 28 28 0 76.0 13.2 1.3X
HiveHasher 44 44 0 48.1 20.8 0.8X
Murmur3_x86_32 26 26 0 79.4 12.6 1.0X

The ratio changed because Murmur3_x86_32 improved.

Spark 3497 3501 4 0.3 3497.4 1.4X
Spark Binary 2637 2639 2 0.4 2637.2 1.8X
Common Codecs 4987 4999 12 0.2 4986.8 1.0X
Java 4037 4040 4 0.2 4036.7 1.2X

@dongjoon-hyun dongjoon-hyun Feb 13, 2026

The ratio is reversed because Java 21 became faster. As with Java 17, this should be the fastest one.

LongDelta 652 652 0 102.9 9.7 1.2X
PassThrough 837 837 1 80.2 12.5 1.0X
RunLengthEncoding 1231 1231 0 54.5 18.3 0.7X
DictionaryEncoding 729 730 2 92.1 10.9 1.1X

The ratio of DictionaryEncoding has changed.

Test read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
ConstantColumnVector 1839 1840 2 222.7 4.5 1.0X
OnHeapColumnVector 0 0 0 1580533.5 0.0 7096.3X

The previous value of 0 looks wrong to me.

ParquetReader Vectorized: DataPageV1 167 171 4 94.0 10.6 1.0X
ParquetReader Vectorized: DataPageV2 193 197 2 81.5 12.3 0.9X
ParquetReader Vectorized -> Row: DataPageV1 171 179 8 92.1 10.9 1.0X
ParquetReader Vectorized -> Row: DataPageV2 203 204 2 77.6 12.9 0.8X

This became slightly slower.

ParquetReader Vectorized: DataPageV1 151 153 1 103.9 9.6 1.0X
ParquetReader Vectorized: DataPageV2 156 158 2 100.7 9.9 1.0X
ParquetReader Vectorized -> Row: DataPageV1 162 163 1 97.2 10.3 0.9X
ParquetReader Vectorized -> Row: DataPageV2 162 163 2 97.1 10.3 0.9X

The ratio is reversed.

SQL Parquet Vectorized: DataPageV2 1476 1479 4 7.1 140.8 4.9X
SQL Parquet MR: DataPageV1 3564 3582 26 2.9 339.9 2.0X
SQL Parquet MR: DataPageV2 3578 3585 9 2.9 341.2 2.0X
ParquetReader Vectorized: DataPageV1 879 886 7 11.9 83.8 8.3X

Previously, this was the fastest, but it has now somehow become relatively slower than SQL ORC Vectorized.

50 ints (non-compact): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
In expression 99 100 1 100.7 9.9 1.0X
InSet expression 132 136 7 75.9 13.2 0.8X
In expression 156 158 3 64.0 15.6 1.0X

The In expression became relatively slower, but this could be due to different GitHub Actions CPUs.

First 10 integers using SELECT and LIMIT 69 74 6 0.0 6947861.5 2.1X
First 10 integers referencing external table in anchor 137 152 14 0.0 13678740.6 1.1X
First 10 integers using VALUES 162 180 18 0.0 16176649.5 1.0X
First 10 integers using SELECT 147 157 9 0.0 14653403.4 1.1X

This became relatively slower, and consistently so.

long/nullable int/string to primitive wholestage off 36 37 1 2.8 362.6 1.0X
long/nullable int/string to primitive wholestage on 29 32 2 3.5 285.8 1.3X
long/nullable int/string to primitive wholestage off 32 32 1 3.2 315.9 1.0X
long/nullable int/string to primitive wholestage on 32 38 6 3.1 317.6 1.0X

The values are too small to say there is a difference.
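
One way to make this kind of judgment concrete is to compare the gap between two average times with the spread reported in the Stdev(ms) column. The sketch below is an illustrative heuristic only; the helper `within_noise` and its threshold are assumptions, not something this PR or Spark's benchmark harness applies. It uses the Avg Time(ms) and Stdev(ms) values of the two "wholestage on" rows quoted above.

```python
def within_noise(avg_a_ms: float, stdev_a_ms: float,
                 avg_b_ms: float, stdev_b_ms: float) -> bool:
    """Return True when the gap between two averages is no larger than
    the sum of their standard deviations (a deliberately crude check)."""
    return abs(avg_a_ms - avg_b_ms) <= stdev_a_ms + stdev_b_ms


# Avg Time(ms) and Stdev(ms) from the two "wholestage on" rows quoted above.
result = within_noise(32, 2, 38, 6)
print(result)
```

Here the averages differ by 6 ms while the combined spread is 8 ms, so the check treats the two rows as indistinguishable.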

@dongjoon-hyun

The audit is finished. Could you review this PR, @peter-toth?

@dongjoon-hyun

Thank you, @peter-toth. V2FunctionBenchmark is broken (SPARK-55519), and further investigation will follow independently.

Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-55520 branch February 13, 2026 19:41
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results to check the intermediate status as a part of Apache Spark 4.2.0 preparation.

Please note that `V2FunctionBenchmark` is excluded because it is currently broken by a `NumericEvalContext.evalMode()` error. It's good to identify this kind of bug as early as possible via this PR.
- [SPARK-55519 `V2FunctionBenchmark` is broken](https://issues.apache.org/jira/browse/SPARK-55519)

### Why are the changes needed?

Apache Spark 4.2.0 introduced many improvements on top of the key dependency differences from Spark 4.1.0:
- apache#53396
- apache#53582
- apache#53347
- apache#54233
- apache#54292

We last updated the benchmark results 4 months ago, so it's time to bring them up to date with our actual code and the current infra.
- apache#52600

```
- OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.14.0-1017-azure
```

```
- OpenJDK 64-Bit Server VM 21.0.8+9-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.14.0-1017-azure
```

### Does this PR introduce _any_ user-facing change?

No. This is a change to benchmark result files only.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#54313 from dongjoon-hyun/SPARK-55520.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>