
[SPARK-53893][TESTS] Regenerate benchmark results after upgrading to Scala 2.13.17 #52600

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-53893

Conversation

dongjoon-hyun (Member) commented on Oct 14, 2025:

What changes were proposed in this pull request?

This PR aims to regenerate benchmark results after upgrading to Scala 2.13.17.

Why are the changes needed?

Since the last update, we have changed important libraries: not only Scala, but also the Hadoop, ORC, and ZSTD libraries. This PR aims to keep the benchmark results up-to-date as a way to detect any performance regression.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

```
                                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
Compression 1024 array values in 1 threads                 44             50           3          0.0       43112.9       1.0X
Compression 1024 array values single-threaded              32             33           0          0.0       31315.1       1.4X
Compression 1024 array values in 2 threads                 23             25           1          0.0       22108.7       1.0X
```
dongjoon-hyun (Member, Author) commented:
The number of threads seems to have changed on the GitHub Actions side.

```scala
def getNThreads: Int = {
  var nThreads = Runtime.getRuntime.availableProcessors
  val jmx = ManagementFactory.getOperatingSystemMXBean
  if (jmx != null) {
    // Subtract the integer part of the current system load average from the
    // processor count, but never go below one thread.
    val loadAverage = jmx.getSystemLoadAverage.toInt
    if (nThreads > 1 && loadAverage >= 1) nThreads = Math.max(1, nThreads - loadAverage)
  }
  nThreads
}
```
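The same load-adjustment logic can be factored out as a pure function to see how CI load changes the thread count. A minimal Java sketch, illustrative only (`computeNThreads` is a hypothetical helper, not Spark code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class NThreads {
    // Mirror of the benchmark's logic: start from the processor count and
    // subtract the integer part of the system load average, flooring at 1.
    static int computeNThreads(int availableProcessors, double systemLoadAverage) {
        int nThreads = availableProcessors;
        int loadAverage = (int) systemLoadAverage;
        if (nThreads > 1 && loadAverage >= 1) {
            nThreads = Math.max(1, nThreads - loadAverage);
        }
        return nThreads;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean jmx = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() may return a negative value when the load
        // average is unavailable (e.g. on Windows); treat that as zero load.
        double load = jmx != null ? Math.max(jmx.getSystemLoadAverage(), 0.0) : 0.0;
        int procs = Runtime.getRuntime().availableProcessors();
        System.out.println(computeNThreads(procs, load));
    }
}
```

On a 4-core runner with a load average of 2.0 this yields 2 threads, which is why the `in N threads` labels in the results can differ between runs on shared CI machines.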

```
Decompression 10000 times from level 3 with buffer pool       540            541           0          0.0       54016.9       1.1X
Decompression 10000 times from level 1 without buffer pool    166            167           0          0.1       16645.6       1.0X
Decompression 10000 times from level 2 without buffer pool    166            166           0          0.1       16558.6       1.0X
Decompression 10000 times from level 3 without buffer pool    166            167           0          0.1       16629.4       1.0X
```
dongjoon-hyun (Member, Author) commented on Oct 14, 2025:
The ZSTD decompression speed improved in the Java 17 benchmark. Since the Java 21 benchmark doesn't show any change, this could be a transient result.

```
UTF-32                                          56295          56403         153          0.2        5629.5       1.0X
UTF-16                                          50644          50653          13          0.2        5064.4       1.1X
UTF-8                                           30599          30619          28          0.3        3059.9       1.8X
UTF-32                                          33517          33545          41          0.3        3351.7       1.0X
```
dongjoon-hyun (Member, Author) commented on Oct 14, 2025:
Here, UTF-32 performance improved, but the Java 21 result is unchanged. This could also be transient.
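For context on why the encodings perform so differently: UTF-32 always uses four bytes per code point, while UTF-8 uses a single byte for ASCII, so the conversion has far fewer bytes to produce. A small Java sketch (illustrative only, unrelated to the benchmark code) showing the encoded sizes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Return the number of bytes needed to encode s in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String ascii = "benchmark"; // 9 ASCII characters
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));    // 9 bytes
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_16));   // 20 bytes (2-byte BOM + 9 * 2)
        System.out.println(encodedLength(ascii, Charset.forName("UTF-32"))); // 36 bytes (9 * 4)
    }
}
```

The larger the encoded form, the more bytes the charset conversion has to touch, which tracks the relative numbers in the table above.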

dongjoon-hyun (Member, Author) commented:
Thank you, @LuciferYang and @yaooqinn .

dongjoon-hyun (Member, Author) commented:
At first glance, there is no outstanding regression. Let me merge this since it is only a result of regeneration.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-53893 branch October 14, 2025 03:18
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…Scala 2.13.17

### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results after upgrading to Scala 2.13.17.

### Why are the changes needed?

Since the last update, we have changed important libraries: not only Scala, but also the Hadoop, ORC, and ZSTD libraries. This PR aims to keep the benchmark results up-to-date as a way to detect any performance regression.

- apache#52509
- apache#51127
- apache#52478
- apache#52591

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52600 from dongjoon-hyun/SPARK-53893.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Feb 13, 2026
### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results to check the intermediate status as part of the Apache Spark 4.2.0 preparation.

Please note that `V2FunctionBenchmark` is excluded because it is currently broken due to a `NumericEvalContext.evalMode()` error. It's good to identify this kind of bug as early as possible via this PR.
- [SPARK-55519 `V2FunctionBenchmark` is broken](https://issues.apache.org/jira/browse/SPARK-55519)

### Why are the changes needed?

Apache Spark 4.2.0 introduced many improvements on top of the key dependency differences from Spark 4.1.0:
- #53396
- #53582
- #53347
- #54233
- #54292

We updated the benchmark results 4 months ago, so it's time to make them up-to-date with our actual code and the current infra.
- #52600

```
- OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.14.0-1017-azure
```

```
- OpenJDK 64-Bit Server VM 21.0.8+9-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.14.0-1017-azure
```

### Does this PR introduce _any_ user-facing change?

No. This change only affects benchmark result files.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54313 from dongjoon-hyun/SPARK-55520.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
(same commit message as above)