[SPARK-55508][BUILD] Upgrade compress-lzf to 1.2.0 #54292

Closed
dongjoon-hyun wants to merge 3 commits into apache:master from dongjoon-hyun:SPARK-55508

Member

dongjoon-hyun commented Feb 13, 2026

### What changes were proposed in this pull request?

This PR aims to upgrade compress-lzf to 1.2.0.

### Why are the changes needed?

To pick up the latest bug fixes. Currently, we use 1.1.2, which was released 3 years ago (on 2023-01-29).

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Member

pan3793 commented Feb 13, 2026

@dongjoon-hyun, some benchmarks for LZF were added in core. Could you please update the benchmark results too?

Member Author

Sure, @pan3793!

Member Author

dongjoon-hyun commented Feb 13, 2026

Thank you, @zhengruifeng and @pan3793. The Java 17 results look good. The Java 21 results will arrive soon.

Member Author

Merged to master for Apache Spark 4.2.0.

dongjoon-hyun deleted the SPARK-55508 branch February 13, 2026 03:16
```
Compress large objects:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------
Compression 1024 array values in 2 threads                23             25           1          0.0       22108.7       1.0X
Compression 1024 array values single-threaded             32             33           0          0.0       31296.0       0.7X
Compression 1024 array values in 1 threads                47             54           4          0.0       45412.7       1.0X
```
Member

Is this change expected?

Member Author

This is consistent with the Java 17 result. For Java 17, `Compression 1024 array values in 1 threads` was recorded both before and after, @pan3793.

Member Author

dongjoon-hyun Feb 13, 2026

To me, it looks like a result of the non-deterministic benchmark code rather than of the version bump.

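```scala
import java.lang.management.ManagementFactory

// Thread count for the benchmark's multi-threaded case: the number of
// available cores minus the truncated 1-minute system load average.
def getNThreads: Int = {
  var nThreads = Runtime.getRuntime.availableProcessors
  val jmx = ManagementFactory.getOperatingSystemMXBean
  if (jmx != null) {
    // Negative when unavailable; any value below 1.0 leaves nThreads unchanged.
    val loadAverage = jmx.getSystemLoadAverage.toInt
    if (nThreads > 1 && loadAverage >= 1) nThreads = Math.max(1, nThreads - loadAverage)
  }
  nThreads
}
```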

Member

Okay, it seems it was obsolete.

Member Author

`ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage()` is measured live on the GitHub Actions runner, isn't it?

Member

It's a little bit hacky...

`getSystemLoadAverage` returns the system load average over the last minute. In benchmark runs, that load will be high because the machine was just compiling the code.
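To make the non-determinism concrete, here is a minimal sketch that restates the `getNThreads` arithmetic as a pure function of the core count and the load average. The name `nThreadsFor` and the 4-core machine used below are hypothetical, for illustration only.

```scala
// Hypothetical helper: the same arithmetic as getNThreads, but with the
// core count and the 1-minute load average passed in instead of read from
// the JVM, so the effect of load can be checked deterministically.
object ThreadCount {
  def nThreadsFor(cores: Int, loadAverage: Double): Int = {
    // Double.toInt truncates toward zero (2.9 -> 2); a negative
    // "unavailable" load average never reaches the >= 1 branch.
    val load = loadAverage.toInt
    if (cores > 1 && load >= 1) math.max(1, cores - load) else cores
  }
}
```

On a hypothetical 4-core runner, `nThreadsFor(4, 0.5)` is 4, `nThreadsFor(4, 2.3)` is 2, and `nThreadsFor(4, 3.7)` is 1. The same benchmark can therefore report "in 2 threads" on one run and "in 1 threads" on the next, depending only on how busy the machine was during the preceding minute.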

Member Author

Yes, I agree with you. We had better simply always use a constant value, 2.
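A minimal sketch of the constant-thread-count idea, assuming a fixed pool of 2 workers. To stay self-contained it swaps in the JDK's `java.util.zip.Deflater` as a stand-in codec instead of compress-lzf, and `FixedThreadCompression`, `NThreads`, and `compressAll` are hypothetical names, not Spark's benchmark code.

```scala
import java.io.ByteArrayOutputStream
import java.util.concurrent.{Callable, Executors}
import java.util.zip.Deflater

object FixedThreadCompression {
  // Hypothetical constant replacing the load-dependent getNThreads.
  val NThreads: Int = 2

  // Stand-in codec: java.util.zip.Deflater from the JDK, NOT compress-lzf.
  def compress(input: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater()
    try {
      deflater.setInput(input)
      deflater.finish()
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      while (!deflater.finished()) {
        val n = deflater.deflate(buf)
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally deflater.end()
  }

  // Compresses every value on a pool whose size never depends on host load.
  def compressAll(values: Seq[Array[Byte]]): Seq[Array[Byte]] = {
    val pool = Executors.newFixedThreadPool(NThreads)
    try {
      // toVector forces all tasks to be submitted before we start waiting.
      val futures = values.toVector.map { v =>
        pool.submit(new Callable[Array[Byte]] {
          override def call(): Array[Byte] = compress(v)
        })
      }
      futures.map(_.get())
    } finally pool.shutdown()
  }
}
```

Because `NThreads` no longer depends on `getSystemLoadAverage`, the multi-threaded case would run with the same parallelism on every run, so results recorded before and after a dependency bump would at least measure the same configuration.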

Member Author

Originally, Kent gave us a result from an Apple M2 Max, but GitHub Actions runners have far fewer resources than that.

```
[info] OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5
[info] Apple M2 Max
[info] Compress large objects:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------------------------------------
[info] Compression 1024 array values in 7 threads                12             13           1          0.1       11788.2       1.0X
[info] Compression 1024 array values single-threaded             23             23           0          0.0       22512.7       0.5X
```
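For reading these tables, a small sketch that re-derives the other columns from `Per Row(ns)`. It assumes 1024 rows per case (inferred from the case names and consistent with the numbers shown) and that `Relative` is the first case's best time divided by this case's best time; `ReadBenchmarkRow` and its methods are hypothetical.

```scala
// Hypothetical helpers for reading one row of a Spark benchmark table.
// Assumes every case processes `rows` values (1024 here).
object ReadBenchmarkRow {
  val rows: Long = 1024L

  // Per Row(ns) * rows = total nanoseconds of the best run; / 1e6 gives ms.
  def bestMs(perRowNs: Double): Double = perRowNs * rows / 1e6

  // Rows per second is 1e9 / perRowNs, so millions of rows per second is 1e3 / perRowNs.
  def rateMPerSec(perRowNs: Double): Double = 1000.0 / perRowNs

  // Ratio of best times; the row count cancels, so per-row values suffice.
  def relative(firstPerRowNs: Double, perRowNs: Double): Double = firstPerRowNs / perRowNs
}
```

With the M2 Max numbers, `bestMs(11788.2)` is about 12.07 and `relative(11788.2, 22512.7)` is about 0.52, which the table prints as 12 and 0.5X.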

dongjoon-hyun added a commit that referenced this pull request Feb 13, 2026
### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results to check the intermediate status as a part of Apache Spark 4.2.0 preparation.

Please note that `V2FunctionBenchmark` is excluded because it is currently broken by a `NumericEvalContext.evalMode()` error. It's good to identify this kind of bug as early as possible via this PR.
- [SPARK-55519 `V2FunctionBenchmark` is broken](https://issues.apache.org/jira/browse/SPARK-55519)

### Why are the changes needed?

Apache Spark 4.2.0 introduces many improvements on top of the following key dependency changes from Spark 4.1.0:
- #53396
- #53582
- #53347
- #54233
- #54292

We last updated the benchmark results 4 months ago, so it's time to bring them up to date with the current code and infrastructure.
- #52600

```
- OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.14.0-1017-azure
```

```
- OpenJDK 64-Bit Server VM 21.0.8+9-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.14.0-1017-azure
```

### Does this PR introduce _any_ user-facing change?

No. This only changes benchmark result files.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54313 from dongjoon-hyun/SPARK-55520.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
### What changes were proposed in this pull request?

This PR aims to upgrade `compress-lzf` to 1.2.0.

### Why are the changes needed?

To pick up the latest bug fixes. Currently, we use 1.1.2, which was released 3 years ago (on 2023-01-29).

- https://github.com/ning/compress/releases/tag/compress-lzf-1.2.0 (2026-01-02)
- https://github.com/ning/compress/releases/tag/compress-lzf-1.1.3 (2025-09-26)

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#54292 from dongjoon-hyun/SPARK-55508.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026