[SPARK-54611][BUILD] Upgrade lz4-java to 1.10.1#53347

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-54611

@dongjoon-hyun (Member) commented Dec 5, 2025

What changes were proposed in this pull request?

This PR aims to upgrade lz4-java to 1.10.1.

Why are the changes needed?

To bring in the latest bug fixes.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the BUILD label Dec 5, 2025

@dongjoon-hyun (Member, Author) commented:

cc @yawkat , @LuciferYang , @peter-toth

@dongjoon-hyun (Member, Author) commented:

Thank you, @peter-toth !

@dongjoon-hyun (Member, Author) commented:

Merged to master for Apache Spark 4.2.0.

dongjoon-hyun deleted the SPARK-54611 branch December 5, 2025 20:38
dongjoon-hyun added a commit that referenced this pull request Feb 13, 2026
### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results to check the intermediate status as a part of Apache Spark 4.2.0 preparation.

Please note that `V2FunctionBenchmark` is excluded because it is currently broken due to a `NumericEvalContext.evalMode()` error. It's good to identify this kind of bug as early as possible via this PR.
- [SPARK-55519 `V2FunctionBenchmark` is broken](https://issues.apache.org/jira/browse/SPARK-55519)

### Why are the changes needed?

Apache Spark 4.2.0 introduced many improvements on top of the key dependency differences from Spark 4.1.0:
- #53396
- #53582
- #53347
- #54233
- #54292

We last updated the benchmark results 4 months ago, so it's time to bring them up to date with our actual code and the current infra.
- #52600

```
- OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.14.0-1017-azure
```

```
- OpenJDK 64-Bit Server VM 21.0.8+9-LTS on Linux 6.11.0-1018-azure
+ OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.14.0-1017-azure
```

### Does this PR introduce _any_ user-facing change?

No. This only changes benchmark result files.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54313 from dongjoon-hyun/SPARK-55520.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
pan3793 added a commit that referenced this pull request Feb 27, 2026
…gression

### What changes were proposed in this pull request?

Previously, lz4-java was upgraded to 1.10.x to address CVEs:

- #53327
- #53347
- #53971

However, this causes a significant performance drop; see the benchmark report at

- #53453

This PR follows the [suggestion](#53290 (comment)) to migrate to `safeDecompressor`.

### Why are the changes needed?

Mitigate performance regression.

### Does this PR introduce _any_ user-facing change?

No, except for performance.

### How was this patch tested?

GHA for functionality, [benchmark](#53453 (comment)) for performance.

> TL;DR - my test results show lz4-java 1.10.1 is about 10~15% slower on lz4 compression than 1.8.0, and about 5% slower on lz4 decompression even after migrating to the suggested safeDecompressor

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53454 from pan3793/SPARK-54571.

Lead-authored-by: Cheng Pan <[email protected]>
Co-authored-by: pan3793 <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
SteNicholas added a commit to apache/celeborn that referenced this pull request Mar 3, 2026
… CVE‐2025‐12183 and CVE-2025-66566

### What changes were proposed in this pull request?

- Bump lz4-java version from 1.8.0 to 1.10.4 to resolve CVE-2025-12183 and CVE-2025-66566.
- `Lz4Decompressor` follows the [suggestion](apache/spark#53290 (comment)) to move from `fastDecompressor` to `safeDecompressor` to mitigate the performance regression.

Backport:

- apache/spark#53327
- apache/spark#53347
- apache/spark#53971
- apache/spark#53454
- apache/spark#54585

### Why are the changes needed?

- [CVE-2025-12183](https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183): Various lz4-java compression and decompression implementations do not guard against out-of-bounds memory access. Untrusted input may lead to denial of service and information disclosure. Vulnerable Maven coordinates: org.lz4:lz4-java up to and including 1.8.0.

- [CVE-2025-66566](GHSA-cmp6-m4wj-q63q): Insufficient clearing of the output buffer in Java-based decompressor implementations in lz4-java 1.10.0 and earlier allows remote attackers to read previous buffer contents via crafted compressed input. In applications where the output buffer is reused without being cleared, this may lead to disclosure of sensitive data. JNI-based implementations are not affected.

Therefore, the lz4-java version should be upgraded to 1.10.4.
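
The version ranges quoted in the two advisories above can be turned into a quick check. The following is a minimal Java sketch, assuming plain dotted numeric versions such as `1.10.1`. The class and method names (`Lz4AdvisoryCheck`, `compareVersions`, `affectedBy12183`, `affectedBy66566`) are hypothetical; only the two thresholds come from the advisory text. Components are compared numerically, because a plain string comparison would order `1.10.0` before `1.8.0`.

```java
import java.util.Arrays;

// Illustrative helper, not part of lz4-java, Spark, or Celeborn.
final class Lz4AdvisoryCheck {
    // Last affected versions as stated in the advisory text above.
    static final String CVE_2025_12183_LAST_AFFECTED = "1.8.0";
    static final String CVE_2025_66566_LAST_AFFECTED = "1.10.0";

    // Splits "1.10.1" into {1, 10, 1}. Assumes purely numeric components.
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Compares two dotted versions component by component; missing components count as 0.
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the version falls in the "up to and including 1.8.0" range.
    static boolean affectedBy12183(String version) {
        return compareVersions(version, CVE_2025_12183_LAST_AFFECTED) <= 0;
    }

    // True if the version falls in the "1.10.0 and earlier" range.
    static boolean affectedBy66566(String version) {
        return compareVersions(version, CVE_2025_66566_LAST_AFFECTED) <= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.8.0", "1.10.0", "1.10.1", "1.10.4"}) {
            System.out.println(v
                    + " CVE-2025-12183=" + affectedBy12183(v)
                    + " CVE-2025-66566=" + affectedBy66566(v));
        }
    }
}
```

Under these assumptions, 1.8.0 falls in both ranges, 1.10.0 falls only in the CVE-2025-66566 range, and 1.10.1 and 1.10.4 fall in neither. The check looks at version numbers only and ignores Maven coordinates.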

### Does this PR resolve a correctness bug?

No.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #3555 from SteNicholas/CELEBORN-2218.

Lead-authored-by: SteNicholas <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Signed-off-by: SteNicholas <[email protected]>
SteNicholas added a commit to apache/celeborn that referenced this pull request Mar 3, 2026
… CVE‐2025‐12183 and CVE-2025-66566

(cherry picked from commit dca3749)
Signed-off-by: SteNicholas <[email protected]>