[SPARK-42664][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions` by LuciferYang · Pull Request #42414 · apache/spark

LuciferYang · 2023-08-09T14:15:20Z

What changes were proposed in this pull request?

This PR uses `BloomFilterAggregate` to implement the `bloomFilter` function for `DataFrameStatFunctions`.

Why are the changes needed?

Adds Spark Connect JVM client API coverage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new tests.
Manually checked with Scala 2.13.

LuciferYang · 2023-08-09T14:16:53Z

cc @hvanhovell I made a clean PR; let's restart the review here.

hvanhovell · 2023-08-09T14:46:50Z

@LuciferYang does this return the same results as the one in sql/core?

LuciferYang · 2023-08-09T15:15:59Z

Let me check again. This PR has been sitting for too long, and I can't remember clearly.

LuciferYang · 2023-08-09T16:47:04Z

@hvanhovell I generated some random sequences (covering the 5 data types that need to be supported) and used different parameters to compare the outputs (including `numHashFunctions`, `bits.bitCount`, and `bits.data.mkString` of `BloomFilterImpl`) with those from the sql/core module. The results are consistent.

So I think their results should be consistent.

hvanhovell · 2023-08-09T17:16:42Z

@LuciferYang by consistent you mean exactly the same?

LuciferYang · 2023-08-09T17:19:39Z

> @LuciferYang by consistent you mean exactly the same?

Yes. Have you found any cases with different results?

hvanhovell · 2023-08-09T17:20:16Z

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala

+      fpp: Double): BloomFilter = {
+
+    val agg = if (!fpp.isNaN) {
+      Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(fpp))


I don't really like the ambiguity here. Since we are managing this function ourselves, can we just have one way of invoking it? I kind of prefer Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(numBits)).

Alternatively you pass all three, where you pick either fpp or numItems and pass null for the other field. Another idea would be to have different names.

Let me think about how to refactor.

fe958a6 changes this to use only Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(numBits)).

hvanhovell · 2023-08-09T17:26:22Z

connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala

Maybe add a negative test case where mightContain evaluates to false?

6ffbfa0 Added checks for values that are definitely not included.

hvanhovell · 2023-08-09T17:30:11Z

connector/connect/server/src/main/scala/org/apache/spark/util/sketch/BloomFilterHelper.scala

+/**
+ * `BloomFilterHelper` is used to bridge helper methods in BloomFilter`
+ */
+private[spark] object BloomFilterHelper {


Why can't you directly reference BloomFilter.optimalNumOfBits(expectedNumItems, fpp)? Alternatively you can hide a lot of this by creating dedicated constructors for the BloomFilterAggregate.

4709dd5 make BloomFilter.optimalNumOfBits public and call it directly

hvanhovell · 2023-08-09T17:31:45Z

...rc/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala

      SQLConf.get.getConf(RUNTIME_BLOOM_FILTER_MAX_NUM_BITS))

+  // Mark as lazy so that `updater` is not evaluated during tree transformation.
+  private lazy val updater: BloomFilterUpdater = first.dataType match {


For the record, lazy vals are not free.

Yes, but I haven't thought of another way yet. This is the same situation as `estimatedNumItems` and `numBits`: if `updater` is not lazy, it fails with "Invalid call to dataType on unresolved object".

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala

Lines 143 to 151 in 55b07b1

// Mark as lazy so that `estimatedNumItems` is not evaluated during tree transformation.
private lazy val estimatedNumItems: Long =
  Math.min(estimatedNumItemsExpression.eval().asInstanceOf[Number].longValue,
    SQLConf.get.getConf(RUNTIME_BLOOM_FILTER_MAX_NUM_ITEMS))

// Mark as lazy so that `numBits` is not evaluated during tree transformation.
private lazy val numBits: Long =
  Math.min(numBitsExpression.eval().asInstanceOf[Number].longValue,
    SQLConf.get.getConf(RUNTIME_BLOOM_FILTER_MAX_NUM_BITS))
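The trade-off discussed above can be illustrated outside Spark. The following is a minimal Java sketch, not Spark code: `Lazy`, `Node`, and the `"LongUpdater"` value are hypothetical stand-ins. It shows why deferring the computation avoids touching an unresolved node at construction time, and where the cost comes from: every access pays for a synchronized flag check.

```java
import java.util.function.Supplier;

public class LazyFieldSketch {
    /** Memoizing wrapper: computes the value on the first get(), then caches it. */
    static final class Lazy<T> implements Supplier<T> {
        private Supplier<T> init;
        private T value;
        private boolean done;

        Lazy(Supplier<T> init) {
            this.init = init;
        }

        @Override
        public synchronized T get() {
            // This check (and the lock) is paid on every access: the "not free" part.
            if (!done) {
                value = init.get();
                done = true;
                init = null; // release the initializer once it has run
            }
            return value;
        }
    }

    /** Hypothetical stand-in for an expression whose type is unknown until resolved. */
    static final class Node {
        boolean resolved = false;
        int evaluations = 0;

        final Lazy<String> updater = new Lazy<>(() -> {
            evaluations++;
            if (!resolved) {
                throw new IllegalStateException("Invalid call to dataType on unresolved object");
            }
            return "LongUpdater";
        });
    }

    public static void main(String[] args) {
        Node node = new Node();      // construction does not evaluate the initializer
        node.resolved = true;        // resolution happens later
        System.out.println(node.updater.get());
        System.out.println(node.updater.get());
        System.out.println("initializer runs: " + node.evaluations);
    }
}
```

An eager field would run the initializer inside the constructor, before `resolved` is set, and throw immediately.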

hvanhovell

Looks pretty good! Can you address the comments?

LuciferYang · 2023-08-10T03:54:53Z

...connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala

+
+        // Check expectedNumItems is LongType and value greater than 0L
+        val expectedNumItemsExpr = children(1)
+        val expectedNumItems = expectedNumItemsExpr match {


After changing to Column.fn("bloom_filter_agg", col, lit(expectedNumItems), lit(numBits)), the logic is indeed simpler. I have one point for discussion.

@hvanhovell Do you think we should check the validity of the input here? By checking here, the error message can be exactly the same as the api in sql/core. However, if we use the validation mechanism of BloomFilterAggregate, the content of the error message will be different, but the code will be more concise.

Perhaps we don't need to ensure that the error message is the same as before?

We can do that in a follow-up.

LuciferYang · 2023-08-10T05:13:33Z

connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala

+    val filter1 = df.stat.bloomFilter("id", 1000, 0.03)
+    assert(filter1.expectedFpp() - 0.03 < 1e-3)
+    assert(data.forall(filter1.mightContain))
+    assert(notContainValues.forall(n => !filter1.mightContain(n)))


Added checks for values that are definitely not included.
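For readers unfamiliar with why the positive assertion always holds while the negative one depends on the specific values chosen, here is a minimal, self-contained Java sketch of a Bloom filter. It is not Spark's `BloomFilterImpl`: the class `TinyBloomFilter`, the mixing function, and the sizes used in `main` are assumptions made purely for illustration.

```java
import java.util.BitSet;

public class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    public TinyBloomFilter(int numBits, int numHashFunctions) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("Number of bits must be positive");
        }
        if (numHashFunctions <= 0) {
            throw new IllegalArgumentException("Number of hash functions must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    /** A 64-bit mixing function used as the hash here (an assumption, not Spark's hashing). */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Derives the i-th bit index from two 32-bit halves of one hash (double hashing). */
    private int index(long item, int i) {
        long h = mix(item);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        int combined = h1 + i * h2;
        if (combined < 0) {
            combined = ~combined; // flip to a non-negative value
        }
        return combined % numBits;
    }

    public void put(long item) {
        for (int i = 1; i <= numHashFunctions; i++) {
            bits.set(index(item, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    public boolean mightContain(long item) {
        for (int i = 1; i <= numHashFunctions; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(7298, 5);
        for (long id = 0; id < 1000; id++) {
            filter.put(id);
        }
        boolean noFalseNegatives = true;
        for (long id = 0; id < 1000; id++) {
            noFalseNegatives &= filter.mightContain(id);
        }
        System.out.println("no false negatives: " + noFalseNegatives);

        int falsePositives = 0;
        for (long id = 1000; id < 11000; id++) {
            if (filter.mightContain(id)) {
                falsePositives++;
            }
        }
        System.out.println("observed false positive rate: " + falsePositives / 10000.0);
    }
}
```

Every inserted value sets all of its bits, so `mightContain` can never return false for it. An absent value may still collide with bits set by other values, which is why a negative assertion is only stable for values checked against a deterministic filter.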

LuciferYang · 2023-08-10T05:35:41Z

connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala

+      numBits
+    }
+
+    if (fpp <= 0d || fpp >= 1d) {


In the subsequent process, fpp is no longer involved, so a check is added here. Otherwise, if the user passes an invalid fpp value, the error message will be "Number of bits must be positive", which is quite strange.
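The reasoning behind this check can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the PR's code: `numBitsFor` is a hypothetical helper, the exception message is illustrative, and `optimalNumOfBits` here is the standard Bloom filter sizing formula m = -n ln(p) / (ln 2)^2, assumed to correspond to what `BloomFilter.optimalNumOfBits` computes.

```java
public class FppCheckSketch {
    /** Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2. */
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Hypothetical helper: validate fpp first, then derive numBits from it. */
    static long numBitsFor(long expectedNumItems, double fpp) {
        if (fpp <= 0d || fpp >= 1d) {
            // Illustrative message; it names the parameter the caller actually passed.
            throw new IllegalArgumentException(
                "False positive probability must be within range (0.0, 1.0)");
        }
        return optimalNumOfBits(expectedNumItems, fpp);
    }

    public static void main(String[] args) {
        // Without validation, fpp = 1.0 silently turns into a bit count of zero,
        // so a later check could only complain about the number of bits.
        System.out.println("unchecked numBits for fpp=1.0: " + optimalNumOfBits(1000L, 1.0));

        System.out.println("numBits for fpp=0.03: " + numBitsFor(1000L, 0.03));

        try {
            numBitsFor(1000L, 1.0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```

Because only `numBits` is sent onward, validating `fpp` at the point where it is converted is the last place an error can still mention `fpp` by name.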

LuciferYang · 2023-08-10T07:47:05Z

common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java

   * @param p false positive rate (must be 0 < p < 1)
   */
-  private static long optimalNumOfBits(long n, double p) {
+  public static long optimalNumOfBits(long n, double p) {


This is changed to public because DataFrameStatFunctions#buildBloomFilter needs to use this method to calculate numBits from expectedNumItems and fpp.

If you find (must be 0 < p < 1) to be quite messy, we can try changing it to (must be {@literal 0 < p < 1})

LuciferYang · 2023-08-10T11:47:59Z

The unidoc check still fails in CI, although I can run it successfully locally. I am investigating how to resolve this.

## Changes

| Cause | Type | Category | Description | Affected Files |
|-------|------|----------|-------------|----------------|
| N/A | Feat | Build | Update build configuration to support Spark 4.1 UT | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/pom.xml`, `gluten-ut/spark41/pom.xml`, `tools/gluten-it/pom.xml` |
| [#52165](apache/spark#52165) | Fix | Dependency | Update Parquet dependency version to 1.16.0 to avoid NoSuchMethodError issue | `gluten-ut/spark41/pom.xml` |
| [#51477](apache/spark#51477) | Fix | Compatibility | Update imports to reflect streaming runtime package refactoring in Apache Spark | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala`, `gluten-ut/spark41/.../GlutenStreamingQuerySuite.scala` |
| [#50674](apache/spark#50674) | Fix | Compatibility | Fix compatibility issue introduced by `TypedConfigBuilder` | `gluten-substrait/.../ExpressionConverter.scala`, `gluten-ut/spark41/.../GlutenCSVSuite.scala`, `gluten-ut/spark41/.../GlutenJsonSuite.scala` |
| [#49766](apache/spark#49766) | Fix | Compatibility | Disable V2 bucketing in GlutenDynamicPartitionPruningSuite since spark.sql.sources.v2.bucketing.enabled is now enabled by default | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala` |
| [#42414](apache/spark#42414), [#53038](apache/spark#53038) | Fix | Bug Fix | Resolve an issue introduced by SPARK-42414, as identified in SPARK-53038 | `backends-velox/.../VeloxBloomFilterAggregate.scala` |
| N/A | Fix | Bug Fix | Enforce row fallback for unsupported cached batches - keep columnar execution only when schema validation succeeds | `backends-velox/.../ColumnarCachedBatchSerializer.scala` |
| [SPARK-53132](apache/spark#53132), [SPARK-53142](apache/spark#53142) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 KeyGroupedPartitioningSuite tests. Excluded tests: `SPARK-53322*`, `SPARK-54439*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [SPARK-53535](https://issues.apache.org/jira/browse/SPARK-53535), [SPARK-54220](https://issues.apache.org/jira/browse/SPARK-54220) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenParquetIOSuite tests. Excluded tests: `SPARK-53535*`, `vectorized reader: missing all struct fields*`, `SPARK-54220*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52645](apache/spark#52645) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenStreamingQuerySuite tests. Excluded tests: `SPARK-53942: changing the number of stateless shuffle partitions via config`, `SPARK-53942: stateful shuffle partitions are retained from old checkpoint` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#47856](apache/spark#47856) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenDataFrameWindowFunctionsSuite and GlutenJoinSuite tests. Excluded tests: `SPARK-49386: Window spill with more than the inMemoryThreshold and spillSizeThreshold`, `SPARK-49386: test SortMergeJoin (with spill by size threshold)` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52157](apache/spark#52157) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenQueryExecutionSuite tests. Excluded test: `#53413: Cleanup shuffle dependencies for commands` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#48470](apache/spark#48470) | 4.1.0 | Test Exclusion | Exclude split test in GlutenRegexpExpressionsSuite. Excluded test: `GlutenRegexpExpressionsSuite.SPLIT` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#51623](apache/spark#51623) | 4.1.0 | Test Exclusion | Add `spark.sql.unionOutputPartitioning=false` to Maven test args. Excluded tests: `GlutenBroadcastExchangeSuite.SPARK-52962`, `GlutenDataFrameSetOperationsSuite.SPARK-52921*` | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/spark41/.../VeloxTestSettings.scala`, `tools/gluten-it/common/.../Suite.scala` |
| N/A | 4.1.0 | Test Exclusion | Excludes failed SQL tests that need to be fixed for Spark 4.1 compatibility. Excluded tests: `decimalArithmeticOperations.sql`, `identifier-clause.sql`, `keywords.sql`, `literals.sql`, `operators.sql`, `exists-orderby-limit.sql`, `postgreSQL/date.sql`, `nonansi/keywords.sql`, `nonansi/literals.sql`, `datetime-legacy.sql`, `datetime-parsing-invalid.sql`, `misc-functions.sql` | `gluten-ut/spark41/.../VeloxSQLQueryTestSettings.scala` |
| #11252 | 4.1.0 | Test Exclusion | Exclude Gluten test for SPARK-47939: Explain should work with parameterized queries | `gluten-ut/spark41/.../VeloxTestSettings.scala` |

init

beaaae6

github-actions bot added SQL CONNECT labels Aug 9, 2023

LuciferYang mentioned this pull request Aug 9, 2023

[SPARK-42664][CONNECT] Support bloomFilter function for DataFrameStatFunctions #40352

Closed

fix test exception

a154c51

hvanhovell reviewed Aug 9, 2023

LuciferYang added 4 commits August 10, 2023 11:12

pass 4

dfbe1c4

Revert "pass 4"

d600ebb

This reverts commit dfbe1c4.

make optimalNumOfBits public

4709dd5

only pass [col, expectedNumItems: Long, numBits: Long]

fe958a6

add negative test check

6ffbfa0

use child

cf3104a

fix doc test

80a6b4b

baibaichen added 24 commits to baibaichen/gluten that referenced this pull request, each titled "[Fix] Resolve an issue introduced by apache/spark#42414, as identified in apache/spark#53038.":

Dec 31, 2025: 02ce20e, 9f52c7c, 8ad16b7, 72d553f, 344364c, 4baf85c, 0352540, 2cae0e4, ccfac71, f165d55

Jan 4, 2026: 6db2eaf, 0857f6d, 321e0c8, b155b53, 32e02ee, 4577b6e, 1a60eca, 84053f3, 1eb4f29, 380cd10, 5b1c708

Jan 5, 2026: d108684, 5ee3a2b, 4041dee

baibaichen mentioned this pull request Jan 7, 2026

[GLUTEN-11343][CORE][VL] Support Spark 4.1 UT apache/gluten#11353

Merged
