feat(spark): add arrays_overlap with Spark three-valued null semantics #20781

Draft
n0r0shi wants to merge 1 commit into apache:main from n0r0shi:spark-arrays-overlap

@n0r0shi n0r0shi commented Mar 7, 2026

Which issue does this PR close?

Closes #15914 (partial — adds one more Spark-compatible function)

Related: apache/datafusion-comet#3645

Rationale

Spark's arrays_overlap uses three-valued null logic, which differs from DataFusion's built-in array_has_any:

| Input | Spark arrays_overlap | DataFusion array_has_any |
|---|---|---|
| [1, 2], [2, 3] | true | true |
| [1, 2], [3, 4] | false | false |
| [1, NULL], [3] | null | false |
| [1, 2], [3, NULL] | null | false |
| [1, NULL], [1, 3] | true | true |

In Spark, when there's no definite overlap but either array contains a null element, the result is null.
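
To make the rule concrete, here is a minimal single-row sketch in plain Rust, using `Option` to stand in for SQL nulls. The function name `arrays_overlap_row` is hypothetical and is not part of this PR. The non-empty check follows Spark's documented behavior for `arrays_overlap`; the table above does not cover empty arrays, so treat that branch as an assumption.

```rust
/// Hypothetical single-row model of the semantics described above.
/// Outer `None` means the whole array is NULL; an inner `None` is a NULL element.
fn arrays_overlap_row(
    left: Option<&[Option<i32>]>,
    right: Option<&[Option<i32>]>,
) -> Option<bool> {
    // A NULL array on either side yields NULL.
    let (left, right) = (left?, right?);

    // A match between two non-null elements is definite and wins over any nulls.
    let definite_match = left
        .iter()
        .flatten()
        .any(|l| right.iter().flatten().any(|r| l == r));
    if definite_match {
        return Some(true);
    }

    // No definite match: the answer is unknown if a null element could have matched.
    // Assumption (from Spark's docs, not from this PR): both arrays must be non-empty.
    let has_null = left.iter().chain(right.iter()).any(|e| e.is_none());
    if has_null && !left.is_empty() && !right.is_empty() {
        None
    } else {
        Some(false)
    }
}
```

Calling it with `[1, NULL]` and `[3]` gives `None`, while `[1, NULL]` and `[1, 3]` gives `Some(true)`, matching the third and fifth rows of the table.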

What changes are included in this PR?

Adds SparkArraysOverlap to the datafusion-spark crate, following the same pattern as SparkArrayContains: delegate to DataFusion's array_has_any, then replace the result with null on every row where it is false and either input array contains a null element.
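
The delegate-then-patch pattern can be sketched over plain Rust vectors. This is not the PR's code: the real implementation works on Arrow arrays, and `has_any_stand_in`, `has_null_element`, `arrays_overlap_column`, and the `List` alias are hypothetical names. The stand-in assumes null elements never match, which is consistent with the array_has_any column in the table above. Empty arrays get no special handling here because the description does not mention them.

```rust
/// One list cell: `None` is a NULL list, an inner `None` is a NULL element.
type List = Option<Vec<Option<i32>>>;

/// Hypothetical stand-in for DataFusion's `array_has_any` on one row.
/// Assumption: null elements never match anything.
fn has_any_stand_in(left: &List, right: &List) -> Option<bool> {
    let (l, r) = (left.as_ref()?, right.as_ref()?);
    Some(l.iter().flatten().any(|x| r.iter().flatten().any(|y| x == y)))
}

/// True if the list is non-NULL and holds at least one NULL element.
fn has_null_element(list: &List) -> bool {
    list.as_ref()
        .map_or(false, |v| v.iter().any(|e| e.is_none()))
}

/// Step 1: delegate each row to the two-valued check.
/// Step 2: patch `false` to NULL when either input has a NULL element.
fn arrays_overlap_column(left: &[List], right: &[List]) -> Vec<Option<bool>> {
    left.iter()
        .zip(right)
        .map(|(l, r)| match has_any_stand_in(l, r) {
            Some(false) if has_null_element(l) || has_null_element(r) => None,
            other => other, // `true` and NULL results pass through unchanged
        })
        .collect()
}
```

Only `false` rows are ever rewritten, so a definite match stays `true` even when nulls are present, and a NULL list stays NULL.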

Are these changes tested?

Unit tests covering:

  • Definite overlap → true
  • No overlap, no nulls → false
  • No overlap, null in left → null
  • No overlap, null in right → null
  • Overlap with nulls present → true (definite match trumps null)
  • Null list → null
  • Multi-row mixed cases

DataFusion's built-in `array_has_any` returns `false` when arrays have
no definite overlap but contain null elements. Spark's `arrays_overlap`
returns `null` in this case (three-valued logic: overlap is unknown
because nulls could match any value).

This wraps `array_has_any` and patches results where `false` + either
array has nulls → `null`, following the same pattern as SparkArrayContains.
@github-actions github-actions bot added the spark label Mar 7, 2026
comphead (Contributor) commented Mar 7, 2026

it might be similar to #20611

Development

Successfully merging this pull request may close these issues.

[EPIC] Complete datafusion-spark Spark Compatible Functions

2 participants