[SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method#42612
[SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method#42612ConeyLiu wants to merge 5 commits intoapache:masterfrom
Conversation
|
Hi, @cloud-fan @rdblue @beliefer @LuciferYang could you help to review this? Thanks a lot. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala
Outdated
Show resolved
Hide resolved
|
Gentle ping @cloud-fan @sunchao, could you please also take a look at this? Really appreciate. |
|
Apologies @ConeyLiu , just saw this PR. I think this makes sense. Could you rebase it? I'll review afterwards. |
c6f0a84 to
50f09cc
Compare
|
@sunchao that's OK, rebased. |
There was a problem hiding this comment.
Hmm instead of adding these two parameters, can we instead check staticObject and functionName?
There was a problem hiding this comment.
+1, we can check if staticObject is a subclass of BoundFunction
There was a problem hiding this comment.
For V2 UDF, the staticObject here is the class of the BoundFunction, and the functionName is the magic name: invoke. However, we need the name and canonicalName to build the UserDefinedScalarFunc.
There was a problem hiding this comment.
we can instantiate staticObject and call its canonicalName function?
There was a problem hiding this comment.
Seems is not easy. The BoundFunction is created by calling the bind method of UnboundFunction which is loaded by FunctionCatalog. And we have no guarantee the BoundFunction has no-args constructor to create with reflect.
There was a problem hiding this comment.
It's better to reuse StaticInoke so that any optimizations for it can still apply. How about we add one more parameter function: Option[ScalarFunction[_]] to it instead of two strings?
There was a problem hiding this comment.
I was thinking extending StaticInvoke somehow like
case class V2StaticInvoke(wrapped: StaticInvoke, name: String, canonicalName: String) extends InvokeLikebut this way we still need to override quite a few members in InvokeLike so may not worth it.
There was a problem hiding this comment.
How about we add one more parameter function: Option[ScalarFunction[_]] to it instead of two strings?
Changed to this way. And it is the initial implementation.
There was a problem hiding this comment.
@cloud-fan @sunchao Although adding ScalarFunction as a new parameter is convenient, it makes StaticInvoke look very strange.
We already call the scalarFunc.getClass, scalarFunc.resultType(), scalarFunc.isResultNullable and scalarFunc.isDeterministic before passed the new ScalarFunction.
There was a problem hiding this comment.
I'm OK to keep as it is. For me it is just a nit and one parameter makes the change less intrusive.
a7060ef to
12982ff
Compare
| * @param scalarFunction the [[ScalarFunction]] object if this is calling the magic method of the | ||
| * [[ScalarFunction]] otherwise is unset. | ||
| */ | ||
| case class StaticInvoke( |
There was a problem hiding this comment.
we can override stringArgs in StaticInvoke, to exclude the new parameter if it's None, to avoid the golden file changes.
|
Thanks! merged to master branch |
|
Thanks @sunchao @beliefer @cloud-fan |
| if (scalarFunction.nonEmpty) { | ||
| super.stringArgs | ||
| } else { | ||
| super.stringArgs.take(8) |
There was a problem hiding this comment.
this is fragile. I think super.stringArgs.dropRight(1) is better.
There was a problem hiding this comment.
@cloud-fan I have submitted a follow-up PR for this.
… magic method ### What changes were proposed in this pull request? This is follow up #42612 and to address the comment #42612 (comment) ### Why are the changes needed? To address comments. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43262 from ConeyLiu/42612-followup. Authored-by: Xianyang Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Right now we only support pushing down the V2 UDF that has not a magic method. Because the V2 UDF will be analyzed into the
ApplyFunctionExpressionwhich could be translated and pushed down. However, a V2 UDF that has the magic method will be analyzed intoStaticInvokeorInvokethat can not be translated into V2 expression and then can not be pushed down to the data source. The magic method is suggested.Why are the changes needed?
This PR adds the support of pushing down the V2 UDF that has a magic method.
Does this PR introduce any user-facing change?
Yes, now the V2 UDF with the magic method could be pushed down.
How was this patch tested?
New UTs.
Was this patch authored or co-authored using generative AI tooling?
No