Skip to content

[Data] - Predicate Pushdown - Push predicate exprs past eligible operators#58555

Merged
bveeramani merged 11 commits intoray-project:masterfrom
goutamvenkat-anyscale:goutam/predicate_pushdown_passthrough
Nov 14, 2025
Merged

[Data] - Predicate Pushdown - Push predicate exprs past eligible operators#58555
bveeramani merged 11 commits intoray-project:masterfrom
goutamvenkat-anyscale:goutam/predicate_pushdown_passthrough

Conversation

@goutamvenkat-anyscale
Copy link
Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale commented Nov 12, 2025

Description

Push Filter past Join (depends on the join op), Filter into Union branches, Filter past projections (accounting for renames), past all shuffle ops.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner November 12, 2025 04:21
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-designed refactoring of the predicate pushdown logic. By moving to a trait-based system with PredicatePushable and PredicatePushdownBehavior, the logic becomes much more extensible and maintainable, which is a great improvement. The changes are consistently applied across various operators, and the new tests are comprehensive and well-structured.

I've found one important issue in the new _clone_op_with_new_inputs helper that could lead to an inconsistent operator graph. I've also left a comment on a potentially confusing implementation in the new PredicatePushable interface.

Overall, this is excellent work that improves the foundation of the query optimizer.

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Nov 12, 2025
Signed-off-by: Goutam <[email protected]>
@goutamvenkat-anyscale goutamvenkat-anyscale added the go add ONLY when ready to merge, run all tests label Nov 12, 2025
Signed-off-by: Goutam <[email protected]>
return op

input_op = op.input_dependencies[0]
predicate_expr = op._predicate_expr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation for op is LogicalOperator, but I don't think LogicalOperator is guaranteed to have a _predicate_expr attribute. Is there a risk of attribute errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh we check if it's an expression based Filter in the if statement above. I can make the typing here explicit to make it easier.

return input_op.apply_predicate(predicate_expr)

# Case 2: Check if operator allows predicates to pass through
if isinstance(input_op, PredicatePassThrough):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: LogicalOperatorSupportsPredicatePushdown and PredicatePassThrough appear to follow very different naming convention. Any reason not to use similar naming patterns (e.g., PredicatePushdown/PredicatePassThrough or LogicalOperatorSupportsPredicatePushdown/LogicalOperatorSupportsPredicatePassThrough)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this one. I can change this to LogicalOperatorSupportsPredicatePassThrough to be consistent.

if behavior == PredicatePushdownBehavior.PASSTHROUGH:
# Push filter through and recursively try to push further
new_filter = Filter(
input_op.input_dependencies[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Here and below) Are we always guaranteed to have one input dependency in this case? If this is an assumption we're making, add assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes added an assertion

branch₁ ─> Filter(predicate) ─┐
branch₂ ─> Filter(predicate) ─┤ Union
branch₃ ─> Filter(predicate) ─┘
def _push_filter_through_conditionally(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This _push_filter_through_conditionally/PredicatePushdownBehavior .CONDITIONAL abstractions confused me because they sounds like a generic abstractions, but the implementation is specific to joins.

Don't really have a good alternative suggestion, but thought it was worth calling out.

Copy link
Contributor Author

@goutamvenkat-anyscale goutamvenkat-anyscale Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it's specific to joins. But this could apply to Intersect (not yet in RD), Window operator etc.

)


class TestPredicatePushdownIntoRead:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there tests for the join case anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in test_join

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed it

Comment on lines +376 to +378
# Verify plan: all filters pushed into Read, passthrough ops remain
plan = _get_optimized_plan(ds)
assert "Filter[Filter" not in plan, f"No Filter operators should remain: {plan}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere -- would prefer to not test against the string representations because that's historically been pretty brittle.

If it'd make sense to do so, one option is to add a utility function to LogicalPlan to see it structurally matches another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some helper functions in test utils to replace the string assertions with class assertions

Signed-off-by: Goutam <[email protected]>
Signed-off-by: Goutam <[email protected]>
Signed-off-by: Goutam <[email protected]>
)


class TestPredicatePushdownIntoRead:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed it

@bveeramani bveeramani merged commit 7a05d43 into ray-project:master Nov 14, 2025
6 checks passed
@goutamvenkat-anyscale goutamvenkat-anyscale deleted the goutam/predicate_pushdown_passthrough branch November 14, 2025 19:43
ArturNiederfahrenhorst pushed a commit to ArturNiederfahrenhorst/ray that referenced this pull request Nov 16, 2025
…ators (ray-project#58555)

## Description
Push `Filter` past Join (depends on the join op), `Filter` into Union
branches, `Filter` past projections (accounting for renames), past all
shuffle ops.
## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Goutam <[email protected]>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ators (ray-project#58555)

## Description
Push `Filter` past Join (depends on the join op), `Filter` into Union
branches, `Filter` past projections (accounting for renames), past all
shuffle ops.
## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Goutam <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…ators (ray-project#58555)

## Description
Push `Filter` past Join (depends on the join op), `Filter` into Union
branches, `Filter` past projections (accounting for renames), past all
shuffle ops.
## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Goutam <[email protected]>
Signed-off-by: YK <[email protected]>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ators (ray-project#58555)

## Description
Push `Filter` past Join (depends on the join op), `Filter` into Union
branches, `Filter` past projections (accounting for renames), past all
shuffle ops.
## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Goutam <[email protected]>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…ators (ray-project#58555)

## Description
Push `Filter` past Join (depends on the join op), `Filter` into Union
branches, `Filter` past projections (accounting for renames), past all
shuffle ops.
## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Goutam <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…ators (ray-project#58555)

## Description
Push `Filter` past Join (depends on the join op), `Filter` into Union
branches, `Filter` past projections (accounting for renames), past all
shuffle ops.
## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Goutam <[email protected]>
Signed-off-by: peterxcli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants