[Data] map_batches support limit_pushdown #57880
Merged
alexeykudinkin merged 7 commits into ray-project:master on Oct 24, 2025
Signed-off-by: You-Cheng Lin <[email protected]>
map_batches support limit_pushdown
Signed-off-by: You-Cheng Lin <[email protected]>
    fn_constructor_kwargs: Optional[Dict[str, Any]] = None,
    min_rows_per_bundled_input: Optional[int] = None,
    compute: Optional[ComputeStrategy] = None,
    preserve_row_count: bool = False,
Contributor
preserves_row_count
    assert result_with == expected


def test_limit_pushdown_preserve_row_count_with_map_batches(
Contributor
Can we make this a parameterized test?
python/ray/data/dataset.py
Outdated
    worker.
memory: The heap memory in bytes to reserve for each parallel map worker.
concurrency: This argument is deprecated. Use ``compute`` argument.
preserve_row_count: Set to True only if the UDF always emits the same number of records it receives (no drops or duplicates). When true, the optimizer can push downstream limits past this transform for better pruning.
Contributor
For the 2nd sentence: When set to True, the logical optimizer, in the presence of a limit(limit=k), will only scan k rows prior to executing the UDF, thereby saving on compute resources.
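The trade-off behind this docstring can be illustrated with a small, Ray-free sketch. Everything below is hypothetical illustration code, not Ray's implementation: `run_map_then_limit` and `run_limit_then_map` are made-up helpers that model the plan before and after pushdown, and the eager baseline processes every row, whereas a streaming executor would usually stop earlier. The sketch shows that a row-count-preserving UDF gives the same result either way while seeing far fewer rows, and that a filtering UDF gives a different (wrong) result once the limit is pushed below it.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Batch = List[Row]


def make_counting_udf(fn: Callable[[Batch], Batch], counter: Dict[str, int]):
    """Wrap a batch UDF so we can count how many rows it was asked to process."""
    def udf(batch: Batch) -> Batch:
        counter["rows_seen"] += len(batch)
        return fn(batch)
    return udf


def run_map_then_limit(rows: Batch, udf, k: int, batch_size: int = 4) -> Batch:
    """Toy model of the un-optimized plan: apply the UDF to everything, then limit."""
    out: Batch = []
    for start in range(0, len(rows), batch_size):
        out.extend(udf(rows[start:start + batch_size]))
    return out[:k]


def run_limit_then_map(rows: Batch, udf, k: int, batch_size: int = 4) -> Batch:
    """Toy model of the plan after pushdown: limit first, then apply the UDF."""
    limited = rows[:k]
    out: Batch = []
    for start in range(0, len(limited), batch_size):
        out.extend(udf(limited[start:start + batch_size]))
    return out


def add_one(batch: Batch) -> Batch:
    # Row-count preserving: one output row per input row.
    return [{"id": r["id"] + 1} for r in batch]


def keep_even(batch: Batch) -> Batch:
    # NOT row-count preserving: drops rows.
    return [r for r in batch if r["id"] % 2 == 0]


rows = [{"id": i} for i in range(100)]
k = 3

baseline_counter = {"rows_seen": 0}
pushed_counter = {"rows_seen": 0}
baseline = run_map_then_limit(rows, make_counting_udf(add_one, baseline_counter), k)
pushed = run_limit_then_map(rows, make_counting_udf(add_one, pushed_counter), k)

# With a filtering UDF the two plans disagree, so pushdown would be unsafe.
filtered_baseline = run_map_then_limit(rows, keep_even, k)
filtered_pushed = run_limit_then_map(rows, keep_even, k)
```

In this toy run the preserving UDF sees only `k` rows after pushdown, which is the saving the suggested docstring wording describes; the filtering case is why the flag should be set only when the UDF never drops or duplicates rows.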
goutamvenkat-anyscale requested changes on Oct 21, 2025
Contributor goutamvenkat-anyscale left a comment:
lgtm. Just a few cosmetic changes
Signed-off-by: You-Cheng Lin <[email protected]>
goutamvenkat-anyscale approved these changes on Oct 21, 2025
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
2bc26a1 to
93e8111
Signed-off-by: You-Cheng Lin <[email protected]>
alexeykudinkin approved these changes on Oct 24, 2025
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request on Oct 27, 2025
## Description

This PR adds a `preserve_row_count` option to `map_batches`. When `preserve_row_count` is true, the limit operator can be pushed down through this `map_batches` call for optimization.

Note: `map_groups` is built on `map_batches`, but limit pushdown support for `map_groups` is out of scope for this PR, so `preserve_row_count` is set to false for it.

## Related issues

## Additional information

---------

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Co-authored-by: You-Cheng Lin <[email protected]>
Signed-off-by: xgui <[email protected]>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request on Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request on Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request on Dec 7, 2025
bveeramani added a commit that referenced this pull request on Feb 4, 2026
This PR updates the operator fusion rule to fuse `MapBatches` even if they modify the row counts. The intention of this PR is to preserve the historical operator fusion behavior and avoid introducing regressions. For more details, see the timeline below.

---

### Timeline of Changes

| Date | Event | Description |
| :--- | :--- | :--- |
| **June 8, 2023** | **Limit pushdown added** | Added limit pushdown and a property to `MapBatches` incorrectly stating it doesn't modify row counts. (#35950) |
| **June 27, 2023** | **Limit pushdown disabled** | Rule disabled because it incorrectly pushed limits past UDFs that modified row counts. (#36831) |
| **April 28, 2025** | **Fusion restricted** | Added logic to stop fusing operators that modify row counts when the downstream has a batch size. `MapBatches` stayed fused only because of its incorrect property (#52570). |
| **July 8, 2025** | **Limit pushdown re-enabled with special case** | Re-enabled with a special case to prevent pushing limits past `MapBatches`. (#39486) |
| **Oct 24, 2025** | **Special case removed** | Special case removed, re-introducing the bug where limits are pushed past `MapBatches`. (#57880) |
| **Feb 2, 2026** | **Property Fix** | Updated `MapBatches` to correctly report it modifies rows by default. This fixed the pushdown bug but broke fusion logic. (#60448) |
| **Feb 4, 2026** | (This PR) | Add a special-case to preserve the historical `MapBatches` fusion behavior |

---------

Signed-off-by: Balaji Veeramani <[email protected]>
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request on Feb 7, 2026
…ct#60756)
elliot-barn pushed a commit that referenced this pull request on Feb 9, 2026
Kunchd pushed a commit to Kunchd/ray that referenced this pull request on Feb 17, 2026
ans9868 pushed a commit to ans9868/ray that referenced this pull request on Feb 18, 2026
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request on Feb 20, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
## Description

This PR adds a `preserve_row_count` option to `map_batches`. When `preserve_row_count` is true, the limit operator can be pushed down through this `map_batches` call for optimization.

Note: `map_groups` is built on `map_batches`, but limit pushdown support for `map_groups` is out of scope for this PR, so `preserve_row_count` is set to false for it.

## Related issues

## Additional information
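The rewrite described in the Description can be sketched as a toy logical-plan rule. This is a minimal illustration under stated assumptions, not Ray's optimizer: `Read`, `MapBatches`, `Limit`, `push_limit_down`, and `describe` are hypothetical stand-ins defined here, and the only behavior modeled is "swap a limit with a map only when the map declares that it preserves row count".

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical leaf operator (stands in for any data source)."""
    name: str = "Read"


@dataclass
class MapBatches:
    """Hypothetical map operator carrying the opt-in flag from this PR."""
    input_op: object
    preserve_row_count: bool = False  # default: assume the UDF may change row count


@dataclass
class Limit:
    """Hypothetical limit operator."""
    input_op: object
    limit: int


def push_limit_down(op):
    """Move each Limit below any directly preceding row-count-preserving MapBatches."""
    if isinstance(op, Limit):
        child = op.input_op
        if isinstance(child, MapBatches) and child.preserve_row_count:
            # Limit(MapBatches(x)) -> MapBatches(Limit(x)), then keep pushing.
            new_limit = push_limit_down(Limit(child.input_op, op.limit))
            return MapBatches(new_limit, preserve_row_count=True)
        # Not safe (or nothing to swap with): leave the Limit where it is.
        return Limit(push_limit_down(child), op.limit)
    if isinstance(op, MapBatches):
        return MapBatches(push_limit_down(op.input_op), op.preserve_row_count)
    return op


def describe(op) -> str:
    """Render a plan in execution order, source first."""
    if isinstance(op, Read):
        return "Read"
    if isinstance(op, Limit):
        return f"{describe(op.input_op)} -> Limit[{op.limit}]"
    return f"{describe(op.input_op)} -> MapBatches[preserve_row_count={op.preserve_row_count}]"


opted_in = Limit(MapBatches(Read(), preserve_row_count=True), 5)
default = Limit(MapBatches(Read()), 5)

opted_in_plan = describe(push_limit_down(opted_in))
default_plan = describe(push_limit_down(default))
```

Keeping the flag at `False` by default mirrors the note about `map_groups`: an operator that has not opted in is left above the limit, so the rewrite never changes results for UDFs that drop or duplicate rows.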