[Data] Improve Streaming Repartition #58728
Merged
raulchen merged 5 commits into ray-project:master on Nov 19, 2025
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
xinyuangui2 reviewed on Nov 18, 2025
xinyuangui2 (Contributor) left a comment:
Personally, I think using `BatchMapTransformFn` with `disabled_shaping` is easier to understand than `OutputBlockSizeOption.target_num_rows_per_block`.
2. Whenever the pending total reaches the target row count, try to build a ready bundle.
3. Determine the slice needed from the final bundle so the ready bundle holds an exact multiple of the target rows.
4. Submit that ready bundle to a remote map task; the task slices each block according to the slice metadata stored in the RefBundle (the bundle now contains n × target rows for n ≥ 1).
5. `plan_streaming_repartition_op` sets `OutputBlockSizeOption.target_num_rows_per_block` to the target number of rows per block, so the output buffer further splits the n × target rows into n blocks of exactly the target size.
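The bundling steps quoted above can be sketched in plain Python. This is a minimal illustration, not Ray Data's implementation: blocks are modeled as lists of rows, and the names `build_ready_bundles`, `run_map_task`, and `repartition` are hypothetical. Flushing the leftover rows as a final partial bundle is also an assumption.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

# Hypothetical stand-ins (not Ray Data types): a block is a list of rows,
# and a slice is (block, start, end) with end exclusive.
Block = List[int]
BlockSlice = Tuple[Block, int, int]


def build_ready_bundles(
    blocks: Iterable[Block], target: int
) -> Iterator[List[BlockSlice]]:
    """Yield bundles whose row count is an exact multiple of `target`."""
    pending: Deque[BlockSlice] = deque()
    pending_rows = 0
    for block in blocks:
        if not block:
            continue
        pending.append((block, 0, len(block)))
        pending_rows += len(block)
        if pending_rows < target:
            continue  # Step 2: not enough rows to build a bundle yet.
        # Step 3: keep the largest multiple of `target`; the tail of the
        # last block stays pending for the next bundle.
        leftover = pending_rows % target
        last_block, start, end = pending.pop()
        bundle = list(pending) + [(last_block, start, end - leftover)]
        pending.clear()
        if leftover:
            pending.append((last_block, end - leftover, end))
        pending_rows = leftover
        yield bundle
    if pending:
        # Assumption: leftover rows are flushed as one smaller bundle.
        yield list(pending)


def run_map_task(bundle: List[BlockSlice], target: int) -> List[Block]:
    """Steps 4 and 5: apply the slices, then split into target-sized blocks."""
    rows = [row for block, start, end in bundle for row in block[start:end]]
    return [rows[i : i + target] for i in range(0, len(rows), target)]


def repartition(blocks: Iterable[Block], target: int) -> List[Block]:
    """Chain the two stages locally (no remote tasks in this sketch)."""
    out: List[Block] = []
    for bundle in build_ready_bundles(blocks, target):
        out.extend(run_map_task(bundle, target))
    return out
```

For example, `repartition([[0, 1, 2], [3, 4, 5, 6, 7, 8, 9], [10]], 4)` builds one ready bundle of 8 rows (n = 2) and one final bundle of 3 rows.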
Contributor
Let's also add comments to `MAX_SAFE_ROWS_PER_BLOCK_FACTOR` saying it should be < 2.
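One plausible reading of this constraint, sketched with a hypothetical splitter rather than Ray's actual output buffer: assume the buffer slices off `target` rows only while the remaining rows strictly exceed `factor * target`. Under that assumption, a factor of 2 or more would leave a block of 2 × target rows unsplit. The function name `split_output` and the strict comparison are both assumptions made for illustration.

```python
from typing import List


def split_output(num_rows: int, target: int, factor: float) -> List[int]:
    """Return output block sizes under a hypothetical splitting rule.

    Assumption: a block is split only while its row count is strictly
    greater than factor * target; otherwise it is emitted whole.
    """
    sizes: List[int] = []
    remaining = num_rows
    while remaining > factor * target:
        sizes.append(target)
        remaining -= target
    if remaining:
        sizes.append(remaining)
    return sizes
```

With 8 rows and a target of 4, `split_output(8, 4, 1.5)` yields two blocks of 4 rows, while `split_output(8, 4, 2.0)` yields a single block of 8 rows.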
Co-authored-by: Xinyuan <[email protected]> Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
raulchen approved these changes on Nov 19, 2025
400Ping pushed a commit to 400Ping/ray that referenced this pull request on Nov 21, 2025
## Description
- Document the internal logic of Streaming Repartition Implementation
- Add `num_rows_per_block` to Streaming Repartition name

## Related issues

## Additional information

---------

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Co-authored-by: Xinyuan <[email protected]>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request on Nov 27, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request on Dec 1, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026