[Data] Disable streaming generator backpressure for download partition actor#57688

Merged
bveeramani merged 4 commits into master from data-download-partition-backpressure
Oct 14, 2025

@bveeramani
Member

Summary

This change disables Ray Core's streaming generator backpressure for the partition actor used in download operations. The partition actor is a lightweight, fast operation that batches URIs before they're sent to download tasks. When backpressure was enabled, Ray Core would throttle the partition actor's output, which starved the downstream download tasks of work and reduced parallelism.
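
The throttling effect described above can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions, not Ray's implementation: `simulate` and its parameters are hypothetical names, the producer stands in for the partition actor, and the threshold mimics `_generator_backpressure_num_objects`, where -1 means no limit.

```python
def simulate(num_items: int, backpressure_num_objects: int, consume_every: int) -> int:
    """Return how many ticks a producer needs to emit `num_items` items.

    Toy model (an assumption, not Ray internals): on each tick the producer
    tries to emit one item, but it is paused while the number of unconsumed
    items has reached the threshold. The consumer removes one item every
    `consume_every` ticks. A threshold of -1 disables the limit.
    """
    produced = 0
    unconsumed = 0
    tick = 0
    while produced < num_items:
        tick += 1
        # The slow consumer takes one item every `consume_every` ticks.
        if tick % consume_every == 0 and unconsumed > 0:
            unconsumed -= 1
        blocked = (
            backpressure_num_objects != -1
            and unconsumed >= backpressure_num_objects
        )
        if not blocked:
            produced += 1
            unconsumed += 1
    return tick


print(simulate(10, -1, 5))  # 10: the producer is never paused
print(simulate(10, 2, 5))   # 40: the producer is paused most of the time
```

In this model, a fast producer with a small threshold ends up running at the consumer's pace, which is the kind of slowdown the summary attributes to backpressure on the partition actor.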

Changes

  • Set _generator_backpressure_num_objects to -1 for the partition actor
  • Use dedicated ray_remote_args for the partition actor instead of the user-provided args (which should only affect download tasks, not internal partitioning logic)
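
The two changes can be sketched in plain Python as follows. The helper names `partition_actor_remote_args` and `build_download_stage_args` are hypothetical and are not taken from the Ray Data source; only the option name `_generator_backpressure_num_objects` and the value -1 come from this PR.

```python
from typing import Any, Dict


def partition_actor_remote_args() -> Dict[str, Any]:
    """Dedicated args for the internal partition actor (hypothetical helper).

    Per this PR, -1 disables streaming generator backpressure.
    """
    return {"_generator_backpressure_num_objects": -1}


def build_download_stage_args(
    user_ray_remote_args: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Keep user-provided args on the download tasks only (hypothetical helper)."""
    return {
        # Internal partitioning logic ignores the user-provided args.
        "partition_actor": partition_actor_remote_args(),
        # User-provided args only affect the download tasks.
        "download_tasks": dict(user_ray_remote_args),
    }


args = build_download_stage_args({"num_cpus": 2})
print(args["partition_actor"])
print(args["download_tasks"])
```

Keeping the two dictionaries separate means a user setting such as `num_cpus` cannot leak into the partition actor's configuration.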

Test plan

  • Verify download operations complete successfully
  • Confirm improved parallelism in download tasks
  • Check that backpressure is properly disabled for partition actor

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
@bveeramani bveeramani requested a review from a team as a code owner October 14, 2025 04:20
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a performance bottleneck in download operations by disabling streaming generator backpressure for the internal partition actor. The change to use dedicated ray_remote_args for the partition actor is a good improvement for encapsulation, preventing user-provided arguments from unintentionally affecting internal logic. The implementation is straightforward and well-commented, though there is a minor typo in one of the comments that should be corrected.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Balaji Veeramani <[email protected]>
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Oct 14, 2025
Comment on lines 121 to +126
self._ray_remote_args = self._apply_default_remote_args(
    self._ray_remote_args, self.data_context
)
self._ray_actor_task_remote_args = self._apply_default_actor_task_remote_args(
    ray_actor_task_remote_args, self.data_context
)
Contributor

for my understanding, what's the semantic difference between ray_remote_args, _ray_remote_args_fn and now _ray_actor_task_remote_args?

Member Author

  • ray_remote_args -- passed to the top-level @ray.remote decorator on the actor class.
@ray.remote(**ray_remote_args)
class Actor:
    ...
  • _ray_remote_args_fn -- like ray_remote_args, but generated dynamically at runtime. IIRC it's mostly used for async stuff, but I forget.

  • _ray_actor_task_remote_args -- passed to options() on an actor method call. In theory, it should support the same values as ray_remote_args, but it doesn't.

@ray.remote
class Actor:
    def task(self):
        ...

actor = Actor.remote()
actor.task.options(**ray_actor_task_remote_args).remote()

Contributor

@omatthew98 omatthew98 left a comment

lgtm.

@bveeramani bveeramani enabled auto-merge (squash) October 14, 2025 22:30
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 14, 2025
@bveeramani bveeramani merged commit e00c87e into master Oct 14, 2025
8 checks passed
@bveeramani bveeramani deleted the data-download-partition-backpressure branch October 14, 2025 23:37
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025