[Data] Disable streaming generator backpressure for download partition actor #57688
bveeramani merged 4 commits into master
…n actor

This change disables Ray Core's streaming generator backpressure for the partition actor used in download operations. The partition actor is a lightweight, fast operation that batches URIs before they're sent to download tasks. When backpressure was enabled, Ray Core would throttle the partition actor's output, which starved the downstream download tasks of work and reduced parallelism.

Changes:
- Set `_generator_backpressure_num_objects` to -1 for the partition actor
- Use dedicated `ray_remote_args` for the partition actor instead of the user-provided args (which should only affect download tasks, not internal partitioning logic)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
Code Review
This pull request effectively addresses a performance bottleneck in download operations by disabling streaming generator backpressure for the internal partition actor. The change to use dedicated ray_remote_args for the partition actor is a good improvement for encapsulation, preventing user-provided arguments from unintentionally affecting internal logic. The implementation is straightforward and well-commented, though there is a minor typo in one of the comments that should be corrected.
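The separation the review describes can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `build_partition_actor_remote_args` and `build_download_task_remote_args` are hypothetical and do not come from the PR; only the option name `_generator_backpressure_num_objects` and the value -1 are taken from the PR description.

```python
from typing import Any, Dict

# Value the PR description uses to disable generator backpressure.
BACKPRESSURE_DISABLED = -1


def build_partition_actor_remote_args() -> Dict[str, Any]:
    """Hypothetical helper: args for the internal partition actor only.

    These are fixed internally and deliberately ignore user-provided args.
    """
    return {"_generator_backpressure_num_objects": BACKPRESSURE_DISABLED}


def build_download_task_remote_args(user_args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: user-provided args apply to download tasks only."""
    # Copy so that later changes to either dict do not leak into the other.
    return dict(user_args)


# Example user args; the values are made up for illustration.
user_args = {"num_cpus": 2, "_generator_backpressure_num_objects": 4}

partition_args = build_partition_actor_remote_args()
download_args = build_download_task_remote_args(user_args)
```

With this split, a user setting such as `num_cpus` reaches the download tasks but never the partition actor, and the partition actor's backpressure setting cannot be overridden from user args.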
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: Balaji Veeramani <[email protected]>
    self._ray_remote_args = self._apply_default_remote_args(
        self._ray_remote_args, self.data_context
    )
    self._ray_actor_task_remote_args = self._apply_default_actor_task_remote_args(
        ray_actor_task_remote_args, self.data_context
    )
For my understanding, what's the semantic difference between `ray_remote_args`, `_ray_remote_args_fn`, and now `_ray_actor_task_remote_args`?
- `ray_remote_args`: passed to the top-level actor remote.

      @ray.remote(**ray_remote_args)
      class Actor:
          ...

- `_ray_remote_args_fn`: like `ray_remote_args`, but generated dynamically at runtime. IIRC mostly used for async stuff, but I forget.
- `_ray_actor_task_remote_args`: passed to options. In theory, should support same values as `ray_remote_args`, but doesn't.

      @ray.remote
      class Actor:
          def task(self):
              ...

      actor = Actor.remote()
      actor.task.options(**ray_actor_task_remote_args).remote()

…n actor (ray-project#57688)

## Summary

This change disables Ray Core's streaming generator backpressure for the partition actor used in download operations. The partition actor is a lightweight, fast operation that batches URIs before they're sent to download tasks. When backpressure was enabled, Ray Core would throttle the partition actor's output, which starved the downstream download tasks of work and reduced parallelism.

## Changes

- Set `_generator_backpressure_num_objects` to -1 for the partition actor
- Use dedicated `ray_remote_args` for the partition actor instead of the user-provided args (which should only affect download tasks, not internal partitioning logic)

## Test plan

- [ ] Verify download operations complete successfully
- [ ] Confirm improved parallelism in download tasks
- [ ] Check that backpressure is properly disabled for partition actor

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
This change disables Ray Core's streaming generator backpressure for the partition actor used in download operations. The partition actor is a lightweight, fast operation that batches URIs before they're sent to download tasks. When backpressure was enabled, Ray Core would throttle the partition actor's output, which starved the downstream download tasks of work and reduced parallelism.
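A toy model may help show why throttling a fast producer can starve slower consumers. The sketch below is not Ray's scheduler or its backpressure implementation; the function `simulate`, the one-step download time, and the rule that the producer may emit at most `limit` batches per scheduling step are all simplifying assumptions made for illustration.

```python
from typing import List


def simulate(num_batches: int, num_workers: int, limit: int) -> List[int]:
    """Return how many download workers are busy in each scheduling step.

    Assumptions (toy model only): a negative `limit` means no backpressure,
    otherwise the producer emits at most `limit` batches per step; every
    download takes exactly one step.
    """
    if limit == 0:
        raise ValueError("limit must be negative (disabled) or positive")

    produced = 0  # batches emitted by the partition step so far
    queued = 0    # batches waiting for a free download worker
    done = 0      # batches fully downloaded
    busy_per_step: List[int] = []

    while done < num_batches:
        remaining = num_batches - produced
        emit = remaining if limit < 0 else min(remaining, limit)
        produced += emit
        queued += emit

        started = min(queued, num_workers)
        queued -= started
        done += started
        busy_per_step.append(started)

    return busy_per_step


throttled = simulate(num_batches=16, num_workers=8, limit=2)
unthrottled = simulate(num_batches=16, num_workers=8, limit=-1)
```

In this model the throttled run keeps only 2 of 8 workers busy per step and needs 8 steps, while the unthrottled run saturates all 8 workers and finishes in 2 steps.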
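Changes
- Set `_generator_backpressure_num_objects` to -1 for the partition actor
- Use dedicated `ray_remote_args` for the partition actor instead of the user-provided args (which should only affect download tasks, not internal partitioning logic)

Test plan
- [ ] Verify download operations complete successfully
- [ ] Confirm improved parallelism in download tasks
- [ ] Check that backpressure is properly disabled for partition actor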
🤖 Generated with Claude Code