Skip to content

[train] implement BaseWorkerGroup for V1/V2 compatibility#57151

Merged
matthewdeng merged 7 commits intoray-project:masterfrom
matthewdeng:worker-group-interface
Oct 4, 2025
Merged

[train] implement BaseWorkerGroup for V1/V2 compatibility#57151
matthewdeng merged 7 commits intoray-project:masterfrom
matthewdeng:worker-group-interface

Conversation

@matthewdeng
Copy link
Contributor

@matthewdeng matthewdeng commented Oct 3, 2025

Problem: Backend configurations were tightly coupled to V1 WorkerGroup, relying on methods/attributes to implemented in V2 WorkerGroup.

Solution:

  • Add BaseWorkerGroup defining common methods (execute, __len__, get_resources_per_worker, etc.)
  • Both V1 and V2 WorkerGroup now implement the interface
  • Update all backend configs to use BaseWorkerGroup instead of V1 WorkerGroup
    • Note: Horovod explicitly uses V1 WorkerGroup directly because it is not implemented for V2.

@matthewdeng matthewdeng marked this pull request as ready for review October 3, 2025 17:35
@matthewdeng matthewdeng requested a review from a team as a code owner October 3, 2025 17:35
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Matthew Deng <[email protected]>
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Matthew Deng <[email protected]>
@ray-gardener ray-gardener bot added the train Ray Train Related Issue label Oct 3, 2025
Copy link
Contributor

@justinvyu justinvyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Address the cursor comment and should be good



@DeveloperAPI
class WorkerGroupInterface(abc.ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class WorkerGroupInterface(abc.ABC):
class RayWorkerGroup(abc.ABC):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then subclasses can be RayTrainWorkerGroup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to be BaseWorkerGroup

Comment on lines +167 to +168
resources = worker_group.get_resources_per_worker()
num_gpus_per_worker = resources.get("GPU", 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great no more sketchy stuff here!

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Matthew Deng <[email protected]>
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Matthew Deng <[email protected]>
@matthewdeng matthewdeng enabled auto-merge (squash) October 4, 2025 05:33
@matthewdeng matthewdeng disabled auto-merge October 4, 2025 05:33
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 4, 2025
@matthewdeng matthewdeng changed the title [train] implement WorkerGroupInterface for V1/V2 compatibility [train] implement BaseWorkerGroup for V1/V2 compatibility Oct 4, 2025
@matthewdeng matthewdeng enabled auto-merge (squash) October 4, 2025 05:34
@matthewdeng matthewdeng merged commit e0313c4 into ray-project:master Oct 4, 2025
9 checks passed
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Douglas Strodtman <[email protected]>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
liulehui pushed a commit to liulehui/ray that referenced this pull request Oct 9, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Josh Kodi <[email protected]>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…ect#57151)

**Problem:** Backend configurations were tightly coupled to V1
`WorkerGroup`, relying on methods/attributes to implemented in V2
`WorkerGroup`.

**Solution:**
* Add `BaseWorkerGroup` defining common methods (`execute`, `__len__`,
`get_resources_per_worker`, etc.)
* Both V1 and V2 `WorkerGroup` now implement the interface
* Update all backend configs to use `BaseWorkerGroup` instead of V1
`WorkerGroup`
* **Note:** Horovod explicitly uses V1 WorkerGroup directly because it
is not implemented for V2.

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests train Ray Train Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants