[train] Enable v2 for ray/train/tests #56868
Signed-off-by: Justin Yu <[email protected]>
Code Review
This pull request introduces significant changes to enable Ray Train v2. The modifications include updating CI configurations, migrating numerous tests to use the v2 API by setting RAY_TRAIN_V2_ENABLED=1, and cleaning up obsolete v1 examples and tests. The changes are generally consistent and well-structured.
My main concern is the removal of tests in python/ray/train/tests/test_gpu_auto_transfer.py that verified critical GPU-related functionality. While the tests used v1 APIs, the underlying features are still important for v2. I've left a specific comment on this.
Otherwise, the migration strategy seems sound, with clear use of environment variables to toggle between API versions and TODOs marking areas for future work.
#56868 replaced the `gpu` tags with `train_v2_gpu` to split the tests into two CI pipelines. However, a py312 test that runs only in postmerge filters out just the `gpu` tag, so GPU tests ended up running on the py312 CPU test runner. This PR fixes the issue by also filtering out the new `train_v2_gpu` tag.
Signed-off-by: Justin Yu <[email protected]>
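To illustrate why the rename let GPU tests slip through, here is a minimal sketch of exclusion-by-tag filtering. The test names, the `TESTS` registry, and the `select_tests` helper are hypothetical and are not the actual CI tooling; only the tag names come from this PR.

```python
# Hypothetical registry: test target name -> set of tags (names are made up).
TESTS = {
    "test_torch_trainer": {"train_v2"},
    "test_torch_gpu": {"train_v2_gpu"},
    "test_legacy_gpu": {"gpu"},
}


def select_tests(tests, except_tags):
    """Return the sorted names of tests that carry none of the excluded tags."""
    excluded = set(except_tags)
    return sorted(name for name, tags in tests.items() if not (tags & excluded))


# Filtering only on `gpu` no longer excludes tests retagged as `train_v2_gpu`.
before_fix = select_tests(TESTS, except_tags=["gpu"])

# Excluding both tags keeps GPU tests off the CPU runner.
after_fix = select_tests(TESTS, except_tags=["gpu", "train_v2_gpu"])

print(before_fix)
print(after_fix)
```

In this sketch, `before_fix` still contains the retagged GPU test, while `after_fix` contains only the CPU test.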
Enable Train v2 across CI with new CPU/GPU jobs, migrate tests to v2 with env flags and tag updates, remove legacy Tune examples/utilities, and fix docs/BUILD configurations. --------- Signed-off-by: Justin Yu <[email protected]>
Ports over the remaining unit tests that were marked as TODOs from this series of PRs: #57534, #57256, #56868, #56820, #56816. Notably:
* `test_new_dataset_config -> test_data_integration`
* `test_backend -> test_torch_trainer, test_worker_group`
* `test_gpu -> test_torch_gpu`
This PR also finishes migrating the Tune LightGBM/Keras examples which were unblocked by #57042 and #57121.
Signed-off-by: Justin Yu <[email protected]>
Note
Enable Train v2 across CI with new CPU/GPU jobs, migrate tests to v2 with env flags and tag updates, remove legacy Tune examples/utilities, and fix docs/BUILD configurations.
* … `train_v2_gpu`.
* … `gpu_only`, `usegpu`), adjust include/except tag filters.
* … `RAY_TRAIN_V2_ENABLED` (on/off per test); retag v2 tests (`train_v2`, `train_v2_gpu`).
* … `.fit()`.
* … `team:ml`/`ray_air` to `team:data`.
* … `construct_path` from `train/_internal/utils.py` and its unit test; update pydoclint baseline accordingly.
Written by Cursor Bugbot for commit b88db74.
Legacy V1 tests
This PR explicitly marked legacy v1 tests by running them with RAY_TRAIN_V2_ENABLED=0. See the BUILD files.
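As a rough illustration of the toggle, the sketch below reads the flag from the environment. The helper name `train_v2_enabled` is hypothetical and is not Ray's own API, and treating an unset variable as "off" is an assumption made only for this example.

```python
import os


def train_v2_enabled(environ=None):
    """Hypothetical helper: report whether RAY_TRAIN_V2_ENABLED is set to "1".

    Assumption for this sketch only: an unset variable counts as disabled.
    """
    env = os.environ if environ is None else environ
    return env.get("RAY_TRAIN_V2_ENABLED", "0") == "1"


# A legacy v1 test target would run with the flag set to 0 ...
legacy_env = {"RAY_TRAIN_V2_ENABLED": "0"}
# ... while a migrated test target would run with it set to 1.
v2_env = {"RAY_TRAIN_V2_ENABLED": "1"}

print(train_v2_enabled(legacy_env))
print(train_v2_enabled(v2_env))
```

Passing the environment as a dictionary keeps the sketch testable without mutating the real process environment.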
Followups
Some test files mix tests that should be migrated to V2 with tests that are legacy V1 tests. I've marked these files with TODOs in the BUILD files so they can be migrated in a follow-up.