[Ray Data | Docs] Error in ray.data.groupby example in docs#57036
[Ray Data | Docs] Error in ray.data.groupby example in docs#57036richardliaw merged 1 commit intoray-project:masterfrom
Conversation
`group.drop` used in `normalize_variety`requires either the `columns` arg specified since by default the drop axis is 0. Signed-off-by: jpatra72 <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request correctly fixes a bug in the groupby docstring example by specifying the columns argument in pd.DataFrame.drop. I've added a suggestion to further improve the example by using a more performant and robust vectorized implementation for the normalization logic, which also handles the edge case of division by zero.
| for feature in group.drop(columns=["variety"]).columns: | ||
| group[feature] = group[feature] / group[feature].abs().max() |
There was a problem hiding this comment.
This implementation can be improved for performance and robustness. Iterating over DataFrame columns is generally less efficient than using vectorized operations. Additionally, if a feature column contains only zeros, group[feature].abs().max() will be 0, leading to division by zero.
A more idiomatic and robust pandas implementation would be to perform this normalization using vectorized operations and handle the division-by-zero edge case.
| for feature in group.drop(columns=["variety"]).columns: | |
| group[feature] = group[feature] / group[feature].abs().max() | |
| features_to_normalize = group.columns.drop("variety") | |
| max_abs_values = group[features_to_normalize].abs().max() | |
| # Avoid division by zero for columns that are all zeros. | |
| max_abs_values[max_abs_values == 0] = 1 | |
| group[features_to_normalize] = group[features_to_normalize] / max_abs_values |
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]> Signed-off-by: Seiji Eicher <[email protected]>
## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]> Signed-off-by: Douglas Strodtman <[email protected]>
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]>
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]> Signed-off-by: Josh Kodi <[email protected]>
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]>
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]>
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]> Signed-off-by: Aydin Abiar <[email protected]>
…ect#57036) ## Why are these changes needed? `group.drop` used in `normalize_variety` requires the `columns` arg specified since by default the drop axis is 0 in pandas ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 175837f. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> Signed-off-by: jpatra72 <[email protected]> Signed-off-by: Future-Outlier <[email protected]>
Why are these changes needed?
group.dropused innormalize_varietyrequires thecolumnsarg specified since by default the drop axis is 0 in pandasChecks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.Note
Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility.
Written by Cursor Bugbot for commit 175837f. This will update automatically on new commits. Configure here.