Skip to content

[Ray Data | Docs] Error in ray.data.groupby example in docs#57036

Merged
richardliaw merged 1 commit intoray-project:masterfrom
jpatra72:patch-2
Oct 1, 2025
Merged

[Ray Data | Docs] Error in ray.data.groupby example in docs#57036
richardliaw merged 1 commit intoray-project:masterfrom
jpatra72:patch-2

Conversation

@jpatra72
Copy link
Contributor

@jpatra72 jpatra72 commented Sep 30, 2025

Why are these changes needed?

group.drop used in normalize_variety requires the columns arg specified since by default the drop axis is 0 in pandas

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Note

Corrects the groupby example to use group.drop(columns=["variety"]) for pandas compatibility.

Written by Cursor Bugbot for commit 175837f. This will update automatically on new commits. Configure here.

`group.drop` used in `normalize_variety`requires either the `columns` arg specified since by default the drop axis is 0. 

Signed-off-by: jpatra72 <[email protected]>
@jpatra72 jpatra72 requested a review from a team as a code owner September 30, 2025 12:35
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a bug in the groupby docstring example by specifying the columns argument in pd.DataFrame.drop. I've added a suggestion to further improve the example by using a more performant and robust vectorized implementation for the normalization logic, which also handles the edge case of division by zero.

Comment on lines +2708 to 2709
for feature in group.drop(columns=["variety"]).columns:
group[feature] = group[feature] / group[feature].abs().max()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This implementation can be improved for performance and robustness. Iterating over DataFrame columns is generally less efficient than using vectorized operations. Additionally, if a feature column contains only zeros, group[feature].abs().max() will be 0, leading to division by zero.

A more idiomatic and robust pandas implementation would be to perform this normalization using vectorized operations and handle the division-by-zero edge case.

Suggested change
for feature in group.drop(columns=["variety"]).columns:
group[feature] = group[feature] / group[feature].abs().max()
features_to_normalize = group.columns.drop("variety")
max_abs_values = group[features_to_normalize].abs().max()
# Avoid division by zero for columns that are all zeros.
max_abs_values[max_abs_values == 0] = 1
group[features_to_normalize] = group[features_to_normalize] / max_abs_values

@jpatra72 jpatra72 changed the title [Ray Data | Docs] Mistake in ray.data.groupby example [Ray Data | Docs] Error in ray.data.groupby example in docs Sep 30, 2025
@ray-gardener ray-gardener bot added docs An issue or change related to documentation data Ray Data-related issues labels Sep 30, 2025
@goutamvenkat-anyscale goutamvenkat-anyscale added the go add ONLY when ready to merge, run all tests label Sep 30, 2025
@richardliaw richardliaw merged commit 5ae0b3b into ray-project:master Oct 1, 2025
6 checks passed
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
Signed-off-by: Douglas Strodtman <[email protected]>
liulehui pushed a commit to liulehui/ray that referenced this pull request Oct 9, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas


## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
Signed-off-by: Josh Kodi <[email protected]>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas


## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas


## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…ect#57036)

## Why are these changes needed?

`group.drop` used in `normalize_variety` requires the `columns` arg
specified since by default the drop axis is 0 in pandas

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Corrects the groupby example to use group.drop(columns=["variety"])
for pandas compatibility.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
175837f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Signed-off-by: jpatra72 <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues docs An issue or change related to documentation go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants