Skip to content

[Data] Improve Download Op Display Name#57773

Merged
bveeramani merged 4 commits intoray-project:masterfrom
kyuds:download-op-name
Oct 16, 2025
Merged

[Data] Improve Download Op Display Name#57773
bveeramani merged 4 commits intoray-project:masterfrom
kyuds:download-op-name

Conversation

@kyuds
Copy link
Member

@kyuds kyuds commented Oct 16, 2025

Description

Some considerations:

  • Original issue description showed examples for using parenthesis or square brackets to enclose the uri. I implemented mine in square brackets as it was first mentioned, but please let me know if parenthesis is preferred.

TODO:

  • Need to write tests (how should I do this though? I feel there are two ways: capture stdout/stderr from test_download_expression.py::TestDownloadExpressionIntegration::test_download_expression_with_map_batches, or try to mock plan_download_op.py::plan_download_op to check the name of the download operator. Im unsure on which one is preferred)

Related issues

Fixes #57732

Types of change

  • Bug fix 🐛
  • New feature ✨
  • Enhancement 🚀
  • Code refactoring 🔧
  • Documentation update 📖
  • Chore 🧹
  • Style 🎨

Checklist

Does this PR introduce breaking changes?

  • Yes ⚠️
  • No

Testing:

  • Added/updated tests for my changes
  • Tested the changes manually
  • This PR is not tested ❌ (please explain why)

Code Quality:

  • Signed off every commit (git commit -s)
  • Ran pre-commit hooks (setup guide)

Documentation:

  • Updated documentation (if applicable) (contribution guide)
  • Added new APIs to doc/source/ (if applicable)

Additional context

Signed-off-by: kyuds <[email protected]>
@kyuds kyuds requested a review from a team as a code owner October 16, 2025 00:24
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

I have a couple of suggestions for consistency:

  • The operator names in Ray Data typically use parentheses for parameters, e.g., Repartition(num_outputs=2). It would be more consistent to use Partition(uri_column_name) and Download(uri_column_name) instead of square brackets.

Regarding testing, you're right to think about how to verify the operator name. Instead of capturing stdout, which can be brittle, you can inspect the output of Dataset.stats(). The stats string contains the names of the physical operators. You can add a test in python/ray/data/tests/test_download_expression.py that creates a dataset with a download expression and then asserts that the new operator names appear in the stats output. For example:

def test_download_operator_name(self, tmp_path):
    # ... setup code to create a dataset with a download expression ...
    ds = ray.data.from_pandas(...)
    ds = ds.with_column("content", download("file_uri"))
    stats = ds.stats()
    assert "Download(file_uri)" in stats
    assert "Partition(file_uri)" in stats

This approach is used in other tests in the Ray Data codebase and is more robust than checking stdout.

Once these changes are addressed and tests are added, this PR will be in great shape.

@kyuds
Copy link
Member Author

kyuds commented Oct 16, 2025

requesting for initial feedback @bveeramani ! Thank you!

Copy link
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyuds could you rebase off of #57775 and make sure the naming works well with multiple URI columns?

@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Oct 16, 2025
@kyuds
Copy link
Member Author

kyuds commented Oct 16, 2025

ok thats why you mentioned the multi-uri in the issue description. I was going to ask about that. Will do it before tomorrow!

@kyuds
Copy link
Member Author

kyuds commented Oct 16, 2025

done! Im on an x86_64 macbook right now (I know... very old). I'll test my diffs when I get to an m1 machine.

Copy link
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for minor comment

@kyuds
Copy link
Member Author

kyuds commented Oct 16, 2025

confirmed it works for both tqdm and rich progress! PTAL @bveeramani !

@bveeramani bveeramani enabled auto-merge (squash) October 16, 2025 18:38
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 16, 2025
@bveeramani bveeramani merged commit 1622ff8 into ray-project:master Oct 16, 2025
8 checks passed
@kyuds kyuds deleted the download-op-name branch October 16, 2025 21:59
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

---------

Signed-off-by: kyuds <[email protected]>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

---------

Signed-off-by: kyuds <[email protected]>
Signed-off-by: xgui <[email protected]>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

---------

Signed-off-by: kyuds <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

---------

Signed-off-by: kyuds <[email protected]>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

---------

Signed-off-by: kyuds <[email protected]>
Signed-off-by: Aydin Abiar <[email protected]>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.

---------

Signed-off-by: kyuds <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data] Improve Download operator display name in progress bars

2 participants