[Data] Improve Download Op Display Name#57773
Conversation
Signed-off-by: kyuds <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability.
I have a couple of suggestions for consistency:
- The operator names in Ray Data typically use parentheses for parameters, e.g.,
Repartition(num_outputs=2). It would be more consistent to usePartition(uri_column_name)andDownload(uri_column_name)instead of square brackets.
Regarding testing, you're right to think about how to verify the operator name. Instead of capturing stdout, which can be brittle, you can inspect the output of Dataset.stats(). The stats string contains the names of the physical operators. You can add a test in python/ray/data/tests/test_download_expression.py that creates a dataset with a download expression and then asserts that the new operator names appear in the stats output. For example:
def test_download_operator_name(self, tmp_path):
# ... setup code to create a dataset with a download expression ...
ds = ray.data.from_pandas(...)
ds = ds.with_column("content", download("file_uri"))
stats = ds.stats()
assert "Download(file_uri)" in stats
assert "Partition(file_uri)" in statsThis approach is used in other tests in the Ray Data codebase and is more robust than checking stdout.
Once these changes are addressed and tests are added, this PR will be in great shape.
|
requesting for initial feedback @bveeramani ! Thank you! |
|
ok thats why you mentioned the multi-uri in the issue description. I was going to ask about that. Will do it before tomorrow! |
Signed-off-by: kyuds <[email protected]>
|
done! Im on an x86_64 macbook right now (I know... very old). I'll test my diffs when I get to an m1 machine. |
bveeramani
left a comment
There was a problem hiding this comment.
LGTM except for minor comment
Signed-off-by: kyuds <[email protected]>
|
confirmed it works for both tqdm and rich progress! PTAL @bveeramani ! |
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability. --------- Signed-off-by: kyuds <[email protected]>
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability. --------- Signed-off-by: kyuds <[email protected]> Signed-off-by: xgui <[email protected]>
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability. --------- Signed-off-by: kyuds <[email protected]> Signed-off-by: elliot-barn <[email protected]>
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability. --------- Signed-off-by: kyuds <[email protected]>
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability. --------- Signed-off-by: kyuds <[email protected]> Signed-off-by: Aydin Abiar <[email protected]>
This pull request improves the display names for the Download operator, making them more descriptive by including the URI column name. This is a great enhancement for observability. --------- Signed-off-by: kyuds <[email protected]> Signed-off-by: Future-Outlier <[email protected]>
Description
Some considerations:
TODO:
test_download_expression.py::TestDownloadExpressionIntegration::test_download_expression_with_map_batches, or try to mockplan_download_op.py::plan_download_opto check the name of the download operator. Im unsure on which one is preferred)Related issues
Fixes #57732
Types of change
Checklist
Does this PR introduce breaking changes?
Testing:
Code Quality:
git commit -s)Documentation:
doc/source/(if applicable)Additional context