[Data] - Add DataType to Expression class#55915
[Data] - Add DataType to Expression class#55915alexeykudinkin merged 24 commits intoray-project:masterfrom
Conversation
Signed-off-by: Goutam V <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request introduces a new DataType class to represent data types within Ray Data expressions, and integrates it into the Expr class hierarchy. The new DataType class provides a unified abstraction over PyArrow, NumPy, and Python types, with methods for conversion and type inference.
The implementation is a good first step. I've identified a few areas for improvement in the new DataType class to enhance its robustness and correctness, particularly around type conversion fallbacks and exception handling. My comments focus on ensuring the new DataType class is as reliable and performant as possible.
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
bveeramani
left a comment
There was a problem hiding this comment.
Minor comment but looks good to me https://github.com/ray-project/ray/pull/55915/files#r2302280395
python/ray/data/datatype.py
Outdated
| ) | ||
|
|
||
| # Type checking methods | ||
| def is_arrow_type(self) -> bool: |
There was a problem hiding this comment.
Nit: Could use a field like self._internal_type_kind or similar to store the kind of type on construction and add some type of enum to check against rather than having to constantly call these functions / the underlying isinstance function.
Signed-off-by: Goutam V <[email protected]>
python/ray/data/datatype.py
Outdated
|
|
||
| def __post_init__(self): | ||
| """Validate the _internal_type after initialization.""" | ||
| if not isinstance(self._internal_type, (pa.DataType, np.dtype, type)): |
There was a problem hiding this comment.
We need to match support of batch UDFs: ie we'd have to support PA, Numpy, Pandas, Python (not a fan of the latter but since we already support it)
There was a problem hiding this comment.
Do you mind if I add Pandas support in a separate PR? I've added a TODO for this.
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V <[email protected]>
Signed-off-by: Goutam V. <[email protected]>
Signed-off-by: Goutam V. <[email protected]>
Signed-off-by: Goutam V. <[email protected]>
Signed-off-by: Goutam V. <[email protected]>
Signed-off-by: Goutam V. <[email protected]>
Signed-off-by: Goutam V. <[email protected]>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Add return_dtype to all Expr types. ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V <[email protected]> Signed-off-by: Goutam V. <[email protected]> Signed-off-by: sampan <[email protected]>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Add return_dtype to all Expr types. ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V <[email protected]> Signed-off-by: Goutam V. <[email protected]> Signed-off-by: jugalshah291 <[email protected]>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Add return_dtype to all Expr types. ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V <[email protected]> Signed-off-by: Goutam V. <[email protected]> Signed-off-by: yenhong.wong <[email protected]>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Add return_dtype to all Expr types. ## Related issue number <!-- For example: "Closes #1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V <[email protected]> Signed-off-by: Goutam V. <[email protected]> Signed-off-by: Douglas Strodtman <[email protected]>
Original PR #55915 by goutamvenkat-anyscale Original: ray-project/ray#55915
Merged from original PR #55915 Original: ray-project/ray#55915
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? Add return_dtype to all Expr types. ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V <[email protected]> Signed-off-by: Goutam V. <[email protected]>
Why are these changes needed?
Add return_dtype to all Expr types.
Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.