Skip to content

[data] Remove unification of schemas#55926

Merged
alexeykudinkin merged 36 commits intoray-project:masterfrom
iamjustinhsu:jhsu/take-first-non-empty-schema
Aug 28, 2025
Merged

[data] Remove unification of schemas#55926
alexeykudinkin merged 36 commits intoray-project:masterfrom
iamjustinhsu:jhsu/take-first-non-empty-schema

Conversation

@iamjustinhsu
Copy link
Contributor

Why are these changes needed?

  • unification of schemas is slow
  • revert back to pre https://github.com/ray-project/ray/pull/53454/files commit. if no unification before, no unification after. if unification before, we can leave it there or add it back. If I removed it I added a comment with # NOTE

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
@iamjustinhsu iamjustinhsu marked this pull request as ready for review August 26, 2025 19:04
@iamjustinhsu iamjustinhsu requested a review from a team as a code owner August 26, 2025 19:04
Signed-off-by: iamjustinhsu <[email protected]>
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Aug 27, 2025
The first non-empty schema, or None if all schemas are empty.
"""
for schema in schemas:
if not _is_empty_schema(schema):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would there be empty schemas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident enough to remove this statement because we know empty schemas have appeared before

Copy link
Contributor Author

@iamjustinhsu iamjustinhsu Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok my most recent commit found a case 9734268 where we are taking the first non-empty bundle

iamjustinhsu and others added 10 commits August 27, 2025 10:35
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
@iamjustinhsu iamjustinhsu requested a review from a team as a code owner August 27, 2025 22:45
@iamjustinhsu iamjustinhsu added the go add ONLY when ready to merge, run all tests label Aug 27, 2025
(_create_mixed_complex_schema, 0.0002),
],
)
def test_unify_schemas_equivalent_performance(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goutamvenkat-anyscale should handle cases where the schemas are NOT identical

Signed-off-by: iamjustinhsu <[email protected]>
)
def test_dedupe_schema_handle_empty(
old_schema: Optional["Schema"],
old_schema_is_empty: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this flag? You can just test if the schema is empty, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexeykudinkin yea it's not completely necessary, just thought it would be easier to read than before, i can revert back

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: iamjustinhsu <[email protected]>
@aslonnie
Copy link
Collaborator

cc @aslonnie

@alexeykudinkin alexeykudinkin merged commit 21b77f6 into ray-project:master Aug 28, 2025
5 checks passed
omatthew98 pushed a commit to omatthew98/ray that referenced this pull request Aug 28, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

<!-- For example: "Closes ray-project#1234" -->

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
omatthew98 pushed a commit to omatthew98/ray that referenced this pull request Aug 28, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

<!-- For example: "Closes ray-project#1234" -->

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
aslonnie pushed a commit that referenced this pull request Aug 28, 2025
Cherry pick two fixes for ray data (from
#55854 and
#55926).

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Co-authored-by: iamjustinhsu <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Masahiro Tanaka <[email protected]>
tohtana pushed a commit to tohtana/ray that referenced this pull request Aug 29, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Masahiro Tanaka <[email protected]>
gangsf pushed a commit to gangsf/ray that referenced this pull request Sep 2, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Gang Zhao <[email protected]>
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: sampan <[email protected]>
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: jugalshah291 <[email protected]>
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: yenhong.wong <[email protected]>
dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Douglas Strodtman <[email protected]>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?
- unification of schemas is slow
- revert back to pre https://github.com/ray-project/ray/pull/53454/files
commit. if no unification before, no unification after. if unification
before, we can leave it there or add it back. If I removed it I added a
comment with `# NOTE`
<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: iamjustinhsu <[email protected]>
Signed-off-by: Alexey Kudinkin <[email protected]>
Co-authored-by: Alexey Kudinkin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants