Ytsaurus: allow create table/table functions/dictionaries with subset of columns. #87982

Merged

vdimir merged 4 commits into ClickHouse:master from MikhailBurdukov:fix_schema_comp on Oct 8, 2025

Conversation

@MikhailBurdukov
Contributor

Allows creating tables, table functions, and dictionaries that use only a subset of the columns of a YTsaurus source table.
Example from the integration test:

    yt.create_table(
        table_path,
        '{"a":10,"b":20, "c": 1}{"a":20,"b":40, "c": 2}',
        schema={"a": "int32", "b": "int32", "c": "int32"},
    )
    # The ytsaurus() structure argument lists only columns a and b; c is omitted.
    assert (
        instance.query(
            f"SELECT a,b FROM ytsaurus('{YT_URI}','{table_path}', '{YT_DEFAULT_TOKEN}', 'a Int32, b Int32')"
        )
        == "10\t20\n20\t40\n"
    )
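To make the semantics of the example concrete, here is a minimal Python sketch of what reading "a subset of columns" could mean, assuming that every column in the ClickHouse structure string must exist in the YTsaurus schema with a compatible type and that only those columns are returned. All names here (parse_structure, check_subset, parse_rows, select_tsv, the YT_TO_CH type mapping) are hypothetical illustrations, not the ClickHouse implementation from this PR; the structure parser only handles simple "name Type" pairs.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical mapping from a few YTsaurus column types to ClickHouse type names.
YT_TO_CH = {"int32": "Int32", "int64": "Int64", "string": "String", "double": "Float64"}


def parse_structure(structure: str) -> List[Tuple[str, str]]:
    """Split 'a Int32, b Int32' into [('a', 'Int32'), ('b', 'Int32')].

    Only simple 'name Type' pairs are handled; types containing commas
    (e.g. Tuple(...)) would need a real parser.
    """
    pairs = []
    for part in structure.split(","):
        name, ch_type = part.strip().split(None, 1)
        pairs.append((name, ch_type.strip()))
    return pairs


def check_subset(requested: List[Tuple[str, str]], yt_schema: Dict[str, str]) -> List[str]:
    """Return the column names to read; raise if a column is missing or mistyped."""
    names = []
    for name, ch_type in requested:
        if name not in yt_schema:
            raise ValueError(f"column {name!r} is not in the YTsaurus table schema")
        expected = YT_TO_CH.get(yt_schema[name])
        if expected != ch_type:
            raise ValueError(f"column {name!r}: source type maps to {expected}, got {ch_type}")
        names.append(name)
    return names


def parse_rows(text: str) -> List[dict]:
    """Parse back-to-back JSON objects such as '{"a":10}{"a":20}'."""
    decoder = json.JSONDecoder()
    rows, rest = [], text.strip()
    while rest:
        obj, end = decoder.raw_decode(rest)
        rows.append(obj)
        rest = rest[end:].lstrip()
    return rows


def select_tsv(rows: List[dict], columns: List[str]) -> str:
    """Project each row onto the requested columns and render TabSeparated output."""
    return "".join("\t".join(str(row[c]) for c in columns) + "\n" for row in rows)


if __name__ == "__main__":
    schema = {"a": "int32", "b": "int32", "c": "int32"}
    rows = parse_rows('{"a":10,"b":20, "c": 1}{"a":20,"b":40, "c": 2}')
    columns = check_subset(parse_structure("a Int32, b Int32"), schema)
    print(repr(select_tsv(rows, columns)))  # '10\t20\n20\t40\n'
```

With the data from the integration test, the projection drops column c and produces the same TabSeparated text the test expects. Whether the real implementation also rejects type mismatches this strictly is an assumption of the sketch.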

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Ytsaurus: allow creating tables, table functions, and dictionaries with a subset of columns.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@MikhailBurdukov MikhailBurdukov marked this pull request as ready for review October 1, 2025 15:17
@vdimir vdimir added the "can be tested" label (Allows running workflows for external contributors) Oct 1, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Oct 1, 2025

Workflow [PR], commit [77dbff1]

Summary:

@clickhouse-gh clickhouse-gh bot added the "pr-improvement" label (Pull request with some product improvements) Oct 1, 2025
@vdimir vdimir self-assigned this Oct 2, 2025
    const auto & schema_json = schema.extract<Poco::JSON::Object::Ptr>();

    if (!schema_json->has("$attributes"))
        throw Exception(ErrorCodes::LOGICAL_ERROR, "No \"$attributes\" property in yt table schema");
Member

Is it possible for a malformed client to send an invalid schema JSON? If so, we should probably use a different error code here, since LOGICAL_ERROR is meant for unreachable code.

Contributor Author

Yeah, I suppose it is possible. Would INCORRECT_DATA be fine?

Member

Yes, I think so
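The thread's conclusion, that malformed input a remote server could plausibly send should raise a data error rather than LOGICAL_ERROR, can be illustrated with a minimal Python sketch. IncorrectDataError and parse_yt_schema are hypothetical names, and the "$value" column-list layout is an assumption about how the schema is serialized to JSON; only the "$attributes" check comes from the reviewed code.

```python
import json
from typing import Dict


class IncorrectDataError(Exception):
    """Hypothetical analogue of ClickHouse's INCORRECT_DATA: the external input is bad."""


def parse_yt_schema(payload: str) -> Dict[str, str]:
    """Map column name -> YTsaurus type from a schema document.

    Assumed layout: {"$attributes": {...}, "$value": [{"name": ..., "type": ...}, ...]}.
    Every failure here is caused by the remote side, so it is reported as a data
    error; an assert (the analogue of LOGICAL_ERROR) would wrongly signal a bug
    in our own code.
    """
    try:
        schema = json.loads(payload)
    except json.JSONDecodeError as e:
        raise IncorrectDataError(f"yt table schema is not valid JSON: {e}") from e
    if not isinstance(schema, dict) or "$attributes" not in schema:
        raise IncorrectDataError('No "$attributes" property in yt table schema')
    columns = schema.get("$value")
    if not isinstance(columns, list):
        raise IncorrectDataError('No "$value" column list in yt table schema')
    result = {}
    for col in columns:
        if not isinstance(col, dict) or "name" not in col or "type" not in col:
            raise IncorrectDataError(f"malformed column entry in yt table schema: {col!r}")
        result[col["name"]] = col["type"]
    return result
```

The design point is the split between error kinds: anything derived from the server's response is validated and reported as bad data, while LOGICAL_ERROR (here, a plain assert) stays reserved for states the code itself should never reach.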

@vdimir vdimir enabled auto-merge October 6, 2025 14:42
@MikhailBurdukov
Contributor Author

@vdimir
Hi! There seems to be a problem with merging. What should we do about it?

Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel):
| ERROR: Failed to insert data into CI DB, exception [command failed with, exit_code 254, stderr:
>>>
An error occurred (AccessDeniedException) when calling the GetParameters operation: User: arn:aws:sts::542516086801:assumed-role/ec2_admin/i-0fb3ee365709b6161 is not authorized to perform: ssm:GetParameters on resource: arn:aws:ssm:us-east-1:542516086801:parameter/clickhouse-test-stat-url because no identity-based policy allows the ssm:GetParameters action
<<<]
Stress test (amd_msan):
| ERROR: Failed to insert data into CI DB, exception [command failed with, exit_code 254, stderr:
>>>
An error occurred (AccessDeniedException) when calling the GetParameters operation: User: arn:aws:sts::542516086801:assumed-role/ec2_admin/i-0d279861f39ac207f is not authorized to perform: ssm:GetParameters on resource: arn:aws:ssm:us-east-1:542516086801:parameter/clickhouse-test-stat-url because no identity-based policy allows the ssm:GetParameters action
<<<]
Finish Workflow:
| ERROR: Failed to insert data into CI DB, exception [command failed with, exit_code 254, stderr:
>>>
An error occurred (AccessDeniedException) when calling the GetParameters operation: User: arn:aws:sts::542516086801:assumed-role/ec2_admin/i-0b8eb1fc638cfb8df is not authorized to perform: ssm:GetParameters on resource: arn:aws:ssm:us-east-1:542516086801:parameter/clickhouse-test-stat-url because no identity-based policy allows the ssm:GetParameters action
<<<]

@vdimir vdimir added this pull request to the merge queue Oct 8, 2025
Merged via the queue into ClickHouse:master with commit e1c1cc9 Oct 8, 2025
358 of 361 checks passed
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the "pr-synced-to-cloud" label (The PR is synced to the cloud repo) Oct 8, 2025
@vdimir
Member

vdimir commented Oct 8, 2025

I've rerun CI; thanks for pointing it out.

@kssenii
Member

kssenii commented Oct 8, 2025

The test is now broken
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=e1c1cc99b3ea5d9e9f86612f3c245a03e90dc6e5&name_0=MasterCI&name_1=Integration%20tests%20%28amd_tsan%2C%203%2F6%29&name_1=Integration%20tests%20%28amd_tsan%2C%203%2F6%29

File: test_ytsaurus/test_tables.py:475 - in test_ytsaurus_select_subset_of_columns
    yt = YTsaurusCLI(started_cluster, instance, YT_HOST, YT_PORT)
E   NameError: name 'YT_HOST' is not defined

@Algunenano
Member

Let's revert and reintroduce it. Everything is failing.

@MikhailBurdukov
Contributor Author

It is a merge race with #88165.

Yeah, let's revert, and I will rewrite the test.

