
Settings to write and verify parquet checksums #79012

Merged: al13n321 merged 11 commits into master from pqcs on Nov 3, 2025

al13n321 (Member) commented Apr 11, 2025

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Details

Adds two settings: output_format_parquet_write_checksums (previously we never wrote checksums) and input_format_parquet_verify_checksums (previously we always verified them). The change is simple.
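
As a rough illustration of what the two settings toggle, here is a toy sketch in Python. It is not ClickHouse's implementation and does not use the real Parquet page layout: the functions write_page and read_page, the one-byte flag, and the framing are all hypothetical. It assumes the checksums are CRC32 values over page data, as in the Parquet format's optional page-header crc field.

```python
import struct
import zlib


def write_page(payload: bytes, write_checksum: bool) -> bytes:
    """Serialize a toy page: flag byte, optional 4-byte CRC32, then payload.

    Hypothetical framing for illustration only; real Parquet stores the
    checksum in a Thrift-encoded page header.
    """
    if write_checksum:
        return b"\x01" + struct.pack("<I", zlib.crc32(payload)) + payload
    return b"\x00" + payload


def read_page(page: bytes, verify_checksum: bool) -> bytes:
    """Return the payload; check the CRC32 only if it is present and requested."""
    if page[0] != 1:
        # No checksum was written, so there is nothing to verify.
        return page[1:]
    (stored,) = struct.unpack("<I", page[1:5])
    payload = page[5:]
    if verify_checksum and zlib.crc32(payload) != stored:
        raise ValueError("page checksum mismatch")
    return payload


page = write_page(b"column data", write_checksum=True)
# Simulate a tool that tampers with file contents by flipping the last byte.
corrupted = page[:-1] + bytes([page[-1] ^ 0xFF])
```

With verification on, read_page(corrupted, verify_checksum=True) raises ValueError. With verification off, or with no checksum written, the damaged bytes are passed through to the decoder. This is why a tool that deliberately modifies file contents may want one of the two settings disabled.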

clickhouse-gh bot (Contributor) commented Apr 11, 2025

Workflow [PR], commit [55d56e7]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Apr 11, 2025
al13n321 (Member, Author)

CC @PedroTadim: after this PR, the fuzzer may want to disable at least one of these two settings when messing with file contents.

@Avogar Avogar self-assigned this Apr 11, 2025
clickhouse-gh bot (Contributor) commented Jun 3, 2025

Dear @Avogar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

clickhouse-gh bot (Contributor) commented Oct 20, 2025

Workflow [PR], commit [ee30e00]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Integration tests (amd_binary, 2/5) | | failure | | |
| | test_merge_tree_s3/test.py::test_merge_canceled_by_s3_errors[node-broken_s3_always_multi_part] | FAIL | cidb, flaky | |
| Integration tests (amd_tsan, 3/6) | | failure | | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_id[False-FLAT] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_id[True-FLAT] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_id[False-HASHED] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_id[True-HASHED] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_id[False-HASHED_ARRAY] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_id[True-HASHED_ARRAY] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_complex_key[True-COMPLEX_KEY_HASHED] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_complex_key[False-COMPLEX_KEY_HASHED] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_complex_key[True-COMPLEX_KEY_HASHED_ARRAY] | FAIL | cidb | |
| | test_ytsaurus/test_dictionaries.py::test_yt_dictionary_complex_key[False-COMPLEX_KEY_HASHED_ARRAY] | FAIL | cidb | |
| | 8 more test cases not shown | | | |
| Stress test (amd_debug) | | failure | | |
| | Server died | FAIL | cidb | |
| | Hung check failed, possible deadlock found (see hung_check.log) | FAIL | cidb | |
| | Killed by signal (in clickhouse-server.log) | FAIL | cidb | |
| | Fatal message in clickhouse-server.log (see fatal_messages.txt) | FAIL | cidb | |
| | Killed by signal (output files) | FAIL | cidb | |
| | Found signal in gdb.log | FAIL | cidb | |

al13n321 (Member, Author)

The fuzzer error looks unrelated, and I couldn't reproduce it. It says the query

CREATE TABLE source_table (`id` UInt32, `name` String, `value` Float64) ENGINE = MergeTree ORDER BY id;
CREATE TABLE alias_syntax_3__fuzz_27 (`id` LowCardinality(UInt8), `name` Int32, `value` Nullable(Float32)) ENGINE = Alias('source_table') settings allow_suspicious_low_cardinality_types=1;
SELECT murmurHash2_32(isNotNull(2)), * FROM alias_syntax_3__fuzz_27 PREWHERE murmurHash2_32(murmurHash2_32(murmurHash2_32(murmurHash2_32(materialize(toFixedString('', 1)), 1, 1, 1, isNotNull(1), 2, murmurHash3_64(1))), murmurHash2_32(murmurHash2_32(1, 1, materialize(1), isNotNull('')), materialize(2)), 2 <=> 1, isNullable(1), 1, *)) WHERE not(equals(murmurHash2_32(1, '', isNotNull(toLowCardinality(1)), materialize(toUInt128(2)), '\0', 2, 2), isNotNull(2))) ORDER BY ALL ASC NULLS FIRST;

failed with Logical error: 'Unexpected return type from murmurHash2_32. Expected Nullable(UInt32). Got UInt32'. But I'm getting 'Function isNotDistinctFrom can be used only in the JOIN ON section' instead. After removing the <=>, the query succeeds.

@al13n321 al13n321 enabled auto-merge November 3, 2025 18:54
al13n321 (Member, Author) commented Nov 3, 2025

The remaining failed tests are flaky.

@al13n321 al13n321 added this pull request to the merge queue Nov 3, 2025
Merged via the queue into master with commit 8981034 Nov 3, 2025
120 of 124 checks passed
@al13n321 al13n321 deleted the pqcs branch November 3, 2025 19:14
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Nov 3, 2025
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request Dec 16, 2025
Settings to write and verify parquet checksums
mkmkme added a commit to Altinity/ClickHouse that referenced this pull request Dec 16, 2025
Settings to write and verify parquet checksums
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Dec 20, 2025

Labels

pr-not-for-changelog: This PR should not be mentioned in the changelog
pr-synced-to-cloud: The PR is synced to the cloud repo

3 participants