Add additional checks for missing streams in Wide parts #92076
Avogar merged 10 commits into ClickHouse:master
Workflow [PR], commit [e5e5426] Summary: ❌
…-check-for-unexpected-streams
Hm, there are still some issues - https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=92076&sha=latest&name_0=PR&name_1=Stateless+tests+%28amd_asan%2C+distributed+plan%2C+parallel%2C+2%2F2%29
Tests are good now.
    if (null_map->size() != nested_column->size())
        throw Exception(
-           ErrorCodes::INCORRECT_DATA,
+           settings.native_format ? ErrorCodes::INCORRECT_DATA : ErrorCodes::LOGICAL_ERROR,
Hm, why is it OK to use INCORRECT_DATA instead of LOGICAL_ERROR for the Native format? (here and below)
Because a user can try to read corrupted data in Native format, and we don't want to throw LOGICAL_ERROR when reading corrupted user data.
But these functions can be called only for Native format? Am I missing something?
I guess native_format == false is when we are reading MergeTree data, and in this case we will have LOGICAL_ERROR with this patch, and for Native over the network it will be INCORRECT_DATA?
(Note, StorageMemory also uses NativeReader which will use INCORRECT_DATA as well)
I guess native_format == false is when we are reading MergeTree data, and in this case we will have LOGICAL_ERROR with this patch, and for Native over the network it will be INCORRECT_DATA?
Yes. And over the TCP protocol we can also receive corrupted data in Native format from clients written in other languages.
My intention with these changes is actually to throw a logical error when reading from MergeTree, because the serializations skip reading data if the buffer returned for some stream is nullptr. Without such checks, a bug or a missing file in the data part may lead to an inconsistent in-memory state of the columns and crashes in random places.
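To illustrate the point above, here is a minimal self-contained sketch, not the actual ClickHouse code. It assumes a toy Nullable column made of two streams, where a missing stream is represented by nullptr and is silently skipped. All names (`ErrorCode`, `DeserializeSettings`, `NullableColumn`, `readStream`, `deserializeNullable`) are hypothetical stand-ins. The size check after reading is what turns a silently inconsistent column into an exception, and the error code depends on whether the data came from the Native format or from a data part.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the error codes discussed above; not the real ClickHouse API.
enum class ErrorCode { INCORRECT_DATA, LOGICAL_ERROR };

struct Exception : std::runtime_error
{
    Exception(ErrorCode code_, const std::string & message)
        : std::runtime_error(message), code(code_) {}
    ErrorCode code;
};

// Hypothetical simplified settings: only the flag discussed in the review thread.
struct DeserializeSettings
{
    bool native_format = false;
};

// Toy "stream": a pointer to the source values, or nullptr if the stream (file) is missing.
using Stream = const std::vector<uint8_t> *;

// Toy Nullable column: a null map plus nested values, expected to have equal sizes.
struct NullableColumn
{
    std::vector<uint8_t> null_map;
    std::vector<uint8_t> nested;
};

// Appends up to `rows` values from the stream. A missing stream is skipped silently,
// which mimics the "skip reading if the buffer is nullptr" behaviour described above.
static void readStream(Stream stream, std::vector<uint8_t> & out, size_t rows)
{
    if (!stream)
        return;
    const size_t n = std::min(rows, stream->size());
    out.insert(out.end(), stream->begin(), stream->begin() + static_cast<std::ptrdiff_t>(n));
}

void deserializeNullable(
    NullableColumn & column,
    Stream null_map_stream,
    Stream nested_stream,
    size_t rows,
    const DeserializeSettings & settings)
{
    readStream(null_map_stream, column.null_map, rows);
    readStream(nested_stream, column.nested, rows);

    // Without this check, a missing stream would leave the two parts with different sizes.
    if (column.null_map.size() != column.nested.size())
        throw Exception(
            // Corrupted user data in Native format is INCORRECT_DATA;
            // a mismatch while reading a data part indicates a bug or a missing file.
            settings.native_format ? ErrorCode::INCORRECT_DATA : ErrorCode::LOGICAL_ERROR,
            "Sizes of null map and nested column are not equal");
}
```

In this sketch, passing nullptr for one of the two streams with `native_format == false` raises the toy `LOGICAL_ERROR`, while the same mismatch with `native_format == true` raises `INCORRECT_DATA`.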
95e809a
Cherry pick #92076 to 25.12: Add additional checks for missing streams in Wide parts
Backport #92076 to 25.12: Add additional checks for missing streams in Wide parts
Cherry pick #92076 to 25.8: Add additional checks for missing streams in Wide parts
Cherry pick #92076 to 25.10: Add additional checks for missing streams in Wide parts
Cherry pick #92076 to 25.11: Add additional checks for missing streams in Wide parts
Cherry pick #92076 to 25.3: Add additional checks for missing streams in Wide parts
Backport #92076 to 25.11: Add additional checks for missing streams in Wide parts
Backport #92076 to 25.10: Add additional checks for missing streams in Wide parts
Backport #92076 to 25.8: Add additional checks for missing streams in Wide parts
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
If some files in a data part are missing, we can end up with an inconsistent in-memory state of some columns, which can lead to crashes. It's better to throw logical errors in this case.