Set max message size on parquet v3 reader#1198
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
As I understand, PR increased message size limit from 100 megabytes to 2 gigabytes. |
I have briefly checked the |
|
Was this change tested? |
Not with real data. @dima-altinity was going to test it with real customer data, I have not heard from him yet. On my side, I have tried creating a parquet file with a big metadata footer with the following ChatGPT provided script: Before these changes, I would get: After these changes, I get: I opted for not introducing a test because the |
|
It may be interesting to know if this memory is under control of CH memory quotes (some libraries have issues with this, some others don't), but probably this is out of the scope of this PR. |
|
Verification test: https://github.com/Altinity/clickhouse-regression/blob/52cf60980b118efa4fe7dc0d790b1d7de2c757a0/parquet/tests/native_reader.py#L237 The test dynamically generates a parquet file, puts it to the user_files in clickhouse and does a cleanup after execution in order to not keep 300MB size parquet file. Currently reading this file is only possible with Just |
|
Marking it as obsolete due to ClickHouse#91737 |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Set max message size on parquet v3 reader to avoid getting DB::Exception: apache::thrift::transport::TTransportException: MaxMessageSize reached
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: