Refactor how IInputFormat deals with reading many files at once #80931

Merged
al13n321 merged 22 commits into master from shp on Jun 25, 2025

@al13n321 (Member)

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Closes #65963

Changes how, e.g., SELECT ... FROM file('part{00..99}.parquet', ...) coordinates its 100 file readers.

Goals:

  1. Replace "Utilizing a Shared Parsing Pool for Multiple Parquet Streams" (#66253). I.e., if a few big files are mixed with many small files, use more threads for the big files once the small readers complete.
  2. Give IInputFormat access to the ActionsDAG of the filter condition. Previously it only had a KeyCondition constructed from that ActionsDAG. This is needed for "A new parquet reader that supports filter push down, which improves the total time on clickbench by 50+% compared to arrow parquet reader" (#70611).
  3. Share a ThreadPoolCallbackRunnerFast instance among file readers. This is needed for "[Very WIP] Yet another parquet reader" (#78380). It might also be useful for ParallelParsingInputFormat if thread scheduling overhead is noticeable there; I haven't tried that.
  4. Make it possible to share other things among the readers, e.g. memory budget or preprocessed filter expression (e.g. split by columns, to avoid repeating this work for each file).
  5. Fix the inconsistency between the max_parsing_threads and max_download_threads settings: the former is the total across all files, while the latter used to be per file (i.e. the total number of threads was max_download_threads * file_count). max_download_threads is now a total as well. Not sure what to do about compatibility.
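
To make goals 1 and 5 concrete, here is a small Python model of the thread arithmetic. It is only an illustrative sketch, not the code in this PR: the function names and the proportional-by-remaining-bytes policy are assumptions made up for the example, and the real coordination lives in the C++ readers.

```python
def old_total_download_threads(max_download_threads: int, file_count: int) -> int:
    """Old behaviour described in goal 5: the setting applied per file."""
    return max_download_threads * file_count


def new_total_download_threads(max_download_threads: int, file_count: int) -> int:
    """New behaviour: the setting caps all file readers combined."""
    return max_download_threads  # does not grow with file_count


def split_budget(total_threads: int, remaining_bytes: dict[str, int]) -> dict[str, int]:
    """Hypothetical policy for goal 1: divide one shared thread budget among
    unfinished readers, proportionally to the bytes each still has to read."""
    active = {name: size for name, size in remaining_bytes.items() if size > 0}
    if not active or total_threads <= 0:
        return {}
    total_bytes = sum(active.values())
    # Start from the floor of each reader's proportional share.
    shares = {name: total_threads * size // total_bytes for name, size in active.items()}
    # Rounding down leaves fewer spare threads than there are active readers;
    # give them to the readers that currently hold the fewest threads.
    leftover = total_threads - sum(shares.values())
    for name in sorted(active, key=lambda n: shares[n])[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # 100 files with an example setting of 4: 400 threads before, 4 after.
    print(old_total_download_threads(4, 100), new_total_download_threads(4, 100))
    # One big file mixed with two small ones, sharing 16 threads.
    print(split_budget(16, {"big": 900, "s1": 50, "s2": 50}))
    # Once the small readers finish, the big file gets the whole budget.
    print(split_budget(16, {"big": 900, "s1": 0, "s2": 0}))
```

In this model the freed threads move to the big file as soon as the small readers report zero remaining bytes, which is the effect goal 1 asks for; how the actual implementation decides this may differ.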

al13n321 requested a review from Avogar on May 28, 2025, 02:20

@clickhouse-gh bot (Contributor) commented May 28, 2025

Workflow [PR], commit [dbfe932]

clickhouse-gh bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) on May 28, 2025

@Avogar (Member) left a comment:

LGTM, but I see that the query with a Parquet file in the reading_from_file performance test slowed down:
(Screenshot 2025-06-05 at 12 39 47)
Before the merge, let's check whether it's connected to these changes.

@al13n321 (Member, Author)

I couldn't reproduce the performance difference. On my machine I'm getting ~0.33s both on master and on this PR. I guess it was a false positive?

@clickhouse-gh bot (Contributor) commented Jun 18, 2025

Workflow [PR], commit [c789529]

Summary:

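| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| AST fuzzer (amd_tsan) | | failure | | |
| | Task failed: $?=127 | FAIL | | |
| AST fuzzer (amd_ubsan) | | failure | | |
| | Task failed: $?=127 | FAIL | | |
| BuzzHouse (amd_tsan) | | failure | | |
| | Task failed: $?=127 | FAIL | | |
| BuzzHouse (amd_ubsan) | | failure | | |
| | Task failed: $?=127 | FAIL | | |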

kyligence-git pushed a commit to Kyligence/ClickHouse that referenced this pull request Jul 15, 2025
Fix rebase issue:
- 20250713 ClickHouse#82949
- 20250703 ClickHouse#82934
- 20250626 ClickHouse#80931
- 20250604 ClickHouse#79649
- 20250502 ClickHouse#79180
- 20250416 ClickHouse#78485
- 20250306 ClickHouse#76662

Co-authored-by: liuneng1994 <[email protected]>

Labels

pr-not-for-changelog: This PR should not be mentioned in the changelog
pr-synced-to-cloud: The PR is synced to the cloud repo

Development

Successfully merging this pull request may close these issues.

Distributing Workloads More Evenly Between Parquet Pipes

4 participants