[GLUTEN-7028][CH][Part-1] Using PushingPipelineExecutor to write merge tree #7029
Merged
baibaichen merged 41 commits into apache:main on Sep 6, 2024
Contributor
|
LGTM |
parseStorage => getStorage
SparkStorageMergeTree => SparkWriteStorageMergeTree
…geTreeTableInstance
Member
|
LGTM |
loneylee
approved these changes
Sep 6, 2024
baibaichen
added a commit
to Kyligence/gluten
that referenced
this pull request
Sep 6, 2024
baibaichen
added a commit
that referenced
this pull request
Sep 6, 2024
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240906)
* Fix build due to ClickHouse/ClickHouse#65832
* Fix UT due to ClickHouse/ClickHouse#65832
* Fix conflict with #7122
* Fix conflict with #7029
* Run GlutenClickHouseMergeTreeCacheDataSSuite locally
---------
Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
dcoliversun
pushed a commit
to dcoliversun/gluten
that referenced
this pull request
Sep 11, 2024
…rge tree (apache#7029)
* 1. Rename Storages/Mergetree to Storages/MergeTree
  2. Move MergeTreeTool.cpp/.h from Common to Storages/MergeTree
  3. Move CustomStorageMergeTree.cpp/.h and StorageMergeTreeFactory.cpp/.h to MergeTree folder
  4. Add CustomMergeTreeDataWriter
  5. Remove TempStorageFreer
  6. Add SubstraitParserUtils
* Make query_map_ as QueryContextManager member
* EMBEDDED_PLAN and create_plan_and_executor
* minor refactor
* tmp
* SparkStorageMergeTree CustomMergeTreeDataWriter => SparkMergeTreeDataWriter
* Add SparkMergeTreeSink
* use SparkStorageMergeTree and SparkMergeTreeSink
* Introduce GlutenSettings.h
* GlutenMergeTreeWriteSettings
* Fix Test Build
* typo
* ContextPtr => const ContextPtr &
* minor refactor
* fix style
* using GlutenMergeTreeWriteSettings
* [TMP] GlutenMergeTreeWriteSettings refactor
* [TMP] StorageMergeTreeWrapper
* [TMP] StorageMergeTreeWrapper::commitPartToRemoteStorageIfNeeded
* [TMP] StorageMergeTreeWrapper::saveMetadata
* move thread pool
* tmp
* rename
* move to sparkmergetreesink.h/cpp
* MergeTreeTableInstance
* sameStructWith => sameTable
* parseStorageAndRestore => restoreStorage parseStorage => getStorage
* Sink with MergeTreeTable table;
* remvoe SparkMergeTreeWriter::writeTempPartAndFinalize
* refactor SinkHelper::writeTempPart
* Remove write_setting of SparkMergeTreeWriter
* SparkMergeTreeWriter using PushingPipelineExecutor
* SparkMergeTreeWriteSettings
* tmp
* GlutenMergeTreeWriteSettings => SparkMergeTreeWriteSettings
* make CustomStorageMergeTree constructor protected
* MergeTreeTool.cpp/.h => SparkMergeTreeMeta.cpp/.h
* CustomStorageMergeTree.cpp/.h => SparkStorageMergeTree.cpp/.h
* CustomStorageMergeTree => SparkStorageMergeTree SparkStorageMergeTree => SparkWriteStorageMergeTree
* Refactor move codes from MergeTreeRelParser to MergeTreeTable and MergeTreeTableInstance
* Refactor Make static member to normal member
What changes were proposed in this pull request?
This PR refactors SparkMergeTreeWriter, using PushingPipelineExecutor to write MergeTree data instead of hand-written code. SparkMergeTreeWriter did 4 different tasks:

1. DB::Squashing to merge blocks into one bigger block; this functionality is now done by PlanSquashingTransform and ApplySquashingTransform
2. SparkMergeTreeDataWriter
3. SinkHelper and its derived classes
4. SparkMergeTreeSink and PushingPipelineExecutor

[Figure: the current workflow]
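The push-style flow described above (squash small blocks, then hand them to a sink, with the caller driving the loop) can be pictured with a small self-contained toy. This is only a sketch under stated assumptions: `Block`, `Stage`, `ToySink`, `ToySquashing`, `ToyPushingExecutor` and `writeToyParts` are hypothetical stand-ins invented for illustration, not the ClickHouse or Gluten classes. It only mimics a start/push/finish style of driving a pipeline.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the ClickHouse or Gluten classes.
using Block = std::vector<int>; // a "block" is just a batch of rows

// A pipeline stage that receives blocks pushed from upstream.
struct Stage
{
    virtual ~Stage() = default;
    virtual void consume(Block block) = 0;
    virtual void finish() = 0;
};

// Terminal stage: every block it receives is recorded as one "part".
struct ToySink : Stage
{
    std::vector<std::size_t> part_sizes;
    void consume(Block block) override { part_sizes.push_back(block.size()); }
    void finish() override {}
};

// Squashing stage: accumulates small blocks until min_rows is reached.
class ToySquashing : public Stage
{
public:
    ToySquashing(std::size_t min_rows_, Stage & next_) : min_rows(min_rows_), next(next_) {}

    void consume(Block block) override
    {
        pending.insert(pending.end(), block.begin(), block.end());
        if (pending.size() >= min_rows)
            flush();
    }

    void finish() override
    {
        if (!pending.empty())
            flush(); // emit the remainder even if it is below min_rows
        next.finish();
    }

private:
    void flush()
    {
        next.consume(std::move(pending));
        pending.clear(); // bring the moved-from vector back to a known empty state
    }

    std::size_t min_rows;
    Stage & next;
    Block pending;
};

// Push-style driver: the caller owns the loop and feeds blocks in.
class ToyPushingExecutor
{
public:
    explicit ToyPushingExecutor(Stage & head_) : head(head_) {}
    void start() { started = true; }
    void push(Block block) { assert(started); head.consume(std::move(block)); }
    void finish() { assert(started); head.finish(); started = false; }

private:
    Stage & head;
    bool started = false;
};

// Push `blocks` blocks of `rows_per_block` rows and report the resulting part sizes.
std::vector<std::size_t> writeToyParts(std::size_t blocks, std::size_t rows_per_block, std::size_t min_rows)
{
    ToySink sink;
    ToySquashing squashing(min_rows, sink);
    ToyPushingExecutor executor(squashing);
    executor.start();
    for (std::size_t i = 0; i < blocks; ++i)
        executor.push(Block(rows_per_block, 0));
    executor.finish();
    return sink.part_sizes;
}
```

For example, `writeToyParts(5, 3, 8)` pushes five 3-row blocks through a squashing stage with an 8-row threshold and yields two parts of 9 and 6 rows. The point of the design is that the writer no longer hand-codes squashing and part writing; it only pushes blocks and lets the stages do the rest.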
We did this work for two reasons:

1. WriteFilesExec, so now we can write Parquet and ORC in one native pipeline without modifying Spark source code, see [GLUTEN-6067][CH] [Part 3-2] Basic support for Native Write in Spark 3.5 #6586
2. PushingPipelineExecutor.

After this PR, we can unify writing for all formats for Spark 3.2, 3.3 and 3.5.
Other refactoring:

* Rename Storages/Mergetree to Storages/MergeTree
* Storages/MergeTree
* Rename CustomStorageMergeTree to SparkStorageMergeTree
* SparkWriteStorageMergeTree and implement write method to create SparkMergeTreeSink.
* MergeTreeTableInstance and inherit from MergeTreeTable

(Fixes: #7028)
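The class relationships named in the list above (a write-capable storage whose write method creates a sink, and a table instance that inherits from a table definition) might be laid out roughly as follows. This is a hedged sketch, not the Gluten code: every `Toy*` name, the `relative_path` field, and the assumption that the write storage derives from a base storage with a protected constructor are illustrative guesses.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are NOT the Gluten classes.
struct ToyTable
{
    std::string database;
    std::string name;

    // Two definitions refer to the same table when database and name match.
    bool sameTable(const ToyTable & other) const
    {
        return database == other.database && name == other.name;
    }
};

// Instance = table definition plus per-task state (the field is invented).
struct ToyTableInstance : ToyTable
{
    std::string relative_path;
};

// Sink that counts the rows it is given and remembers where they would go.
class ToySink
{
public:
    explicit ToySink(std::string target_) : target(std::move(target_)) {}
    void consume(std::size_t rows) { rows_written += rows; }
    std::size_t rowsWritten() const { return rows_written; }
    const std::string & targetPath() const { return target; }

private:
    std::string target;
    std::size_t rows_written = 0;
};

// Base storage: the constructor is protected, so only subclasses can build it.
class ToyStorage
{
public:
    virtual ~ToyStorage() = default;
    const ToyTableInstance & table() const { return instance; }

protected:
    explicit ToyStorage(ToyTableInstance instance_) : instance(std::move(instance_)) {}
    ToyTableInstance instance;
};

// Write-capable storage: write() creates the sink for this table instance.
class ToyWriteStorage : public ToyStorage
{
public:
    explicit ToyWriteStorage(ToyTableInstance instance_) : ToyStorage(std::move(instance_)) {}

    std::unique_ptr<ToySink> write() const
    {
        return std::make_unique<ToySink>(instance.database + "/" + instance.name + "/" + instance.relative_path);
    }
};
```

Keeping sink creation behind a `write` method means the caller only needs a storage object and a pipeline driver; how parts are written and committed stays inside the sink.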
How was this patch tested?
Using existing tests.