Support Iceberg Metadata Files Cache #77156
Before I finish this PR, I have to implement the cache for ManifestList/File first, because:
26dd9e9 to c05f8d0
src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadataFilesCache.h
if (manifest_cache)
{
    auto manifest_file = manifest_cache->getOrSetManifestFile(IcebergMetadataFilesCache::getKey(configuration_ptr, filename), create_fn);
    schema_processor.addIcebergTableSchema(manifest_file->getSchemaObject());

By the way, why do we need
schema_processor.addIcebergTableSchema(manifest_file->getSchemaObject());
here?

We need to populate schema_processor while constructing ManifestFileContent. See line 604.
If we get the manifest file from the cache, we also need to do it (again), because the caller could be a different IcebergMetadata whose schema_processor has had nothing added to it.

Maybe remove it from line 604, since we do it here anyway, to avoid doing it twice unnecessarily? And also add it at line 623.

We have to do it once before the constructor of ManifestFileContent, because we need it here: https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFile.cpp#L181
Adding it twice is harmless, because schema_processor uses a map internally to deduplicate.
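
The behavior discussed in this thread can be illustrated with a minimal sketch. It is not the ClickHouse implementation: `ManifestFileSketch`, `SchemaProcessorSketch`, `ManifestCacheSketch`, `loadManifest`, the string cache key, and the `parse_count` counter are all hypothetical stand-ins. It only shows why a cache hit still needs the schema registered with the caller's own processor, and why registering the same schema twice is harmless when the processor stores schemas in a map.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a parsed manifest file (not the real ClickHouse class).
struct ManifestFileSketch
{
    int schema_id;
    std::string schema_json;
};

// Hypothetical stand-in for the schema processor owned by one metadata object.
class SchemaProcessorSketch
{
public:
    // Idempotent: std::map::emplace leaves an existing entry untouched,
    // so adding the same schema id twice does not create a duplicate.
    void addSchema(int schema_id, const std::string & schema_json)
    {
        schemas.emplace(schema_id, schema_json);
    }

    std::size_t size() const { return schemas.size(); }

private:
    std::map<int, std::string> schemas;
};

// Hypothetical get-or-set cache shared between metadata objects.
class ManifestCacheSketch
{
public:
    using Ptr = std::shared_ptr<const ManifestFileSketch>;

    Ptr getOrSet(const std::string & key, const std::function<Ptr()> & create_fn)
    {
        auto it = entries.find(key);
        if (it != entries.end())
            return it->second; // cache hit: create_fn is not called
        Ptr value = create_fn();
        entries.emplace(key, value);
        return value;
    }

private:
    std::map<std::string, Ptr> entries;
};

// Loads a manifest through the cache and always registers its schema with the
// caller's processor, whether the entry was just created or came from the cache.
ManifestCacheSketch::Ptr loadManifest(
    ManifestCacheSketch & cache,
    SchemaProcessorSketch & processor,
    const std::string & key,
    int & parse_count)
{
    auto create_fn = [&]() -> ManifestCacheSketch::Ptr
    {
        ++parse_count;
        // Assumption mirroring the thread: the schema must be known before the
        // manifest object is constructed, so it is registered on the miss path.
        processor.addSchema(1, "{}");
        return std::make_shared<ManifestFileSketch>(ManifestFileSketch{1, "{}"});
    };

    auto manifest = cache.getOrSet(key, create_fn);
    // Needed on a cache hit: this processor may never have seen the schema.
    processor.addSchema(manifest->schema_id, manifest->schema_json);
    return manifest;
}
```

Calling `loadManifest` twice with the same key but two different processors parses the file once, and each processor ends up with exactly one schema entry. On the miss path the first processor receives the schema twice and the map deduplicates it.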
dd8cbb0 to 7e4e53b
…eatadata_cache Support Iceberg Metadata Files Cache
Support Iceberg Metadata Files Cache ClickHouse#77156
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support IcebergMetadataFilesCache, which will store manifest files/list and metadata.json in one cache.
Documentation entry for user-facing changes