Skip to content

Polymorphic parts (in-memory format)#10697

Merged
alesapin merged 44 commits intoClickHouse:masterfrom
CurtizJ:polymorphic-parts
Jul 8, 2020
Merged

Polymorphic parts (in-memory format)#10697
alesapin merged 44 commits intoClickHouse:masterfrom
CurtizJ:polymorphic-parts

Conversation

@CurtizJ
Copy link
Member

@CurtizJ CurtizJ commented May 6, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added new in-memory format of parts in MergeTree-family tables, which stores data in memory. Parts are written on disk at first merge. Part will be created in in-memory format if its size in rows or bytes is below thresholds min_rows_for_compact_part and min_bytes_for_compact_part. Also optional support of Write-Ahead-Log is available, which is enabled by default and is controlled by setting in_memory_parts_enable_wal.

@CurtizJ CurtizJ marked this pull request as draft May 6, 2020 12:09
@CurtizJ
Copy link
Member Author

CurtizJ commented May 6, 2020

TODO:

  • Support partition commands
  • Tune MergeSelector's algorithm for in-memory parts
  • Add memory limit for those parts in total

@blinkov blinkov added doc-alert pr-feature Pull request with new product feature labels May 6, 2020
@CurtizJ CurtizJ force-pushed the polymorphic-parts branch from c22be14 to 4c03f48 Compare May 15, 2020 00:39
@CurtizJ CurtizJ force-pushed the polymorphic-parts branch from 7b3e0f4 to e8262cc Compare May 15, 2020 00:57
@alesapin alesapin assigned alesapin and unassigned Akazz Jun 26, 2020

merger_mutator.renameMergedTemporaryPart(new_part, future_part.parts, nullptr);

DataPartsVector parts_to_remove_immediately;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to move this logic into grabOldParts of MergeTree. Maybe introduce additional setting for lifetime of in-memory parts.

else if (type == MergeTreeDataPartType::WIDE)
return std::make_shared<MergeTreeDataPartWide>(*this, name, part_info, volume, relative_path);
else if (type == MergeTreeDataPartType::IN_MEMORY)
return std::make_shared<MergeTreeDataPartInMemory>(*this, name, part_info, volume, relative_path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the in-memory part also require disk space?

if (part_in_memory && getSettings()->in_memory_parts_enable_wal)
{
auto wal = getWriteAheadLog();
wal->addPart(part_in_memory->block, part_in_memory->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't create any disk reservations for this. So our WAL is not accounted in our disk space. Maybe we can use a reservation from the part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now write_ahead_log_max_bytes are reserved

}

/// Calculates uncompressed sizes in memory.
void MergeTreeDataPartInMemory::calculateEachColumnSizesOnDisk(ColumnSizeByName & each_columns_size, ColumnSize & total_size) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this method to return zero :) Maybe we have to rename it?

for (const auto & part : inserted_parts)
{
auto part_in_memory = asInMemoryPart(part);
if (!part_in_memory->waitUntilMerged(in_memory_parts_timeout))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what user should do with this error? I think most of the users will just retry this error and duplicate data.

Copy link
Member Author

@CurtizJ CurtizJ Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this functionality at all. It's better to implement syncronous insert, that depends on fsync of WAL. I think this guaranty will be enough.

@CurtizJ
Copy link
Member Author

CurtizJ commented Jul 3, 2020

From "Yandex synchronization check" report: cannot find source file: MergeTree/MergeTreeWriteAheadLog.cpp
But this file is already in ya.make. I don't know why it failed.

Copy link
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we have to finish the throttler in the next PR.

@alesapin
Copy link
Member

alesapin commented Jul 8, 2020

Performance seems not related to changes.

@alesapin alesapin merged commit 84f8bf1 into ClickHouse:master Jul 8, 2020
alexey-milovidov added a commit that referenced this pull request Jul 12, 2020
@alexey-milovidov
Copy link
Member

#12437

traceon added a commit to traceon/ClickHouse that referenced this pull request Jul 13, 2020
* master: (27 commits)
  Whitespaces
  Fix typo
  Fix UBSan report in base64
  Correct default secure port for clickhouse-benchmark ClickHouse#11044
  Remove test with bug ClickHouse#10697
  Update in-functions.md (ClickHouse#12430)
  Allow nullable key in MergeTree
  Update arithmetic-functions.md
  [docs] add rabbitmq docs (ClickHouse#12326)
  Lower block sizes and look what will happen ClickHouse#9248
  Fix lifetime_bytes/lifetime_rows for Buffer direct block write
  Retrigger CI
  Fix up  test_mysql_protocol failed
  Implement lifetime_rows/lifetime_bytes for Buffer engine
  Add comment regarding proxy tunnel usage in PocoHTTPClient.cpp
  Add lifetime_rows/lifetime_bytes interface (exported via system.tables)
  Tiny IStorage refactoring
  Trigger integration-test-runner image rebuild.
  Delete log.txt
  Fix test_mysql_client/test_python_client error
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants