It also seems that issue #11216 affects performance: when we drain the librdkafka internal queue, it starts reporting a 'stalled' status.
Benchmark results for consume speed (measured by https://github.com/filimonov/ch-kafka-consume-perftest), comparing this PR (v20.5.1.3632, 26d93fd) against the base master commit: with default settings, consumption is 2.58x faster. That was the main goal of changing the defaults.
abyss7 approved these changes on Jun 6, 2020: "Yandex check failure is unrelated."

azat reviewed on Jun 17, 2020.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support all format settings in Kafka, expose some settings on the table level, and adjust the defaults for better performance.
Detailed description / Documentation draft:
Now it is possible to use any setting related to format parsing when creating a Kafka table.
Sample: see ClickHouse/tests/integration/test_storage_kafka/test.py, lines 288 to 295 in 26d93fd.
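The referenced test snippet is not reproduced here; the following is an illustrative sketch only (broker, topic, and column names are made up) of how a format-parsing setting can now be passed in the table's SETTINGS clause alongside the Kafka engine settings:

```sql
-- Illustrative sketch, not the actual test snippet.
CREATE TABLE kafka_source (field String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:19092',
         kafka_topic_list = 'some_topic',
         kafka_group_name = 'some_group',
         kafka_format = 'JSONEachRow',
         -- a format-parsing setting, now accepted at table level:
         input_format_skip_unknown_fields = 1;
```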
These settings were changed or added:

- `kafka_max_block_size`: number of rows collected by poll(s) before flushing data from Kafka. The default changed to `min_insert_block_size / kafka_num_consumers` (for a single consumer it is 1048576). Before that, `max_block_size` (65536) was used, which was suboptimal because it led to too-frequent commits and too-small insert blocks in the target table. See Fixed reschedule issue in Kafka #11149 (comment) and Kafka fixes part2 #8917 (comment).
- `kafka_poll_max_batch_size`: maximum number of messages to be polled in a single Kafka poll. The default is now `min(max_block_size, kafka_max_block_size)`, normally 65536. It can now be configured separately; before, `kafka_max_block_size` was used. It is better to do smaller polls, to avoid bigger allocations and to give librdkafka a chance to refill its queue while we are processing the polled block.
- `kafka_poll_timeout_ms`: timeout for a single poll from Kafka (the default is taken from `stream_poll_timeout_ms` = 500 ms). New setting; it can now be configured per table (before, `stream_poll_timeout_ms` was always used).
- `kafka_flush_interval_ms`: timeout for flushing data from Kafka (the default is taken from `stream_flush_interval_ms` = 7500 ms). New setting; it can now be configured per table (before, `stream_flush_interval_ms` was always used).

To understand the relation between those settings, check the following pseudo-code illustrating how ClickHouse consumes data from Kafka:
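A simplified single-consumer sketch (function and parameter names are illustrative, not the actual C++ code): batches of up to `kafka_poll_max_batch_size` messages are polled, each poll waiting at most `kafka_poll_timeout_ms`, and accumulated into one insert block that is flushed when either `kafka_max_block_size` rows are collected or `kafka_flush_interval_ms` has elapsed.

```python
import time

def consume_loop(poll_messages, *,
                 kafka_max_block_size=1_048_576,
                 kafka_poll_max_batch_size=65_536,
                 kafka_poll_timeout_ms=500,
                 kafka_flush_interval_ms=7_500):
    """Collect polled batches into one insert block; flush when the block
    is full or the flush interval has elapsed. Simplified sketch."""
    block = []
    deadline = time.monotonic() + kafka_flush_interval_ms / 1000
    while len(block) < kafka_max_block_size and time.monotonic() < deadline:
        # Ask for at most one poll batch, never more than is still needed
        # to fill the block; each poll waits up to kafka_poll_timeout_ms.
        need = min(kafka_poll_max_batch_size,
                   kafka_max_block_size - len(block))
        batch = poll_messages(need, kafka_poll_timeout_ms)
        if not batch:
            break  # simplified; the real loop keeps polling until the deadline
        block.extend(batch)
    return block  # flushed as a single insert into the target table
```

With the new defaults, up to 16 small polls (65536 messages each) feed one large 1048576-row insert block, instead of flushing after every 65536 rows.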
Librdkafka settings adjusted:

- `client.software.name`: now filled as "ClickHouse".
- `client.software.version`: now filled with the ClickHouse version, for example "v20.5.1.1-prestable".
- `queued.min.messages`: the default (100000) is increased to `kafka_max_block_size` (but never decreased), which prevents fast draining of the librdkafka queue while a single insert block is being built. This improves performance significantly, but may lead to bigger memory consumption.

Extra:
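The arithmetic behind the effective defaults described above can be sketched as follows (a hedged illustration with assumed helper and key names, not the actual implementation):

```python
def effective_kafka_defaults(min_insert_block_size=1_048_576,
                             max_block_size=65_536,
                             kafka_num_consumers=1):
    """Illustrative computation of the defaults described in the PR text."""
    # kafka_max_block_size defaults to min_insert_block_size / num_consumers
    kafka_max_block_size = min_insert_block_size // kafka_num_consumers
    # poll batch size defaults to min(max_block_size, kafka_max_block_size)
    kafka_poll_max_batch_size = min(max_block_size, kafka_max_block_size)
    # queued.min.messages is only ever raised above librdkafka's 100000
    # default, never lowered below it
    queued_min_messages = max(100_000, kafka_max_block_size)
    return {
        "kafka_max_block_size": kafka_max_block_size,
        "kafka_poll_max_batch_size": kafka_poll_max_batch_size,
        "queued.min.messages": queued_min_messages,
    }
```

For a single consumer this yields a 1048576-row block built from 65536-message polls, with `queued.min.messages` raised to 1048576; with many consumers the per-consumer block shrinks and `queued.min.messages` stays at librdkafka's default.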
Closes #11308
Closes #4116
Closes #8056