Adjustable asynchronous insert timeouts#58486

Merged
jkartseva merged 9 commits into ClickHouse:master from jkartseva:adjustable-async-timeouts
Feb 2, 2024

Conversation

@jkartseva
Member

@jkartseva jkartseva commented Jan 4, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate.

async_insert_busy_timeout_ms is aliased to async_insert_busy_timeout_max_ms.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@jkartseva jkartseva added the can be tested Allows running workflows for external contributors label Jan 4, 2024
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-improvement Pull request with some product improvements label Jan 4, 2024
@robot-ch-test-poll3
Contributor

robot-ch-test-poll3 commented Jan 4, 2024

This is an automated comment for commit 505b9ba with a description of existing statuses. It's updated for the latest CI run.

Successful checks

| Check name | Description | Status |
| --- | --- | --- |
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success |
| ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with an instant-attach table | ✅ success |
| ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log by grepping for cmake. Use these options and follow the general build process | ✅ success |
| Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Flaky tests | Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails even once, or runs too long, this check will be red. We don't allow flaky tests; read the doc | ✅ success |
| Install packages | Checks that the built packages are installable in a clean environment | ✅ success |
| Mergeable Check | Checks whether all other necessary checks are successful | ✅ success |
| Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success |
| SQLancer | Fuzzing tests that detect logical bugs with the SQLancer tool | ✅ success |
| Sqllogic | Runs clickhouse on the sqllogic test set against sqlite and checks that all statements pass | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc. | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some tests failed, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can start up successfully without errors, crashes, or sanitizer asserts | ✅ success |

| Check name | Description | Status |
| --- | --- | --- |
| CI running | A meta-check that indicates the running CI. Normally it's in success or pending state. A failed status indicates some problems with the PR | ⏳ pending |
| Integration tests | The integration tests report. In parentheses the package type is given, and in square brackets are the optional part/total tests | ❌ failure |

@jkartseva jkartseva marked this pull request as draft January 4, 2024 06:42
@antonio2368 antonio2368 self-assigned this Jan 4, 2024
Member

Why can't you set min to 0? Auto-adjustment should be capable of understanding that it's a bad idea to actually drive the timeout to 0. But even in that case it's fine to just start writing synchronously for these requests.

Member Author

@jkartseva jkartseva Jan 8, 2024

This can be done. I would still like to preserve milliseconds as the smallest time unit, without dispersing into microseconds.

Member

size_in_bytes?

Member Author

size_in_bytes corresponds to the entries with the same key.hash; they have already been inserted and accounted for by the metric. Here, we should only increase the metric by the size of the current entry (entry_data_size).
However, we should subtract the size of the previously inserted entries from the metric if the entries are to be processed synchronously.

Member

I'm not sure how it will work with numbers like min=1, max=5. Will it get stuck at 1 and 2 indefinitely?
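
The worry can be made concrete with a small sketch (integer milliseconds and a hypothetical increase rate of 1.2; this illustrates the rounding concern, not the PR's actual arithmetic):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the reviewer's concern: with integer milliseconds and a small
// range like [1, 5], a multiplicative increase can truncate back down.
// The rate value below is a hypothetical example.
uint64_t increaseTruncating(uint64_t timeout_ms, double rate)
{
    // Truncating the product back to integer milliseconds: 1 * 1.2 -> 1,
    // so small timeouts never grow unless rounding is handled carefully.
    return static_cast<uint64_t>(timeout_ms * (1.0 + rate));
}
```

With these numbers, a timeout of 1 ms maps to 1 ms and 2 ms maps to 2 ms, exactly the "stuck at 1 and 2" scenario; rounding up (or working in a finer unit internally) avoids it.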

Member

I don't get how it works. We update the timeout before we check the bytes/queries limits.
Will we update the busy timeout in this case as well? As I understand the algorithm, we don't need to batch more if there are enough bytes per batch.

Member Author

We do not mark the whole queue shard for immediate processing, but rather only the entries with the specific key.hash that have reached either the bytes or the number-of-queries limit:

```cpp
auto [it, inserted] = shard.iterators.try_emplace(key.hash);
if (inserted)
{
    auto now = std::chrono::steady_clock::now();
    auto timeout = now + Milliseconds{key.settings.async_insert_busy_timeout_ms};
    it->second = shard.queue.emplace(timeout, Container{key, std::make_unique<InsertData>()}).first;
}

auto queue_it = it->second;
auto & data = queue_it->second.data;
size_t entry_data_size = entry->chunk.byteSize();

assert(data);
data->size_in_bytes += entry_data_size;
data->entries.emplace_back(entry);
insert_future = entry->getFuture();

LOG_TRACE(log, "Have {} pending inserts with total {} bytes of data for query '{}'",
    data->entries.size(), data->size_in_bytes, key.query_str);

bool has_enough_bytes = data->size_in_bytes >= key.settings.async_insert_max_data_size;
bool has_enough_queries = data->entries.size() >= key.settings.async_insert_max_query_number && key.settings.async_insert_deduplicate;

/// Here we check whether we hit the limit on the maximum data size in the buffer.
/// And use the setting from the query context.
/// It works because queries with the same set of settings are already grouped together.
if (!flush_stopped && (has_enough_bytes || has_enough_queries))
{
    data_to_process = std::move(data);
```

Other entries that have fallen into the same queue shard will remain in the async queue.
Hence, I believe, we should adjust the timeout.

@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch 6 times, most recently from 78fb04f to fb51320 Compare January 13, 2024 03:59
@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch 12 times, most recently from 64a8343 to 4a7c239 Compare January 22, 2024 01:24
@jkartseva jkartseva marked this pull request as ready for review January 22, 2024 07:42
Member

We want this to be the default for async inserts, no?
Also, if there is a reason not to, there is a compatibility setting which can preserve the behaviour of older versions (check SettingsChangesHistory).

Member Author

I wanted this setting to be deployed and adopted gradually, so I added enable_async_insert_adaptive_busy_timeout with a kill switch in mind. We might meet a corner case not accounted for during testing.
Would allow_experimental... be a better name for the purpose? The setting can be marked as deprecated later on.

Member

Hm, I see this more as an implementation detail of async inserts, and a strictly better implementation.
From the user's perspective, they should only worry about whether they want to use async inserts or not. I'm okay with having this flag as a safety net if the new implementation misbehaves, but I can't see a reason why someone would want the old behavior.

Member Author

Ready for review. Test failures are not relevant; e.g., https://s3.amazonaws.com/clickhouse-test-reports/0/0cbadc9b52f5e728a397b3081c6887c0f4213856/stress_test__debug_.html is also failing on master.
I'll take a look at what might be wrong there.

Currently, the new feature is disabled by default; we may choose to enable it. I want to preserve the allow_experimental_<..> kill switch, though.

Member

more descriptive function name
also, move implementation of both functions to cpp?

Member

why don't we start from min timeout if adaptive is enabled?

Member Author

I've been considering a smooth transition from the state 'enable_async_insert_adaptive_busy_timeout=0' to '1'. async_insert_busy_timeout_ms is the timeout anticipated by a customer, and the adaptive algorithm treats it as the starting point. It minimizes the value if inserts are infrequent.

Member

is a timeout anticipated by a customer

If they touch the timeout setting for some reason; otherwise they don't anticipate anything as long as it works.

The problem here is that we would like to solve the timeout of the initial insert: if the user sets async_insert_busy_timeout_ms to some high value, it will take a long time even for a simple insert of one row.
IMO we should start from the minimal flush timeout and gradually grow from there if the insert rate increases.

Member

Shouldn't this be the same as async_insert_busy_timeout_ms?
In the current implementation, async_insert_busy_timeout_ms is the total time limit which forces a flush, i.e., the maximum time we can keep entries before flushing.
Also, what if a user changed async_insert_busy_timeout_ms to 2s? They would get errors about async_insert_busy_timeout_max_ms not being okay, and then they'd need to figure out what it is and what value to set it to.

Why not make async_insert_busy_timeout_ms an alias of async_insert_busy_timeout_max_ms to keep backward compatibility?

Member Author

This sounds good to me. Is there a way to alias two settings (async_insert_busy_timeout_ms and async_insert_busy_timeout_max_ms) so that they are tied to the same value?

Member Author

TIL ALIAS expression, will do.

@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch from 4a7c239 to ff3408d Compare January 25, 2024 23:31
@jkartseva
Member Author

jkartseva commented Jan 26, 2024

The newly added metrics

(screenshot: graphs of the newly added metrics, captured 2024-01-25)

```sql
SELECT
  toStartOfInterval(event_time, INTERVAL {rounding:UInt32} SECOND)::INT AS t,
  avg(metric)
FROM (
  SELECT event_time, sum(CurrentMetric_AsynchronousInsertQueueBytes) AS metric
  FROM clusterAllReplicas(default, system.metric_log)
  WHERE event_date >= toDate(now() - {seconds:UInt32})
    AND event_time >= now() - {seconds:UInt32}
  GROUP BY event_time)
GROUP BY t
ORDER BY t WITH FILL STEP {rounding:UInt32} SETTINGS skip_unavailable_shards = 1
```

```sql
SELECT
  toStartOfInterval(event_time, INTERVAL {rounding:UInt32} SECOND)::INT AS t,
  avg(value)
FROM (
  SELECT event_time, avg(timeout_milliseconds) AS value
  FROM clusterAllReplicas(default, system.asynchronous_insert_log)
  WHERE event_date >= toDate(now() - {seconds:UInt32})
    AND event_time >= now() - {seconds:UInt32}
  GROUP BY event_time)
GROUP BY t
ORDER BY t WITH FILL STEP {rounding:UInt32} SETTINGS skip_unavailable_shards = 1
```

@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch from 0a64193 to 7b37a27 Compare January 30, 2024 21:06
@clickhouse-ci

clickhouse-ci bot commented Jan 30, 2024

This is an automatic comment. The PR description does not match the template.

Please edit it accordingly.

The error is: Changelog entry required for category 'Improvement'

@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch from 7b37a27 to da12bf7 Compare January 31, 2024 05:02
@antonio2368
Member

We can maybe add async_insert_use_adaptive_busy_timeout to random settings in clickhouse-test?

@jkartseva
Member Author

I'm confused about "New settings are not reflected in settings changes history (see new_settings.txt)", though I have updated settings.md

https://s3.amazonaws.com/clickhouse-test-reports/58486/da12bf77cfeceef9476c29c447ef82acacf6cb8d/upgrade_check__debug_.html

Do you know by any chance where this new_settings.txt is located? @antonio2368

@antonio2368
Member

@jkartseva that file is generated by a query during upgrade test run to tell you which new settings were not added to src/Core/SettingsChangesHistory.h
Go to the failed test and open new_settings.txt
https://s3.amazonaws.com/clickhouse-test-reports/58486/da12bf77cfeceef9476c29c447ef82acacf6cb8d/upgrade_check__debug_.html

@jkartseva
Member Author

We can maybe add async_insert_use_adaptive_busy_timeout to random settings in clickhouse-test?

I'll publish a follow-up PR.

@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch from d51cfe5 to f776ec7 Compare February 1, 2024 05:06
@jkartseva
Member Author

Set version to 24.2 in settings history. It should be good to merge.

The current timeout for checking updates in the asynchronous queue is
equal to the timeout used for queue entry
(async_insert_busy_timeout_ms).
That means that, in the worst case, an entry spends twice the time of the
asynchronous timeout in the queue.
@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch from f776ec7 to 8ac5ccd Compare February 1, 2024 08:28
Member

What will we do then for backward compatibility if someone set async_insert_busy_timeout_ms to a value less than the default async_insert_busy_timeout_min_ms?

Member Author

Yes, it slipped my mind to update it, thanks. In that case, the minimum value of the two is taken as the async timeout.
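
A minimal sketch of that backward-compatibility clamp (the function name is hypothetical; the actual code lives in the async insert queue):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// If a user sets async_insert_busy_timeout_ms below the default
// async_insert_busy_timeout_min_ms, the smaller of the two becomes
// the effective minimum, so pre-existing configurations keep flushing
// as quickly as they did before the adaptive algorithm was introduced.
uint64_t effectiveMinTimeoutMs(uint64_t user_busy_timeout_ms, uint64_t busy_timeout_min_ms)
{
    return std::min(user_busy_timeout_ms, busy_timeout_min_ms);
}
```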

Implement the algorithm described in ClickHouse#56783 for adaptive asynchronous
insert timeouts.
- The adaptive async insert timeout can take values within
[async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms].
- The initial value is set to async_insert_busy_timeout_min_ms.
- If the elapsed time since the most recent queue insert was
  greater than the maximum timeout, process the queue content immediately,
  and reduce the timeout.
- If the elapsed time was long enough (longer than a would-be decreased
  timeout), decrease the timeout.
- The adaptive timeout changes exponentially based on
  async_insert_busy_timeout_{increase|decrease}_rate.
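
The bullet points above can be sketched roughly as follows (a simplified illustration of the exponential adjustment clamped to [min, max]; the struct and the exact update rule are assumptions, not the ClickHouse implementation):

```cpp
#include <algorithm>
#include <cassert>

// Simplified sketch of an adaptive busy timeout bounded by
// [async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms].
struct AdaptiveBusyTimeout
{
    double min_ms;
    double max_ms;
    double increase_rate;  // cf. async_insert_busy_timeout_increase_rate
    double decrease_rate;  // cf. async_insert_busy_timeout_decrease_rate
    double current_ms;     // initialized to min_ms

    // Inserts arriving faster than the current timeout: grow the batching window.
    void increase() { current_ms = std::min(max_ms, current_ms * (1.0 + increase_rate)); }

    // A long enough gap since the last insert: shrink the batching window.
    void decrease() { current_ms = std::max(min_ms, current_ms / (1.0 + decrease_rate)); }
};
```

Repeated calls converge to the max under sustained load and back to the min when traffic subsides, which is the behavior the commit message describes.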

Fixes: ClickHouse#56783
Add AsynchronousInsertQueueSize and AsynchronousInsertQueueBytes
metrics to improve observability of asynchronous inserts.
The metrics do not account for tasks dispatched for immediate processing
(as opposed to, e.g., PendingAsyncInsert).

```
SELECT value
FROM system.metrics
WHERE metric IN ('AsynchronousInsertQueueSize', 'PendingAsyncInsert')

Query id: a711dd83-b48d-4ad5-8031-fa59b21a7c38

┌─value─┐
│    18 │
│    23 │
└───────┘
```

```
SELECT value
FROM system.metrics
WHERE metric IN ('AsynchronousInsertQueueSize', 'AsynchronousInsertQueueBytes')

Query id: b35a7ceb-2bb5-46ad-b301-e6cf03508699

┌─value─┐
│    28 │
│  1372 │
└───────┘
```
In addition to the time since the most recent insert,
consider the elapsed time between the two recent queue
flushes when decreasing the timeout or processing an
entry synchronously.
@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch 3 times, most recently from 56f24fd to dda191d Compare February 1, 2024 23:58
@jkartseva jkartseva force-pushed the adjustable-async-timeouts branch from dda191d to 505b9ba Compare February 2, 2024 07:17
@jkartseva
Member Author

The adaptive timeout ramping up from the minimum could contradict the async_insert_deduplicate setting, whose intention is to eliminate duplicates in a batch. At the very least, synchronous queue flush should be disabled for the setting. I'll file a follow-up.

@antonio2368
Member

The adaptive timeout ramping up from the minimum could contradict the async_insert_deduplicate setting, whose intention is to eliminate duplicates in a batch. At the very least, synchronous queue flush should be disabled for the setting. I'll file a follow-up.

No, deduplication eliminates duplicates in general (or at least within the replicated_deduplication_window_for_async_inserts window).
Removing duplicates from a batch is just an optimization so we don't have to do it after verifying with Keeper.

@jkartseva jkartseva merged commit f41f4e4 into ClickHouse:master Feb 2, 2024
```python
fixed_tm_table_name,
fixed_tm_settings,
thread_num=15,
tasks=1000,
```
Member

@jkartseva can you help me understand why you check 1000 iterations with the fixed timeout, but compare that with 100 iterations of the adaptive timeout?
