Adjustable asynchronous insert timeouts#58486
Conversation
|
This is an automated comment for commit 505b9ba with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
There was a problem hiding this comment.
Why you can't set min to 0? Auto adjusting should be capable to understand that it is a bad idea to really set it to 0. But even in that case it is ok to just start to write synchronously for this requests.
There was a problem hiding this comment.
This can be done. I would still like to preserve milliseconds as the smallest time unit, without dispersing into microseconds.
There was a problem hiding this comment.
size_in_bytes corresponds to the entries with the same key.hash, they have already been inserted and accounted for by the metric. Here, we should only increase the metric by the size of the current entry (entry_data_size).
However, we should subtract the size of the previously inserted entries from the metric if the entreis are to be processed synchronously.
There was a problem hiding this comment.
I'm not sure how it will work with numbers like min=1, max=5. Will it stuck at 1 and 2 indefinitely?
There was a problem hiding this comment.
I do not get if it, how it works. We update timeout before we check bytes/queries limit.
Will we update busy timeout in this case as well? As I understand the algorithm we do not need to batch more if there is enough bytes per batch.
There was a problem hiding this comment.
We do not mark the whole queue shard for immediate processing, but rather only the entries with the specific key.hash that have reached either bytes or # of queries limit:
ClickHouse/src/Interpreters/AsynchronousInsertQueue.cpp
Lines 326 to 354 in 06274c7
Other entries that have fallen into the same queue shard will remain in the async queue.
Hence, I believe, we should adjust the timeout.
78fb04f to
fb51320
Compare
64a8343 to
4a7c239
Compare
src/Core/Settings.h
Outdated
There was a problem hiding this comment.
we want this to be default for async inserts no?
also, if there is a reason, there is a compatibility setting which can preserve behaviour for older versions (check SettingsChangesHistory)
There was a problem hiding this comment.
I wanted this setting to be deployed and adopted gradually, so I added enable_async_insert_adaptive_busy_timeout with a kill switch in mind. We might meet a corner case not accounted for during testing.
Would allow_experimental... be a better name for the purpose? The setting can be marked as deprecated later on.
There was a problem hiding this comment.
Hm I see this more as an implementation detail for async inserts and a strictly better implementation.
From users perspective, he should worry only if he wants to use async inserts or not. I'm okay with having this flag as a safety net if the new implementation misbehaves but I cannot see a reason why someone would want the old behavior.
There was a problem hiding this comment.
Ready for review. Tests failures are not relevant, e.g., https://s3.amazonaws.com/clickhouse-test-reports/0/0cbadc9b52f5e728a397b3081c6887c0f4213856/stress_test__debug_.html is also failing on master.
I'll take a look at what might be wrong there.
Currently, the new feature is disabled by default, we may choose to enable it. I want to preserve allow_experimental_<..> kill switch, though.
There was a problem hiding this comment.
more descriptive function name
also, move implementation of both functions to cpp?
There was a problem hiding this comment.
why don't we start from min timeout if adaptive is enabled?
There was a problem hiding this comment.
I've been considering a smooth transition from the state 'enable_async_insert_adaptive_busy_timeout=0' to '1'. async_insert_busy_timeout_ms is a timeout anticipated by a customer, and the adaptive algorithm treats it as the initial point. It minimizes the value if inserts are not frequent.
There was a problem hiding this comment.
is a timeout anticipated by a customer
if they touch for some reason the timeout setting, otherwise they don't anticipate anything as long as it works
the problem here is that we would like to solve the timeout of the initial insert, and if the user set async_insert_busy_timeout_ms to some high value, it will take a long time for even a simple insert of 1 row.
IMO we should start from minimal flushtimeout and gradually grow from there if the insert rate increases.
src/Core/Settings.h
Outdated
There was a problem hiding this comment.
shouldn't this be same as async_insert_busy_timeout_ms?
in the current implementation async_insert_busy_timeout_ms is the total time limit which will force a flush, i.e. maximum we can keep the entries before flushing.
Also, what if user changed async_insert_busy_timeout_ms to 2s? He would get errors how async_insert_busy_timeout_max_ms is not okay, and then he needs to figure out what it is and what value to set it to.
Why not put async_insert_busy_timeout_ms as alias to async_insert_busy_timeout_max_ms to keep backward compatibility?
There was a problem hiding this comment.
This sounds good to me. Is there a way to alias two settings (async_insert_busy_timeout_ms and async_insert_busy_timeout_max_ms) so that they are tied to the same value?
There was a problem hiding this comment.
TIL ALIAS expression, will do.
4a7c239 to
ff3408d
Compare
0a64193 to
7b37a27
Compare
|
This is an automatic comment. The PR descriptions does not match the template. Please, edit it accordingly. The error is: Changelog entry required for category 'Improvement' |
7b37a27 to
da12bf7
Compare
|
We can maybe add |
|
I'm confused about "New settings are not reflected in settings changes history (see new_settings.txt)", though I have updated Do you know by any chance where this |
|
@jkartseva that file is generated by a query during upgrade test run to tell you which new settings were not added to |
I'll publish a follow-up PR. |
d51cfe5 to
f776ec7
Compare
|
Set version to 24.2 in settings history. It should be good to merge. |
The current timeout for checking updates in the asynchronous queue is equal to the timeout used for queue entry (async_insert_busy_timeout_ms). That means that, in the worst case, an entry spends twice the time of the asynchronous timeout in the queue.
f776ec7 to
8ac5ccd
Compare
There was a problem hiding this comment.
what will we do then for backward compatibility if someone set async_insert_busy_timeout_ms to a value less then default async_insert_busy_timeout_min_ms?
There was a problem hiding this comment.
Yes, it slipped my mind to update it, thanks. In that case, the minimum value of the two is taken as the async timeout.
Implement the algorithm described in ClickHouse#56783 for adaptive asynchronous insert timeouts. - The adaptive async insert timeout can take values within [async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms]. - The initial value is set to async_insert_busy_timeout_min_ms. - If the elapsed time since the most recent queue insert was greater than the maximum timeout, process the queue content immediately, and reduce the timeout. - If the elapsed time was long enough (longer than a would-be decreased timeout), decrease the timeout. - The adaptive timeout is changes exponentially based on the async_insert_busy_timeout_{increase|decrease}_rate. Fixes: ClickHouse#56783
Add AsynchronousInsertQueueSize and AsynchronousInsertQueueBytes
metrics to improve observability of asynchronous inserts.
The metrics do not account for tasks dispatched for immediate processing
(as opposed to, e.g., PendingAsyncInsert).
```
SELECT value
FROM system.metrics
WHERE metric IN ('AsynchronousInsertQueueSize', 'PendingAsyncInsert')
Query id: a711dd83-b48d-4ad5-8031-fa59b21a7c38
┌─value─┐
│ 18 │
│ 23 │
└───────┘
```
```
SELECT value
FROM system.metrics
WHERE metric IN ('AsynchronousInsertQueueSize', 'AsynchronousInsertQueueBytes')
Query id: b35a7ceb-2bb5-46ad-b301-e6cf03508699
┌─value─┐
│ 28 │
│ 1372 │
└───────┘
```
Add initial tests.
In addition to the time since the most recent insert, consider the elapsed time between the two recent queue flushes when decreasing the timeout or processing an entry synchronously.
56f24fd to
dda191d
Compare
dda191d to
505b9ba
Compare
|
The adaptive timeout ramping up from the minimum could contradict the |
No, deduplication eliminates duplicates in general (or at least in |
| fixed_tm_table_name, | ||
| fixed_tm_settings, | ||
| thread_num=15, | ||
| tasks=1000, |
There was a problem hiding this comment.
@jkartseva can you help me understand why you check with fixed timeout 1000 iterations, but compare it with 100 of adaptive timeout?

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate.
async_insert_busy_timeout_ms is aliased to async_insert_busy_timeout_max_ms.
Documentation entry for user-facing changes