Added settings to disallow concurrent backups and restores#45072
Added settings to disallow concurrent backups and restores#45072SmitaRKulkarni merged 8 commits intomasterfrom
Conversation
Implementation: * Added server level settings to disallow concurrent backups and restores, which are read and set when BackupWorker is created in Context. * Settings are set to true by default. * Before starting backup or restores, added a check to see if any other backups/restores are running (except internal ones). Testing: * Added a test test_backup_and_restore_on_cluster/test_disallow_concurrency.
tests/integration/test_backup_restore_on_cluster/test_disallow_concurrency.py
Outdated
Show resolved
Hide resolved
…ncurrent internal backups & restores - Added settings to disallow concurrent backups and restores
|
When performing backup/restore on a cluster, the node which receives the requests forwards it to all nodes including itself. Thus the same node also will be processing 2 requests (1 internal). In case concurrent backup/restore is turned off, then added a check to verify that internal request is from the self node using backup_uuid. |
…ncurrent backups and restores
| <allowed_disk>backups</allowed_disk> | ||
| </backups> | ||
| <allow_concurrent_backups>false</allow_concurrent_backups> | ||
| <allow_concurrent_restores>false</allow_concurrent_restores> |
There was a problem hiding this comment.
For me the term concurrent backups looks like if you're making two backups of the same resource (e.g. the same table) at the same time. For the thing you constrained in this PR the name parallel backups probably would be more clear. What do you think?
There was a problem hiding this comment.
For this setting, I feel 'concurrent' is better than 'parallel'. The constraint is on running more than 1 backup/restore at the same time irrespective of the name. As we already refer to this as 'concurrent', I think it would be confusing to call the constraint on it 'parallel'. Please let me know your thoughts on this.
There was a problem hiding this comment.
AFAIK the term was never referred in the code or settings or configuration as concurrent or parallel so it's quite a random choice.
There was a problem hiding this comment.
Its not in settings or configuration, but we have error messages like
ClickHouse/src/Backups/BackupImpl.cpp
Line 501 in 67e2bf3
and tests named
test_concurrency, so to keep in line with existing structure I used 'concurrent'.
tests/integration/test_backup_restore_on_cluster/configs/disallow_concurrency.xml
Show resolved
Hide resolved
|
There is some flaks occurs for the new test test_backup_restore_on_cluster/test_disallow_concurrency.py::test_concurrent_restores_on_different_node |
Addressed in ##45497 |
|
@vitlibar : As this PR is already merged, I will address your comments in another PR |
|
@SmitaRKulkarni, the new tests are still flaky, please take a look: link |
This PR #45982 was merged yesterday at 19:18 which should fix these issues, I believe after that there are no new failures till now. I will keep an eye on the integration test fails. |
|
Seems like some issues still remain: https://s3.amazonaws.com/clickhouse-test-reports/0/c0bc549e7798c6d079912dd6fb56065463270892/integration_tests__tsan__[3/4].html |
|
All the new fails are with the same reason : #45437 |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added settings to disallow concurrent backups and restores resolves #43891
Implementation:
Testing: