
Use cluster state data to check concurrent backup/restore #45982

Merged
vitlibar merged 8 commits into master from Cluster_state_for_disallow_concurrent_backup_restore
Feb 21, 2023

Conversation

@SmitaRKulkarni
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use cluster state data to check concurrent backup/restore

When concurrent backups/restores are disabled on a cluster, with the current implementation it is still possible for two nodes to start a backup/restore at the same time, which may or may not conflict. To ensure that a cluster runs only one backup/restore at a time, we use cluster state data from ZooKeeper.
Implementation:

  • BackupWorker checks whether any backup/restore that has a path in ZooKeeper has a status other than completed; if so, the new backup/restore is stopped.
  • For non-ON CLUSTER queries, only active backups/restores are checked.
  • Removed restore_uuid from RestoreSettings, as it is no longer used.

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-improvement Pull request with some product improvements label Feb 2, 2023
@vitlibar vitlibar self-assigned this Feb 8, 2023
SmitaRKulkarni and others added 4 commits February 10, 2023 13:53
…cluster state data to check concurrent backup/restore
… - Use cluster state data to check concurrent backup/restore
…ath - Use cluster state data to check concurrent backup/restore
{
    String root_zk_path = context->getConfigRef().getString("backups.zookeeper_path", "/clickhouse/backups");
-   restore_settings.coordination_zk_path = root_zk_path + "/restore-" + toString(UUIDHelpers::generateV4());
+   restore_settings.coordination_zk_path = root_zk_path + "/restore-" + toString(restore_id);
Member


restore_id is something which could be specified by user. It can be the same for two separate restores if they're not simultaneous. And coordination_zk_path must be always unique, that's why it's better to use a random UUID here.

Member Author


Updated to use restore_uuid, which is generated.

Member Author


As you mentioned, restore_id can be specified by the user. I believe we use this from restore_settings:

/// `restore_id` will be used as a key to the `infos` map, so it should be unique.
OperationID restore_id;
if (restore_settings.internal)
    restore_id = "internal-" + toString(UUIDHelpers::generateV4()); /// Always generate `restore_id` for internal restore to avoid collision if both internal and non-internal restores are on the same host
else if (!restore_settings.id.empty())
    restore_id = restore_settings.id;
else
    restore_id = toString(*restore_settings.restore_uuid);

In that case, if restore_settings.id is not empty, restore_id will be the one specified by the user, and we could then have two restores with the same restore_id in the map. This seems like an issue. Correct me if I am wrong.

…tion_path and added uuid in settings - Use cluster state data to check concurrent backup/restore
@SmitaRKulkarni
Member Author

Unrelated failures:
Integration test - test_s3_cluster/test.py::test_parallel_distributed_insert_select_with_schema_inference - fixed by #46488

Stress test (ubsan) - fixed by #46521

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 4, 2023
