
Use cluster state data to check concurrent backup/restore #45982

Merged
vitlibar merged 8 commits into master from Cluster_state_for_disallow_concurrent_backup_restore
Feb 21, 2023

Conversation

@SmitaRKulkarni
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Use cluster state data to check concurrent backup/restore

When concurrent backups/restores are disabled on a cluster, with the current implementation it is still possible for two nodes to start a backup/restore at the same time, which may or may not conflict. To ensure that a cluster runs only one backup/restore at a time, we use cluster state data from ZooKeeper.
Implementation:

  • BackupWorker checks whether any backup/restore that has a path in ZooKeeper has a status other than completed; if so, the new backup/restore is stopped.
  • For non-ON CLUSTER queries, only active backups/restores are checked.
  • Removed restore_uuid from RestoreSettings, as it is no longer used.

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-improvement Pull request with some product improvements label Feb 2, 2023
@vitlibar vitlibar self-assigned this Feb 8, 2023
SmitaRKulkarni and others added 4 commits February 10, 2023 13:53
…cluster state data to check concurrent backup/restore
… - Use cluster state data to check concurrent backup/restore
…ath - Use cluster state data to check concurrent backup/restore
{
    String root_zk_path = context->getConfigRef().getString("backups.zookeeper_path", "/clickhouse/backups");
-   restore_settings.coordination_zk_path = root_zk_path + "/restore-" + toString(UUIDHelpers::generateV4());
+   restore_settings.coordination_zk_path = root_zk_path + "/restore-" + toString(restore_id);
Member


restore_id is something which could be specified by user. It can be the same for two separate restores if they're not simultaneous. And coordination_zk_path must be always unique, that's why it's better to use a random UUID here.

Member Author


Updated to use restore_uuid, which is generated.

Member Author


As you mentioned, restore_id can be specified by the user. I believe we use this from restore_settings:

/// `restore_id` will be used as a key to the `infos` map, so it should be unique.
OperationID restore_id;
if (restore_settings.internal)
    restore_id = "internal-" + toString(UUIDHelpers::generateV4()); /// Always generate `restore_id` for internal restore to avoid collision if both internal and non-internal restores are on the same host
else if (!restore_settings.id.empty())
    restore_id = restore_settings.id;
else
    restore_id = toString(*restore_settings.restore_uuid);

In that case, if restore_settings.id is not empty, restore_id will be the one specified by the user, and we could then have two restores with the same restore_id in the map. This seems like an issue. Correct me if I am wrong.

…tion_path and added uuid in settings - Use cluster state data to check concurrent backup/restore
@SmitaRKulkarni
Member Author

Unrelated failures:
Integration test - test_s3_cluster/test.py::test_parallel_distributed_insert_select_with_schema_inference - fixed by #46488

Stress test (ubsan) - fixed by #46521

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 4, 2023
