Skip to content

Added settings to disallow concurrent backups and restores#45072

Merged
SmitaRKulkarni merged 8 commits intomasterfrom
43891_Disallow_concurrent_backups_and_restores
Jan 20, 2023
Merged

Added settings to disallow concurrent backups and restores#45072
SmitaRKulkarni merged 8 commits intomasterfrom
43891_Disallow_concurrent_backups_and_restores

Conversation

@SmitaRKulkarni
Copy link
Member

@SmitaRKulkarni SmitaRKulkarni commented Jan 9, 2023

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added settings to disallow concurrent backups and restores resolves #43891
Implementation:

  • Added server level settings to disallow concurrent backups and restores, which are read and set when BackupWorker is created in Context.
  • Settings are set to true by default.
  • Before starting backup or restores, added a check to see if any other backups/restores are running. For internal request it checks if its from the self node using backup_uuid.

Testing:

  • Added a test test_backup_and_restore_on_cluster/test_disallow_concurrency.

Implementation:
* Added server level settings to disallow concurrent backups and restores, which are read and set when BackupWorker is created in Context.
* Settings are set to true by default.
* Before starting backup or restores, added a check to see if any other backups/restores are running (except internal ones).
Testing:
* Added a test test_backup_and_restore_on_cluster/test_disallow_concurrency.
@SmitaRKulkarni SmitaRKulkarni marked this pull request as draft January 9, 2023 17:17
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-improvement Pull request with some product improvements label Jan 9, 2023
@SmitaRKulkarni SmitaRKulkarni marked this pull request as ready for review January 10, 2023 08:02
@hanfei1991 hanfei1991 self-assigned this Jan 13, 2023
Copy link
Member

@hanfei1991 hanfei1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

SmitaRKulkarni and others added 2 commits January 17, 2023 22:27
…ncurrent internal backups & restores - Added settings to disallow concurrent backups and restores
@SmitaRKulkarni
Copy link
Member Author

When performing backup/restore on a cluster, the node which receives the requests forwards it to all nodes including itself. Thus the same node also will be processing 2 requests (1 internal). In case concurrent backup/restore is turned off, then added a check to verify that internal request is from the self node using backup_uuid.
Restores did not have backup_uuid, added it (similar to backups).

@SmitaRKulkarni
Copy link
Member Author

Failures are unrelated
Stress test (ubsan) - #45372
Integration tests (release) [2/4] - #45393

@SmitaRKulkarni SmitaRKulkarni merged commit 6aa6341 into master Jan 20, 2023
@SmitaRKulkarni SmitaRKulkarni deleted the 43891_Disallow_concurrent_backups_and_restores branch January 20, 2023 08:17
<allowed_disk>backups</allowed_disk>
</backups>
<allow_concurrent_backups>false</allow_concurrent_backups>
<allow_concurrent_restores>false</allow_concurrent_restores>
Copy link
Member

@vitlibar vitlibar Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the term concurrent backups looks like if you're making two backups of the same resource (e.g. the same table) at the same time. For the thing you constrained in this PR the name parallel backups probably would be more clear. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this setting, I feel 'concurrent' is better than 'parallel'. The constraint is on running more than 1 backup/restore at the same time irrespective of the name. As we already refer to this as 'concurrent', I think it would be confusing to call the constraint on it 'parallel'. Please let me know your thoughts on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the term was never referred in the code or settings or configuration as concurrent or parallel so it's quite a random choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not in settings or configuration, but we have error messages like

ErrorCodes::BACKUP_ALREADY_EXISTS, "A concurrent backup writing to the same destination {} detected", backup_name_for_logging);

and tests named test_concurrency, so to keep in line with existing structure I used 'concurrent'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks!

@SmitaRKulkarni
Copy link
Member Author

@vitlibar : As this PR is already merged, I will address your comments in another PR

@tavplubix
Copy link
Member

@SmitaRKulkarni, the new tests are still flaky, please take a look: link

@SmitaRKulkarni
Copy link
Member Author

SmitaRKulkarni commented Feb 22, 2023

@SmitaRKulkarni, the new tests are still flaky, please take a look: link

This PR #45982 was merged yesterday at 19:18 which should fix these issues, I believe after that there are no new failures till now. I will keep an eye on the integration test fails.

@tavplubix
Copy link
Member

@SmitaRKulkarni
Copy link
Member Author

All the new fails are with the same reason : #45437
will have a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce a server setting to disallow concurrently running backups.

7 participants