
Fix KVM incremental volume snapshot creation#12666

Merged
sureshanaparti merged 2 commits intoapache:4.22from
scclouds:fix-incremental-snapshot-creation
Apr 10, 2026

@JoaoJandre
Contributor

Description

When creating an incremental snapshot, CloudStack (ACS) sends an asynchronous command to Libvirt to back up the volume and then waits for Libvirt to signal that the command has finished before continuing with the snapshot process. Sporadically, however, Libvirt signals completion before the operating system has actually released the write lock on the snapshot file. When this happens, ACS fails with a lock error when it attempts to rebase the snapshot.
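
The race described above can be modeled in a few lines of Java. This is an illustrative sketch, not CloudStack code: a `ReentrantLock` stands in for the OS write lock on the snapshot file, a background thread plays the Libvirt backup job, and the class and method names are hypothetical. The simulated job reports completion while it still holds the lock, so a rebase attempted immediately afterwards cannot acquire it, whereas the same attempt made after the lock is released succeeds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the race; a ReentrantLock stands in for the OS-level file lock.
final class CompletionBeforeUnlockDemo {

    /** Outcome of the two lock attempts made by the simulated rebase. */
    static final class Result {
        final boolean immediateAttemptAcquired;
        final boolean laterAttemptAcquired;

        Result(boolean immediateAttemptAcquired, boolean laterAttemptAcquired) {
            this.immediateAttemptAcquired = immediateAttemptAcquired;
            this.laterAttemptAcquired = laterAttemptAcquired;
        }
    }

    static Result run() {
        ReentrantLock snapshotFileLock = new ReentrantLock();
        CountDownLatch completionSignaled = new CountDownLatch(1);
        CountDownLatch releaseLock = new CountDownLatch(1);

        // Simulated backup job: takes the lock, signals completion, and releases the lock later.
        Thread backupJob = new Thread(() -> {
            snapshotFileLock.lock();
            try {
                completionSignaled.countDown();
                releaseLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                snapshotFileLock.unlock();
            }
        });
        backupJob.start();

        boolean immediate;
        try {
            completionSignaled.await();             // the job has been reported as finished
            immediate = snapshotFileLock.tryLock(); // rebase attempted right away
            if (immediate) {
                snapshotFileLock.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            immediate = false;
        } finally {
            releaseLock.countDown();                // the job now lets go of the lock
        }

        try {
            backupJob.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        boolean later = snapshotFileLock.tryLock(); // rebase attempted after the release
        if (later) {
            snapshotFileLock.unlock();
        }
        return new Result(immediate, later);
    }
}
```

The second attempt succeeding is the behavior a delayed retry relies on.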

This PR changes the rebase step so that, if ACS encounters a lock error while rebasing the snapshot, it makes another attempt after 60 seconds.
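
A minimal sketch of this retry-once control flow, under stated assumptions: the `Runnable` stands in for a single rebase attempt, the `Predicate` decides whether a failure is a lock error, and the delay is injected through a `LongConsumer` so that a test does not have to wait 60 seconds. The class and method names are hypothetical and do not reflect the actual `KVMStorageProcessor` code.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.Predicate;

// Hypothetical helper illustrating "retry the rebase once, 60 seconds after a lock error".
final class RebaseRetry {

    static final long RETRY_DELAY_MILLIS = TimeUnit.SECONDS.toMillis(60);

    /** Production sleeper: blocks the calling thread for the given number of milliseconds. */
    static final LongConsumer THREAD_SLEEP = millis -> {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry the rebase", e);
        }
    };

    static void rebaseWithSingleRetry(Runnable rebaseAttempt,
                                      Predicate<RuntimeException> isLockError,
                                      LongConsumer sleeper) {
        try {
            rebaseAttempt.run();
        } catch (RuntimeException e) {
            if (!isLockError.test(e)) {
                throw e; // not a lock problem: fail immediately, no retry
            }
            sleeper.accept(RETRY_DELAY_MILLIS); // give the OS time to release the write lock
            rebaseAttempt.run();                // second and last attempt; failures propagate
        }
    }
}
```

In production code the `THREAD_SLEEP` sleeper would be passed in. Keeping the retry count fixed at one bounds the worst-case extra wait to a single delay.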

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

This issue is extremely hard to reproduce, but I have verified that the normal incremental snapshot workflow still works as expected.

Member

@weizhouapache weizhouapache left a comment

code lgtm

@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 5.26316% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.60%. Comparing base (5caf6cd) to head (5d0f4d9).
⚠️ Report is 60 commits behind head on 4.22.

Files with missing lines                                 Patch %   Lines
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java    0.00%     18 Missing ⚠️
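
The 5.26316% patch figure is consistent with a patch of 19 coverable lines of which only 1 is covered, since 1 / 19 ≈ 5.263%. The 1 covered / 18 uncovered split is an inference from the two numbers in the report, not something Codecov states outright. A tiny, hypothetical Java helper reproducing the arithmetic:

```java
// Hypothetical helper: patch coverage as covered / (covered + uncovered) lines, in percent.
final class PatchCoverage {

    static double percent(int coveredLines, int uncoveredLines) {
        int total = coveredLines + uncoveredLines;
        if (total <= 0) {
            throw new IllegalArgumentException("patch has no coverable lines");
        }
        return 100.0 * coveredLines / total;
    }
}
```

For example, `PatchCoverage.percent(1, 18)` evaluates to roughly 5.2632.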
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12666      +/-   ##
============================================
- Coverage     17.60%   17.60%   -0.01%     
- Complexity    15659    15678      +19     
============================================
  Files          5917     5918       +1     
  Lines        531394   531775     +381     
  Branches      64970    65025      +55     
============================================
+ Hits          93575    93637      +62     
- Misses       427269   427571     +302     
- Partials      10550    10567      +17     
Flag        Coverage Δ
uitests     3.70% <ø> (-0.01%) ⬇️
unittests   18.67% <5.26%> (-0.01%) ⬇️
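
In the coverage diff above, each `Lines` figure equals `Hits + Misses + Partials` for both base and head (93575 + 427269 + 10550 = 531394 and 93637 + 427571 + 10567 = 531775), so the +381 line delta splits into +62 hits, +302 misses and +17 partials. A small, hypothetical Java check of that bookkeeping:

```java
// Hypothetical check that a Codecov-style line breakdown adds up.
final class CoverageBreakdown {

    final long lines;
    final long hits;
    final long misses;
    final long partials;

    CoverageBreakdown(long lines, long hits, long misses, long partials) {
        this.lines = lines;
        this.hits = hits;
        this.misses = misses;
        this.partials = partials;
    }

    /** True when every tracked line is classified as exactly one of hit, miss or partial. */
    boolean isConsistent() {
        return lines == hits + misses + partials;
    }

    /** Change in tracked lines between this (head) breakdown and a base breakdown. */
    long lineDelta(CoverageBreakdown base) {
        return lines - base.lines;
    }
}
```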

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to reduce sporadic failures during KVM incremental volume snapshot creation by retrying qemu-img rebase when libvirt/qemu-img reports a transient image lock.

Changes:

  • Detect the specific “image is in use” lock error during snapshot rebase.
  • Add a one-time retry of the rebase after a 60-second delay.
  • Change non-lock rebase failures to throw and abort the snapshot workflow (previously logged and continued).
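
To illustrate the three outcomes listed above, here is a hedged Java sketch that classifies a rebase attempt from its exit code and error output. The marker strings are assumptions: qemu-img commonly reports lock conflicts with text such as `Failed to get "write" lock` and `Is another process using the image`, and the review mentions an "image is in use" error, but the exact string CloudStack matches on is not shown on this page. The names are hypothetical, and treating a lock error on the retry as fatal is also an assumption.

```java
import java.util.Locale;

// Hypothetical classifier for the outcome of one qemu-img rebase attempt.
final class RebaseErrorClassifier {

    enum Outcome { SUCCESS, RETRY_AFTER_LOCK, ABORT }

    // Assumed lock-error markers (lower case); not necessarily what CloudStack checks.
    private static final String[] LOCK_MARKERS = {
        "failed to get \"write\" lock",
        "is another process using the image",
        "image is in use"
    };

    static boolean isLockError(String errorOutput) {
        if (errorOutput == null) {
            return false;
        }
        String lower = errorOutput.toLowerCase(Locale.ROOT);
        for (String marker : LOCK_MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Lock errors get one retry; any other failure, or a lock error on the retry, aborts. */
    static Outcome classify(int exitCode, String errorOutput, boolean alreadyRetried) {
        if (exitCode == 0) {
            return Outcome.SUCCESS;
        }
        if (isLockError(errorOutput) && !alreadyRetried) {
            return Outcome.RETRY_AFTER_LOCK;
        }
        return Outcome.ABORT;
    }
}
```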

@sureshanaparti
Contributor

@JoaoJandre can you re-target this to the 4.22 branch?

@JoaoJandre JoaoJandre force-pushed the fix-incremental-snapshot-creation branch from 4dce326 to 1768b5b on February 19, 2026 16:21
@sureshanaparti
Contributor

@blueorangutan package

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17424

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-15841)

Contributor

@sureshanaparti sureshanaparti left a comment

clgtm

@sureshanaparti
Contributor

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sureshanaparti
Contributor

Tested creating, restoring, and deleting snapshots after enabling KVM incremental snapshots; no issues were noticed during rebase.

[Screenshot: kvm-incr-snapshots-records]
[root@ref-trl-11553-k-Mol8-suresh-anaparti-kvm2 ~]# grep -i "Rebasing snapshot" /var/log/cloudstack/agent/agent.log
2026-04-10 18:06:01,941 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-3:[]) (logid:48bc30b2) Rebasing snapshot [d7b0a35e-bde6-4575-a61c-6aa70c4cd03d] with parent [/mnt/5ed8c342-7e92-374f-87df-4f637c3ba8a6/snapshots/2/3/908a7fbc-a813-4eac-9ff0-d631235efef2].
2026-04-10 18:06:13,930 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-5:[]) (logid:6bd45a5a) Rebasing snapshot [4234734b-5ba1-4f10-8a1f-3c43f13db138] with parent [/mnt/5ed8c342-7e92-374f-87df-4f637c3ba8a6/snapshots/2/5/64f6b68c-10cd-4875-9cc6-cda1c931e6d7].
2026-04-10 18:17:05,951 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-1:[]) (logid:78887764) Rebasing snapshot [ba323c4c-f323-4f1f-895c-6f399009380f] with parent [/mnt/5ed8c342-7e92-374f-87df-4f637c3ba8a6/snapshots/2/3/d7b0a35e-bde6-4575-a61c-6aa70c4cd03d].
2026-04-10 18:17:21,938 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-2:[]) (logid:6f03688f) Rebasing snapshot [0441d98c-c697-4447-9ab4-e2454219d157] with parent [/mnt/5ed8c342-7e92-374f-87df-4f637c3ba8a6/snapshots/2/5/4234734b-5ba1-4f10-8a1f-3c43f13db138].
2026-04-10 18:20:13,943 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-4:[]) (logid:f49a0e86) Rebasing snapshot [b0621379-270b-4746-95ee-05a787d59f78] with parent [/mnt/5ed8c342-7e92-374f-87df-4f637c3ba8a6/snapshots/2/3/92c84e66-b5a9-4ad0-94a1-9defbe235464].
2026-04-10 18:20:25,967 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-1:[]) (logid:001d7d16) Rebasing snapshot [1ab68b10-3c13-4427-b93a-d6d6bfd6591e] with parent [/mnt/5ed8c342-7e92-374f-87df-4f637c3ba8a6/snapshots/2/5/ece81f6a-defc-4a4f-b262-f6627582f2fa].
[root@ref-trl-11553-k-Mol8-suresh-anaparti-kvm2 ~]#


[root@ref-trl-11553-k-Mol8-suresh-anaparti-kvm2 ~]# grep -i "Taking incremental volume snapshot of volume" /var/log/cloudstack/agent/agent.log 
2026-04-10 18:04:37,745 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-3:[]) (logid:9f9b410e) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":3,"name":"ROOT-3","path":"93710230-f706-4908-8c3e-a416c3183be7","uuid":"93710230-f706-4908-8c3e-a416c3183be7"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:04:50,044 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-5:[]) (logid:ee9c9b5a) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":5,"name":"DATA01","path":"beed02e7-8000-4d36-95f4-bce0526dbb60","uuid":"beed02e7-8000-4d36-95f4-bce0526dbb60"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:05:51,720 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-3:[]) (logid:48bc30b2) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":3,"name":"ROOT-3","path":"93710230-f706-4908-8c3e-a416c3183be7","uuid":"93710230-f706-4908-8c3e-a416c3183be7"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:06:03,702 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-5:[]) (logid:6bd45a5a) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":5,"name":"DATA01","path":"beed02e7-8000-4d36-95f4-bce0526dbb60","uuid":"beed02e7-8000-4d36-95f4-bce0526dbb60"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:16:55,727 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-1:[]) (logid:78887764) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":3,"name":"ROOT-3","path":"93710230-f706-4908-8c3e-a416c3183be7","uuid":"93710230-f706-4908-8c3e-a416c3183be7"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:17:11,716 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-2:[]) (logid:6f03688f) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":5,"name":"DATA01","path":"beed02e7-8000-4d36-95f4-bce0526dbb60","uuid":"beed02e7-8000-4d36-95f4-bce0526dbb60"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:18:39,787 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-3:[]) (logid:a2e28393) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":3,"name":"ROOT-3","path":"93710230-f706-4908-8c3e-a416c3183be7","uuid":"93710230-f706-4908-8c3e-a416c3183be7"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:19:01,696 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-4:[]) (logid:17591ff0) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":5,"name":"DATA01","path":"beed02e7-8000-4d36-95f4-bce0526dbb60","uuid":"beed02e7-8000-4d36-95f4-bce0526dbb60"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:20:03,709 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-4:[]) (logid:f49a0e86) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":3,"name":"ROOT-3","path":"93710230-f706-4908-8c3e-a416c3183be7","uuid":"93710230-f706-4908-8c3e-a416c3183be7"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
2026-04-10 18:20:15,702 DEBUG [kvm.storage.KVMStorageProcessor] (AgentRequest-Handler-1:[]) (logid:001d7d16) Taking incremental volume snapshot of volume [volumeTO {"dataStore":"PrimaryDataStoreTO {\"id\":2,\"name\":\"ref-trl-11553-k-Mol8-suresh-anaparti-kvm-pri2\",\"poolType\":\"NetworkFilesystem\",\"uuid\":\"6257b3f7-7a85-3433-93b4-0c47491b1037\"}","id":5,"name":"DATA01","path":"beed02e7-8000-4d36-95f4-bce0526dbb60","uuid":"beed02e7-8000-4d36-95f4-bce0526dbb60"}] attached to running VM [i-2-3-VM]. Snapshot will be copied to [LibvirtStoragePool {"uuid":"5ed8c342-7e92-374f-87df-4f637c3ba8a6"}].
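
Pairing the ten "Taking incremental volume snapshot" lines with the six "Rebasing snapshot" lines above by their `logid` shows that six of the ten snapshot jobs logged a rebase. The following Java sketch does that pairing; it is a hypothetical reviewer aid, not part of the PR, and it assumes each relevant line carries a `(logid:…)` tag as in the output above. It is order-independent, so the two grep outputs can be concatenated in either order.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reviewer helper: which snapshot jobs (by logid) also logged a rebase?
final class SnapshotLogPairing {

    private static final Pattern LOG_ID = Pattern.compile("\\(logid:([0-9a-f]+)\\)");

    /** Maps each logid with a "Taking incremental" line to whether it also logged "Rebasing snapshot". */
    static Map<String, Boolean> rebaseByLogId(List<String> agentLogLines) {
        Set<String> snapshotJobs = new LinkedHashSet<>(); // preserves first-seen order
        Set<String> rebasedJobs = new HashSet<>();
        for (String line : agentLogLines) {
            Matcher m = LOG_ID.matcher(line);
            if (!m.find()) {
                continue; // line without a logid tag
            }
            String logId = m.group(1);
            if (line.contains("Taking incremental volume snapshot of volume")) {
                snapshotJobs.add(logId);
            } else if (line.contains("Rebasing snapshot")) {
                rebasedJobs.add(logId);
            }
        }
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String logId : snapshotJobs) {
            result.put(logId, rebasedJobs.contains(logId));
        }
        return result;
    }
}
```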

@sureshanaparti sureshanaparti merged commit 7c7b2ae into apache:4.22 Apr 10, 2026
25 of 26 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.22.1 Apr 10, 2026
@blueorangutan

[SF] Trillian test result (tid-15847)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 47748 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12666-t15847-kvm-ol8.zip
Smoke tests completed. 149 look OK, 0 have errors, 0 did not run
Only failed and skipped test results are listed; since no tests failed or were skipped, the results table (Test, Result, Time (s), Test File) is empty.

Projects

Status: Done

6 participants