fix(pvc): handle FileSystemResizePending condition to prevent pod cre…#9981

Merged
mnencia merged 3 commits into cloudnative-pg:main from jmealo:fix/pvc-filesystem-resize-pending
Feb 21, 2026

Conversation

jmealo (Contributor) commented Feb 14, 2026

fix(pvc): handle FileSystemResizePending condition to prevent pod creation deadlock

Closes #9980

Problem

When a PVC has the FileSystemResizePending condition, CloudNativePG incorrectly classifies it as "resizing" even when no pod is attached. This causes a deadlock:

  1. PVC resize completes at storage layer
  2. Kubernetes adds FileSystemResizePending condition (filesystem resize needs pod mount)
  3. Pod is deleted or crashes
  4. CloudNativePG sees PVC as "resizing" and does NOT create a new pod
  5. Filesystem resize never completes because no pod mounts the volume
  6. Cluster is stuck

Root Cause

In classifyPVC(), the resizing check happens before the pod existence check:

// Before (buggy):
if isResizing(pvc) {
    return resizing  // Returns even when no pod exists
}
if hasPod(pvc, podList) {
    return healthy
}

The FileSystemResizePending condition specifically indicates the volume resize is done at the storage layer and requires a pod mount to complete the filesystem resize. Returning resizing prevents pod creation, creating a deadlock.
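The deadlock can be reproduced in a toy model. The sketch below is not operator code: `toyPVC`, `buggyWantsPod`, `podFirstWantsPod`, and `roundsUntilResized` are hypothetical names, and the model assumes that attaching a pod is all it takes to finish a pending filesystem resize. It only illustrates why short-circuiting on the resize condition means the resize never finishes.

```go
package main

import "fmt"

// toyPVC is a hypothetical, heavily simplified model of a PVC.
// It is not a CloudNativePG or Kubernetes type.
type toyPVC struct {
	fsResizePending bool // filesystem resize is waiting for a pod mount
	podAttached     bool // a pod currently mounts the volume
}

// buggyWantsPod models the pre-fix ordering: a pending resize
// short-circuits the decision, so a missing pod is never requested.
func buggyWantsPod(p toyPVC) bool {
	if p.fsResizePending {
		return false // treated as "resizing": wait instead of creating a pod
	}
	return !p.podAttached
}

// podFirstWantsPod models a pod-first ordering: a missing pod is
// requested regardless of the resize condition.
func podFirstWantsPod(p toyPVC) bool {
	return !p.podAttached
}

// roundsUntilResized runs up to maxRounds reconcile passes and returns
// the round in which the filesystem resize finished, or -1 if it never did.
func roundsUntilResized(p toyPVC, wantsPod func(toyPVC) bool, maxRounds int) int {
	for round := 1; round <= maxRounds; round++ {
		if wantsPod(p) {
			p.podAttached = true
		}
		if p.podAttached {
			// Assumption of this model: mounting completes the resize.
			p.fsResizePending = false
		}
		if !p.fsResizePending {
			return round
		}
	}
	return -1
}

func main() {
	stuck := toyPVC{fsResizePending: true, podAttached: false}
	fmt.Println("buggy ordering:", roundsUntilResized(stuck, buggyWantsPod, 10))
	fmt.Println("pod-first ordering:", roundsUntilResized(stuck, podFirstWantsPod, 10))
}
```

In this model the buggy ordering never leaves the pending state no matter how many rounds are allowed, while the pod-first ordering finishes in the first round.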

Solution

Reorder the classification logic to:

  1. First check whether a pod exists. If a pod is attached and the PVC is resizing, the filesystem resize can complete.
  2. If no pod exists and FileSystemResizePending is set, classify the PVC as dangling to trigger pod recreation.
  3. If no pod exists and only Resizing is set, classify the PVC as resizing (the storage-layer resize is still in progress).

// After (fixed):
if hasPod(pvc, podList) {
    if isResizing(pvc) {
        return resizing  // Pod attached, resize can complete
    }
    return healthy
}

if isResizing(pvc) {
    if isFileSystemResizePending(pvc) {
        return dangling  // Needs pod to complete filesystem resize
    }
    return resizing  // Storage resize still in progress
}
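
The reordered decision can be written out as a small, self-contained truth table. This is a minimal sketch under stated assumptions, not the operator's `classifyPVC()`: the `pvcFacts` struct, the `pvcStatus` type, and the `undecided` value (standing in for whatever later checks the real function performs) are all hypothetical.

```go
package main

import "fmt"

// pvcStatus and its values are illustrative stand-ins, not the operator's types.
type pvcStatus string

const (
	statusHealthy  pvcStatus = "healthy"
	statusResizing pvcStatus = "resizing"
	statusDangling pvcStatus = "dangling"
	// statusUndecided marks cases left to later checks that this sketch omits.
	statusUndecided pvcStatus = "undecided"
)

// pvcFacts is a hypothetical summary of the inputs used by the classification.
type pvcFacts struct {
	hasPod          bool // a pod references this PVC
	resizing        bool // any resize-related condition is present
	fsResizePending bool // FileSystemResizePending is present and true
}

// classifySketch applies the pod-first ordering described in the PR.
func classifySketch(f pvcFacts) pvcStatus {
	if f.hasPod {
		if f.resizing {
			return statusResizing // pod attached, resize can complete
		}
		return statusHealthy
	}
	if f.resizing {
		if f.fsResizePending {
			return statusDangling // needs a pod to finish the filesystem resize
		}
		return statusResizing // storage-layer resize still in progress
	}
	return statusUndecided
}

func main() {
	cases := []pvcFacts{
		{hasPod: true},
		{hasPod: true, resizing: true, fsResizePending: true},
		{resizing: true, fsResizePending: true},
		{resizing: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v => %s\n", c, classifySketch(c))
	}
}
```

The only case whose result differs from the old ordering is the third one: no pod plus a pending filesystem resize now yields dangling instead of resizing.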

Changes

  • pkg/reconciler/persistentvolumeclaim/resources.go: Add isFileSystemResizePending() helper function
  • pkg/reconciler/persistentvolumeclaim/status.go: Reorder classification logic in classifyPVC()
  • pkg/reconciler/persistentvolumeclaim/resources_test.go: Add comprehensive tests
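
For readers unfamiliar with PVC conditions, a helper of this kind might look roughly like the sketch below. It is not the code merged in this PR. To stay self-contained it uses local stand-in types (`pvcCondition`, `conditionType`, `conditionStatus`) that mirror the shape of the condition types in `k8s.io/api/core/v1`; the function signature taking a condition slice is an assumption. It also checks that the condition status is true, as the follow-up commit in this PR describes.

```go
package main

import "fmt"

// Local stand-ins mirroring the shape of the Kubernetes PVC condition types.
// They are assumptions made to keep the example runnable on its own.
type conditionType string
type conditionStatus string

const (
	condResizing                conditionType   = "Resizing"
	condFileSystemResizePending conditionType   = "FileSystemResizePending"
	statusTrue                  conditionStatus = "True"
	statusFalse                 conditionStatus = "False"
)

type pvcCondition struct {
	Type   conditionType
	Status conditionStatus
}

// fileSystemResizePending reports whether the conditions include
// FileSystemResizePending with a true status. A condition that is
// present but not true does not count.
func fileSystemResizePending(conditions []pvcCondition) bool {
	for _, c := range conditions {
		if c.Type == condFileSystemResizePending && c.Status == statusTrue {
			return true
		}
	}
	return false
}

func main() {
	pending := []pvcCondition{{Type: condFileSystemResizePending, Status: statusTrue}}
	notTrue := []pvcCondition{{Type: condFileSystemResizePending, Status: statusFalse}}
	onlyResizing := []pvcCondition{{Type: condResizing, Status: statusTrue}}

	fmt.Println(fileSystemResizePending(pending))
	fmt.Println(fileSystemResizePending(notTrue))
	fmt.Println(fileSystemResizePending(onlyResizing))
	fmt.Println(fileSystemResizePending(nil))
}
```

Checking the status as well as the type avoids reclassifying a PVC whose condition entry exists but is not currently true.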

Testing

  • Unit tests: Added 7 new tests covering:
    • isFileSystemResizePending() function (4 tests)
    • Classification behavior with FileSystemResizePending (3 tests)
  • All existing tests pass: 75/75 specs
  • Manual verification: Tested resize scenarios on AKS with Azure Disk CSI driver

Compatibility

  • Backward compatible: Yes - only changes classification of edge case
  • API changes: None
  • Backport targets: release-1.28, release-1.27

AI Assistance

This fix was written with assistance from Claude Opus 4.5 during work on the Dynamic Storage feature. The issue was discovered while running E2E tests on Azure AKS 1.34.2.

jmealo requested a review from a team as a code owner on February 14, 2026 02:27
dosubot (bot) added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Feb 14, 2026
cnpg-bot added the backport-requested ◀️ (this pull request should be backported to all supported releases), release-1.25, release-1.27, and release-1.28 labels on Feb 14, 2026
github-actions (Contributor) commented

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this PR, remove the label backport-requested ◀️ or add the label 'do not backport'.
  • To stop backporting this PR to a specific release branch, remove that branch's label: release-x.y

dosubot (bot) added the bug 🐛 label (something isn't working) on Feb 14, 2026
jmealo force-pushed the fix/pvc-filesystem-resize-pending branch 2 times, most recently from f70df38 to 8964d8d, on February 14, 2026 02:32
sxd (Member) commented Feb 18, 2026

/test depth=push

github-actions (Contributor) commented

@sxd, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22146955316

cnpg-bot added the ok to merge 👌 label (this PR can be merged) on Feb 18, 2026
armru force-pushed the fix/pvc-filesystem-resize-pending branch from 8201b29 to ead99f2 on February 19, 2026 15:01
dosubot (bot) added the lgtm label (this PR has been approved by a maintainer) on Feb 19, 2026
armru added this to the 1.29.0 milestone on Feb 19, 2026
jmealo and others added 2 commits February 20, 2026 15:18
…ation deadlock

When a PVC has the FileSystemResizePending condition, the volume has been
resized at the storage layer but the filesystem resize is pending - it
requires a pod to mount the volume to complete the resize.

Previously, classifyPVC() would return "resizing" for such PVCs even when
no pod was attached. This caused a deadlock:
1. PVC needs a pod mount to complete filesystem resize
2. CNPG sees PVC as "resizing" so doesn't create a pod
3. Filesystem resize never completes → cluster stuck

The fix reorders the classification logic to:
1. First check if a pod exists - if so, resizing can complete
2. If no pod and FileSystemResizePending condition is present, classify
   as "dangling" to trigger pod recreation
3. If no pod but still waiting for volume resize at storage layer,
   continue to classify as "resizing"

This ensures pods are recreated when needed to complete filesystem resizes,
while still respecting storage-layer resize operations in progress.

Signed-off-by: Jeff Mealo <[email protected]>
Assisted-by: Claude Opus 4.5
… logging

Check that the FileSystemResizePending condition has ConditionTrue status
before triggering PVC reclassification as dangling. Add an Info-level log
message when this reclassification occurs to aid debugging.

Signed-off-by: Armando Ruocco <[email protected]>
mnencia force-pushed the fix/pvc-filesystem-resize-pending branch from ead99f2 to 9861b18 on February 20, 2026 14:18
dosubot (bot) added the size:M label (this PR changes 30-99 lines, ignoring generated files) and removed the size:L label (this PR changes 100-499 lines, ignoring generated files) on Feb 20, 2026
mnencia force-pushed the fix/pvc-filesystem-resize-pending branch from 7f19a15 to 34c94b0 on February 20, 2026 18:03
mnencia removed the ok to merge 👌 label (this PR can be merged) on Feb 20, 2026
mnencia (Member) commented Feb 20, 2026

/test

github-actions (Contributor) commented

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22235398611

cnpg-bot added the ok to merge 👌 label (this PR can be merged) on Feb 20, 2026
mnencia merged commit 0c0fa23 into cloudnative-pg:main on Feb 21, 2026
34 of 36 checks passed

cnpg-bot pushed a commit that referenced this pull request Feb 21, 2026
After a volume resize completes at the storage layer, the
filesystem resize still needs a running pod to mount the volume.
If the pod is missing, the instance was stuck indefinitely
because no new pod was created. Now the operator detects this
condition and recreates the pod to complete the resize.

Closes #9980

Signed-off-by: Jeff Mealo <[email protected]>
Signed-off-by: Armando Ruocco <[email protected]>
Signed-off-by: Marco Nenciarini <[email protected]>
Co-authored-by: Armando Ruocco <[email protected]>
Co-authored-by: Marco Nenciarini <[email protected]>
(cherry picked from commit 0c0fa23)

Labels

  • backport-requested ◀️: This pull request should be backported to all supported releases
  • bug 🐛: Something isn't working
  • lgtm: This PR has been approved by a maintainer
  • ok to merge 👌: This PR can be merged
  • release-1.27
  • release-1.28
  • size:M: This PR changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug]: PVC with FileSystemResizePending condition causes pod creation deadlock

5 participants