feat(majorupgrade): auto-cleanup failed upgrade job on image rollback by mnencia · Pull Request #10104 · cloudnative-pg/cloudnative-pg

mnencia · 2026-03-02T22:02:16Z

Set BackoffLimit=0 on the major upgrade job so Kubernetes does not retry a failed pg_upgrade. When the user reverts the image after a failed upgrade, the operator automatically deletes the failed job and lets the cluster restart on the original version.

Inspired by Maxim Ivanov's initial rollback approach in #9344.

Closes #9128

github-actions · 2026-03-02T22:02:29Z

❗ By default, the pull request is configured to backport to all release branches.

To stop backporting this PR, remove the label: backport-requested ◀️ or add the label 'do not backport'
To stop backporting this PR to a certain release branch, remove the specific branch label: release-x.y

mnencia · 2026-03-02T22:04:09Z

/test

github-actions · 2026-03-02T22:04:19Z

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22597783147

mnencia · 2026-03-03T10:35:40Z

/test ft=postgres-major-upgrade

github-actions · 2026-03-03T10:35:51Z

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22619106064

mnencia · 2026-03-03T10:56:24Z

/test

github-actions · 2026-03-03T10:56:36Z

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22619835859

NiccoloFei · 2026-03-06T11:51:46Z

/test ft=postgres-major-upgrade

github-actions · 2026-03-06T11:51:58Z

@NiccoloFei, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22762269483

NiccoloFei · 2026-03-06T12:54:26Z

/test

github-actions · 2026-03-06T12:54:39Z

@NiccoloFei, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22764323799

Set BackoffLimit=0 on the major upgrade job so Kubernetes does not retry a failed pg_upgrade (retries won't produce a different result and the retry pods hold the primary PVC, blocking recovery).

When the upgrade job exists but has not completed, the reconciler now checks whether the user rolled back the image to the previous major version. If so it deletes the job with foreground propagation and requeues, allowing the cluster to restart on the original version without manual intervention.

Move the majorupgrade.Reconcile call above the running-jobs guard in reconcileResources so the reconciler can act on failed upgrade jobs that would otherwise block the controller indefinitely.

Inspired by Maxim Ivanov's initial rollback approach in #9344.

Signed-off-by: Marco Nenciarini <[email protected]>
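The rollback check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `UpgradeJob`, `Action`, and `decideUpgradeJobAction` are hypothetical names, the job is reduced to two booleans instead of a real Kubernetes Job object, and the "wait" outcome for a non-reverted image is an assumption about behavior the commit message does not spell out.

```go
package main

import "fmt"

// UpgradeJob is a simplified, hypothetical stand-in for the Kubernetes Job
// that runs pg_upgrade. The real operator inspects actual Job objects.
type UpgradeJob struct {
	Exists    bool // a major upgrade job is present for the cluster
	Completed bool // the job finished successfully
}

// Action names what this sketch's reconciler step would do next.
type Action string

const (
	// ActionContinue: nothing to clean up here; reconciliation goes on.
	ActionContinue Action = "continue"
	// ActionWait: the job has not completed and the image was not reverted
	// (assumed outcome; not stated in the commit message).
	ActionWait Action = "wait"
	// ActionDeleteAndRequeue: delete the job (the commit uses foreground
	// propagation) and requeue so the cluster restarts on the old version.
	ActionDeleteAndRequeue Action = "delete-job-and-requeue"
)

// decideUpgradeJobAction compares the major version of the image currently
// requested by the user with the major version the cluster was running
// before the upgrade attempt. Both parameters are illustrative.
func decideUpgradeJobAction(job UpgradeJob, requestedMajor, originalMajor int) Action {
	if !job.Exists || job.Completed {
		return ActionContinue
	}
	// The job exists but has not completed: check for an image rollback.
	if requestedMajor == originalMajor {
		return ActionDeleteAndRequeue
	}
	return ActionWait
}

func main() {
	failed := UpgradeJob{Exists: true, Completed: false}
	// Image still points at the new major version.
	fmt.Println(decideUpgradeJobAction(failed, 17, 16))
	// User reverted the image to the original major version.
	fmt.Println(decideUpgradeJobAction(failed, 16, 16))
}
```

Keeping the decision in a pure function like this makes the rollback rule easy to unit test separately from the Kubernetes client calls that would perform the deletion.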

Add a rollback scenario to the major upgrade E2E suite. The test creates a cluster with the pgvector extension, attempts an upgrade to a minimal image that lacks the extension (causing pg_upgrade to fail), then reverts the image and verifies the operator automatically cleans up the failed job and the cluster recovers on the original version with its timeline unchanged.

Based on Niccolò Fei's E2E test work in #9344.

Co-authored-by: Niccolò Fei <[email protected]>
Signed-off-by: Marco Nenciarini <[email protected]>

Reflect that the operator now automatically detects an image rollback and deletes the failed upgrade job. Users only need to revert the image; manual job deletion is no longer required.

Signed-off-by: Marco Nenciarini <[email protected]>

Signed-off-by: Armando Ruocco <[email protected]>

…#10104)

Set BackoffLimit=0 on the major upgrade job so Kubernetes does not retry a failed pg_upgrade. When the user reverts the image after a failed upgrade, the operator automatically deletes the failed job and lets the cluster restart on the original version.

Inspired by Maxim Ivanov's initial rollback approach in #9344.

Closes #9128

Signed-off-by: Marco Nenciarini <[email protected]>
Signed-off-by: Armando Ruocco <[email protected]>
Co-authored-by: Niccolò Fei <[email protected]>
Co-authored-by: Armando Ruocco <[email protected]>
(cherry picked from commit 5b7b799)

cnpg-bot added the backport-requested ◀️ (This pull request should be backported to all supported releases), release-1.25, release-1.27, and release-1.28 labels Mar 2, 2026

mnencia force-pushed the fix/major-upgrade-rollback-cleanup branch from 7163f55 to a91e27c March 2, 2026 22:03

mnencia force-pushed the fix/major-upgrade-rollback-cleanup branch 3 times, most recently from fce1be5 to 2d29c47 March 3, 2026 10:25

cnpg-bot added the ok to merge 👌 label (This PR can be merged) Mar 3, 2026

mnencia marked this pull request as ready for review March 3, 2026 10:56

mnencia requested review from a team, NiccoloFei, jsilvela and litaocdl as code owners March 3, 2026 10:56

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Mar 3, 2026

dosubot bot added the enhancement 🪄 label (New feature or request) Mar 3, 2026

mnencia mentioned this pull request Mar 3, 2026

feat(majorupgrade): Allow image rollbacks on failed major upgrades #9344

Closed

NiccoloFei force-pushed the fix/major-upgrade-rollback-cleanup branch 3 times, most recently from 97882f5 to ba49825 March 6, 2026 11:50

NiccoloFei approved these changes Mar 6, 2026

dosubot bot added the lgtm label (This PR has been approved by a maintainer) Mar 6, 2026

armru force-pushed the fix/major-upgrade-rollback-cleanup branch 2 times, most recently from d94d0d3 to 69879b5 March 6, 2026 13:58

armru approved these changes Mar 6, 2026

armru force-pushed the fix/major-upgrade-rollback-cleanup branch 2 times, most recently from 27df5ec to 3b7d9ff March 6, 2026 14:02

mnencia and others added 4 commits March 9, 2026 09:04

chore: emit event on deletion

aafd3dc

Signed-off-by: Armando Ruocco <[email protected]>

mnencia force-pushed the fix/major-upgrade-rollback-cleanup branch from 3b7d9ff to aafd3dc March 9, 2026 08:04

mnencia removed the release-1.25 label Mar 9, 2026

gbartolini approved these changes Mar 9, 2026

mnencia merged commit 5b7b799 into main Mar 9, 2026
39 of 42 checks passed

mnencia deleted the fix/major-upgrade-rollback-cleanup branch March 9, 2026 08:17

github-actions bot mentioned this pull request Mar 9, 2026

Backport failure for pull request 10104 #10220

Closed

mnencia merged 4 commits into main from fix/major-upgrade-rollback-cleanup

5 participants
