c8d/system: Fix race between df and prune #51979

Merged
thaJeztah merged 1 commit into moby:master from vvoland:c8d-prune-race
Feb 2, 2026
@vvoland (Contributor) commented Feb 2, 2026

When running `docker system df` concurrently with `docker system prune` or image removal, `DiskUsage` would fail with a "snapshot does not exist" error.

This happened because `layerDiskUsage` walks all snapshots and gets their usage, but a concurrent prune could delete a snapshot between the `Walk` and `Usage` calls.

Handle NotFound errors gracefully by skipping deleted snapshots instead of returning an error.

- Human readable description for the release notes

Fix `docker system df` failing when run concurrently with `docker system prune`.

Signed-off-by: Paweł Gronowski
@vvoland vvoland added this to the 29.2.1 milestone Feb 2, 2026
@vvoland vvoland self-assigned this Feb 2, 2026
@github-actions github-actions bot added the area/testing, area/daemon (Core Engine), and containerd-integration (Issues and PRs related to containerd integration) labels Feb 2, 2026
@vvoland vvoland added the impact/changelog and kind/bugfix (PR's that fix bugs) labels and removed the area/testing label Feb 2, 2026
@pjonsson commented Feb 2, 2026

I don't know how the cache is handled in Docker, but if there's a walk, can the GC specified by:

{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "40GB"
    }
  }
}

trigger the same problem for df in a different code path, or is that covered by this PR as well?

@vvoland (Contributor, Author) commented Feb 2, 2026

The builder GC is handled by BuildKit (https://github.com/moby/buildkit), which handles this differently: it doesn't walk all available snapshots, and it also seems to handle not-found errors gracefully: https://github.com/moby/buildkit/blob/649062d5e7be1785c31e79729a5725c699cf1370/cache/refs.go#L361

So I think it should be fine, but if there are any issues, they need to be handled on the BuildKit side.

cc @crazy-max @tonistiigi

@thaJeztah (Member) left a comment

LGTM

@thaJeztah (Member) commented

The failure on Oracle 8 is unrelated; it is a known flaky test:

=== Failed
=== FAIL: amd64.docker.docker.integration.networking TestAccessPublishedPortFromHost/userland-proxy=false/IPv6=true (2.24s)
    port_mapping_linux_test.go:413: assertion failed: error is not nil: Get "http://[fdfb:5cbb:29bf::2]:1237": dial tcp [fdfb:5cbb:29bf::2]:1237: connect: connection refused
    --- FAIL: TestAccessPublishedPortFromHost/userland-proxy=false/IPv6=true (2.24s)

=== FAIL: amd64.docker.docker.integration.networking TestAccessPublishedPortFromHost (8.19s)

@thaJeztah thaJeztah merged commit d392ea1 into moby:master Feb 2, 2026
261 of 266 checks passed
@lingotesoropuro-jpg left a comment

Thanks

Labels

area/daemon (Core Engine), containerd-integration (Issues and PRs related to containerd integration), impact/changelog, kind/bugfix (PR's that fix bugs)

Development

Successfully merging this pull request may close these issues.

Running docker system df fails with "Error response from daemon: failed to calculate image disk usage"

4 participants