Skip to content

Add OCI/Image Volume Source support#10579

Merged
fuweid merged 1 commit intocontainerd:mainfrom
wzshiming:feat/oci-image-mount
Feb 18, 2025
Merged

Add OCI/Image Volume Source support#10579
fuweid merged 1 commit intocontainerd:mainfrom
wzshiming:feat/oci-image-mount

Conversation

@wzshiming
Copy link
Contributor

Fixed #10496

@k8s-ci-robot
Copy link

Hi @wzshiming. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 2 times, most recently from 4446688 to fdc84b3 Compare August 12, 2024 06:38
@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 22 times, most recently from 8de3699 to 1589b41 Compare August 12, 2024 16:40
mountPath := "/image-mount"
EnsureImageExists(t, testMountImage)
EnsureImageExists(t, testImage)
testImageMountSELinux(t, testImage, testMountImage, mountPath, "s0:c4,c5", "system_u:object_r:container_file_t:s0:c4,c5 pause")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wzshiming

I think this case fails.

   default: --- PASS: TestImageMount (1.84s)
    default: === RUN   TestImageMountSELinux
    default:     main_test.go:786: Image "registry.k8s.io/pause:3.10" already exists, not pulling.
    default:     main_test.go:786: Image "registry.k8s.io/e2e-test-images/resource-consumer:1.10" already exists, not pulling.
    default:     container_log_test.go:145:
    default:         	Error Trace:	/root/go/src/github.com/containerd/containerd/integration/container_log_test.go:145
    default:         	            				/root/go/src/github.com/containerd/containerd/integration/image_mount_test.go:113
    default:         	            				/root/go/src/github.com/containerd/containerd/integration/image_mount_test.go:67
    default:         	Error:      	Should be true
    default:         	Test:       	TestImageMountSELinux
    default: --- FAIL: TestImageMountSELinux (2.47s)

Would you please show more log in your commit? It would be easier to fix. Thanks

Copy link
Contributor Author

@wzshiming wzshiming Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error Trace: /root/go/src/github.com/containerd/containerd/integration/container_log_test.go:145

func checkContainerLog(t *testing.T, log string, messages []string) {
lines := strings.Split(strings.TrimSpace(log), "\n")
require.Len(t, lines, len(messages), "log line number should match")
for i, line := range lines {
ts, msg, ok := strings.Cut(line, " ")
require.True(t, ok)
_, err := time.Parse(time.RFC3339Nano, ts)

Maybe we can just retest, it doesn't seem to be caused by this change.

I will tried to find out why

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check what it will show to us after adding debug log

criruntime "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func TestImageMountOnWS(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails as well.

=== RUN   TestImageMountOnWS
    main_test.go:790: Pull test image "ghcr.io/containerd/windows/nanoserver:ltsc2022"
    container_log_test.go:148: 
        	Error Trace:	D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/container_log_test.go:148
        	            				D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/image_mount_test.go:160
        	            				D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/image_mount_windows_test.go:37
        	Error:      	Not equal: 
        	            	expected: "stdout F License.txt"
        	            	actual  : "stderr F The system cannot find the path specified."
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-stdout F License.txt
        	            	+stderr F The system cannot find the path specified.
        	Test:       	TestImageMountOnWS
        	Messages:   	log content should match
E0213 19:33:03.238072    5016 remote_runtime.go:159] StopPodSandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a" from runtime service failed: rpc error: code = Unknown desc = failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
    main_test.go:271: 
        	Error Trace:	D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/main_test.go:271
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1176
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1354
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1684
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unknown desc = failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
        	Test:       	TestImageMountOnWS
E0213 19:33:03.280214    5016 remote_runtime.go:179] RemovePodSandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a" from runtime service failed: rpc error: code = Unknown desc = failed to forcibly stop sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
    main_test.go:272: 
        	Error Trace:	D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/main_test.go:272
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1176
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1354
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1684
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unknown desc = failed to forcibly stop sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
        	Test:       	TestImageMountOnWS
--- FAIL: TestImageMountOnWS (17.97s)

Copy link
Contributor Author

@wzshiming wzshiming Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that leaves Windows Server support, which may need someone more familiar with Windows to figure out exactly what is causing the problem.
I've tried everything I can find, but to no avail. I will remove it, just support Linux first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it, just support Linux first.

@wzshiming sure. we can have track item to add test in windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works!

@carlory
Copy link

carlory commented Feb 17, 2025

FYI cri adds the subpath support for image volume type: kubernetes/kubernetes#130135

@mikebrow
Copy link
Member

FYI cri adds the subpath support for image volume type: kubernetes/kubernetes#130135

nod .. let's get this one merged and follow up with subpath, unless subpath has already merged and the test is required.

@mikebrow
Copy link
Member

=== RUN TestImageMount
main_test.go:786: Image "registry.k8s.io/pause:3.10" already exists, not pulling.
main_test.go:790: Pull test image "ghcr.io/containerd/alpine:3.14.0"
image_mount_test.go:165:
Error Trace: /home/runner/work/containerd/containerd/integration/image_mount_test.go:165
/home/runner/work/containerd/containerd/integration/image_mount_test.go:44
Error: Received unexpected error:
timeout exceeded
Test: TestImageMount
--- FAIL: TestImageMount (31.71s)

hmm.. a timeout .. maybe use BusyBox it should always be smaller..

@fuweid
Copy link
Member

fuweid commented Feb 18, 2025

   default: time="2025-02-17T12:02:03.557783499Z" level=debug msg="remove snapshot" key=/run/containerd-test/io.containerd.grpc.v1.cri/image-volumes/e282eb3b5a1a8337b5e9016253b41f3359460325ec91e4d0e0dd966ad5187bca/ee6521f290b2168b6e0935a181d4cff9be1ac3f505666ef0e3c98fae8199917a snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.558221517Z" level=info msg="Container df376860f476b3d601d8c82e6ab95adcd1b7e49688fe937fd688ed8b54feccab: CDI devices from CRI Config.CDIDevices: []"
    default: time="2025-02-17T12:02:03.559726745Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.560854785Z" level=debug msg="removed snapshot" key=k8s.io/95/e3c6fc9bf3c590cd333f266b2379748796bc7e9f1ea2ed840f06a018f19ab15e snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.561669736Z" level=debug msg="removed snapshot" key=k8s.io/99//run/containerd-test/io.containerd.grpc.v1.cri/image-volumes/e282eb3b5a1a8337b5e9016253b41f3359460325ec91e4d0e0dd966ad5187bca/ee6521f290b2168b6e0935a181d4cff9be1ac3f505666ef0e3c98fae8199917a snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.562498973Z" level=debug msg="removed snapshot" key=k8s.io/97/dbe9f8f1545805758f5aefc6b671fb8f320c8e1a2bcab1f9156e527d9552d7af snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.563695813Z" level=info msg="CreateContainer within sandbox \"e282eb3b5a1a8337b5e9016253b41f3359460325ec91e4d0e0dd966ad5187bca\" for &ContainerMetadata{Name:test-image-mount-container,Attempt:0,} returns container id \"df376860f476b3d601d8c82e6ab95adcd1b7e49688fe937fd688ed8b54feccab\""

Checked the code and found that snapshot creation doesn't use lease so that containerd GC cleanups the volume after creation. That's why the log content is empty. Since the GC cleanups snapshots async, CI could be happy if ls -Z executes before GC.

please consider using the following code to fix.

diff --git a/internal/cri/server/container_image_mount.go b/internal/cri/server/container_image_mount.go
index 4e84da212..b8d9cf59c 100644
--- a/internal/cri/server/container_image_mount.go
+++ b/internal/cri/server/container_image_mount.go
@@ -23,6 +23,7 @@ import (
        "path/filepath"

        containerd "github.com/containerd/containerd/v2/client"
+       "github.com/containerd/containerd/v2/core/leases"
        "github.com/containerd/containerd/v2/core/mount"
        "github.com/containerd/errdefs"
        "github.com/containerd/log"
@@ -39,6 +40,11 @@ func (c *criService) mutateMounts(
        sandboxID string,
        platform imagespec.Platform,
 ) error {
+       if err := c.ensureLeaseExist(ctx, sandboxID); err != nil {
+               return fmt.Errorf("failed to ensure lease %v for sandbox: %w", sandboxID, err)
+       }
+
+       ctx = leases.WithLease(ctx, sandboxID)
        for _, m := range extraMounts {
                err := c.mutateImageMount(ctx, m, snapshotter, sandboxID, platform)
                if err != nil {
@@ -48,6 +54,17 @@ func (c *criService) mutateMounts(
        return nil
 }

+func (c *criService) ensureLeaseExist(ctx context.Context, sandboxID string) error {
+       leaseSvc := c.client.LeasesService()
+       _, err := leaseSvc.Create(ctx, leases.WithID(sandboxID))
+       if err != nil {
+               if errdefs.IsAlreadyExists(err) {
+                       err = nil
+               }
+       }
+       return err
+}
+
 func (c *criService) mutateImageMount(
        ctx context.Context,
        extraMount *runtime.Mount,
diff --git a/internal/cri/server/sandbox_remove.go b/internal/cri/server/sandbox_remove.go
index f7e1ed880..b2f728e6d 100644
--- a/internal/cri/server/sandbox_remove.go
+++ b/internal/cri/server/sandbox_remove.go
@@ -21,6 +21,7 @@ import (
        "fmt"
        "time"

+       "github.com/containerd/containerd/v2/core/leases"
        "github.com/containerd/containerd/v2/pkg/tracing"
        "github.com/containerd/errdefs"
        "github.com/containerd/log"
@@ -56,6 +57,12 @@ func (c *criService) RemovePodSandbox(ctx context.Context, r *runtime.RemovePodS
                return nil, fmt.Errorf("failed to forcibly stop sandbox %q: %w", id, err)
        }

+       if err := c.client.LeasesService().Delete(ctx, leases.Lease{ID: id}); err != nil {
+               if !errdefs.IsNotFound(err) {
+                       return nil, fmt.Errorf("failed to delete lease for sandbox %q: %w", id, err)
+               }
+       }
+
        // Return error if sandbox network namespace is not closed yet.
        if sandbox.NetNS != nil {
                nsPath := sandbox.NetNS.GetPath()
diff --git a/internal/cri/server/sandbox_run.go b/internal/cri/server/sandbox_run.go
index e895d66b3..5e25a007f 100644
--- a/internal/cri/server/sandbox_run.go
+++ b/internal/cri/server/sandbox_run.go
@@ -31,6 +31,7 @@ import (
        "github.com/containerd/typeurl/v2"
        runtime "k8s.io/cri-api/pkg/apis/runtime/v1"

+       "github.com/containerd/containerd/v2/core/leases"
        sb "github.com/containerd/containerd/v2/core/sandbox"
        "github.com/containerd/containerd/v2/internal/cri/annotations"
        "github.com/containerd/containerd/v2/internal/cri/bandwidth"
@@ -87,6 +88,22 @@ func (c *criService) RunPodSandbox(ctx context.Context, r *runtime.RunPodSandbox
                }
        }()

+       leaseSvc := c.client.LeasesService()
+       ls, lerr := leaseSvc.Create(ctx, leases.WithID(id))
+       if lerr != nil {
+               return nil, fmt.Errorf("failed to create lease for sandbox name %q: %w", name, lerr)
+       }
+       defer func() {
+               if retErr != nil {
+                       deferCtx, deferCancel := util.DeferContext()
+                       defer deferCancel()
+
+                       if derr := leaseSvc.Delete(deferCtx, ls); derr != nil {
+                               log.G(deferCtx).WithError(derr).Error("failed to delete lease during cleanup")
+                       }
+               }
+       }()
+
        var (
                err         error
                sandboxInfo = sb.Sandbox{ID: id}

@wzshiming
Copy link
Contributor Author

@fuweid it works, thank you!!!

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

let's get this one merged first. Thanks @wzshiming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add OCI Volume Source support