CRI: Properly restore IP information for userns sandboxes by dcantah · Pull Request #10400 · containerd/containerd

dcantah · 2024-06-28T17:05:11Z

Because networking for userns containers is set up AFTER the OCI runtime has run, the underlying container object for the sandbox never had the IP information saved in its metadata extension. Since we're trying to move away from modifying the underlying container object for sandboxes (or even just keeping around a reference to it), I don't want to update the underlying container afterwards to include this. So, to remedy this, we can update the sandbox's metadata after network setup runs for userns containers, and then our recovery logic just needs to be altered a bit.

Our recovery logic already checks containers with the sandbox label first, and then checks our shiny new sandbox store. Our IP information is stored in the sandbox store objects, so all we need to do is check whether a stored sandbox exists and, if it does and we already found a container for it (although without the IP information), update the sandbox in our in-memory store.
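The recovery order described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the containerd implementation: the `Sandbox` type, the `recoverSandboxes` function, and the plain map standing in for the in-memory store are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// Sandbox is a hypothetical, trimmed-down stand-in for the CRI sandbox object.
type Sandbox struct {
	ID string
	IP string
}

// String renders a sandbox for logging.
func (s Sandbox) String() string {
	return fmt.Sprintf("%s (ip=%q)", s.ID, s.IP)
}

// recoverSandboxes merges two hypothetical sources, in the order the PR
// description gives: first the sandboxes rebuilt from labelled containers
// (which may lack an IP for userns pods), then the persisted sandbox store.
func recoverSandboxes(fromContainers, fromStore []Sandbox) map[string]Sandbox {
	inMemory := make(map[string]Sandbox)
	for _, sb := range fromContainers {
		inMemory[sb.ID] = sb
	}
	for _, stored := range fromStore {
		existing, found := inMemory[stored.ID]
		if !found {
			// Only known to the sandbox store: load it as-is.
			inMemory[stored.ID] = stored
			continue
		}
		// Already recovered from a container: copy over the IP that was
		// persisted in the sandbox store after network setup.
		if stored.IP != "" {
			existing.IP = stored.IP
			inMemory[stored.ID] = existing
		}
	}
	return inMemory
}

func main() {
	containers := []Sandbox{{ID: "userns-pod"}, {ID: "regular-pod", IP: "10.0.0.2"}}
	store := []Sandbox{{ID: "userns-pod", IP: "10.0.0.7"}}
	for _, sb := range recoverSandboxes(containers, store) {
		fmt.Println(sb)
	}
}
```

The point of the sketch is only the ordering: the container-derived object is kept, and the sandbox store is consulted afterwards to fill in what the container metadata is missing.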

k8s-ci-robot · 2024-06-28T17:05:14Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

dcantah · 2024-07-02T19:23:20Z

/retest

internal/cri/server/sandbox_run.go

dcantah · 2024-07-15T19:17:30Z

@fuweid Sorry for the delay, un-drafted this and added an integration test. Only some of the CI machines fully support userns, so I added a log line we can grep for to check whether the test case is actually being hit: "restart_linux_test.go:36: adding userns pods for containerd restart test"

dcantah · 2024-07-15T19:29:38Z

/retest

Fixes: containerd#10363

Because networking for userns containers is set up AFTER the OCI runtime has run, the underlying container object for the sandbox never had the IP information saved in its metadata extension. Since we're trying to move away from modifying the underlying container object for sandboxes (or even just keeping around a reference to it), I don't want to update the underlying container afterwards to include this. So, to remedy this, we can update the sandbox's metadata after network setup runs for userns containers, and then our recovery logic just needs to be altered a bit.

Our recovery logic already checks containers with the sandbox label first, and then checks our shiny new sandbox store. Our IP information is stored in the sandbox store objects, so all we need to do is check whether a stored sandbox exists and, if it does and we already found a container for it (although without the IP information), update the sandbox in our in-memory store.

Signed-off-by: Danny Canter <[email protected]>

Add a new case to TestContainerdRestartSandboxRecover that launches a userns pod if the host and runtime support it. The new case just tests that we have networking info in the pod metadata on restart.

Signed-off-by: Danny Canter <[email protected]>

k8s-ci-robot · 2024-07-16T01:42:34Z

@dcantah: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-containerd-k8s-e2e-ec2	`3f9c70c`	link	false	`/test pull-containerd-k8s-e2e-ec2`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

fuweid · 2024-07-16T03:24:00Z

internal/cri/server/sandbox_run.go

 		sandboxCreateNetworkTimer.UpdateSince(netStart)
 	}

+	if err := sandboxInfo.AddExtension(podsandbox.MetadataKey, &sandbox.Metadata); err != nil {


Maybe we should move these two updates, `AddExtension` and `SandboxStore.Update`, to the end of the `if !hostNetwork(config) && userNsEnabled` block, because the non-userns case doesn't need them.

Works for me. When I'm by my computer I'll see if there are any gotchas.
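A rough sketch of the shape this suggestion points at, assuming hypothetical stand-ins throughout: `sandbox`, `setupNetwork`, `persistMetadata`, and `finishNetworking` are invented for illustration, with `persistMetadata` standing in for the `AddExtension` plus `SandboxStore.Update` pair. It is not the code in `sandbox_run.go`.

```go
package main

import "fmt"

// sandbox is a hypothetical stand-in for the CRI sandbox object.
type sandbox struct {
	ID        string
	IP        string
	persisted bool // set once the metadata has been written back
}

// setupNetwork stands in for the late network setup done for userns pods.
func setupNetwork(sb *sandbox) error {
	sb.IP = "10.0.0.7" // placeholder address for the example
	return nil
}

// persistMetadata stands in for the AddExtension + SandboxStore.Update pair.
func persistMetadata(sb *sandbox) error {
	sb.persisted = true
	return nil
}

// finishNetworking keeps the metadata write-back inside the userns branch,
// so the non-userns path never performs the extra update.
func finishNetworking(sb *sandbox, hostNetwork, userNsEnabled bool) error {
	if !hostNetwork && userNsEnabled {
		if err := setupNetwork(sb); err != nil {
			return fmt.Errorf("failed to set up network for sandbox %q: %w", sb.ID, err)
		}
		if err := persistMetadata(sb); err != nil {
			return fmt.Errorf("failed to persist metadata for sandbox %q: %w", sb.ID, err)
		}
	}
	return nil
}

func main() {
	userns := &sandbox{ID: "userns-pod"}
	regular := &sandbox{ID: "regular-pod", IP: "10.0.0.2"}
	if err := finishNetworking(userns, false, true); err != nil {
		fmt.Println("error:", err)
	}
	if err := finishNetworking(regular, false, false); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(userns.ID, userns.IP, userns.persisted)
	fmt.Println(regular.ID, regular.IP, regular.persisted)
}
```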

fuweid · 2024-07-16T05:33:58Z

internal/cri/server/restart.go

 			}
 			log.G(ctx2).Debugf("Loaded sandbox %+v", sb)
-			if err := c.sandboxStore.Add(sb); err != nil {
+			if err := c.sandboxStore.Add(sb); err != nil && !errors.Is(err, errdefs.ErrAlreadyExists) {


is it related to this fix?

fuweid · 2024-07-16T06:03:05Z

internal/cri/server/restart.go

 	}
 	for _, sbx := range storedSandboxes {
-		if _, err := c.sandboxStore.Get(sbx.ID); err == nil {
+		sb, err := c.sandboxStore.Get(sbx.ID)


Suggestion: we should merge metadata-db.sandbox-store.metadata with metadata-db.container.metadata in the function podSandboxLoader.RecoverContainer. This loop is designed for sandboxes that are managed by a remote sandbox plugin. I don't think we should involve two sources in one loop.

In the RecoverContainer function, we should prefer the metadata from metadata-db.sandbox. What do you think?

@dmcgowan @mxpv

I found that the sandboxes bucket wasn't in v1. Should we add the version before the namespace for the sandbox bucket?
#10467

containerd/core/metadata/buckets.go, lines 309 to 315 in 67a0efc:

	func getSandboxBucket(tx *bolt.Tx, namespace string) *bolt.Bucket {
		return getBucket(
			tx,
			[]byte(namespace),
			bucketKeyObjectSandboxes,
		)
	}
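To make the question about the bucket path concrete, here is a small in-memory sketch of nested bucket lookups. It does not use bbolt; the `bucket` type and its methods are hypothetical, and the key strings `"v1"`, `"default"`, and `"sandboxes"` are assumptions chosen to mirror the discussion rather than values confirmed by the quoted snippet.

```go
package main

import "fmt"

// bucket is a hypothetical in-memory stand-in for a nested key/value bucket.
type bucket struct {
	children map[string]*bucket
}

func newBucket() *bucket {
	return &bucket{children: map[string]*bucket{}}
}

// create walks the key path from this bucket, creating missing levels.
func (b *bucket) create(keys ...string) *bucket {
	cur := b
	for _, k := range keys {
		next, ok := cur.children[k]
		if !ok {
			next = newBucket()
			cur.children[k] = next
		}
		cur = next
	}
	return cur
}

// get walks the key path and returns nil as soon as a level is missing.
func (b *bucket) get(keys ...string) *bucket {
	cur := b
	for _, k := range keys {
		cur = cur.children[k]
		if cur == nil {
			return nil
		}
	}
	return cur
}

// lookup reports whether a path resolves, for display purposes.
func lookup(root *bucket, keys ...string) string {
	return fmt.Sprintf("%v found=%t", keys, root.get(keys...) != nil)
}

func main() {
	root := newBucket()
	// Path shape in the quoted snippet: namespace first, no version level.
	root.create("default", "sandboxes")

	fmt.Println(lookup(root, "default", "sandboxes"))
	// A versioned path would need the data to live under the version bucket.
	fmt.Println(lookup(root, "v1", "default", "sandboxes"))
}
```

The sketch only shows why the two path shapes are not interchangeable: data written under one path is invisible to a lookup that uses the other.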

I agree, but that seemed like a big shift for this PR. The split between "use the underlying container for the given sandbox" and "use the sandbox API as the source of truth to grab the sandboxes" is very confusing atm.

> use the underlying container for the given sandbox

That's actually there to recover sandboxes created by a previous release.

> but that seemed a big shift for this PR

OK. What about adding a condition that, if the IP is empty, we use the metadata from the sandbox bucket? 😂

I can give that a try!
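The condition proposed in this thread could look roughly like the sketch below. The `Metadata` type and `chooseMetadata` function are hypothetical names for illustration; the real recovery code works with richer types and this is not taken from the PR.

```go
package main

import "fmt"

// Metadata is a hypothetical, trimmed-down sandbox metadata record.
type Metadata struct {
	ID string
	IP string
}

// String renders the record for logging.
func (m Metadata) String() string {
	return fmt.Sprintf("%s (ip=%q)", m.ID, m.IP)
}

// chooseMetadata keeps the container-derived metadata unless its IP is
// empty and a record from the sandbox bucket is available (non-nil).
func chooseMetadata(fromContainer Metadata, fromBucket *Metadata) Metadata {
	if fromContainer.IP == "" && fromBucket != nil {
		return *fromBucket
	}
	return fromContainer
}

func main() {
	bucketRecord := &Metadata{ID: "userns-pod", IP: "10.0.0.7"}

	// Container metadata without an IP falls back to the bucket record.
	fmt.Println(chooseMetadata(Metadata{ID: "userns-pod"}, bucketRecord))
	// Container metadata with an IP is used as-is.
	fmt.Println(chooseMetadata(Metadata{ID: "regular-pod", IP: "10.0.0.2"}, nil))
}
```

Keeping the fallback conditional leaves the path for sandboxes created by a previous release unchanged, which is the concern raised earlier in the thread.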

k8s-ci-robot · 2024-09-19T23:44:27Z

PR needs rebase.


AkihiroSuda · 2024-10-17T10:25:04Z

Removing from the v2.0 milestone as stale

github-actions · 2025-01-16T00:11:30Z

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.

github-actions · 2025-01-24T00:11:28Z

This PR was closed because it has been stalled for 7 days with no activity.

k8s-ci-robot added do-not-merge/work-in-progress size/M labels Jun 28, 2024

dcantah mentioned this pull request Jun 28, 2024

[v2.0.0] No CNI info for pod sandbox after containerd restart when using user namespaces #10363

Closed

dcantah force-pushed the cri/userns-persist-ips branch from b2113dc to 51dbc98 Compare June 29, 2024 03:39

dcantah added this to the 2.0 milestone Jul 1, 2024

dcantah marked this pull request as ready for review July 2, 2024 16:27

k8s-ci-robot removed the do-not-merge/work-in-progress label Jul 2, 2024

dcantah added the area/cri Container Runtime Interface (CRI) label Jul 2, 2024

fuweid reviewed Jul 2, 2024


dcantah force-pushed the cri/userns-persist-ips branch from 51dbc98 to c148cc0 Compare July 9, 2024 06:41

k8s-ci-robot added size/L and removed size/M labels Jul 9, 2024

dcantah force-pushed the cri/userns-persist-ips branch 3 times, most recently from c56d3d0 to 7bf5940 Compare July 9, 2024 09:21

dcantah marked this pull request as draft July 9, 2024 21:18

k8s-ci-robot added the do-not-merge/work-in-progress label Jul 9, 2024

dcantah force-pushed the cri/userns-persist-ips branch 6 times, most recently from 12fe520 to b5a0bad Compare July 15, 2024 07:29

k8s-ci-robot added size/M and removed size/L labels Jul 15, 2024

dcantah force-pushed the cri/userns-persist-ips branch from b5a0bad to 82fdfad Compare July 15, 2024 08:15

dcantah marked this pull request as ready for review July 15, 2024 19:15

k8s-ci-robot removed the do-not-merge/work-in-progress label Jul 15, 2024

dosubot bot added the kind/bug label Jul 15, 2024

dcantah force-pushed the cri/userns-persist-ips branch from 82fdfad to 0feb455 Compare July 15, 2024 19:20

dcantah added 2 commits July 15, 2024 18:34

dcantah force-pushed the cri/userns-persist-ips branch from 0feb455 to 3f9c70c Compare July 16, 2024 01:34

fuweid reviewed Jul 16, 2024


rata mentioned this pull request Aug 19, 2024

internal/cri: simplify netns setup with pinned userns #10607

Merged

k8s-ci-robot added the needs-rebase label Sep 19, 2024

AkihiroSuda modified the milestones: 2.0, 2.1 Oct 17, 2024

github-actions bot added the Stale label Jan 16, 2025

github-actions bot closed this Jan 24, 2025

dmcgowan removed this from the 2.1 milestone Mar 8, 2025


5 participants