CRI: Properly restore IP information for userns sandboxes#10400
CRI: Properly restore IP information for userns sandboxes#10400dcantah wants to merge 2 commits intocontainerd:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
b2113dc to
51dbc98
Compare
|
/retest |
51dbc98 to
c148cc0
Compare
c56d3d0 to
7bf5940
Compare
12fe520 to
b5a0bad
Compare
b5a0bad to
82fdfad
Compare
|
@fuweid Sorry for delay, un-drafted this and added an integration test. Only some of the CI machines support userns in full so I added a log so we can grep for if the testcase is actually being hit. "restart_linux_test.go:36: adding userns pods for containerd restart test" |
82fdfad to
0feb455
Compare
|
/retest |
Fixes: containerd#10363 Due to when we setup networking for userns containers (AFTER the OCI runtime has ran), this was causing the underlying object for the sandbox to not have the IP information saved in the metadata extension. Because we're trying to move away from modifying the underlying container object for sandboxes (or even just keeping around a reference to it), I don't want to update the underlying container afterwards to include this. So, to remedy this we can update the sandboxes metadata after network setup runs for userns containers, and then our recovery logic just needs to be altered a bit. Our recovery logic already checks containers with the sandbox label first, and then checks our shiny new sandbox store. Our IP information is stored in the sandbox store objects, so all we need to do is check if it exists, and if so and we already found a container for it (although without the IP information) then just update the sandbox in our in-memory store. Signed-off-by: Danny Canter <[email protected]>
Add a new case to TestContainerdRestartSandboxRecover that will launch a userns pod if the host and runtime supports it. The new case will just test that we have networking info in the pod metadata on restart. Signed-off-by: Danny Canter <[email protected]>
0feb455 to
3f9c70c
Compare
|
@dcantah: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| sandboxCreateNetworkTimer.UpdateSince(netStart) | ||
| } | ||
|
|
||
| if err := sandboxInfo.AddExtension(podsandbox.MetadataKey, &sandbox.Metadata); err != nil { |
There was a problem hiding this comment.
maybe we should move these two updates AddExtension and SandboxStore.Update into the end of if !hostNetwork(config) && userNsEnabled because non-user-ns case doesn't need to update it.
There was a problem hiding this comment.
Works for me. When I'm by my computer I'll see if there's any gotchas
| } | ||
| log.G(ctx2).Debugf("Loaded sandbox %+v", sb) | ||
| if err := c.sandboxStore.Add(sb); err != nil { | ||
| if err := c.sandboxStore.Add(sb); err != nil && !errors.Is(err, errdefs.ErrAlreadyExists) { |
| } | ||
| for _, sbx := range storedSandboxes { | ||
| if _, err := c.sandboxStore.Get(sbx.ID); err == nil { | ||
| sb, err := c.sandboxStore.Get(sbx.ID) |
There was a problem hiding this comment.
Suggestion: we should merge metadata-db.sandbox-store.metadata with metadata-db.container.metadata in function podSandboxLoader.RecoverContainer. This loop is designed for the sandboxes which are managed by remote sandbox plugin. I think we should not involve two sources in one loop.
In the RecoverContainer function, we should prefer to use metadata from metadata-db.sandbox. What do you think?
I found that the sandboxes bucket wasn't in v1. Should we add the version before namespace for sandbox?
#10467
containerd/core/metadata/buckets.go
Lines 309 to 315 in 67a0efc
There was a problem hiding this comment.
I agree, but that seemed a big shift for this PR. This "use the underlying container for the given sandbox" vs "use the sandbox api as the source of truth to grab the sandboxes" is very confusing atm.
There was a problem hiding this comment.
use the underlying container for the given sandbox
It's to recover sandbox created by previous release actually.
but that seemed a big shift for this PR
OK. What about adding a condition that if IP is empty, we should use metadata from sandbox bucket 😂
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Removing from v2.0 milestone as staled |
|
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed. |
|
This PR was closed because it has been stalled for 7 days with no activity. |
Fixes: #10363
Due to when we setup networking for userns containers (AFTER the OCI runtime has ran), this was causing the underlying object for the sandbox to not have the IP information saved in the metadata extension. Because we're trying to move away from modifying the underlying container object for sandboxes (or even just keeping around a reference to it), I don't want to update the underlying container afterwards to include this. So, to remedy this we can update the sandboxes metadata after network setup runs for userns containers, and then our recovery logic just needs to be altered a bit.
Our recovery logic already checks containers with the sandbox label first, and then checks our shiny new sandbox store. Our IP information is stored in the sandbox store objects, so all we need to do is check if it exists, and if so and we already found a container for it (although without the IP information) then just update the sandbox in our in-memory store.