Add support for userns in stateless and stateful pods with idmap mounts (KEP-127, k8s >= 1.27)#8287
Conversation
|
Hi @rata. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
0e5d719 to
fd201ac
Compare
fd201ac to
ac56187
Compare
|
Rebased, the test fail seems like a flake (is on windows, that is untouched) |
|
/ok-to-test |
|
@dims: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
ac56187 to
ba89783
Compare
ba89783 to
ef903cd
Compare
|
Updated this to include support for sbserver too :) |
ef903cd to
fee0126
Compare
|
The runc PR was just merged. I'll see how to update the runc dependency in containerd and add unit and e2e tests here :) |
d6154d6 to
f4fdd49
Compare
f4fdd49 to
072c2dd
Compare
|
@fuweid I see you self assigned for this PR. Now it is ready for review! Take a look at the PR description, I've updated it now :) |
|
It seems I need to skip the tests on old kernels and that is causing the CI failures. |
f51f6c4 to
71da691
Compare
When the kubelet sends the uid/gid mappings for a mount, just pass them down to the OCI runtime. OCI runtimes support this since runc 1.2 and crun 1.8.1. And whenever we add mounts (container mounts or image spec volumes) and userns are requested by the kubelet, we use those mappings in the mounts so the mounts are idmapped correctly. If no userns is used, we don't send any mappings which just keeps the current behavior. Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
This reverts commit 7e6ab84. Signed-off-by: Rodrigo Campos <[email protected]>
Future patches will use that field. Signed-off-by: Rodrigo Campos <[email protected]>
runc, as mandated by the runtime-spec, ignores unknown fields in the config.json. This is unfortunate for cases where we _must_ enable that feature or fail. For example, if we want to start a container with user namespaces and volumes, using the uidMappings/gidMappings field is needed so the UID/GIDs in the volume don't end up with garbage. However, if we don't fail when runc will ignore these fields (because they are unknown to runc), we will just start a container without using the mappings and the UID/GIDs the container will persist to volumes the hostUID/GID, that can change if the container is re-scheduled by Kubernetes. This will end up in volumes having "garbage" and unmapped UIDs that the container can no longer change. So, let's avoid this entirely by just checking that runc supports idmap mounts if the container we are about to create needs them. Please note that the "runc features" subcommand is only run when we are using idmap mounts. If idmap mounts are not used, the subcommand is not run and therefore this should not affect containers that don't use idmap mounts in any way. Signed-off-by: Rodrigo Campos <[email protected]>
|
@AkihiroSuda pushed fixing your comments (all documentation changes), so all should be green again |
|
One small nit in the docs with a link rendering issue due to typo; once we fix that, LGTM and hopefully we can go ahead and merge this in |
|
@estesp thanks, fixed! PTAL :) |
Signed-off-by: Rodrigo Campos <[email protected]>
crun 1.9 was just released with fixes and exposes idmap mounts support via the "features" sub-command. We use that feature to throw a clear error to users (if they request idmap mounts and the OCI runtime doesn't support it), but also to skip tests on CI when the OCI runtime doesn't support it. Let's bump it so the CI runs the tests with crun. Signed-off-by: Rodrigo Campos <[email protected]>
This adds support in containerd for k8s stateless and stateful pods with user namespaces as implemented in k8s >= 1.27. Kubernetes 1.28 added stateful pod support, but no other changes are needed in containerd, we just use idmap mounts for all volumes (stateless, like configmaps, or stateful, like hostPath volumes).
We have some requirements:
This requires runc with this PR applied to support idmap mounts: opencontainers/runc#3717. This is expected to be part of runc 1.2 (yet to be released). I'm changing the runc version to that commit, we can update it again when this is included in a release.
I also check if idmap mounts are used, then if the runc version supports that, to avoid silently ignoring the feature and creating issues to users (see the commit msg for more details). If it is not supported, we throw a clear error to the user.
Note I didn't update the runc go module, as that broke the unit tests. It probably needs to be done with more care. Furthermore, we don't need that for this, the runtime-spec module is already up to date with what we need, IIUC.
I've added this in both cri/server and cri/sbserver.
@mikebrow as you requested, here is the output if we try to use it with
ctrand an oldruncversion:And with a new runc (that supports idmap mounts):
TODO:
script/setup/runc-versionto point to the runc version with the PR merged. EDIT: updated crun-version, that has a realease with the features. We keep the same runc version and will open a new PR when runc makes a new minor release.Fixes: #8286