Skip to content

Add support for userns in stateless and stateful pods with idmap mounts (KEP-127, k8s >= 1.27)#8287

Merged
fuweid merged 11 commits intocontainerd:mainfrom
kinvolk:rata/userns-stateless-idmap
Sep 14, 2023
Merged

Add support for userns in stateless and stateful pods with idmap mounts (KEP-127, k8s >= 1.27)#8287
fuweid merged 11 commits intocontainerd:mainfrom
kinvolk:rata/userns-stateless-idmap

Conversation

@rata
Copy link
Contributor

@rata rata commented Mar 17, 2023

This adds support in containerd for k8s stateless and stateful pods with user namespaces as implemented in k8s >= 1.27. Kubernetes 1.28 added stateful pod support, but no other changes are needed in containerd, we just use idmap mounts for all volumes (stateless, like configmaps, or stateful, like hostPath volumes).

We have some requirements:

  • The filesystems should support idmap mounts. The most late adition was tmpfs, that we merged support in Linux 6.3 for idmap mounts, so in practice you will need Linux 6.3 for most stateless pods to work with userns.
  • The OCI runtime needs to support idmap mounts too.

This requires runc with this PR applied to support idmap mounts: opencontainers/runc#3717. This is expected to be part of runc 1.2 (yet to be released). I'm changing the runc version to that commit, we can update it again when this is included in a release.

I also check if idmap mounts are used, then if the runc version supports that, to avoid silently ignoring the feature and creating issues to users (see the commit msg for more details). If it is not supported, we throw a clear error to the user.

Note I didn't update the runc go module, as that broke the unit tests. It probably needs to be done with more care. Furthermore, we don't need that for this, the runtime-spec module is already up to date with what we need, IIUC.

I've added this in both cri/server and cri/sbserver.

@mikebrow as you requested, here is the output if we try to use it with ctr and an old runc version:

$ sudo bin/ctr --address /run/containerd-rata/containerd.sock container create --config tmp/config-userns-idmap.json rata-test
$ sudo bin/ctr --address /run/containerd-rata/containerd.sock t start rata-test
ctr: failed to create shim task: failed to detect OCI runtime features: OCI runtime doesn't support idmap mounts: missing `mountExtensions.idmap` entry in `features` command: unknown

And with a new runc (that supports idmap mounts):

$ sudo bin/ctr --address /run/containerd-rata/containerd.sock container create  --config tmp/config-userns-idmap.json rata-test 
$ sudo bin/ctr --address /run/containerd-rata/containerd.sock t start rata-test
root@runc:/# exit

TODO:

  • Update script/setup/runc-version to point to the runc version with the PR merged. EDIT: updated crun-version, that has a realease with the features. We keep the same runc version and will open a new PR when runc makes a new minor release.
  • Add unit test
  • Add integration tests.

Fixes: #8286

@k8s-ci-robot
Copy link

Hi @rata. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rata rata force-pushed the rata/userns-stateless-idmap branch from 0e5d719 to fd201ac Compare March 17, 2023 17:26
@fuweid fuweid self-requested a review March 18, 2023 02:21
@rata rata force-pushed the rata/userns-stateless-idmap branch from fd201ac to ac56187 Compare March 21, 2023 12:01
@rata
Copy link
Contributor Author

rata commented Mar 21, 2023

Rebased, the test fail seems like a flake (is on windows, that is untouched)

@dims
Copy link
Member

dims commented Mar 29, 2023

/ok-to-test

@k8s-ci-robot
Copy link

@dims: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rata rata force-pushed the rata/userns-stateless-idmap branch from ac56187 to ba89783 Compare April 13, 2023 09:12
@rata rata changed the title Add support for userns in stateless pods with idmap mounts (KEP-127, k8s >= 1.27) Add support for userns in stateless and stateful pods with idmap mounts (KEP-127, k8s >= 1.27) Jul 11, 2023
@rata rata force-pushed the rata/userns-stateless-idmap branch from ba89783 to ef903cd Compare July 11, 2023 13:39
@rata
Copy link
Contributor Author

rata commented Jul 11, 2023

Updated this to include support for sbserver too :)

@rata rata force-pushed the rata/userns-stateless-idmap branch from ef903cd to fee0126 Compare July 12, 2023 11:33
@rata
Copy link
Contributor Author

rata commented Jul 17, 2023

The runc PR was just merged. I'll see how to update the runc dependency in containerd and add unit and e2e tests here :)

@rata rata force-pushed the rata/userns-stateless-idmap branch 2 times, most recently from d6154d6 to f4fdd49 Compare July 19, 2023 14:12
@rata rata marked this pull request as ready for review July 19, 2023 14:12
@rata rata force-pushed the rata/userns-stateless-idmap branch from f4fdd49 to 072c2dd Compare July 19, 2023 14:20
@rata
Copy link
Contributor Author

rata commented Jul 19, 2023

@fuweid I see you self assigned for this PR. Now it is ready for review! Take a look at the PR description, I've updated it now :)

@rata
Copy link
Contributor Author

rata commented Jul 19, 2023

It seems I need to skip the tests on old kernels and that is causing the CI failures.

fuweid
fuweid previously approved these changes Jul 20, 2023
Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me.

@fuweid fuweid dismissed their stale review July 20, 2023 05:27

need to solve the CI issue

@rata rata force-pushed the rata/userns-stateless-idmap branch 4 times, most recently from f51f6c4 to 71da691 Compare July 20, 2023 13:52
When the kubelet sends the uid/gid mappings for a mount, just pass them
down to the OCI runtime.

OCI runtimes support this since runc 1.2 and crun 1.8.1.

And whenever we add mounts (container mounts or image spec volumes) and
userns are requested by the kubelet, we use those mappings in the mounts
so the mounts are idmapped correctly. If no userns is used, we don't
send any mappings which just keeps the current behavior.

Signed-off-by: Rodrigo Campos <[email protected]>
This reverts commit 7e6ab84.

Signed-off-by: Rodrigo Campos <[email protected]>
Future patches will use that field.

Signed-off-by: Rodrigo Campos <[email protected]>
runc, as mandated by the runtime-spec, ignores unknown fields in the
config.json. This is unfortunate for cases where we _must_ enable that
feature or fail.

For example, if we want to start a container with user namespaces and
volumes, using the uidMappings/gidMappings field is needed so the
UID/GIDs in the volume don't end up with garbage. However, if we don't
fail when runc will ignore these fields (because they are unknown to
runc), we will just start a container without using the mappings and the
UID/GIDs the container will persist to volumes the hostUID/GID, that can
change if the container is re-scheduled by Kubernetes.

This will end up in volumes having "garbage" and unmapped UIDs that the
container can no longer change. So, let's avoid this entirely by just
checking that runc supports idmap mounts if the container we are about
to create needs them.

Please note that the "runc features" subcommand is only run when we are
using idmap mounts. If idmap mounts are not used, the subcommand is not
run and therefore this should not affect containers that don't use idmap
mounts in any way.

Signed-off-by: Rodrigo Campos <[email protected]>
@rata
Copy link
Contributor Author

rata commented Sep 13, 2023

@AkihiroSuda pushed fixing your comments (all documentation changes), so all should be green again

@estesp
Copy link
Member

estesp commented Sep 13, 2023

One small nit in the docs with a link rendering issue due to typo; once we fix that, LGTM and hopefully we can go ahead and merge this in

@rata
Copy link
Contributor Author

rata commented Sep 13, 2023

@estesp thanks, fixed! PTAL :)

crun 1.9 was just released with fixes and exposes idmap mounts support
via the "features" sub-command.

We use that feature to throw a clear error to users (if they request
idmap mounts and the OCI runtime doesn't support it), but also to skip
tests on CI when the OCI runtime doesn't support it.

Let's bump it so the CI runs the tests with crun.

Signed-off-by: Rodrigo Campos <[email protected]>
Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) impact/changelog ok-to-test

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Support for userns in k8s >= 1.27

9 participants