
[release/2.1] sys: fix pidfd leak in UnshareAfterEnterUserns #12179

Merged
mxpv merged 1 commit into containerd:release/2.1 from
k8s-infra-cherrypick-robot:cherry-pick-12167-to-release/2.1
Aug 7, 2025

Conversation

@k8s-infra-cherrypick-robot

This is an automated cherry-pick of #12167

/assign fuweid

@k8s-ci-robot

Hi @k8s-infra-cherrypick-robot. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot dosubot bot added the kind/bug label Aug 7, 2025
@austinvazquez
Member

/ok-to-test

@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.1 branch from dd0618f to 779b7b9 Compare August 7, 2025 14:38
@fuweid
Member

fuweid commented Aug 7, 2025

cc @dcantah @jfernandez @rata @halaney

UnshareAfterEnterUserns() creates a pidfd via os.StartProcess() with
CLONE_PIDFD but fails to close the file descriptor in any code path,
resulting in a file descriptor leak for every container that uses user
namespace isolation.

The leak occurs because:
- The pidfd is created when the PidFD field is set in SysProcAttr
- The original defer block only calls PidfdSendSignal() and
  pidfdWaitid()
- No code path calls unix.Close(pidfd) to release the file descriptor
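
For illustration, the shape of the leak can be modeled without touching the kernel. The sketch below is not the containerd code: fakePidfd, leakyUnshare and the method names are hypothetical stand-ins, and the struct simply records which operations ran. It shows a deferred block that signals and waits on the handle but never closes it.

```go
package main

// fakePidfd is a hypothetical stand-in for the pidfd that the kernel hands
// back when PidFD is set in SysProcAttr. It only records what was done to it.
type fakePidfd struct {
	signaled bool
	waited   bool
	closed   bool
}

func (p *fakePidfd) signal() { p.signaled = true } // models PidfdSendSignal()
func (p *fakePidfd) wait()   { p.waited = true }   // models pidfdWaitid()
func (p *fakePidfd) close()  { p.closed = true }   // models unix.Close(pidfd)

// leakyUnshare mirrors the structure described above: the deferred block
// kills and reaps the child, but no code path ever closes the handle.
// The handle is returned only so that callers can inspect its final state.
func leakyUnshare(fn func() error) (*fakePidfd, error) {
	fd := &fakePidfd{} // assumption: stands in for os.StartProcess() with PidFD set

	defer func() {
		fd.signal()
		fd.wait()
		// Missing: fd.close(). Each call therefore leaves one handle open.
	}()

	return fd, fn()
}
```

Calling leakyUnshare with any callback leaves the returned handle with signaled and waited set but closed unset, which is the per-call leak described in the list above.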

This causes one pidfd leak per container launch when user namespace
isolation is enabled (e.g., Kubernetes pods with hostUsers: false). In
production environments with high container churn, this can exhaust the
system's file descriptor limit.

Fix the leak by adding a defer statement immediately after process
creation that ensures unix.Close(pidfd) is always called, regardless of
which code path is taken. This guarantees cleanup even if the function
returns early due to errors or lack of pidfd support.
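
A minimal sketch of that pattern, under stated assumptions: trackedFD, unshareWithCleanup, errNoPidfdSupport and the supportsPidfd parameter are hypothetical names used only for this example, and the real function's control flow may differ. The point is that a defer registered right after the handle is created runs on every return path.

```go
package main

import "errors"

// trackedFD is a hypothetical stand-in for the pidfd; it records whether
// Close was called so each return path can be checked.
type trackedFD struct{ closed bool }

func (f *trackedFD) Close() error {
	f.closed = true
	return nil
}

// errNoPidfdSupport is an illustrative error, not a containerd identifier.
var errNoPidfdSupport = errors.New("pidfd not supported")

// unshareWithCleanup registers the close immediately after the handle is
// created, so early returns cannot skip it. The handle is returned only so
// that callers can inspect it; real code would not hand back a closed fd.
func unshareWithCleanup(supportsPidfd bool, fn func() error) (*trackedFD, error) {
	fd := &trackedFD{} // assumption: stands in for the pidfd from process creation
	defer fd.Close()   // runs on every path below

	if !supportsPidfd {
		return fd, errNoPidfdSupport // early return: no pidfd support
	}
	if err := fn(); err != nil {
		return fd, err // early return: callback failed
	}
	return fd, nil // normal return
}
```

In this model all three paths (no pidfd support, callback error, success) end with closed set to true, which matches the guarantee the commit message describes.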

This follows the same cleanup pattern already established in
core/mount/mount_idmapped_utils_linux.go:getUsernsFD() which properly
closes its pidfd.

Closes: containerd#12166
Signed-off-by: Jose Fernandez <[email protected]>
[Move SupportsPidFD up to handle dupfd in Go 1.23.{0,1} and simplify backport]
Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid force-pushed the cherry-pick-12167-to-release/2.1 branch from 779b7b9 to 5ef6ea7 Compare August 7, 2025 17:36
@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Aug 7, 2025
@mxpv mxpv merged commit ea643fe into containerd:release/2.1 Aug 7, 2025
54 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Aug 7, 2025
