runtime/v2: cleanup dead shim before delete bundle #4538
Merged
estesp merged 1 commit into containerd:master on Sep 21, 2020
Build succeeded.
2e52f91 to 6a82da4
Build succeeded.
Member
Author
ping @crosbymichael @AkihiroSuda @cpuguy83 @mikebrow PTAL~
thaJeztah reviewed Sep 11, 2020
Member
@fuweid I see you tagged ttrpc v1.0.2: https://github.com/containerd/ttrpc/releases/tag/v1.0.2. Can you update vendor.conf to use that tag?
The shim delete action needs bundle information to clean up the resources created by the shim. If the dead-shim cleanup runs after the bundle has been deleted, some of those resources may leak.

The ttrpc client's UserOnCloseWait() ensures that resources are cleaned up before the bundle is deleted, by synchronizing task deletion with the dead-shim cleanup. This might slow down task deletion, but it guarantees that resources are cleaned up and avoids the EBUSY umount case. For example, a sandbox container such as Kata or Firecracker might have mount points over the rootfs. If containerd handles task deletion and dead-shim cleanup in parallel, task deletion hits EBUSY during umount and fails to clean up the bundle, which makes things worse.

Also update cleanupAfterDeadshim so that it is only called after the shim has disconnected. In some cases the shim fails to call runc-create for some reason, but runc-create has already put runc-init into the ready state. If containerd doesn't call shim deletion, the runc-init process leaks and holds the cgroup, which leaves the pod stuck in terminating :(.

Signed-off-by: Wei Fu <[email protected]>
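The ordering described in the commit message can be illustrated with a small, self-contained Go sketch. It does not use the ttrpc package or containerd code: `conn`, `WaitOnClose`, `deleteTask`, and `runDemo` are hypothetical names invented for this example, and the "bundle" is just a temporary directory. The sketch only shows the idea of waiting for an asynchronous on-close cleanup to finish before removing the bundle.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// conn is a hypothetical stand-in for a shim connection. It is NOT the ttrpc
// client; it only mimics "run a callback after close, let callers wait for it".
type conn struct {
	once    sync.Once
	onClose func()
	done    chan struct{} // closed once onClose has returned
}

func newConn(onClose func()) *conn {
	return &conn{onClose: onClose, done: make(chan struct{})}
}

// Close starts the on-close callback in the background, at most once.
func (c *conn) Close() {
	c.once.Do(func() {
		go func() {
			c.onClose()
			close(c.done)
		}()
	})
}

// WaitOnClose blocks until the on-close callback has finished or ctx expires.
func (c *conn) WaitOnClose(ctx context.Context) error {
	select {
	case <-c.done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// deleteTask closes the connection, waits for the cleanup callback, and only
// then removes the bundle directory.
func deleteTask(ctx context.Context, c *conn, bundle string) error {
	c.Close()
	if err := c.WaitOnClose(ctx); err != nil {
		return fmt.Errorf("waiting for shim cleanup: %w", err)
	}
	return os.RemoveAll(bundle)
}

// runDemo reports whether the cleanup callback still found the bundle on disk.
func runDemo() (bool, error) {
	bundle, err := os.MkdirTemp("", "bundle-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(bundle)

	marker := filepath.Join(bundle, "config.json")
	if err := os.WriteFile(marker, []byte("{}"), 0o600); err != nil {
		return false, err
	}

	sawBundle := false
	c := newConn(func() {
		// The cleanup needs bundle information, so the bundle must still exist.
		_, statErr := os.Stat(marker)
		sawBundle = statErr == nil
	})

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := deleteTask(ctx, c, bundle); err != nil {
		return false, err
	}
	if _, statErr := os.Stat(bundle); !os.IsNotExist(statErr) {
		return sawBundle, fmt.Errorf("bundle %s was not removed", bundle)
	}
	return sawBundle, nil
}

func main() {
	saw, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("cleanup saw bundle:", saw)
}
```

Without the `WaitOnClose` call in `deleteTask`, the background cleanup and `os.RemoveAll` would race, which is the parallel case the commit message warns about.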
6a82da4 to 4b05d03
Build succeeded.
Member
Author
@thaJeztah sorry for the late update. I updated vendor.conf, PTAL. Thanks!
AkihiroSuda approved these changes Sep 20, 2020
This was referenced Nov 25, 2020
Member
This commit seems to be causing a regression: #4769