libcontainer: Store state.json before sync procRun #2575
AkihiroSuda merged 1 commit into opencontainers:master
If procRun has been synced and the runc-create process is then killed
for some reason, the runc-init[2:stage] process is leaked. The runc
command also fails to parse the root directory, because the container
doesn't have a state.json.
To make it possible to clean up the leaked runc-init[2:stage] process,
we should store the state before syncing procRun.
```before
current workflow:

  [  child  ] <-> [   parent   ]

  procHooks   --> [run hooks]
              <-- procResume

  procReady   --> [final setup]
              <-- procRun

                  ( killed for some reason )
                  ( store state.json )
```
```expected
expected workflow:

  [  child  ] <-> [   parent   ]

  procHooks   --> [run hooks]
              <-- procResume

  procReady   --> [final setup]
                  store state.json
              <-- procRun
```
Signed-off-by: Wei Fu <[email protected]>
66b39cf to ba0246d
ping @AkihiroSuda @kolyshkin PTAL, thanks!
ping @opencontainers/runc-maintainers PTAL, thanks~
kolyshkin left a comment
LGTM. I wonder what caused the premature demise of runc-create that you saw?
@kolyshkin I didn't see what happened at the time. We use Kubernetes and containerd. The CRI StartContainer call returned a timeout error, and the runc-init process was leaked while trying to open exec.fifo. The issue is hard to reproduce; I hit a few cases every few days in production. When it shows up, we see the system reclaiming much more memory, or an OOM for the container, during runc-create, but I haven't found the root cause. This patch is meant to prevent the runc-init leak. I will share the root cause if I can get a clue from the log monitor. :)
ping @AkihiroSuda ~