fix container discarding event #9652
Conversation
Hi @ningmingxiao. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command. Once the patch is verified, the new status will be reflected by the label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: ningmingxiao <[email protected]>
@ningmingxiao can you add an issue link? Thanks.
	containerEventsDroppedCount.Inc()
	log.G(ctx).Debugf("containerEventsChan is full, discarding event %+v", event)
}
eq.enqueue(event)
What does the queue add over just

go func(e runtime.ContainerEventResponse) {
	c.containerEventsChan <- e
}(event)

Neither approach seems to guarantee the order, if that is the intent.
If order preservation is important, could also just have a single dequeue go routine which runs when there are items in the queue, like
func (eq *eventQueue) enqueue(c chan<- runtime.ContainerEventResponse, event runtime.ContainerEventResponse) {
eq.lock.Lock()
defer eq.lock.Unlock()
eq.items = append(eq.items, event)
if len(eq.items) == 1 {
go func() {
var event *runtime.ContainerEventResponse
for {
eq.lock.Lock()
if event != nil {
eq.items = eq.items[1:]
}
if len(eq.items) == 0 {
eq.lock.Unlock()
return
}
event = &eq.items[0]
eq.lock.Unlock()
c <- *event
}
}()
}
}
If only one event is generated:

for {
	eq.lock.Lock()
	// here len(eq.items) = 1 and event is not nil
	if event != nil {
		// after the next step, len(eq.items) = 0
		eq.items = eq.items[1:]
	}
	// here it will return: len(eq.items) = 0, and an event will be lost
	if len(eq.items) == 0 {
		eq.lock.Unlock()
		return
	}
	event = &eq.items[0]
	eq.lock.Unlock()
	c <- *event
}
How about adding a lock to the criService struct:

c.mu.Lock()
eq.enqueue(event)
c.mu.Unlock()

to keep events in order?
@dmcgowan
This should be looked at for a better overall solution rather than just adding more locking/queuing. There are already TODOs around the publishing of these events, and it is using a channel with a large buffer, which is not a good pattern for avoiding lost messages. It probably makes more sense to use a queue, but to avoid leaking memory when no one is calling GetContainerEvents, the queue should have an expiration if nothing is removing items from it. I don't think a quick fix that just adds a queue is going to work, as it may lead to a memory leak.
PR needs rebase.
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed. |
This PR was closed because it has been stalled for 7 days with no activity. |
Use crictl to create 1000 containers and containerEventsChan will become full. Then, if GetContainerEvents is not called to consume events, events will be lost and the log will show "containerEventsChan is full, discarding event ...".

issue: #8892