KAFKA-2143: fix replica offset truncate to beginning during leader migration.#129
becketqin wants to merge 2 commits into apache:trunk from becketqin/KAFKA-2143
Conversation
Seems like it fixes the symptom (don't truncate the log when the follower is confused) instead of the original cause (a possible race condition).
kafka-trunk-git-pr #121 SUCCESS
@lazyval, ideally the exception returned by the leader should contain more information. However, that would be a big wire protocol change and is perhaps not necessary; the easier fix is to handle the exception correctly. So the problem is how to deal with the race condition. The follower is not confused in this case; it just needs one more check to understand the situation.
@becketqin, the original issue https://issues.apache.org/jira/browse/KAFKA-2143 states that the replica gets ahead of the leader, but you applied changes to the branch for the case when the replica is far behind the leader.
The doc is a bit confusing and I think it is important to be clear here. Does "follower" refer to the old leader or a third replica? (i.e. if broker A is the old leader, B is the new leader, and C is a third replica, does "follower" refer to A or C?)
Yeah... I think it is perhaps not easy to explain. Actually the follower can be either A or C, because both of them can have a high watermark greater than B's log end offset.
The rationale behind this handling of offset out of range is to figure out whether the offset the follower is trying to fetch (call it OST) is too big or too small. The way we check this is to query the log end offset (LEO) on the leader: if OST is greater than LEO it is too big, otherwise it is too small.
However, some time passes between the follower receiving the OffsetOutOfRange exception and starting to handle it. When the follower first got the exception, OST might have been greater than the leader's LEO. But by the time the follower handles the exception and queries the LEO again, OST may already be smaller than the leader's LEO, because the leader may have appended new messages in the meantime, so the offset is no longer too big. In this case, the current code concludes that OST was too small and truncates the entire log, which is a bug.
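The race can be sketched with a minimal, hypothetical model (the names `OffsetRaceSketch`, `LeaderState`, and the reset functions are illustrative, not Kafka's actual classes): the follower's fetch offset was beyond the leader's LEO when the exception was raised, but the leader appends messages before the follower handles it.

```scala
object OffsetRaceSketch {
  final case class LeaderState(logStartOffset: Long, logEndOffset: Long)

  // Naive handling: if OST is not beyond the leader's *current* LEO,
  // conclude it was too small and reset to the leader's start offset.
  def naiveResetOffset(fetchOffset: Long, leader: LeaderState): Long =
    if (fetchOffset > leader.logEndOffset) leader.logEndOffset
    else leader.logStartOffset // truncates the whole follower log

  // Fixed handling: never reset below the follower's own log end offset
  // unless the leader's start offset is genuinely ahead of it.
  def fixedResetOffset(fetchOffset: Long, followerLeo: Long, leader: LeaderState): Long =
    if (fetchOffset > leader.logEndOffset) leader.logEndOffset
    else math.max(leader.logStartOffset, followerLeo)

  def main(args: Array[String]): Unit = {
    val followerLeo = 100L // exception was raised when the leader's LEO was, say, 95
    // By the time the follower handles it, the leader has caught up past 100.
    val leaderNow = LeaderState(logStartOffset = 0L, logEndOffset = 105L)
    println(naiveResetOffset(followerLeo, leaderNow))              // 0: full truncation (the bug)
    println(fixedResetOffset(followerLeo, followerLeo, leaderNow)) // 100: follower log preserved
  }
}
```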
Can you reword the second scenario and clarify? Especially the part where "However, there is some time between the follower receives the OffsetOutOfRange exception and starts to handle it. That means when the follower got the exception first time, OST might be greater than leader's LEO. But when the follower starts to handle the exception and query the LEO again, OST is already smaller than leader's LEO - because the leader might already got some new messages appended, the offset perhaps is no longer too big."
It is non-intuitive, and understanding it is important for understanding the code and our logic.
@gwenshap Sorry for the late reply... I somehow missed the email from your follow-up comments...
I will put a more detailed sequence there, but it could be a bit verbose, and people will probably still need to read more code to understand what happened.
71f8a47 to baeb9c6 (force-push)
kafka-trunk-git-pr #446 FAILURE
LGTM.
@becketqin Before committing this fix we need to figure out one thing: whether this issue is ONLY due to an unclean election. In that case inconsistency has probably happened already, so we may allow the follower to differ from the leader. But as @junrao mentioned, this can also be triggered by other scenarios, and then we cannot allow replicas to be inconsistent.
@guozhangwang The other issue is tracked by KAFKA-2477. After that is fixed, we should no longer see the OffsetOutOfRangeException triggered by it.
…gration.
Author: Jiangjie Qin &lt;[email protected]&gt;
Reviewers: Gwen Shapira, Guozhang Wang
Closes #129 from becketqin/KAFKA-2143
(cherry picked from commit 87eccb9)
Signed-off-by: Guozhang Wang &lt;[email protected]&gt;
Merged to trunk and 0.9.0. |
@becketqin I think there is a problem with the comment. According to the logic of your code, fetchOffset should be the max of leaderStartOffset and replica.logEndOffset, but the comment says that fetchOffset is leaderStartOffset. Here is a rough suggestion; please consider adopting it.
Current:

```scala
val leaderStartOffset: Long = earliestOrLatestOffset(topicAndPartition, ListOffsetRequest.EARLIEST_TIMESTAMP,
  brokerConfig.brokerId)
warn("Replica %d for partition %s reset its fetch offset from %d to current leader %d's start offset %d"
  .format(brokerConfig.brokerId, topicAndPartition, replica.logEndOffset.messageOffset, sourceBroker.id, leaderStartOffset))
val offsetToFetch = Math.max(leaderStartOffset, replica.logEndOffset.messageOffset)
// Only truncate log when current leader's log start offset is greater than follower's log end offset.
if (leaderStartOffset > replica.logEndOffset.messageOffset)
  replicaMgr.logManager.truncateFullyAndStartAt(topicAndPartition, leaderStartOffset)
offsetToFetch
```

Suggested:

```scala
val leaderStartOffset: Long = earliestOrLatestOffset(topicAndPartition, ListOffsetRequest.EARLIEST_TIMESTAMP,
  brokerConfig.brokerId)
val offsetToFetch = Math.max(leaderStartOffset, replica.logEndOffset.messageOffset)
warn("Replica %d for partition %s reset its fetch offset from %d to %d, which is the max of current leader %d's start offset %d and replica logEndOffset %d"
  .format(brokerConfig.brokerId, topicAndPartition, replica.logEndOffset.messageOffset, offsetToFetch, sourceBroker.id, leaderStartOffset, replica.logEndOffset.messageOffset))
// Only truncate the log when the current leader's log start offset is greater than the follower's log end offset.
if (leaderStartOffset > replica.logEndOffset.messageOffset)
  replicaMgr.logManager.truncateFullyAndStartAt(topicAndPartition, leaderStartOffset)
offsetToFetch
```
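Taken out of the fetcher-thread context, the suggested handling boils down to a small pure function. This is a sketch only; `OffsetResetSketch` and `resetFetchOffset` are illustrative names, not the actual ReplicaFetcherThread API.

```scala
object OffsetResetSketch {
  // Returns (offset to fetch next, whether the local log must be fully truncated).
  // The log is truncated only when the leader's start offset is ahead of the
  // follower's log end offset; otherwise the follower keeps its log and resumes
  // fetching from its own LEO.
  def resetFetchOffset(leaderStartOffset: Long, followerLogEndOffset: Long): (Long, Boolean) = {
    val offsetToFetch = math.max(leaderStartOffset, followerLogEndOffset)
    val truncate = leaderStartOffset > followerLogEndOffset
    (offsetToFetch, truncate)
  }

  def main(args: Array[String]): Unit = {
    println(resetFetchOffset(50L, 100L))  // (100,false): follower log preserved
    println(resetFetchOffset(120L, 100L)) // (120,true): follower truncates and restarts at 120
  }
}
```

The pair makes the invariant explicit: truncation and the choice of fetch offset are driven by the same comparison, so the log can never be truncated while the fetch offset stays at the follower's own LEO.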
No description provided.