KAFKA-1893: Allow regex subscriptions in the new consumer #128
SinghAsDev wants to merge 4 commits into apache:trunk
Conversation
It would be nice to keep this asynchronous. Instead of calling listTopics, which blocks, could we initiate a new metadata fetch directly and add a listener to handle the completion?
Hey @hachikuji, I was also initially thinking of making it asynchronous, but I could not convince myself that we would gain much from it. schedulePatternSubscriptionTask is already a delayed task that the user does not have to wait on. Moreover, initiating a new metadata fetch directly means handling retries ourselves, so more params for max retries, retry interval, etc. I was under the impression that we made listTopics handle all of that so we would not have to worry about it again during the regex implementation. I might be missing something.
@SinghAsDev Since KafkaConsumer has only one thread, even scheduled tasks have to be executed in that thread, which means the user has to wait for them. Since you can't really control when the tasks will be executed, in the worst case it could turn a non-blocking call into a blocking one. And I don't see why error handling can't be done asynchronously. For updating regex subscriptions, I don't think it would be a big deal even if we just ignored failures and waited for the next metadata update, though it would be easy to implement retries with backoff (I think we already do this for heartbeats).
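For context, the retry-with-backoff scheme mentioned here (in the spirit of the heartbeat retry logic) can be sketched as follows; the class name, method name, and constants are illustrative, not actual consumer internals:

```java
public class BackoffRetrySketch {
    // Exponential backoff with a cap: each failed attempt doubles the
    // delay until it hits maxMs. The shift is clamped to avoid overflow.
    static long nextBackoffMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs * (1L << Math.min(attempt, 16));
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++)
            System.out.println("attempt " + attempt + " -> "
                    + nextBackoffMs(attempt, 100, 1000) + " ms");
        // prints 100, 200, 400, 800, 1000 ms
    }
}
```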
True. For error handling, I was saying that it might require more configs, not that it adds complexity. However, we should be able to reuse the existing configs. Will update the patch. Thanks!
kafka-trunk-git-pr #120 SUCCESS
Good comments from Grant. Other than that this change looks good.
@hachikuji @granthenke I have addressed your review comments. Let me know if I am still missing something. Thanks!
kafka-trunk-git-pr #133 SUCCESS
@SinghAsDev Are there any error cases to check here? Disconnects for example?
Do you mean that in case of errors I should log them and reschedule the task, as in onFailure?
@SinghAsDev Haha, I can understand why this is a little confusing, but the network layer considers the request a "success" if it gets a response. However, that doesn't mean there wasn't an error code in that response. It might be nice to handle this at the network layer, but there isn't a generic way to check for errors: each response object has the error code at a different location in its schema, so the only thing we can do is pass the response back and let the application determine whether there was an error. The one case where we might be able to infer errors generically is by checking ClientResponse.wasDisconnected(), but I don't think we do that either (I've forgotten whether there's a good reason for that).
@hachikuji updated. Let me know if I got that right :)
@SinghAsDev I think that works! I wasn't sure if we needed to handle errors in the MetadataResponse itself, but it looks like the topics are only added to the Cluster if they had no error in the response.
kafka-trunk-git-pr #140 SUCCESS
LGTM. @guozhangwang, do you want to have a look?
Do we need a new config for subscription interval? Could we piggy-back the logic on regular metadata refresh?
Makes sense. Removed.
@SinghAsDev Another problem that @guozhangwang and I briefly talked about was whether we should augment the Metadata object to support getting metadata for all topics (with a hook to get a notification when there is a change).* Your approach is exactly the one I had in mind, but it does result in a little more overhead, since metadata refreshes are done both in KafkaConsumer and in NetworkClient. The nice thing about separating them is that it lets you refresh the full topic list less frequently: if your regex only covers a small set of the topics, "normal" metadata refreshes stay cheap and can be done more often. However, it seems questionable whether this capability is actually useful in practice, so reducing the metadata overhead might be the bigger win. What do you think?

*Note that it's still useful to be able to bypass NetworkClient to support the listTopics API (which doesn't affect subscriptions).
@SinghAsDev you are not incrementing pos! It should be topicsToSubscribe[pos++] = topic, right?
In fact, I would use a list, as below:

List<String> topicsToSubscribe = new ArrayList<>(partitions.size());
// ... add each matching topic to topicsToSubscribe ...
subscribeTopics(topicsToSubscribe.toArray(new String[0]), true);

But I am fine with the array too.
Aha, nice catch. Will fix. Thanks!
kafka-trunk-git-pr #160 FAILURE
@hachikuji that makes sense; I actually suggested the same for the listTopics patch. Is my understanding correct that what you are suggesting is similar to my suggestion on the listTopics JIRA?
@SinghAsDev I think the listTopics approach is still fine and doesn't need to be changed. The key difference is that listTopics doesn't affect subscriptions, so it should not have any impact on the Metadata shared throughout the consumer.
@SinghAsDev Sorry for being late on this. Regarding the subscribe semantics, I am convinced that 1) we do not need an extra blacklist pattern in the API; users can just express that in the regex, and 2) we do not need to do incremental subscription; replacing the subscription should be fine. Regarding the metadata refresh: the current patch uses a separate scheduled task with a different topic set outside Metadata for regex subscription, while if we piggy-back it on the metadata refresh we can reduce the code complexity. The cost of having the metadata refresh always ask for all topics seems OK to me, since in practice users would probably want to set the "regex refreshing interval" to be the same as the "metadata refreshing interval" anyway. What do you think? On the other hand, we do not necessarily need to make the listTopics API also change the Metadata state object as @hachikuji suggested, since I feel it is in many cases a one-time thing. I can review this PR again and check it in once it gets rebased / updated.
@guozhangwang @hachikuji makes sense. Will update. However, for this to happen we need to somehow make NetworkClient aware that, when a pattern subscription is being used, it should fetch metadata for all topics. There are a few ways I could think of for doing this.
I am more inclined towards Option 1. What do you guys suggest?
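The enumerated options were not captured here, but given the later mention of setting a Metadata "allTopics" flag, a flag-based approach can be sketched as follows. All names here are illustrative, not the committed implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MetadataSketch {
    private final Set<String> topics = new HashSet<>();
    private boolean needMetadataForAllTopics = false;

    // Set when subscribe(pattern) is called; cleared on unsubscribe.
    public synchronized void needMetadataForAllTopics(boolean needed) {
        this.needMetadataForAllTopics = needed;
    }

    public synchronized void addTopic(String topic) {
        topics.add(topic);
    }

    // NetworkClient would consult this when building a MetadataRequest;
    // an empty topic list asks the broker for metadata on all topics.
    public synchronized List<String> topicsToFetch() {
        if (needMetadataForAllTopics)
            return Collections.emptyList();
        return new ArrayList<>(topics);
    }

    public static void main(String[] args) {
        MetadataSketch metadata = new MetadataSketch();
        metadata.addTopic("my-topic");
        System.out.println(metadata.topicsToFetch()); // [my-topic]
        metadata.needMetadataForAllTopics(true);
        System.out.println(metadata.topicsToFetch()); // []
    }
}
```

The empty-list convention leans on the protocol behavior noted later in this thread, where a metadata request with an empty topic list returns the full topic list.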
@SinghAsDev Option 1 definitely sounds nicer to me.
@SinghAsDev Btw, I was planning to use a MetadataListener in KAFKA-2464 to hook into metadata updates: https://github.com/apache/kafka/pull/165/files#diff-62bba39339405475f71241a182ef9819R229. Seems like that might be sufficient for this case too.
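A rough sketch of how such a metadata-update listener could recompute a regex subscription; the listener interface shape below is assumed for illustration, not the one from KAFKA-2464:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class RegexSubscriptionSketch {
    // Hypothetical stand-in for the metadata-update hook discussed above.
    interface MetadataListener {
        void onMetadataUpdate(Set<String> clusterTopics);
    }

    // Builds a listener that replaces (rather than incrementally extends)
    // the subscription on every metadata update, as agreed in this thread.
    static MetadataListener regexListener(Pattern pattern, Set<String> subscription) {
        return clusterTopics -> {
            subscription.clear();
            for (String topic : clusterTopics)
                if (pattern.matcher(topic).matches())
                    subscription.add(topic);
        };
    }

    public static void main(String[] args) {
        Set<String> subscription = new HashSet<>();
        MetadataListener listener =
                regexListener(Pattern.compile("test-.*"), subscription);
        listener.onMetadataUpdate(
                new HashSet<>(Arrays.asList("test-1", "other", "test-2")));
        System.out.println(subscription.size()); // 2
    }
}
```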
@hachikuji yea, it will be needed. Should I wait for your patch to go in?
@SinghAsDev Nah, yours is probably easier to get through, so go ahead. I'll rebase accordingly.
+1 on option 1 also.
kafka-trunk-git-pr #233 FAILURE
@hachikuji @guozhangwang I have updated the PR with the discussed approach. Let me know how it looks. I will add some unit tests for the new methods added to Metadata tonight.
Ahh, looks like I will have to rebase again. Will do that along with adding unit tests for Metadata.
I wonder if we should clear the topic collection when this is set. If we did that, then the change in NetworkClient would be unnecessary. Is there any reason not to?
This will be called only when a pattern is subscribed. At that point, yes, what you are saying makes sense. However, for subsequent metadata updates the change in NetworkClient is still needed, as Metadata will contain the topics that matched the pattern and were added in the last update. Does that make sense?
Yeah, I guess that makes sense. It kind of feels weird to be tracking a topic list in Metadata that we're not going to use, however. My feeling is that maybe we shouldn't update Metadata for topics subscribed from a regex; instead we just update it once in subscribe(pattern). Does that make sense?
I am a bit lost here. Even if we clear topics here, topics will be added by KafkaConsumer.onMetadataUpdate, and any metadata update after that will fetch metadata for only those topics if NetworkClient's changes are removed. I think I am missing something here :)
Probably my fault. :) I was trying to suggest not having KafkaConsumer.onMetadataUpdate modify the Metadata topic list, just the subscription topic list. So basically, in KafkaConsumer.subscribe(pattern) we set the Metadata allTopics flag, and we don't modify Metadata again (unless the user changes their subscription). Does that make any sense?
This might not work as well as I thought since the metadata is also updated in partitionsFor(). I actually think it would be better for partitionsFor() to use the same approach as listTopics(), but that is out of scope for this issue.
Haha, I actually made the changes and realized the same. It does make things look cleaner, though. Also, one more issue is that when subscribed via a pattern, Metadata will always store metadata for all topics, so a fetch will return metadata for all topics, not just the topics actually matched by the pattern subscription.
For now, I think we can just leave it as is. If you think we should refactor partitionsFor(), let me know; we should probably create a separate JIRA for it though.
Refactoring partitionsFor does seem worthwhile to me since then Metadata is only modified by subscription (and metadata for subscribed topics is all we care about in a steady state). Maybe we can leave this patch as it is and plan to fix it in that JIRA.
LGTM. About unsubscribing, I think it is useful to keep the …
Force-pushed 8abf03e to 8c0def4.
@hachikuji @guozhangwang changed the behavior of unsubscribe. Up for your reviews again :)
kafka-trunk-git-pr #385 SUCCESS
To be clear, we'll continue fetching metadata for all the topics that were added to Metadata directly. Is that right? If instead we cleared those topics, would the NetworkClient fetch metadata for all topics until we've set a new subscription?
I think this is a bug in the protocol: if the metadata request contains an empty topic list, it returns the full topic list. This should be fixed in another ticket.
Two more general comments: 1) we could remove the duplicate ConsumerRebalanceListener / RebalanceListener and then remove the Consumer parameter in the callback functions; 2) we could probably create a member Metadata.Listener inside KafkaConsumer instead of having KafkaConsumer implement this interface along with Consumer, since it is only used in subscribe(Pattern).
Just to not drag this ticket out further, I suggest we do these in later tickets. @hachikuji does that sound good to you?
@guozhangwang @hachikuji thanks for the reviews and for helping get this committed. If there is no JIRA already, I can create the JIRA and PR for fixing the issues @guozhangwang mentioned. Let me know. Thanks!
@SinghAsDev I just created a ticket for removing the consumer instance from the callback. You can create a ticket for the other issue if you want.
@hachikuji I have created KAFKA-2533, will create PR shortly. |