
KAFKA-1695: Adding zookeeper authentication. Zookeeper acls should be set so only broker can modify zk entries with the exception of /consumers. #93

Closed
Parth-Brahmbhatt wants to merge 4 commits into apache:trunk from Parth-Brahmbhatt:KAFKA-1695

Conversation

@Parth-Brahmbhatt
Contributor

No description provided.

@Parth-Brahmbhatt Parth-Brahmbhatt changed the title Kafka-1682: Adding zookeeper authentication. Zookeeper acls should be set so only broker can modify zk entries with the exception of /consumers. KAFKA-1695: Adding zookeeper authentication. Zookeeper acls should be set so only broker can modify zk entries with the exception of /consumers. Jul 23, 2015
@Parth-Brahmbhatt
Contributor Author

The change is backward compatible; all modified functions have defaults, so it should not break any scripts.

If a user enables zookeeper authentication by adding "java.security.auth.login.config=some-jaas-file.jaas", where some-jaas-file.jaas has a Client section, then they are essentially locking all broker zookeeper entries so that only users/processes that have access to the keytab, or know the digest user/secret listed in the Client section, can modify those entries. Please see https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL, section "JAAS conf file: Kerberos authentication".

@asfbot

asfbot commented Jul 23, 2015

kafka-trunk-git-pr #33 SUCCESS
This pull request looks good

@Parth-Brahmbhatt
Contributor Author

@gwenshap Given you were the original author of the jira, it would be great if you could review this. Also let me know if you think it makes sense to add a SECURITY.md at the top level describing how the Kafka-ZooKeeper interaction and Kafka znodes can be secured with acls.

@gwenshap
Contributor

I'll be happy to review, but - doesn't this depend on Kafka having kerberos tickets first? Otherwise we can't really check whether Kafka manages to access these znodes.

@Parth-Brahmbhatt
Contributor Author

@gwenshap "doesn't this depend on Kafka having kerberos tickets first?" is a little ambiguous. If you mean the Kafka security work that allows the Kafka server to run with a keytab and authenticate clients connecting to Kafka, then no, that is not needed for this.

To support this, the zookeeper server needs to be configured to support security.

In addition, Kafka brokers need to start with -Djava.security.auth.login.config=/path/to/client/jaas/file.conf, where file.conf will need to have the following section:

Client { // used for zookeeper connection
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/security/keytabs/kafka.service.keytab"
    storeKey=true
    useTicketCache=false
    serviceName="zookeeper"
    principal="kafka/[email protected]";
};

Just by doing this we can ensure all zookeeper nodes are secured with acls. So if you meant the second option, where a keytab is configured to access zookeeper, then yes, we need that. Another option is to use digest authentication by specifying the following, in which case no keytabs are required:

Client {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="bob"
    password="bobsecret";
};

I haven't tested the digest approach but don't see why it won't work. Will update the PR once I have tested with digests too.

@asfbot

asfbot commented Jul 24, 2015

kafka-trunk-git-pr #51 SUCCESS
This pull request looks good

@Parth-Brahmbhatt
Contributor Author

@gwenshap Took me some time to get the whole digest setup working. With the Client section specified above and only this patch on top of trunk, I was able to secure the zookeeper nodes. Kafka and zookeeper connections were authenticated using digests, so no keytabs were required.

For all nodes the acls were set as follows:

[zk: c6401.ambari.apache.org:2181(CONNECTED) 41] getAcl /brokers/ids
'world,'anyone
: r
'sasl,'bob
: cdrwa
[zk: c6401.ambari.apache.org:2181(CONNECTED) 35] getAcl /brokers/ids/0
'world,'anyone
: r
'sasl,'bob
: cdrwa

One issue I found was that even though the producer was able to produce successfully, the consumer failed to consume until I started it with access to the same Client section as above. This should not be required, given that all nodes allow anyone to read and /consumers is open for anyone to write.

[zk: c6401.ambari.apache.org:2181(CONNECTED) 39] getAcl consumers
'world,'anyone
: cdrwa

Stepping through the code, it seems the code gets stuck in ConsoleConsumer when it tries to check the existence of brokers here:

if (!checkZkPathExists(options.valueOf(zkConnectOpt), "/brokers/ids")) {

It's not throwing exceptions and it's not exiting; it just gets stuck forever at that statement. Seems like some bug in ZkClient, but I will have to debug further.

@Parth-Brahmbhatt
Contributor Author

@gwenshap So further debugging showed it was an issue with my jaas. The consumer works even without specifying the Client section, as expected.

So this change alone is sufficient to secure zookeeper entries and kafka -> zookeeper communication.

@gwenshap
Contributor

Looking at the patch, there seems to be some unrelated stuff in there.

I assume you'll clean it up. Should I be looking at anything except the ZkUtils file?

Zookeeper configured for security is understandable.

The extra Kafka configuration (i.e. starting with an extra flag and an extra config file), was this covered in one of the security KIPs? Because I remember hearing some objections to extra configuration files but I don't remember how it got resolved...

@Parth-Brahmbhatt
Contributor Author

The only file to look at is ZkUtils. Cleaning everything else up.

@Parth-Brahmbhatt
Contributor Author

Not sure why the pull request shows the diff; these changes are from commits that were already merged to trunk and I just rebased on top of them. The files other than ZkUtils are identical.

@ijuma
Member

ijuma commented Jul 27, 2015

@Parth-Brahmbhatt, you merged instead of rebasing. That's the reason why the commits are still there.

@Parth-Brahmbhatt
Contributor Author

@ijuma The commits are ok, but why does it show the file diff? Given the files are identical I would expect it to show no diff. I thought it might be a whitespace issue, but adding ?w=1 still does not change the diff. I will probably have to take my 2 commits and git push --force, but then we will lose all the conversations.

@ijuma
Member

ijuma commented Jul 28, 2015

@Parth-Brahmbhatt, I looked at your branch more closely and the commit graph looks weird. In particular, the following commit appears twice in the graph with 2 different parents:

"KAFKA-1695: Zookeeper acls should be set so only broker can modify zk entries with the exception of /consumers."

The end result is that the files diff is showing the combined changes in the commits listed here (which includes some commits from trunk indeed):

https://github.com/apache/kafka/pull/93/commits

I think the only thing to do here is to rebase so that we end up with a nice and linear history. If you don't want to lose the comments attached to existing commits, you can push to a new branch, close this PR and open a new one linking to this one.

@asfbot

asfbot commented Aug 11, 2015

kafka-trunk-git-pr #126 SUCCESS
This pull request looks good

@ijuma
Member

ijuma commented Aug 11, 2015

@fpj, can you please take a look at this when you have a chance?

Member

Code style: should be configFile. There are other instances of this in the file.

Contributor

Normally all Kafka configuration is done through the KafkaConfig classes.

Contributor Author

@ijuma Fixed. @gwenshap Please see comment below.

@fpj
Contributor

fpj commented Aug 24, 2015

The patch in general looks good. It creates some default acls and adds a parameter to all calls that create znodes. With this patch, however, the /consumers path is left wide open, and it is probably not ideal to leave it like that, but perhaps this will go away with the new consumer, making it less of a problem. An option to strengthen the security here is to use authentication; check it here:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL

@ijuma
Member

ijuma commented Aug 24, 2015

@Parth-Brahmbhatt, @fpj Any suggestions on how we should test this? Some automated tests would be good.

Contributor

put a space between if and (

@ijuma
Member

ijuma commented Sep 4, 2015

@Parth-Brahmbhatt, there are two usual paths when one runs into a situation like this:

  1. Convert the object into a class so that parameters can be passed during instantiation
  2. Use an implicit parameter in the object methods to pass a context object implicitly

For option 1, we could have a class parameter called defaultAcls and the code that did the read of the system property, parsing and so on would be elsewhere. This separation is a nice bonus. Sadly, I think this change would affect quite a few files. I am not aware of an automated refactoring tool for this, but it would be worth checking.

For option 2, there would be an implicit parameter in every method with the acls. Due to the way implicits work, you'd want to introduce a new type for it (e.g. ZkAcls). It involves some repetition as it needs to be part of every method. It could perhaps replace the acls parameter though. In order to make this implicit automatically available, you could add the default implementation in the companion object of ZkAcls (the compiler will look there if it can't find one in scope). In a test, you can create a different implicit and put it in scope so that it's used before the one in the companion object. I think this way may be less invasive, although it's less clear than option 1, especially for someone who is not very familiar with Scala's implicit resolution rules.

Before we go in either direction, it may be worth discussing this with another committer (maybe @junrao and/or @gwenshap). In my opinion, reading a property inside a Scala object to control behaviour is something to be avoided as it makes testing much harder as can be seen here.
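To illustrate option 1, here is a minimal Java sketch (Java rather than Scala only for brevity; the class name ZkUtilsSketch and the String-based ACL representation are illustrative, not the real Kafka or ZooKeeper types). The point is that the default acls are injected at construction instead of being read from a system property inside a JVM-wide singleton:

```java
import java.util.List;

// Hypothetical sketch of option 1: ZkUtils as a class. The code that reads
// and parses java.security.auth.login.config would live elsewhere and pass
// the resulting ACLs in here, so there is no hidden global state.
final class ZkUtilsSketch {
    private final List<String> defaultAcls;

    ZkUtilsSketch(List<String> defaultAcls) {
        this.defaultAcls = defaultAcls;
    }

    // Every znode-creating method would consult the injected ACLs.
    List<String> aclsForNewZnode() {
        return defaultAcls;
    }
}
```

A test would simply construct the class with whatever ACLs it needs; there is no JVM-wide state to set up or reset between tests.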

@fpj
Contributor

fpj commented Oct 9, 2015

@ijuma @Parth-Brahmbhatt
I had another look at this PR, and I was trying to implement the following approach. For methods that create znodes in ZkUtils, instead of passing an instance of ZkClient I pass an instance of ZkClientContext, which contains an instance of ZkClient, the implementation of isSecure, and DefaultAcls (I moved the latter two from ZkUtils). The calls in ZkUtils ended up looking like this:

private def createParentPath(client: ZkClientContext, path: String)(acls: List[ACL] = client.DefaultAcls): Unit

Note that I've split the list of parameters into two so that I could change the default ACL. I don't find the syntax super cute, but it works. To instantiate ZkClientContext, I require a parameter for the java.security.auth.login.config property so that isSecure executes properly. As I was trying to do this, a problem I came across is that in the kafka.admin package, we are making calls to ZkUtils due to command line execution. In such cases, it isn't clear how we will enforce that the command sets the login config property so that we declare the calls to ZkUtils secure. In general, the main problem I came across as I was making changes was determining exactly what the current configuration is.

But let's take this step by step. First, I'm interested in knowing whether you find the above approach acceptable. Second, even without the approach above, how are you planning to enforce that the calls via the command line are executed securely?

Hope it makes sense.
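For illustration, a rough Java analogue of the ZkClientContext idea (the names and the String ACL representation are hypothetical; the real code would hold an actual ZkClient and ZooKeeper ACL objects, and isSecure would be derived from the java.security.auth.login.config property):

```java
import java.util.List;

// Hypothetical sketch: bundle the client handle with the security decision
// so that ZkUtils methods stop consulting static/global state.
final class ZkClientContextSketch {
    final Object zkClient;        // stands in for the real ZkClient handle
    private final boolean secure; // in the proposal, derived from the login config

    ZkClientContextSketch(Object zkClient, boolean secure) {
        this.zkClient = zkClient;
        this.secure = secure;
    }

    boolean isSecure() { return secure; }

    // Default ACLs travel with the context instead of living in a singleton.
    // The strings mirror the getAcl output shown earlier in the thread.
    List<String> defaultAcls() {
        return secure
            ? List.of("sasl:creator:cdrwa", "world:anyone:r")
            : List.of("world:anyone:cdrwa");
    }
}
```

A ZkUtils call would then take the context in place of the bare client, and a test can pass a context with whichever security setting it wants to exercise.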

@ijuma
Member

ijuma commented Oct 9, 2015

@fpj, I like the idea of passing a ZkClientContext that includes the authentication information. You could use an implicit parameter to make it even easier to use, potentially. Having the acls as a separate parameter list is a bit annoying because one can't just exclude the second parameter list.

Good question about the admin package. I can look into it, but perhaps @junrao knows.

@Parth-Brahmbhatt
Contributor Author

For the admin package part, I was assuming that we will just modify https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh so that if the user sets the KAFKA_JAAS param, we set java.security.auth.login.config to the value of that param. If they forget to set the environment variable, the command line script will fail, indicating it could not write to zookeeper because of an authorization issue, as long as the brokers themselves were started with the correct environment variables at least once.

The ZkClientContext change seems very invasive, given that we will have to change every caller, and I don't quite understand how it is much different from making all callers pass an acls param plus making getDefaultAcl a method that dynamically gets the acl each time by evaluating isSecure. Maybe I just don't understand the proposal well enough.

@fpj
Contributor

fpj commented Oct 9, 2015

@Parth-Brahmbhatt

If they forget to set the environment variable the command line script will fail indicating it could not
write to zookeeper because of authorization issue as long as the brokers them self were started with
correct environment variables at least once.

I don't understand why the script would fail. Not setting the parameter implicitly means that security is off as I'm reading this patch. Also, I'm thinking that a user could intentionally not set the parameter to run the admin command in a non-secure fashion. Could you clarify if this is being taken care of and how, please?

The ZkClientContext change seems very invasive
It indeed requires changing all callers, and I won't say it isn't a pain. But the intention is to avoid having to determine values for isSecure and DefaultAcls in a static object.

@Parth-Brahmbhatt
Contributor Author

Sorry for not explaining it better. I was trying to say that if the brokers were started with the correct parameter, by setting the KAFKA_JAAS environment variable, then on startup they should set the correct acls as part of makeSurePersistentPathExists, unless the brokers are being moved from an unsecure environment to a secure environment, in which case all the paths will already exist. To handle the case where all paths already exist, I need to add a makeSureCorrectAcls check, which was not in the original PR because my PR to get/set acls in zkCli was only available from zkClient-0.6 onwards, which is now merged into kafka.

Once these acls are set correctly, even if a client script that needs to write to zookeeper gets started without the KAFKA_JAAS variable, it will fail because zookeeper should reject any write request that is not authorized. Does that make sense?

@fpj
Contributor

fpj commented Oct 9, 2015

@Parth-Brahmbhatt
The default with security on is CREATOR_ALL which means that only zk clients with the right credentials will be able to create children in the kafka subtree or modify nodes in that tree. If the client running something from the command line doesn't pass the right credentials, then the default will be OPEN, but it doesn't matter because of the acls already in zookeeper. Is that a correct assessment? Will you add makeSureCorrectAcls?

About the ZkClientContext discussion, I'd like to reach an agreement on what to do about the fact that we are reading a configuration parameter from a static object. What I proposed does increase the number of changes and the files it touches. The current PR is much nicer in this sense because all changes are in ZkUtils. But, it sounds like we need a better way of controlling the behavior of the calls in ZkUtils. I'm not sure we can do it without changes in other classes, but I'm open to suggestions. Also, check the comment from @ijuma about the implicit parameter, maybe you find that option better.

@ijuma
Member

ijuma commented Oct 10, 2015

@Parth-Brahmbhatt, as @fpj is saying, the fact that we read a system property in order to define the isSecure val in a Scala object (which is a JVM-wide singleton) is not great because it makes testing harder. ZkUtils is meant to be a place for utility methods with no state (see https://code.google.com/p/google-singleton-detector/ for reasons why one should avoid singletons).

I previously outlined the options as I saw it:

  1. Convert the object into a class so that parameters can be passed during instantiation
  2. Use an implicit parameter in the object methods to pass a context object implicitly

@fpj's proposal is a variation of option 2 that moves ZkClient into the context object (which makes sense) and doesn't make the parameter implicit (which is up for discussion, I think).

If my arguments have not been convincing enough, then we could keep the singleton and change the isSecure val to a var that could be changed from tests only. I personally seriously dislike this approach, but I mention it because @junrao said that it could be OK if the ability to change isSecure was only needed for tests.

@junrao
Contributor

junrao commented Oct 10, 2015

The current patch seems to work well for a new cluster. If the ZK credential is set properly in the brokers, the top level paths will be created to only allow authenticated clients to create child nodes. If someone runs the acl cli w/o the ZK credential, the command will fail.

If there is already an existing cluster, we have to think through the upgrade path. I am not sure adding the support of makeSureCorrectAcls in ZkUtils is ideal. Suppose you do that and do a rolling upgrade. After the first broker is upgraded, the acl for existing ZK paths will be set and the rest of the brokers won't be able to write to ZK any more. This can be problematic since the rest of the brokers may need to write to ZK for doing controller related stuff, for changing ISRs, for creating new topics, etc.

One alternative is the following. We introduce a broker config set.zookeeper.acl, which defaults to false. ZK acls will only be set if set.zookeeper.acl is true. We also make a separate ZKSecurityMigrationTool that will take the ZK credential and set the acl on all existing ZK paths managed by Kafka. To do the upgrade, we (1) rolling restart all Kafka brokers to pass in the ZK credential (after this step, all brokers can read from ZK if acl is set, but they are not setting the acl yet); (2) do a second rolling restart of all brokers by setting set.zookeeper.acl to true (after this step, new ZK paths will be created with acl); (3) run ZKSecurityMigrationTool to set acl on all existing ZK paths. After this point, ZK is secured. Nobody can change the data in ZK w/o ZK credential. Will that work? If this works, we will probably need to make ZkUtils a class so that we can pass in the config set.zookeeper.acl.
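The set.zookeeper.acl gating described above could look roughly like this (a sketch under stated assumptions, not the eventual Kafka implementation; the method name and String ACLs are illustrative, with the strings mirroring the getAcl output shown earlier in the thread):

```java
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of the proposed set.zookeeper.acl broker config:
// secured ACLs are applied only when the flag is explicitly true, so the
// first rolling restart (credentials only, flag unset) leaves existing
// znodes writable by the not-yet-upgraded brokers.
class ZkAclGateSketch {
    static List<String> aclsForNewPath(Properties brokerConfig) {
        boolean setAcl = Boolean.parseBoolean(
            brokerConfig.getProperty("set.zookeeper.acl", "false"));
        return setAcl
            ? List.of("sasl:creator:cdrwa", "world:anyone:r") // secured default
            : List.of("world:anyone:cdrwa");                  // open (pre-upgrade)
    }
}
```

After step (2) of the rolling upgrade flips the flag, new paths get the secured default, and step (3)'s migration tool would retrofit the same ACLs onto paths that already exist.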

@Parth-Brahmbhatt
Contributor Author

Jun, you are right, I did not consider the rolling upgrade scenario. The steps you described will work. If we make ZkUtils a class I guess we will have to change a lot of code but it will also address the concerns that @ijuma raised?

@fpj
Contributor

fpj commented Oct 12, 2015

@Parth-Brahmbhatt @junrao @ijuma
I don't have a very strong preference between creating a context class or making ZkUtils a class, you can solve the problem of making it easier to test both ways. However, given the current structure of ZkUtils, I find it more elegant to introduce an extended zk context rather than making them all class methods. But again, I believe we can get the same functionality we need both ways and the amount of changes seems to be roughly the same.

On the issue of rolling upgrades (good point!), the approach described sounds good, but I think we can avoid the second rolling restart by using a /kafka/issecured znode or some other name like that. During the rolling restart, we set set.zookeeper.acl on all brokers and they set a watch on that znode. Once the znode is created, brokers start using credentials. That same znode could serve as a general mechanism to indicate that a Kafka cluster is secured. Such a znode can be created either by the external tool (ZKSecurityMigrationTool) or by the last broker in the sequence of the rolling restart.

@junrao
Contributor

junrao commented Oct 12, 2015

@fpj So far, we only store topic/clientId level configs in ZK. All broker level configs are stored in broker configuration file. While having a /kafka/issecured ZK path is also fine, for consistency, it's probably better to stick with a broker config.

Also, just to make sure. If a ZK client is authenticated, but doesn't set acl when creating a ZK path, any client can read/write this path, right?

@Parth-Brahmbhatt
Contributor Author

@junrao As long as the acls are set with "anyone can read/write/create/delete" any authenticated or unauthenticated client will be able to perform that operation.

@fpj I don't really have anything against introducing the zkContext and if I understand correctly with that approach, ZkUtil will still be a scala "object" and we will just replace zkClient param in all the methods with zkContext. Makes sense to me.

@junrao
Contributor

junrao commented Oct 12, 2015

@Parth-Brahmbhatt If the acl is not provided when creating a path, does it get the "anyone has full permission" acl by default?

Also, it seems that in this patch, we only need to set the ACL when creating a ZK path. It doesn't seem that we need to set it during an update.

@fpj
Contributor

fpj commented Oct 12, 2015

@Parth-Brahmbhatt I had some time today (I'm in a different time zone) and I ended up working on the changes to make ZkUtils a class. @ijuma and I had a short conversation offline and concluded that it is an approach more common in the current code base; implicits aren't typically used. But feel free to disagree.

The patch I'm working on is based on yours, and if you're ok with it, I'd like to propose a new PR with these changes. It also includes all the recent commits, so it isn't stale. Let me know if you're ok with it, please. You've done all the work so far and I don't want you to feel that I'm interfering with your progress.

@junrao

While having a /kafka/issecured ZK path is also fine, for consistency, it's probably better to stick with a broker config.

Ok, let me think about this. I really don't like the idea of two rolling restarts, but maybe the alternatives aren't great.

If a ZK client is authenticated, but doesn't set acl when creating a ZK path, any client can read/write this path, right?

You always need to pass an acl parameter, even if the parameter is OPEN_ACL_UNSAFE. If it is OPEN_ACL_UNSAFE, then yeah, anyone can read/write/add children.

@Parth-Brahmbhatt
Contributor Author

@junrao by default all nodes have "anyone has full permission".

@fpj I don't really have a preference one way or another. If you have a patch with ZkUtils as a class feel free to post a PR. Thanks for helping out.

@fpj
Contributor

fpj commented Oct 12, 2015

Ok, thanks @Parth-Brahmbhatt. I should be able to create an initial PR tomorrow. I have the main code compiling, but not the tests. I'll fix the tests before I push the initial PR.

@ijuma
Member

ijuma commented Jan 20, 2016

Can we please close this PR since ZK authentication was integrated via #303?

@stumped2 stumped2 closed this Feb 2, 2016
guozhangwang added a commit to guozhangwang/kafka that referenced this pull request Mar 3, 2016
…reams-tech-preview

Backport recent changes from trunk/streams: batch #3
xiowu0 pushed a commit to xiowu0/kafka that referenced this pull request Dec 3, 2020
This adds additional information to the quota logs, the bounds which are configured
are also logged.  When enforcement occurs there is an existing log, this is changed
to info (from debug).  Finally, many of the quota logs contain usless information
(NaN) or (0.0) value entries.  Logging for these entries is skipped entirely.

TICKET =
LI_DESCRIPTION = LIKAFKA-32112,
EXIT_CRITERIA = MANUAL this is not going to merged with upstream since it change
levels and I assume the original changes to dump all the quota metrics is not in upstream
this too will probably be rejected from upstream.
wyuka pushed a commit to wyuka/kafka that referenced this pull request Mar 4, 2022
wyuka pushed a commit to wyuka/kafka that referenced this pull request Mar 28, 2022
wyuka pushed a commit to wyuka/kafka that referenced this pull request Jun 16, 2022
rustd pushed a commit to rustd/pranavfinaldemokafka that referenced this pull request Feb 9, 2024
k0b3rIT added a commit to k0b3rIT/kafka that referenced this pull request Mar 24, 2025
…pache#93)

Co-authored-by: placzkodobos <[email protected]>
Co-authored-by: Viktor Somogyi-Vass <[email protected]>
Co-authored-by: Bertalan Kondrat <[email protected]>
davide-armand pushed a commit to aiven/kafka that referenced this pull request Dec 1, 2025
fvaleri added a commit to fvaleri/kafka that referenced this pull request Feb 26, 2026
8 participants