KAFKA-1695: Add zookeeper authentication. Zookeeper ACLs should be set so that only brokers can modify zk entries, with the exception of /consumers. #93
Conversation
|
The change is backward compatible: all modified functions have defaults, so it should not break any scripts. If a user enables zookeeper authentication by adding "java.security.auth.login.config=some-jaas-file.jaas", where some-jaas-file.jaas has a Client section, then they are essentially locking down all broker zookeeper entries so that only users/processes that have access to the keytab, or know the digest user/secret listed in the Client section, can modify those entries. Please see https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL, section "JAAS conf file: Kerberos authentication". |
|
kafka-trunk-git-pr #33 SUCCESS |
|
@gwenshap Given you were the original author of the jira, it would be great if you could review this. Also let me know if you think it makes sense to add a SECURITY.md at the top level describing how the kafka-zookeeper interaction and kafka znodes can be secured with ACLs. |
|
I'll be happy to review, but doesn't this depend on Kafka having kerberos tickets first? Otherwise we can't really check whether Kafka manages to access these znodes. |
|
@gwenshap "doesn't this depend on Kafka having kerberos tickets first?" is a little ambiguous. If you mean the kafka security work that allows the kafka server to run with a keytab and authenticate clients connecting to kafka, then that is not needed for this. For this to work, the zookeeper server needs to be configured to support security. In addition, kafka brokers need to start with -Djava.security.auth.login.config=/path/to/client/jaas/file.conf, where file.conf will need to have a Client section (used for the zookeeper connection). Just by doing this we can ensure all zookeeper nodes are secured with ACLs. So if you meant the second option, where a keytab is configured to access zookeeper, then yes, we need that. Another option is to use digest authentication by specifying a digest-based Client section, and in this case no keytabs are required. I haven't tested the digest approach but don't see why it won't work. Will update the PR once I have tested with digests too. |
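For illustration, the two kinds of Client section being described could look like the sketch below. A real JAAS file would contain only one Client section; the keytab path, principal, and digest user/secret here are placeholders, not values from this PR:

```
// Kerberos-based Client section (keytab path and principal are illustrative)
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/kafka.keytab"
  storeKey=true
  useTicketCache=false
  principal="kafka/broker-host@EXAMPLE.COM";
};

// Digest-based alternative (no keytab required; username/password are placeholders)
Client {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="kafka"
  password="kafka-secret";
};
```

The broker would then be pointed at this file via -Djava.security.auth.login.config=/path/to/file.conf as described above.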
|
kafka-trunk-git-pr #51 SUCCESS |
|
@gwenshap Took me some time to get the whole digest setup working. With the above specified Client section and only this patch over trunk, I was able to secure the zookeeper nodes. Kafka and zookeeper connections were authenticated using digests, so no keytabs were required. For all nodes the ACLs were set as follows: [zk: c6401.ambari.apache.org:2181(CONNECTED) 41] getAcl /brokers/ids One issue I found was that even though the producer was able to produce successfully, the consumer failed to consume until I started it such that it also had access to the same Client section as above. This should not be required, given all nodes allow anyone to read and /consumers is open for anyone to write: [zk: c6401.ambari.apache.org:2181(CONNECTED) 39] getAcl /consumers Stepping through the code, it seems like the code gets stuck in ConsoleConsumer when it tries to check the existence of some broker here. It's not throwing exceptions, it's not exiting, it just gets stuck forever at that statement. Seems like some bug in ZkClient, but I will have to debug further. |
|
@gwenshap So further debugging, and it turned out to be an issue with my jaas file. The consumer works even without access to the Client section, as expected. So this change alone is sufficient to secure the zookeeper entries and the kafka -> zookeeper communication. |
|
Looking at the patch, there seems to be some unrelated stuff in there. I assume you'll clean it up. Should I be looking at anything except the ZkUtils file? Zookeeper configured for security is understandable. The extra Kafka configuration (i.e. starting with an extra flag and an extra config file): was this covered in one of the security KIPs? Because I remember hearing some objections to extra configuration files, but I don't remember how it got resolved... |
|
The only file to look at is ZkUtils. Cleaning everything else up. |
|
Not sure why the pull request shows the diff; these changes are from commits that were merged to trunk and I just rebased on top of them. The files other than ZkUtils are identical. |
|
@Parth-Brahmbhatt, you merged instead of rebasing. That's the reason why the commits are still there. |
|
@ijuma The commits are ok, but why does it show the file diff? Given the files are identical, I would expect it to show no diff. I thought it might be a whitespace issue, but adding ?w=1 still does not change the diff. I will probably have to take my 2 commits and git push --force, but then we will lose all the conversations. |
|
@Parth-Brahmbhatt, I looked at your branch more closely and the commit graph looks weird. In particular, the following commit appears twice in the graph with 2 different parents: "KAFKA-1695: Zookeeper acls should be set so only broker can modify zk entries with the exception of /consumers." The end result is that the file diff is showing the combined changes of the commits listed here (which indeed includes some commits from trunk): https://github.com/apache/kafka/pull/93/commits I think the only thing to do here is to rebase so that we end up with a nice, linear history. If you don't want to lose the comments attached to the existing commits, you can push to a new branch, close this PR and open a new one linking to this one. |
|
kafka-trunk-git-pr #126 SUCCESS |
|
@fpj, can you please take a look at this when you have a chance? |
Code style: should be configFile. There are other instances of this in the file.
Normally all Kafka configuration is done through the KafkaConfig classes.
|
The patch in general looks good. It creates some default ACLs and adds a parameter to all calls that create znodes. With this patch, however, the /consumers path is left wide open, and it is probably not ideal to leave it like that, but perhaps this will go away with the new consumer, making it less of a problem. An option to strengthen the security here is to use authentication; check it here: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zookeeper+and+SASL |
|
@Parth-Brahmbhatt, @fpj Any suggestions on how we should test this? Some automated tests would be good. |
put a space between if and (
|
@Parth-Brahmbhatt, there are two usual paths when one runs into a situation like this:
For option 1, we could have a class parameter called For option 2, there would be an implicit parameter in every method with the ACLs. Due to the way implicits work, you'd want to introduce a new type for it (e.g Before we go in either direction, it may be worth discussing this with another committer (maybe @junrao and/or @gwenshap). In my opinion, reading a property inside a Scala object to control behaviour is something to be avoided, as it makes testing much harder, as can be seen here. |
|
@ijuma @Parth-Brahmbhatt Note that I've split the list of parameters into two, so that I could change the default ACL. I don't find the syntax super cute, but it works. To instantiate But, let's go by parts. First, I'm interested in knowing if you find the above approach acceptable. Second, even without the approach above, how are you planning on enforcing that the calls via command line are executed securely? Hope it makes sense. |
|
@fpj, I like the idea of passing a Good question about the admin package. I can look into it, but perhaps @junrao knows. |
|
For the admin package part, I was assuming that we would just modify https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh so that if the user sets a KAFKA_JAAS parameter, we set java.security.auth.login.config to the value of that parameter. If they forget to set the environment variable, the command line script will fail, indicating it could not write to zookeeper because of an authorization issue, as long as the brokers themselves were started with the correct environment variables at least once. The ZkClientContext change seems very invasive, given we will have to change every caller, and I don't quite understand how it is much different from making all callers pass an acls param plus making getDefaultAcl a method that dynamically gets the ACL each time by evaluating isSecure. Maybe I just don't understand the proposal well enough. |
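A minimal sketch of the kafka-run-class.sh change being proposed, assuming a user-facing KAFKA_JAAS environment variable; the variable name and the file path are illustrative, not the final script:

```shell
# Hypothetical sketch: if the user exports KAFKA_JAAS, wire it into the JVM
# options so the zookeeper client picks up the JAAS Client section.
KAFKA_JAAS="/etc/kafka/client_jaas.conf"   # normally set by the user, not hardcoded

if [ -n "$KAFKA_JAAS" ]; then
  KAFKA_OPTS="$KAFKA_OPTS -Djava.security.auth.login.config=$KAFKA_JAAS"
fi

echo "JVM opts:$KAFKA_OPTS"
```

If the variable is unset, KAFKA_OPTS is left untouched and the tool runs unauthenticated, which zookeeper will reject on any write to a secured path.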
I don't understand why the script would fail. Not setting the parameter implicitly means that security is off as I'm reading this patch. Also, I'm thinking that a user could intentionally not set the parameter to run the admin command in a non-secure fashion. Could you clarify if this is being taken care of and how, please?
|
|
Sorry for not explaining it in a better way. I was trying to say that if the brokers were started with the correct parameter, by setting the KAFKA_JAAS environment variable, then on startup they should set the correct ACLs as part of makeSurePersistentPathExists, unless the brokers are being moved from an unsecure environment to a secure environment, in which case all the paths will already exist. To handle the case where all paths already exist, I need to add a makeSureCorrectAcls check, which was not in the original PR because my PR to get/set ACLs in zkCli was only available from zkClient-0.6 onwards, which is now merged into kafka. Once these ACLs are set correctly, even if a client script that needs to write to zookeeper gets started without the KAFKA_JAAS variable, it will fail because zookeeper should reject any write request that is not authorized. Does that make sense? |
|
@Parth-Brahmbhatt About the |
|
@Parth-Brahmbhatt, as @fpj is saying, the fact that we read a system property in order to define the I previously outlined the options as I saw it:
@fpj's proposal is a variation of two that moves If my arguments have not been convincing enough, then we could keep the singleton and change the |
|
The current patch seems to work well for a new cluster. If the ZK credential is set properly in the brokers, the top level paths will be created to only allow authenticated clients to create child nodes. If someone runs the acl cli w/o the ZK credential, the command will fail. If there is already an existing cluster, we have to think through the upgrade path. I am not sure adding support for makeSureCorrectAcls in ZkUtils is ideal. Suppose you do that and do a rolling upgrade. After the first broker is upgraded, the ACLs for existing ZK paths will be set and the rest of the brokers won't be able to write to ZK any more. This can be problematic since the rest of the brokers may need to write to ZK for doing controller related stuff, for changing ISRs, for creating new topics, etc. One alternative is the following. We introduce a broker config set.zookeeper.acl, which defaults to false. ZK ACLs will only be set if set.zookeeper.acl is true. We also make a separate ZKSecurityMigrationTool that will take the ZK credential and set the ACLs on all existing ZK paths managed by Kafka. To do the upgrade, we (1) rolling restart all Kafka brokers to pass in the ZK credential (after this step, all brokers can read from ZK if ACLs are set, but they are not setting ACLs yet); (2) do a second rolling restart of all brokers with set.zookeeper.acl set to true (after this step, new ZK paths will be created with ACLs); (3) run ZKSecurityMigrationTool to set ACLs on all existing ZK paths. After this point, ZK is secured: nobody can change the data in ZK w/o the ZK credential. Will that work? If this works, we will probably need to make ZkUtils a class so that we can pass in the config set.zookeeper.acl. |
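The three-step rollout above can be summarized as a shell outline; nothing here talks to a real cluster, and the JAAS path is a placeholder:

```shell
# Hypothetical outline of the rolling-upgrade sequence described above.
# Steps are recorded in order into a variable for inspection.
STEPS=""
record() { STEPS="${STEPS}${1}\n"; }

record "1: rolling restart, pass ZK credential via -Djava.security.auth.login.config (brokers can read secured paths, no ACL changes yet)"
record "2: rolling restart with set.zookeeper.acl=true (new ZK paths created with secure ACLs)"
record "3: run ZKSecurityMigrationTool with the ZK credential (set ACLs on pre-existing paths)"

printf "%b" "$STEPS"
```

The key property is that no broker starts enforcing ACLs until every broker already has the credential needed to pass them, which avoids the mid-rollout lockout described above.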
|
Jun, you are right, I did not consider the rolling upgrade scenario. The steps you described will work. If we make ZkUtils a class, I guess we will have to change a lot of code, but it will also address the concerns that @ijuma raised? |
|
@Parth-Brahmbhatt @junrao @ijuma On the issue of rolling upgrades (good point!), the approach described sounds good, but I think we can avoid the second rolling start by using a |
|
@fpj So far, we only store topic/clientId level configs in ZK. All broker level configs are stored in the broker configuration file. While having a /kafka/issecured ZK path is also fine, for consistency it's probably better to stick with a broker config. Also, just to make sure: if a ZK client is authenticated but doesn't set an ACL when creating a ZK path, any client can read/write this path, right? |
|
@junrao As long as the ACLs are set with "anyone can read/write/create/delete", any authenticated or unauthenticated client will be able to perform that operation. @fpj I don't really have anything against introducing the zkContext, and if I understand correctly, with that approach ZkUtils will still be a Scala "object" and we will just replace the zkClient param in all the methods with zkContext. Makes sense to me. |
|
@Parth-Brahmbhatt If the ACL is not provided when creating a path, does it get the "anyone has full permission" ACL by default? Also, it seems that in this patch we only need to set the ACL when creating a ZK path; it doesn't seem that we need to set it during an update. |
|
@Parth-Brahmbhatt I had some time today (I'm in a different time zone) and I ended up working on the changes to make ZkUtils a class. @ijuma and I had a short conversation offline and concluded that it is the approach more common in the current code base; implicits aren't typically used there. But feel free to disagree. The patch I'm working on is based on yours, and if you're ok with it, I'd like to propose a new PR with these changes. It also includes all the recent commits, so it isn't stale. Let me know if you're ok with it, please. You've done all the work so far and I don't want you to feel that I'm interfering with your progress.
Ok, let me think about this. I really don't like the idea of two rolling restarts, but maybe the alternatives aren't great.
You always need to pass an ACL parameter, even if the parameter is open unsafe. If it is open unsafe, then yes, anyone can read/write/add children. |
|
Ok, thanks @Parth-Brahmbhatt. I should be able to create an initial PR tomorrow. I have the main code compiling, but not the tests. I'll fix the tests before I push the initial PR. |
|
Can we please close this PR since ZK authentication was integrated via #303? |
This adds additional information to the quota logs: the configured bounds are also logged. When enforcement occurs there is an existing log; this is changed to info (from debug). Finally, many of the quota logs contain useless information (NaN or 0.0 value entries); logging for these entries is skipped entirely. TICKET = LI_DESCRIPTION = LIKAFKA-32112, EXIT_CRITERIA = MANUAL. This is not going to be merged with upstream since it changes log levels, and I assume the original change to dump all the quota metrics is not in upstream either, so this too will probably be rejected from upstream.