[RFC] Add proxy users for distributed queries #11498
azat wants to merge 5 commits into ClickHouse:master from
|
@abyss7 @filimonov can you take a look? This is a more "complete" replacement for #11391 |
|
@abyss7 did you have a chance to look? (I'm curious about your thoughts on the idea in general) |
Maybe combine them |
|
And we definitely need an opinion from @vitlibar |
|
Regarding our talk about the permissions of proxy users: it is still not clear to me whether we should allow proxying per user or globally for all users. The use case where we need to limit the list of users in whose name a proxy can talk is still not very clear to me. Let's imagine: we have user Alice logged in on node1. If Alice asks for local resources, everything works as before. If Alice wants to reach data on another cluster node (node2), then:
If Trudy (the bad guy) wants to execute selects pretending to be Alice, he first needs either to connect to node1 as Alice, or to connect to node2 with Faythe's credentials and start running queries in the name of Alice. In both scenarios he needs to steal secrets first (the credentials of Alice or the credentials of Faythe). If we expect that Trudy can get secrets, we can't build any safe system. Faythe can actually have only a single permission (proxy = talking in the name of other users); she can have zero access to data.

That pairing (Faythe can talk in the name of Alice, but she can't talk in the name of Bob) sounds safer, but it introduces more complexity and duplicates ACL / RBAC functionality: if we want to limit Bob's access to distributed queries, we can create him on node1 but not on node2, or set an ACL for Bob on a Distributed table, or set an ACL for Bob on the underlying table. Having one more option (Bob exists and can access the table, but Faythe can't talk in his name) doesn't make much sense, because Bob can otherwise just connect directly to every single node in the cluster in a loop (w/o the Faythe proxy) and do what he wants.

If we really need to make it very granular / expect use on non-homogeneous clusters (where users and their permissions differ significantly between nodes), we may actually need some mapping of users on node2 to users on node1 etc. In that case the task is more complex and will have a lot of corner cases (which still can't be covered/resolved by But maybe I'm wrong, and missing something. |
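The two options weighed above (per-user pairing versus global proxying) can be sketched as a small lookup. This is only an illustration of the idea, not ClickHouse code: the `ALLOWED_PROXIES` mapping, the `can_proxy` helper and the `allow_globally` flag are hypothetical names introduced here.

```python
# Hypothetical model of proxy permissions; not ClickHouse code.
# Maps each proxied user to the set of proxy users allowed to act in its name.
ALLOWED_PROXIES = {
    "alice": {"faythe"},
    "bob": set(),  # nobody may talk in the name of Bob
}


def can_proxy(proxy_user: str, proxied_user: str, allow_globally: bool = False) -> bool:
    """Return True if proxy_user may run queries in the name of proxied_user."""
    if proxied_user not in ALLOWED_PROXIES:
        # Unknown users can never be proxied.
        return False
    if allow_globally:
        # Global mode: a proxy may talk in the name of any known user.
        return True
    # Per-user mode: the pairing must be listed explicitly.
    return proxy_user in ALLOWED_PROXIES[proxied_user]


print(can_proxy("faythe", "alice"))  # pairing exists
print(can_proxy("faythe", "bob"))    # pairing missing
```

In the per-user mode the pairing is one more thing to maintain next to ordinary grants, which is the duplication of ACL / RBAC functionality mentioned above.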
For me too, but I guess that there can be some potential use cases. For example, suppose you have a "local" user (the user who executes the remote query) and a "proxy" user (the user which is used to log in from one clickhouse server to another, i.e. And let's assume that the "proxy" user has only Thoughts? @vitlibar @abyss7 @filimonov |
My expectation is that when the cluster admin creates user |
That is just to enforce identification, i.e. to forbid queries from the proxy user if it doesn't send the name in which it wants to execute the command. Actually we can make it a bit more secure by adding smth like a signature. I.e. |
|
Ok, I agree that additional
Good point
Yes, something like this can be added, later. |
For this the following had been added:
- CREATE USER proxied_user PROXY proxy_user (covered)
- ALTER USER proxied_user [ADD|REMOVE] PROXY proxy_user (covered)
- GRANT PROXY ON *.* TO proxy_user (covered)
- REVOKE PROXY ON *.* FROM proxy_user (covered)
- allowed_proxy_user config directive
- allowed_proxy_users into system.users table
- proxy_user in remote_servers (covered)

Where:
- proxy_user -- user that is allowed to proxy
- proxied_user -- user whose list of users that are allowed to proxy to it is updated

Left:
- documentation
- more testing
- remote() support
- exception constructor via fmt

v2: rename setting to proxied_user to avoid overlaps with client CLI
v3: rebase and fix conflicts
|
If I understand the problem correctly you'd like to use I mean this option becomes an alternative to |
Thanks! So, that was my initial approach (like #11391, but it does not have a flag to switch, and user/password is still there, plus it has some issues). But there are some issues with connecting via the original user/password:
Personally I prefer using the initial user credentials to log in to remote servers over a proxy user, and I can address all the issues above (those that are issues right now, i.e. everything except LDAP), but before I start another version of this PR it is better to come to an agreement on this |
As you said it seems it can be solved. Separate connection pools sound better than reconnections at this point.
For the currently available authentication methods it shouldn't be a problem at all. For new authentication methods it might be, of course.
You can store any part of the context you need to perform asynchronous INSERT beforehand. This issue doesn't seem to be difficult to solve.
Why is it similar? Proxy users in MySQL seem to be a different thing, it's related more likely to authentication in general than to access rights on other nodes while we're executing distributed queries. |
100 users and a cluster of 20 shards x 2 replicas = a pool of 4000 connections (outgoing), and ~ the same incoming. 100 users is not a big deal in a big company. 20 shards too.
It is a similar idea, but used in another way, which sounds like it fits well with existing clickhouse concepts. In general the idea is the following: servers which are part of a cluster should be able to 'talk' to each other. They should not need to use some external user creds to connect to each other (they may need that connection outside the context of user queries). They should share a common secret. For the user only a single login should happen; after that he should have full access to the resources he is allowed to access. Comparing to MySQL, I would point to the concept of replication setup: |
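The connection count above is simple multiplication; a quick sketch makes it easy to try other cluster sizes. It assumes one pooled connection per (user, shard, replica) triple, which is what the estimate implies; `outgoing_connections` is a hypothetical helper name.

```python
def outgoing_connections(users: int, shards: int, replicas: int) -> int:
    """Connections one node keeps if every user gets its own pool entry
    for every replica of every shard (assumed model, one connection each)."""
    return users * shards * replicas


# Per-user credentials: every user needs its own connections.
print(outgoing_connections(users=100, shards=20, replicas=2))  # 4000

# A single shared cluster/proxy user needs only one connection per replica.
print(outgoing_connections(users=1, shards=20, replicas=2))  # 40
```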
|
@filimonov Ok, it sounds reasonable, thanks for the explanation. |
|
I have another idea. Cluster nodes check two filters for row policies - both for cluster's user and for initial user. It seems all we have to do to solve this task is to check access rights in the same way, i.e. provide access if both initial user and cluster's user on that node have corresponding grants. |
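A minimal sketch of that rule, assuming grants can be modelled as plain sets of strings (a simplification; `is_access_granted` and the grant names are hypothetical, not the ClickHouse access-control API):

```python
def is_access_granted(required: str,
                      initial_user_grants: set[str],
                      cluster_user_grants: set[str]) -> bool:
    """Allow the query on the remote node only if BOTH the initial user
    and the cluster's user hold the required grant."""
    return required in initial_user_grants and required in cluster_user_grants


alice = {"SELECT ON db.hits"}
cluster_user = {"SELECT ON db.hits", "INSERT ON db.hits"}

print(is_access_granted("SELECT ON db.hits", alice, cluster_user))  # both have it
print(is_access_granted("INSERT ON db.hits", alice, cluster_user))  # Alice lacks it
```

With this intersection rule a forged initial user cannot gain more rights than the cluster's user already has, but the cluster's user has to hold every grant that any initial user may need.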
Interesting, I was not aware of that.
That means that the cluster user should have a superset of the rights of all possible initial_users (I think in most use cases it will end up with super user/root rights, and that will lead to the situation where we check grants for 2 users, one of which has rights for everything anyway). That would prevent the possibility of raising permissions (i.e. a bad user can't get more rights by sending another initial_user header), but it still leaves a hole in user identification/trackability. I.e. if users Alice and Bob both work in the same department and have the same rights, Alice could pretend that she is Bob and drop some tables, and it will be impossible (or hard) to track such a situation in the logs (Bob could be fired by an angry boss before Bob is able to prove that it was Alice). So generally it sounds like the admin should be able to decide whether a particular user is allowed to set up / use the initial_user header. That corresponds to the PROXY grant introduced here.
Generally, what I like in that idea is that it reuses the existing concept of initial_user instead of introducing a new concept (proxied_user), but it sounds like it may be more complicated to maintain compatibility with existing code and a bit harder for version upgrades. But it sounds solvable: maybe we can give the proxy grant to all users initially for backward compatibility, and allow changing that (smth like BTW: one of the things we need to think of / clarify is the 'transitivity' of initial_user: Distributed over Distributed, or just a user passing the initial_user by manipulating headers. I didn't think about that yet, but that subject will definitely pop up. |
Yes, but there were worries (from @abyss7 ) even about storing password in RAM, so storing password on disk looks questionable.
This is still not enough, since most of the time proxy user will have all grants, and since right now you can send another proxied_user it is not safe (
initial_user (the initiator of the Distributed query) is not the same as proxied_user (the user who executes the local query) and should not be.
Not sure that this is an issue, since you will get entry in query_log for each remote query + initial query, so you can get information you need anyway (by using initial_query_id) |
|
So to summarize I'm still not sure about PROXYing over user authorization (get user credential and authorize user with them on the remote side, w/o any reconnect)...
And in this case there is no need for any PROXY grants at all
Looks like two opposite things here:
Something like this may help, but looks complex, and I don't see any real benefits over real user authorization... |
How? Storing user credentials in the session is a very, very bad idea: it is not safe and is not acceptable. For authentication schemes like Kerberos it requires asking for a lot of tickets to execute a single query, which is also not acceptable. |
It's mostly terminology. 1) We have a logged-in user.
I agree that it may lead to some issues due to existing initial_user usage patterns. But maybe it will not be so hard to review / check them. And it will be better for the users to have fewer entities / a simpler model compatible with the existing one. |
|
Proposal:
That sounds simple, should solve the issue, and is backward compatible. What do you think? |
|
I doubt there is a case where we should skip checking access rights for the initial user. Also it seems there is no reason to skip checking access rights for the remote users. And I think we should prefer a solution which would be consistent with how other queries are processed. I suppose access rights' checking for distributed queries should be implemented in the similar way: |
That is not done well or safely. I can drop all your databases just by having zookeeper access. You will not even know in that case who dropped your databases.
That sounds like optimization to avoid sending queries that can't be executed on remote nodes due to a lack of permissions.
Interesting idea.
That leaves the issue described before (trackability):
|
Ok, I agree it's better to make sure on a node that a passed |
Good point. |
Yes, but it's unclear what to do about that. Don't give access to your zookeeper to bad people) |
Some simple signature can be used to confirm that the data is created by proper person and was not changed, for example |
Ok. |
|
That secret should be inaccessible for users, should be configured between the servers (otherwise the user still can add bad queries into zookeeper w/o help of server). I think it should be either smth completely isolated, or derived from interserver credentials (those listed in remote_servers) |
The password/user is already stored for the connection. And we can use the password hash for authorization (it is already supported by the server; the client does not allow it, but it can be adjusted for this)
I guess that this kind of auth type will have some cache with TTL, that can address this (although not sure).
#11498 (comment) does not protect from user spoofing, but this is addressed in other comments.
AFAICS right now only one user is checked (+ #8926) ?
Distributed queries (SELECT and INSERT) do not use zookeeper, I guess you are talking about
So either user password should be used in the hash or some inter-server secret |
That is a security hole/weakness, categorized as CWE-312 or, more specifically, CWE-316
For Kerberos - you can't. Inside the ticket, there is the name of the target server (so you can't reuse a ticket for serverA to access serverB) and also timestamps (you can't reuse outdated tickets, and Kerberos requires clock sync, i.e. ntpd).
Yes (but similar problem exists there)
It should be an interserver secret. Using the user password as a secret for making signatures will allow the user to create valid signatures without the server's help. |
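A minimal sketch of such a signature, assuming an HMAC-SHA256 over the initial user name and the query text keyed with a secret known only to the servers. The message layout and the `sign` / `verify` helper names are assumptions made for illustration; nothing in this thread fixes the actual format.

```python
import hashlib
import hmac


def sign(interserver_secret: bytes, initial_user: str, query: str) -> str:
    """Sign (initial_user, query) with a secret that users never see.
    The NUL separator keeps the two fields from running into each other."""
    message = initial_user.encode() + b"\0" + query.encode()
    return hmac.new(interserver_secret, message, hashlib.sha256).hexdigest()


def verify(interserver_secret: bytes, initial_user: str, query: str, signature: str) -> bool:
    """Recompute the signature on the receiving node and compare in constant time."""
    expected = sign(interserver_secret, initial_user, query)
    return hmac.compare_digest(expected, signature)


secret = b"shared-between-servers-only"  # hypothetical value
sig = sign(secret, "alice", "SELECT 1")

print(verify(secret, "alice", "SELECT 1", sig))  # untouched request
print(verify(secret, "bob", "SELECT 1", sig))    # spoofed user name
```

Because the key is the interserver secret rather than anything derived from the user's password, a user who only knows their own credentials cannot produce a valid signature for another name.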
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add proxy users for distributed queries
Detailed description / Documentation draft:
Based on ideas from #9751
Usage example
For this the following had been added:
- CREATE USER proxied_user ALLOW PROXYING VIA proxy_user (covered)
- ALTER USER proxied_user [ALLOW|DENY] PROXYING VIA proxy_user (covered)
- allowed_proxy_user config directive
- allowed_proxy_users into system.users table

Where:
Left:
- system.processes uses current_user (and also log message from ProcessList.cpp)
- more testing
- non-clumsy GRANT
- exception constructor via fmt
- remote() support

Revs:
Conflicts: #11391 (outperform)
Fixes: #6843
Fixes: #9751
Cc: @filimonov
Cc: @abyss7
Details
HEAD: