
[CANCELED] Ability to preserve user for distributed queries#11391

Closed
azat wants to merge 4 commits into ClickHouse:master from azat:dist-preserve-initial_user

Conversation

@azat (Member) commented Jun 2, 2020

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Preserve initial_user for Distributed queries if user was not specified in remote_servers

Fixes: #6843

@azat azat marked this pull request as draft June 2, 2020 23:42
@blinkov blinkov added the pr-improvement Pull request with some product improvements label Jun 2, 2020
@qoega qoega added the doc-alert label Jun 3, 2020
@azat azat changed the title Preserve initial_user for Distributed queries if user is not specified in remote_servers Preserve initial_user for Distributed queries if user was not specified in remote_servers Jun 3, 2020
@azat azat force-pushed the dist-preserve-initial_user branch from 64e2cf4 to 586c3b3 Compare June 3, 2020 23:33
@azat azat marked this pull request as ready for review June 4, 2020 08:20
@abyss7 abyss7 self-assigned this Jun 4, 2020
@filimonov (Contributor) commented:

Related:
#9751
#8926

@abyss7 (Contributor) commented Jun 5, 2020

It doesn't look good in some cases.

  • What to do if the propagated user is not defined on the whole cluster?
  • How to handle situations when the incoming connection to the initiator is secured, but the outgoing one is not?
  • How to provide backward compatibility for current production clusters where all the settings and constraints are properly defined only for the 'default' user? Especially for the 1st case.

Also, it won't work if we implement any other authentication options, so it may only be a temporary solution for now.

My suggestion is to put this behavior behind a config option like <propagate_plain_user_password>1</propagate_plain_user_password>.
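As a sketch, such a gate could live in the server config. The option name is taken from the comment above; the root element and placement are assumptions, not an agreed design:

```xml
<!-- Illustrative sketch only: the option name comes from the comment above,
     and the root element/placement is an assumption. With the gate off (the
     default), distributed queries would keep using the per-shard user from
     remote_servers (or 'default'), preserving current behavior. -->
<yandex>
    <propagate_plain_user_password>1</propagate_plain_user_password>
</yandex>
```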

@abyss7 (Contributor) left a comment:

Put the new behavior under some config option.

@abyss7 (Contributor) left a comment:

Also you should make sure that the raw password isn't stored in RAM on a long-term basis, i.e. it should be sent to the remote server and then erased from memory immediately.

@azat azat changed the title Preserve initial_user for Distributed queries if user was not specified in remote_servers Ability to preserve user for distributed queries Jun 5, 2020
@azat (Member, Author) commented Jun 5, 2020

How to handle situations when incoming connection to the initiator is secured, but outcoming is not?

This PR does not change this; am I missing something?

How to provide backward compatibility for current production clusters where all the settings and constraints are properly defined only for the 'default' user? Especially for the 1st case.

Agree, backward compatibility should be defined.
I went without it for two reasons: simplicity (I forgot to mark this PR as RFC) and a misreading of #6843 (comment)

Also, it won't work if we implement any other authentication options, so it may only be a temporary solution for now.

Yes, and as @filimonov noted, there can be some issues for Kerberos/LDAP (since they require some keys IIRC).

Also you should make sure that the raw password isn't stored in RAM on a long-term basis, i.e. it should be sent to the remote server and then erased from memory immediately.

But it is already stored in ClientInfo, is that so significant? (Context::setUser)

Actually there is another way to go, suggested by @filimonov in #9751.
It looks like it will resolve all the issues (and remove the extra reconnect); any thoughts on this?

@azat azat marked this pull request as draft June 5, 2020 19:58
@abyss7 (Contributor) commented Jun 6, 2020

How to handle situations when the incoming connection to the initiator is secured, but the outgoing one is not?

This PR does not change this; am I missing something?

Yes, but AFAIK the current implementation never re-sends passwords.

Also you should make sure that the raw password isn't stored in RAM on a long-term basis, i.e. it should be sent to the remote server and then erased from memory immediately.

But it is already stored in ClientInfo, is that so significant? (Context::setUser)

Actually yes, so it's not getting worse.

Actually there is another way to go, suggested by @filimonov in #9751.
It looks like it will resolve all the issues (and remove the extra reconnect); any thoughts on this?

Do you plan to continue this PR anyway, or will you postpone it?

@azat (Member, Author) commented Jun 6, 2020

Do you plan to continue this PR anyway, or will you postpone it?

Sure, I will continue (I marked it as draft to avoid attracting extra attention until it is reworked/fixed).

Right now I have two options:

  • just introduce options to make it backward compatible (simple)
  • introduce proxy/impersonated users (as suggested in "Impersonate users in cluster" #9751; any cons to it?), but it will require more time

The proxy user does not have the mentioned limitations, but it looks a little oversized for this kind of support; I guess I will try it and see how it looks eventually.

@azat (Member, Author) commented Jun 7, 2020

Also, it won't work if we implement any other authentication options, so it may only be a temporary solution for now.

Actually, the same thing is used in StorageReplicatedMergeTree::sendRequestToLeaderReplica:

/// A: Don't think so, since the user should be the same for subsequent queries.
if (current_user != new_current_user)
{
    current_user = new_current_user;
@azat (Member, Author) left a comment:

Also, this is not the only place where the connection is established; another one is ConnectionPoolWithFailover->getServerRevision, and during this it may use a stale current_user.

(gdb) bt
0  DB::Connection::connect (this=0x7f7b09ab1118, timeouts=...) at ../src/Client/Connection.cpp:89
1  0x000000000e219675 in DB::Connection::getServerRevision (this=0x7f7b09ab1118, timeouts=...) at ../src/Client/Connection.cpp:286
2  0x000000000e22fac3 in DB::ConnectionPoolWithFailover::tryGetEntry (this=0x7f791fcff258, pool=..., timeouts=..., fail_message=...,
   settings=0x7f74adecf3b0, table_to_check=0x7f787e5b4bb0) at ../src/Client/ConnectionPoolWithFailover.cpp:232
3  0x000000000e230856 in DB::ConnectionPoolWithFailover::<>::operator() () at ../src/Client/ConnectionPoolWithFailover.cpp:153
...
9  std::__1::function<PoolWithFailoverBase<DB::IConnectionPool>::TryResult ()>::operator()() const () at ../contrib/libcxx/include/functional:2473
10 PoolWithFailoverBase<DB::IConnectionPool>::getMany() () at ../src/Common/PoolWithFailoverBase.h:225
11 0x000000000e230b8b in DB::ConnectionPoolWithFailover::getManyImpl() () at ../src/Client/ConnectionPoolWithFailover.cpp:211
12 0x000000000e23129f in DB::ConnectionPoolWithFailover::getManyChecked () at ../src/Client/ConnectionPoolWithFailover.cpp:156
13 0x000000000d6759b3 in DB::RemoteQueryExecutor::<lambda()>::operator() () at ../src/Interpreters/StorageID.h:77
...
20 DB::RemoteQueryExecutor::sendQuery (this=0x7f74adecf020) at ../src/DataStreams/RemoteQueryExecutor.cpp:149
...
46 0x00007f7b22308fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486

@azat azat changed the title Ability to preserve user for distributed queries [CANCELED] Ability to preserve user for distributed queries Jun 18, 2020
@azat azat closed this Jun 24, 2020
@azat azat deleted the dist-preserve-initial_user branch October 2, 2020 19:37

Labels

pr-improvement Pull request with some product improvements

Development

Successfully merging this pull request may close these issues.

Adding "use 'initial_user' option" to user specification in cluster definition

5 participants