Skip to content

Added new tests for session_log and fixed the inconsistency between login and logout.#52958

Merged
tavplubix merged 12 commits intoClickHouse:masterfrom
arenadata:ADQM-1011
Aug 7, 2023
Merged

Added new tests for session_log and fixed the inconsistency between login and logout.#52958
tavplubix merged 12 commits intoClickHouse:masterfrom
arenadata:ADQM-1011

Conversation

@Demilivor
Copy link
Contributor

@Demilivor Demilivor commented Aug 3, 2023

Added several new tests for session_log and fixed the inconsistency of the session_log records between user login and logout.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed the record inconsistency in session_log between login and logout.

Inconsistency comes from this behavior:
During a TCP session, the client authenticates as Alice,  with interface 'TCP' (set in TCPHandler::makeSession()) on the server side, and then server calls makeSessionContext() with this client_info.
After session setup, the client sends Protocol::Client::Query packet, the TCP handler accepts it, and then reads client_info.

    ClientInfo client_info = session->getClientInfo(); 
    if (client_tcp_protocol_version >= DBMS_MIN_REVISION_WITH_CLIENT_INFO)
        client_info.read(*in, client_tcp_protocol_version); // <- overwrites interface,  hostname, client_name and several other parameters important for the session_log record.

client_tcp_protocol_version >= DBMS_MIN_REVISION_WITH_CLIENT_INFO is true for our case

This client_info is used for Session::createQueryContext and it can differ from session->getClientInfo().

Session::createQueryContext uses client info from query_context for the Login Success session log record.
Session::~Session uses client info from Session::getClientInfo()

This can lead to these results in the session log:

Login Success: User Alice, HTTP (from query_context) <- Some fields are overwritten by client_info.read() by some not-good clients.
Logout:  User Alice, TCP (from session_context)

This pull request forces using Session::getClientInfo() for login success and logout session_log records.

I think we should verify client_info from the client in TCPHandler. It should validate 'interface' at least, it should not be HTTP, GRPC, or other, host, client_name, and protocol versions also must match between session->getClientInfo() and received client_info.

@clickhouse-ci
Copy link

clickhouse-ci bot commented Aug 3, 2023

This is an automatic comment. The PR descriptions does not match the template.

Please, edit it accordingly.

The error is: More than one changelog category specified: 'Build/Testing/Packaging Improvement', 'Bug Fix (user-visible misbehavior in an official stable release)'

@Demilivor Demilivor changed the title Added new tests for session_log and fixed the inconsistency between login and logout. DRAFT Added new tests for session_log and fixed the inconsistency between login and logout. Aug 3, 2023
@Demilivor Demilivor changed the title DRAFT Added new tests for session_log and fixed the inconsistency between login and logout. Added new tests for session_log and fixed the inconsistency between login and logout. Aug 3, 2023
@Demilivor
Copy link
Contributor Author

@tavplubix Please be a reviewer of this pull request and assign the label 'Can be tested'

@tavplubix tavplubix self-assigned this Aug 3, 2023
@tavplubix tavplubix added the can be tested Allows running workflows for external contributors label Aug 3, 2023
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-bugfix Pull request with bugfix, not backported by default label Aug 3, 2023
@robot-clickhouse-ci-2
Copy link
Contributor

robot-clickhouse-ci-2 commented Aug 3, 2023

This is an automated comment for commit 7dbc0d3 with description of existing statuses. It's updated for the latest CI running
The full report is available here
The overall status of the commit is 🟡 pending

Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR🟡 pending
Mergeable CheckChecks if all other necessary checks are successful🟢 success
Push to DockerhubThe check for building and pushing the CI related docker images to docker hub🟢 success

@Demilivor
Copy link
Contributor Author

Fast test failed with strange error:

2023-08-03 12:05:53 [ip-172-31-10-33] 2023.08.03 12:05:52.869718 [ 130985 ] {d7841282-2497-4c48-98d9-f427aadce8ff} <Error> executeQuery: Code: 46. DB::Exception: Unknown table function mysql:  or incorrect parameterized view. (UNKNOWN_FUNCTION) (version 23.8.1.1) (from [::ffff:127.0.0.1]:54922) (comment: 02834_remote_session_log.sh) (in query: SELECT * FROM mysql('127.0.0.1:9004', 'system', 'one', '02834_USER_193379', '[HIDDEN]')), Stack trace (when copying this message, always include the lines below):
2023-08-03 12:05:53 

Why mysql function is unknown?
These tests pass on my machine:

Running 3 stateless tests (MainProcess).

02835_drop_user_during_session:                                         [ OK ]
02834_remote_session_log:                                               [ OK ]
02833_concurrrent_sessions:                                             [ OK ]

3 tests passed. 0 tests skipped. 12.99 s elapsed (MainProcess).

@tavplubix
Copy link
Member

It's because we disable some features in Fast test to make build faster:

+ cmake /ClickHouse -DCMAKE_CXX_COMPILER=clang++-16 -DCMAKE_C_COMPILER=clang-16 -DENABLE_LIBRARIES=0 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_EMBEDDED_COMPILER=0 -DENABLE_THINLTO=0 -DENABLE_NURAFT=1 -DENABLE_SIMDJSON=1 -DENABLE_JEMALLOC=1 -DENABLE_LIBURING=1 -DCOMPILER_CACHE=sccache

and clickhouse was built without mysql support. It can be fixed with no-fasttest tag:

#!/usr/bin/env bash
# Tags: no-fasttest

@Demilivor
Copy link
Contributor Author

My tests do not pass on CI.

Also, I saw such an error:

2023.08.04 18:24:20.127471 [ 699 ] {044506f2-8724-48f6-a0fd-32523b27d2a5} <Error> executeQuery: Code: 497. DB::Exception: 02835_USER_1129: Not enough privileges. To execute this query it's necessary to have grant SELECT for at least one column on system.numbers. (ACCESS_DENIED) (version 23.8.1.1) (from [::ffff:127.0.0.1]:44592) (comment: 02835_drop_user_during_session.sh) (in query: SELECT COUNT(*) FROM system.numbers)

Here:
https://s3.amazonaws.com/clickhouse-test-reports/52958/be067e0ddf1752ae4d68de1742384b01a0f02859/stateless_tests__debug__[3_5]/clickhouse-server.err.log

It looks like the newly created user does not have the access rights to select from system.numbers.

@tavplubix tavplubix added the do not test disable testing on pull request label Aug 7, 2023
@tavplubix
Copy link
Member

Restarting tests again and again does not make sense. I will check the failures and merge if they are unrelated.

@tavplubix
Copy link
Member

Stress tests - check_status.tsv doesn't exists - #53064 (comment)
Upgrade check - Found signal in gdb.log - #52540 (comment)
Integration tests - test_storage_azure_blob_storage and test_zookeeper_config are flaky

@alexey-milovidov
Copy link
Member

@Demilivor
Copy link
Contributor Author

I don't know why those tests are so slow in the debug build.

02835_drop_user_during_session | OK | 63.3 <-- Too great time for such a small test.
02833_concurrrent_sessions | FAIL | 205.64 <-- Timeout fail

Release results:

02835_drop_user_during_session | OK | 2.16
02833_concurrrent_sessions | OK | 3.73

I got similar results on my machine as in the release version. I ran those tests hundreds of times, and the tests always passed.

ASAN and TSAN tests are also quite slow, but not so much. These tests rely on timeouts and expect some sufficient performance. It seems I have 2 ways:

  • Increase timeouts even more,
  • Prohibit tests from running in debug mode.

200 seconds is already too long tests for 5 threaded test, so I'm going to prohibit running tests in debug mode. (no-debug tag should do the job.) And make another pull request.

@Demilivor
Copy link
Contributor Author

Created PR with fixed tests: #53255

Please give some recommendations if you have such (special builds? compilation keys?) How should I run auto tests before creation PR to avoid such failures in the future?

@tavplubix
Copy link
Member

Please give some recommendations if you have such (special builds? compilation keys?) How should I run auto tests before creation PR to avoid such failures in the future?

It's hard to avoid that completely. We have Stateless tests flaky check in CI that is supposed to avoid that. It runs new/changed tests up to 100 times in parallel (unless there's a no-parallel tag) with ThreadFuzzer enabled. It helps a lot, but not always.

P.S. I'm sorry that it was reverted because of just one test failure. Usually, we only revert PRs that actually break something.

@Demilivor
Copy link
Contributor Author

Thanks! I did not know that. I just launched my *sh tests with thread fuzzer 10 times and got no fail (time increased more than 5 times with default settings)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors do not test disable testing on pull request pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants