Added new tests for session_log and fixed the inconsistency between login and logout. by Demilivor · Pull Request #52958 · ClickHouse/ClickHouse

Demilivor · 2023-08-03T00:44:45Z

Added several new tests for session_log and fixed the inconsistency of the session_log records between user login and logout.

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed the record inconsistency in session_log between login and logout.

Inconsistency comes from this behavior:
During a TCP session, the client authenticates as Alice, with interface 'TCP' (set in TCPHandler::makeSession()) on the server side, and then server calls makeSessionContext() with this client_info.
After session setup, the client sends Protocol::Client::Query packet, the TCP handler accepts it, and then reads client_info.

    ClientInfo client_info = session->getClientInfo(); 
    if (client_tcp_protocol_version >= DBMS_MIN_REVISION_WITH_CLIENT_INFO)
        client_info.read(*in, client_tcp_protocol_version); // <- overwrites interface,  hostname, client_name and several other parameters important for the session_log record.

client_tcp_protocol_version >= DBMS_MIN_REVISION_WITH_CLIENT_INFO is true for our case

This client_info is used for Session::createQueryContext and it can differ from session->getClientInfo().

Session::createQueryContext uses client info from query_context for the Login Success session log record.
Session::~Session uses client info from Session::getClientInfo()

This can lead to these results in the session log:

Login Success: User Alice, HTTP (from query_context) <- Some fields are overwritten by client_info.read() by some not-good clients.
Logout:  User Alice, TCP (from session_context)

This pull request forces using Session::getClientInfo() for login success and logout session_log records.

I think we should verify client_info from the client in TCPHandler. It should validate 'interface' at least, it should not be HTTP, GRPC, or other, host, client_name, and protocol versions also must match between session->getClientInfo() and received client_info.

clickhouse-ci · 2023-08-03T00:54:14Z

This is an automatic comment. The PR descriptions does not match the template.

Please, edit it accordingly.

The error is: More than one changelog category specified: 'Build/Testing/Packaging Improvement', 'Bug Fix (user-visible misbehavior in an official stable release)'

Demilivor · 2023-08-03T07:34:50Z

@tavplubix Please be a reviewer of this pull request and assign the label 'Can be tested'

tests/integration/test_session_log/test.py

robot-clickhouse-ci-2 · 2023-08-03T09:39:32Z

This is an automated comment for commit 7dbc0d3 with description of existing statuses. It's updated for the latest CI running
The full report is available here
The overall status of the commit is 🟡 pending

Check name	Description	Status
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	🟡 pending
Mergeable Check	Checks if all other necessary checks are successful	🟢 success
Push to Dockerhub	The check for building and pushing the CI related docker images to docker hub	🟢 success

Demilivor · 2023-08-03T11:14:59Z

Fast test failed with strange error:

2023-08-03 12:05:53 [ip-172-31-10-33] 2023.08.03 12:05:52.869718 [ 130985 ] {d7841282-2497-4c48-98d9-f427aadce8ff} <Error> executeQuery: Code: 46. DB::Exception: Unknown table function mysql:  or incorrect parameterized view. (UNKNOWN_FUNCTION) (version 23.8.1.1) (from [::ffff:127.0.0.1]:54922) (comment: 02834_remote_session_log.sh) (in query: SELECT * FROM mysql('127.0.0.1:9004', 'system', 'one', '02834_USER_193379', '[HIDDEN]')), Stack trace (when copying this message, always include the lines below):
2023-08-03 12:05:53

Why mysql function is unknown?
These tests pass on my machine:

Running 3 stateless tests (MainProcess).

02835_drop_user_during_session:                                         [ OK ]
02834_remote_session_log:                                               [ OK ]
02833_concurrrent_sessions:                                             [ OK ]

3 tests passed. 0 tests skipped. 12.99 s elapsed (MainProcess).

tavplubix · 2023-08-03T11:58:06Z

It's because we disable some features in Fast test to make build faster:

+ cmake /ClickHouse -DCMAKE_CXX_COMPILER=clang++-16 -DCMAKE_C_COMPILER=clang-16 -DENABLE_LIBRARIES=0 -DENABLE_TESTS=0 -DENABLE_UTILS=0 -DENABLE_EMBEDDED_COMPILER=0 -DENABLE_THINLTO=0 -DENABLE_NURAFT=1 -DENABLE_SIMDJSON=1 -DENABLE_JEMALLOC=1 -DENABLE_LIBURING=1 -DCOMPILER_CACHE=sccache

and clickhouse was built without mysql support. It can be fixed with no-fasttest tag:

ClickHouse/tests/queries/0_stateless/00900_parquet_time_to_ch_date_time.sh

Lines 1 to 2 in 01d664a

    
           #!/usr/bin/env bash 
        
           # Tags: no-fasttest

tests/queries/0_stateless/02833_concurrrent_sessions.sh

Demilivor · 2023-08-04T09:03:25Z

My tests do not pass on CI.

Also, I saw such an error:

2023.08.04 18:24:20.127471 [ 699 ] {044506f2-8724-48f6-a0fd-32523b27d2a5} <Error> executeQuery: Code: 497. DB::Exception: 02835_USER_1129: Not enough privileges. To execute this query it's necessary to have grant SELECT for at least one column on system.numbers. (ACCESS_DENIED) (version 23.8.1.1) (from [::ffff:127.0.0.1]:44592) (comment: 02835_drop_user_during_session.sh) (in query: SELECT COUNT(*) FROM system.numbers)

Here:
https://s3.amazonaws.com/clickhouse-test-reports/52958/be067e0ddf1752ae4d68de1742384b01a0f02859/stateless_tests__debug__[3_5]/clickhouse-server.err.log

It looks like the newly created user does not have the access rights to select from system.numbers.

tests/queries/0_stateless/02833_concurrrent_sessions.sh

tavplubix · 2023-08-07T09:08:11Z

Restarting tests again and again does not make sense. I will check the failures and merge if they are unrelated.

tavplubix · 2023-08-07T12:18:07Z

Stress tests - check_status.tsv doesn't exists - #53064 (comment)
Upgrade check - Found signal in gdb.log - #52540 (comment)
Integration tests - test_storage_azure_blob_storage and test_zookeeper_config are flaky

alexey-milovidov · 2023-08-10T01:10:33Z

The test is flaky: https://s3.amazonaws.com/clickhouse-test-reports/53059/e03258bbdc6dada1dbe00158a880415deb2f1db9/stateless_tests__debug__[5_5].html

Demilivor · 2023-08-10T03:12:01Z

I don't know why those tests are so slow in the debug build.

02835_drop_user_during_session | OK | 63.3 <-- Too great time for such a small test.
02833_concurrrent_sessions | FAIL | 205.64 <-- Timeout fail

Release results:

02835_drop_user_during_session | OK | 2.16
02833_concurrrent_sessions | OK | 3.73

I got similar results on my machine as in the release version. I ran those tests hundreds of times, and the tests always passed.

ASAN and TSAN tests are also quite slow, but not so much. These tests rely on timeouts and expect some sufficient performance. It seems I have 2 ways:

Increase timeouts even more,
Prohibit tests from running in debug mode.

200 seconds is already too long tests for 5 threaded test, so I'm going to prohibit running tests in debug mode. (no-debug tag should do the job.) And make another pull request.

Demilivor · 2023-08-10T05:00:17Z

Created PR with fixed tests: #53255

Please give some recommendations if you have such (special builds? compilation keys?) How should I run auto tests before creation PR to avoid such failures in the future?

tavplubix · 2023-08-10T10:58:43Z

Please give some recommendations if you have such (special builds? compilation keys?) How should I run auto tests before creation PR to avoid such failures in the future?

It's hard to avoid that completely. We have Stateless tests flaky check in CI that is supposed to avoid that. It runs new/changed tests up to 100 times in parallel (unless there's a no-parallel tag) with ThreadFuzzer enabled. It helps a lot, but not always.

P.S. I'm sorry that it was reverted because of just one test failure. Usually, we only revert PRs that actually break something.

Demilivor · 2023-08-11T03:01:59Z

Thanks! I did not know that. I just launched my *sh tests with thread fuzzer 10 times and got no fail (time increased more than 5 times with default settings)

Demilivor mentioned this pull request Aug 3, 2023

What kinds of issues system.session_log table has? #51760

Closed

Demilivor changed the title ~~Added new tests for session_log and fixed the inconsistency between login and logout.~~ DRAFT Added new tests for session_log and fixed the inconsistency between login and logout. Aug 3, 2023

Demilivor force-pushed the ADQM-1011 branch from 8bac76e to 27814dc Compare August 3, 2023 05:19

Demilivor changed the title ~~DRAFT Added new tests for session_log and fixed the inconsistency between login and logout.~~ Added new tests for session_log and fixed the inconsistency between login and logout. Aug 3, 2023

Demilivor commented Aug 3, 2023

View reviewed changes

tests/integration/test_session_log/test.py Outdated Show resolved Hide resolved

tavplubix self-assigned this Aug 3, 2023

tavplubix added the can be tested Allows running workflows for external contributors label Aug 3, 2023

robot-clickhouse-ci-2 added the pr-bugfix Pull request with bugfix, not backported by default label Aug 3, 2023

tavplubix approved these changes Aug 3, 2023

View reviewed changes

tests/queries/0_stateless/02833_concurrrent_sessions.sh Outdated Show resolved Hide resolved

Added new tests for session_log plus bugfix

1e7daca

Demilivor force-pushed the ADQM-1011 branch from 7cf1c9d to 1e7daca Compare August 3, 2023 22:06

Demilivor added 2 commits August 4, 2023 06:09

Reduced test flakiness

24cdaf3

removed no-parallel

be067e0

tests update

6010381

tavplubix reviewed Aug 4, 2023

View reviewed changes

tests/queries/0_stateless/02833_concurrrent_sessions.sh Outdated Show resolved Hide resolved

Demilivor added 8 commits August 4, 2023 21:53

timeout increase

86036fa

added missed comma

712ac83

Merge branch 'master' into ADQM-1011

c2f9410

Merge branch 'master' into ADQM-1011

b65a86c

Merge branch 'master' into ADQM-1011

476d5b7

Merge branch 'master' into ADQM-1011

da0c013

Merge branch 'master' into ADQM-1011

a5a7a12

Merge branch 'master' into ADQM-1011

7dbc0d3

tavplubix added the do not test disable testing on pull request label Aug 7, 2023

tavplubix merged commit 73d75cd into ClickHouse:master Aug 7, 2023

alexey-milovidov mentioned this pull request Aug 10, 2023

Revert "Added new tests for session_log and fixed the inconsistency between login and logout." #53247

Merged

Demilivor mentioned this pull request Aug 10, 2023

#2 Added new tests for session_log and fixed the inconsistency between login and logout. #53255

Merged

Conversation

Demilivor commented Aug 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Uh oh!

clickhouse-ci bot commented Aug 3, 2023

Uh oh!

Demilivor commented Aug 3, 2023

Uh oh!

Uh oh!

robot-clickhouse-ci-2 commented Aug 3, 2023 • edited by robot-clickhouse-ci-1 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Demilivor commented Aug 3, 2023

Uh oh!

tavplubix commented Aug 3, 2023

Uh oh!

Uh oh!

Demilivor commented Aug 4, 2023

Uh oh!

Uh oh!

tavplubix commented Aug 7, 2023

Uh oh!

tavplubix commented Aug 7, 2023

Uh oh!

alexey-milovidov commented Aug 10, 2023

Uh oh!

Demilivor commented Aug 10, 2023

Uh oh!

Demilivor commented Aug 10, 2023

Uh oh!

tavplubix commented Aug 10, 2023

Uh oh!

Demilivor commented Aug 11, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Demilivor commented Aug 3, 2023 •

edited

Loading

robot-clickhouse-ci-2 commented Aug 3, 2023 •

edited by robot-clickhouse-ci-1

Loading