
RabbitMQ improvements #12761

Merged
alesapin merged 40 commits into ClickHouse:master from kssenii:rabbitmq-improvements on Sep 8, 2020

Conversation

@kssenii
Member

@kssenii kssenii commented Jul 25, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improvements in StorageRabbitMQ: added connection and channel failure handling, proper commits, handling of insert failures, better exchanges, queue durability and the ability to resume a queue, and new queue settings. Fixed tests.

@kssenii
Member Author

kssenii commented Jul 25, 2020

I was reading some related articles and realized that I had made a logical mistake in the insert query part, so I thought I should fix it.
But then I thought: why not also add some improvements...

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jul 25, 2020
@alesapin alesapin self-assigned this Jul 27, 2020
Member

@alesapin alesapin left a comment

I know nothing about RabbitMQ :( Will try to review better next time.

local_exchange_declared = false;
LOG_ERROR(log, "Failed to declare local direct-exchange. Reason: {}", message);
});
std::atomic<bool> bindings_created = false, bindings_error = false;
Member

Why do they have to be atomic?

Member Author

Because otherwise you will later catch an error on the wait loop: "this loop is infinite; none of its condition variables (bindings_created, bindings_error) are updated in the loop body". The flags are set from callbacks running on another thread, not inside the loop body itself.

@kssenii kssenii force-pushed the rabbitmq-improvements branch from 834101f to 1db3f9e Compare July 31, 2020 20:34
@kssenii kssenii force-pushed the rabbitmq-improvements branch from 1db3f9e to d5b1332 Compare August 3, 2020 14:47
@kssenii kssenii force-pushed the rabbitmq-improvements branch from ff4b693 to 139a19d Compare August 8, 2020 18:41
Member

@alesapin alesapin left a comment

I need a new review type: Request clarifications.


auto new_context = std::make_shared<Context>(context);
if (!schema_name.empty())
new_context->setSetting("format_schema", schema_name);
Member

Why do we need to share the context in each read? Actually, here you just create another shared_ptr; all changes will be reflected in the original object.

looping_task->deactivate();
}

if ((update_channels = restoreConnection(true)))
Member

Quite unclear; let's do it on separate lines, because this flag is used later.

std::atomic<bool> stub = {false};
copyData(*in, *block_io.out, &stub);

/* Need to stop loop even if connection is ok, because sending ack() with loop running in another thread will lead to a lot of data
Member

Maybe we can have a separate connection for it? The current logic is quite complicated: it needs additional flags to stop the loop, to deactivate tasks, and so on.

Member Author

We can't have a separate connection. There is only one unique event loop for each connection, and the loop handles the onReceived() callbacks for the channel that made the consume() call. Committing messages with channel->ack() therefore has to happen on that same channel, which means the channel and the loop must share the same connection: a loop declared with a different connection would not be able to access that channel's callbacks.

}

UInt64 num_consumers = rabbitmq_settings.rabbitmq_num_consumers;
String exchange_type = rabbitmq_settings.rabbitmq_exchange_type.value;
Member

This is an incompatible change, because the order of the arguments changed.

parsed_address, global_context, login_password, routing_keys[0], local_exchange_name,
log, num_consumers * num_queues, bind_by_id, use_transactional_channel,
parsed_address, global_context, login_password, routing_keys, exchange_name, exchange_type,
producer_id.fetch_add(1), unique_strbase, persistent, wait_confirm, log,
Member

Is producer_id just a unique number?

Member Author

Kind of: a serial number used to represent a channel id.

connection->close();

size_t cnt_retries = 0;
while (!connection->closed() && ++cnt_retries != (RETRIES_MAX >> 1))
Member

What does this bit-shift mean?

Member Author

Just to decrease the number of retries.

Member

Confusing :)

while (!connection->closed() && ++cnt_retries != (RETRIES_MAX >> 1))
{
event_handler->iterateLoop();
std::this_thread::sleep_for(std::chrono::milliseconds(CONNECT_SLEEP >> 3));
Member

Same here.

Member Author

Is it bad to reuse constants in another context while modifying them for the needed case? Maybe I should just change connect_sleep to wait_sleep so it is not case-specific.

/// Commit
for (auto & stream : streams)
{
if (!stream->as<RabbitMQBlockInputStream>()->sendAck())
Member

What will happen if we have already successfully inserted data into all views, but one of the streams got an error sending the ack? Is data duplication possible?

Member Author

@kssenii kssenii Aug 28, 2020

Hm, false is returned by the sendAck function in only two cases:

  1. The connection failed. In this case all channels are closed and unable to send an ack. An ack is also based on delivery tags, which are unique to a channel, so if channels fail, those delivery tags become invalid and there is no way to send a specific ack from a different channel. In fact, once the server realizes it has messages in a queue waiting for a confirm from a channel that suddenly closed, it immediately makes those messages accessible to other consumers. So yes, in this case duplicates are inevitable.
  2. The size of the sent frame (the library's internal request interface) exceeds the max frame size, which is an internal library error. This is more common for message frames and unlikely to happen for an ack frame, I suppose. But I can add one more check: if sendAck failed and the connection is still usable, retry sending the ack instead of breaking from the loop; although if the frame size was exceeded the first time, I believe nothing will change the second time. Also, in this case it is OK if the ack failed, because the next attempt to send an ack on the same channel also commits all previously uncommitted messages, so there will be no duplicates. Anyway, I do not think this will ever happen for an ack frame.

Contributor

Sorry for butting in, but this explanation would make an ideal comment for this code. This also goes for most things you ever have to explain to someone about the code.

{
throw Exception("Cannot set up connection for producer", ErrorCodes::CANNOT_CONNECT_RABBITMQ);
}
if (setupConnection(false))
Member

But what happens in the bad case? Maybe we should throw an exception?

Member Author

Now it keeps attempting to reconnect in writingFunc and sets up the producer's channel the moment it manages to do so, but yes, it would be better to throw if it is unable to connect while in the buffer constructor.

String channel_id;
ConcurrentBoundedQueue<std::pair<UInt64, String>> payloads, returned;
UInt64 delivery_tag = 0;
std::atomic<bool> wait_all = true;
Member

Need comments for each flag.


@kssenii kssenii force-pushed the rabbitmq-improvements branch from a977457 to 7b0713b Compare September 1, 2020 08:29
Member

@alesapin alesapin left a comment

Not finished, will try tomorrow.

@kssenii kssenii force-pushed the rabbitmq-improvements branch 3 times, most recently from dd1c27b to e1ef558 Compare September 3, 2020 10:52
@alesapin
Member

alesapin commented Sep 4, 2020

test_host_ip_change/test.py::test_ip_change_update_dns_cache is a flaky test.

Member

@alesapin alesapin left a comment

Despite the code becoming more complicated, a lot of useful fixes and improvements were added. Now we have different failovers for interaction with RabbitMQ, virtual columns, new settings, fixes for race conditions, and so on. Also, the tests should become more stable. I think we can merge it, because the code is quite isolated and the tests are good.

@alesapin
Member

alesapin commented Sep 8, 2020

test_distributed_over_live_view is flaky.

@alesapin
Member

alesapin commented Sep 8, 2020

The OOM in the GCC build is not related to these changes.

@alesapin alesapin merged commit 4364bff into ClickHouse:master Sep 8, 2020