Optimize memory consumption during backup to S3 #45188
vitlibar merged 5 commits into ClickHouse:master
Conversation
Force-pushed 1f8e6d0 to 39153b0
I like that approach. It is actually the best way when we upload a file from a local disk to S3: when you copy a file, you know its size. Why do you still use an exponentially growing part upload size instead of some fixed size? And when you copy from S3 to S3, why do you prefer seekable reading from S3 over UploadPartCopyRequest? My concern is the following: with seekable reading you read a file twice, first to calculate the SHA hash and second to upload it to S3. Since this read is made with a fixed buffer size, memory consumption is low. Still, I think it is worth doing with the existing code in order not to have one more implementation based on UploadPartCopyRequest.
Yes you're right, that code was copied from
I don't quite understand. I don't prefer seekable reading over
Sorry. Indeed, you don't. You have a dedicated implementation for S3-to-S3 with UploadPartCopyRequest. This is great.
I did that. Now an uploaded file is split into parts of the same size.
Force-pushed 87e4f70 to a934fbf
…S3 to decrease memory consumption.
Correctly use the AbortMultipartUpload request. Support std::ios_base::end in StdStreamBufFromReadBuffer::seekpos().
Force-pushed a934fbf to 1c84518
```cpp
{
    task->req = fillUploadPartRequest(part_number, part_offset, part_size);
    ...
    schedule([this, task, task_finish_notify]()
```
Help me, please, to understand one thing here: how many concurrent uploads are run? Is it a limited thread pool?
My concern is that it runs a lot of tasks, each of which makes a new connection. It could run out of available file descriptors or hit another limit inside ClickHouse.
Compared with WriteBufferFromS3: WriteBufferFromS3 uses threads to write the data it has consumed, and if consuming is not fast, it just uploads in the background with one thread. However, WriteBufferFromS3 is also able to produce a lot of tasks. Here the situation is different: a lot of tasks are started at once, and each allocates at least 1 MiB (buffer) plus a new connection.
It's normally used with IOThreadPool. That pool is limited, of course: by default it runs up to 100 threads, so at most 100 parallel uploads. In practice the number must be even smaller, because IOThreadPool is also used for other tasks.
In fact it should use its own thread pools, backups_thread_pool and restores_thread_pool, so I changed the code to use the correct thread pools. Those thread pools are created specifically for backups and restores.
```cpp
for (size_t part_number = 1; position < end_position; ++part_number)
{
    if (multipart_upload_aborted)
```
Isn't it better to also check multipart_upload_aborted right before processUploadPartRequest, in order to stop work as fast as possible if a non-retryable error occurs?
My concern is that the code has to wait for all tasks that have been scheduled. If 1000 tasks were scheduled and only 100 of them are running, each finishing with a connection timeout, it would take 10 × timeout to finish the upload with an error.
It seems like the queue size is pretty big:

```cpp
auto queue_size = config.getUInt(".threadpool_writer_queue_size", 1000000);
```

and all tasks are pushed to the scheduler's queue at once. If the first task fails, there is no need to run the rest.
> Isn't it better to also check multipart_upload_aborted right before processUploadPartRequest?

Done, I changed the code.
Force-pushed 633d795 to 1a680b0
The test
… WriteBufferFromS3RequestsErrors Ref: ClickHouse#45188
```cpp
StdStreamFromReadBuffer(std::unique_ptr<ReadBuffer> buf, size_t size) : Base(&stream_buf), stream_buf(std::move(buf), size) { }
StdStreamFromReadBuffer(ReadBuffer & buf, size_t size) : Base(&stream_buf), stream_buf(buf, size) { }
```
I have some questions about this.

Base(&stream_buf) will call the following constructor:

```cpp
template <class _CharT, class _Traits>
basic_streambuf<_CharT, _Traits>::basic_streambuf()
    : __binp_(nullptr),
      __ninp_(nullptr),
      __einp_(nullptr),
      __bout_(nullptr),
      __nout_(nullptr),
      __eout_(nullptr)
{
}

template <class _CharT, class _Traits>
basic_streambuf<_CharT, _Traits>::basic_streambuf(const basic_streambuf& __sb)
    : __loc_(__sb.__loc_),
      __binp_(__sb.__binp_),
      __ninp_(__sb.__ninp_),
      __einp_(__sb.__einp_),
      __bout_(__sb.__bout_),
      __nout_(__sb.__nout_),
      __eout_(__sb.__eout_)
{
}
```

So `__ninp_` and `__einp_` are nullptr. But std::iostream will call sgetc() to get a char, so we will always call underflow(), which does not advance the position:

```cpp
inline _LIBCPP_HIDE_FROM_ABI_AFTER_V1
int_type sgetc() {
    if (__ninp_ == __einp_)
        return underflow();
    return traits_type::to_int_type(*__ninp_);
}
```
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimize memory consumption during backup to S3: files are now copied to S3 directly, without using WriteBufferFromS3 (which could use a lot of memory). This PR fixes #43083.