Avoid crash due to reading from remote server after disconnect in remote queries during cancellation#89740
Merged
azat merged 1 commit intoClickHouse:masterfrom Nov 10, 2025
Merged
Conversation
…uring cancellation
Previously only sendCancel() was protected against disconnection, but
RemoteQueryExecutor may read after sendCancel() and this will lead to
NULL pointer dereference:
2025.11.07 14:05:20.159362 [ 73554 ] {b7ebe401-aa59-4306-9663-278657eac110} <Debug> ReadBuffer: ReadBuffer is canceled by the exception: Code: 241. DB::Exception: Query memory tracker: fault injected. Would use 2.00 MiB (attempt to allocate chunk of 1.00 MiB), maximum: 9.31 GiB. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):
0. ./ci/tmp/build/./src/Common/Exception.cpp:131: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013c3c7df
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8902ce
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c88fd80
3. ./src/Common/Exception.h:141: DB::Exception::Exception<char const*, char const*, String, String, String>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<String>::type>, char const*&&, char const*&&, String&&, String&&, String&&) @ 0x0000000013cbcc21
4. ./ci/tmp/build/./src/Common/MemoryTracker.cpp:333: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000013cba10e
5. ./ci/tmp/build/./src/Common/MemoryTracker.cpp:476: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000013cb9dc4
6. ./ci/tmp/build/./src/Common/CurrentMemoryTracker.cpp:80: CurrentMemoryTracker::allocImpl(long, bool) @ 0x0000000013c0c079
7. ./src/Common/CurrentMemoryTracker.cpp:103: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x0000000013c096c7
8. ./src/Common/PODArray.h:167: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 0ul, 0ul>::reallocPowerOfTwoElements<>(unsigned long) @ 0x0000000013cc7f0e
9. ./src/Common/PODArray.h:243: DB::CompressedReadBufferBase::readCompressedData(unsigned long&, unsigned long&, bool) @ 0x000000001af881d7
10. ./ci/tmp/build/./src/Compression/CompressedReadBuffer.cpp:10: non-virtual thunk to DB::CompressedReadBuffer::nextImpl() @ 0x000000001af87922
11. ./ci/tmp/build/./src/IO/ReadBuffer.cpp:96: DB::ReadBuffer::next() @ 0x0000000013d245ed
12. ./src/IO/ReadBuffer.h:81: DB::NativeReader::read() @ 0x000000001a4e2e4f
13. ./ci/tmp/build/./src/Client/Connection.cpp:1422: DB::Connection::receiveDataImpl(DB::NativeReader&) @ 0x000000001a2edb59
14. ./ci/tmp/build/./src/Client/Connection.cpp:1434: DB::Connection::receivePacket() @ 0x000000001a2ed154
15. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398: DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x000000001a335768
16. ./ci/tmp/build/./src/QueryPipeline/RemoteQueryExecutorReadContext.cpp:67: DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x000000001779498d
17. ./ci/tmp/build/./src/Common/AsyncTaskExecutor.cpp:89: void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000017793e83
(version 25.10.1.3832 (official build))
2025.11.07 14:05:20.159570 [ 28135 ] {} <Debug> ReadBuffer: ReadBuffer is canceled by the exception: Code: 210. DB::NetException: Connection reset by peer, while reading from socket (peer: [::ffff:127.0.0.1]:57678, local: [::ffff:127.0.0.10]:9000). (NETWORK_ERROR), Stack trace (when copying this message, always include the lines below):
0. ./ci/tmp/build/./src/Common/Exception.cpp:131: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013c3c7df
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c8902ce
2. ./src/Common/NetException.h:26: DB::NetException::NetException<String, String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&, String&&) @ 0x0000000013e6d433
3. ./ci/tmp/build/./src/IO/ReadBufferFromPocoSocket.cpp:83: DB::ReadBufferFromPocoSocketBase::socketReceiveBytesImpl(char*, unsigned long) @ 0x0000000013e6f17b
4. ./ci/tmp/build/./src/IO/ReadBufferFromPocoSocket.cpp:107: DB::ReadBufferFromPocoSocketBase::nextImpl() @ 0x0000000013e6fa75
5. ./ci/tmp/build/./src/IO/ReadBuffer.cpp:96: DB::ReadBuffer::next() @ 0x0000000013d245ed
6. ./src/IO/ReadBuffer.h:81: DB::TCPHandler::runImpl() @ 0x000000001a46b7b0
7. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2836: DB::TCPHandler::run() @ 0x000000001a48ec99
8. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x000000001f54aa87
9. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x000000001f54af19
10. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x000000001f5116c7
11. ./base/poco/Foundation/src/Thread_POSIX.cpp:341: Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001f50fac1
12. ? @ 0x0000000000094ac3
13. ? @ 0x00000000001268c0
(version 25.10.1.3832 (official build))
2025.11.07 14:05:20.159568 [ 63583 ] {b7ebe401-aa59-4306-9663-278657eac110} <Trace> ReadFromParallelRemoteReplicasStep: (127.0.0.10:9000) Cancelling query because enough data has been read
BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
BaseDaemon: Stack trace: 0x000000001a2ebf7b 0x000000001a335768 0x000000001a33547e 0x000000001777d565 0x000000001a53597a 0x000000001a5300e6 0x000000001a534103 0x0000000013d908eb 0x0000000013d97d66 0x0000000013d8d812 0x0000000013d9549a 0x00007f521e393ac3 0x00007f521e4258c0
BaseDaemon: 2.0. inlined from ./src/IO/VarInt.h:98: DB::readVarUInt(unsigned long&, DB::ReadBuffer&)
BaseDaemon: 2. ./ci/tmp/build/./src/Client/Connection.cpp:1315: DB::Connection::receivePacket() @ 0x000000001a2ebf7b
BaseDaemon: 3. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398: DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x000000001a335768
BaseDaemon: 4. ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:253: DB::MultiplexedConnections::receivePacket() @ 0x000000001a33547e
BaseDaemon: 5. ./ci/tmp/build/./src/QueryPipeline/RemoteQueryExecutor.cpp:808: DB::RemoteQueryExecutor::finish() @ 0x000000001777d565
So let's simply throw in sendCancel() if the connection was lost
Contributor
|
Workflow [PR], commit [0d8387c] Summary: ❌
|
shankar-iyer
approved these changes
Nov 9, 2025
Member
Author
|
CI:
|
Merged
via the queue into
ClickHouse:master
with commit Nov 10, 2025
afc4cac
123 of 131 checks passed
robot-ch-test-poll4
added a commit
that referenced
this pull request
Nov 24, 2025
Cherry pick #89740 to 25.8: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
robot-clickhouse
added a commit
that referenced
this pull request
Nov 24, 2025
…r after disconnect in remote queries during cancellation
This was referenced Nov 24, 2025
robot-ch-test-poll4
added a commit
that referenced
this pull request
Nov 24, 2025
Cherry pick #89740 to 25.9: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
robot-clickhouse
added a commit
that referenced
this pull request
Nov 24, 2025
…r after disconnect in remote queries during cancellation
This was referenced Nov 24, 2025
robot-ch-test-poll4
added a commit
that referenced
this pull request
Nov 24, 2025
Cherry pick #89740 to 25.10: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
robot-clickhouse
added a commit
that referenced
this pull request
Nov 24, 2025
…er after disconnect in remote queries during cancellation
clickhouse-gh bot
added a commit
that referenced
this pull request
Nov 24, 2025
Backport #89740 to 25.10: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
rienath
added a commit
that referenced
this pull request
Nov 25, 2025
Backport #89740 to 25.8: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
rienath
added a commit
that referenced
this pull request
Nov 25, 2025
Backport #89740 to 25.9: Avoid crash due to reading from remote server after disconnect in remote queries during cancellation
Member
|
Sorry, I thought I was typing in console |
AVMusorin
pushed a commit
to AVMusorin/ClickHouse
that referenced
this pull request
Jan 28, 2026
The race condition causes a crash when reading from a disconnected
connection in distributed queries.
The `in` buffer:
```
(gdb) frame 1
(gdb) print this->in
$19 = {__ptr_ = 0x0, __cntrl_ = 0x0}
(gdb) print this->connected
$18 = false
```
But the same time:
```
(gdb) frame 2
async_callback=...) at ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398
(gdb) print this->sent_query
$15 = true
(gdb) print this->cancelled
$16 = false
(gdb) print this->active_connection_count
$17 = 1
```
Current solution will not fix the issue, it just apply a check like in
`sendCancel()`.
Related PRs:
- ClickHouse#89740
- ClickHouse#92807
- ClickHouse#93029
batkovic75
pushed a commit
to batkovic75/ClickHouse
that referenced
this pull request
Jan 30, 2026
The race condition causes a crash when reading from a disconnected
connection in distributed queries.
The `in` buffer:
```
(gdb) frame 1
(gdb) print this->in
$19 = {__ptr_ = 0x0, __cntrl_ = 0x0}
(gdb) print this->connected
$18 = false
```
But the same time:
```
(gdb) frame 2
async_callback=...) at ./ci/tmp/build/./src/Client/MultiplexedConnections.cpp:398
(gdb) print this->sent_query
$15 = true
(gdb) print this->cancelled
$16 = false
(gdb) print this->active_connection_count
$17 = 1
```
Current solution will not fix the issue, it just apply a check like in
`sendCancel()`.
Related PRs:
- ClickHouse#89740
- ClickHouse#92807
- ClickHouse#93029
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Avoid crash due to reading from remote server after disconnect in remote queries during cancellation. Resolves #89468
Details
Previously only
sendCancel()was protected against disconnection, butRemoteQueryExecutormay read aftersendCancel()and this will lead to NULL pointer dereference:So let's simply throw in
sendCancel()if the connection was lost.CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=89712&sha=9eebfa48a4c58a959ae9b57f3fa6e6eaa5bac61e&name_0=PR