miner: Avoid stack-use-after-return in validationinterface#18742
miner: Avoid stack-use-after-return in validationinterface#18742fanquake merged 4 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Oh, great find! How did you find this one and in what commit was it introduced? :) |
Good question. This is the traceback: |
|
I might write a test tomorrow with steps to reproduce |
faadd38 to
fab402a
Compare
fab402a to
bbbbf38
Compare
|
@practicalswift This is a race, so I couldn't find a test that is reproducible. Though, I have attached a unit test, that fails when run long enough in a loop. All you need to do is |
|
If someone has issues reproducing the crash in the unit test, the following diff might help: diff --git a/src/validationinterface.cpp b/src/validationinterface.cpp
index 11000774c0..6a311eeb44 100644
--- a/src/validationinterface.cpp
+++ b/src/validationinterface.cpp
@@ -11,6 +11,7 @@
#include <primitives/block.h>
#include <primitives/transaction.h>
#include <scheduler.h>
+#include <util/time.h>
#include <future>
#include <unordered_map>
@@ -80,6 +81,7 @@ public:
++it->count;
{
REVERSE_LOCK(lock);
+ UninterruptibleSleep(std::chrono::microseconds(500));
f(*it->callbacks);
}
it = --it->count ? std::next(it) : m_list.erase(it); |
|
Concept ACK. Running just fa9f3a51897d5457492f84f7c8a41fea783d687d (no additional patch), I generally see a crash between 5 - 20 iterations: /bitcoin# while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done
Running 1 test case...
*** No errors detected
Running 1 test case...
*** No errors detected
Running 1 test case...
*** No errors detected
Running 1 test case...
*** No errors detected
Running 1 test case...
=================================================================
==95792==ERROR: AddressSanitizer: unknown-crash on address 0x7f8f04f60dc0 at pc 0x55abc9388623 bp 0x7f8f057777f0 sp 0x7f8f057777e8
READ of size 8 at 0x7f8f04f60dc0 thread T4
#0 0x55abc9388622 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14::operator()(CValidationInterface&) const /bitcoin/src/validationinterface.cpp:241:75
#1 0x55abc9388622 in void MainSignalsInstance::Iterate<CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14>(CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14&&) /bitcoin/src/validationinterface.cpp:83
#2 0x55abc9388622 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&) /bitcoin/src/validationinterface.cpp:241
#3 0x55abc8b9c94c in validationinterface_tests::unregister_validation_interface_race::test_method()::$_0::operator()() const /bitcoin/src/test/validationinterface_tests.cpp:28:30
#4 0x55abc8b9c94c in void std::__invoke_impl<void, validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>(std::__invoke_other, validationinterface_tests::unregister_validation_interface_race::test_method()::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60
#5 0x55abc8b9c94c in std::__invoke_result<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>::type std::__invoke<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>(validationinterface_tests::unregister_validation_interface_race::test_method()::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95
#6 0x55abc8b9c94c in decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244
#7 0x55abc8b9c94c in std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253
#8 0x55abc8b9c94c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196
#9 0x7f8f0c6d4b2e (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbb2e)
#10 0x7f8f0cccefa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
#11 0x7f8f0c3a34ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)
Address 0x7f8f04f60dc0 is located in stack of thread T5 at offset 32 in frame
#0 0x55abc8b9d1af in std::thread::_State_impl<std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_1> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196
This frame has 1 object(s):
[32, 40) 'sub.i.i.i.i.i' <== Memory access at offset 32 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T5 created by T0 here:
#0 0x55abc7d1002d in pthread_create (/bitcoin/src/test/test_bitcoin+0x37702d)
#1 0x7f8f0c6d4db4 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbdb4)
#2 0x55abc8b97b1e in validationinterface_tests::unregister_validation_interface_race_invoker() /bitcoin/src/test/validationinterface_tests.cpp:19:1
#3 0x55abc7ded77f in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118:11
SUMMARY: AddressSanitizer: unknown-crash /bitcoin/src/validationinterface.cpp:241:75 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14::operator()(CValidationInterface&) const
Shadow bytes around the buggy address:
0x0ff2609e4160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff2609e41b0: 00 00 00 00 00 00 00 00[00]00 00 00 00 00 00 00
0x0ff2609e41c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
Thread T4 created by T0 here:
#0 0x55abc7d1002d in pthread_create (/bitcoin/src/test/test_bitcoin+0x37702d)
#1 0x7f8f0c6d4db4 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbdb4)
#2 0x55abc8b97b1e in validationinterface_tests::unregister_validation_interface_race_invoker() /bitcoin/src/test/validationinterface_tests.cpp:19:1
#3 0x55abc7ded77f in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118:11
==95792==ABORTING |
|
cc @promag , @ryanofsky You seem qualified to review this? |
promag
left a comment
There was a problem hiding this comment.
Code review ACK bbbbf387d9a35f095a80985810a461f87a21ebde.
Like @ryanofsky suggested, we should only have RegisterValidationInterface(shared_ptr) so this is a step in that direction.
src/rpc/mining.cpp
Outdated
There was a problem hiding this comment.
An alternative is to call SyncWithValidationInterfaceQueue here.
There was a problem hiding this comment.
Alternative to what? The notification we are listening to is never put in the queue.
There was a problem hiding this comment.
I mean that after UnregisterValidationInterface + SyncWithValidationInterfaceQueue there is no pointer to sc in validation queue.
There was a problem hiding this comment.
How does SyncWithValidationInterfaceQueue prevent new notifications from being scheduled for sc?
There was a problem hiding this comment.
New notifications won't be scheduled because the UnregisterValidationInterface above. Only problem is a concurrent notification being called, so SyncWithValidationInterfaceQueue ensures that case doesn't happen.
There was a problem hiding this comment.
Now I understand. Yeah, correct.
|
Could you revert or move formatting changes in validationinterface.h/cpp files to a separate commit? I'm starting at that looking for actual changes but is it all reformatting? |
|
Concept ACK. This is a good find and all changes look ok, but there's so much going on it's a headache to reverse engineer everything that's happening. Suggest making one change per commit or perhaps dropping some of the changes |
|
It is also interesting if the bug described #18742 (comment) started happening recently. Unless I missed something, it seems like the bug has been present a very long time, and I wonder if #18524 made it more likely to happen. |
bbbbf38 to
fa33a63
Compare
|
Addressed feedback by @ryanofsky. No code changes, just more commits. |
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK fa33a632bf0240794561dbabb8fe14e999ac6504. Great find and changes are very clear now!
|
@jnewbery in this case it's the test that's fixed so what you suggest is to just add the |
|
I can put the test in
|
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done
…hValidationInterfaceQueue For the purpose of this test the two have the same outcome, but this one is shorter and avoids a sleep for 0.1 seconds.
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason.
fa6d1a0 to
7777f2a
Compare
|
Pushed an empty diff to defuse the test. Added instructions to the commit message on how to add the fuse back in. |
|
@MarcoFalke did you see #18742 (comment)? |
|
@promag Yes, but that causes a deadlock on shutdown. Also the shutdown sequence should be bug free right now because the message handler is stopped before any unregisters should happen. I still like your idea, but I think it should be prepared, reviewed and merged for 0.21, not as a backport. For the backport we should aim at a fix that has little chance of breaking stuff. The fix here should satisfy this because the same fix has already been applied to the wallet in #18338, which is also in 0.20 |
|
Code review ACK 7777f2a |
|
Code review ACK 7777f2a. |
|
For backport, only the first and last commit are needed. |
Thanks. Will have this up in #18973 shortly. |
|
post merge code review ACK 7777f2a |
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done Github-Pull: bitcoin#18742 Rebased-From: fab6d06
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason. Github-Pull: bitcoin#18742 Rebased-From: 7777f2a
245c862 test: disable script fuzz tests (fanquake) 9a8fb4c fuzz: Remove enumeration of expected deserialization exceptions in ProcessMessage(...) fuzzer (practicalswift) 6161c94 [net processing] Only send a getheaders for one block in an INV (John Newbery) cf2a6e2 test: Remove const to work around compiler error on xenial (Wladimir J. van der Laan) cc7d344 miner: Avoid stack-use-after-return in validationinterface (MarcoFalke) 37a6207 test: Add unregister_validation_interface_race test (MarcoFalke) ff4dc20 gui: Fix manual coin control with multiple wallets loaded (João Barbosa) ed0afe8 test: Add test for conflicted wallet tx notifications (Russell Yanofsky) 251e321 rpc: Relock wallet only if most recent callback (João Barbosa) ca4dac4 rpc: Add mutex to guard deadlineTimers (João Barbosa) a3fe458 [docs] Improve commenting in ProcessGetData() (John Newbery) 011532e [test] test that an invalid GETDATA doesn't prevent processing of future messages (Amiti Uttarwar) 1e73d72 [net processing] ignore unknown INV types in GETDATA messages (Amiti Uttarwar) fb82173 [net processing] ignore tx GETDATA from blocks-only peers (Amiti Uttarwar) 315ae14 gui: Fix itemWalletAddress leak when not tree mode (João Barbosa) Pull request description: Backports the following PRs to the 0.20 branch: * #18578: gui: Fix leak in CoinControlDialog::updateView * #18808: [net processing] Drop unknown types in getdata * #18814: rpc: Relock wallet only if most recent callback * #18878: test: Add test for conflicted wallet tx notifications * #18894: gui: Fix manual coin control with multiple wallets loaded * #18742: miner: Avoid stack-use-after-return in validationinterface * #18962: net processing: Only send a getheaders for one block in an INV * #18975: test: Remove const to work around compiler error on xenial ACKs for top commit: promag: Tested ACK 245c862 coin control with multiple wallets. laanwj: ACK 245c862 MarcoFalke: ACK 245c862 solved the conflicts myself as a sanity check. Did not re-review 🍷 Tree-SHA512: 285e5a5fad5bbeba6032742c65dc68836e8eccfcceda9e69fec4ddd162a3f61679a96f9bbe3d434267835af67c21ac4c05accf6f63e827c2eb47203c6daafe31
Summary: This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [1/3] bitcoin/bitcoin@fa770ce Test Plan: `ninja` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D9103
…hValidationInterfaceQueue Summary: For the purpose of this test the two have the same outcome, but this one is shorter and avoids a sleep for 0.1 seconds. This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [2/3] bitcoin/bitcoin@fa5ceb2 Depends on D9103 Test Plan: `ninja check` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D9104
Summary: This is a squash of 2 commits, to add the unbroken test and the solution at the same time, and a revert of D5140 (different solution to the same problem) > test: Add unregister_validation_interface_race test bitcoin/bitcoin@fab6d06 > miner: Avoid stack-use-after-return in validationinterface > > This is achieved by switching to a shared_ptr. > > Also, switch the validationinterfaces in the tests to use shared_ptrs > for the same reason. bitcoin/bitcoin@7777f2a This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [3/3] This replaces the solution in D5140 See comment https://reviews.bitcoinabc.org/D5140#124338 Depends on D9104 Test Plan: Test suggested by this PR: ``` export ASAN_OPTIONS=detect_leaks=0 ninja && ninja check while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done ``` Test from D5140: ``` CC=clang CXX=clang++ cmake -GNinja .. -DCMAKE_BUILD_TYPE=Debug ninja check-extended ``` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D9105
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done Github-Pull: #18742 Rebased-From: fab6d06
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason. Github-Pull: #18742 Rebased-From: 7777f2a
When a validationinterface has itself unregistered in one thread, but is about to get executed in another thread [1], there is a race:
[1]
bitcoin/src/validationinterface.cpp
Lines 82 to 83 in 6413980
This happens in the miner. More generally it happens everywhere where at least one thread is generating notifications and another one is unregistering a validationinterface.
This issue has been fixed in commit ab31b9d, but the fix has not been applied to the miner.
Example where this happened in practice: https://travis-ci.org/github/bitcoin/bitcoin/jobs/675322230#L4414