Skip to content

Reverse cs_main, cs_wallet lock order and reduce cs_main locking#16426

Merged
maflcko merged 5 commits intobitcoin:masterfrom
ariard:2019-07-remove-more-locking-chain
May 1, 2020
Merged

Reverse cs_main, cs_wallet lock order and reduce cs_main locking#16426
maflcko merged 5 commits intobitcoin:masterfrom
ariard:2019-07-remove-more-locking-chain

Conversation

@ariard
Copy link

@ariard ariard commented Jul 20, 2019

This change is intended to make the bitcoin node and its rpc, network and gui interfaces more responsive while the wallet is in use. Currently, because the node's cs_main mutex is always locked before the wallet's cs_wallet mutex (to prevent deadlocks), cs_main currently stays locked while the wallet does relatively slow things like creating and listing transactions.

Switching the lock order so cs_main is acquired after cs_wallet allows cs_main to be only locked intermittently while the wallet is doing slow operations, so the node is not blocked waiting for the wallet.

To review the present PR, most of getting right the move is ensuring any LockAssertion in Chain::Lock method is amended as a LOCK(cs_main). And in final commit, check that any wallet code which was previously locking the chain is now calling a method, enforcing the lock taking job. So far the only exception I found is handleNotifications, which should be corrected.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 20, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor

ryanofsky commented Jul 20, 2019

Great work! I'd suggest changing the PR title to something like "Reverse cs_main, cs_wallet lock order and reduce cs_main locking" to motivate it better and describe the change.

The current title "Remove Chain::Lock interface" and starting sentence "Follow-up in the set of Chain interface refactoring" make this sound mostly like a refactoring. But this is more of a locking change, and a change to make the wallet more asynchronous and event driven than a refactoring change.

@ariard ariard changed the title Remove Chain::Lock interface Reverse cs_main, cs_wallet lock order and reduce cs_main locking Jul 20, 2019
@practicalswift
Copy link
Contributor

Concept ACK

Excellent work!

@maflcko
Copy link
Member

maflcko commented Jul 22, 2019

Concept ACK. This might help IBD, because cs_main had to be acquired due to the lock order requirement. If it doesn't help IBD, the change will hopefully still speed up the msghand thread because the wallets take the main lock less.

@jnewbery
Copy link
Contributor

Big concept ACK! Thanks @ariard

@promag
Copy link
Contributor

promag commented Jul 25, 2019

Concept ACK, mother of god, not locking cs_main!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding an interfaces::Chain::checkFinalTx method, it might be better to call IsFinalTx directly with the wallet's height and time, avoiding going through CheckFinalTx.

This could perform better since it wouldn't require locking cs_main, and could make wallet calls return more internally consistent information when the wallet is catching up to the chain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #16426 (comment)

Instead of adding an interfaces::Chain::checkFinalTx method, it might be better to call IsFinalTx

#17443 is a step in this direction (#17443 (review))

@ariard
Copy link
Author

ariard commented Nov 11, 2019

Finally rebased after merge of #15931, PR should be ready for review.

Apart of 3efe38d which use the new m_last_block_processed_height to avoid querying the chainstate and introduce some modifications, other commits are pretty straight-forward. It's just taking cs_main inside the Chain implementation instead of using Chain::lock. Lock order is reversed only in last commit f057aed to avoid any deadlock issue.

If you still feel PR is hard to review, I can subsplit easily in another PR.

@ryanofsky
Copy link
Contributor

Approach ACK. Code changes here are very simple after #15931.

All the changes before the last commit f057aedf80d8bd6083c2227e42d4887be4c2933b seem straightforward and don't change behavior other than locking cs_main in new places recursively, so all the new locks are no-ops.

Only the last commit f057aedf80d8bd6083c2227e42d4887be4c2933b is the big scary change that removes cs_main locks all over the wallet, leaving us to hope that remaining locking is sufficient and that stretches of wallet code that used to run under cs_main aren't making assumptions about the tip not changing and other wallet threads not running.


I'll review this PR more closely this week. It might be nice to simplify the PR description now that #15931 is merged. I think the description just basically needs to say that interfaces::Chain::Lock is a wrapper around around cs_main, and that this PR changes wallet code to not lock and unlock cs_main directly anymore, and not keep cs_main locked while it locks cs_wallet and does other work. Instead, after this PR, wallet code only locks cs_main indirectly and intermittently when it needs to look up bits of chain information, and never holds onto cs_main while it does other work.

@jnewbery
Copy link
Contributor

Restating my concept ACK. I plan to review this fully soon.

Thanks for rebasing this so quickly

@ariard ariard force-pushed the 2019-07-remove-more-locking-chain branch from f057aed to 8aea39a Compare November 11, 2019 21:50
@ariard
Copy link
Author

ariard commented Nov 11, 2019

Only the last commit f057aed is the big scary change that removes cs_main locks all over the wallet, leaving us to hope that remaining locking is sufficient and that stretches of wallet code that used to run under cs_main aren't making assumptions about the tip not changing and other wallet threads not running.

Most of the code making assumptions about the tip is confined in the rescan logic, so I think this one should get the focus, you can grep for all methods fetching tip like getHeight, getBlockHeight getX and check no decision is made on return value of 2 different calls of them. Concerning other wallet threads, it should be cover by wallet lock in itself.

@ariard ariard force-pushed the 2019-07-remove-more-locking-chain branch 5 times, most recently from 7ef1044 to 0bdbc43 Compare November 12, 2019 18:58
@ariard ariard force-pushed the 2019-07-remove-more-locking-chain branch from 0bdbc43 to f59c5c2 Compare November 21, 2019 19:42
@meshcollider
Copy link
Contributor

Concept ACK

@jonatack
Copy link
Member

jonatack commented May 1, 2020

All good, I agree!

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 2, 2020
7918c1b test: Add CreateWalletFromFile test (Russell Yanofsky)

Pull request description:

  Add unit test calling CreateWalletFromFile, which isn't currently called from other unit tests, with some basic checks to make sure it rescans and registers for notifications correctly.

  Motivation for this change was to try to write a test that would fail without the early `handleNotifications` call in ef8c6ca from bitcoin#16426, but succeed with it:

  https://github.com/bitcoin/bitcoin/blob/ef8c6ca60767cac589d98ca57ee33179608ccda8/src/wallet/wallet.cpp#L3978-L3986

  However, writing a full test for the race condition that call prevents isn't possible without the locking changes from bitcoin#16426. So this PR just adds as much test coverage as is possible now.

  This new test is also useful for bitcoin#15719, since it detects the stale notifications.transactionAddedToMempool notifications that PR eliminates.

ACKs for top commit:
  MarcoFalke:
    ACK 7918c1b
  jonatack:
    ACK 7918c1b

Tree-SHA512: 44035aee698ecb722c6039d061d8fac2011e9da0b314e4aff19be1d610b53cacff99016b34d6b84669bb3b61041b2318d9d8e3363658f087802ae4aa36ca17b8
@promag
Copy link
Contributor

promag commented May 3, 2020

I'm also still curious about @promag's comment in #16426 (review) and possible effects on the GUI.

@ryanofsky missed this, AFAICT this doesn't add new issues to the GUI.

domob1812 added a commit to domob1812/namecoin-core that referenced this pull request May 4, 2020
Upstream bitcoin/bitcoin#16426 reversed the lock
order between the wallet and cs_main.  In the name code, we had to
do the same updates in some places.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jun 1, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with stale and out of order notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This is new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jun 5, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jun 5, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with stale and out of order notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This is new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jun 5, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jun 5, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 1, 2020
This just drops two interfaces::Chain methods replacing them with other calls, not changing behavior in any way.

Motivation was

- Need to get rid of findFirstBlockWithTimeAndHeight for bitcoin#10102
- Followup from bitcoin#16426 (comment)
- phantomcircuit comments about findNextBlock test http://www.erisian.com.au/bitcoin-core-dev/log-2020-06-06.html#l-214
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 1, 2020
This just drops two interfaces::Chain methods replacing them with other calls.

Motivation for removing these chain methods:

- Need to get rid of findFirstBlockWithTimeAndHeight for bitcoin#10102, which doesn't
  support overloaded methods
- Followup from
  bitcoin#16426 (comment)
- phantomcircuit comments about findNextBlock test
  http://www.erisian.com.au/bitcoin-core-dev/log-2020-06-06.html#l-214

Behavior is not changing in any way here. A TODO comment in
ScanForWalletTransactions was removed, but just because it was invalid (see
bitcoin#19195 (comment)), not
because it was implemented.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 10, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 10, 2020
This just drops two interfaces::Chain methods replacing them with other calls.

Motivation for removing these chain methods:

- Need to get rid of findFirstBlockWithTimeAndHeight for bitcoin#10102, which doesn't
  support overloaded methods
- Followup from
  bitcoin#16426 (comment)
- phantomcircuit comments about findNextBlock test
  http://www.erisian.com.au/bitcoin-core-dev/log-2020-06-06.html#l-214

Behavior is not changing in any way here. A TODO comment in
ScanForWalletTransactions was removed, but just because it was invalid (see
bitcoin#19195 (comment)), not
because it was implemented.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 11, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 11, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 13, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 14, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jul 21, 2020
This just drops three interfaces::Chain methods replacing them with other calls.

Motivation for removing these chain methods:

- Need to get rid of findFirstBlockWithTimeAndHeight for bitcoin#10102, which doesn't
  support overloaded methods
- Followup from
  bitcoin#16426 (comment)
- phantomcircuit comments about findNextBlock test
  http://www.erisian.com.au/bitcoin-core-dev/log-2020-06-06.html#l-214

Behavior is not changing in any way here. A TODO comment in
ScanForWalletTransactions was removed, but just because it was invalid (see
bitcoin#19195 (comment)), not
because it was implemented.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Aug 7, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Aug 7, 2020
Move wallet startup code closer to a simple model where the wallet attaches to
the chain with a single chain.handleNotifications() call, and just starts
passively receiving blocks and mempool notifications from the last update point,
instead having to actively rescan blocks and request a mempool snapshot, and
deal with the tip changing, and deal with early or stale notifications.

Also, stop locking the cs_wallet mutex and registering for validationinterface
notifications before the rescan. This was new behavior since
6a72f26
bitcoin#16426 and is not ideal because it stops
other wallets and rpcs and the gui from receiving new notifications until after the
scan completes.

This change is a half-step towards implementing multiwallet parallel scans
(bitcoin#11756), since it provides needed
locator and birthday timestamp information to the Chain interface, and it
rationalizes locking and event ordering in the startup code. The second half of
implementing parallel rescans requires moving the ScanForWalletTransactions
implementation (which this PR does not touch) from the wallet to the node.
@hebasto
Copy link
Member

hebasto commented Sep 19, 2020

FWIW, this PR causes the issue #19049.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.