Skip to content

wallet: cache IsMine scriptPubKeys to improve performance of descriptor wallets#26008

Merged
fanquake merged 7 commits intobitcoin:masterfrom
achow101:improve-many-desc-ismine
Feb 20, 2024
Merged

wallet: cache IsMine scriptPubKeys to improve performance of descriptor wallets#26008
fanquake merged 7 commits intobitcoin:masterfrom
achow101:improve-many-desc-ismine

Conversation

@achow101
Copy link
Member

@achow101 achow101 commented Sep 4, 2022

Wallets that have a ton of non-ranged descriptors (such as a migrated non-HD wallet) perform fairly poorly due to looping through all of the wallet's ScriptPubKeyMans. This is done in various places, such as IsMine, and helper functions for fetching a ScriptPubKeyMan and a SolvingProvider. This also has a bit of a performance impact on standard descriptor wallets, although less noticeable due to the small number of SPKMs.

As these functions are based on doing IsMine for each ScriptPubKeyMan, we can improve this performance by caching IsMine scriptPubKeys for all descriptors and use that to determine which ScriptPubKeyMan to actually use for those things. This cache is used exclusively and we no longer iterate the SPKMs.

Also added a benchmark for IsMine.

@achow101 achow101 marked this pull request as ready for review September 4, 2022 23:03
@achow101 achow101 force-pushed the improve-many-desc-ismine branch from 4aa65c3 to 23dffde Compare September 4, 2022 23:10
@DrahtBot DrahtBot added the Wallet label Sep 5, 2022
@achow101 achow101 force-pushed the improve-many-desc-ismine branch from 23dffde to 34590dd Compare September 5, 2022 04:32
@achow101
Copy link
Member Author

achow101 commented Sep 5, 2022

Dropped the signing changes as they were causing issues with being able to sign transactions that we have the keys for but not the descriptors. #23417 would allow us to resolve the performance issues for signing while retaining this behavior.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 5, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky, josibake, furszy
Concept ACK aureleoules, darosior, willcl-ark

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29136 (wallet: addhdkey RPC to add just keys to wallets via new void(KEY) descriptor by achow101)
  • #29130 (wallet: Add createwalletdescriptor and gethdkeys RPCs for adding new automatically generated descriptors by achow101)
  • #28574 (wallet: optimize migration process, batch db transactions by furszy)
  • #28333 (wallet: Construct ScriptPubKeyMans with all data rather than loaded progressively by achow101)
  • #28201 (Silent Payments: sending by josibake)
  • #27865 (wallet: Track no-longer-spendable TXOs separately by achow101)
  • #27286 (wallet: Keep track of the wallet's own transaction outputs in memory by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Comment on lines 3471 to 3566
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
m_uncached_spkms.insert(spkm);
if (!ranged) {
for (const auto& script : spkm->GetScriptPubKeys()) {
m_cached_spks[script].insert(spkm);
m_uncached_spkms.erase(spkm);
}
}
if (!ranged) {
for (const auto& script : spkm->GetScriptPubKeys()) {
m_cached_spks[script].insert(spkm);
}
}
else {
m_uncached_spkms.insert(spkm);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the way it was written as it ensures that no spkm will be accidentally missed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the reason to remove this validation ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is incorrect, unneeded, and caused a segfault.

It is incorrect because LoadWallet is supposed to be called before SetupDescriptorScriptPubKeyMans. Doing it after will break things that SetupDescriptorScriptPubKeyMans sets, which also caused a segfault. It is also entirely unneeded as loading an empty wallet doesn't do anything, and SetupDescriptorScriptPubKeyMans is doing all of the setup needed.

@achow101 achow101 force-pushed the improve-many-desc-ismine branch from 020f5b8 to f9f467b Compare September 21, 2022 15:11
Comment on lines 345 to 348
Copy link
Contributor

@w0xlt w0xlt Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the names below work better and make the purpose of the variable clearer.

Suggested change
std::unordered_map<CScript, std::unordered_set<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks;
//! Set of which descriptors are not in m_cached_spks
std::unordered_set<ScriptPubKeyMan*> m_uncached_spkms;
std::unordered_map<CScript, std::unordered_set<ScriptPubKeyMan*>, SaltedSipHasher> m_ranged_spks;
//! Set of which descriptors are not in m_cached_spks
std::unordered_set<ScriptPubKeyMan*> m_non_ranged_spkms;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer describing the variable with how we expect to use it, rather than what it contains. We're using it as a cache of spkms, the type of spkm doesn't matter.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good, but adds complexity by adding two caches instead of accessing m_spk_managers directly.

Perhaps the PR description might include benchmark improvement as a reason for the added complexity.

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@achow101 achow101 force-pushed the improve-many-desc-ismine branch from fd1b75e to 9b248d1 Compare December 2, 2022 19:24
@josibake
Copy link
Member

reACK a966473

@DrahtBot DrahtBot requested a review from furszy February 12, 2024 19:04
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two more comments. They aren't really blocking but I'm still struggling with the memory consumption topic. Isn't std::unordered_map going to store all keys (entire scripts) in memory for duplicated so it can solve hash collisions?

Apart from that, just as an extra note for the future: It seems that we will need to rearrange code so that these functions aren't called so many times within the same process.

@DrahtBot DrahtBot requested a review from furszy February 12, 2024 19:44
@maflcko
Copy link
Member

maflcko commented Feb 14, 2024

Tested time loadwallet with my normal regtest wallet (#28510 (comment)), didn't review or check for correctness.

Before:

real	0m35.182s

After:

real	0m3.622s

@achow101 achow101 force-pushed the improve-many-desc-ismine branch from a966473 to e7c141b Compare February 15, 2024 01:26
@achow101
Copy link
Member Author

achow101 commented Feb 15, 2024

They aren't really blocking but I'm still struggling with the memory consumption topic. Isn't std::unordered_map going to store all keys (entire scripts) in memory for duplicated so it can solve hash collisions?

Yes.

I've done some memory usage profiling with a large wallet, and various combinations of ordered and unordered maps and sets.

The baseline memory usage of m_map_script_pub_keys, which is the same for all of the following, is 1.2 MiB

  • std::unordered_map<CScript, std::unorderd_set<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks: 3.4 MiB
  • std::unordered_map<CScript, std::set<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks: 2.6 MiB
  • std::unordered_map<CScript, std::vector<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks: 1.2 MiB
  • std::map<CScript, std::unorderd_set<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks: 3.7 MiB
  • std::map<CScript, std::set<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks: 2.9 MiB
  • std::map<CScript, std::vector<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks: 1.5 MiB

The benchmark indicates that there isn't really a meaningful difference between all of these approaches. It seems that not using unordered_* may be slightly faster as hashing is not needed.

In the latest push, I've changed to using std::unordered_map<CScript, std::vector<ScriptPubKeyMan*>, SaltedSipHasher> m_cached_spks so the memory consumption should be about double the size of all scriptPubKeys.

@achow101 achow101 force-pushed the improve-many-desc-ismine branch from e7c141b to b20d882 Compare February 15, 2024 01:54
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/21591404299

@maflcko
Copy link
Member

maflcko commented Feb 15, 2024

I re-checked b20d882fd53c21098eb7858b2dbca84eafd2b344 and re-confirmed time loadwallet (#26008 (comment))

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK b20d882fd53c21098eb7858b2dbca84eafd2b344. I don't have much insight into performance impact of this change, but it seems like it should make migrated legacy wallets much faster (10 times faster loading in one case: #26008 (comment)) even if it does increase memory usage, and it seems to be implemented correctly.

re: #26008 (review)

Apart from that, just as an extra note for the future: It seems that we will need to rearrange code so that these functions aren't called so many times within the same process.

I'm not sure what this is referring to, so would be curious

@achow101
Copy link
Member Author

I've also added one more commit that speeds up loading and migration as I noticed one spot where m_spk_managers was being iterated again during loading.

@josibake
Copy link
Member

ACK ac246f6

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ac246f68299d0dc208ae513dfad1a8fc91b5e6d4

Looks good, but wondering about a possible bug in the GetScriptPubKeyMans legacy case (see below)

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed. Need to update the second commit description.

After TopUp completes, the wallet containing each SPKM will want to know
what new scriptPubKeys were generated. In order for all TopUp calls
(including ones internal the the SPKM), we use a callback function in
the WalletStorage interface.
Have CWallet maintain a cache of all known scriptPubKeys for its
DescriptorSPKMs in order to improve performance of the functions that
require searching for scriptPubKeys.
Instead of iterating m_spk_managers a DescriptorSPKM has been loaded in
order to get it's ID to compare, have LoadDescriptorSPKM return a
reference to the loaded DescriptorSPKM so it can be queried directly.
@achow101
Copy link
Member Author

Need to update the second commit description.

Done

@furszy
Copy link
Member

furszy commented Feb 16, 2024

Apart from that, just as an extra note for the future: It seems that we will need to rearrange code so that these functions aren't called so many times within the same process.

I'm not sure what this is referring to, so would be curious

@ryanofsky.
An easy example is CWallet::AvailableCoins, which calls CWallet::IsMine first, few lines later calls CWallet::GetSolvingProvider and then calls CWallet::CalculateMaximumSignedInputSize which internally calls CWallet::GetSolvingProvider again. So, only this function, traverses the spkm map at least 3 times per output.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK e041ed9. Just suggested changes since last review

Copy link
Member

@josibake josibake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK e041ed9

Reviewed diff, looks good!

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK e041ed9

It isn't blocking, but I have to admit that I'm still not really happy with the doubled script storage. I think we can do better. Will be experimenting with possible solutions.
A first measure to decrease it and remain fast, without changing the structure, could be to use the cache only for the inactive SPKMs. Since the active ones are limited in number (max 8), they can be checked quickly. However, this approach does not address the issue of handling really large scripts, which could probably be resolved by changing the structure for a probabilistic one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.