Skip to content

wallet: Migrate legacy wallets to descriptor wallets without requiring BDB#26596

Merged
glozow merged 6 commits intobitcoin:masterfrom
achow101:indep-desc-migrate2
Jul 11, 2024
Merged

wallet: Migrate legacy wallets to descriptor wallets without requiring BDB#26596
glozow merged 6 commits intobitcoin:masterfrom
achow101:indep-desc-migrate2

Conversation

@achow101
Copy link
Member

@achow101 achow101 commented Nov 28, 2022

#26606 introduced BerkeleyRODatabase which is an independent parser for BDB files. This PR uses this in legacy wallet migration so that migration will continue to work once the legacy wallet and BDB are removed. LegacyDataSPKM is introduced to have the minimum data and functions necessary for a legacy wallet to be loaded for migration.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 28, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK cbergqvist, theStack, furszy

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29124 (wallet: Automatically repair corrupted metadata with doubled derivation path by achow101)
  • #28574 (wallet: optimize migration process, batch db transactions by furszy)
  • #28333 (wallet: Construct ScriptPubKeyMans with all data rather than loaded progressively by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@achow101 achow101 force-pushed the indep-desc-migrate2 branch 2 times, most recently from 9d6ce8a to 03fdbaf Compare November 29, 2022 00:44
@Sjors
Copy link
Member

Sjors commented Nov 29, 2022

I think it would be easier to review if you made a separate PR for BerkeleyRODatabase and have it work as a regular, albeit read-only, wallet database backend.

I'm pleasantly surprised at how small the implementation is.

@luke-jr
Copy link
Member

luke-jr commented Nov 29, 2022

I don't see how reimplementing BDB is better than just continuing to use BDB.

@achow101 achow101 force-pushed the indep-desc-migrate2 branch from bb4ea93 to a8060af Compare November 29, 2022 21:22
@achow101
Copy link
Member Author

I don't see how reimplementing BDB is better than just continuing to use BDB.

The reimplementation is quite compact because it is for only reading BDB files, only the features that we use, and none of the actual DB system stuff. BDB itself contains a ton of code, a lot of which we don't use. Maintaining it as a dependency will eventually be more burdensome - we already need to patch it in order to compile in the depends system. This reimplementation does not carry those issues - it implements a format that isn't going to change, and it only depends on things that are already being used elsewhere in the codebase.

@achow101
Copy link
Member Author

I think it would be easier to review if you made a separate PR for BerkeleyRODatabase and have it work as a regular, albeit read-only, wallet database backend.

I've opeend #26606 for just BerkeleyRODatabase and its use in wallettool's dump.

@achow101
Copy link
Member Author

Rebased following #26606. Now ready for review.

Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code reviewed e11fa4524723aebc0d879ee083c2d6d46849d89a.

Overall concept: Not fully convinced about the benefit of breaking out LegacyDataSPKM, seems to add complexity. Is the goal to delete LegacyScriptPubKeyMan in the future when ripping out BDB and only keep LegacyDataSPKM?

Why perform the inlining of IsMine() in 31e918afc3920a2f8aa367ad323f590b4856aa26 before the function is removed? Seems risky in case of other PRs modifying it prior to removal etc.

LegacyScriptPubKeyMan::NewKeyPool() only touches data from LegacyDataSPKM, should it be moved to that class as public (or protected and then exposed in the subclass?). (If LegacyScriptPubKeyMan is slated for removal I can understand you want to signal that method will no longer be needed). (Have not checked if there are other methods along the same lines).

Assuming LegacyDataSPKM name is kept, comment in src/script/signingprovider.h could be updated to refer to LegacyDataSPKM::setWatchOnly instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a comment about the consequences of setting keypool_size=0? I'm guessing there are still cases where we generate some pubkeys despite having a zero pool size, otherwise you would have removed some of the code and the for-loop below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to have any additional explanation. keypool_size only matters for ranged descriptors. Non-HD keys do not make ranged descriptors, so the value here is ignored. The loop below is for an entirely different set of descriptors and is unrelated to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't fully grokked the code yet, so I understand if you'd rather not answer my questions, but if you want:

I don't think this needs to have any additional explanation. keypool_size only matters for ranged descriptors. Non-HD keys do not make ranged descriptors, so the value here is ignored.

We're in an HD chain for-loop, how is this not involving HD keys?
Surely legacy wallets could be HD wallets.

The loop below is for an entirely different set of descriptors and is unrelated to this.

I'm referring to the loop over desc_spks which is using the pubkeys from the DescriptorScriptPubKeyMan that's just been created and comes from DescriptorScriptPubKeyMan::m_map_script_pub_keys which is filled out in DescriptorScriptPubKeyMan::TopUpWithDB() which is affected by m_keypool_size.

Would m_keypool_size already always be 0 before this change?

Copy link
Member Author

@achow101 achow101 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line this comment is on is in the one that goes over the non-HD keys. However, I see that you're confused about the similar line in the loop below that iterates the hd chains.

The keypool_size here is a runtime value that depends on the startup options. It's is set when the DescriptorScriptPubKeyman is instantiated on loading, and the value is never directly persisted to disk. It ends up written as part of range_end when range_end is updated during TopUp(), but range_end can also be set directly.

The value that does matter is the range_end stored in WalletDescriptor. When we are going over the hd chains, we set range_end to match the chain counter stored in the CHDChain that is being migrated. This results in a descriptor that will produce exactly the same scriptPubKeys as the original hd chain, including it's keypool. We are enforcing that the migrated descriptors produce exactly the same scriptPubKeys (no more, no less). Setting keypool_size could mean that TopUp() would generate more scriptPubKeys than we need/expect, which would cause migration to assert and crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to clarify this matter!

@achow101
Copy link
Member Author

achow101 commented Jun 6, 2024

Is the goal to delete LegacyScriptPubKeyMan in the future when ripping out BDB and only keep LegacyDataSPKM?

Yes.

We are trying to remove the legacy wallet in its entirety, leaving only behind the absolute minimum necessary to perform migration.

Why perform the inlining of IsMine() in 31e918a before the function is removed? Seems risky in case of other PRs modifying it prior to removal etc.

IsMine() is unlikely to change. Besides the fact that any changes to the legacy wallet are likely to be closed instead of seriously considered, IsMine() has not had any material changes in the last several years. It could be included in #28710 but that PR is big enough as it is.

LegacyScriptPubKeyMan::NewKeyPool() only touches data from LegacyDataSPKM, should it be moved to that class as public (or protected and then exposed in the subclass?)

No. The separation exists now to indicate which things can be removed at the end. LegacyDataSPKM is the bare minimum that must stick around for migration, everything else remains in LegacyScriptPubKeyMan and will be removed at the same time.

Comment on lines 369 to 372
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Implements the full legacy wallet behavior
class LegacyScriptPubKeyMan : public LegacyDataSPKM
// Implements the full legacy wallet behavior
// Slated for deletion once Berkeley DB library dependency is removed.
class LegacyScriptPubKeyMan : public LegacyDataSPKM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to have a comment saying such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code is unable to communicate why it is like it is, it's good to explain the otherwise surprising state of it. It would hopefully limit the amount of questions like my prior one from #26596 (review):

LegacyScriptPubKeyMan::NewKeyPool() only touches data from LegacyDataSPKM, should it be moved to that class as public (or protected and then exposed in the subclass?).

But I guess you're fairly confident it will be deleted in time for the next release anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't fully grokked the code yet, so I understand if you'd rather not answer my questions, but if you want:

I don't think this needs to have any additional explanation. keypool_size only matters for ranged descriptors. Non-HD keys do not make ranged descriptors, so the value here is ignored.

We're in an HD chain for-loop, how is this not involving HD keys?
Surely legacy wallets could be HD wallets.

The loop below is for an entirely different set of descriptors and is unrelated to this.

I'm referring to the loop over desc_spks which is using the pubkeys from the DescriptorScriptPubKeyMan that's just been created and comes from DescriptorScriptPubKeyMan::m_map_script_pub_keys which is filled out in DescriptorScriptPubKeyMan::TopUpWithDB() which is affected by m_keypool_size.

Would m_keypool_size already always be 0 before this change?

Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a8f8a2d5d0e2328cbe2e569c7e12f653d18f456d

(Ran tests on 31632d18ea122def82820770f14d9aa009726114 but only comments changed since then).
Functional tests including --extended passed (a few automatically skipped).
make check passed.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Comment on lines +4373 to +4394
if (!IsBDBFile(BDBDataFile(*wallet_path))) {
return util::Error{_("Error: This wallet is already a descriptor wallet")};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit f732495: now that the legacy-wallet-check is done without loading the wallet, is the check for the descriptor flag below still needed? (can a BDB wallet ever have this flag set?)

    // Before anything else, check if there is something to migrate.
    if (local_wallet->IsWalletFlagSet(WALLET_FLAG_DESCRIPTORS)) {
        if (was_loaded) {
            reload_wallet(local_wallet);
        }
        return util::Error{_("Error: This wallet is already a descriptor wallet")};
    }

https://github.com/achow101/bitcoin/blob/a8f8a2d5d0e2328cbe2e569c7e12f653d18f456d/src/wallet/wallet.cpp#L4425-L4431

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is theoretically possible to have a descriptor wallet using BDB under 2 scenarios, however this is still an unsupported configuration:

  1. The descriptor wallet was created on an unreleased version after Native Descriptor Wallets using DescriptorScriptPubKeyMan #16528 but before wallet: Add sqlite as an alternative wallet database and use it for new descriptor wallets #19077 (there was a period of about 6 months where this could have happened)
  2. The user manipulated the database using external tooling where they created a BDB file and manually inserted descriptor wallet records

Copy link
Member

@furszy furszy Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really an issue but this works because TopUp sets the descriptor range_end to 1 on the first run when the descriptor is not ranged.
As this would fail if we ever change that, what if we set the range properly in WalletDescriptor constructor?
E.g.

WalletDescriptor w_desc(std::move(desc), creation_time, /*range_start=*/0, /*range_end=*/1, /*next_index=*/0)

Also, this change doesn't seem to be related to c653f4fdbfe06 description? --> nvm about this last point, theStack just made me notice that this is because m_keypool_size is not accesible from LegacyDataSPKM.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small comment; you don't have to take it.

I believe this PR could get merged quite fast if you leave 89503710386f52d9162f67fcd707eceffa954faa for a follow-up and make IsMine compatible with LegacyDataSPKM in this PR.

I'm suggesting this because this PR starts testing the newly introduced BDB reader class and allows people to use it, which is great since we want it polished for the next release. The IsMine removal is also nice but I wouldn't miss a deadline for it.

@theStack
Copy link
Contributor

Still trying to fully grasp the IsMine inlining commit 89503710386f52d9162f67fcd707eceffa954faa, meanwhile a few comments:

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't really an issue because CheckDecryptionKey does not verify the outcome of the write call (which would be "successful" because the read-only bdb class return true for all write operations). But.. for the sake of correctness I see two possible paths to follow:

  1. As this write is done exclusively on wallets with no checksum on crypted keys, we could move the upgrade code to the loading code in walletdb.cpp; removing fDecryptionThoroughlyChecked entirely and requiring such ancient wallets to provide the passphrase during load to perform the upgrade (adding a "passphrase" arg to the loadwallet RPC command).

  2. Introduce a new WalletBatch::IsReadOnly() method to avoid calling the write inside CheckDecryptionKey.

The first approach will make the legacy wallet code simpler while the second one is a minimal change.

@achow101
Copy link
Member Author

achow101 commented Jun 24, 2024

I believe this PR could get merged quite fast if you leave 8950371 for a follow-up and make IsMine compatible with LegacyDataSPKM in this PR.

Moved to #30328

The IsMine removal is also nice but I wouldn't miss a deadline for it.

I think it needs to be in the next release as it is a critical part of migration. I would prefer to have it in a release that people can use before deleting the legacy wallet.

Updated

  • in c653f4f: typo in commit subject (s/MigrateToLegacy/MigrateToDescriptor/)

Fixed

No, we should not be migrating a wallet if we cannot decrypt the keys. BerkeleyRODatabase always returns true for all write operations so it doesn't matter. Also, I don't think it is useful to invert that and have to have specific bypasses for BerkeleyRODatabase in all functions that attempts to write to it.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange CI https://github.com/bitcoin/bitcoin/actions/runs/9649050504/job/26611651920?pr=26596 error.. it doesn't seem to be related. The "salvage" command opens the wallet in DatabaseFormat::BERKELEY format.

When we reopen the wallet to do the migration, instead of opening using
BDB, open it using the BerkeleyRO implementation.
@furszy
Copy link
Member

furszy commented Jun 27, 2024

Now that the IsMine changes were moved to another PR, could reorder the commits just so 0cce574 is introduced before c3373a39a1acc94dbf644103fab2e477d49da671. This will let us keep the IsMine assertions inside MigrateToDescriptor for now.

Other than that, code review ACK 9700c21.

Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git range-diff master a8f8a2d 9700c217

Agree with @furszy on reordering the commits to make it possible to retain the 3 asserts in MigrateToDescriptor().

achow101 and others added 5 commits July 1, 2024 14:24
In order to load the necessary data for migrating a legacy wallet
without the full LegacyScriptPubKeyMan, move the data storage and
loading components to LegacyDataSPKM. LegacyScriptPubKeyMan now
subclasses that.
IsMine is necessary for migration. It should be inlined with migration
when the legacy wallet is removed.
In SetupLegacyScriptPubKeyMan, a base LegacyDataSPKM will be created if
the database has the format "bdb_ro" (i.e. the wallet was opened only
for migration purposes).

All of the loading functions are now called with a LegacyDataSPKM object
instead of LegacyScriptPubKeyMan.
The migration process reloads the wallet after all failures.
This commit tests the behavior by trying to obtain a new address
after a decryption failure during migration.
@achow101
Copy link
Member Author

achow101 commented Jul 1, 2024

Reordered the commits and preserved the IsMine asserts.

Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8ce3739

git range-diff 9700c217~5..9700c217 8ce3739~5..8ce3739 - not the easiest read, also used GH compare.

Passed make check & test/functional/test_runner.py including p2p_handshake.py which failed on CI.

Only one non-critical reservation: #26596 (comment)

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-review ACK 8ce3739

Comment on lines +3614 to +3622
LegacyDataSPKM* CWallet::GetLegacyDataSPKM() const
{
if (IsWalletFlagSet(WALLET_FLAG_DESCRIPTORS)) {
return nullptr;
}
auto it = m_internal_spk_managers.find(OutputType::LEGACY);
if (it == m_internal_spk_managers.end()) return nullptr;
return dynamic_cast<LegacyDataSPKM*>(it->second);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: was about to suggest to deduplicate shared code between this method and ::GetLegacyScriptPubKeyMan above, but since the latter is removed soon anyway (in PR #28710, commit 5652a9c#diff-1f2db0e4d5c12d109c7f0962333c245b49b696cb39ff432da048e9d6c08944d8), it's probably not worth it.

Copy link
Member

@furszy furszy Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you re-touch it again. Could:

LegacyDataSPKM* CWallet::GetLegacyDataSPKM() const
{
    if (IsWalletFlagSet(WALLET_FLAG_DESCRIPTORS)) { return nullptr; }
    return dynamic_cast<LegacyDataSPKM*>(GetScriptPubKeyMan(OutputType::LEGACY, /*internal=*/true));
}

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 8ce3739

Comment on lines +3614 to +3622
LegacyDataSPKM* CWallet::GetLegacyDataSPKM() const
{
if (IsWalletFlagSet(WALLET_FLAG_DESCRIPTORS)) {
return nullptr;
}
auto it = m_internal_spk_managers.find(OutputType::LEGACY);
if (it == m_internal_spk_managers.end()) return nullptr;
return dynamic_cast<LegacyDataSPKM*>(it->second);
}
Copy link
Member

@furszy furszy Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you re-touch it again. Could:

LegacyDataSPKM* CWallet::GetLegacyDataSPKM() const
{
    if (IsWalletFlagSet(WALLET_FLAG_DESCRIPTORS)) { return nullptr; }
    return dynamic_cast<LegacyDataSPKM*>(GetScriptPubKeyMan(OutputType::LEGACY, /*internal=*/true));
}

Comment on lines 793 to +798
void LegacyScriptPubKeyMan::LoadKeyMetadata(const CKeyID& keyID, const CKeyMetadata& meta)
{
LOCK(cs_KeyStore);
LegacyDataSPKM::LoadKeyMetadata(keyID, meta);
UpdateTimeFirstKey(meta.nCreateTime);
mapKeyMetadata[keyID] = meta;
}
Copy link
Member

@furszy furszy Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 7461d0c:

At this point, just a non-blocking nit but you are locking cs_KeyStore twice. One here, and another one inside LegacyDataSPKM::LoadKeyMetadata.

Same for LoadScriptMetadata.

Comment on lines +813 to +816
bool LegacyDataSPKM::AddKeyPubKeyInner(const CKey& key, const CPubKey& pubkey)
{
LOCK(cs_KeyStore);
return FillableSigningProvider::AddKeyPubKey(key, pubkey);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 7461d0c:

FillableSigningProvider::AddKeyPubKey already locks cs_Keystore internally.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants