Skip to content

wallet: Add sqlite as an alternative wallet database and use it for new descriptor wallets#19077

Merged
meshcollider merged 26 commits intobitcoin:masterfrom
achow101:sqlite-wallet
Oct 15, 2020
Merged

wallet: Add sqlite as an alternative wallet database and use it for new descriptor wallets#19077
meshcollider merged 26 commits intobitcoin:masterfrom
achow101:sqlite-wallet

Conversation

@achow101
Copy link
Member

@achow101 achow101 commented May 27, 2020

This PR adds a new class SQLiteDatabase which is a subclass of WalletDatabase. This provides access to a SQLite database that is used to store the wallet records. To keep compatibility with BDB and to complexity of the change down, we don't make use of many SQLite's features. We use it strictly as a key-value store. We create a table main which has two columns, key and value both with the type blob.

For new descriptor wallets, we will create a SQLiteDatabase instead of a BerkeleyDatabase. There is no requirement that all SQLite wallets are descriptor wallets, nor is there a requirement that all descriptor wallets be SQLite wallets. This allows for existing descriptor wallets to work as well as keeping open the option to migrate existing wallets to SQLite.

We keep the name wallet.dat for SQLite wallets. We are able to determine which database type to use by searching for specific magic bytes in the wallet.dat file. SQLite begins it's files with a null terminated string SQLite format 3. BDB has 0x00053162 at byte 12 (note that the byte order of this integer depends on the system endianness). So when we see that there is a wallet.dat file that we want to open, we check for the magic bytes to determine which database system to use.

I decided to keep the wallet.dat naming to keep things like backup script to continue to function as they won't need to be modified to look for a different file name. It also simplifies a couple of things in the implementation and the tests as wallet.dat is something that is specifically being looked for. If we don't want this behavior, then I do have another branch which creates wallet.sqlite files instead, but I find that this direction is easier.

@achow101 achow101 force-pushed the sqlite-wallet branch 2 times, most recently from 92552af to e09e7d7 Compare May 27, 2020 03:18
@achow101
Copy link
Member Author

I've dropped the amalgamation file

@achow101 achow101 force-pushed the sqlite-wallet branch 3 times, most recently from 92d36c5 to 20dcff6 Compare May 27, 2020 05:11
@jonasschnelli
Copy link
Contributor

jonasschnelli commented May 27, 2020

Pretty amazing work! Thanks for doing this.
For testing purposes, would it make sense to add logdb (#5686, simple implementation) in order to test and benchmark?

Concept ACK on a BDB replacement for descriptor wallets.
Still unsure wether sqlite or an internal format should be chosen. Maybe a comparison(-matrix) of internal vs. sqlite could be done?

As for concrete implementation steps, maybe it would make sense to PR the DB flexibility first, then additional storage engines later.

@achow101
Copy link
Member Author

For testing purposes, would it make sense to add logdb (#5686, simple implementation) in order to test and benchmark?

I don't think it really makes sense to add a database system that we aren't going to use.

Still unsure wether sqlite or an internal format should be chosen. Maybe a comparison(-matrix) of internal vs. sqlite could be done?

I think there's two primary reasons to choose sqlite over an internal format.

  1. Review and implementation are much simpler The library already exists so implementation just means correctly using the API. Reviewers won't have to review a file format implementation and convince themselves that that format won't corrupt and is robust.

  2. Better guarantees of consistency and non-corruption. SQLite is very well tested and very widely used. I think they are able better guarantee that data will get written, won't get lost, and won't get corrupted, than we would be able to with an internal format.

As for concrete implementation steps, maybe it would make sense to PR the DB flexibility first, then additional storage engines later.

#18971 does the DB class stuff that gives us this flexibility. This PR is adding in the storage engine and the logic for CWallet to choose which storage to use.

@achow101 achow101 force-pushed the sqlite-wallet branch 2 times, most recently from 930518e to 0ba7d4e Compare May 27, 2020 18:41
@practicalswift
Copy link
Contributor

Concept ACK

Nice work! Very readable code!

@DrahtBot
Copy link
Contributor

DrahtBot commented May 27, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj
Copy link
Member

laanwj commented May 28, 2020

CONCEPT ACK
🎉 🥳 🎉
Very happy to move on from BerkeleyDB and I've always liked sqlite as a versatile but still minimalistic replacement.

achow101 and others added 13 commits October 14, 2020 11:28
Rewrite uses the VACUUM command which does exactly what we want. A
specific advertised use case is to compact a database and ensure that
any deleted data is actually deleted.
MakeWalletDatabase no longer has a default DatabaseFormat. Instead
callers, like CWallet::Create, need to specify the database type to
create if the file does not exist. If it exists and NONE is given, then
CreateWalletDatabase will try to autodetect the type.
Descriptor wallets don't support dumpwallet, so make the tests that do
dumpwallet legacy wallet only.
@achow101
Copy link
Member Author

Rebased. There was a silent merge conflict with #20130.

@hebasto @S3RK @promag @Sjors @ryanofsky @meshcollider @fjahr Could you all re-review/re-ACK this so we can merge it before the feature freeze tomorrow?

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK c4a29d0, only rebased since my previous review, verified with git range-diff master d18892dcc c4a29d0a9.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK c4a29d0. I am honestly confused about reasons for locking into wallet.dat again when it's so easy now to use a clean format. I assume I'm just very dense, or there's some unstated reason, because the only thing that's been brought up are unrealistic compatibility scenarios (all require actively creating a wallet with non-default descriptor+sqlite option, then trying to using the descriptor+sqlite wallets with old software or scripts and ignoring the results) that we didn't pay attention to with previous PRs like #11687, which did not require any active interfaction.

Changes since previous review were: using pkg-config, naming/whitespace/documentation/quoting/const/log/assert changes, fixing leak if database constructor throws, enabling unconditional fullsync

`./` | `wallet.dat` | Personal wallet (BDB) with keys and transactions
`./` | `wallet.dat` | Personal wallet with keys and transactions. May be either a Berkeley DB or SQLite database file.
`./` | `.walletlock` | Wallet lock file
`./` | `wallet.dat-journal` | SQLite Rollback Journal file for `wallet.dat`. Usually created at start and deleted on shutdown. A user *must keep it as safe* as the `wallet.dat` file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Include sqlite3 in documentation" (6c6639a)

If need to update, there are two corrections that could be made:

  • Line 69 "Wallets are Berkeley DB (BDB) databases" is no longer true
  • Line 76 "Wallet lock file" should say "BDB wallet lock file"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #20152.

@Sjors
Copy link
Member

Sjors commented Oct 14, 2020

re-utACK c4a29d0

@promag
Copy link
Contributor

promag commented Oct 14, 2020

Tested ACK c4a29d0.

I am honestly confused about reasons for locking into wallet.dat again when it's so easy now to use a clean format.

@ryanofsky clean format as in different file extension? I think it can be discussed and changed even after feature freeze, just like @achow says in OP.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c4a29d0, debug builds and test runs after rebase to latest master @ c2c4dba, some manual testing creating, using, unloading and reloading a few different new sqlite descriptor wallets over several node restarts/shutdowns.

Nice work and reviewing getting this in shape. Looking forward to testing this further.

2020-10-14T21:54:12Z Using SQLite Version 3.33.0

@fjahr
Copy link
Contributor

fjahr commented Oct 14, 2020

reACK c4a29d0

@S3RK
Copy link
Contributor

S3RK commented Oct 15, 2020

Re-review ACK c4a29d0
And thanks for fixing the silent merge conflict

@fanquake
Copy link
Member

@meshcollider has declared that he'll be sorely disappointed if he doesn't get to pull the merge trigger on this PR.

@meshcollider
Copy link
Contributor

re-utACK c4a29d0

Let's go 🚀

status = DatabaseStatus::FAILED_VERIFY;
return nullptr;
}
return db;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this return path need to set status = DatabaseStatus::SUCCESS;?

I am asking because gcc seems to generate binaries that will inject false positives into valgrind. I haven't found a bug in the code.

Though, the bdb version also seems to set ::SUCCESS, so it might be appropriate here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it should, but istm it shouldn't matter. I'm pretty sure we never check status if there isn't a failure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jup, we don't check status if there is no failure. It is just a style question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also happening with clang, so I'll try to fix this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.