Skip to content

wallet: Implement independent BDB parser#26606

Merged
fanquake merged 16 commits intobitcoin:masterfrom
achow101:berkeleyro
May 21, 2024
Merged

wallet: Implement independent BDB parser#26606
fanquake merged 16 commits intobitcoin:masterfrom
achow101:berkeleyro

Conversation

@achow101
Copy link
Member

Split from #26596

This PR adds BerkeleyRODatabase which is an independent implementation of a BDB file parser. It provides read only access to a BDB file, and can therefore be used as a read only database backend for wallets. This will be used for dumping legacy wallet records and migrating legacy wallets without the need for BDB itself.

Wallettool's dump command is changed to use BerkeleyRODatabase instead of BerkeleyDatabase (and CWallet itself) to demonstrate that this parser works and to test it against the existing wallettool functional tests.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 29, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, josibake, furszy, laanwj, theStack
Concept ACK darosior, Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29071 (refactor: Remove Span operator==, Use std::ranges::equal by maflcko)
  • #28236 (fuzz: wallet, add target for Spend by Ayush170-Future)
  • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link

@Marek777777 Marek777777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some review nibbles.

src/streams.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1d093bbf97b8d4ec5413ca560ac79e56f23d7f1d: Note for other reviewers: yes

https://en.cppreference.com/w/c/io/fseek

src/streams.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1d093bbf97b8d4ec5413ca560ac79e56f23d7f1d: maybe log the exact error code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of annoying to do that, so I'll leave it as is. The other functions do the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1224f45a92972199799d1ea33700eee567dc66fe: maybe call this file bdb_ro_wallet_db.h?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to have all of the migration stuff in these files, so I will leave it as is.

@darosior
Copy link
Member

Concept ACK. Have you looked into a fuzz target for this parser?

@achow101
Copy link
Member Author

Have you looked into a fuzz target for this parser?

I haven't. If someone would like to write one, I'd be happy to add it.

@achow101 achow101 force-pushed the berkeleyro branch 2 times, most recently from 2d61534 to f2dc015 Compare May 2, 2023 17:53
@DrahtBot DrahtBot removed the CI failed label May 2, 2023
@Sjors
Copy link
Member

Sjors commented Jun 5, 2023

Adding a fuzz target that compares this to the real thing seems useful. Given that this PR changes dumpwallet to rely on the new parser, I think we should do that sooner rather than later. Otherwise I suggest changing e88ad38ac2f3e43de54a332965b13eb7cc27b91f so that the read-only parser is optional, and then change the tests to always use it.

It seems a bit daunting to compare the low level DBD parsing code in this PR to the real thing. Having a fuzzer also reduces the need to be very thorough in that. I'm not sure how much confidence I have in the existing test coverage for dumpwallet.

@achow101 achow101 force-pushed the berkeleyro branch 2 times, most recently from 944a020 to f108c6b Compare June 8, 2023 09:42
@DrahtBot DrahtBot removed the CI failed label Jun 8, 2023
Copy link
Contributor

@sedited sedited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ACK c0e0a0affea3e3cee9a8910de4bae55a9bdd24cf

achow101 and others added 12 commits May 16, 2024 15:03
BerkeleyRODatabase is intended for use after BDB is removed, so it needs
to be able to read all of the records from a BDB file. Thus an
independent deserializer for BDB data files is implemented in it. This
deserializer is targeted towards the data files that Bitcoin Core
creates so it does not fully support all of BDB's features (e.g. other
database types, encryption, etc.).
Implements MakeBerkeleyRODatabase and adds DatabaseFormat::BERKELEY_RO
so that MakeDatabase can use BerkeleyRO as the backend database.
…lets

In order to ease the transition to not having BDB, make the dump tool
use DatabaseFormmat::BERKELEY_RO when -withinternalbdb is set.
Byteswapped databases make it easier to test opening and deserializing
other endian databases.
Copy link
Contributor

@sedited sedited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ACK d51fbab

@josibake
Copy link
Member

reACK d51fbab

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK d51fbab

Copy link
Member

@laanwj laanwj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK d51fbab

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d51fbab

Convinced myself that this parser works reliably partly by looking at the original code, partly by extending the test framework's BDB parser to match this PR's logic (see #30125), which helped a lot in understanding the structure. Also tested with a few random signet legacy wallets that I still had around, though those were all quite small.

Btw, I couldn't manage to find or create a BDB wallet where the DELETE flag is set in a record header (would be nice to exercise that in a test, though it's hard to verify that it's really set without inspecting manually, or e.g. adding debug messages on deserialization), but the code to handle that is simple enough and looks correct.

error.original == "Page number mismatch" ||
error.original == "Bad btree level" ||
error.original == "Bad page size" ||
error.original == "File size is not a multiple of page size" ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isn't thrown in the parser (commented out), hence can be removed here

Suggested change
error.original == "File size is not a multiple of page size" ||

error.original == "Unexpected number of entries in outer database root page" ||
error.original == "Subdatabase has an unexpected name" ||
error.original == "Subdatabase page number has unexpected length" ||
error.original == "Unexpected inner database page type" ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isn't thrown anywhere, can be removed

Suggested change
error.original == "Unexpected inner database page type" ||

def add_options(self, parser):
self.add_wallet_options(parser)
parser.add_argument("--bdbro", action="store_true", help="Use the BerkeleyRO internal parser when dumping a Berkeley DB wallet file")
parser.add_argument("--swap-bdb-endian", action="store_true",help="When making Legacy BDB wallets, always make then byte swapped internally")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo nit

Suggested change
parser.add_argument("--swap-bdb-endian", action="store_true",help="When making Legacy BDB wallets, always make then byte swapped internally")
parser.add_argument("--swap-bdb-endian", action="store_true",help="When making Legacy BDB wallets, always make them byte swapped internally")

@maflcko
Copy link
Member

maflcko commented May 21, 2024

I wonder if it makes sense to collect the fuzz inputs to qa-assets, from testers that still have them. However, the fuzz CI does not compile with BDB, so that'll probably need to be adjusted first.

@hebasto
Copy link
Member

hebasto commented Jun 5, 2024

Ported to the CMake-based build system in hebasto#220.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4d7a3ae

The new src/wallet/test/fuzz/wallet_bdb_parser.cpp fails on Windows.

@achow101
Copy link
Member Author

The new src/wallet/test/fuzz/wallet_bdb_parser.cpp fails on Windows.

This looks like an inconsistency in BDB itself as the failure is on BDB loading the database file. Additionally, since this test case passes on linux, I'm not entirely sure what to do about it. This will also be resolved once BDB is removed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.