Rollback for dumptxoutset without invalidating blocks #33477
fjahr wants to merge 5 commits into bitcoin:master
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: For details see https://corecheck.dev/bitcoin/bitcoin/pulls/33477.

Reviews: See the guideline for information on the review process. If your review is incorrectly listed, please copy-paste

Conflicts: Reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Force-pushed from 791471a to d51c7d7
cc @Sjors since you were asking for this approach a few times :)
Force-pushed from 716e1db to 6d409d5
Concept ACK, this seems cleaner.

I suspect if you go back further, this approach will end up performing better because we no longer need to roll forward again at the end.
kevkevinpal left a comment:
Concept ACK 6d409d5
This approach makes more sense. I reviewed the code a bit and made some comments, but nothing blocking
I also added comments on possible functional tests for the new JSONRPCError cases, but these can be done in a follow-up.
src/rpc/blockchain.cpp
Outdated
"Unless the \"latest\" type is requested, the node will roll back to the requested height and network activity will be suspended during this process. "
"Because of this it is discouraged to interact with the node in any other way during the execution of this call to avoid inconsistent results and race conditions, particularly RPCs that interact with blockstorage.\n\n"
"This call may take several minutes. Make sure to use no RPC timeout (bitcoin-cli -rpcclienttimeout=0)",
"This creates a temporary UTXO database when rolling back, keeping the main chain intact. Should the node experience an unclean shutdown the temporary database may need to be removed from the datadir manually.\n\n"
It may be worth noting that "network activity will not be suspended during this process."
I don't think this is necessary; a user wouldn't naturally assume there is a reason to suspend network activity for this. Previously we had to do that as basically a hack, and now that we don't anymore it would seem odd to me to mention it.
CBlock block;
if (!node.chainman->m_blockman.ReadBlock(block, *block_index)) {
    throw JSONRPCError(RPC_INTERNAL_ERROR,
        strprintf("Failed to read block at height %d", block_index->nHeight));
Might be able to add a functional test for this RPC error.
Hm, this one and the other similar comments about test coverage are all cases that are pretty hard to hit, because they should only be possible in case of db corruption or a similarly unlikely event. Not saying that this coverage wouldn't be valuable, but afaik we hardly ever go through the hassle of covering such cases, and this RPC is far from being a critical path compared to the rest of the code base. So, unless you have a specific suggestion for how they can be hit in a practical way, I would suggest we keep these for a follow-up. The current changes should be a robustness improvement by themselves.

(Marking the other comments as resolved for now, but feel free to correct me if you think differently about one of them specifically.)
}
~TemporaryUTXODatabase() {
    try {
        fs::remove_all(m_path);
It might be useful to add a log here informing the user that the temp UTXO db was cleaned up, since we mention in the logs that we are creating a temp UTXO db.
Hm, I don't think we log anything on the creation of the DB, so I would keep the destruction consistent with that. If we added anything like that, it should only be a debug-level log, since it seems a bit low-level for the general user.
Nice!
That probably makes sense. It might be possible to do it faster and with less disk usage for relatively short rollbacks via a two step process:
Rather than being direct RPC functionality, maybe it would be better to have an RPC function to export a copy of the utxo set at the current height, and have a separate bitcoin-kernel binary that performs the rollback and utxoset stats calculation itself?
Concept ACK
ACK 6d409d5

This is a good change. Using a temporary coins DB for the rollback is a much cleaner and safer solution than the invalidateblock/reconsiderblock approach. The code is well-contained, and the new … I've pulled the branch, compiled, and the full functional test suite passes, including the … tests.
mzumsande left a comment:
Concept ACK
I suspect if you go back further, this approach will end up performing better because we no longer need to roll forward again at the end
Would be interesting if someone could try it out by going back 25k blocks or more, as is usually done for creating snapshots (I can't right now).
WITH_LOCK(::cs_main, cursor = chainstate.CoinsDB().Cursor());

size_t coins_count = 0;
while (cursor->Valid()) {
Don't we need to hold cs_main throughout the copying phase? What if the utxo set changes while we are copying coins?
I don't think so; my understanding is that the cursor/LevelDB iterator works on a snapshot of the DB itself, which doesn't get mutated. You can see the same pattern in WriteUTXOSnapshot, where we also work with a cursor without holding cs_main.
Force-pushed from 6d409d5 to b71ee30
Thanks for all the feedback so far and sorry for the slow response!
Hm, this feels a bit overengineered for this functionality, considering the overhead in test coverage and build changes. Maybe I am overcomplicating it in my head, as I have not done much with the kernel yet. But if this is considerably more complex, I would rather go ahead with this first and keep that idea for consideration in a future change.
I tried with
I actually had a similar idea early on but then stayed with the simpler approach. I will try this out with a POC and check the performance impact.
Force-pushed from b71ee30 to 53865e7
🚧 At least one of the CI tasks failed.

Hints: Try to run the tests locally, according to the documentation. However, a CI failure may still

Leave a comment here, if you need help tracking down a confusing failure.
Force-pushed from 53865e7 to e0162ab
I tried a few different takes on the delta-based idea, including some vibe-coding tests. Here is the latest one; something is definitely still broken there, but the dump it generates is correct. All my tests had in common that the processing actually took longer than the current approach, almost 28 min with the latest code version. I am now thinking that longer rollbacks may not be quicker; only shorter ones will be, because the big copy overhead at the beginning is saved. But then we also don't have that much to gain. Even if the performance can be improved, now that I have played around with it, it doesn't seem worth the additional code complexity it introduces. It would need to be significantly faster to justify that, and I am currently not seeing that.
Are you saying this PR might not be worth it, or just the other approaches you tried out? I'm also curious whether this runs counter to the purpose of #31560, which would reduce the amount of interim data written to disk for the purpose of collecting the data into a db.
Wording is a bit confusing, sorry. I meant the other approach that I tried out. That approach had a negative impact on performance but even if that could be improved I would still doubt it's our best choice.
I haven't reviewed that one, but from a quick glance I don't see why these would conflict. This PR here deals with how the UTXO set data is collected, and #31560 deals with writing that data somewhere else. What #31560 does is just overwriting the … It also seems like #31560 is basically RFM. I would say this can move forward, and I will figure out the integration in the rebase :)
Oh, huh, I kind of thought this was merged already. I guess ping for review after #31560 is merged and this is rebased on top?
Sure, I'm just waiting for #31560 myself now :) FWIW, I've been using this branch for all my txoutset dump needs since I opened it and never saw any issues, so it seems to work well (anecdotally).
Force-pushed from e0162ab to 921827c
Rebased on top of the changes from #31560; the conflict was very minor and things seem to just work (TM). I even added a commit which extends the sqlite test for named pipe usage to do a rollback, so both functionalities are tested together.

@sedited I have thought about your comment again and I probably didn't address it correctly. Your point was not about whether things can be integrated mechanically, but rather that this offsets the gain someone might aim for when using a pipe, right? Indeed, the negative impact is that more disk space is used temporarily while the temp utxo set exists. However, this happens only in the rollback case; with the latest utxo set the gain is still there.
theStack left a comment:
Tested manually on signet so far with an arbitrary snapshot height (deep rollback of 50000) and it seems to work fine, both with a regular file and a named pipe. Out of curiosity I tried to place the temporary db in /tmp (tmpfs):
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index e2a15d52d8..8453f8fe0d 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -3121,7 +3121,7 @@ static RPCHelpMan dumptxoutset()
target_index,
std::move(afile),
path,
- temppath);
+ "/tmp/");
}
if (!fs::is_fifo(path_info)) {
with the hope that it could be a bit faster, and also experimented with higher .cache_bytes values for the temporary db (256_MiB, 1024_MiB), but none of these changes made a difference in performance. Will re-test on mainnet within the next days.
// Log every 10M coins (optimized for mainnet)
if (coins_count % 10'000'000 == 0) {
    LogInfo("Copying UTXO set: %uM coins copied.", coins_count / 1'000'000);
logging nit: it would be nice to also show the total number of coins, to provide some information about the copying progress. I guess that's not easily possible though, as we would need to call GetUTXOStats before (adding a significant delay and therefore not worth it)?
Yeah, I don't like it either, but to my knowledge there isn't an easy enough way that would be worth it, IMO, unless the user runs coinstatsindex. Doing the logging differently with coinstatsindex is certainly doable but also a bit ugly, so not sure if it's worth it. Running the stats before getting started also doesn't seem worth the performance hit.

One creative idea I had was to take the db size on disk and calculate an estimate of progress based on bytes processed. But that is probably even worse than the other approaches because it takes more code: we would need to deal with all the stuff leveldb does, like at least triggering a compaction manually before getting the size on disk, etc. And it would likely be inaccurate often, so we might still get user complaints.

One more dirty-estimate idea: take the latest assumeutxo coins count and extrapolate a coins count based on the blocks delta. That would have worked pretty well until 2023, but now it would be off to an annoying degree as well: https://mainnet.observer/charts/utxoset-size/

Happy to consider more ideas, but so far I haven't had one where I felt it was worth it for nicer logging.
I agree that none of the mentioned ideas are worth the additional code complexity / performance penalty just for nicer log messages (though it's interesting to reason about them, even if they are dirty :-)). I forgot to mention it in my review yesterday, but for the rollback a progress indicator could be added, as in that case the necessary information is available, e.g. "Rolled back 500/52000 blocks" (maybe even show the percentage if we want to be extra fancy). But even that I would consider a non-blocking nit.
Good idea. Added the rollback progress, the extra fancy way :)
Force-pushed from 921827c to 9da705c
src/rpc/blockchain.cpp
Outdated
// Create temporary database
DBParams db_params{
    .path = temp_db_path,
    .cache_bytes = 16_MiB,
Does this make a difference at all? We are just reading once, so I would not expect db-level caching to make a difference.
You're right. Technically we read twice, but the cache doesn't really help here: during the rollback phase each coin is looked up only once, so the cache has no impact, and at the end everything is read once for the final dump, where a 16 MB cache is negligible.

Good catch, setting it to 0.
src/rpc/blockchain.cpp
Outdated
DBParams db_params{
    .path = temp_db_path,
    .cache_bytes = 16_MiB,
    .memory_only = false,
Might it be interesting to make this a named argument? On systems with heaps of memory this might reduce the disk load a bit, and for me at least it seems to cut the dump time in half.
Sure, that seems a small enough change to be worth it. I have added it as a separate commit so it can be considered as an optional addition to the change here.
    strprintf("Failed to read block at height %d", block_index->nHeight));
}

WITH_LOCK(::cs_main, res = chainstate.DisconnectBlock(block, block_index, rollback_cache));
Might there be a TOCTOU issue in case pruning happens in the meantime here?
Hm, yeah, I think there is. The simple way would have been to just throw an error in this case, but that's a bit annoying for the user when it happens, because maybe it would have worked, and now they have to resync since they lost the blocks. Instead I implemented what I think is a nicer solution using a prune lock.
Instead this new approach uses a temporary coins db to roll back the UTXO set.
Force-pushed from 9da705c to b3ea153
Addressed comments from @sedited and @theStack, thanks for the review! There are now two new commits and a few extra lines of code. As mentioned in some of the comments, there would probably have been simpler solutions available for each of the addressed issues, and I am open to rolling back (pun intended) what I have added in this push in favor of something simpler if that's what reviewers prefer.

EDIT: FWIW, the newly added …
Force-pushed from b3ea153 to e3410e6
sedited left a comment:
This looks good to me, but I would recommend splitting the DeletePruneLock from the last commit and adding a unit test for it. To me it would also make sense to just squash it into the commit changing the rest of the rollback logic.
LogInfo("Copying current UTXO set to temporary database.");
{
    WITH_LOCK(::cs_main, chainstate.ForceFlushStateToDisk());
Should we retain the cache here?
    const bool in_memory)
{
    // Create a temporary leveldb to store the UTXO set that is being rolled back
    std::string temp_db_name{strprintf("temp_utxo_%d", target->nHeight)};
How about adding a few random characters to this? Might that prevent a collision bug where a re-org takes place in case we are dumping close to the tip height?
This is an alternative approach to implement dumptxoutset with rollback that was discussed a few times. It does not rely on invalidateblock and reconsiderblock; instead it creates a temporary copy of the coins DB, modifies this copy by rolling back as many blocks as necessary, and then creates the dump from this temp copy DB. See also the discussions in #29553 (comment), #32817 (comment) and #29565.

The nice side-effects of this are that forks can not interfere with the rollback and network activity does not have to be suspended. But there are also some downsides compared to the current approach: it requires some additional disk space for the copied coins DB, and performance is slower (master took 3m 17s vs 9m 16s in my last test with the code here, rolling back ~1500 blocks). However, there is also not much code being added here, the network can stay active throughout, and performance would stay constant with this approach, while master would be impacted if there were forks that needed to be invalidated as well (see #33444 for the alternative approach), so this could still be considered a good trade-off.