coins: remove redundant and confusing CCoinsViewDB::HaveCoin#34320
coins: remove redundant and confusing CCoinsViewDB::HaveCoin#34320l0rinc wants to merge 4 commits intobitcoin:masterfrom
CCoinsViewDB::HaveCoin#34320Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34320. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Are there any flamegraphs or benchmarks, to see how this affects performance? |
I'm not sure what to measure exactly because the current code was already delegating to |
Added assumeutxo load, reindex-chainstate and pruned IBD measurements to the PR description |
I'm not sure about this. bool CCoinsViewCache::HaveCoin(const COutPoint &outpoint) const {
CCoinsMap::const_iterator it = FetchCoin(outpoint);
return (it != cacheCoins.end() && !it->second.coin.IsSpent());
}vs std::optional<Coin> CCoinsViewCache::GetCoin(const COutPoint& outpoint) const
{
if (auto it{FetchCoin(outpoint)}; it != cacheCoins.end() && !it->second.coin.IsSpent()) return it->second.coin;
return std::nullopt;
} |
I see 4 uses of
For 1 and 2, we are checking for the non-existence of the coin so we won't need the full value. In this case not caching the value makes sense. I think 1 and 2 would benefit from |
Disassembly with separate translation units indicates
But there's nothing to cache since these will return an empty optional. For BIP30 and
This is a good point, the disassembly indicates there's no devirtualization and no inlining in if (AccessCoin(tx.vin[i].prevout).IsSpent()) {and if (m_view.AccessCoin(txin.prevout).IsSpent()) {I have added a new benchmark as a first commit to check the performance of on-disk Benchmark baseline:
Benchmark with
Replace remaining
(ran it 5 times for each commit and copied the fastest (with smallest noise). The small difference is reproducible, but not very important as long as it's not measurably slower) Benchmarking script and results on Linuxfor commit in 3544b98927050956f68cb183bf2686f5c65e5ebb 8c568500442144fbcd457eccda5a1956aa98ca81; do \
git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' && \
rm -rf build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && \
cmake --build build -j$(nproc) >/dev/null 2>&1 && \
for _ in $(seq 5); do \
sleep 5; \
./build/bin/bench_bitcoin -filter='HaveInputsOnDisk' -min-time=1000; \
done; \
doneresulting in
|
9b4e044 to
8c56850
Compare
This tells us that
Same as above.
I think everywhere we use AccessCoin we assert the unspentness. This would now break that pattern and use AccessCoin to check existence via spentness. I think HaveCoin conveys intent better here. Can we instead just remove HaveCoin on the db then, and the default implementation would just forward to |
HaveCoin calls with GetCoinCCoinsViewDB::HaveCoin
|
Thanks @andrewtoth, I didn't like the suggestion at first since we keep polluting the interface, but after applying your suggestion I'm fine with the results, it's simpler and better separated from the other similar PRs. |
67068ee to
3d25cbf
Compare
3d25cbf to
4166ee8
Compare
23dd085 to
69fa15a
Compare
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
andrewtoth
left a comment
There was a problem hiding this comment.
ACK d1a8a1b
CCoinsViewDB::HaveCoin is dead code, good to remove it and prevent confusion.
Also, adding test coverage and a benchmark to prevent regressions is good.
src/test/coins_tests.cpp
Outdated
| a.out == b.out; | ||
| } | ||
|
|
||
| Coin MakeTestCoin() { return Coin{CTxOut{1, CScript{}}, 1, false}; } |
There was a problem hiding this comment.
nit: not sure we really need this helper? It's a one liner called in two places.
There was a problem hiding this comment.
I don't like duplication, but I agree it's not needed here
src/test/coins_tests.cpp
Outdated
| mtx.vin.emplace_back(prevout); | ||
| const CTransaction tx{mtx}; | ||
| BOOST_CHECK(!tx.IsCoinBase()); | ||
| BOOST_CHECK(cache.HaveInputs(tx)); |
There was a problem hiding this comment.
This seems like a basic CCoinsViewCache test. Is this behavior not covered elsewhere? If not, should we extend this to test that AccessCoin, GetCoin, HaveCoin, SpendCoin do not access base when the coin is cached?
There was a problem hiding this comment.
HaveInputs() had no unit coverage at all.
These new tests cover the "cache hit should not consult base" and "cache miss should consult base via GetCoin()" usecases - if you think there's a better location for these, let me know.
I've extended it with the mentioned methods, thanks
src/bench/coins_haveinputs.cpp
Outdated
| DataStream{benchmark::data::block413567} >> TX_WITH_WITNESS(block); | ||
|
|
||
| CCoinsViewCache cache{&db}; | ||
| cache.SetBestBlock(uint256::ONE); |
There was a problem hiding this comment.
nit: doesn't really matter, but we could set this to block.GetBlockHeader().GetHash().
e4613ee to
ff338fd
Compare
src/test/coins_tests.cpp
Outdated
| } | ||
| } | ||
|
|
||
| BOOST_AUTO_TEST_CASE(ccoins_haveinputs_cache_miss_uses_base_getcoin) |
There was a problem hiding this comment.
Consider merging both tests as done in the diff below. I also replaced the DB view with a dummy view because CCoinsViewDB is not under test here; the dummy is sufficient to test the cache.
diff --git a/src/test/coins_tests.cpp b/src/test/coins_tests.cpp
index ad182559a3..baba9c4e75 100644
--- a/src/test/coins_tests.cpp
+++ b/src/test/coins_tests.cpp
@@ -1050,23 +1050,13 @@ BOOST_FIXTURE_TEST_CASE(ccoins_flush_behavior, FlushTest)
}
}
-BOOST_AUTO_TEST_CASE(ccoins_haveinputs_cache_miss_uses_base_getcoin)
+BOOST_AUTO_TEST_CASE(ccoins_cache_behavior)
{
- const COutPoint prevout{Txid::FromUint256(m_rng.rand256()), 0};
-
- CCoinsViewDB db{{.path = "test", .cache_bytes = 1_MiB, .memory_only = true}, {}};
- {
- CCoinsViewCache write_cache{&db};
- write_cache.SetBestBlock(m_rng.rand256());
- write_cache.AddCoin(prevout, Coin{CTxOut{1, CScript{}}, 1, false}, /*possible_overwrite=*/false);
- write_cache.Flush();
- }
-
class CCoinsViewSpy final : public CCoinsViewBacked
{
public:
const COutPoint expected;
- mutable size_t havecoin_calls{0}, getcoin_calls{0};
+ mutable size_t getcoin_calls{0};
explicit CCoinsViewSpy(CCoinsView* view, const COutPoint& out) : CCoinsViewBacked(view), expected{out} {}
@@ -1074,18 +1064,23 @@ BOOST_AUTO_TEST_CASE(ccoins_haveinputs_cache_miss_uses_base_getcoin)
{
++getcoin_calls;
BOOST_CHECK(out == expected);
- return CCoinsViewBacked::GetCoin(out);
+ return Coin{CTxOut{1, CScript{}}, 1, false};
}
bool HaveCoin(const COutPoint& out) const override
{
- ++havecoin_calls;
- BOOST_CHECK(out == expected);
- return CCoinsViewBacked::HaveCoin(out);
+ // This should never be called, since HaveInputs() should call GetCoin()
+ // and not call HaveCoin() at all. If this is called, it means that
+ // HaveInputs() is calling HaveCoin() instead of GetCoin(), which
+ // is less efficient since it may require two lookups instead of one.
+ BOOST_FAIL("Base HaveCoin should never be called");
+ return false;
}
};
- CCoinsViewSpy base{&db, prevout};
+ const COutPoint prevout{Txid::FromUint256(m_rng.rand256()), 0};
+ CCoinsView dummy;
+ CCoinsViewSpy base{&dummy, prevout};
CCoinsViewCache cache{&base};
CMutableTransaction mtx;
@@ -1095,46 +1090,14 @@ BOOST_AUTO_TEST_CASE(ccoins_haveinputs_cache_miss_uses_base_getcoin)
BOOST_CHECK(cache.HaveInputs(tx));
BOOST_CHECK_EQUAL(base.getcoin_calls, 1);
- BOOST_CHECK_EQUAL(base.havecoin_calls, 0);
-}
-BOOST_AUTO_TEST_CASE(ccoins_cache_hit_does_not_call_base)
-{
- class CCoinsViewNoCall final : public CCoinsView
- {
- public:
- std::optional<Coin> GetCoin(const COutPoint&) const override
- {
- BOOST_FAIL("Base GetCoin should not be called when input is cached");
- return std::nullopt;
- }
-
- bool HaveCoin(const COutPoint&) const override
- {
- BOOST_FAIL("Base HaveCoin should not be called when input is cached");
- return false;
- }
- };
-
- const COutPoint prevout{Txid::FromUint256(m_rng.rand256()), 0};
- CCoinsViewNoCall base;
- CCoinsViewCache cache{&base};
-
- cache.AddCoin(prevout, Coin{CTxOut{1, CScript{}}, 1, false}, /*possible_overwrite=*/false);
- BOOST_CHECK(cache.HaveCoinInCache(prevout));
-
- BOOST_CHECK(!cache.AccessCoin(prevout).IsSpent());
- BOOST_CHECK(cache.GetCoin(prevout));
- BOOST_CHECK(cache.HaveCoin(prevout));
-
- CMutableTransaction mtx;
- mtx.vin.emplace_back(prevout);
- const CTransaction tx{mtx};
- BOOST_CHECK(!tx.IsCoinBase());
BOOST_CHECK(cache.HaveInputs(tx));
-
- BOOST_CHECK(cache.SpendCoin(prevout));
- BOOST_CHECK(!cache.HaveCoinInCache(prevout));
+ BOOST_CHECK(cache.GetCoin(prevout));
+ BOOST_CHECK(!cache.AccessCoin(prevout).IsSpent());
+ // GetCoin should only have been called once,
+ // since the result should be cached after
+ // the first call to HaveInputs().
+ BOOST_CHECK_EQUAL(base.getcoin_calls, 1);
}
BOOST_AUTO_TEST_CASE(coins_resource_is_used)
There was a problem hiding this comment.
Thanks for the hint, I simplified this into a single ccoins_cache_behavior test using a dummy spy base view. It covers the two behaviors that weren’t explicitly pinned elsewhere: on cache miss, HaveInputs() must use base GetCoin() (not HaveCoin()), and once cached (including prefilled cache), lookups stay in cache without extra base calls, with SpendCoin() updating cache state as expected. So this keeps the original intent while adopting your simplification and removing the DB setup/duplication.
There was a problem hiding this comment.
From 8c57687 commit message:
ExistsImplwas removed since it duplicatesCDBWrapper::ReadImpl(except that it copies the resulting string on success, but that will be needed for caching anyway).
I was thrown off by the last part of this sentence because it's not clear why caching is being mentioned here. I believe you were referring to the fact that with this PR, the HaveCoin will lead to a Read call instead of an Exists call and will cache the result so we can ignore the cost of the copy when doing the first HaveCoin call.
If I am correct, can you make your justification clearer in the commit message? To state clearly that Read is used over Exists and the result is cached so the cost of the copy can be ignored.
There was a problem hiding this comment.
New commit message looks good to me.
ff338fd to
dd76338
Compare
Eunovo
left a comment
There was a problem hiding this comment.
Looks good, but I have one more comment on the test.
src/test/coins_tests.cpp
Outdated
| BOOST_CHECK(cache.HaveInputs(tx)); | ||
| BOOST_CHECK_EQUAL(base.getcoin_calls, 1); | ||
|
|
||
| base.allow_getcoin = false; |
There was a problem hiding this comment.
I don't think we need base.allow_getcoin, because the BOOST_CHECK_EQUAL(base.getcoin_calls, 1); line below would fail if another base.GetCoin() call was made.
There was a problem hiding this comment.
277c57f test: add HaveInputs call-path unit tests:
Good point, taken - also rebased in the meantime, you can diff with B=ff338fdb53a66ab40a36e1277e7371941fc89840 A=dd76338a57b9b1169ac27f7b783d6d0d4c6e38ab && git fetch origin $B $A && git range-diff --creation-factor=95 $B...$A
There was a problem hiding this comment.
New commit message looks good to me.
`CCoinsViewCache::HaveInputs()` is used by `Consensus::CheckTxInputs()` and currently exercises the `HaveCoin()` path in the UTXO cache. Performance could be affected due to `Coin` copying and different cache warming behavior. Add a benchmark to track `HaveInputs()` throughput before and after the `HaveCoin()` changes in this series to make sure the performance is retained. The benchmark populates an on-disk `CCoinsViewDB` with dummy unspent coins for all block inputs and then calls `HaveInputs()` for each non-coinbase transaction using a fresh `CCoinsViewCache` (to avoid cross-iteration caching). It calls `HaveInputs()` twice to exercise the cache-hit path as well. Benchmark baseline: | ns/tx | tx/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 1,004.58 | 995,437.07 | 0.7% | 1.10 | `HaveInputsOnDisk`
Add unit tests covering `CCoinsViewCache::HaveInputs()`. The tests document that `HaveInputs()` consults the cache first and that a cache miss pulls from the backing view via `GetCoin()`. Co-authored-by: Novo <[email protected]>
`ExistsImpl` was removed since it duplicates `CDBWrapper::ReadImpl`. The only difference is that `ExistsImpl` discards the value while `ReadImpl` returns it as `std::string`. In this series, `HaveCoin` database lookups go through `GetCoin`/`Read`, and successful reads are cached in `CCoinsViewCache`, so this copy is not a separate recurring cost. `CDBWrapper::Exists()` and `::ReadRaw()` both issue a LevelDB `::Get`. Introduce a small helper returning the raw value bytes (as `std::string`) for a given key. Move `Exists` closer to the read methods.
Direct `CCoinsViewDB::HaveCoin` calls aren't called in production anyway, so let's just revert to delegating to `::GetCoin`. Benchmark at this commit: | ns/tx | tx/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 995.62 | 1,004,398.82 | 0.9% | 1.10 | `HaveInputsOnDisk`
dd76338 to
8bfd863
Compare
When `git range-diff` cannot match a commit between two versions of a branch, it shows the old commit as removed (`<`) and the new commit as added (`>`), and it does not show the patch contents. This output is easy to misread as "no code changes" because the diff for that commit is effectively empty. It really means the commits were considered unrelated and the new commit should be reviewed from scratch. This is analogous to rename detection in `git diff`: if similarity is too low, a rename shows up as delete+add. Example (exact SHAs from PR bitcoin#34320): B=ff338fdb53a66ab40a36e1277e7371941fc89840; A=dd76338a57b9b1169ac27f7b783d6d0d4c6e38ab; git fetch upstream $B $A git range-diff 0ca4295..$B 139aa4b..$A This produced output like: 1: 4b32181 < -: ---------- test: add `HaveInputs` call-path unit tests -: ---------- > 1: 277c57f test: add `HaveInputs` call-path unit tests Even though the subject matches, the first commit had no matching diff shown. That should be treated as "unmatched" rather than "unchanged". If you expected a match, try increasing the creation factor so `git range-diff` searches harder: git range-diff --creation-factor=95 <old_range> <new_range>
|
Concept ACK I'm a bit confused by this patch set. If the |
I touched on that briefly in the commit message:
The motivation was @andrewtoth's concern about a coin copy regression. See #34320 (comment) It also checks the cached behavior, not just the DB query.
I tried to cover this in the commit message as well. Please let me know which part needs more detail:
Basically, we can remove it now since the underlying premise, trying to avoid deserialization cost, isn't warranted anymore. I could inline the |
Why do these concerns apply to its removal if the coins db's
Sure, but it would be good to mention why this cost isn't relevant to its current use in the indexes. |
When `git range-diff` cannot match a commit between two versions of a branch, it shows the old commit as removed (`<`) and the new commit as added (`>`), and it does not show the patch contents. This output is easy to misread as "no code changes" because the diff for that commit is effectively empty. It really means the commits were considered unrelated and the new commit should be reviewed from scratch. This is analogous to rename detection in `git diff`: if similarity is too low, a rename shows up as delete+add. Example (exact SHAs from PR bitcoin#34320): B=ff338fdb53a66ab40a36e1277e7371941fc89840; A=dd76338a57b9b1169ac27f7b783d6d0d4c6e38ab; git fetch upstream $B $A git range-diff 0ca4295..$B 139aa4b..$A This produced output like: 1: 4b32181 < -: ---------- test: add `HaveInputs` call-path unit tests -: ---------- > 1: 277c57f test: add `HaveInputs` call-path unit tests Even though the subject matches, the first commit had no matching diff shown. That should be treated as "unmatched" rather than "unchanged". If you expected a match, try increasing the creation factor so `git range-diff` searches harder: git range-diff --creation-factor=95 <old_range> <new_range>
Split out of #34132 (comment).
Problem
We have two ways to check if a coin exists in the DB.
One tries to deserialize; the other just returns whether any value is associated with the key.
But we usually need the full value we probed for, and
HaveCoincannot cache the returned value, so it can trigger an extra lookup ifHaveCoinusesExistsand the caller later needs the actual value.We're already delegating most
HaveCoincalls toGetCoinanyway for the above reasons, so theCCoinsViewDBimplementation ofHaveCoinis redundant.Fix
Remove
CCoinsViewDB::HaveCoin(falls back to the baseCCoinsView::HaveCoin, which delegates toGetCoin).Testing
A unit test was added to confirm that
HaveInputscalls the backing view's::GetCoin(not::HaveCoin) on cache miss and checks the cache first.Also added a benchmark to track throughput before/after changes to ensure no regression.
Note: The latest version of the change keeps most
HaveCoincalls for convenience; a previous version removed all of them.