bloom: use Span instead of std::vector for insert and contains [ZAP3]#18985
bloom: use Span instead of std::vector for insert and contains [ZAP3]#18985jb55 wants to merge 1 commit intobitcoin:masterfrom
insert and contains [ZAP3]#18985Conversation
|
Do you have a good way to quantify the impact of this change? That would be interesting to see :) |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
|
@practicalswift this is something I want to do once I figure out how to use the benchmarking suite. This one might be nontrivial though since it's networking code. I might just do the benchmarking in #18849 to start. One piece of data that I can show here is the number of allocations before and after which I'll do |
|
Concept ACK, I think this is the right approach. In general I think that most functions that take either a const vector/prevector as input, or begin/end pointer, or a begin pointer + size, should be replaced with versions that just take in a Span. It would be worthwhile to make Nit: uint256 is not a vector internally, but an array. |
|
I've noticed that we consistently pass |
|
Concept ACK. Will review as soon as this is rebased.
@practicalswift: Good point. I guess seen from a performance perspective it doesn't really make a difference, considering how lightweight spans are. In any case I agree that we should agree on one passing type and use it consistently in the code-base (my personal preference would be to always pass it by value, though). |
|
Sebastian Falbesoner <[email protected]> writes:
Concept ACK. Will review as soon as this is rebased.
> I've noticed that we consistently pass Span by const-ref instead of
> by value in the project (this PR follows that convention): is that an
> intentional choice? Just curious :)
@practicalswift: Good point. I guess seen from a performance
perspective it doesn't really make a difference, considering how
lightweight spans are. In any case I agree that we should agree on one
passing type and use it consistently in the code-base (my personal
preference would be to always pass it by value, though).
It looks like they are being passed by value in other PRs, I don't mind
switching to that style here.
|
|
@theStack @jb55 FWIW, throughout the C++ Core Guidelines
Same goes for |
a6d8f9f to
1466b65
Compare
|
rebased and switched from |
|
Sebastian Falbesoner <[email protected]> writes:
Could simply use the recently introduced `MakeUCharSpan()` helper here and on other occurences of the commit.
oh nice, I didn't know that was a thing. I'll do that.
|
1466b65 to
897f5be
Compare
|
pushed 897f5be28c072c700a55a511bf0a88efc3b71602
much cleaner 🤩 diff --git a/src/bloom.cpp b/src/bloom.cpp
index 436e913aa5..c19d6e7d7d 100644
--- a/src/bloom.cpp
+++ b/src/bloom.cpp
@@ -59,12 +59,12 @@ void CBloomFilter::insert(const COutPoint& outpoint)
{
CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
stream << outpoint;
- insert(Span<const unsigned char>((const unsigned char*)stream.data(), stream.size()));
+ insert(MakeUCharSpan(stream));
}
void CBloomFilter::insert(const uint256& hash)
{
- insert(Span<const unsigned char>(hash.begin(), hash.end()));
+ insert(MakeUCharSpan(hash));
}
bool CBloomFilter::contains(Span<const unsigned char> vKey) const
@@ -85,13 +85,12 @@ bool CBloomFilter::contains(const COutPoint& outpoint) const
{
CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
stream << outpoint;
- return contains(Span<const unsigned char>((const unsigned char*)stream.data(),
- stream.size()));
+ return contains(MakeUCharSpan(stream));
}
bool CBloomFilter::contains(const uint256& hash) const
{
- return contains(Span<const unsigned char>(hash.begin(), hash.end()));
+ return contains(MakeUCharSpan(hash));
}
bool CBloomFilter::IsWithinSizeConstraints() const
@@ -241,7 +240,7 @@ void CRollingBloomFilter::insert(Span<const unsigned char> vKey)
void CRollingBloomFilter::insert(const uint256& hash)
{
- insert(Span<const unsigned char>(hash.begin(), hash.end()));
+ insert(MakeUCharSpan(hash));
}
bool CRollingBloomFilter::contains(Span<const unsigned char> vKey) const
@@ -260,7 +259,7 @@ bool CRollingBloomFilter::contains(Span<const unsigned char> vKey) const
bool CRollingBloomFilter::contains(const uint256& hash) const
{
- return contains(Span<const unsigned char>(hash.begin(), hash.end()));
+ return contains(MakeUCharSpan(hash));
}
void CRollingBloomFilter::reset()
diff --git a/src/test/bloom_tests.cpp b/src/test/bloom_tests.cpp
index 1ad5607429..9cf6e352e4 100644
--- a/src/test/bloom_tests.cpp
+++ b/src/test/bloom_tests.cpp
@@ -91,7 +91,7 @@ BOOST_AUTO_TEST_CASE(bloom_create_insert_key)
CBloomFilter filter(2, 0.001, 0, BLOOM_UPDATE_ALL);
filter.insert(vchPubKey);
uint160 hash = pubkey.GetID();
- filter.insert(Span<unsigned char>(hash.begin(), hash.end()));
+ filter.insert(MakeUCharSpan(hash));
CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
stream << filter; |
We can avoid many unnecessary std::vector allocations by changing CBloomFilter to take Spans instead of std::vector's for the `insert` and `contains` operations. CBloomFilter currently converts types such as CDataStream and uint256 to std::vector on `insert` and `contains`. This is unnecessary because CDataStreams and uint256 are already std::vectors internally. We just need a way to point to the right data within those types. Span gives us this ability. Signed-off-by: William Casarin <[email protected]>
897f5be to
be8c3b4
Compare
|
rebased |
vincenzopalazzo
left a comment
There was a problem hiding this comment.
utACK be8c3b4
Theoretically, this could be a good improvement because it removes the smart pointer from the game, but I never use it in practice the span and I'm curious to see how much improvement brings this game.
|
On Tue, May 04, 2021 at 03:28:59PM -0700, Vincenzo Palazzo wrote:
Theoretically, this could be a good improvement because it removes the smart pointer from the game, but I never use it in practice the span and I'm curious to see how much improvement brings this game.
See my comment here:
This code came up as a place where many allocations occur. Mainly due to allocations of CService::GetKey which is passed to these functions, but this PR is needed before we get to that.
This PR is a stepping stone to:
[ZAP4] netaddress: return a prevector from CService::GetKey() [1]
Which removes a bunch of unnecessary heap allocations during network operations.
[1] #18849
|
|
Concept ACK. Have you figured out a way to benchmark this? |
| void insert(const uint256& hash); | ||
| bool contains(const std::vector<unsigned char>& vKey) const; | ||
| bool contains(Span<const unsigned char> vKey) const; | ||
| bool contains(const uint256& hash) const; |
| bool contains(const std::vector<unsigned char>& vKey) const; | ||
| bool contains(Span<const unsigned char> vKey) const; | ||
| bool contains(const COutPoint& outpoint) const; | ||
| bool contains(const uint256& hash) const; |
| filter.insert(vchPubKey); | ||
| uint160 hash = pubkey.GetID(); | ||
| filter.insert(std::vector<unsigned char>(hash.begin(), hash.end())); | ||
| filter.insert(MakeUCharSpan(hash)); |
There was a problem hiding this comment.
No need for MakeUCharSpan, filter.insert(hash); works now too
| stream << outpoint; | ||
| std::vector<unsigned char> data(stream.begin(), stream.end()); | ||
| insert(data); | ||
| insert(MakeUCharSpan(stream)); |
There was a problem hiding this comment.
No need for MakeUCharSpan
I think DrahtBot screwed up here. No rebase happened in between. |
|
:( |
|
Any specific reason for closing? |
|
lost interest. It's fine with me if someone else wants to pick it up. |
|
@jb55, I can be interested in it. I love your optimization, and maybe can be also a good starting point to push some code on bitcoin :-) |
|
Rebased, with comments addressed in #23115. |
…rt` and `contains` a11da75 bloom: cleanup includes (fanquake) f1ed1d3 bloom: use constexpr where appropriate (fanquake) 2ba4ddf bloom: use Span instead of std::vector for `insert` and `contains` (William Casarin) Pull request description: This is bitcoin#18985 rebased, with the most recent comments addressed. > We can avoid many unnecessary std::vector allocations by changing CBloomFilter to take Spans instead of std::vector's for the `insert` and `contains` operations. > CBloomFilter currently converts types such as CDataStream and uint256 to std::vector on `insert` and `contains`. This is unnecessary because CDataStreams and uint256 are already std::vectors internally. We just need a way to point to the right data within those types. Span gives us this ability. ACKs for top commit: sipa: Code review ACK a11da75 laanwj: Code review ACK a11da75 Tree-SHA512: ee9ba02c9588daa1ff51782d1953fd060839dd15aa85861b2633b6ff2398320188ddd00f01d0c99442224485364ede9f8322366de4239fc7831ebfa06bd34659
This builds off #18468 since it simplifies this PR a bunch. Review that first.
We can avoid many unnecessary
std::vectorallocations by changingCBloomFilterto takeSpans instead ofstd::vectors for theinsertand
containsoperations.CBloomFiltercurrently converts types such asCDataStreamanduint256to
std::vectoroninsertandcontains. This is unnecessary becauseCDataStreams anduint256s are alreadystd::vectorsinternally. We justneed a way to point to the right data within those types.
Spangivesus this ability.
This is a part of the Zero Allocations Project #18849 (ZAP3). This code came up as a place where many allocations occur. Mainly due to allocations of
CService::GetKeywhich is passed to these functions, but this PR is needed before we get to that.