lib: Optimizing siphash implementation#18014
Conversation
|
(the Travis failure isn't related, it's a bug in the s390x machines) |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
That's a very nice speed improvement! |
|
Where do we use variable-length SipHash? |
|
I just tested it and the speed improvement is good. Also the formatting is much prettier now. |
cvengler
left a comment
There was a problem hiding this comment.
This review is more related to the interface rather than the crypto as I don't have much experience to review that.
A quick search shows:
EDIT: We could also replace this Write+Finalize with a single invocation and gain a few more percentages (by not storing and checking tail, and improving inlining) but that's kinda lose future usability. |
|
nit: You have a few whitespace irregularities - running diff --git a/src/bench/crypto_hash.cpp b/src/bench/crypto_hash.cpp
index 9eeb8da16..037260939 100644
--- a/src/bench/crypto_hash.cpp
+++ b/src/bench/crypto_hash.cpp
@@ -90,7 +90,7 @@ static void SipHash(benchmark::State& state)
{
uint64_t hash = 0;
uint64_t k2 = 0;
- std::vector<uint8_t> in(BUFFER_SIZE,0);
+ std::vector<uint8_t> in(BUFFER_SIZE, 0);
while (state.KeepRunning())
hash = CSipHasher(hash, ++k2).Write(in.data(), in.size()).Finalize();
}
diff --git a/src/crypto/siphash.cpp b/src/crypto/siphash.cpp
index 0b62de998..cfc04c194 100644
--- a/src/crypto/siphash.cpp
+++ b/src/crypto/siphash.cpp
@@ -2,8 +2,8 @@
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
-#include <crypto/siphash.h>
#include <crypto/common.h>
+#include <crypto/siphash.h>
#include <algorithm>
@@ -50,7 +50,8 @@ CSipHasher& CSipHasher::Write(uint64_t data)
/// Load a uint64_t from 0 to 7 bytes.
-inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len) {
+inline uint64_t ReadU64ByLenLE(const unsigned char* data, size_t len)
+{
assert(len < 8);
uint64_t out = 0;
for (size_t i = 0; i < len; ++i) {
@@ -85,7 +86,7 @@ CSipHasher& CSipHasher::Write(const unsigned char* data, size_t size)
auto i = needed;
while (i < len - left) {
- uint64_t mi = ReadLE64(data+i);
+ uint64_t mi = ReadLE64(data + i);
v3 ^= mi;
SIPROUND;
SIPROUND;
diff --git a/src/crypto/siphash.h b/src/crypto/siphash.h
index 698f9c310..ddf192e8a 100644
--- a/src/crypto/siphash.h
+++ b/src/crypto/siphash.h
@@ -14,7 +14,7 @@ class CSipHasher
{
private:
uint64_t v[4];
- size_t count; // total amount of bytes inputted.
+ size_t count; // total amount of bytes inputted.
uint64_t tail; // bytes that weren't processed yet.
public: |
cd5f26c to
81e0f14
Compare
Github-Pull: bitcoin#18014 Rebased-From: 81e0f144216f5021851c30df349e521b0b43935c
|
I decided to drop commit 2b32471b5b113c0a34e06896842a450dc7168454 because it's more controversial than I thought (there might be software that relies on the current formatting and there's #18011 coming up) |
81e0f14 to
de0c7fc
Compare
There was a problem hiding this comment.
Concept ACK
before change of formatting
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.28913, 3.10429e-08, 3.27799e-08, 3.11495e-08
# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.33962, 3.10606e-08, 3.30034e-08, 3.12351e-08
# Benchmark, evals, iterations, total, min, max, median
SipHash_32b, 5, 40000000, 6.55364, 3.12087e-08, 3.75397e-08, 3.16322e-08
after change of formatting and added benchmarks
((HEAD detached at 52aa1f380b))$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 7.58726 0.00214442 0.00219727 0.00217387
SipHash_32b 5 40000000 6.32856 3.11782e-08 3.25053e-08 3.1465e-08
SipHash_3b 5 40000000 4.3802 2.16468e-08 2.21054e-08 2.19273e-08
#Benchmark evals iterations total min max median
SipHash 5 700 7.82075 0.0021651 0.0024746 0.00218246
SipHash_32b 5 40000000 6.39722 3.12934e-08 3.3359e-08 3.16498e-08
SipHash_3b 5 40000000 4.6185 2.18849e-08 2.52748e-08 2.30488e-08
#Benchmark evals iterations total min max median
SipHash 5 700 8.12191 0.00215675 0.00248114 0.00232176
SipHash_32b 5 40000000 6.33754 3.12737e-08 3.24442e-08 3.16561e-08
SipHash_3b 5 40000000 4.3563 2.13873e-08 2.21485e-08 2.17682e-08
after change of algorithm implementation
(pr/18014)$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 2.0267 0.000571765 0.000590692 0.000579299
SipHash_32b 5 40000000 6.27794 3.10675e-08 3.209e-08 3.13121e-08
SipHash_3b 5 40000000 4.07212 2.01261e-08 2.06954e-08 2.02564e-08
#Benchmark evals iterations total min max median
SipHash 5 700 2.13875 0.000582848 0.000627614 0.000614234
SipHash_32b 5 40000000 6.27427 3.10577e-08 3.16413e-08 3.13393e-08
SipHash_3b 5 40000000 4.02383 1.9983e-08 2.05697e-08 2.00265e-08
#Benchmark evals iterations total min max median
SipHash 5 700 2.05042 0.000577004 0.00060258 0.000584264
SipHash_32b 5 40000000 6.32065 3.10846e-08 3.25604e-08 3.14578e-08
SipHash_3b 5 40000000 4.03686 1.96887e-08 2.08056e-08 2.02039e-08
Will look at the algorithm.
src/crypto/siphash.h
Outdated
There was a problem hiding this comment.
perhaps sort these entries L::16-18
There was a problem hiding this comment.
Why would you want to sort them? This order seems to make sense to me?
There was a problem hiding this comment.
Yeah, do you remember the reason you wanted a different order here?
|
Sorry for the delay @elichai -- I still plan to review this. |
de0c7fc to
9ed348d
Compare
|
Updated benchmarks with the new benchmarking library: Before: After: |
4431644 to
19e28a4
Compare
|
With the benchmarks adjusted to bytes After: |
|
utACK 19e28a41168d02dc85356714c889b43d96d834d6 |
PastaPastaPasta
left a comment
There was a problem hiding this comment.
this PR seems quite dead, but if it gets revived, please do the following
src/crypto/siphash.cpp
Outdated
There was a problem hiding this comment.
please convert this to a c++11 functional cast
There was a problem hiding this comment.
These are integer casts, so I'm not sure if it's worth doing because it decreases the readability, and not worth asking for re-ACKs
src/crypto/siphash.cpp
Outdated
There was a problem hiding this comment.
please use a c++11 functional-cast
I wouldn't say it is dead. It compiles and (unit) tests fine on itself and on current master. All it needs is review. |
PastaPastaPasta
left a comment
There was a problem hiding this comment.
My comment on aliveness was just that it's been about 11 months since it got a review, not to say that there is anything problematic with the pr. Maybe since I commented some others will be notified and will do a review :)
For what it's worth, I don't see any major issues with the PR, although I did add a few more comments of things that likely should be changed
src/crypto/siphash.cpp
Outdated
There was a problem hiding this comment.
This should really be a for loop
There was a problem hiding this comment.
i is used later, so this will require for(; i < len - left; i+= 8) which I'm not sure is that better because it can be confusing,but I can change it if you think it improves readability
src/crypto/siphash.h
Outdated
There was a problem hiding this comment.
Why would you want to sort them? This order seems to make sense to me?
aureleoules
left a comment
There was a problem hiding this comment.
ACK 19e28a41168d02dc85356714c889b43d96d834d6
I verified that the implementation of CSipHasher::Write matches the one from rust-bitcoin/bitcoin_hashes (https://github.com/rust-bitcoin/bitcoin_hashes/blob/ec356e4933f6ee972dd0a1f836a3297b2e9f3407/src/siphash24.rs#L171-L208)
src/crypto/siphash.cpp
Outdated
There was a problem hiding this comment.
nit: these should be initialized in the class
diff --git a/src/crypto/siphash.cpp b/src/crypto/siphash.cpp
index f78771868..dcf40d309 100644
--- a/src/crypto/siphash.cpp
+++ b/src/crypto/siphash.cpp
@@ -24,8 +24,6 @@ CSipHasher::CSipHasher(uint64_t k0, uint64_t k1)
v[1] = 0x646f72616e646f6dULL ^ k1;
v[2] = 0x6c7967656e657261ULL ^ k0;
v[3] = 0x7465646279746573ULL ^ k1;
- count = 0;
- tail = 0;
}
CSipHasher& CSipHasher::Write(uint64_t data)
diff --git a/src/crypto/siphash.h b/src/crypto/siphash.h
index 4b630f229..1b62e2e6c 100644
--- a/src/crypto/siphash.h
+++ b/src/crypto/siphash.h
@@ -14,8 +14,8 @@ class CSipHasher
{
private:
uint64_t v[4];
- uint64_t tail; // bytes that weren't processed yet.
- uint8_t count; // total amount of bytes inputted.
+ uint64_t tail{0}; // bytes that weren't processed yet.
+ uint8_t count{0}; // total amount of bytes inputted.
public:
/** Construct a SipHash calculator initialized with 128-bit key (k0, k1) */|
Maybe this needs a rebase for the CI to pass on this PR? |
|
Couldn't reproduce the linter complaints locally, so rebased, hopefully that'll fix it |
19e28a4 to
409c2e3
Compare
aureleoules
left a comment
There was a problem hiding this comment.
reACK 409c2e3 - rebased since last review (no changes)
|
@sipa re-review? |
|
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
|
Semi-ACK: Didn't fully review the code, but this has been part of Knots for over 2 years (0.19.1) and no issues seem to have surfaced. |
|
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
Closing as up for grabs due to lack of activity. |
Github-Pull: bitcoin#18014 Rebased-From: 409c2e3
Github-Pull: bitcoin#18014 Rebased-From: 169a439
Github-Pull: bitcoin#18014 Rebased-From: 409c2e3
Hi,
This builds on #18013
Before anything I want to point out that we have 3 SipHash implementations
CSipHasher,SipHashUint256,SipHashUint256Extra. this PR touches only the first one(not used in any hashmap AFAIK).I re-implemented the
CSipHasherwith performance up to 3X times faster for big strings (BUFFER_SIZE = 1000*1000) and 5%-19% faster for small strings (3 bytes, because a minute of syncing showed me that 3 bytes siphash is something that happens quite often)Benchmarks against other siphash implementations can be found here: https://gist.github.com/elichai/abdebeeaee7e581bc74c75cb9487b3af (code: https://github.com/elichai/siphash-bench)
My implementation was inspired by the one in Rust's stdlib (https://github.com/rust-lang/rust/blob/master/src/libcore/hash/sip.rs) which rust-bitcoin use in https://github.com/rust-bitcoin/bitcoin_hashes.
Before:
After:
Also made the benchmarks print a more readable output(https://gist.github.com/elichai/812c8866a69959404b480d968e080475),this is limited by up to 47 chars of benchmark name, so as long as we don't add more names like
CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPTand longer then it will be fine.(it can probably be adjustable but that will require iterating over all the tests before running them to determine the longest cell and I thought the 47 limit is more than reasonable)