base58: use map instead of strchr() when decode#12704
Conversation
src/base58.cpp
Outdated
There was a problem hiding this comment.
Should you check the character *psz is less than 128?
There was a problem hiding this comment.
Oops...I was thought size of mapBase58 is 256, thanks
There was a problem hiding this comment.
Fixed, as per @laanwj suggestion mapBase58 has 256 elements.
|
Have you measured performance improvement? |
Performance improvement is about 40%. 1,000,000 rounds, it's about |
|
Concept ACK. Seems very straightforward (haven't checked the table yet, though).
Nice. Though here you're not benchmarking the entire FWIW there's a benchmark for base58 in |
src/base58.cpp
Outdated
There was a problem hiding this comment.
Just make the mapBase58 array 256 bytes large, and you don't need the range check.
src/base58.cpp
Outdated
There was a problem hiding this comment.
IMO just leave this as a comment or make it static_assert?
There was a problem hiding this comment.
With c++11 it should be easy to make it a static_assert.
--- a/src/base58.cpp
+++ b/src/base58.cpp
@@ -20,7 +20,7 @@
/** All alphanumeric characters except for "0", "I", "O", and "l" */
static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
-static const int8_t mapBase58[] = {
+constexpr std::array<int8_t, 256> mapBase58{
-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
@@ -55,7 +55,7 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
int size = strlen(psz) * 733 /1000 + 1; // log(58) / log(256), rounded up.
std::vector<unsigned char> b256(size);
// Process the characters.
- assert(sizeof(mapBase58)/sizeof(mapBase58[0]) == 256); // guarantee not out of range
+ static_assert(mapBase58.size() == 256); // guarantee not out of range
while (*psz && !isspace(*psz)) {
// Decode base58 character
int carry = mapBase58[(uint8_t)*psz];There was a problem hiding this comment.
A run-time assertion is certainly overkill here.
|
Kicked travis due to timeout. |
|
utACK. Your test in comment 0 threw me off because it only has 128 elements. I think we need tests also submitted in this to test all values 0-255. |
|
Would you mind removing the test in comment 0 and adding it to the unit test suite? |
|
I benchmarked this on my desktop system: a full address decode goes from 1.50 us to 1.29 us (including checksum check). I'm not convinced this is worth it. |
|
@sipa One use case I can think of immediately is that Vanity Address Generators can benefit from this performance increase because they repeatedly use the Base58-encoded addresses in attempting to match the desired string(s). Generally I also value speed optimizations, even in normal application use, and I don't regard the increased size of the resulting binary to be significant enough to warrant trade-off concern here. YMMV. |
dcousens
left a comment
There was a problem hiding this comment.
IMHO, simpler, utACK.
nit: why is mapBase58 indented in the 8th element?
src/base58.cpp
Outdated
There was a problem hiding this comment.
If you make this
static const int8_t mapBase58[256] = {
That's pretty much a static assertion that the size will be 256.
src/base58.cpp
Outdated
There was a problem hiding this comment.
The static_assert is pointless now? Maybe?
There was a problem hiding this comment.
Just in case, after all static_assert() is no harm.
src/base58.cpp
Outdated
There was a problem hiding this comment.
missing
#include <array>
?
There was a problem hiding this comment.
sorry, I don't familiar with c++11, so just change it back to old school style.
There was a problem hiding this comment.
IMHO, the .size() method was worth the import... but anyway.
|
utACK 5d71e4d, please squash. |
|
utACK, needs squash. |
|
This is good for a 20% speedup for me with GCC 7.3 (median goes from 8.70969e-07 to 7.00866e-07). ACK once squashed. |
|
I'd like to see a comparison with one or two other methods of doing a table lookup to make sure this is optimal. For instance switch(ch) {
case '1':
carry = 0;
break;
// ....
case 'z':
carry = 57;
break;
default:
return false;
}Additionally you can try some outputs from gperf https://www.gnu.org/software/gperf/ |
|
I don't think you can get any faster than this approach, which is a flat lookup table that maps ints to ints (without any hashing). The typical use case of gperf is for something kind of different: you'd provide it to a tokenizer where you have a grammar of long human readable strings, and you want to hash all of the tokens in the grammar to small ints without collisions. |
|
Stop wasting time on discussing the performance. This does not matter. Decoding an address could take 50 us and I don't think anyone would notice. If the resulting code looks better, go for it. Otherwise, don't. -0 |
|
I have some notes on why some alternatives that would be faster, but as @sipa notes, there are bigger fish to fry. |
Yes, I do prefer the code like this, because it's more consistent with how we handle base32 and hex for ex. So utACK bcab47b. Agree that this is a dead end in regard to performance, if you are interested in performance please review @eklitzke's work he's doing great things. |
bcab47b use base58 map instead of strchr() (Kevin Pan) Pull request description: Use array map instead of find string position. Test code snippet: ```cpp #include <assert.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string> int main(int argc, const char * argv[]) { static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"; static const int8_t mapBase58[] = { -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1, -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1, 22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1, -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46, 47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1, }; const std::string b58Str(pszBase58); for (size_t i = 0; i < b58Str.length(); i++) { const char *ch = strchr(pszBase58, b58Str[i]); printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]); assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]); } assert(mapBase58['1'] == 0); assert(mapBase58['z'] == 57); /** All alphanumeric characters except for "0", "I", "O", and "l" */ assert(mapBase58['0'] == -1); assert(mapBase58['I'] == -1); assert(mapBase58['O'] == -1); assert(mapBase58['l'] == -1); return 0; } ``` Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5
Summary: Backport of Bitcoin Core PR12704 bitcoin/bitcoin#12704 Test Plan: ``` make check ``` Reviewers: Fabien, O1 Bitcoin ABC, #bitcoin_abc, deadalnix Reviewed By: Fabien, O1 Bitcoin ABC, #bitcoin_abc Differential Revision: https://reviews.bitcoinabc.org/D3938
bcab47b use base58 map instead of strchr() (Kevin Pan) Pull request description: Use array map instead of find string position. Test code snippet: ```cpp #include <assert.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string> int main(int argc, const char * argv[]) { static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"; static const int8_t mapBase58[] = { -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1, -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1, 22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1, -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46, 47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1, }; const std::string b58Str(pszBase58); for (size_t i = 0; i < b58Str.length(); i++) { const char *ch = strchr(pszBase58, b58Str[i]); printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]); assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]); } assert(mapBase58['1'] == 0); assert(mapBase58['z'] == 57); /** All alphanumeric characters except for "0", "I", "O", and "l" */ assert(mapBase58['0'] == -1); assert(mapBase58['I'] == -1); assert(mapBase58['O'] == -1); assert(mapBase58['l'] == -1); return 0; } ``` Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5
bcab47b use base58 map instead of strchr() (Kevin Pan) Pull request description: Use array map instead of find string position. Test code snippet: ```cpp #include <assert.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string> int main(int argc, const char * argv[]) { static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"; static const int8_t mapBase58[] = { -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,-1,-1,-1,-1,-1,-1, -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1, 22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1, -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46, 47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1, }; const std::string b58Str(pszBase58); for (size_t i = 0; i < b58Str.length(); i++) { const char *ch = strchr(pszBase58, b58Str[i]); printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]); assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]); } assert(mapBase58['1'] == 0); assert(mapBase58['z'] == 57); /** All alphanumeric characters except for "0", "I", "O", and "l" */ assert(mapBase58['0'] == -1); assert(mapBase58['I'] == -1); assert(mapBase58['O'] == -1); assert(mapBase58['l'] == -1); return 0; } ``` Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5
Use array map instead of find string position.
Test code snippet: