base58: use map instead of strchr() when decode by bitkevin · Pull Request #12704 · bitcoin/bitcoin

bitkevin · 2018-03-16T05:56:05Z

Use an array lookup table instead of searching for the character's position in the alphabet string.

Test code snippet:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // strchr()

#include <string>

int main(int argc, const char * argv[]) {

  static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
  static const int8_t mapBase58[] = {
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
    -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
    22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
    -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
    47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
  };

  const std::string b58Str(pszBase58);

  for (size_t i = 0; i < b58Str.length(); i++) {
    const char *ch = strchr(pszBase58, b58Str[i]);
    printf("%td - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);  // %td matches ptrdiff_t
    assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
  }

  assert(mapBase58['1'] == 0);
  assert(mapBase58['z'] == 57);

  /** All alphanumeric characters except for "0", "I", "O", and "l" */
  assert(mapBase58['0'] == -1);
  assert(mapBase58['I'] == -1);
  assert(mapBase58['O'] == -1);
  assert(mapBase58['l'] == -1);

  return 0;
}

ken2812221 · 2018-03-16T06:33:29Z

src/base58.cpp

Should you check the character *psz is less than 128?

Oops... I thought the size of mapBase58 was 256, thanks.

Fixed, as per @laanwj suggestion mapBase58 has 256 elements.

promag · 2018-03-16T07:39:37Z

Have you measured performance improvement?

bitkevin · 2018-03-16T08:09:46Z

Have you measured performance improvement?

Performance improvement is about 40%. 1,000,000 rounds, it's about 450ms vs 250ms.

#include <sys/time.h>  // gettimeofday()

// Wall-clock time in milliseconds.
uint64_t getCurrentTime() {
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (uint64_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

// Fragment: pszBase58 and mapBase58 are the definitions from the test snippet above.
const std::string b58Str("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy");
size_t len = b58Str.length();

uint64_t cnt;

// Baseline: linear search with strchr().
cnt = 0;
uint64_t t1 = getCurrentTime();
for (size_t j = 0; j < 1000000; j++) {
  for (size_t i = 0; i < len; i++) {
    const char *ch = strchr(pszBase58, b58Str[i]);
    cnt += ch - pszBase58;
  }
}
uint64_t t2 = getCurrentTime();
printf("%llu\n", (unsigned long long)(t2 - t1));

// Candidate: direct table lookup.
cnt = 0;
uint64_t t3 = getCurrentTime();
for (size_t j = 0; j < 1000000; j++) {
  for (size_t i = 0; i < len; i++) {
    cnt += mapBase58[(uint8_t)b58Str[i]];
  }
}
uint64_t t4 = getCurrentTime();
printf("%llu\n", (unsigned long long)(t4 - t3));
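
A self-contained variant of this micro-benchmark might look like the sketch below. It is not the code that was measured in this thread: it assumes std::chrono::steady_clock in place of gettimeofday(), builds the table at start-up instead of spelling it out, and returns the accumulated sum so the compiler cannot discard the loops. The names kAlphabet, Base58Table, SumWithStrchr, SumWithTable and ElapsedMs are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <string>

// Base58 alphabet as quoted in the PR description.
static const char* kAlphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Hypothetical helper: fills a 256-entry table from the alphabet at start-up.
struct Base58Table {
    int8_t map[256];
    Base58Table() {
        for (int i = 0; i < 256; ++i) map[i] = -1;
        for (int i = 0; kAlphabet[i] != '\0'; ++i) {
            map[static_cast<uint8_t>(kAlphabet[i])] = static_cast<int8_t>(i);
        }
    }
};
static const Base58Table kTable;

// Sums the digit values found by a linear strchr() search.
// Precondition: s contains only Base58 characters.
uint64_t SumWithStrchr(const std::string& s, int rounds) {
    uint64_t sum = 0;
    for (int r = 0; r < rounds; ++r) {
        for (char c : s) {
            const char* hit = std::strchr(kAlphabet, c);
            assert(hit != nullptr);
            sum += static_cast<uint64_t>(hit - kAlphabet);
        }
    }
    return sum;
}

// Sums the same digit values via a direct table lookup.
uint64_t SumWithTable(const std::string& s, int rounds) {
    uint64_t sum = 0;
    for (int r = 0; r < rounds; ++r) {
        for (char c : s) {
            sum += static_cast<uint64_t>(kTable.map[static_cast<uint8_t>(c)]);
        }
    }
    return sum;
}

// Runs fn once and returns the elapsed time in milliseconds.
template <typename Fn>
double ElapsedMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling ElapsedMs with a lambda that stores the result of SumWithStrchr or SumWithTable gives one timing per approach; checking that both sums are equal guards against comparing loops that do different work.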

laanwj · 2018-03-16T10:36:06Z

Concept ACK. Seems very straightforward (haven't checked the table yet, though).

Performance improvement is about 40%. 1,000,000 rounds, it's about 450ms vs 250ms.

Nice. Though here you're not benchmarking the entire DecodeBase58 function, only the specific part that you sped up, so that will give somewhat distorted results.

FWIW there's a benchmark for base58 in src/bench - it's somewhat representative of what base58 is used for in bitcoin - encoding/decoding addresses.
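
To see how small a share of a full decode the character lookup is, here is a simplified decoder sketch. It is loosely modelled on the lines of DecodeBase58 visible in this thread, but it is an assumption-laden illustration rather than the Bitcoin Core function: it skips whitespace handling and the checksum, and the names kAlphabet, Base58Table and DecodeBase58Sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Base58 alphabet as quoted in the PR description.
static const char* kAlphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Hypothetical helper: 256-entry table built from the alphabet at start-up.
struct Base58Table {
    int8_t map[256];
    Base58Table() {
        for (int i = 0; i < 256; ++i) map[i] = -1;
        for (int i = 0; kAlphabet[i] != '\0'; ++i) {
            map[static_cast<uint8_t>(kAlphabet[i])] = static_cast<int8_t>(i);
        }
    }
};
static const Base58Table kTable;

// Simplified decode: no whitespace skipping, no checksum verification.
bool DecodeBase58Sketch(const std::string& in, std::vector<unsigned char>& out) {
    // Each leading '1' encodes one leading zero byte.
    std::size_t pos = 0;
    while (pos < in.size() && in[pos] == '1') ++pos;
    const std::size_t zeroes = pos;

    // log(58) / log(256) is below 0.733, so this many bytes always suffice.
    std::vector<unsigned char> b256((in.size() - pos) * 733 / 1000 + 1, 0);

    for (; pos < in.size(); ++pos) {
        // The single table lookup that replaces strchr().
        int carry = kTable.map[static_cast<uint8_t>(in[pos])];
        if (carry == -1) return false;  // not a Base58 character
        // b256 = b256 * 58 + carry, treating b256 as a big-endian number.
        for (auto it = b256.rbegin(); it != b256.rend(); ++it) {
            carry += 58 * (*it);
            *it = static_cast<unsigned char>(carry % 256);
            carry /= 256;
        }
        assert(carry == 0);
    }

    // Drop the padding zeros of the work buffer, then restore the encoded ones.
    auto first = b256.begin();
    while (first != b256.end() && *first == 0) ++first;
    out.assign(zeroes, 0x00);
    out.insert(out.end(), first, b256.end());
    return true;
}
```

The inner loop over b256 runs once per input character and touches every byte of the work buffer, which is one plausible reason a whole-function benchmark shows a smaller gain than timing the lookup alone.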

laanwj · 2018-03-16T10:42:17Z

src/base58.cpp

Just make the mapBase58 array 256 bytes large, and you don't need the range check.
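
A minimal sketch of the difference, assuming the character is cast to uint8_t before indexing as in the snippets in this thread: a 128-entry table needs a guard for bytes 128-255, while a 256-entry table covers every possible index. FillTable, LookupChecked128 and LookupUnchecked256 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Base58 alphabet as quoted in the PR description.
static const char kAlphabet[] = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Fills a table of either size: -1 everywhere, digit values at alphabet positions.
template <std::size_t N>
void FillTable(int8_t (&table)[N]) {
    for (std::size_t i = 0; i < N; ++i) table[i] = -1;
    for (std::size_t i = 0; kAlphabet[i] != '\0'; ++i) {
        const std::size_t idx = static_cast<uint8_t>(kAlphabet[i]);
        assert(idx < N);  // every alphabet character is ASCII
        table[idx] = static_cast<int8_t>(i);
    }
}

// 128 entries: bytes 128..255 would index past the end, so a guard is required.
int LookupChecked128(const int8_t (&table)[128], char c) {
    const uint8_t idx = static_cast<uint8_t>(c);
    return idx < 128 ? table[idx] : -1;
}

// 256 entries: a uint8_t index can never be out of range, so no guard is needed.
int LookupUnchecked256(const int8_t (&table)[256], char c) {
    return table[static_cast<uint8_t>(c)];
}
```

Both functions return the same value for every byte; the larger table trades 128 extra bytes of read-only data for one branch fewer per character.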

promag · 2018-03-16T14:48:39Z

src/base58.cpp

IMO just leave this as a comment or make it static_assert?

With c++11 it should be easy to make it a static_assert.

--- a/src/base58.cpp
+++ b/src/base58.cpp
@@ -20,7 +20,7 @@
 /** All alphanumeric characters except for "0", "I", "O", and "l" */
 static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
-static const int8_t mapBase58[] = {
+constexpr std::array<int8_t, 256> mapBase58{
     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
@@ -55,7 +55,7 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
     int size = strlen(psz) * 733 /1000 + 1; // log(58) / log(256), rounded up.
     std::vector<unsigned char> b256(size);
     // Process the characters.
-    assert(sizeof(mapBase58)/sizeof(mapBase58[0]) == 256); // guarantee not out of range
+    static_assert(mapBase58.size() == 256); // guarantee not out of range
     while (*psz && !isspace(*psz)) {
         // Decode base58 character
         int carry = mapBase58[(uint8_t)*psz];

A run-time assertion is certainly overkill here.
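
A minimal sketch of the compile-time check, assuming a C++11 compiler: the single-argument static_assert only became valid in C++17, so under C++11 a message string is needed. The names kMapSketch and EntryFor are hypothetical, and the zero-filled contents are a placeholder for the 256 explicit values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder contents (all zero); the real table lists 256 explicit values.
constexpr std::array<int8_t, 256> kMapSketch{{}};

// The two-argument form is valid from C++11 on.
static_assert(kMapSketch.size() == 256, "one table entry per possible byte value");

// The compile-time check covers the table size only; a caller passing a
// wider integer type still needs its own range check.
int8_t EntryFor(int index) {
    assert(index >= 0 && static_cast<std::size_t>(index) < kMapSketch.size());
    return kMapSketch[static_cast<std::size_t>(index)];
}
```

With std::array the element count is part of the type, so the assertion cannot fail once the declaration says 256; it then documents intent more than it verifies anything.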

promag · 2018-03-17T09:45:42Z

Kicked travis due to timeout.

donaloconnor · 2018-03-17T12:30:51Z

utACK. Your test in comment 0 threw me off because it only has 128 elements.

I think tests covering all values 0-255 should also be submitted with this change.
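
One way such an exhaustive check could look, as a sketch rather than the test that was eventually added: compare every one of the 256 entries against the strchr() search the table replaces. BuildTable and TableMatchesStrchr are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Base58 alphabet as quoted in the PR description.
static const char* kAlphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Hypothetical helper: fills the table under test from the alphabet.
void BuildTable(int8_t (&table)[256]) {
    for (int i = 0; i < 256; ++i) table[i] = -1;
    for (int i = 0; kAlphabet[i] != '\0'; ++i) {
        const uint8_t idx = static_cast<uint8_t>(kAlphabet[i]);
        assert(table[idx] == -1);  // the alphabet must not repeat a character
        table[idx] = static_cast<int8_t>(i);
    }
}

// Returns true if all 256 entries agree with a strchr() search of the alphabet.
bool TableMatchesStrchr(const int8_t (&table)[256]) {
    for (int b = 0; b < 256; ++b) {
        // strchr() also matches the terminating NUL, so byte 0 is special-cased.
        const char* hit = (b == 0) ? nullptr : std::strchr(kAlphabet, b);
        const int expected = hit ? static_cast<int>(hit - kAlphabet) : -1;
        if (table[b] != expected) return false;
    }
    return true;
}
```

The special case for byte 0 matters: without it, strchr() would report position 58 for the string terminator and a correct table would be flagged as wrong.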

maflcko · 2018-03-17T14:41:29Z

Would you mind removing the test in comment 0 and adding it to the unit test suite?

sipa · 2018-03-17T18:36:44Z

I benchmarked this on my desktop system: a full address decode goes from 1.50 us to 1.29 us (including checksum check).

I'm not convinced this is worth it.

randolf · 2018-03-19T03:54:40Z

@sipa One use case I can think of immediately is that Vanity Address Generators can benefit from this performance increase because they repeatedly use the Base58-encoded addresses in attempting to match the desired string(s).

Generally I also value speed optimizations, even in normal application use, and I don't regard the increased size of the resulting binary to be significant enough to warrant trade-off concern here. YMMV.

dcousens

IMHO, simpler, utACK.

nit: why is mapBase58 indented in the 8th element?

laanwj · 2018-03-19T08:22:10Z

src/base58.cpp

If you make this

static const int8_t mapBase58[256] = {

That's pretty much a static assertion that the size will be 256.
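
A small sketch of what the explicit bound does and does not guarantee, using a hypothetical 8-entry array kShortInit: too many initializers are rejected at compile time, but too few are accepted and the remaining elements are zero-initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative 8-entry table with only three initializers.
static const int8_t kShortInit[8] = {-1, -1, 5};

// Always true: the element count comes from the declaration, not the list.
static_assert(sizeof(kShortInit) / sizeof(kShortInit[0]) == 8,
              "bound is fixed by the declaration");

int EntryAt(std::size_t i) {
    assert(i < sizeof(kShortInit) / sizeof(kShortInit[0]));
    return kShortInit[i];
}
```

Here EntryAt(3) through EntryAt(7) return 0 rather than -1. In a Base58 table 0 is the digit value of '1', so an initializer list that is accidentally too short would make the unlisted bytes decode as '1' instead of being rejected; the explicit size fixes the bound but does not replace a check of the contents.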

dcousens · 2018-03-19T15:46:29Z

src/base58.cpp

The static_assert is pointless now? Maybe?

Just in case; after all, static_assert() does no harm.

donaloconnor · 2018-03-19T15:56:33Z

src/base58.cpp

Missing #include <array>?

Sorry, I'm not familiar with C++11, so I just changed it back to the old-school style.

IMHO, the .size() method was worth the import... but anyway.

promag · 2018-03-21T00:36:42Z

utACK 5d71e4d, please squash.

sipa · 2018-03-21T00:58:47Z

utACK, needs squash.

eklitzke · 2018-03-21T01:13:43Z

This is good for a 20% speedup for me with GCC 7.3 (median goes from 8.70969e-07 to 7.00866e-07). ACK once squashed.

JeremyRubin · 2018-03-21T04:08:18Z

I'd like to see a comparison with one or two other methods of doing a table lookup to make sure this is optimal.

For instance

switch(ch) {
  case '1': 
    carry = 0;
    break;
  // ....
  case 'z':
    carry = 57;
    break;
  default:
    return false;
}

Additionally you can try some outputs from gperf https://www.gnu.org/software/gperf/

eklitzke · 2018-03-21T04:36:07Z

I don't think you can get any faster than this approach, which is a flat lookup table that maps ints to ints (without any hashing).

The typical use case of gperf is for something kind of different: you'd provide it to a tokenizer where you have a grammar of long human readable strings, and you want to hash all of the tokens in the grammar to small ints without collisions.

sipa · 2018-03-21T05:08:51Z

Stop wasting time on discussing the performance. This does not matter. Decoding an address could take 50 us and I don't think anyone would notice.

If the resulting code looks better, go for it. Otherwise, don't.

-0

JeremyRubin · 2018-03-21T05:18:31Z

I have some notes on why some alternatives would be faster, but as @sipa notes, there are bigger fish to fry.

laanwj · 2018-03-22T08:58:17Z

If the resulting code looks better, go for it. Otherwise, don't.

Yes, I do prefer the code like this, because it's more consistent with how we handle base32 and hex, for example. So utACK bcab47b. Agree that this is a dead end in regard to performance; if you are interested in performance, please review @eklitzke's work, he's doing great things.

bcab47b use base58 map instead of strchr() (Kevin Pan)

Pull request description:

Use array map instead of find string position.

Test code snippet:

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <string>

int main(int argc, const char * argv[]) {

  static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
  static const int8_t mapBase58[] = {
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
    -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
    22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
    -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
    47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
  };

  const std::string b58Str(pszBase58);

  for (size_t i = 0; i < b58Str.length(); i++) {
    const char *ch = strchr(pszBase58, b58Str[i]);
    printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);
    assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
  }

  assert(mapBase58['1'] == 0);
  assert(mapBase58['z'] == 57);

  /** All alphanumeric characters except for "0", "I", "O", and "l" */
  assert(mapBase58['0'] == -1);
  assert(mapBase58['I'] == -1);
  assert(mapBase58['O'] == -1);
  assert(mapBase58['l'] == -1);

  return 0;
}
```

Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5

Summary: Backport of Bitcoin Core PR12704
bitcoin/bitcoin#12704

Test Plan:
```
make check
```

Reviewers: Fabien, O1 Bitcoin ABC, #bitcoin_abc, deadalnix

Reviewed By: Fabien, O1 Bitcoin ABC, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D3938


fanquake added the Refactoring label Mar 16, 2018

ken2812221 reviewed Mar 16, 2018

laanwj reviewed Mar 16, 2018

promag reviewed Mar 16, 2018

dcousens approved these changes Mar 19, 2018

laanwj reviewed Mar 19, 2018

dcousens reviewed Mar 19, 2018

donaloconnor reviewed Mar 19, 2018

use base58 map instead of strchr()

bcab47b

bitkevin force-pushed the b58_bitmap branch from 5d71e4d to bcab47b Compare March 21, 2018 04:01

laanwj merged commit bcab47b into bitcoin:master Mar 22, 2018

bitkevin deleted the b58_bitmap branch March 25, 2018 15:00

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
