Skip to content

base58: use map instead of strchr() when decode#12704

Merged
laanwj merged 1 commit intobitcoin:masterfrom
bitkevin:b58_bitmap
Mar 22, 2018
Merged

base58: use map instead of strchr() when decode#12704
laanwj merged 1 commit intobitcoin:masterfrom
bitkevin:b58_bitmap

Conversation

@bitkevin
Copy link
Contributor

Use array map instead of find string position.

Test code snippet:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include <string>

int main(int argc, const char * argv[]) {

  static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
  static const int8_t mapBase58[] = {
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
    -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
    -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
    22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
    -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
    47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
  };

  const std::string b58Str(pszBase58);

  for (size_t i = 0; i < b58Str.length(); i++) {
    const char *ch = strchr(pszBase58, b58Str[i]);
    printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);
    assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
  }

  assert(mapBase58['1'] == 0);
  assert(mapBase58['z'] == 57);

  /** All alphanumeric characters except for "0", "I", "O", and "l" */
  assert(mapBase58['0'] == -1);
  assert(mapBase58['I'] == -1);
  assert(mapBase58['O'] == -1);
  assert(mapBase58['l'] == -1);

  return 0;
}

src/base58.cpp Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you check the character *psz is less than 128?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops...I was thought size of mapBase58 is 256, thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, as per @laanwj suggestion mapBase58 has 256 elements.

@promag
Copy link
Contributor

promag commented Mar 16, 2018

Have you measured performance improvement?

@bitkevin
Copy link
Contributor Author

Have you measured performance improvement?

Performance improvement is about 40%. 1,000,000 rounds, it's about 450ms vs 250ms.

uint64_t getCurrentTime() {
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

const std::string b58Str("3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy");
size_t len = b58Str.length();

uint64_t cnt;

cnt = 0;
uint64_t t1 = getCurrentTime();
for (size_t j = 0; j < 1000000; j++) {
  for (size_t i = 0; i < len; i++) {
    const char *ch = strchr(pszBase58, b58Str[i]);
    cnt += ch - pszBase58;
  }
}
uint64_t t2 = getCurrentTime();
printf("%lld\n", t2 - t1);

cnt = 0;
uint64_t t3 = getCurrentTime();
for (size_t j = 0; j < 1000000; j++) {
  for (size_t i = 0; i < len; i++) {
    cnt += mapBase58[(uint8_t)b58Str[i]];
  }
}
uint64_t t4 = getCurrentTime();
printf("%lld\n", t4 - t3);

@laanwj
Copy link
Member

laanwj commented Mar 16, 2018

Concept ACK. Seems very straightforward (haven't checked the table yet, though).

Performance improvement is about 40%. 1,000,000 rounds, it's about 450ms vs 250ms.

Nice. Though here you're not benchmarking the entire DecodeBase58 function, but the specific part that you sped up, so that will give somewhat distored results.

FWIW there's a benchmark for base58 in src/bench - it's somewhat representative of what base58 is used for in bitcoin - encoding/decoding addresses.

src/base58.cpp Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make the mapBase58 array 256 bytes large, and you don't need the range check.

src/base58.cpp Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO just leave this as a comment or make it static_assert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With c++11 it should be easy to make it a static_assert.

--- a/src/base58.cpp
+++ b/src/base58.cpp
@@ -20,7 +20,7 @@
 
 /** All alphanumeric characters except for "0", "I", "O", and "l" */
 static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
-static const int8_t mapBase58[] = {
+constexpr std::array<int8_t, 256> mapBase58{
     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
     -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
@@ -55,7 +55,7 @@ bool DecodeBase58(const char* psz, std::vector<unsigned char>& vch)
     int size = strlen(psz) * 733 /1000 + 1; // log(58) / log(256), rounded up.
     std::vector<unsigned char> b256(size);
     // Process the characters.
-    assert(sizeof(mapBase58)/sizeof(mapBase58[0]) == 256);  // guarantee not out of range
+    static_assert(mapBase58.size() == 256); // guarantee not out of range
     while (*psz && !isspace(*psz)) {
         // Decode base58 character
         int carry = mapBase58[(uint8_t)*psz];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A run-time assertion is certainly overkill here.

@promag
Copy link
Contributor

promag commented Mar 17, 2018

Kicked travis due to timeout.

@donaloconnor
Copy link
Contributor

utACK. Your test in comment 0 threw me off because it only has 128 elements.

I think we need tests also submitted in this to test all values 0-255.

@maflcko
Copy link
Member

maflcko commented Mar 17, 2018

Would you mind removing the test in comment 0 and adding it to the unit test suite?

@sipa
Copy link
Member

sipa commented Mar 17, 2018

I benchmarked this on my desktop system: a full address decode goes from 1.50 us to 1.29 us (including checksum check).

I'm not convinced this is worth it.

@randolf
Copy link
Contributor

randolf commented Mar 19, 2018

@sipa One use case I can think of immediately is that Vanity Address Generators can benefit from this performance increase because they repeatedly use the Base58-encoded addresses in attempting to match the desired string(s).

Generally I also value speed optimizations, even in normal application use, and I don't regard the increased size of the resulting binary to be significant enough to warrant trade-off concern here. YMMV.

Copy link
Contributor

@dcousens dcousens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, simpler, utACK.

nit: why is mapBase58 indented in the 8th element?

src/base58.cpp Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this

static const int8_t mapBase58[256] = {

That's pretty much a static assertion that the size will be 256.

src/base58.cpp Outdated
Copy link
Contributor

@dcousens dcousens Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static_assert is pointless now? Maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, after all static_assert() is no harm.

src/base58.cpp Outdated
Copy link
Contributor

@donaloconnor donaloconnor Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing
#include <array>
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I don't familiar with c++11, so just change it back to old school style.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the .size() method was worth the import... but anyway.

@promag
Copy link
Contributor

promag commented Mar 21, 2018

utACK 5d71e4d, please squash.

@sipa
Copy link
Member

sipa commented Mar 21, 2018

utACK, needs squash.

@eklitzke
Copy link
Contributor

eklitzke commented Mar 21, 2018

This is good for a 20% speedup for me with GCC 7.3 (median goes from 8.70969e-07 to 7.00866e-07). ACK once squashed.

@JeremyRubin
Copy link
Contributor

I'd like to see a comparison with one or two other methods of doing a table lookup to make sure this is optimal.

For instance

switch(ch) {
  case '1': 
    carry = 0;
    break;
  // ....
  case 'z':
    carry = 57;
    break;
  default:
    return false;
}

Additionally you can try some outputs from gperf https://www.gnu.org/software/gperf/

@eklitzke
Copy link
Contributor

I don't think you can get any faster than this approach, which is a flat lookup table that maps ints to ints (without any hashing).

The typical use case of gperf is for something kind of different: you'd provide it to a tokenizer where you have a grammar of long human readable strings, and you want to hash all of the tokens in the grammar to small ints without collisions.

@sipa
Copy link
Member

sipa commented Mar 21, 2018

Stop wasting time on discussing the performance. This does not matter. Decoding an address could take 50 us and I don't think anyone would notice.

If the resulting code looks better, go for it. Otherwise, don't.

-0

@JeremyRubin
Copy link
Contributor

I have some notes on why some alternatives that would be faster, but as @sipa notes, there are bigger fish to fry.

@laanwj
Copy link
Member

laanwj commented Mar 22, 2018

If the resulting code looks better, go for it. Otherwise, don't.

Yes, I do prefer the code like this, because it's more consistent with how we handle base32 and hex for ex. So utACK bcab47b. Agree that this is a dead end in regard to performance, if you are interested in performance please review @eklitzke's work he's doing great things.

@laanwj laanwj merged commit bcab47b into bitcoin:master Mar 22, 2018
laanwj added a commit that referenced this pull request Mar 22, 2018
bcab47b use base58 map instead of strchr() (Kevin Pan)

Pull request description:

  Use array map instead of find string position.

  Test code snippet:

  ```cpp

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #include <string>

  int main(int argc, const char * argv[]) {

    static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    static const int8_t mapBase58[] = {
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
      -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
      22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
      -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
      47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
    };

    const std::string b58Str(pszBase58);

    for (size_t i = 0; i < b58Str.length(); i++) {
      const char *ch = strchr(pszBase58, b58Str[i]);
      printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);
      assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
    }

    assert(mapBase58['1'] == 0);
    assert(mapBase58['z'] == 57);

    /** All alphanumeric characters except for "0", "I", "O", and "l" */
    assert(mapBase58['0'] == -1);
    assert(mapBase58['I'] == -1);
    assert(mapBase58['O'] == -1);
    assert(mapBase58['l'] == -1);

    return 0;
  }

  ```

Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5
@bitkevin bitkevin deleted the b58_bitmap branch March 25, 2018 15:00
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Aug 30, 2019
Summary:
Backport of Bitcoin Core PR12704
bitcoin/bitcoin#12704

Test Plan:
```
make check
```

Reviewers: Fabien, O1 Bitcoin ABC, #bitcoin_abc, deadalnix

Reviewed By: Fabien, O1 Bitcoin ABC, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D3938
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Dec 16, 2020
bcab47b use base58 map instead of strchr() (Kevin Pan)

Pull request description:

  Use array map instead of find string position.

  Test code snippet:

  ```cpp

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #include <string>

  int main(int argc, const char * argv[]) {

    static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    static const int8_t mapBase58[] = {
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
      -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
      22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
      -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
      47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
    };

    const std::string b58Str(pszBase58);

    for (size_t i = 0; i < b58Str.length(); i++) {
      const char *ch = strchr(pszBase58, b58Str[i]);
      printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);
      assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
    }

    assert(mapBase58['1'] == 0);
    assert(mapBase58['z'] == 57);

    /** All alphanumeric characters except for "0", "I", "O", and "l" */
    assert(mapBase58['0'] == -1);
    assert(mapBase58['I'] == -1);
    assert(mapBase58['O'] == -1);
    assert(mapBase58['l'] == -1);

    return 0;
  }

  ```

Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jun 24, 2021
bcab47b use base58 map instead of strchr() (Kevin Pan)

Pull request description:

  Use array map instead of find string position.

  Test code snippet:

  ```cpp

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #include <string>

  int main(int argc, const char * argv[]) {

    static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    static const int8_t mapBase58[] = {
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,
      -1, 0, 1, 2, 3, 4, 5, 6,  7, 8,-1,-1,-1,-1,-1,-1,
      -1, 9,10,11,12,13,14,15, 16,-1,17,18,19,20,21,-1,
      22,23,24,25,26,27,28,29, 30,31,32,-1,-1,-1,-1,-1,
      -1,33,34,35,36,37,38,39, 40,41,42,43,-1,44,45,46,
      47,48,49,50,51,52,53,54, 55,56,57,-1,-1,-1,-1,-1,
    };

    const std::string b58Str(pszBase58);

    for (size_t i = 0; i < b58Str.length(); i++) {
      const char *ch = strchr(pszBase58, b58Str[i]);
      printf("%d - %d\n", ch - pszBase58, mapBase58[(uint8_t)b58Str[i]]);
      assert(ch - pszBase58 == mapBase58[(uint8_t)b58Str[i]]);
    }

    assert(mapBase58['1'] == 0);
    assert(mapBase58['z'] == 57);

    /** All alphanumeric characters except for "0", "I", "O", and "l" */
    assert(mapBase58['0'] == -1);
    assert(mapBase58['I'] == -1);
    assert(mapBase58['O'] == -1);
    assert(mapBase58['l'] == -1);

    return 0;
  }

  ```

Tree-SHA512: c28376dc8c92cc4a770c3282db4a568ae5f5a08e27f714183eb3d8755421dc7aa11d7b45afa55e70eba46565f378062aac53dc8f150eeeab12ce7b5db5af89c5
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.