Skip to content

Several randomness improvements#29625

Merged
fanquake merged 23 commits intobitcoin:masterfrom
sipa:202403_rand_rework
Jul 4, 2024
Merged

Several randomness improvements#29625
fanquake merged 23 commits intobitcoin:masterfrom
sipa:202403_rand_rework

Conversation

@sipa
Copy link
Member

@sipa sipa commented Mar 11, 2024

This PR contains a number of vaguely-related improvements to the random module.

The specific changes and more detailed rationale is in the commit messages, but the highlights are:

  • XoRoShiRo128PlusPlus (previously a test-only RNG) moves to random.h and becomes InsecureRandomContext, which is even faster than FastRandomContext but non-cryptographic. It also gets all helper randomness functions (randrange, fillrand, ...), making it a lot more succinct to use.
  • During tests, all randomness is made deterministic (except for GetStrongRandBytes) but non-repeating (like GetRand() used to be when g_mock_deterministic_tests was used), either fixed, or from a random seed (overridden by env var).
  • Several infrequently used top-level functions (GetRandMillis, GetRandMicros, GetExponentialRand) are converted into member functions of FastRandomContext (and InsecureRandomContext).
  • GetRand<T>() (without argument) can now return the maximum value of the type (previously e.g. GetRand<uint32_t>() would never return 0xffffffff).

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 11, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK achow101, maflcko, hodlinator, dergoegge
Concept ACK theuni
Stale ACK EthanHeilman

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30377 (refactor: Make uint256S(const char*) consteval by hodlinator)
  • #30277 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
  • #30116 (p2p: Fill reconciliation sets (Erlay) attempt 2 by sr-gi)
  • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
  • #29543 (refactor: Avoid unsigned integer overflow in script/interpreter.cpp by hebasto)
  • #29536 (fuzz: fuzz connman with non-empty addrman + ASMap by brunoerg)
  • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
  • #26114 (net: Make AddrFetch connections to fixed seeds by mzumsande)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22530315845

@dergoegge
Copy link
Member

Concept ACK

There are a few places in the fuzz tests where this will allow to easily replace FastRandomContext with a InsecureRandomContext, which is beneficial for performance (e.g. the addrman harnesses partially fill the addrman with addresses from rng) and we don't need cryptographic rng there anyway.

During tests, all randomness is made deterministic

Great, this should help with #29018

@Sjors
Copy link
Member

Sjors commented Mar 12, 2024

What's the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

@dergoegge
Copy link
Member

What's the impact on the fuzz corpus of switching to a different (?) deterministic RNG?

I would expect that switching to a different rng should have no meaningful effect on the corpus itself. The corpus for a particular harness might change but the coverage for the code we intend to test should remain the same. This is because using rng in a fuzz harness only makes sense in very rare cases. It should never be used in a way that can significantly affect the coverage reached, otherwise there is no point in using a coverage-guided fuzzer, we could just pipe /dev/random to our harnesses.

For example, if we need to populate some data that we don't really expect to have an impact on the thing we are testing, we might use rng instead of consuming from the fuzz input (we do this in the p2p transport harnesses to fill message contents, which are essentially irrelevant to the transport logic).

Switching to deterministic rng can cause a corpus' coverage to grow because coverage-guided feedback loops start working more reliably when the code under test is deterministic. This can vary from harness to harness, but we've seen coverage-guided fuzzers find bugs once we've improved on determinism.

@sipa sipa force-pushed the 202403_rand_rework branch 2 times, most recently from b8d2aa9 to 3ad67b0 Compare March 12, 2024 20:55
@sipa sipa force-pushed the 202403_rand_rework branch 8 times, most recently from c07a68c to b5c10a4 Compare March 13, 2024 19:32
@sipa sipa force-pushed the 202403_rand_rework branch 5 times, most recently from 019d483 to 1b75d68 Compare March 14, 2024 20:30
sipa added 4 commits July 1, 2024 10:26
This is preparation for making it more generally accessible.
Convert XoRoShiRo128PlusPlus into a full RandomMixin-based RNG class,
providing all utility functionality that FastRandomContext has. In doing so,
it is renamed to InsecureRandomContext, highlighting its non-cryptographic
nature.

To do this, a fillrand fallback is added to RandomMixin (where it is used by
InsecureRandomContext), but FastRandomContext still uses its own fillrand.
The existing code provides two randomness mechanisms for test purposes:
- g_insecure_rand_ctx (with its wrappers InsecureRand*), which during tests is
  initialized using either zeros (SeedRand::ZEROS), or using environment-provided
  randomness (SeedRand::SEED).
- g_mock_deterministic_tests, which controls some (but not all) of the normal
  randomness output if set, but then makes it extremely predictable (identical
  output repeatedly).

Replace this with a single mechanism, which retains the SeedRand modes to control
all randomness. There is a new internal deterministic PRNG inside the random
module, which is used in GetRandBytes() when in test mode, and which is also used
to initialize g_insecure_rand_ctx. This means that during tests, all random numbers
are made deterministic. There is one exception, GetStrongRandBytes(), which even
in test mode still uses the normal PRNG state.

This probably opens the door to removing a lot of the ad-hoc "deterministic" mode
functions littered through the codebase (by simply running relevant tests in
SeedRand::ZEROS mode), but this isn't done yet.
The existing code uses GetRand(nMax), with a default value for nMax, where nMax is the
range of values (not the maximum!) that the output is allowed to take. This will always
miss the last possible value (e.g. GetRand<uint32_t>() will never return 0xffffffff).

Fix this, by moving the functionality largely in RandomMixin, and also adding a
separate RandomMixin::rand function, which returns a value in the entire (non-negative)
range of an integer.
@sipa
Copy link
Member Author

sipa commented Jul 1, 2024

Rebased after the merge of #30237.

sipa added 10 commits July 1, 2024 12:39
This matches the data type of m_cache_entry_expiration.
There are only a few call sites of these throughout the codebase, so
move the functionality into FastRandomContext, and rewrite all call sites.

This requires the callers to explicit construct FastRandomContext objects,
which do add to the verbosity, but also make potentially apparent locations
where the code can be improved by reusing a FastRandomContext object (see
further commit).
PeerManagerImpl, as well as several net functions, already have existing
FastRandomContext objects. Reuse them instead of constructing new ones.
@achow101
Copy link
Member

achow101 commented Jul 1, 2024

ACK ce80942

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK ce80942 🐈

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK ce8094246ee95232e9d84f7e37f3c0a43ef587ce 🐈
iPUv215Td9GqZ8iEiD8qyKJufeqJCMlNIHYw/Ha/B/66jmXmcrePorRaTQ+YqnML9Nkr4A+IaY3dqwDPvXI5Cg==

Reminder for myself: #29625 (comment)

MakeAndPushMessage(pfrom, NetMsgType::CMPCTBLOCK, *a_recent_compact_block);
} else {
CBlockHeaderAndShortTxIDs cmpctblock{*pblock, GetRand<uint64_t>()};
CBlockHeaderAndShortTxIDs cmpctblock{*pblock, FastRandomContext().rand64()};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (haven't tried), but in a follow-up this could use m_rng, because the lock is already taken IIRC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #30393

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ce80942

Mainly code review. Did not attempt to check thread safety compliance. (Passed make check & test/functional/test_runner.py).

e2d1f84 - random: make GetRand() support entire range (incl. max)

The commit message title probably should be "random: make GetRand<T>() support entire range (incl. max)", since the overloads taking parameters still are exclusive at the end. It felt dangerous that GetRand<T>(void) behavior differed in this way from the overloads - happy others suggested the calls be inlined as randrange() and rand<T>() are much clearer!

if (bits == 64) return Impl().rand64();
uint64_t ret;
if (bits <= bitbuf_size) {
// If there is enough entropy left in bitbuf, return its bottom bits bits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 21ce9d8: nit: typo - "bits bits"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not actually a typo, it means "Return the bottom bits many bits", but I do see why it'd confusing. Will address if I have to repush.

// is F(x) = -log(1 - x).
//
// Combining the two, and using log1p(x) = log(1 + x), we obtain the following:
return -std::log1p((uniform >> 11) * -0x1.0p-53);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In d5fcbe9: Always throwing away 11 bits of entropy in the new version compared to 0 before the PR. I guess you want to preserve the expression from the linked site and it's unclear whether not throwing away the bits would be more performant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted to keep MakeExponential as stand-alone reviewable as possible. I don't think any of the call sites are so performance-critical that the difference matters.

/*m_relay_txs=*/random_context.randbool(),
/*fBloomFilter=*/random_context.randbool(),
/*nKeyedNetGroup=*/random_context.randrange(100),
/*nKeyedNetGroup=*/random_context.randrange(100u),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In e2d1f84: Now we are in C++20 land, it might be time to use real designated initializers instead of comments when touching blocks like these?

            .id=id,
            .m_connected=std::chrono::seconds{random_context.randrange(100)},
            .m_min_ping_time=std::chrono::microseconds{random_context.randrange(100)},
            .m_last_block_time=std::chrono::seconds{random_context.randrange(100)},
            .m_last_tx_time=std::chrono::seconds{random_context.randrange(100)},
            .fRelevantServices=random_context.randbool(),
            .m_relay_txs=random_context.randbool(),
            .fBloomFilter=random_context.randbool(),
            .nKeyedNetGroup=random_context.randrange(100u),
            .prefer_evict=random_context.randbool(),
            .m_is_local=random_context.randbool(),
            .m_network=ALL_NETWORKS[random_context.randrange(ALL_NETWORKS.size())],
            .m_noban=false,
            .m_conn_type=ConnectionType::INBOUND,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already a list initialization, so I don't think clang-tidy can pick up the named args at all. Happy to review a follow-up, if you decide to open one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: #30397

Comment on lines 658 to +660
if (m_opts.check_ratio == 0) return;

if (GetRand(m_opts.check_ratio) >= 1) return;
if (FastRandomContext().randrange(m_opts.check_ratio) >= 1) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ddc184d: The commit message in e2d1f84 made me do a survey of (mis)uses of GetRand() . 2 similar cases stood out to me at first, but they appear correct after some noodling. This is one of them and the other is check_block_index in validation.cpp.

(check_ratio is often set to 1 (always) or 0 (never)).

Sharing the same return path actually makes the behavior slightly quicker for me to grok:

    if (m_opts.check_ratio == 0
        || FastRandomContext().randrange(m_opts.check_ratio) >= 1)
        return;

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK ce80942

@hebasto
Copy link
Member

hebasto commented Jul 14, 2024

Ported to the CMake-based build system in hebasto#264.

uint256 ret;
rng.Keystream(MakeWritableByteSpan(ret));
return ret;
ProcRand(nullptr, 0, RNGLevel::PERIODIC, /*always_use_real_rng=*/false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test-only follow-up question: Just a nit, because this only affects tests, but for consistency, I couldn't figure out why the periodic seed does not use the real RNG. The result is only used internally, so it should be fine, in line with the other calls. For example, SeedStrengthen, which is called as part of the periodic seeding and also uses the real RNG. Also, if this were problematic and made tests non-deterministic, the same issues should appear when someone in the tests called GetStrongRandBytes or Random_SanityCheck, no?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.