Skip to content

Refactor BnB tests#29532

Merged
achow101 merged 7 commits intobitcoin:masterfrom
murchandamus:2024-03-coinselection_tests
May 6, 2025
Merged

Refactor BnB tests#29532
achow101 merged 7 commits intobitcoin:masterfrom
murchandamus:2024-03-coinselection_tests

Conversation

@murchandamus
Copy link
Member

@murchandamus murchandamus commented Mar 1, 2024

This PR is splitting off some of the improvements made in #28985 and starts addressing the issues raised in #27754.

I aim to completely replace coinselector_tests with coinselection_tests. The goal is to generally use coins created per a nominal effective value so we can get away from testing with CoinSelectionParams that are non-representative and effectuate counterintuitive behavior such as feerate = 0 or cost_of_change = 0

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 1, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29532.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK achow101, monlovesmango, w0xlt
Concept ACK jasonribble, ismaelsadeeq, yancyribbens

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@murchandamus murchandamus force-pushed the 2024-03-coinselection_tests branch from d8ccf02 to 5d722bb Compare March 5, 2024 13:50
@murchandamus murchandamus marked this pull request as ready for review April 9, 2024 12:02
@murchandamus
Copy link
Member Author

Pinging @furszy, @achow101, @S3RK, as discussed

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely sure about 584e524eb57444d7192df1049cafde9ccc480406. The commit description says

Originally these tests verified that at a SelectCoins level that a
solution with fewer inputs gets preferred at high feerates, and a
solution with more inputs gets preferred at low feerates. This outcome
relies on the behavior of BnB, so we move these tests under the umbrella
of BnB tests.

It is true that the outcome relies only on the BnB behavior currently but that might not be true in the future. There could be other algorithm clashing with it.

@murchandamus murchandamus force-pushed the 2024-03-coinselection_tests branch 2 times, most recently from 0bece89 to cbeb10b Compare June 27, 2024 17:35
@murchandamus
Copy link
Member Author

Alright, should hopefully be ready to review.

@murchandamus
Copy link
Member Author

I'm not completely sure about 584e524. The commit description says

Originally these tests verified that at a SelectCoins level that a
solution with fewer inputs gets preferred at high feerates, and a
solution with more inputs gets preferred at low feerates. This outcome
relies on the behavior of BnB, so we move these tests under the umbrella
of BnB tests.

It is true that the outcome relies only on the BnB behavior currently but that might not be true in the future. There could be other algorithm clashing with it.

Yeah, so the old tests assumed that because BnB behaved a certain way, we would get a specific overall outcome. The new tests just check that BnB behaves a certain way. We might still want tests that test the overall outcome as a result from the interaction of multiple coin selection tests, but this one seemed clearly to be testing BnB behavior, and it seemed strange to me to be testing that at the level where the results are combined rather than checking that BnB assumptions are fulfilled by BnB.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/29957535397

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

yancyribbens added a commit to yancyribbens/rust-bitcoin-coin-selection that referenced this pull request Apr 6, 2025
During the selection process, input values are converted to
effective_values, and then the search begins using the calculated
effective_values.  In the test-suit, it's often the case that it's
desired to test what happens given a particular effective_value.
Instead of manually figuring out what absolute_value is needed to result
in the search routines effective_value, add the ability to denote by
effective_value in the test-suite.  This is done by wrapping the value
with e() otherwise it's assumed to be an absolute value.

```
For example:
weighted_utxos: &["1 sat/68vb", "e(1 sat)/68 vb", "e(1 sat)/204 wu"]
```

This will evaluate to two `UTXOs` with equivalent effective_values of 1
sat even though they have different sizes (`68 vb` and `204 wu`).  Also,
there will be a `UTXO` with negative effective value since the absolute
value is given as `1 sat/68 vb`.

Computing the Utxo requires knowing the `fee_rate`, and the `fee_rate`
is included in the `fee_rate` field not part of `weigthed_utxos`.
Therefore, it's no longer useful to use `From::str` since a `Utxo` can
no longer be constructed solely from the string value in `weighted_utxos`.

Motivated by: bitcoin/bitcoin#29532
yancyribbens added a commit to yancyribbens/rust-bitcoin-coin-selection that referenced this pull request Apr 7, 2025
During the selection process, input values are converted to
effective_values, and then the search begins using the calculated
effective_values.  In the test-suit, it's often the case that it's
desired to test what happens given a particular effective_value.
Instead of manually figuring out what absolute_value is needed to result
in the search routines effective_value, add the ability to denote by
effective_value in the test-suite.  This is done by wrapping the value
with e() otherwise it's assumed to be an absolute value.

```
For example:
weighted_utxos: &["1 sat/68vb", "e(1 sat)/68 vb", "e(1 sat)/204 wu"]
```

This will evaluate to two `UTXOs` with equivalent effective_values of 1
sat even though they have different sizes (`68 vb` and `204 wu`).  Also,
there will be a `UTXO` with negative effective value since the absolute
value is given as `1 sat/68 vb`.

Computing the Utxo requires knowing the `fee_rate`, and the `fee_rate`
is included in the `fee_rate` field not part of `weigthed_utxos`.
Therefore, it's no longer useful to use `From::str` since a `Utxo` can
no longer be constructed solely from the string value in `weighted_utxos`.

Motivated by: bitcoin/bitcoin#29532
yancyribbens added a commit to yancyribbens/rust-bitcoin-coin-selection that referenced this pull request Apr 8, 2025
During the selection process, input values are converted to
effective_values, and then the search begins using the calculated
effective_values.  In the test-suit, it's often the case that it's
desired to test what happens given a particular effective_value.
Instead of manually figuring out what absolute_value is needed to result
in the search routines effective_value, add the ability to denote by
effective_value in the test-suite.  This is done by wrapping the value
with e() otherwise it's assumed to be an absolute value.

```
For example:
weighted_utxos: &["1 sat/68vb", "e(1 sat)/68 vb", "e(1 sat)/204 wu"]
```

This will evaluate to two `UTXOs` with equivalent effective_values of 1
sat even though they have different sizes (`68 vb` and `204 wu`).  Also,
there will be a `UTXO` with negative effective value since the absolute
value is given as `1 sat/68 vb`.

Computing the Utxo requires knowing the `fee_rate`, and the `fee_rate`
is included in the `fee_rate` field not part of `weigthed_utxos`.
Therefore, it's no longer useful to use `From::str` since a `Utxo` can
no longer be constructed solely from the string value in `weighted_utxos`.

Motivated by: bitcoin/bitcoin#29532
yancyribbens added a commit to yancyribbens/rust-bitcoin-coin-selection that referenced this pull request Apr 8, 2025
During the selection process, input values are converted to
effective_values, and then the search begins using the calculated
effective_values.  In the test-suit, it's often the case that it's
desired to test what happens given a particular effective_value.
Instead of manually figuring out what absolute_value is needed to result
in the search routines effective_value, add the ability to denote by
effective_value in the test-suite.  This is done by wrapping the value
with e() otherwise it's assumed to be an absolute value.

```
For example:
weighted_utxos: &["1 sat/68vb", "e(1 sat)/68 vb", "e(1 sat)/204 wu"]
```

This will evaluate to two `UTXOs` with equivalent effective_values of 1
sat even though they have different sizes (`68 vb` and `204 wu`).  Also,
there will be a `UTXO` with negative effective value since the absolute
value is given as `1 sat/68 vb`.

Computing the Utxo requires knowing the `fee_rate`, and the `fee_rate`
is included in the `fee_rate` field not part of `weigthed_utxos`.
Therefore, it's no longer useful to use `From::str` since a `Utxo` can
no longer be constructed solely from the string value in `weighted_utxos`.

Motivated by: bitcoin/bitcoin#29532
yancyribbens added a commit to yancyribbens/rust-bitcoin-coin-selection that referenced this pull request Apr 8, 2025
During the selection process, input values are converted to
effective_values, and then the search begins using the calculated
effective_values.  In the test-suit, it's often the case that it's
desired to test what happens given a particular effective_value.
Instead of manually figuring out what absolute_value is needed to result
in the search routines effective_value, add the ability to denote by
effective_value in the test-suite.  This is done by wrapping the value
with e() otherwise it's assumed to be an absolute value.

```
For example:
weighted_utxos: &["1 sat/68vb", "e(1 sat)/68 vb", "e(1 sat)/204 wu"]
```

This will evaluate to two `UTXOs` with equivalent effective_values of 1
sat even though they have different sizes (`68 vb` and `204 wu`).  Also,
there will be a `UTXO` with negative effective value since the absolute
value is given as `1 sat/68 vb`.

Computing the Utxo requires knowing the `fee_rate`, and the `fee_rate`
is included in the `fee_rate` field not part of `weigthed_utxos`.
Therefore, it's no longer useful to use `From::str` since a `Utxo` can
no longer be constructed solely from the string value in `weighted_utxos`.

Motivated by: bitcoin/bitcoin#29532
yancyribbens added a commit to yancyribbens/rust-bitcoin-coin-selection that referenced this pull request Apr 8, 2025
During the selection process, input values are converted to
effective_values, and then the search begins using the calculated
effective_values.  In the test-suit, it's often the case that it's
desired to test what happens given a particular effective_value.
Instead of manually figuring out what absolute_value is needed to result
in the search routines effective_value, add the ability to denote by
effective_value in the test-suite.  This is done by wrapping the value
with e() otherwise it's assumed to be an absolute value.

```
For example:
weighted_utxos: &["1 sat/68vb", "e(1 sat)/68 vb", "e(1 sat)/204 wu"]
```

This will evaluate to two `UTXOs` with equivalent effective_values of 1
sat even though they have different sizes (`68 vb` and `204 wu`).  Also,
there will be a `UTXO` with negative effective value since the absolute
value is given as `1 sat/68 vb`.

Computing the Utxo requires knowing the `fee_rate`, and the `fee_rate`
is included in the `fee_rate` field not part of `weigthed_utxos`.
Therefore, it's no longer useful to use `From::str` since a `Utxo` can
no longer be constructed solely from the string value in `weighted_utxos`.

Motivated by: bitcoin/bitcoin#29532
@monlovesmango
Copy link
Contributor

As discussed during Bitcoin Core Review Club, at least one test with 0 fee rate should probably be added to avoid test coverage regression.

Thanks for hosting Murch!

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Perhaps the commit descriptions ("BnB rate sensitivity tests", "simple BnB failure tests", and "BnB iteration exhaustion test", for example) could become functions for clarity (not necessarily new test cases).

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Some commits can be squashed to avoid warnings like "unused function".

@murchandamus
Copy link
Member Author

nit: Perhaps the commit descriptions ("BnB rate sensitivity tests", "simple BnB failure tests", and "BnB iteration exhaustion test", for example) could become functions for clarity (not necessarily new test cases).

I’m not sure I understand what you mean. Do you mean that I introduce helper functions like "TestBnBSuccess" for the new tests instead of making them their own test cases?

@murchandamus
Copy link
Member Author

nit: Some commits can be squashed to avoid warnings like "unused function".

@w0xlt: I ran each commit, and I did not receive this warning. Could you let me know which commit resulted in that warning for you?

As discussed during Bitcoin Core Review Club, at least one test with 0 fee rate should probably be added to avoid test coverage regression.

@monlovesmango: I added another commit to run the simple tests at various feerates, including the feerates 0 sat/kvB, 1 sat/kvB, and 1,500,000 sat/kvB.

@murchandamus
Copy link
Member Author

murchandamus commented Apr 29, 2025

Rebased due to strange CI failures

Edit:
It seems to me that

Copy link
Member

@achow101 achow101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first 2 commits could be combined. The first commit is harder to review since it introduces a bunch of helpers that are completely unused. In fact, the file is introduced but not to the build system, so we can't even check if it compiles.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK dbf1f26

I ran each commit, and I did not receive this warning. Could you let me know which commit resulted in that warning for you?

It was caused by the coinselection_tests.cpp as you mentioned here: #29532 (comment)

Do you mean that I introduce helper functions like "TestBnBSuccess" for the new tests instead of making them their own test cases?

I was referring to something like the code below (which encapsulates the latest commit changes), but it was just a suggestion. The current code looks good to me too.

void iteration_exhaustion_test(...){

    std::vector<OutputGroup> doppelganger_pool;
    std::vector<CAmount> doppelgangers;
    std::vector<CAmount> expected_inputs;
    for (int i = 0; i < 17; ++i) {
// ...
    }
    AddCoins(doppelganger_pool, doppelgangers);
    // Among up to 17 unique UTXOs of similar effective value we will find a solution composed of the eight smallest UTXOs
    TestBnBSuccess("Combine smallest 8 of 17 unique UTXOs", doppelganger_pool, /*selection_target=*/8 * CENT, /*expected_input_amounts=*/expected_inputs);

    // Starting with 18 unique UTXOs of similar effective value we will not find the solution due to exceeding the attempt limit
    AddCoins(doppelganger_pool, {1 * CENT + default_cs_params.m_cost_of_change + 17});
    TestBnBFail("Exhaust looking for smallest 8 of 18 unique UTXOs", doppelganger_pool, /*selection_target=*/8 * CENT);
}

BOOST_AUTO_TEST_CASE(bnb_test) {
// ...
iteration_exhaustion_test(...);
}

Recreates the tests in a new test suite coinselection_tests.cpp that is
based on UTXOs being created per their effective values rather than
nominal values and uses transactions with non-zero feerates.
Originally these tests verified that at a SelectCoins level that a
solution with fewer inputs gets preferred at high feerates, and a
solution with more inputs gets preferred at low feerates. This outcome
relies on the behavior of BnB, so we move these tests under the umbrella
of BnB tests.

Originally these tests relied on SFFO to work.
We do not need to repeat the same test multiple times because BnB is
deterministic and will therefore always have the same outcome.
Additionally, this test was redundant because it repeats the "Smallest
combination too big" test.
Copy link
Contributor

@monlovesmango monlovesmango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK dbf1f2663b1afbe03d6b1855f83db604bc79979e

I like how it now covers a variety of fee rates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this seems to be checking for a match, not a mismatch

Suggested change
BOOST_CHECK_MESSAGE(HaveEquivalentValues(expected_result, *result), strprintf("Result mismatch in BnB-Success: %s. Expected %s, but got %s", test_title, InputAmountsToString(expected_result), InputAmountsToString(*result)));
BOOST_CHECK_MESSAGE(HaveEquivalentValues(expected_result, *result), strprintf("Result match in BnB-Success: %s. Expected %s, and got %s", test_title, InputAmountsToString(expected_result), InputAmountsToString(*result)));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it checks whether the expected result and the selected input set match, but the message here is printed in the case of a failure!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see! I understand now. Sorry was running with --log_level=all and was misinterpreting these as success messages. Please disregard..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking such a thorough look!

@murchandamus
Copy link
Member Author

I was referring to something like the code below (which encapsulates the latest commit changes), but it was just a suggestion. The current code looks good to me too.

I see, thanks. I guess it could be nice to be able to run the test suite in smaller portions especially if some of the tests took a long time, but it overall runs extremely quickly, so I’m not sure it is necessary to further structure the tests at this time.

I intend to work on porting the other algorithms’ tests in the same manner, and I can pick this up in a follow-up commit, if we decide that’s the way to go then.

@murchandamus
Copy link
Member Author

Should be ready to review, again.

@achow101
Copy link
Member

achow101 commented May 1, 2025

ACK 85368aa

1 similar comment
@monlovesmango
Copy link
Contributor

ACK 85368aa

Comment on lines +43 to +44
/** Make one OutputGroup with a single UTXO that either has a given effective value (default) or a given amount (`is_eff_value = false`). */
static OutputGroup MakeCoin(const CAmount& amount, bool is_eff_value = true, CoinSelectionParams cs_params = default_cs_params, int custom_spending_vsize = 68)
Copy link
Member

@furszy furszy May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the is_eff_value arg is always true for all the tests. What about removing it?.
Also, you are expecting group.GetSelectionAmount() to be equal to tx.vout[0].nValue with this parameter right?. I'm unsure that will always be the case.
As a test, could try adding an assertion at the end of this function. I have a hunch it will crash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the plan is to rewrite all of the tests into this style over the course of a couple PRs. So this parameter is here to allow for the future testing of SFFO behavior.

Also, you are expecting group.GetSelectionAmount() to be equal to tx.vout[0].nValue with this parameter right?

Only when it is false. The purpose is to have group.GetSelectionAmount() be eqaul to amount, so nValue is increased by the amount in fees so that they match after fees are deducted.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 85368aa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.