Skip to content

p2p: Increase inbound capacity for block-relay only connections#28463

Open
mzumsande wants to merge 6 commits intobitcoin:masterfrom
mzumsande:202308_increase_block_relay
Open

p2p: Increase inbound capacity for block-relay only connections#28463
mzumsande wants to merge 6 commits intobitcoin:masterfrom
mzumsande:202308_increase_block_relay

Conversation

@mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Sep 12, 2023

This is joint work with amitiuttarwar.

See issue #28462 for a broader discussion on increasing the number of block-relay-only connections independent of this particular implementation proposal.

We suggest to increase the number of inbound slots allocated to block-relay-only peers by increasing the default maximum connections from 125 to 200, with 50% of inbound slots accessible for tx-relaying peers.
This is a prerequisite for being able to increase the default number of outgoing block-relay-only peers later, because the current inbound capacity of the network is not sufficient.
In order to account for incoming tx-relaying peers separately from incoming block-relay peers, changes to the inbound eviction logic are necessary.

See the next post in this thread for a more detailed explanation and motivation of the changes.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 12, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28463.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK naumenkogs, naiyoma, 0xB10C, marcofleon, instagibbs, ajtowns, brunoerg

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32394 (net: make m_nodes_mutex non-recursive by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • "%u." "It does not apply -> "%u. It does not apply" [Two adjacent string literals are concatenated without a space, producing "...%u.It..." which is missing the space between sentences.]

Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

  • std::clamp(args.GetIntArg("-inboundrelaypercent", DEFAULT_FULL_RELAY_INBOUND_PCT), 0, 100) in src/init.cpp

2026-02-21 08:59:09

@DrahtBot DrahtBot added the P2P label Sep 12, 2023
@mzumsande mzumsande marked this pull request as draft September 12, 2023 20:36
@mzumsande
Copy link
Contributor Author

mzumsande commented Sep 12, 2023

Status Quo

We have a default maximum of 125 connections (not counting manual ones). Out of these,
11 (8 full outbounds, 2 block-relay-only, 1 feeler) are reserved for outbounds, leaving us with
114 inbound slots. There are currently no limits on how many of the inbounds can support transaction relay.

Problem

We want to increase the number of inbound slots specifically offered to block-relay-only connections, but don't want to significantly change the number of tx-relaying inbound peers, because we don’t want to radically change the memory and traffic requirements that come with running a full node with the default configuration.
This is complicated by a timing issue: When a new peer connects to us, we don't know yet whether it is block-relay-only before we have processed their version message.
Inbound peer eviction, however, happens right after a peer connects to us and before processing its version msg. This means that we don't yet know during inbound eviction, if the new peer wants tx relay.

Approach

We propose to introduce a separate maximum of tx-relaying inbound connections, in addition to the existing global maximum.
The new maximum for tx-relaying inbounds gets chosen such that 50% of the total inbound slots can be utilized by tx-relaying peers. We choose a ratio here instead of a fixed number here so that it scales with users changing their -maxconnections value from the default. The ratio of 50% is also made adjustable in the form of a new startup parameter.

Changes to the eviction logic

We don't change the eviction logic for total connections, which happens in CreateNodeFromAcceptedSocket().
We add a tx-relay-specific eviction check to the version message processing which runs if we don't have capacity for another tx-relaying peer: This check uses the same logic from AttemptToEvictConnection, but specifically evicts only tx-relaying peers (by protecting the rest of the peers). Only if we cannot find another peer to evict (e.g. all peers are protected, which should only happen if a low -maxconnections is used) we disconnect the new peer. This approach achieves that the number of tx-relaying peers can never exceed the limit.

Numbers

Specifically, we propose to increase the total number of connections from 125 to 200, with a suggested ratio of 50% tx-relay inbounds. With 11 connection slots reserved for outbound peers,
this results in (200 - 11) * 0.5 = 94.5 slots for tx-relaying inbound peers, which closely aligns with the typical tx-relaying inbounds seen in today's world, where typically ~20% of inbound slots are already taken by block-relay-only connections.
These numbers were chosen to be sufficient to raise the number of block-relay-only connections from 2 to 8 once this patch is widely deployed (see Issue #28462 for more details on this).
[Note: In a previous iteration, a ratio of 60% was suggested, see this post for more discussion.]

Handling bloom filter peers

The logic gets a bit complicated if a node supports BIP37 (bloom filters), because then an inbound peer can be switched to tx-relay after sending a FILTERLOAD or FILTERCLEAR message (but never back!), which could result in an excess of tx-relaying inbounds. We propose to solve this by triggering the eviction logic for a tx-relaying peer from net_processing whenever we switch a peer to tx-relay due to a BIP37-related message.

@naumenkogs
Copy link
Contributor

Concept ACK, specifically on changes to the eviction logic, and handling bloom filters.
Numbers also seem fine I guess.

I was curious whether there is (or will be) any other good reason to delay eviction after the version (apart from making this logic cleaner), but I haven't identified any.

@mzumsande
Copy link
Contributor Author

mzumsande commented Sep 28, 2023

I was curious whether there is (or will be) any other good reason to delay eviction after the version (apart from making this logic cleaner), but I haven't identified any.

Discussed this with @amitiuttarwar and we think that it would be better to move the tx-related eviction into the version processing logic.
Pros:

  • It's conceptually simpler to only evict a tx-relaying peer when we are sure that it is one
  • We wouldn't have to evict anyone in the situation where all tx inbound slots are full and a new block-relay inbound connects
  • We need the logic for this (EvictTxPeerIfFull()) anyway when dealing with bloom-filter peers.

Cons:

  • The eviction now happens in two different places, after accepting a connection and during version processing.

I will update the PR with this soon!

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from f78a0ed to 039ac99 Compare October 3, 2023 19:08
@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 039ac99 to 82ed8a5 Compare October 13, 2023 18:35
@mzumsande
Copy link
Contributor Author

Changed the approach to do the eviction of tx-relaying peers within the version message processing, when we can make use of the fRelay flag they sent us - this has the advantage that we don't have to evict new peers before being sure if they want tx relay.

As a result of the later eviction, it became necessary to protect the new peer from being evicted right by AttemptToEvictConnection() right after connecting (so that the new peer only gets evicted if no other peer can be found). This isn't done for the existing unconditional inobound eviction in net (CreateNodeFromAcceptedSocket), because there we would evict a peer before adding the new peer to m_nodes.

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 82ed8a5 to 8247eaa Compare October 13, 2023 18:46
@amitiuttarwar
Copy link
Contributor

overall, I like this approach significantly more. although we are introducing another touchpoint for inbound eviction logic, it is more straightforward to reason about. but the biggest improvement is not unnecessarily disconnecting a new or existing connection when tx relay isn't being requested.

reviewed this in-depth. the fundamental structure & logic looks good to me. left a lot of comments along the way 😛

some additional thoughts:

  • commit message for ab5ad61f67f3bcab503ac3eb012c4f35d733b946 says "When we receive an inbound connection and don't have space for another full-relay peer, we now attempt to evict specifically a full-relay inbound." I think this should mention timing of processing the version message for clarity.
  • the order of events when processing the VERSION message makes sense, because we want to instantiate tx_relay for the before before triggering the new eviction logic (which uses that in the count).
  • this code strengthens the distinction between inbound-tx-relay connections from inbound-block-relay connections. ideally, this would be captured in separate connection types, but that would require more invasive changes since we don't know the distinction at the time of accepting the connection. I don't think the benefit would be worth the cost, but is something worth keeping in mind if we continue to build stronger distinctions between the two types of inbounds.

src/net.h Outdated
* a peer to evict (all eligible peers are protected)
* so that the caller can deal with this.
*/
bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it could be clearer to call this function something like TxInboundCapacityAvailable. imo that seems more useful from the POV of the net processing caller, but either is ok with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default param here makes the FILTERLOAD/FILTERCLEAR callers slightly cleaner

Suggested change
bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer);
bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer) = std::nullopt;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to use default values.

Not sure about the rename suggestion, I think I prefer EvictTxPeerIfFull, because I think for naming describing what the function does is more important than describing the return value. Otherwise other spots might call this function in the future with the intention to just perform a check, and then might be surprised about the side effects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then might be surprised about the side effects.

yeah totally. I was trying to make the return values more self-evident, but on further thought, I agree with you that emphasizing eviction is more important - people can always read the doc string to understand the return value. can't think of anything to concisely express HasOrMakeInboundTxCapacity, so leaving as is sounds good :)

@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 8247eaa to 9555a33 Compare October 19, 2023 21:42
@mzumsande
Copy link
Contributor Author

Thanks for the review @amitiuttarwar! i pushed an update, addressing all feedback.

@mzumsande
Copy link
Contributor Author

df1c9c2 to 9806bc5: rebased (there was a silent conflict for p2p_opportunistic_1p1c.py, together with the -maxconnections=94 default limit used in the functional tests)

mzumsande and others added 6 commits February 21, 2026 14:14
If the added bool evict_tx_relay_peers set, non-tx-relay
peers will be exempt from disconnection.
Also allow to specify a peer that exempt from eviction.
Both options will be used in later commits.

Co-authored-by: Amiti Uttarwar <[email protected]>
..and adjust the eviction logic.
The new default max connection number is 200, the default maximum of tx-relaying
inbounds is limited to 50% of all inbound connections.
With 11 outbound connections, that is (200 - 11) * 0.5 = 94.5.
As a result, the tx-related maximum traffic should not change
drastically.

When we receive an inbound connection and don't have space for another
full-relay peer, we now attempt to evict specifically a full-relay inbound
after receiving the version message of the new peer.

Once this commit is widely deployed, the added inbound capacity will
allow us to increase the number of outgoing block-relay-only connections.

Co-authored-by: Amiti Uttarwar <[email protected]>
Permit users to change the amount of inbounds that are permitted to relay
transactions. This is particularly relevant to ensure that superusers that are
not concerned with resource usage are not artificially restricted from offering
many transaction relay slots to the network.

Co-authored-by: Martin Zumsande <[email protected]>
@mzumsande mzumsande force-pushed the 202308_increase_block_relay branch from 9806bc5 to cfa72f8 Compare February 21, 2026 08:58
@mzumsande
Copy link
Contributor Author

9806bc5 to cfa72f8: rebased

vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Feb 28, 2026
df69b22 doc: improve documentation around connection limit maximums (Amiti Uttarwar)
adc171e scripted-diff: Rename connection limit variables (Amiti Uttarwar)
e9fd9c0 net: add m_max_inbound to connman (Amiti Uttarwar)
c25e0e0 net, refactor: move calculations for connection type limits into connman (Amiti Uttarwar)

Pull request description:

  This is joint work with amitiuttarwar.

  This has the first few commits of bitcoin#28463. It is not strictly a prerequisite for that, but has changes that in our opinion make sense on their own.
  It improves the handling of maximum numbers for different connection types (that are set during init and don’t change after) by:
  * moving all calculations into one place, `CConnMan::Init()`. Before, they were dispersed between `Init`, `CConnman::Init` and other parts of `CConnman`, resulting in some duplicated test code.
  * removing the possibility of having a negative maximum of inbound connections, which is hard to argue about
  * renaming of variables and doc improvements

ACKs for top commit:
  amitiuttarwar:
    co-author review ACK df69b22
  naumenkogs:
    ACK df69b22
  achow101:
    ACK df69b22

Tree-SHA512: 913d56136bc1df739978de50db67302f88bac2a9d34748ae96763288d97093e998fc0f94f9b6eff12867712d7e86225af6128f4170bf2b5b8ab76f024870a22c
@sedited
Copy link
Contributor

sedited commented Mar 3, 2026

There hasn't been much review here in a while. Tagging a few people who commented here previously - cc @ajtowns @sr-gi @marcofleon @instagibbs

@0xB10C
Copy link
Contributor

0xB10C commented Mar 3, 2026

Concept ACK.

I ran a previous version of this branch, but probably best to do that again now nearly two years later.

vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Mar 4, 2026
df69b22 doc: improve documentation around connection limit maximums (Amiti Uttarwar)
adc171e scripted-diff: Rename connection limit variables (Amiti Uttarwar)
e9fd9c0 net: add m_max_inbound to connman (Amiti Uttarwar)
c25e0e0 net, refactor: move calculations for connection type limits into connman (Amiti Uttarwar)

Pull request description:

  This is joint work with amitiuttarwar.

  This has the first few commits of bitcoin#28463. It is not strictly a prerequisite for that, but has changes that in our opinion make sense on their own.
  It improves the handling of maximum numbers for different connection types (that are set during init and don’t change after) by:
  * moving all calculations into one place, `CConnMan::Init()`. Before, they were dispersed between `Init`, `CConnman::Init` and other parts of `CConnman`, resulting in some duplicated test code.
  * removing the possibility of having a negative maximum of inbound connections, which is hard to argue about
  * renaming of variables and doc improvements

ACKs for top commit:
  amitiuttarwar:
    co-author review ACK df69b22
  naumenkogs:
    ACK df69b22
  achow101:
    ACK df69b22

Tree-SHA512: 913d56136bc1df739978de50db67302f88bac2a9d34748ae96763288d97093e998fc0f94f9b6eff12867712d7e86225af6128f4170bf2b5b8ab76f024870a22c
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Mar 4, 2026
df69b22 doc: improve documentation around connection limit maximums (Amiti Uttarwar)
adc171e scripted-diff: Rename connection limit variables (Amiti Uttarwar)
e9fd9c0 net: add m_max_inbound to connman (Amiti Uttarwar)
c25e0e0 net, refactor: move calculations for connection type limits into connman (Amiti Uttarwar)

Pull request description:

  This is joint work with amitiuttarwar.

  This has the first few commits of bitcoin#28463. It is not strictly a prerequisite for that, but has changes that in our opinion make sense on their own.
  It improves the handling of maximum numbers for different connection types (that are set during init and don’t change after) by:
  * moving all calculations into one place, `CConnMan::Init()`. Before, they were dispersed between `Init`, `CConnman::Init` and other parts of `CConnman`, resulting in some duplicated test code.
  * removing the possibility of having a negative maximum of inbound connections, which is hard to argue about
  * renaming of variables and doc improvements

ACKs for top commit:
  amitiuttarwar:
    co-author review ACK df69b22
  naumenkogs:
    ACK df69b22
  achow101:
    ACK df69b22

Tree-SHA512: 913d56136bc1df739978de50db67302f88bac2a9d34748ae96763288d97093e998fc0f94f9b6eff12867712d7e86225af6128f4170bf2b5b8ab76f024870a22c
Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK



Based on the discussion in #34542, there seems to be some general agreement that increasing block-relay-only connections is a good first step toward improving partition resistance. I think it makes sense on its own and it would also lay the groundwork for potentially adding reconciliation-based tx relay connections later on.



I’ll start running fuzzamoto and other relevant fuzz tests on this PR.

}
pfrom.m_bloom_filter_loaded = true;
pfrom.m_relays_txs = true;
if (pfrom.IsInboundConn() && !m_connman.EvictTxPeerIfFull()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that we don't protect this new tx-relay peer like we do for the VERSION message in acfecb2? I guess this peer would have some history so it would make sense to consider it for eviction?

@instagibbs
Copy link
Member

concept ACK. I would like some of the erlay discussion to move forward too, because I'm not sure how to judge this all holistically.

@ajtowns
Copy link
Contributor

ajtowns commented Mar 5, 2026

Concept ACK.

@brunoerg
Copy link
Contributor

Concept ACK

}
}
if (tx_inbound_peers > m_max_inbound_full_relay) {
return AttemptToEvictConnection(/*evict_tx_relay_peer=*/true, protect_peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran mutation testing for this PR and noticed that the following mutant isn't killed by any test. See:

diff --git a/src/net.cpp b/src/net.cpp
index 499312fbbd..88afc05216 100644
--- a/src/net.cpp
+++ b/src/net.cpp
@@ -2504,7 +2504,7 @@ bool CConnman::EvictTxPeerIfFull(std::optional<NodeId> protect_peer)
         }
     }
     if (tx_inbound_peers > m_max_inbound_full_relay) {
-        return AttemptToEvictConnection(/*evict_tx_relay_peer=*/true, protect_peer);
+        return AttemptToEvictConnection(/*evict_tx_relay_peer=*/false, protect_peer);
     }
:

When evict_tx_relay_peer is true, it means that whether the node is over its tx-relay inbound limit, it must evict an existing tx-relay peer, not a block-relay-only peer. So, a test case to address it would be great. Suggestion (did not fully tested it):

diff --git a/test/functional/p2p_connection_limits.py b/test/functional/p2p_connection_limits.py
index 64dab1089e..56c1d81980 100755
--- a/test/functional/p2p_connection_limits.py
+++ b/test/functional/p2p_connection_limits.py
@@ -33,6 +33,22 @@ class P2PConnectionLimits(BitcoinTestFramework):
         no_txrelay_version_msg.relay = 0
         return no_txrelay_version_msg

+    def create_no_services_blocks_only_version(self):
+        """VERSION with relay=0 and nServices=0.
+        version_msg = msg_version()
+        version_msg.nVersion = P2P_VERSION
+        version_msg.strSubVer = P2P_SUBVERSION
+        version_msg.nServices = 0
+        version_msg.relay = 0
+        return version_msg
+
     def test_inbound_limits(self):
         node = self.nodes[0]

@@ -81,6 +97,18 @@ class P2PConnectionLimits(BitcoinTestFramework):
         node.add_p2p_connection(P2PInterface())
         self.wait_until(lambda: len(node.getpeerinfo()) == 2)

+        self.log.info('Test that EvictTxPeerIfFull only evicts tx-relaying peers')
+        NUM_BLOCK_RELAY_PEERS = 21
+        self.restart_node(0, ['-maxconnections=33', '-inboundrelaypercent=0'])
+        for _ in range(NUM_BLOCK_RELAY_PEERS):
+            p = self.nodes[0].add_p2p_connection(P2PInterface(), send_version=False, wait_for_verack=False)
+            p.send_without_ping(self.create_no_services_blocks_only_version())
+        self.wait_until(lambda: len(node.getpeerinfo()) == NUM_BLOCK_RELAY_PEERS)
+
+        with node.assert_debug_log(['failed to find a tx-relaying eviction candidate - connection dropped'], timeout=5):
+            self.nodes[0].add_p2p_connection(P2PInterface(), expect_success=False, wait_for_verack=False)
+        self.wait_until(lambda: len(node.getpeerinfo()) == NUM_BLOCK_RELAY_PEERS)
+

 if __name__ == '__main__':
     P2PConnectionLimits(__file__).main()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.