net: Add new permission forceinbound to evict a random unprotected connection if all slots are otherwise full#27600
net: Add new permission forceinbound to evict a random unprotected connection if all slots are otherwise full#27600pinheadmz wants to merge 7 commits intobitcoin:masterfrom
forceinbound to evict a random unprotected connection if all slots are otherwise full#27600Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
stickies-v
left a comment
There was a problem hiding this comment.
Concept ACK, makes sense to prioritize whitelisted peers. Will review more in-depth soon.
There was a problem hiding this comment.
Concept ACK
As I suggested in the review club, an alternative, more simple approach would be to just pick a random peer after removing NoBan and outbound connections when in force-mode. Then, if at the end of the usual eviction algorithm, we don't have a evict-able peer, we would evict the random one instead.
This would make the code simpler (no need to change EraseLastKElements or keep track of last), and I don't really see a major downside:
- The
EraseLastKElementseviction criteria don't really seem to be sorted by priority anyway right now (because order doesn't matter so far), so evicting a peer that would've been protected by later calls doesn't seem better than evicting one that would've been protected by earlier ones. - Whitebind No-Ban peers are more trusted to begin with, but if they maliciously wanted to take over all inbound slots by repeatedly connecting to us on the whitebind address, they can do that anyway, whether the eviction is random or whether the current approach of the PR is used. So it's not clear to me what the extra benefit of the current approach over random eviction is.
e71d495 to
c826187
Compare
|
@stickies-v @mzumsande thanks for your review and feedback, I refactored the PR so we evict a random peer when forced if all other peers are protected. It is MUCH simpler ;-) |
stickies-v
left a comment
There was a problem hiding this comment.
Light approach ACK c826187b5070ce89edcde0536183714a1c2e3207
I think this random selection approach suggested by mzumsande is a more elegant approach. "Light" in approach ACK because I need to build more confidence that this doesn't introduce new attack vectors.
c826187 to
2ab1ed6
Compare
|
Thanks @stickies-v nits addressed! 🙏 |
2ab1ed6 to
96b513f
Compare
stickies-v
left a comment
There was a problem hiding this comment.
Code review ACK 96b513f605fb2df441b66da583056b1c8acd4dbc, but I think the first commit should be removed from this PR, I think it's an accident?
I think the code is good and suits the intent of the PR well. My only concern is about the potential footgun introduced, even if it is relatively mild. Anyone running bitcoind with whitebind=0.0.0.0:<port> will now be vulnerable to having all their inbound non-whitelisted slots taken over by an attacker that has figured out what the whitelisted port is. It doesn't seem like a huge concern, given that:
- it only affects inbound connections, which we already assume to be more susceptible to these kinds of attacks, and
- it does require the user to manually whitebind to
0.0.0.0
but I just wanted to flag it here anyway in case it is actually more serious than I see it to be.
Also, this seems like behaviour change that would benefit from a mention in the release notes?
src/node/eviction.cpp
Outdated
There was a problem hiding this comment.
nit: if vEvictionCandidates is empty, there's no point (I think, couldn't see any side effects in EraseLastKElements or ProtectEvictionCandidatesByRatio) executing the next lines so might as well return early? Unless you think this makes the code less clear?
| std::optional<NodeId> force_evict; | |
| if (vEvictionCandidates.size() > 0 && force) { | |
| if (vEvictionCandidates.empty()) return std::nullopt; | |
| std::optional<NodeId> force_evict; | |
| if (force) { |
There was a problem hiding this comment.
Good point, although "if empty, return now" logic would make sense after each step in this function (why bother calling EraseLastKElements() with an empty array?) even without this PR. One thing we could do is add that logic to the beginning of EraseLastKElements() itself, but I dunno how much time is really wasted in there (sort, min, erase ... ?)
There was a problem hiding this comment.
I had similar considerations to e.g. wrap the EraseLastKElements call in a lambda fn, but it's probably not worth the LoC change.
I think this case is slightly different in that it better highlights that if there are no inbound NoBan peers, the result is always std::nullopt.
There was a problem hiding this comment.
good point, i'll add it. one line i think in this case will improve readability
test/functional/p2p_eviction.py
Outdated
There was a problem hiding this comment.
As you mentioned in the review club, whitelisting just based on ports enables an attacker that discovered which port(s) are whitelisted to take over all (non-whitelisted) inbound connections slots.
Therefore, I think we have to be careful not to accidentally lead people to this footgun. Binding to 127.0.0.1 seems much more prudent to not set a bad example.
| self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=0.0.0.0:30201']) | |
| self.restart_node(0, extra_args=['-maxconnections=12', '-whitebind=127.0.0.1:30201']) |
There was a problem hiding this comment.
updated test and also included warning in release note
96b513f to
fa78fc5
Compare
doc/release-notes-27600.md
Outdated
There was a problem hiding this comment.
I think "on local ports" is unnecessarily restrictive/scary. I think it's still completely fine to whitelist remote addresses, you mostly just want to avoid ranges, right?
There was a problem hiding this comment.
@mzumsande what's your opinion on this? Is this PR a big enough change to warrant this kind of warning?
There was a problem hiding this comment.
Not really sure, I've never looked into these options much and don't know about best practices - I think that if you use -whitebind with non-local addresses, you'd at least have to make sure that that address is not self-advertised. I guess that this applies more to -whitebind than -whitelist, so that the preferred approach in case of a non-local connection would be to use -whitelist?
fyi @vasild, do you have an opinion on this?
There was a problem hiding this comment.
That's a good point, whitelist is harder to attack than whitebind because the attacker would have to spoof their origin IP repeatedly to fill up your inbounds. If a user whitebind's a port and an attacker figures out that port numnber, they can trivially evict all your other inbounds
There was a problem hiding this comment.
I agree with what you say above - whitebind on publicly accessible address:port with this new permission sounds bad. Similar with whitelist - if a range is used.
This PR currently expands the semantic of the noban permission. This will affect existent setups that already use it. Would it make sense to introduce a new permisson, separate from noban? I mean - now if somebody is running -whitebind=noban@publicaddr:port then a bad actor could cause harm on master, but even more harm with this PR.
There was a problem hiding this comment.
interesting idea, so we could add NetPermissionFlags::NoBanForce or something. The danger will still be present for users of the permission, but existing users of NoBan wouldn't have to worry. And since NoBan is a default of whitebind it'll require more user attention anyway to use the more dangerous option.
In that case, the release note should still warn the user about using NoBanForce (or whatever we call it) but that warning can just be general, like, keep an eye on your netinfo if you do this. As opposed to just recommending only setting local addresses with whitebind
There was a problem hiding this comment.
This PR currently expands the semantic of the
nobanpermission
Could you please elaborate on this point? I thought that with this PR, noban nodes are still completely protected (they are removed from the eviction candidates list before the random node is chosen). But it's certain that I'm missing something. Thanks.
There was a problem hiding this comment.
@LarryRuane you are correct but what we are changing (in the current state of this branch) is that noban nodes can now force disconnection of other peers. Since many users may already have noban set, maybe even for a large range of IPs, this branch would introduce a new vulnerability they may not be aware of. Because of that I think it does make more sense to specify a new permission so users can narrow the attack surface
|
Please consider editing the description slightly, if my understanding is correct.
At this point in the process, where we choose a random peer, we're considering all (inbound, not noban) peers, is that correct? They are all (other than outbound or noban) "unprotected" because we haven't protected any of them yet. So maybe this should say, "... selecting a random peer" or "... selecting a random inbound peer" -- because we may randomly choose a peer that ends up being protected, or not protected. That would make this PR easier for me to understand. I do think the updated algorithm is very elegant (great suggestion, @mzumsande). It chooses a random peer, which may end up being erased from the eviction candidate list, or may not, but we don't care either way; we evict it if we need to, all the better if it actually did not get protected (but, again, we don't care). By the way, I'm going to highlight this PR in the Optech newsletter next week, that's why I'm particularly interested in making sure I understand it. |
LarryRuane
left a comment
There was a problem hiding this comment.
utACK fa78fc543cd46ee1c7181ecf6b696c2892b9f8d3
except for possible attack vectors (for example, NoBanForce) that I'm not really qualified to comment on. I may try to contribute to that discussion after I've learned more. Nice PR!
doc/release-notes-27600.md
Outdated
There was a problem hiding this comment.
This PR currently expands the semantic of the
nobanpermission
Could you please elaborate on this point? I thought that with this PR, noban nodes are still completely protected (they are removed from the eviction candidates list before the random node is chosen). But it's certain that I'm missing something. Thanks.
src/node/eviction.cpp
Outdated
There was a problem hiding this comment.
Perhaps insert a comment before this line // This may still return std::nullopt
There was a problem hiding this comment.
I'd suggest expanding this comment to explain what exactly happened.
src/node/eviction.h
Outdated
There was a problem hiding this comment.
I'm not a fan of default arguments; consider making this not have a default. It makes sense if there are many existing calls to a function and you don't want to touch them all, but there's only one call to this function in production code. You would have to touch about 6 call sites in test code, but that's not too bad. The reason I'm not a fan of default arguments is that they hide something that may be helpful to see when just looking at the call site. (You might think, "Oh, it's possible that eviction can be 'forced', what does that mean?) But this is an optional suggestion, fine if you leave it as-is, just something to consider.
There was a problem hiding this comment.
ok I like this, will do
There was a problem hiding this comment.
This still remains default, no?
|
Even though not needed for this PR, you may want to consider adding a commit to remove the unnecessary templating. It's perfectly reasonable not to make this change in this PR, but I thought I'd document it just in case; I do think it makes the code easier to understand. |
|
@LarryRuane thanks for reviewing. I actually had removed the templating in the very first version of this branch but after changing the approach I just left it alone. The function is written like a generic utility function, even though we don't currently use it for any other vectors. So I think the function could go as written into a util.cpp or something, in a future PR ? |
8585fe3 to
8639a45
Compare
|
@willcl-ark thanks for reviewing, addressed your feedback except for the new test but always open to writing more tests so lemme know if you think something is uncovered there. |
8639a45 to
32f48ef
Compare
|
Rebasing on master hopefully to fix Windows CI failure |
|
Rebased on master again, thanks @willcl-ark for your review. Hoping to get some more feedback from @mzumsande and @naumenkogs to proceed, or abandon |
mzumsande
left a comment
There was a problem hiding this comment.
Hoping to get some more feedback from @mzumsande and @naumenkogs to proceed, or abandon
Personally, I'm not sure if the use case (running a node with -maxconnections < 40 and at the same time wanting certain inbound peers, or wanting a really high number of whitelisted inbounds peers at the same time) is common enough to introduce an extra permission just for that. In more typical cases, noban should be sufficient.
But if others think it is I'm not against it.
A related use case I could see (but I'm also not sure if there is demand for that) is to only be reachable by whitelisted "friends" but no one else. That doesn't seem to be possible right now, -fListen is either all or none.
|
@pinheadmz Honestly I still struggle to understand what are the practical scenarios of using this. E.g., what could lead to not being able to evict a connection? |
|
@naumenkogs If the user has also set a low Quick math 4+8+4+8+4 = 28 protected nodes. Plus 8 full outbound and 2 block only = 38. So if a user runs a full node on limited hardware (like my Raspberry Pi at home) and have something like That being said, it ALSO means that the solution to #8798 may be to just ensure |
src/node/eviction.cpp
Outdated
There was a problem hiding this comment.
c5e7e6563201965e88e07aa8da8f26e17fc1b4db
"unprotected" here is sloppy. I'd just drop the comment.
src/node/eviction.cpp
Outdated
There was a problem hiding this comment.
I'd suggest expanding this comment to explain what exactly happened.
src/node/eviction.h
Outdated
There was a problem hiding this comment.
This still remains default, no?
src/node/eviction.h
Outdated
There was a problem hiding this comment.
c5e7e6563201965e88e07aa8da8f26e17fc1b4db
"among inbound no-noban connections" or something (i hate no-noban but not sure what's better)
src/net.cpp
Outdated
There was a problem hiding this comment.
311902f2cf94ee0e11ed34a1373db42dbc218b20
No need to use legacy naming patterns for new variables :)
There was a problem hiding this comment.
hmmm I'm gonna keep it actually because later in this function is a boolean called forced so, I think this is legible
src/net.cpp
Outdated
There was a problem hiding this comment.
311902f2cf94ee0e11ed34a1373db42dbc218b20
should this be exposed in peer info?
There was a problem hiding this comment.
good idea, added a new commit for this
Accomplished by adding a bool argument `force` to SelectNodeToEvict()
Only inbound nodes with this permission set will call `SelectNodeToEvict()` with force=true, so when connections are full there is an increased liklihood of opening a slot for the new inbound. Extends NoBan permission.
|
Thanks for the review Gleb, I also rebased on master to fix conflicts |
| TransportProtocolType m_transport_type; | ||
| /** BIP324 session id string in hex, if any. */ | ||
| std::string m_session_id; | ||
| /** whether this peer forced its connection by evicting another */ |
There was a problem hiding this comment.
So forced actually means two things to you:
- Before connecting — something that might evict another peer
- After connecting — something that has evicted another peer
I think this is confusing and better to use different words. RPC can expose "force_evicted" flag (to cover the latter case and apply to the Connection), while "forceInbound" can stay in the network/potential-connection context (the former case).
There was a problem hiding this comment.
Apparently, in the previous commit 7586802 you assign forced = true; even for non-forceInbound peers.
So basically if a node had 8 connections which are non-forceInbound but still did eviction, a real forceInbound connection won't be able to join (see how you count nForced). Is that intended? Seems like a bug to me.
|
🐙 This pull request conflicts with the target branch and needs rebase. |
|
I feel like this makes sense conceptually, but I have similar concerns to @stickies-v @mzumsande and @naumenkogs with it. If we could approach this without the need to introduce an additional permission I'd be happier, since it seems a big change for a narrow use case with a potential workaround by just accepting more than 38 total connections (plus Given this only triggers under really specific conditions, can we not just prioritize our whitelisted peer under those conditions? The addition of new whitelisted peers can take priority over the ones we protect for "diversity criteria" as long as we do not have enough candidates to evict a peer, that is: after protecting the current For this to trigger, your number of |
|
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Closes #8798
Use case: I run a full node that accepts inbound connections and have a
whitebindsetting so my personal light client can always connect, even whenmaxconnections(and particularly all inbound slots) is already full.Currently when connections are full, if we receive an inbound peer request, we look for a current connection to evict so the new peer can have a slot. To find an evict-able peer we go through all our peers and "protect" multiple categories of peers, then we evict the "worst" peer that is left unprotected. If there are no peers left to evict, the inbound connection is denied.
With this PR, if the inbound connection has
forceinboundpermission we start the eviction process by first protecting allnobanand outbound connections, then selecting one of the remaining peers (if any) at random. Then we loop through all our current connections, removing protected peers from the evict-able list. If we end up protecting all our remaining connections, the randomly chosen peer is evicted.forceinboundimpliesnobanpermission.All outbound and
nobanconnections remain protected from eviction.