Skip to content

Improve access control in peer connections#9880

Merged
Roasbeef merged 17 commits intolightningnetwork:masterfrom
yyforyongyu:accessman
Jun 20, 2025
Merged

Improve access control in peer connections#9880
Roasbeef merged 17 commits intolightningnetwork:masterfrom
yyforyongyu:accessman

Conversation

@yyforyongyu
Copy link
Member

@yyforyongyu yyforyongyu commented May 30, 2025

We now remove the restriction placed on outbound connections - previously we would account them for numRestricted, which is no longer the case, as for user-initiated connections, it should not be put under the same restriction as the inbound ones - we may choose to eventually put a restriction on the outbound too, but it should be a very different category from the inbound.

In addition more tests are patched to fix the case that an existing connection may be counted multiple times, causing the numRestricted to be falsely reached.

Depends on,

TODOs:

  • Add more tests
  • add detailed descriptions

This change is Reviewable

@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 30, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@ziggie1984 ziggie1984 self-requested a review June 2, 2025 18:06
@yyforyongyu yyforyongyu force-pushed the accessman branch 4 times, most recently from c82c790 to 3b089cf Compare June 3, 2025 15:27
@yyforyongyu yyforyongyu marked this pull request as ready for review June 3, 2025 16:26
@saubyk saubyk added this to the v0.19.1 milestone Jun 3, 2025
@saubyk saubyk linked an issue Jun 3, 2025 that may be closed by this pull request
@saubyk saubyk modified the milestones: v0.19.1, v0.19.2 Jun 3, 2025
@ziggie1984 ziggie1984 added p2p Code related to the peer-to-peer behaviour bug fix labels Jun 4, 2025
@lightninglabs-deploy
Copy link
Collaborator

@ziggie1984: review reminder

@ziggie1984
Copy link
Collaborator

@Crypt-iQ could you take a look at this new design where we remove outbound restrictions, are there some downsides with the current LND codebase which we do not see, referring to your comment here:

#9874 (comment)

Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, left some comments.

I think we also need to think about the two functions:

  1. notifyOpenChannelPeerEvent
  2. notifyPendingOpenChannelPeerEvent

These are now coupled with the accessManager however I think other subsystems are not aware that these functions can now fail when peers are not connected. So I would highly recommend making those calls independent for the accessMan.

Example: THe funding manager calls the newOpenchannel notification when a channel hits the required amount of confirmations. Now in case the peer is disconnected we won't notify all the dependent services that a new channel opened. And the funding manager will not send it again. So I think we should not couple those two things.

// peer.
acsmLog.DebugS(ctx, "Peer has no channels, assigning restricted access")

// If this is an existing peer, there's no need to check for slot limit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we call this after the disconnect check, I think it makes sense to do it before ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the peer is messing with us we should disconnect?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was more looking into the shouldDisconnect logic and saw there that we are already do not disconnect if we have a channel so yeah maybe it is better for the accessMan not assume stuff happening in other subsystems:

isChanPeer, err := d.cfg.ScidCloser.IsChannelPeer(pubkey)

Just wanted to bring it up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is better for the accessMan not assume stuff happening in other subsystems

yeah good thought - that's exactly the decoupling we are aiming for, which means we will need a design doc for this, to explicitly picture how these subsystems interact with each other.

server.go Outdated
nodeKeyECDH, listenAddr.String(),
// TODO(yy): remove this check and unify the inbound
// connection check inside `InboundPeerConnected`.
s.peerAccessMan.checkIncomingConnBanScore,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Crypt-iQ what was our intention again to put this check earlier into the HandShake method rather than into the InboundPeerConnected ?

//
// TODO(yy): Consider perform this check in
// `peerAccessMan.addPeerAccess`.
access, err := s.peerAccessMan.assignPeerPerms(pubKey)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to unify addPeerAccess and assignPeerPerms because peerConnected can be called concurrently for many peers and we should increment the restricted Peers number if we are connecting the peer otherwise because we need to hold thebanMutex it can be possible that we allow more restricted peers than possible.

2 peers call peerConnected both of these calls succeed with assignPeerPerms before one of them has called addPeerAccess to increase the counter of the restictive peer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think whenever we review PRs it's good to keep in mind of two things - first, what is the scope of the PR, which the author usually states it in the PR description. Second, what is the issue identified - is it preexisting, or it's introduced by this PR. If a new PR introduced a buggy behavior, that def needs to be addressed. If it's preexisting, then there's no need to fix everything in one PR, as long as the original scope is achieved. This is to make sure we move forward with progress, and not putting all the task under one author.

As for this issue, I think it's preexisting, otherwise I won't bother creating that TODO. In fact, since we are now moving the assignPeerPerms here instead of outside of peerConnected, it already makes the situation better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we should not fix it in this PR. But I think reviews like this are still a good forum to discuss potential problems of the current code. I was not aware of this race condition beforehand that's why I brought it up and wanted to get your opinion here as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah def - we need to bring awareness so it's good to discuss the potential issue. Tho based on offline discussion it seems to me that this is a blocking comment, which is why I brought it up.

@yyforyongyu
Copy link
Member Author

I think we also need to think about the two functions:

Cool think this is now fixed.

@saubyk saubyk requested a review from Roasbeef June 17, 2025 16:34
This flag is added so we can use it in the itest to mimic racing inbound
and outbound connections.
When a peer already has a connection with us, there's no need to check
for available slots as we will either close the old conn or refuse the
new conn.
We need to also check this map to make sure the peer exists or not.
When a peer is already existing, we should skip incrementing the
`numRestricted` count.

Also patched unit test for method `addPeerAccess`.
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a final question otherwise it is gtg:

  1. Should we remove the Peer from the PeerCount map in case it has not open/pending channel. Otherwise Peers will always be able to reconnect as inbound although we disconnected to them ?

// peer.
acsmLog.DebugS(ctx, "Peer has no channels, assigning restricted access")

// If this is an existing peer, there's no need to check for slot limit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was more looking into the shouldDisconnect logic and saw there that we are already do not disconnect if we have a channel so yeah maybe it is better for the accessMan not assume stuff happening in other subsystems:

isChanPeer, err := d.cfg.ScidCloser.IsChannelPeer(pubkey)

Just wanted to bring it up.

}

// Assigning to peer1 should not return an error.
status, err := a.assignPeerPerms(peer1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok now I understand it, but shouldn't we make sure we remove peers from the peerCount as well which have no channels in case we remove the peer in removePeerAccess.

Otherwise all peers we ever connected outbound wise will have access when connecting inbound ?

//
// TODO(yy): Consider perform this check in
// `peerAccessMan.addPeerAccess`.
access, err := s.peerAccessMan.assignPeerPerms(pubKey)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we should not fix it in this PR. But I think reviews like this are still a good forum to discuss potential problems of the current code. I was not aware of this race condition beforehand that's why I brought it up and wanted to get your opinion here as well.

}

// Assigning to peer1 should not return an error.
status, err := a.assignPeerPerms(peer1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we remove a peer's score when they disconnect:

lnd/server.go

Lines 4960 to 4961 in 168d63c

// Remove the peer's access permission from the access manager.
s.peerAccessMan.removePeerAccess(p.IdentityKey())

So if we connect to a peer that we have no channels with, then we disconnect and they reconnect inbound, they'll start with a blank score.

@yyforyongyu
Copy link
Member Author

Should we remove the Peer from the PeerCount map in case it has not open/pending channel. Otherwise Peers will always be able to reconnect as inbound although we disconnected to them ?

Yeah good catch - added a commit plus some tests to remove it, but only if the peer has no channels with us.

Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🥽

sample.conf would need an update now as well which currently states:

; The number of restricted slots the server will allocate for peers.
; num-restricted-slots=100

Something along the lines like:

This restriction only counts for inbound however outbound are still counted as restricted, so you can have more than 100 restricted peers, but not more inbound peers if the number exceeded the max-restricted.


// numRestrictedExpected specifies the final value to expect once the
// test finishes.
var numRestrictedExpected int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe start from 1 here so we do not decrement to -1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i kinda wanna show that this num can be negative as we need underflow protection.

We now remove the peer from `peerChanInfo` if this peer doesn't have
channels with us. Also patched a unit test for `removePeerAccess`.
@yyforyongyu
Copy link
Member Author

sample.conf would need an update now as well which currently states:

@ziggie1984 Updated - note that outbound connections are not restricted, so they should not increment the slots count numRestricted since they would have peerStatusTemporary by default.

@Roasbeef Roasbeef requested review from Crypt-iQ and ziggie1984 June 20, 2025 22:25
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 11 files at r1, 9 of 9 files at r2, 8 of 11 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: all files reviewed, 26 unresolved discussions (waiting on @Crypt-iQ, @yyforyongyu, and @ziggie1984)

@Roasbeef
Copy link
Member

Accidentally dismissed the existing approval, merging bsaed on that.

@Roasbeef Roasbeef merged commit faa68a2 into lightningnetwork:master Jun 20, 2025
35 of 37 checks passed
@yyforyongyu yyforyongyu deleted the accessman branch June 21, 2025 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fix p2p Code related to the peer-to-peer behaviour

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: Relax restricted peer access rules

5 participants