Correct msgs_sent_since_pong tracking for gossip forwards#4093

Merged
tnull merged 1 commit into lightningdevkit:main from TheBlueMatt:2025-09-dont-disconnect-slow-gossipers
Sep 19, 2025

Conversation

@TheBlueMatt
Collaborator

When we forward gossip messages, we use the already-encoded copy we have, pushing it onto the `gossip_broadcast_buffer`. When we have free socket buffer space, and there are no non-gossip messages pending, we'll remove it from the `gossip_broadcast_buffer` and push it onto the normal `pending_outbound_buffer` socket queue.

Separately, we use `msgs_sent_since_pong` to ensure that our peer is being responsive and our messages are getting through with minimal latency. After we send 32 messages (if we've already received the `pong` for the last `ping` we sent), we send an extra `ping` and will only queue up an additional 32 forwarded gossip messages before we start dropping forwarded gossip on the floor.

Previously, we (arguably) incorrectly incremented `msgs_sent_since_pong` when we pushed a message onto the `gossip_broadcast_buffer`, not when it was actually queued up in our `pending_outbound_buffer` immediate socket queue. This is fairly strange: the point of `msgs_sent_since_pong` is to keep peer latency low, and we already independently enforce memory limits to ensure we drop forwarded gossip messages if a peer's message queues have grown too large. This means we could potentially disconnect a peer for not draining its backlogged forwarded-gossip socket queue fast enough, rather than simply dropping gossip.

While this shouldn't be a big deal in practice, I do see (mostly Tor) peers disconnect on my node occasionally, and while it's likely due to Tor circuits hanging and a disconnect is needed, the incorrect tracking here makes state analysis difficult as there is nearly always a flow of enough gossip messages to "send to the peer". It's also possible that this allows incredibly low-bandwidth connections to stay afloat more durably.
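The buffer handoff and counter behavior described above can be sketched as follows. The field names mirror those in the description, but the `Peer` struct, the cap constant's value, and the method names are illustrative simplifications for this sketch, not LDK's actual `peer_handler` code.

```rust
use std::collections::VecDeque;

// Illustrative cap; LDK's real memory limits on the gossip buffer differ.
const MAX_QUEUED_GOSSIP: usize = 64;

struct Peer {
    // Messages queued for immediate write to the socket.
    pending_outbound_buffer: VecDeque<Vec<u8>>,
    // Pre-encoded forwarded gossip awaiting free socket buffer space.
    gossip_broadcast_buffer: VecDeque<Vec<u8>>,
    // Counts messages sent since the last pong; drives extra pings and
    // the responsiveness check.
    msgs_sent_since_pong: usize,
}

impl Peer {
    /// Forwarded gossip is only staged here. With the fix, staging no
    /// longer increments `msgs_sent_since_pong`; the independent memory
    /// limit (modeled as a simple length cap) silently drops the message
    /// instead of risking a disconnect.
    fn queue_gossip_broadcast(&mut self, encoded: Vec<u8>) {
        if self.gossip_broadcast_buffer.len() >= MAX_QUEUED_GOSSIP {
            return; // drop forwarded gossip on the floor
        }
        self.gossip_broadcast_buffer.push_back(encoded);
    }

    /// When the socket has room and no non-gossip messages are pending,
    /// gossip moves onto the real socket queue. Post-fix, *this* is the
    /// point where the responsiveness counter is incremented.
    fn maybe_push_gossip(&mut self) {
        if let Some(msg) = self.gossip_broadcast_buffer.pop_front() {
            self.msgs_sent_since_pong += 1;
            self.pending_outbound_buffer.push_back(msg);
        }
    }
}

fn main() {
    let mut peer = Peer {
        pending_outbound_buffer: VecDeque::new(),
        gossip_broadcast_buffer: VecDeque::new(),
        msgs_sent_since_pong: 0,
    };
    peer.queue_gossip_broadcast(vec![0u8; 4]);
    peer.queue_gossip_broadcast(vec![1u8; 4]);
    // Staged gossip does not count toward msgs_sent_since_pong...
    assert_eq!(peer.msgs_sent_since_pong, 0);
    peer.maybe_push_gossip();
    // ...it counts only once it actually reaches the socket queue.
    assert_eq!(peer.msgs_sent_since_pong, 1);
}
```

The effect of the change is that a peer that is slow to drain its socket simply has gossip dropped by the memory limit, rather than accumulating `msgs_sent_since_pong` increments that could trigger a disconnect.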

@ldk-reviews-bot

ldk-reviews-bot commented Sep 19, 2025

👋 I see @tankyleo was un-assigned.
If you'd like another reviewer assignment, please click here.

@codecov

codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.60%. Comparing base (50391d3) to head (686a586).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/peer_handler.rs 42.85% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4093      +/-   ##
==========================================
- Coverage   88.61%   88.60%   -0.02%     
==========================================
  Files         176      176              
  Lines      132099   132120      +21     
==========================================
+ Hits       117060   117065       +5     
- Misses      12365    12386      +21     
+ Partials     2674     2669       -5     
Flag Coverage Δ
fuzzing 21.92% <0.00%> (-0.01%) ⬇️
tests 88.44% <42.85%> (-0.02%) ⬇️


@tnull tnull self-requested a review September 19, 2025 06:56
Contributor

@tnull tnull left a comment


Makes sense, simple enough, landing.

Do you happen to recall why we originally incremented it when adding to the `gossip_broadcast_buffer`?

@tnull tnull merged commit 0cb7f54 into lightningdevkit:main Sep 19, 2025
24 of 25 checks passed
@TheBlueMatt
Collaborator Author

> Do you happen to recall why we originally incremented it when adding to the `gossip_broadcast_buffer`?

I do not. I assume it was just convenient: the message is now in a "message queue", so it looks kinda like we should increment the "messages queued" counter.

@tankyleo tankyleo removed their request for review October 10, 2025 18:21