Skip to content

channeldb: gc fwd packgages which have zero ADDs#10041

Merged
guggero merged 1 commit intolightningnetwork:masterfrom
ziggie1984:fix-fwd-pkg-garbage-collection
Jul 8, 2025
Merged

channeldb: gc fwd packgages which have zero ADDs#10041
guggero merged 1 commit intolightningnetwork:masterfrom
ziggie1984:fix-fwd-pkg-garbage-collection

Conversation

@ziggie1984
Copy link
Collaborator

@ziggie1984 ziggie1984 commented Jul 5, 2025

Introduced during development of 19.2 in #9911

Now that we do not process Fwds with zero ADDs we need to make sure that we also garbage collect them down the road.

This PR reverts changes made in PR #9911

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @ziggie1984, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the channeldb's handling of forwarding packages, specifically targeting those that no longer contain any active forwarding instructions. The core change ensures that these 'zero-ADD' packages are correctly transitioned to a processed or completed state, which is vital for accurate database state management and enabling their eventual garbage collection.

Highlights

  • Forwarding Package State Management: I've introduced new logic within channeldb/forwarding_package.go to correctly handle forwarding packages that contain zero 'ADDs' (forwarding instructions). This ensures that such packages are properly recognized as FwdStateProcessed or FwdStateCompleted.
  • Improved Garbage Collection: This change facilitates the proper garbage collection of forwarding packages that have no active 'ADDs'. Previously, the fwdFilter was not set for these packages, which could prevent their proper cleanup from the database.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an issue where forwarding packages with zero ADDs were not being garbage collected correctly. The change in loadFwdPkg correctly handles this case by advancing the package state.

Comment on lines +698 to +689
if fwdPkg.SettleFailFilter.IsFull() {
fwdPkg.State = FwdStateCompleted
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for setting FwdStateCompleted is also present on lines 727-729. When len(adds) == 0, fwdPkg.AckFilter.IsFull() is true, making the conditions equivalent. This duplication could lead to maintenance issues if the completion logic changes in the future.

Consider refactoring to unify these checks by setting fwdPkg.State = FwdStateProcessed and then letting the execution fall through to the common completion check at the end of the function, removing the early return.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there is a slight difference, we need to handle the case differently when we have no adds, because otherwise we always hit the case fwdFilterBytes == nil, and never garbage collect fwd packages which have no adds in the first place, can you make a proposal how you would like to refactor that part

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to put @gemini-code-assist to get a response😂

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've raised a very valid point, @ziggie1984. My apologies for the initial suggestion being too broad without fully accounting for the nuances of the fwdFilterBytes == nil condition.

You are absolutely correct that the if len(adds) == 0 block with its early return is necessary. The interpretation of fwdFilterBytes == nil is indeed different depending on whether the package originally contained adds or not:

  • If len(adds) == 0, then fwdFilterBytes will always be nil because no forward filter is ever written to disk for such packages. In this case, the package should be marked as FwdStateProcessed and potentially FwdStateCompleted if its settle/fail filter is full.
  • If len(adds) > 0, then fwdFilterBytes == nil signifies that the package has adds but its forward filter has not yet been persisted, meaning it's still in FwdStateLockedIn.

Trying to unify the logic by removing the early return would force the len(adds) == 0 case into the fwdFilterBytes == nil branch meant for FwdStateLockedIn packages with adds, which would be incorrect and prevent proper garbage collection for packages with no adds.

Furthermore, the conditions for reaching FwdStateCompleted are also distinct:

  • For len(adds) == 0 packages, completion depends solely on fwdPkg.SettleFailFilter.IsFull().
  • For len(adds) > 0 packages, completion requires both fwdPkg.FwdFilter.IsFull() && fwdPkg.SettleFailFilter.IsFull().

Given these distinct logical paths and conditions, the current structure, while appearing to have some duplication, correctly models the different states and transitions for forwarding packages. Therefore, the current implementation is appropriate, and no refactoring is needed in this specific area to unify the completion logic or the FwdStateProcessed assignment.

@ziggie1984 ziggie1984 self-assigned this Jul 5, 2025
@ziggie1984 ziggie1984 added this to the v0.19.2 milestone Jul 5, 2025
@ziggie1984 ziggie1984 force-pushed the fix-fwd-pkg-garbage-collection branch from acfa8ae to afa189c Compare July 5, 2025 15:09
@ziggie1984
Copy link
Collaborator Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug where forwarding packages with no ADD HTLCs would not be garbage collected, leading to database bloat. The change correctly identifies these packages during loading and sets their state to processed, allowing them to be completed and garbage collected. A regression test to cover the fixed bug scenario should be added.

@ziggie1984 ziggie1984 force-pushed the fix-fwd-pkg-garbage-collection branch from afa189c to c4cf020 Compare July 5, 2025 15:29
@ziggie1984
Copy link
Collaborator Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses the garbage collection of forwarding packages with zero ADDs by updating the state transition logic in loadFwdPkg. The new logic is covered by the new test case TestPackagerOnlySettleFailsNoAdds. I've left a few suggestions in the new test to improve consistency with the existing test suite's error handling patterns.

@ziggie1984 ziggie1984 force-pushed the fix-fwd-pkg-garbage-collection branch from c4cf020 to 263b710 Compare July 5, 2025 15:34
Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow think we may have found a hidden bug here. Before #9911, for any FwdPkg that has zero ADDs but some Settle/Fails, since we always have this order,

  1. processRemoteAdds first, even the Adds is zero, we would call SetFwdFilter to move its state to FwdStateProcessed.
  2. processRemoteSettleFails, always called after processRemoteAdds.

Then we would update AckFilter via AckAddHtlcs and update SettleFailFilter via AckSettleFails, which all happen inside AppendRemoteCommitChain. Once it's done the pkg is considered as completed.

After #9911, this is no longer the case as we would miss marking the settle/fail pkg as processed as we were marking it at the wrong place all the time. There's also the test that specifically checks this case - inside TestPackagerOnlySettleFails, we would do a full state transition, with the expectation that SetFwdFilter is called.

There are multiple ways to fix it, I think the best way is to stick to its intension as suggested by the existing test, that we add SetFwdFilter to processRemoteSettleFails similar to how it's done in processRemoteAdds. The current PR however, breaks the existing state transition flow and will mistakenly mark unprocessed pkgs as processed.

Also curious how did you find this out? Were you monitoring the db or sth?

Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops pressed the wrong button above.

@ziggie1984
Copy link
Collaborator Author

There are multiple ways to fix it, I think the best way is to stick to its intension as suggested by the existing test, that we add SetFwdFilter to processRemoteSettleFails similar to how it's done in processRemoteAdds. The current PR however, breaks the existing state transition flow and will mistakenly mark unprocessed pkgs as processed.

Thank you for your review. Yeah I saw fwd-packages building up on links that made me curious.

Definitely agree we can fix it in several ways, we however need to keep in mind that if we launch RC1 with this behaviour we need to fix potential fwd pkgs which are already lingering in our db due to this bug.

Moreover we also need to think of the case where we have 0 ADDs and 0 settle/fails when receiving a Revocation for example FeeUpdate msg which leads to a RevocAck as well.

So you are proposing basically to also set the fwd_filter entry even if we have 0 ADDs ? And also set the fwd_filter entry for Settle/Fails in case we have no ADDs ?

@ziggie1984
Copy link
Collaborator Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The code changes introduce the ability to garbage collect forwarding packages which have zero ADDs. The changes look good and the tests are a great addition.

@ziggie1984 ziggie1984 requested a review from yyforyongyu July 7, 2025 09:43
@yyforyongyu
Copy link
Member

Moreover we also need to think of the case where we have 0 ADDs and 0 settle/fails when receiving a Revocation for example FeeUpdate msg which leads to a RevocAck as well.

Can you show me the code where it happens? why would we need update_fee to be forwarded?

So you are proposing basically to also set the fwd_filter entry even if we have 0 ADDs ? And also set the fwd_filter entry for Settle/Fails in case we have no ADDs ?

I'm proposing to revive to the accidentally correct behavior before #9911 happened. And as mentioned above, based on the workflow of fwd logs, we are missing marking logs as processed in processRemoteSettleFails.

we however need to keep in mind that if we launch RC1 with this behaviour we need to fix potential fwd pkgs which are already lingering in our db due to this bug.

Based on my understanding of the lifecycle of a fwd pkg, as long as we have marked logs as processed in processRemoteSettleFails we will be fine as a restart will get them to be cleaned next time.

Comment on lines +698 to +689
if fwdPkg.SettleFailFilter.IsFull() {
fwdPkg.State = FwdStateCompleted
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to put @gemini-code-assist to get a response😂

lnwire.ExperimentalEndorsementType,
)

switch fwdPkg.State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you aware there's PR #10018?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like 10018 could remove the fwdFilter replace it with a reforward boolean, because we do not really need it as a filter.

@ziggie1984
Copy link
Collaborator Author

ziggie1984 commented Jul 7, 2025

Can you show me the code where it happens? why would we need update_fee to be forwarded?

So it does not only relate to the Revocation as a responds of the UpdateFee msg we send to our peer, imagine we send an HTLC to our PEER, the first revocation we receive from our peer has 0 ADDs and 0 Settles, we basically always record a fwdPkg when receiving a revocation which probably needs some improvment.

tested it also locally to verify my assumption just added a log line before processing the RemoteAdds and got this when sending a payment (regtest):

2025-07-07 15:08:08.761 [INF] HSWC: ChannelLink(a45da3fb420fe817db6523f62402ad0fd0930c173e8be30d8d1fe65fde661bae:0): Received RevokeAndAck from remote party, fwdPkg: *channeldb.FwdPkg(src=2428:1:0, height=5, nadds=0, nfailsettles=0), fwd state: 0, number of adds: 0, number of settle fails: 0

and when the peer settles we see something like this:

2025-07-07 15:08:08.883 [INF] HSWC: ChannelLink(a45da3fb420fe817db6523f62402ad0fd0930c173e8be30d8d1fe65fde661bae:0): Received RevokeAndAck from remote party, fwdPkg: *channeldb.FwdPkg(src=2428:1:0, height=6, nadds=0, nfailsettles=1), fwd state: 0, number of adds: 0, number of settle fails: 1

logic introduced by 9911 would not GC both fwd packages, your proposal to mark the fwd filter when we have zero Adds but still Fails would gc the second package. However for the first package we either always mark the fwdFilter when processing a package or we add the exception when loading the fwd package from memory and checking the length of the Adds.

We could also fix it by marking the fwdFilter when the len(Adds) in ProcessingRemoteAdds function is zero, restoring old behaviour. I am ok with both solutions.

@ziggie1984 ziggie1984 force-pushed the fix-fwd-pkg-garbage-collection branch from 2508ee9 to 59c3de3 Compare July 7, 2025 15:49
always process remote ADDs even when they are empty to trigger
the gc process when loading them back into memory.
@ziggie1984 ziggie1984 force-pushed the fix-fwd-pkg-garbage-collection branch from 59c3de3 to 923cafb Compare July 8, 2025 08:26
@ziggie1984
Copy link
Collaborator Author

Since we did not find proper consensus for a new approach, I reverted back the changes of #9911 and added some useful comments.

@ziggie1984 ziggie1984 requested review from guggero and yyforyongyu and removed request for Roasbeef July 8, 2025 08:28
@guggero guggero merged commit df1a15f into lightningnetwork:master Jul 8, 2025
39 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants