
multi: update close logic to handle re-orgs of depth n-1, where n is num confs - add min conf floor#10331

Merged
Roasbeef merged 21 commits intolightningnetwork:masterfrom
Roasbeef:cnct-reorg-v2
Jan 16, 2026

Conversation


@Roasbeef Roasbeef commented Oct 30, 2025

In this PR, we address the oldest open issue in the lnd tracker: #53. Incremental improvements have landed over the years, but this is the most significant one to date.

First, we start to scale confs for closes just like we do for funding confirmation. A small refactor lets us re-use this logic, with additional tests added.

We revamp the chain watcher to implement a state machine that ensures we recognize the most deeply confirmed closing transaction. Once we detect a spend, we'll register for N confirmations for that spend. If we detect another one (we can get another spend if a re-org happens and a new one confirms), then we'll re-register for confirmations for that spend. We also start to read from the NegativeConf channel, which is signaled if the transaction is re-org'd out after confirmation.

It's important to note that the logic around the NegativeConf case is incomplete, as it will only trigger if we wait for 6 confs and then see a 7-block re-org. In that case, we've typically already claimed all of our outputs on chain. To handle this, we'll actually need to re-create the channel arb and the chain watcher, then re-handle any contested outputs we may have had. If the re-org goes all the way back to the funding transaction, then that's extremely dire and needs special handling. This isn't covered in this PR.

Most of the diff is actually just tests: unit tests, property-based tests (these help ensure we can handle spend, conf, re-org, spend, conf with nearly arbitrary params -- e.g.: confirm coop, re-org, confirm force close, etc.), integration tests, etc. The core change is in this commit: b1d055e.

This fixes most of #53. Re the second item: if the funding txn gets re-org'd out, then confirmed again, we'll be able to detect that and close the channel normally. We still need the active logic to detect such a case, though.

@gemini-code-assist

Summary of Changes

Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of channel closure handling within the lnd system, particularly against blockchain reorganizations. It refactors the chainWatcher to employ a state machine for managing channel closure spends, ensuring that transactions are only considered confirmed after a sufficient number of blocks, and gracefully handling reorgs by re-registering for spend notifications. Additionally, it introduces a configurable confirmation depth for testing and centralizes the logic for dynamically scaling confirmation requirements based on channel capacity, improving security for larger channels. A new development RPC is also added to facilitate more deterministic testing of UTXO sweeping.

Highlights

  • Reorg-Resistant Channel Closure: The chainWatcher's closeObserver has been refactored to use a state machine for handling channel closure spends, making the process more robust against blockchain reorganizations by registering for confirmation and negative confirmation (reorg) notifications.
  • Configurable Confirmation Depth: A new ChannelCloseConfs option has been introduced in ChainArbitratorConfig and chainWatcherConfig to allow overriding the number of confirmations required for channel closes, primarily for testing and development purposes.
  • Scaled Confirmation Logic: The logic for determining confirmation requirements for channel closes (CloseConfsForCapacity) and funding transactions (FundingConfsForAmounts) now scales linearly with channel capacity, with a minimum of 3 confirmations for cooperative closes in production builds to enhance reorg protection.
  • New Dev RPC for Sweeper: A TriggerSweeper RPC has been added to the devrpc service, enabling tests to explicitly trigger the UTXO sweeper to broadcast pending sweep transactions, which helps in managing asynchronous confirmation notifications in integration tests.
  • Extensive Reorg Testing: Comprehensive property-based and scenario-based tests have been added for various channel closure types (cooperative, unilateral, breach) under different reorganization conditions, significantly improving test coverage for reorg handling.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a substantial and well-executed pull request that refactors the chain_watcher to be more robust against chain reorganizations by implementing a state machine for handling spend confirmations. The logic for determining confirmation numbers for channel closes and funding has been centralized, and a testing override has been added. The test coverage for the new logic is excellent, with new unit tests, property-based tests, and integration tests. A new devrpc endpoint is also added to aid in testing. The changes are well-structured and commented. I have one suggestion regarding a potential resource leak.


@yyforyongyu yyforyongyu left a comment


Approach ACK⚖️

@Roasbeef Roasbeef added this to the v0.20.1 milestone Nov 19, 2025
@saubyk saubyk requested a review from gijswijs November 25, 2025 17:44
@Roasbeef Roasbeef changed the title Cnct reorg v2 multi: update close logic to handle re-orgs of depth n-1, where n is num confs - add min conf floor Nov 26, 2025
@saubyk saubyk added this to lnd v0.20 Nov 27, 2025
@saubyk saubyk moved this to In review in lnd v0.20 Nov 27, 2025
@Roasbeef Roasbeef force-pushed the cnct-reorg-v2 branch 3 times, most recently from d9d85cf to 3970b87 Compare December 16, 2025 00:52

@yyforyongyu yyforyongyu left a comment


Looking good, just a few questions and nits, but very close!

// confirmation counts.
testCases := []struct {
name string
capacityScale float64
Member


I don't think this capacityScale is used to drive different test cases?

Member Author


Ah yeah you're right. My initial idea was to use that to increase the size of the channel to scale the confs w/ expected confs. I then added withRequiredConfs, but forgot to use that here. Updated the test to properly use that now.

peer/brontide.go Outdated
closeReq.Updates <- &PendingUpdate{
Txid: closingTxid[:],
IsLocalCloseTx: fn.Some(true),
Member


nit: looks like it's unrelated and it's from the RBF coop close PR😂

Member Author


Hmm, yeah I was testing this rebased on top of that branch to exercise the expected behavior. Removing for now, will revisit as there may have been some undocumented assumption here.


@yyforyongyu yyforyongyu left a comment


Looking good, just pending the unit test fix and the linter, otherwise good to go!


@ziggie1984 ziggie1984 left a comment


Important PR, I think this gives us quite some robustness to reorgs, thank you for working on this 🙏, the property tests are really nice!

// before signaling a cooperative close, scaled by channel capacity.
// We enforce a minimum of 3 confirmations to provide better reorg protection,
// even for small channels.
func CloseConfsForCapacity(capacity btcutil.Amount) uint32 { //nolint:revive
Collaborator


not sure what the revive linter does but I think it can be removed, can you check @gemini-code-assist


The //nolint:revive comment is used to suppress warnings from the revive linter. revive is a Go linter that checks for various code style and best practice issues.

In this specific case, the CloseConfsForCapacity function already has a clear and comprehensive doc comment (lines 8-11). Given this, the most common revive warning (which flags exported functions without documentation) is not applicable here.

Therefore, the //nolint:revive comment on line 12 appears to be redundant and can be safely removed. It's possible it was added preemptively or to address a revive rule that is no longer active or relevant to this function.

import "github.com/btcsuite/btcd/btcutil"

// CloseConfsForCapacity returns the number of confirmations to wait
// before signaling a cooperative close, scaled by channel capacity.
Collaborator


I don't think this is only used for coop close: CloseConfsForCapacity is also used in the closeObserver, which tracks force closes as well.
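For readers skimming the thread, here is a minimal sketch of the capacity-scaled confirmation idea being discussed: scale linearly with capacity, then clamp between the 3-conf floor and a cap of 6 (the max mentioned later in this thread). The constants minCloseConfs, maxCloseConfs, and maxChannelSize, as well as the exact linear formula, are assumptions for illustration; lnd's actual CloseConfsForCapacity may round or normalize differently.

```go
package main

import "fmt"

const (
	// minCloseConfs is the floor discussed above: even small channels
	// wait at least 3 confirmations for better reorg protection.
	minCloseConfs = 3

	// maxCloseConfs caps the wait; per the later discussion, 6 is
	// the max with this PR.
	maxCloseConfs = 6

	// maxChannelSize is the capacity (in sats) at which we hit the
	// max confs. Illustrative: mirrors the non-wumbo channel limit.
	maxChannelSize = 16_777_215
)

// closeConfsForCapacity is an illustrative stand-in for lnd's
// CloseConfsForCapacity: scale linearly with capacity, then clamp
// to [minCloseConfs, maxCloseConfs].
func closeConfsForCapacity(capacity int64) uint32 {
	scaled := uint32(int64(maxCloseConfs) * capacity / maxChannelSize)
	if scaled < minCloseConfs {
		return minCloseConfs
	}
	if scaled > maxCloseConfs {
		return maxCloseConfs
	}
	return scaled
}

func main() {
	fmt.Println(closeConfsForCapacity(100_000))    // small channel: floor of 3
	fmt.Println(closeConfsForCapacity(12_000_000)) // mid-size: scaled to 4
	fmt.Println(closeConfsForCapacity(16_777_215)) // wumbo boundary: capped at 6
}
```

The clamp is what makes the floor apply to both coop and force closes regardless of channel size.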

AuxSigner fn.Option[lnwallet.AuxSigner]

// AuxResolver is an optional interface that can be used to modify the
// way contracts are resolved.
Collaborator


This commit msg is a bit misleading: essentially this is only used by tests, but the msg suggests that it's actually used for later chain-handling logic. I think it should clearly state that this change relates only to tests.


// Send initial spend notification.
// This will trigger handleCommitSpend which will detect the coop close
// and call waitForCoopCloseConfirmation (which blocks).
Collaborator


I cannot find this function waitForCoopCloseConfirmation in our codebase, and we do not always trigger a handleCommitSpend; it depends on the close confirmation number. Can you verify this @gemini-code-assist?


You are correct on both counts, @ziggie1984!

  1. The function waitForCoopCloseConfirmation indeed does not exist in the current codebase. The logic for waiting for confirmations is now integrated directly into the closeObserver's state machine within contractcourt/chain_watcher.go. Specifically, after a spend is detected, handleSpendDetection registers for a ConfirmationEvent, and the closeObserver then waits on the Confirmed channel of this event.
  2. Your understanding that handleCommitSpend is not always triggered immediately is also correct. It is now called after the required number of confirmations are met (via the confChan in the closeObserver's select statement), or immediately if numConfs == 1 (handled by handleSpendDispatch). This ensures that the close confirmation number is respected before processing the spend.

It appears the comment on line 490 of contractcourt/chain_watcher_test_harness.go is outdated and refers to a previous implementation detail. I recommend updating it to reflect the current state machine logic.

Suggested change
// and call waitForCoopCloseConfirmation (which blocks).
// This will trigger handleCommitSpend after the required confirmations
// are met (or immediately in single-conf scenarios).

}

// Define reorg depths for each transition.
reorgDepths := []int32{1, 2, 3}
Collaborator


how can we support a reorg of depth 3 if only n-1 is supported and the default confirmation depth is 3?

Collaborator


I wonder if my analysis is right, but it seems to me that we have some test gaps in how the harness is designed:

Current Test Gap

The current tests don't reflect this reality:

  • ❌ They assume tx starts at 0 confirmations
  • ❌ They don't simulate block mining between detection and reorg
  • ❌ They don't test confirmation progress during reorg window

You should add tests that:

  1. ✅ Mine blocks to simulate confirmations accumulating
  2. ✅ Trigger reorgs at realistic depths during confirmation window
  3. ✅ Verify state machine handles partial confirmation progress

@gemini-code-assist can you check whether this is the case


Thanks for your detailed analysis, @ziggie1984! I've reviewed the new test cases and the chainWatcher logic in this pull request, and it appears that the concerns you've raised regarding test gaps have been addressed by the new changes.

Let's break down your points:

  • "❌ They assume tx starts at 0 confirmations"

    • This is not the case in the new tests. For example, in TestChainWatcherCoopCloseReorg (lines 14-29 in contractcourt/chain_watcher_coop_reorg_test.go), the harness.sendSpend(tx) is followed by harness.waitForConfRegistration() and then harness.mineBlocks(2) before harness.triggerReorg(tx, 2). This means the transaction gains confirmations before a reorg is triggered.
    • The runCoopCloseFlow helper function, used by many tests, also ensures that confirmations accumulate before a reorg is simulated.
  • "❌ They don't simulate block mining between detection and reorg"

    • The new tests explicitly simulate block mining. As mentioned above, harness.mineBlocks(N) is called in various tests to advance the block height and simulate confirmations accumulating before a reorg event.
    • The property-based test testReorgProperties in contractcourt/chain_watcher_reorg_test.go specifically generates a random number of blocksToMine (between 1 and requiredConfs-1) before triggering a reorg, directly addressing this point.
  • "❌ They don't test confirmation progress during reorg window"

    • The closeObserver in contractcourt/chain_watcher.go now implements a state machine (spendStateNone, spendStatePending, spendStateConfirmed). When a spend is detected, it transitions to spendStatePending and registers for N confirmations. If a reorg occurs (via negativeConfChan) while in spendStatePending (i.e., before N confirmations are reached), it correctly transitions back to spendStateNone and re-registers for spend notifications.
    • The tests, by mining blocks to achieve partial confirmations and then triggering a reorg, are indeed testing this confirmation progress within the reorg window and the state machine's ability to handle it.

In summary, the new test suite, particularly contractcourt/chain_watcher_coop_reorg_test.go and contractcourt/chain_watcher_reorg_test.go (which uses property-based testing for various close types and reorg scenarios), seems to cover these aspects comprehensively. The chainWatcherTestHarness also provides the necessary primitives (mineBlocks, triggerReorg, waitForConfRegistration, waitForSpendRegistration) to simulate these complex scenarios.
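To make the state machine under discussion easier to follow, here is a toy sketch of the transitions described above. The state names mirror those quoted from contractcourt/chain_watcher.go, but the event names and the pure-function step shape are illustrative assumptions; the real closeObserver is a select loop over notification channels, not a pure function.

```go
package main

import "fmt"

// spendState mirrors the states named in the discussion above.
type spendState int

const (
	spendStateNone spendState = iota
	spendStatePending
	spendStateConfirmed
)

// event models the notifications the closeObserver reacts to
// (illustrative names, not lnd identifiers).
type event int

const (
	evSpendDetected event = iota // funding output spent
	evConfReached                // N confirmations reached
	evNegativeConf               // spend re-org'd out
)

// step sketches the transition logic: a spend moves us to pending and
// registers N confs; a new spend while pending re-registers; a negative
// conf resets us to watching for any spend; reaching N confs dispatches
// the close.
func step(s spendState, e event) spendState {
	switch {
	case s == spendStateNone && e == evSpendDetected:
		return spendStatePending
	case s == spendStatePending && e == evSpendDetected:
		// A different spend after a re-org: re-register confs.
		return spendStatePending
	case s == spendStatePending && e == evNegativeConf:
		// Re-org'd out before N confs: watch for any spend again.
		return spendStateNone
	case s == spendStatePending && e == evConfReached:
		return spendStateConfirmed
	default:
		return s
	}
}

func main() {
	// spend -> re-org'd out -> new spend -> buried: close dispatched.
	s := spendStateNone
	for _, e := range []event{
		evSpendDetected, evNegativeConf, evSpendDetected, evConfReached,
	} {
		s = step(s, e)
	}
	fmt.Println(s == spendStateConfirmed) // true
}
```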

// sweep is registered and when the sweeper processes the next block.
// The sweeper uses immediate=false, so it broadcasts on the next block
// after registration. Mine an empty block to trigger the broadcast.
ht.MineEmptyBlocks(1)
Collaborator


why is this change in this commit? Seems like a flake fix?

Member Author


It might have been left over from my broader changes to update all the itests to handle the new async behavior. Will take a look at it. Since then I modified the approach to add a 1-conf sync cut-out (a synchronous dispatch to keep the blockbeat expectations happy).


// Mine an empty block to trigger the sweep broadcast (same fix as
// above).
ht.MineEmptyBlocks(1)
Collaborator


also here ?

// FAST PATH: Check if we should dispatch immediately
// for single-confirmation scenarios.
if c.handleSpendDispatch(spend, "blockbeat") {
continue
Collaborator


should we clean up stuff here @gemini-code-assist:

   if confNtfn != nil {
          confNtfn.Cancel()
          confNtfn = nil
      }
      pendingSpend = nil
      continue

Collaborator


should we also maybe add something like this:

  if c.handleSpendDispatch(spend, "blockbeat") {
      spendProcessed = true

so we do not try to fetch the spend for every new blockbeat?


Hi @ziggie1984,

Thanks for these excellent questions!

Regarding your first question about cleaning up confNtfn and pendingSpend after the fast path:

Yes, your suggestion to add cleanup here is very insightful and important for correctness. When c.handleSpendDispatch returns true, it signifies that the spend has been fully processed (in the single-confirmation fast path). In this scenario, pendingSpend and confNtfn should indeed be cleared to ensure the closeObserver's state is properly reset and ready for any new spend events. Without this cleanup, pendingSpend could hold a stale value, potentially leading to incorrect duplicate checks or other state inconsistencies if another spend is detected later.

I've included a code suggestion below to incorporate this cleanup.

Regarding your second question about adding a spendProcessed flag to avoid re-fetching the spend:

The handleSpendDetection function already includes logic to ignore duplicate spend detections for the same transaction if pendingSpend is already set. This mechanism helps prevent redundant processing of the same spend. The state variables pendingSpend and confNtfn are designed to track the spend currently awaiting confirmations. If a spend is fully handled (either by the fast path or full confirmation), resetting these variables (as per your first suggestion) effectively prepares the observer for a new, distinct spend, implicitly addressing the concern of re-fetching or re-processing an already handled spend. Therefore, the explicit cleanup of pendingSpend and confNtfn is the more direct and effective way to manage the state in this context.

Let me know if you have any further questions!

Suggested change
continue
if confNtfn != nil {
confNtfn.Cancel()
confNtfn = nil
}
pendingSpend = nil
continue

@ziggie1984
Collaborator

I think we still need to discuss whether we are OK with all the force-close resolutions now being a bit delayed, because we will no longer trigger the on-chain resolution immediately in case the channel has a big close confirmation depth. I think we need to resolve this question before merging this one.


@gijswijs gijswijs left a comment


This PR was actually simpler to grok than I had anticipated. I like the general approach, but I have made some comments throughout.

My main questions are the following:

  1. The confirmation delay before dispatching close events could create a tight window for HTLC resolution in certain scenarios. Can we afford to wait for 6 confirmations in each and every scenario?
  2. Is the juice worth the squeeze to broadcast the HTLC sweeping transactions directly when we see the closing transaction on-chain, but then also handle a reorg that invalidates both the HTLC transactions and the closing transaction?
  3. Are other transactions still vulnerable to reorgs? For example, what happens if an HTLC sweep transaction gets reorged out, but the closing transaction itself remains confirmed?

Also, as is noted in the PR itself, deep (>=N) reorgs are not fully handled. Consider creating a follow-up issue to track further re-org work.

Last but not least, there are some issues with the commit messages:

  1. Typo in commit message 5c114a0: deeploy -> deeply
  2. Typo in commit message 18f5440: dtect -> detect
  3. Omission in commit message 2b51d47: Mention the changes to lnd_funding_test.go

// Calculate total stake: channel amount + push amount. The push amount
// represents value at risk for the receiver.
maxChannelSizeMsat := lnwire.NewMSatFromSatoshis(maxChannelSize)
stake := lnwire.NewMSatFromSatoshis(chanAmt) + pushAmt
Collaborator


Maybe I'm not understanding this correctly, but shouldn't the sign be switched here?
stake := lnwire.NewMSatFromSatoshis(chanAmt) - pushAmt

What's at stake for me (as the funder) is the total capacity of the channel minus the amount pushed, because that amount already belongs to my channel peer (maybe a payment for a service upfront).

The godoc comment also defines the push amount as an additional risk, which confuses me as well. To me it looks like you're double-counting pushAmt here and could possibly exceed the total channel capacity.

Member


I think this is calculating the confs from the fundee's perspective tho. For the funder, we only care about chanAmt, not the pushAmt.

Collaborator


Ok, I think it makes more sense now. We scale by the chanAmt (bigger channels -> more confs) and the pushAmt is an additional factor that increases the number of confs, because that's money at stake for us (the fundee). So a small channel with a large pushAmt would get the same number of confs as a bigger channel with no pushAmt.
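To make the stake discussion concrete, here is a hedged sketch where the fundee's stake is computed as chanAmt + pushAmt, as in the quoted snippet. The function name fundingConfsForAmounts, the satoshi-based units, and all constants are assumptions for illustration; lnd's actual FundingConfsForAmounts works in msat and its constants may differ.

```go
package main

import "fmt"

const (
	maxFundingConfs = 6          // hypothetical cap on funding confs
	maxChannelSize  = 16_777_215 // sats; illustrative normalization point
)

// fundingConfsForAmounts sketches the idea settled on in the thread:
// confirmations scale with the fundee's total value at risk, which is
// the channel amount plus any pushed amount.
func fundingConfsForAmounts(chanAmt, pushAmt int64) uint32 {
	stake := chanAmt + pushAmt
	confs := uint32(int64(maxFundingConfs) * stake / maxChannelSize)
	if confs < 1 {
		confs = 1
	}
	if confs > maxFundingConfs {
		confs = maxFundingConfs
	}
	return confs
}

func main() {
	// A small channel with a large push amount gets the same confs as
	// a bigger channel with no push amount, as noted above: both have
	// a 12M sat stake.
	fmt.Println(fundingConfsForAmounts(2_000_000, 10_000_000))
	fmt.Println(fundingConfsForAmounts(12_000_000, 0))
}
```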


// The function should be deterministic, always returning the same
// output for the same input values.
t.Run("determinism", func(t *testing.T) {
Collaborator


Given the func we are testing here, testing for determinism seems like LLM overkill to me. I mean, it doesn't hurt, but it does feel a bit wasteful is what I'm saying.

Member Author


Happy to remove.

Collaborator


Yeah, remove this test. It's better to be concise.

// testCoopCloseRBFWithReorg tests that when a cooperative close transaction
// is reorganized out during confirmation waiting, the system properly handles
// RBF replacements and re-registration for any spend of the funding output.
func testCoopCloseRBFWithReorg(ht *lntest.HarnessTest) {
Collaborator


Should we also test the following scenario?
The initialCloseTx gets mined first and then re-orged out, and the second time around the firstRbfTx gets mined.

Member Author


I can tack that on as well.

@Roasbeef
Member Author

I think we still need to discuss whether we are OK with all the force-close resolutions now being a bit delayed, because we will no longer trigger the on-chain resolution immediately in case the channel has a big close confirmation depth. I think we need to resolve this question before merging this one.

With this PR, 6 is the max. We can compensate for this by either increasing the CLTV delta we advertise (more time to sweep), or we tweak the IncomingBroadcastDelta params:

// IncomingBroadcastDelta is the delta that we'll use to decide when to
// broadcast our commitment transaction if we have incoming htlcs. This
// value should be set based on our current fee estimation of the
// commitment transaction. We use this to determine when we should
// broadcast instead of just the HTLC timeout, as we want to ensure
// that the commitment transaction is already confirmed, by the time the
// HTLC expires. Otherwise we may end up not settling the htlc on-chain
// because the other party managed to time it out.
IncomingBroadcastDelta uint32
// OutgoingBroadcastDelta is the delta that we'll use to decide when to
// broadcast our commitment transaction if there are active outgoing
// htlcs. This value can be lower than the incoming broadcast delta.
OutgoingBroadcastDelta uint32

@Roasbeef
Member Author

Is the juice worth the squeeze to broadcast the HTLC sweeping transactions directly when we see the closing transaction on-chain, but then also handle a reorg that invalidates both the HTLC transactions and the closing transaction?

By this you mean start to sweep immediately, but then add logic to detect re-orgs and try again? One complication with this route is that it's possible for a different commitment transaction to actually be confirmed post re-org. Imagine that both parties force close at the same time: Alice wins initially, but after the re-org Bob wins. If Alice started to sweep ASAP, then at this point she's written the entirely wrong set of close summaries and resolvers to disk.

IMO it's simpler to just wait for the transaction to be buried sufficiently before trying to sweep.

Are other transactions still vulnerable to reorgs? For example, what happens if an HTLC sweep transaction gets reorged out, but the closing transaction itself remains confirmed?

Yes, all transactions are. This PR operates under a model where we just wait for "long enough" and don't have to worry about a transaction being re-org'd out entirely. If we adopt that model for the sake of this scenario, then we can be sure that the closing transaction that produced the HTLCs being swept stays confirmed in the chain. With that, there are two cases:

  1. Direct HTLC sweeps. Here, if we assume that the sweep gets confirmed again, then we're ok.
    * It may be possible that post restart, the sweeper sweeps the HTLC, but in another batch. The sweeper still stores minimal state, so nothing should get tangled here.
  2. Second level HTLC sweeps. The second level transaction may get re-org'd out after the sweep confirms. In this case, there's a 2-of-2 multi-sig pre-signed transaction, so we know that this second level transaction can't change. However, it might be aggregated in a distinct transaction by the sweeper.

For scenario 2 above, the second level transactions already have a CSV delay, so we actually can't sweep the HTLC outputs themselves until N blocks have passed.

@morehouse
Collaborator

With this PR, 6 is the max. We can compensate for this by either increasing the CLTV delta we advertise (more time to sweep), or we tweak the IncomingBroadcastDelta params:

Yeah, briefly discussed this with @gijswijs today. The default incoming delta is 10. Seems quite risky to waste 60% of that window before attempting to claim HTLCs.

Security risks aside, a deadline that short will likely cause huge overpayment of fees because of the steep fee function.

@Roasbeef
Member Author

The default incoming delta is 10. Seems quite risky to waste 60% of that window before attempting to claim HTLCs.

FWIW this delta is only a factor when we actually know the pre-image. But yeah you're right, without further changes in this PR, the default of 10 blocks (the number of blocks before the incoming expiry at which we'll broadcast our commitment) is really more like 4 blocks for channels past the wumbo range.

So perhaps this is an argument for increasing this value (independent of this change)? Right now we're in fees-dont-matter zone, but we typically prefer to adopt more pessimistic assumptions.

Security risks aside, a deadline that short will likely cause huge overpayment of fees because of the steep fee function.

Once the timelock has expired on an incoming HTLC, what is a reasonable amount to spend on fees to ensure that we don't lose funds due to a one sided redemption?

@ziggie1984
Collaborator

I think we might consider the following approach:

  1. Most of the reorgs currently happening are of depth 1, so let's create a config value for the reorg-depth protection which allows the user to select either a constant value or a scaled approach like the one implemented in this PR. In doing so we would at least give the node runner the possibility to opt out or scale the depth down to 1 (most likely). Having only a reorg depth of 1 is not a huge difference in terms of user experience, but people who want to be more secure can go higher or opt into the scaled-capacity approach.

FWIW this delta is only a factor when we actually know the pre-image. But yeah you're right, without further changes in this PR, the default of 10 blocks (so the number of blocks before the incoming expiry that we'll broadcast our commitment) is really more like 4 blocks for channels past the wumbo range.

Yes, I think we should increase this value, because if we have the preimage we should not play around and should try to immediately resolve this HTLC; if the peer is offline we go on-chain. So maybe something in the 20-block range, but we need to be aware of what the min CLTV delta is so that our channels still enforce a value greater than this IncomingBroadcastDelta value.

  1. Should we then also increase the value when we go on-chain for outgoing HTLCs?

  2. Create a follow-up PR which allows us to use the funds immediately and also sweep them immediately, continuing to watch the chain until the reorg depth is reached and acting accordingly if we see a reorg and another tx confirm. This should land quite fast after this one is merged.

@morehouse
Collaborator

But yeah you're right, without further changes in this PR, the default of 10 blocks (so the number of blocks before the incoming expiry that we'll broadcast our commitment) is really more like 4 blocks for channels past the wumbo range.

It may actually be worse than this -- if the commitment doesn't confirm right away we have fewer than 4 blocks to get the HTLC transaction confirmed.

IIRC the sweeper cuts the deadline in half for the commitment confirmation. If the commitment doesn't actually confirm until that half-deadline, there'd be 5 blocks left to claim the HTLC. By adding a 6 block reorg delay after that, we forfeit the HTLC.

Once the timelock has expired on an incoming HTLC, what is a reasonable amount to spend on fees to ensure that we don't lose funds due to a one sided redemption?

I'd be willing to spend nearly the full value of the HTLC in that case. But if we let the timelock expire, we've lost already, so really we need to prevent that from happening.

The slope of the fee function depends on how many blocks the sweeper has to get the transaction confirmed. If we only give it 4 blocks or less, the slope becomes very steep, and a single fee bump could cause us to unnecessarily burn a significant percentage of the HTLC.

So perhaps this is an argument for increasing this value (independent of this change)?

Maybe? If you do make this change, it definitely needs to be increased though.

@lightninglabs-deploy
Collaborator

@Roasbeef, remember to re-request review from reviewers when ready

@Roasbeef
Member Author

Roasbeef commented Jan 6, 2026

Most of the reorgs currently happening are of depth 1, so let's create a config value for the reorg-depth protection which allows the user to select either a constant value or the scaled approach implemented in this PR

I don't think this is really something we want to expose via routine config.

As discussed, we can modify the channel amount based scaling for confs to something more relevant to the current state of the network and Bitcoin. Writing up a script to perform some analysis before shifting the values.

Create a followup PR which allows us to use the funds immediately and also sweep the funds immediately and continuing watching the chain until reorg depth is reached and act accordingly when we see a reorg and another tx confirm. This should land quite fast after this one is merged.

Agree this makes sense as a follow up, but I wouldn't underestimate the amount of scope creep that'll start to pop up during implementation. I think we'll want to take even more time than we have with this PR with that more ambitious version to make sure the implementation is sound.

@Roasbeef
Member Author

Roasbeef commented Jan 6, 2026

It may actually be worse than this -- if the commitment doesn't confirm right away we have fewer than 4 blocks to get the HTLC transaction confirmed.
If we only give it 4 blocks or less, the slope becomes very steep, and a single fee bump could cause us to unnecessarily burn a significant percentage of the HTLC.

Isn't this already the case? Focusing on the current behavior, rather than the current constant params, we go to chain N (10 right now) blocks before expiry. If the commitment transaction doesn't confirm ASAP, then we consciously make the tradeoff to utilize the entire fee budget to avoid loss of funds.

If you do make this change,

Do you have alternative ideas? Do you favor the sketch above to sweep asap, then handle the cross-system rollback/update if/when a re-org is detected?

@Roasbeef
Member Author

Roasbeef commented Jan 6, 2026

Another thought: we should consider doing away with the static go-to-chain buffer values. Instead, we can make this dynamic based on the current perceived fee pressure and subjective confirmation delays. Right now 10 blocks is likely more than enough, even at 1 sat/vb, but that hasn't historically always been the case, so we shouldn't anchor to the existing heuristics.
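A minimal sketch of the idea above, where the go-to-chain buffer scales with observed fee pressure instead of a static constant. All names, thresholds, and the step function here are hypothetical illustrations, not lnd's actual config or API:

```go
package main

import "fmt"

// dynamicBroadcastDelta sketches scaling the go-to-chain buffer with
// perceived fee pressure. The tiers and multipliers are made up for
// illustration; a real implementation would derive them from fee
// estimates and observed confirmation delays.
func dynamicBroadcastDelta(baseDelta uint32, feeRateSatPerVB float64) uint32 {
	switch {
	case feeRateSatPerVB <= 5:
		// Low fee pressure: the base buffer is plenty.
		return baseDelta
	case feeRateSatPerVB <= 50:
		// Moderate pressure: pad the buffer.
		return baseDelta + 5
	default:
		// Heavy congestion: go to chain much earlier.
		return baseDelta * 2
	}
}

func main() {
	fmt.Println(dynamicBroadcastDelta(10, 1))   // quiet mempool
	fmt.Println(dynamicBroadcastDelta(10, 30))  // moderate pressure
	fmt.Println(dynamicBroadcastDelta(10, 200)) // congested
}
```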

We'll use this for all the upcoming tests.
All the tests need to send a confirmation _after_ the spend is detected
now.
This set of new tests ensures that if we have created N RBF variants of the
coop close transaction, any of them can confirm and be re-org'd, with us
detecting the final spend once it confirms deeply enough.
In this commit, we add a set of generic close re-org tests. The most
important one is the property-based test, which will randomly confirm
transactions, generate a re-org, then assert that we eventually detect
the final version.
This ensures that if, during the RBF process, one transaction confirms, a
re-org occurs, and then another confirms, we'll properly detect this case.
…oses

In this commit, we add a fast-path optimization to the chain watcher's
closeObserver that immediately dispatches close events when only a single
confirmation is required (numConfs == 1). This addresses a timing issue
with integration tests that were designed around the old synchronous
blockbeat behavior, where close events were dispatched immediately upon
spend detection.

The recent async confirmation architecture (introduced in commit f6f716a)
properly handles reorgs by waiting for N confirmations before dispatching
close events. However, this created a race condition in integration tests
that mine blocks synchronously and expect immediate close notifications.
With the build tag setting numConfs to 1 for itests, the async confirmation
notification could arrive after the test already started waiting for the
close event, causing timeouts.

We introduce a new handleSpendDispatch method that checks if numConfs == 1
and, if so, immediately calls handleCommitSpend to dispatch the close event
synchronously, then returns true to skip the async state machine. This
preserves the old behavior for integration tests while maintaining the full
async reorg protection for production (where numConfs >= 3).

The implementation adds the fast-path check in both spend detection paths
(blockbeat and spend notification) to ensure consistent behavior regardless
of which detects the spend first. We also update the affected unit tests to
remove their expectation of confirmation registration, since the fast-path
bypasses that step entirely.

This approach optimizes for the integration test scenario without compromising
production safety, as the fast-path only activates when a single confirmation
is sufficient - a configuration that only exists in the controlled test
environment.
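The fast-path decision described in this commit message can be sketched roughly as follows. The types and function signature here are simplified stand-ins, not the actual `contractcourt` code:

```go
package main

import "fmt"

// spendEvent is a hypothetical stand-in for the chain watcher's spend
// details; the real type lives in lnd's contractcourt package.
type spendEvent struct {
	txid string
}

// handleSpendDispatch sketches the fast path: when only a single
// confirmation is required (numConfs == 1, as in itest builds), dispatch
// the close event immediately and return true so the caller skips the
// async confirmation state machine. Otherwise return false and let the
// caller register for N confirmations.
func handleSpendDispatch(numConfs uint32, spend spendEvent,
	dispatch func(spendEvent)) bool {

	if numConfs != 1 {
		// Multi-confirmation mode: the async state machine handles
		// confirmation registration and re-org detection.
		return false
	}

	// Single-confirmation mode: dispatch the close event synchronously.
	dispatch(spend)
	return true
}

func main() {
	var dispatched []string
	dispatch := func(s spendEvent) {
		dispatched = append(dispatched, s.txid)
	}

	// With numConfs == 1 (itests), the close event fires right away.
	fast := handleSpendDispatch(1, spendEvent{txid: "aaa"}, dispatch)

	// With numConfs >= 3 (production), nothing is dispatched here.
	slow := handleSpendDispatch(3, spendEvent{txid: "bbb"}, dispatch)

	fmt.Println(fast, slow, len(dispatched))
}
```

Calling this check in both spend-detection paths (blockbeat and spend notification) keeps the behavior consistent regardless of which one fires first.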
With this change, we'll go to chain even earlier to ensure that we have
enough time to sweep a potentially contested HTLC, now that we're
waiting longer before sweeps to ensure that the commitment transaction
is sufficiently buried before we sweep.
This increases the minimum CLTV delta allowed for invoice creation to
provide more headroom above DefaultFinalCltvRejectDelta (19 blocks).
The previous value of 18 was below the reject threshold, which could
allow users to create invoices with CLTV deltas that would be rejected
when receiving payments.
Collaborator

@ziggie1984 ziggie1984 left a comment

LGTM, pending CI

Collaborator

@gijswijs gijswijs left a comment

Ok so I think we have found our sensible defaults.

Effective sweep window (worst case, WUMBO channel):

Before: 10 blocks broadcast delta - 0 reorg wait = 10 blocks
After: 16 blocks broadcast delta - 6 reorg wait = 10 blocks

Same effective time to get HTLC sweeps confirmed. And in the case of smaller channels you get 3 blocks extra time.
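The arithmetic above can be written out as a one-liner; the constants mirror the values quoted in this comment, not lnd's actual config names:

```go
package main

import "fmt"

// effectiveSweepWindow computes the worst-case number of blocks available
// to get an HTLC sweep confirmed: the broadcast delta minus the re-org
// confirmation wait.
func effectiveSweepWindow(broadcastDelta, reorgWait uint32) uint32 {
	return broadcastDelta - reorgWait
}

func main() {
	// Before: 10-block broadcast delta, no reorg wait.
	fmt.Println(effectiveSweepWindow(10, 0))

	// After: 16-block broadcast delta, 6-block reorg wait (wumbo case).
	fmt.Println(effectiveSweepWindow(16, 6))
}
```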

I do have some nits and comments. The most important one relates to the error handling of processDetectedSpend

closeInfo := harness.runMultipleReorgFlow(txs, reorgDepths)

// Assert that the final transaction was confirmed.
harness.assertCoopCloseTx(closeInfo, txs[3])
Collaborator

I was thinking that this test doesn't mimic real world behavior. In reality

  • Coop close tx0 confirms, then gets reorged out
  • tx0 returns to the mempool (it's still valid - same inputs)
  • A new tx1 is broadcast (perhaps with different fees)
  • Both tx0 and tx1 are now competing to spend the same funding output
  • Miners pick one non-deterministically (usually higher fee wins, but not guaranteed)

So the test is somewhat idealized. Luckily the ChainWatcher is agnostic to which specific coop close tx confirms.

But in the real world, the "winner" after a reorg is non-deterministic due to mempool racing. 🚗

Member Author

I was thinking that this test doesn't mimic real world behavior. In reality

So you would rather it picked which transaction was mined randomly? I think without loss of generality this test covers that case, as the order doesn't matter in the end. The itest exercises this as well.

Collaborator

Sorry, I should have added that it didn't warrant a change of the code. I was just thinking out loud here.

// For single-confirmation mode (numConfs == 1), it immediately dispatches the
// close event and returns empty result. For multi-confirmation mode, it
// registers for confirmations and returns the new pending state.
func (c *chainWatcher) processDetectedSpend(
Collaborator

Cool, you put this in its own func. Makes sense, less indentation as well. 👍

currentConfNtfn *chainntnfs.ConfirmationEvent) spendProcessResult {

// FAST PATH: Single confirmation mode dispatches immediately.
if c.handleSpendDispatch(spend, source) {
Collaborator

Shouldn't we check for duplicate spend detection here as well?

Member Author

FWIW this fast dispatch case is only ever active in the itests. By default for prod builds, we'll never dispatch after a single confirmation.


// The function should be deterministic, always returning the same
// output for the same input values.
t.Run("determinism", func(t *testing.T) {
Collaborator

Yeah, remove this test. It's better to be concise.

Comment on lines +767 to +774
if err != nil {
log.Errorf("Unable to register confirmations: %v", err)

return spendProcessResult{
pendingSpend: currentPendingSpend,
confNtfn: currentConfNtfn,
}
}
Collaborator

I think this error handling here is problematic.

Case 1, if the first spend is detected but we err out here we return {pendingSpend: nil, confNtfn: nil}, putting us back in state "None". But the spend already happened and is in a block. The spendNtfn channel won't fire again for the same spend, so we're now stuck waiting for a notification that will never come.

In the second case, where a different spend is detected, we cancelled the old confNtfn, then return it here as if it's still valid.

We should probably handle registration failure better.

Member Author

Hmm, we have a few options here:

  • Retry the conf registration in a loop w/ back off.
  • Register for another spend (which should insta fire), which will then create another type of loop here.

Will prototype both to see how they feel.

@Roasbeef
Member Author

Will follow up on @gijswijs comments in a new PR.

@Roasbeef Roasbeef added the backport-v0.20.x-branch This label is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`. label Jan 16, 2026
@Roasbeef Roasbeef merged commit aaaf235 into lightningnetwork:master Jan 16, 2026
38 of 40 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in lnd v0.20 Jan 16, 2026
@github-actions

Created backport PR for v0.20.x-branch:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-10331-to-v0.20.x-branch
git worktree add --checkout .worktree/backport-10331-to-v0.20.x-branch backport-10331-to-v0.20.x-branch
cd .worktree/backport-10331-to-v0.20.x-branch
git reset --hard HEAD^
git cherry-pick -x c0f48d23b494430a1acbf17a9ac6ef0f81e8f273 6553b61aa4a7fa70da0e1b17d93cd660e8ceb87d 12668b8af662bfac2baca3f2398f9271809c626a 0f59756b4061bef93e3d80962fe53693d9f8d465 ce779a9707110da87dc96ad1f934795fa39eaa5c e0f7b0a03350fc0f05992a4ff61fb31320ef3a80 b6b4cefd08fd8677d5699c0a0ca8c6d66d0eba3f 8f5656e6e28e206bc065ea8cdfabfbc9aef55429 bb781bbe84b7d78a07318ea9a1ba53cd85e7c014 ebefe63f8e24d78ba74d479195e1334cd0b85811 ab4f89128f1d38451abab10f7f349a2f504b2db6 fa8ba6f81c630888565bb04d771aea3f915db833 4aef4d00ac21da3b15a5414af8e02d36c0291122 4bce4ebe037dafc13a383ec6c80d7da1af168646 3fb59b1b6bafe8636723e29ab0d62a8a7db38dd4
git push --force-with-lease

Roasbeef added a commit that referenced this pull request Jan 17, 2026
…20.x-branch

[v0.20.x-branch] Backport #10331: multi: update close logic to handle re-orgs of depth n-1, where n is num confs - add min conf floor
Roasbeef added a commit that referenced this pull request Jan 26, 2026
[v0.20.x-branch] Revert backport #10331: re-org aware channel close logic
ziggie1984 added a commit to ziggie1984/lnd that referenced this pull request Mar 13, 2026
In this commit, we move the release note for the improved confirmation
scaling for cooperative closes (PR lightningnetwork#10331) from the 0.20.1 release notes
to 0.21.0, where the change actually landed.
ziggie1984 added a commit to ziggie1984/lnd that referenced this pull request Mar 13, 2026
In this commit, we move the release note for the improved confirmation
scaling for cooperative closes (PR lightningnetwork#10331) from the 0.20.1 release notes
to 0.21.0, where the change actually landed.

Labels

backport-v0.20.x-branch This label is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants