<h1 id="lnd-replacement-stalling-attack">LND: Replacement Stalling Attack</h1>
<p>A vulnerability in LND versions 0.18.5 and below allows attackers to steal node funds.
Users should immediately upgrade to <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> or later to protect their funds.</p>
<h2 id="background">Background</h2>
<p>LND has a <a href="/lightning/lnd-deadline-aware-budget-sweeper/"><em>sweeper</em> subsystem</a> for managing transaction batching and fee bumping.
When a Lightning channel is force closed, the sweeper kicks into action, collecting HTLC inputs into batched claim transactions to save on mining fees.
The sweeper then periodically bumps the fees paid by those transactions until the transactions confirm.</p>
<p>It is critical that the sweeper gets certain HTLC transactions confirmed before the corresponding upstream HTLCs expire, or else the value of those HTLCs can be completely lost.
For this reason, a fairly aggressive default fee-bumping strategy is used, and as upstream HTLC deadlines approach, the sweeper is willing to spend up to half the value of those HTLCs in mining fees.</p>
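<p>As a rough sketch (the names and the linear ramp here are illustrative assumptions, not LND's actual API), deadline-aware fee bumping can be modeled as a feerate that climbs from a normal estimate toward the feerate that spends the full budget, half the HTLC value, as the deadline arrives:</p>

```python
def sweep_feerate(htlc_value_sat: int, tx_size_vb: float,
                  blocks_until_deadline: int, deadline_window: int,
                  start_feerate: float) -> float:
    """Illustrative linear fee function (not LND's implementation):
    ramp from start_feerate to the feerate that spends the full
    budget (half the HTLC value) when the deadline arrives."""
    budget = htlc_value_sat / 2            # max total fees: half the HTLC value
    end_feerate = budget / tx_size_vb      # feerate that spends the whole budget
    progress = 1 - blocks_until_deadline / deadline_window
    return start_feerate + (end_feerate - start_feerate) * max(0.0, progress)
```

<p>For a 1,000,000 sat HTLC swept by a 166.5 vB input, the feerate in this model climbs from the starting estimate to roughly 3,000 sat/vB at the deadline.</p>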
<h2 id="sweeper-weaknesses">Sweeper Weaknesses</h2>
<p>LND’s aggressive fee bumping could be thwarted, however, due to a couple of weaknesses in the sweeper.</p>
<h3 id="fee-resets-on-reaggregation">Fee Resets on Reaggregation</h3>
<p>If an input to a batched transaction was double-spent by someone else, the sweeper would regroup the remaining inputs into a new transaction and <a href="https://github.com/lightningnetwork/lnd/issues/9422"><em>reset the fees paid</em></a> by that transaction to the minimum value of the fee function.
If this happened many times, the sweeper would end up broadcasting transactions with much lower fees than intended, and upstream deadlines could be missed.</p>
<h3 id="broadcast-delays">Broadcast Delays</h3>
<p>Additionally, the regrouping of inputs after a double spend would be delayed until the <a href="https://github.com/lightningnetwork/lnd/pull/8717#issuecomment-2099089198"><em>next</em> block confirmed</a>.
So if lots of double spends happened, the sweeper would miss out on 50% of the available opportunities to get its time-sensitive transactions confirmed.
Once again, this could cause upstream deadlines to be missed and funds to be lost.</p>
<h2 id="a-basic-replacement-stalling-attack">A Basic Replacement Stalling Attack</h2>
<p>An attacker could take advantage of these sweeper weaknesses to steal funds.
The basic idea is to cause the sweeper to batch many HTLC inputs together, then repeatedly double spend those inputs, causing the sweeper to keep regrouping the remaining inputs into new transactions.
Each double spend prevents the sweeper’s transaction from confirming for at least 2 blocks, while also resetting the fees paid by the next sweeper transaction to the minimum, so future double spends remain cheap.
After upstream HTLC timelocks expire, all remaining HTLCs could be stolen.</p>
<p>An attack would look like this:</p>
<ol>
<li>The attacker opens a direct channel to the victim and routes ~40 HTLCs to themselves through the victim, using the minimum CLTV delta the victim allows (80 blocks by default). The attacker intends to steal the last HTLC, so they make that one as large as possible.</li>
<li>The attacker holds the HTLCs until they expire and the victim force closes the channel to reclaim them. At this point, the 80-block countdown to the upstream deadline starts, and the attacker needs to stall the victim for that long to steal funds.</li>
<li>Because all 40 of the attacker’s HTLCs have the same upstream deadline, the victim’s sweeper batches all 40 HTLC-Timeouts into a single transaction and broadcasts it.</li>
<li>The attacker sees the batched transaction in their mempool and immediately replaces the transaction with a preimage spend for one of the 40 HTLCs.</li>
<li>The double-spend confirms, and the victim is able to extract the HTLC preimage and settle the corresponding upstream HTLC, but the remaining 39 HTLC-Timeouts are not reaggregated until <em>another</em> block confirms (see the section “Broadcast Delays” above).</li>
<li>Another block confirms, and the victim broadcasts a new transaction containing the remaining HTLC-Timeouts. The fees for this transaction are reset to the minimum value of the fee function. The attacker repeats the process from Step 4, double-spending a new HTLC each time until the upstream deadline has passed.</li>
<li>The attacker steals the remaining HTLC(s) by claiming the preimage path downstream and the timeout path upstream.</li>
</ol>
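<p>The numbers line up: each double spend stalls the victim for at least 2 blocks (one for the double spend to confirm, one lost to the reaggregation delay), so ~40 HTLCs suffice to cover the default 80-block CLTV delta. A quick sanity check:</p>

```python
BLOCKS_PER_DOUBLE_SPEND = 2   # 1 block to confirm + 1 block of reaggregation delay

def double_spends_needed(cltv_delta: int) -> int:
    """Double spends required to stall past the upstream deadline."""
    return -(-cltv_delta // BLOCKS_PER_DOUBLE_SPEND)   # ceiling division

print(double_spends_needed(80))   # 40
```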
<h3 id="attack-cost">Attack Cost</h3>
<p>In the worst case, the attacker must perform ~40 replacements, each of which must pay more in total fees than the batched transaction it replaces.
We can calculate the fees of each batched HTLC-Timeout transaction as <code class="language-plaintext highlighter-rouge">size * feerate</code>, where <code class="language-plaintext highlighter-rouge">size</code> and <code class="language-plaintext highlighter-rouge">feerate</code> are estimated as follows:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">size</code>: <code class="language-plaintext highlighter-rouge">num_htlcs * 166.5 vB</code></li>
<li><code class="language-plaintext highlighter-rouge">feerate</code>: minimum value of LND’s fee function. By default, this is the value returned by bitcoind’s <code class="language-plaintext highlighter-rouge">estimatesmartfee</code> RPC.</li>
</ul>
<p>Today, <code class="language-plaintext highlighter-rouge">estimatesmartfee</code> returns feerates between 0.7 sat/vB and 2.1 sat/vB depending on the confirmation target.
To simplify calculations, we assume an average feerate of 1.4 sat/vB over the course of the attack.
We also assume on average there are 20 HTLCs present on the batched transaction, since it starts with 40 HTLCs and decreases by 1 every 2 blocks until a single HTLC remains.
With these simplifying assumptions, we get a rough cost as follows:</p>
<ul>
<li>average cost per replacement: <code class="language-plaintext highlighter-rouge">20 HTLCs * 166.5 vB/HTLC * 1.4 sat/vB = 4,662 sat</code></li>
<li>total attack cost: <code class="language-plaintext highlighter-rouge">40 replacements * 4,662 sat/replacement = 186,480 sat</code></li>
</ul>
<p><strong>So for less than 200k sats, the attacker can steal essentially the entire channel capacity.</strong></p>
<h3 id="optimizations">Optimizations</h3>
<p>In practice, the attack costs even less: the attacker’s double spends may take multiple blocks to confirm, and each unconfirmed double spend continues to stall the victim, so fewer than 40 double spends actually need to confirm.
The attacker can also intentionally reduce the probability of confirmation by inflating the size of their double-spend transactions to the maximum possible while still replacing the victim’s transactions.</p>
<p>Additionally, a smart attacker, knowing they need fewer double spends to confirm, can reduce the number of HTLCs they route at the start of the attack.
As a result, the victim’s batched transactions become smaller and the attacker can save on replacement fees.</p>
<p>For example, suppose the attacker can stall for 80 blocks with only 30 double spends.
Then the cost of the attack is reduced by over 40%:</p>
<ul>
<li>average cost per replacement: <code class="language-plaintext highlighter-rouge">15 HTLCs * 166.5 vB/HTLC * 1.4 sat/vB = 3,497 sat</code></li>
<li>total attack cost: <code class="language-plaintext highlighter-rouge">30 replacements * 3,497 sat/replacement = 104,910 sat</code></li>
</ul>
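<p>The cost arithmetic above can be reproduced with a short script (the 166.5 vB per HTLC-Timeout input and the 1.4 sat/vB average feerate are the same assumptions as above):</p>

```python
HTLC_TIMEOUT_VBYTES = 166.5   # estimated vsize contributed per HTLC-Timeout input
AVG_FEERATE = 1.4             # assumed average feerate over the attack, sat/vB

def attack_cost_sat(replacements: int, avg_htlcs: float) -> float:
    """Total fees paid by the attacker: each replacement must pay at
    least the fees of the victim's batched transaction it displaces."""
    per_replacement = avg_htlcs * HTLC_TIMEOUT_VBYTES * AVG_FEERATE
    return replacements * per_replacement

print(round(attack_cost_sat(40, 20)))   # 186480 -- basic attack
print(round(attack_cost_sat(30, 15)))   # 104895 -- optimized attack
```

<p>The optimized figure differs slightly from the 104,910 sat above because the text rounds the per-replacement cost to 3,497 sat before multiplying.</p>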
<h2 id="mitigation">Mitigation</h2>
<p><a href="https://github.com/lightningnetwork/lnd/pull/9447">Changes</a> were made in <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> that eliminated the reaggregation delay and the fee function reset.</p>
<p>These changes, combined with the sweeper’s aggressive default fee function, ensure that any replacement stalling attack costs many times more than the amount that can be stolen.</p>
<h2 id="discovery">Discovery</h2>
<p>This attack vector was discovered during code review of LND’s sweeper rewrite in May 2024.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2024-05-09:</strong> Attack vector reported to the LND security mailing list.</li>
<li><strong>2025-01-16:</strong> No progress on a mitigation. Reported the fee reset weakness <a href="https://github.com/lightningnetwork/lnd/issues/9422">publicly</a> and followed up on the security mailing list.</li>
<li><strong>2025-02-21:</strong> Mitigation <a href="https://github.com/lightningnetwork/lnd/pull/9447">merged</a>.</li>
<li><strong>2025-05-22:</strong> LND 0.19.0 released containing the fix.</li>
<li><strong>2025-10-31:</strong> Agreement to disclose publicly after LND 0.20.0 was released.</li>
<li><strong>2025-12-04:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>This vulnerability was introduced during LND’s sweeper rewrite in May 2024, and I reported it before LND 0.18.0, the first release to contain it, shipped.
In my report, I suggested that the new sweeper be released in 0.18.0 and this vulnerability be fixed in 0.18.1, since a mitigation would require some work and the new sweeper already fixed several <a href="/lightning/lnd-deadline-aware-budget-sweeper/">other vulnerabilities</a>.
Unfortunately that didn’t happen, and this vulnerability went unaddressed until I followed up again in 2025.</p>
<p>In hindsight, I should have done a better job of holding the LND team accountable.
I could have reported the vulnerability publicly, thereby forcing the issue to be addressed before the 0.18.0 release.
The downside is that this would have delayed other important security fixes to the sweeper subsystem.</p>
<p>Alternatively, I could have reported the vulnerability privately (as I did) but given the LND team a deadline (say, 6 months) after which I would disclose the vulnerability publicly regardless of whether they mitigated it.
This may have applied enough pressure to get the issue fixed in 0.18.1 as I originally intended.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>Set disclosure deadlines to improve security outcomes.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-replacement-stalling-attack/">LND: Replacement Stalling Attack</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 04, 2025.</p>
<h1 id="lnd-infinite-inbox-dos">LND: Infinite Inbox DoS</h1>
<p>LND 0.18.5 and below are vulnerable to a denial-of-service (DoS) attack that causes LND to run out of memory (OOM) and crash or hang.
Users should upgrade to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> to protect their nodes.</p>
<h2 id="the-infinite-inbox-vulnerability">The Infinite Inbox Vulnerability</h2>
<p>When LND receives a message from one of its peers, a dedicated dispatcher thread queues the message for processing by the appropriate subsystem.
For two such subsystems (the gossiper and the channel link), up to 1,000 messages could be queued per peer.
Since Lightning protocol messages can be up to 64 KB in size, and since LND allowed as many peers as there were available file descriptors, memory could be exhausted quickly.</p>
<h2 id="the-dos-attack">The DoS Attack</h2>
<p>A simple, free way to exploit the vulnerability was to open multiple connections to the victim and spam <a href="https://github.com/lightning/bolts/blob/master/07-routing-gossip.md#the-query_short_channel_idsreply_short_channel_ids_end-messages"><code class="language-plaintext highlighter-rouge">query_short_channel_ids</code></a> messages of size 64 KB, keeping the connections open until LND ran out of memory.</p>
<p>In my experiments against an LND node with 8 GB of RAM, I was able to cause an OOM in under 5 minutes.</p>
<h2 id="the-mitigation">The Mitigation</h2>
<p>The vulnerability was mitigated by reducing queue sizes and <a href="https://github.com/lightningnetwork/lnd/pull/9458">introducing</a> a new “peer access manager” to limit peer connections.
Starting in <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a>, queue sizes are reduced to 50 messages and no more than 100 connections are allowed from peers without open channels.</p>
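<p>A back-of-the-envelope bound shows the scale of the problem (the two queues per peer come from the two subsystems mentioned above; the 1,000-connection figure is an arbitrary illustration, since the pre-fix peer limit depended on available file descriptors):</p>

```python
MAX_MSG_BYTES = 65_535   # maximum Lightning message size (~64 KB)

def worst_case_queue_bytes(peers: int, queue_len: int,
                           queues_per_peer: int = 2) -> int:
    """Upper bound on memory held in per-peer message queues."""
    return peers * queues_per_peer * queue_len * MAX_MSG_BYTES

# Before the fix: 1,000-message queues, peers limited only by file descriptors.
print(round(worst_case_queue_bytes(1_000, 1_000) / 2**30))   # 122 GiB
# After the fix: 50-message queues, at most 100 connections from unknown peers.
print(round(worst_case_queue_bytes(100, 50) / 2**20))        # 625 MiB
```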
<h2 id="discovery">Discovery</h2>
<p>This vulnerability was discovered while examining how LND handles various peer messages.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2023-09-15:</strong> Vulnerability reported to the LND security mailing list.</li>
<li><strong>2025-03-12:</strong> Mitigation <a href="https://github.com/lightningnetwork/lnd/pull/9458">merged</a>.</li>
<li><strong>2025-05-22:</strong> LND 0.19.0 released containing the fix.</li>
<li><strong>2025-10-31:</strong> Agreement on public disclosure after LND 0.20.0 is released.</li>
<li><strong>2025-12-04:</strong> Public disclosure.</li>
</ul>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>More investment in Lightning security is needed.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-infinite-inbox-dos/">LND: Infinite Inbox DoS</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 04, 2025.</p>
<h1 id="lnd-excessive-failback-exploit-2">LND: Excessive Failback Exploit #2</h1>
<p>A variant of the <a href="/lightning/lnd-excessive-failback-exploit/">excessive failback exploit</a> disclosed earlier this year affects LND versions 0.18.5 and below, allowing attackers to steal node funds.
Users should immediately upgrade to <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> or later to protect their funds.</p>
<h2 id="the-excessive-failback-bug-revisited">The Excessive Failback Bug Revisited</h2>
<p>As described in the previous <a href="/lightning/lnd-excessive-failback-exploit/">disclosure</a>, the original excessive failback bug existed in LND versions 0.17.5 and earlier.
Essentially, when one of LND’s channel peers force closed the channel, LND would mark any HTLCs missing from the confirmed commitment as “failed” in the database, even if the HTLC had actually succeeded with the <em>downstream</em> peer.
If LND then restarted before the corresponding <em>upstream</em> HTLC was resolved, LND would incorrectly fail that HTLC with the upstream peer.
Both the upstream and downstream peers would be able to claim the HTLC, and LND would be left with a loss.</p>
<h2 id="the-variant-bug">The Variant Bug</h2>
<p>While a fix for the original excessive failback bug was included in LND 0.18.0, a minor variant of the bug remained when the channel was force closed using LND’s commitment instead of the attacker’s.
In other words, the exact same attack was still possible if the attacker got the <em>victim</em> to force close the channel themselves.
Unfortunately this is very easy to do; the attacker could simply send the victim an <code class="language-plaintext highlighter-rouge">error</code> message.</p>
<h2 id="the-fix">The Fix</h2>
<p>The excessive failback bug variant was <a href="https://github.com/lightningnetwork/lnd/commit/5a72d5258ff679071c4d7b687194e56ca163e02e">quietly fixed</a> in the same way as the original bug, and the fix was included in the LND 0.19.0 release.</p>
<h2 id="discovery">Discovery</h2>
<p>This variant was discovered shortly after the original disclosure, while I was <a href="https://github.com/lightning/bolts/pull/1233">updating</a> BOLT 5 to prevent future excessive failback vulnerabilities.
I realized there were actually <em>two</em> cases that needed to be updated in BOLT 5, but only one of the cases had been patched in LND.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2025-03-04:</strong> Public disclosure of the <a href="/lightning/lnd-excessive-failback-exploit/">original</a> excessive failback vulnerability.</li>
<li><strong>2025-03-04:</strong> BOLT 5 update <a href="https://github.com/lightning/bolts/pull/1233">drafted</a>; variant discovered.</li>
<li><strong>2025-03-05:</strong> Variant reported to the LND security mailing list.</li>
<li><strong>2025-03-20:</strong> Fix <a href="https://github.com/lightningnetwork/lnd/pull/9602">merged</a>.</li>
<li><strong>2025-05-22:</strong> LND 0.19.0 released containing the fix.</li>
<li><strong>2025-10-31:</strong> Agreement to disclose publicly after LND 0.20.0 was released.</li>
<li><strong>2025-12-04:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>In the previous disclosure post, I suggested that the excessive failback bug could have been prevented if the BOLT 5 specification had been clearer about how to handle HTLCs missing from confirmed commitment transactions.
At the time, some Lightning maintainers were skeptical that a clearer specification would have helped.</p>
<p>But this variant of the bug was only discovered <em>when I actually went and clarified BOLT 5 myself!</em>
I think this is strong evidence that a clearer specification could have prevented both variants of the bug.</p>
<h2 id="a-note-on-collaboration">A Note on Collaboration</h2>
<p>As I noted in the previous excessive failback disclosure, it seems that at some point every Lightning implementation independently discovered and fixed bugs similar to the excessive failback bug in LND.
Yet no one (including LND) thought to update the specification to help others avoid such bugs in the future.</p>
<p>When I finally did update the specification, good things happened.
This variant of the excessive failback bug was discovered and fixed in LND.
But I also noticed that Eclair might have been vulnerable to this variant and reached out to Bastien Teinturier.
While it turned out that Eclair was not vulnerable, the discussion with Bastien led to the accidental discovery of a different <a href="/lightning/eclair-preimage-extraction-exploit/">serious vulnerability</a> in Eclair.</p>
<p>This all happened from just a tiny bit of collaboration: a specification update for the common good and a short conversation with Bastien.
In many ways, it is quite unfortunate that Lightning engineering talent is spread out over so many implementations.
Everyone focuses on their own code first, and collaboration is secondary.
Efforts are duplicated and lessons are learned multiple times.
Imagine what we could accomplish with a little more cooperation.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>Clear specifications benefit all Lightning implementations.</li>
<li>We should do more cross-implementation collaboration.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-excessive-failback-exploit-2/">LND: Excessive Failback Exploit #2</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 04, 2025.</p>
<h1 id="eclair-preimage-extraction-exploit">Eclair: Preimage Extraction Exploit</h1>
<p>A critical vulnerability in Eclair versions 0.11.0 and below allows attackers to steal node funds.
Users should immediately upgrade to <a href="https://github.com/ACINQ/eclair/releases/tag/v0.12.0">Eclair 0.12.0</a> or later to protect their funds.</p>
<h2 id="background">Background</h2>
<p>In the Lightning Network, nodes forward payments using contracts called HTLCs (Hash Time-Locked Contracts).
To settle a payment, the final recipient reveals a secret piece of data called a preimage.
This preimage is passed backward along the payment route, allowing each node to claim their funds from the previous node.</p>
<p>If a channel is forced to close, these settlements can happen on the Bitcoin blockchain.
Nodes must watch the blockchain to spot these preimages so they can claim their own funds.</p>
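<p>The preimage relationship is plain SHA-256, which is what makes on-chain settlement verifiable: any node that sees a revealed preimage can check it against the payment hash in its HTLC:</p>

```python
import hashlib
import os

preimage = os.urandom(32)                          # chosen by the final recipient
payment_hash = hashlib.sha256(preimage).digest()   # locks every HTLC on the route

def preimage_matches(candidate: bytes, payment_hash: bytes) -> bool:
    """True if the revealed candidate unlocks HTLCs locked to payment_hash."""
    return hashlib.sha256(candidate).digest() == payment_hash

assert preimage_matches(preimage, payment_hash)
assert not preimage_matches(os.urandom(32), payment_hash)
```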
<h2 id="the-preimage-extraction-vulnerability">The Preimage Extraction Vulnerability</h2>
<p>The vulnerability in Eclair existed in how it monitored the blockchain for preimages during a force close.
Eclair would only check for HTLCs that existed in its <strong>local commitment transaction</strong> — its own current version of the channel’s state.
The code incorrectly assumed this local state would always contain a complete list of all possible HTLCs.</p>
<p>However, a malicious channel partner could broadcast an older, but still valid, commitment transaction.
This older state could contain an HTLC that the victim’s node had already removed from its own local state.
When the attacker claimed this HTLC on-chain with a preimage, the victim’s Eclair node would ignore it because the HTLC wasn’t in its local records, causing the victim to lose the funds.</p>
<p>The original <a href="https://github.com/ACINQ/eclair/blob/c7a288b91fc19e89683c531cb3e9f61e59deace9/eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala#L1299-L1314">code snippet</a> illustrates the issue:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">extractPreimages</span><span class="o">(</span><span class="n">localCommit</span><span class="k">:</span> <span class="kt">LocalCommit</span><span class="o">,</span> <span class="n">tx</span><span class="k">:</span> <span class="kt">Transaction</span><span class="o">)(</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">LoggingAdapter</span><span class="o">)</span><span class="k">:</span> <span class="kt">Set</span><span class="o">[(</span><span class="kt">UpdateAddHtlc</span>, <span class="kt">ByteVector32</span><span class="o">)]</span> <span class="k">=</span> <span class="o">{</span>
<span class="c1">// ... (code omitted that extracts htlcSuccess and claimHtlcSuccess preimages from tx)</span>
<span class="k">val</span> <span class="nv">paymentPreimages</span> <span class="k">=</span> <span class="o">(</span><span class="n">htlcSuccess</span> <span class="o">++</span> <span class="n">claimHtlcSuccess</span><span class="o">).</span><span class="py">toSet</span>
<span class="nv">paymentPreimages</span><span class="o">.</span><span class="py">flatMap</span> <span class="o">{</span> <span class="n">paymentPreimage</span> <span class="k">=></span>
<span class="c1">// we only consider htlcs in our local commitment, because we only care about outgoing htlcs, which disappear first in the remote commitment</span>
<span class="c1">// if an outgoing htlc is in the remote commitment, then:</span>
<span class="c1">// - either it is in the local commitment (it was never fulfilled)</span>
<span class="c1">// - or we have already received the fulfill and forwarded it upstream</span>
<span class="nv">localCommit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">.</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">OutgoingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="nf">sha256</span><span class="o">(</span><span class="n">paymentPreimage</span><span class="o">)</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The misleading comment in the code suggests this approach is safe, hiding the bug from a casual review.</p>
<h2 id="stealing-htlcs">Stealing HTLCs</h2>
<p>An attacker could exploit this bug to steal funds as follows:</p>
<ol>
<li>The attacker <code class="language-plaintext highlighter-rouge">M</code> opens a channel with the victim <code class="language-plaintext highlighter-rouge">B</code>, creating the following topology: <code class="language-plaintext highlighter-rouge">A -- B -- M</code>.</li>
<li>The attacker routes a payment to themselves along the path <code class="language-plaintext highlighter-rouge">A->B->M</code>.</li>
<li><code class="language-plaintext highlighter-rouge">M</code> fails the payment by sending <code class="language-plaintext highlighter-rouge">update_fail_htlc</code> followed by <code class="language-plaintext highlighter-rouge">commitment_signed</code>. <code class="language-plaintext highlighter-rouge">B</code> updates their local commitment and revokes their previous one by sending <code class="language-plaintext highlighter-rouge">revoke_and_ack</code> followed by <code class="language-plaintext highlighter-rouge">commitment_signed</code>.
<ul>
<li>At this point, <code class="language-plaintext highlighter-rouge">M</code> has two valid commitments: one with the HTLC present and one with it removed.</li>
<li>Also at this point, <code class="language-plaintext highlighter-rouge">B</code> only has one valid commitment with the HTLC already removed.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">M</code> force-closes the channel by broadcasting their <em>older</em> commitment transaction where the HTLC still exists.</li>
<li><code class="language-plaintext highlighter-rouge">M</code> claims the HTLC on the blockchain using the payment preimage.</li>
<li><code class="language-plaintext highlighter-rouge">B</code> sees the on-chain transaction but fails to extract the preimage because the corresponding HTLC is missing from its <em>local</em> commitment.</li>
<li>Because <code class="language-plaintext highlighter-rouge">B</code> never learned the preimage, it cannot claim the payment from <code class="language-plaintext highlighter-rouge">A</code>.</li>
</ol>
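<p>The failure reduces to a set-membership bug. A minimal Python model (hypothetical names, not Eclair's code) of the buggy and fixed lookups:</p>

```python
# Outgoing payment hashes present in each commitment's HTLC list.
local_commit_htlcs = set()                     # B already removed the failed HTLC
remote_commit_htlcs = {"stolen_payment_hash"}  # M's older commitment still has it

def extract_buggy(seen_hash: str) -> set:
    """Buggy: consults only the local commitment, so the preimage is ignored."""
    return {h for h in local_commit_htlcs if h == seen_hash}

def extract_fixed(seen_hash: str) -> set:
    """Fixed: also consults the remote (and next-remote) commitments."""
    return {h for h in local_commit_htlcs | remote_commit_htlcs if h == seen_hash}

assert extract_buggy("stolen_payment_hash") == set()   # preimage lost, funds lost
assert extract_fixed("stolen_payment_hash") == {"stolen_payment_hash"}
```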
<p>When the time limit expires, <code class="language-plaintext highlighter-rouge">A</code> gets a refund, and the victim is left with the loss.
The attacker keeps both the original funds and the payment they claimed on-chain.</p>
<h2 id="the-fix">The Fix</h2>
<p>The solution was to update <code class="language-plaintext highlighter-rouge">extractPreimages</code> to check for HTLCs across <strong>all relevant commitment transactions</strong>, including the remote and next-remote commitments, not just the local one.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">extractPreimages</span><span class="o">(</span><span class="n">commitment</span><span class="k">:</span> <span class="kt">FullCommitment</span><span class="o">,</span> <span class="n">tx</span><span class="k">:</span> <span class="kt">Transaction</span><span class="o">)(</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">LoggingAdapter</span><span class="o">)</span><span class="k">:</span> <span class="kt">Set</span><span class="o">[(</span><span class="kt">UpdateAddHtlc</span>, <span class="kt">ByteVector32</span><span class="o">)]</span> <span class="k">=</span> <span class="o">{</span>
<span class="c1">// ... (code omitted that extracts htlcSuccess and claimHtlcSuccess preimages from tx)</span>
<span class="k">val</span> <span class="nv">paymentPreimages</span> <span class="k">=</span> <span class="o">(</span><span class="n">htlcSuccess</span> <span class="o">++</span> <span class="n">claimHtlcSuccess</span><span class="o">).</span><span class="py">toSet</span>
<span class="nv">paymentPreimages</span><span class="o">.</span><span class="py">flatMap</span> <span class="o">{</span> <span class="n">paymentPreimage</span> <span class="k">=></span>
<span class="k">val</span> <span class="nv">paymentHash</span> <span class="k">=</span> <span class="nf">sha256</span><span class="o">(</span><span class="n">paymentPreimage</span><span class="o">)</span>
<span class="c1">// We only care about outgoing HTLCs when we're trying to learn a preimage to relay upstream.</span>
<span class="c1">// Note that we may have already relayed the fulfill upstream if we already saw the preimage.</span>
<span class="k">val</span> <span class="nv">fromLocal</span> <span class="k">=</span> <span class="nv">commitment</span><span class="o">.</span><span class="py">localCommit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">.</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">OutgoingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="n">paymentHash</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="c1">// From the remote point of view, those are incoming HTLCs.</span>
<span class="k">val</span> <span class="nv">fromRemote</span> <span class="k">=</span> <span class="nv">commitment</span><span class="o">.</span><span class="py">remoteCommit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">.</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">IncomingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="n">paymentHash</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">val</span> <span class="nv">fromNextRemote</span> <span class="k">=</span> <span class="nv">commitment</span><span class="o">.</span><span class="py">nextRemoteCommit_opt</span><span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">commit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">).</span><span class="py">getOrElse</span><span class="o">(</span><span class="nv">Set</span><span class="o">.</span><span class="py">empty</span><span class="o">).</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">IncomingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="n">paymentHash</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="n">fromLocal</span> <span class="o">++</span> <span class="n">fromRemote</span> <span class="o">++</span> <span class="n">fromNextRemote</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This change ensures that Eclair will correctly identify the HTLC and extract the necessary preimage, even if a malicious partner broadcasts an old channel state.
The <a href="https://github.com/ACINQ/eclair/commit/6a8df49a9bf006a0826b828020f551ecb6c7a33e#diff-97779917bce211cd035ebf8f9f265a7ecece4efcd1861c7bab05e0113dd86b06R1306-R1319">fix</a> was discreetly included in a <a href="https://github.com/ACINQ/eclair/pull/2966">larger pull request</a> for splicing and released in <a href="https://github.com/ACINQ/eclair/releases/tag/v0.12.0">Eclair 0.12.0</a>.</p>
<h2 id="discovery">Discovery</h2>
<p>The vulnerability was discovered accidentally during a discussion with Bastien Teinturier, who asked for a second look at the logic in the <code class="language-plaintext highlighter-rouge">extractPreimage</code> function.
Upon review, the attack scenario was identified and reported.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2025-03-05:</strong> Vulnerability reported to Bastien.</li>
<li><strong>2025-03-11:</strong> Fix <a href="https://github.com/ACINQ/eclair/commit/6a8df49a9bf006a0826b828020f551ecb6c7a33e#diff-97779917bce211cd035ebf8f9f265a7ecece4efcd1861c7bab05e0113dd86b06R1306-R1319">merged</a> and Eclair 0.12.0 released.</li>
<li><strong>2025-03-21:</strong> Agreement on public disclosure in six months.</li>
<li><strong>2025-09-23:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>In response to the vulnerability report, Bastien sent the following:</p>
<blockquote>
<p>This code seems to have been there from the very beginning of eclair, and has not been updated or challenged since then.
This is bad, I’m noticing that we lack a lot of unit tests for this kind of scenario, this should have been audited…
I’ll spend time next week to check that we have tests for every known type of malicious force-close…
Thanks for reporting this, it’s high time we audited that.</p>
</blockquote>
<p>As promised, Bastien added a force-close <a href="https://github.com/ACINQ/eclair/pull/3040">test suite</a> a couple weeks later.
Had these tests existed from the start, this vulnerability would have been prevented.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>More robust testing and auditing of Lightning implementations is badly needed.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/eclair-preimage-extraction-exploit/">Eclair: Preimage Extraction Exploit</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on September 23, 2025.</p>
<p>LND 0.18.2 and below are vulnerable to a denial-of-service (DoS) attack involving repeated gossip requests for the full Lightning Network graph.
The attack is trivial to execute and can cause LND to run out of memory (OOM) and crash or hang.
You can protect your node by updating to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.18.3-beta">LND 0.18.3</a> or by setting <code class="language-plaintext highlighter-rouge">ignore-historical-gossip-filters=true</code> in your node configuration.</p>
<h2 id="background">Background</h2>
<p>To send payments successfully across the Lightning Network, a node generally needs to have an accurate view of the Lightning Network graph.
Lightning nodes maintain a local copy of the network graph that they continuously update as they receive channel and node updates from their peers via a <a href="https://en.wikipedia.org/wiki/Gossip_protocol">gossip protocol</a>.</p>
<p>New nodes and nodes that have been offline for a while need a way to bootstrap their local copy of the network graph.
A common way this is done is to send a <a href="https://github.com/lightning/bolts/blob/master/07-routing-gossip.md#the-gossip_timestamp_filter-message"><code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code></a> message to some of the node’s peers, requesting that they share all gossip messages they have that are newer than a certain timestamp.
Nodes that honor the request will load the matching gossip messages from their databases and send them to the requesting peer.</p>
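<p>Per BOLT 7, the message itself is tiny: a 32-byte chain hash followed by two big-endian <code class="language-plaintext highlighter-rouge">uint32</code> fields. Here is a rough Go sketch of the wire layout (the type and encoder are illustrative, not LND’s actual wire code):</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// GossipTimestampFilter mirrors the payload layout defined in BOLT 7:
// a 32-byte chain hash followed by two big-endian uint32 fields.
type GossipTimestampFilter struct {
	ChainHash      [32]byte
	FirstTimestamp uint32 // only gossip newer than this is requested
	TimestampRange uint32 // width of the requested time window
}

// Encode serializes the filter payload (excluding the message type).
func (f *GossipTimestampFilter) Encode() []byte {
	var buf bytes.Buffer
	buf.Write(f.ChainHash[:])
	binary.Write(&buf, binary.BigEndian, f.FirstTimestamp)
	binary.Write(&buf, binary.BigEndian, f.TimestampRange)
	return buf.Bytes()
}

func main() {
	// FirstTimestamp = 0 with a maximal range asks the peer for its
	// entire gossip history.
	f := GossipTimestampFilter{FirstTimestamp: 0, TimestampRange: 0xFFFFFFFF}
	fmt.Println(len(f.Encode())) // 32 + 4 + 4 = 40 bytes
}
```

<p>A 40-byte message is all it takes to request the full graph, which is what makes the attack below so cheap for the sender.</p>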
<h2 id="the-vulnerability">The Vulnerability</h2>
<p>By default, LND cooperates with all <code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code> requests.
Prior to v0.18.3, LND’s <a href="https://github.com/lightningnetwork/lnd/blob/9380292a5a41697640c2284186c82dad6f7b004f/discovery/syncer.go#L1317-L1384">logic</a> to respond to these requests looks like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">RespondGossipFilter</span><span class="p">(</span><span class="n">filter</span> <span class="o">*</span><span class="n">GossipTimestampFilter</span><span class="p">)</span> <span class="p">{</span>
<span class="n">gossipMsgs</span> <span class="o">:=</span> <span class="n">loadGossipFromDatabase</span><span class="p">(</span><span class="n">filter</span><span class="p">)</span>
<span class="k">go</span> <span class="k">func</span><span class="p">()</span> <span class="p">{</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">msg</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">gossipMsgs</span> <span class="p">{</span>
<span class="n">sendToPeerSynchronously</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}()</span>
<span class="p">}</span>
</code></pre></div></div>
<p>LND loads <em>all</em> requested messages into memory at the same time, and then sends them one by one to the peer, pausing after each send until the peer acknowledges receiving the message.
The peer can specify any filter, including one that requests <em>all</em> historical gossip messages to be sent to them, and LND will happily comply with the request.
As a result, <strong>LND can load potentially hundreds of thousands of messages into memory for <em>each</em> request</strong>.
And since LND has no limit on the number of concurrent requests it will handle, memory usage can get out of hand quickly.</p>
<h2 id="the-dos-attack">The DoS Attack</h2>
<p>Exploiting this vulnerability to DoS attack a victim is easy.
An attacker simply needs to:</p>
<ol>
<li>Send lots of <code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code> messages to the victim, setting the timestamp to 0 to request the full graph.</li>
<li>Keep the connection with the victim open by periodically sending pings and slowly ACKing incoming messages.</li>
</ol>
<p>This causes LND’s memory consumption to grow over time, until an OOM occurs.</p>
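<p>A back-of-envelope estimate shows how quickly this escalates. Every number below is an illustrative assumption (graph size, per-message memory, request count), not a measurement:</p>

```go
package main

import "fmt"

// Back-of-envelope estimate of attacker-induced memory growth.
// All numbers are illustrative assumptions, not measured values.
func main() {
	const (
		channels       = 80_000 // assumed public channel count
		msgsPerChannel = 3      // announcement plus two channel_updates
		bytesPerMsg    = 400    // assumed average in-memory message size
		requests       = 50     // concurrent full-graph requests
	)
	perRequest := channels * msgsPerChannel * bytesPerMsg
	fmt.Printf("per request: ~%d MB\n", perRequest/1_000_000)
	fmt.Printf("%d concurrent requests: ~%d MB\n",
		requests, perRequest*requests/1_000_000)
}
```

<p>Under these assumptions, a few dozen concurrent requests are enough to exhaust the RAM of a typical node, matching the experiment described below.</p>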
<h3 id="experiment">Experiment</h3>
<p>I carried out this DoS attack against an LND node with 8 GB of RAM and 2 GB of swap.
After a few minutes, the node exhausted its RAM and started using swap, and LND’s performance slowed to a crawl.
After about 2 hours, LND exhausted the swap as well and the operating system killed the LND process.</p>
<h2 id="the-mitigation">The Mitigation</h2>
<p>LND 0.18.3 added a <a href="https://github.com/lightningnetwork/lnd/commit/013452cff0788289aae3aa296242c698c9beff9d#diff-fd66292d846960a30b8ff5e63dbf15b846fdb6b55afe6dde63c8c2ebca66674dL22-L513">global semaphore</a> to limit the number of concurrent <code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code> requests that LND will cooperate with.
While this doesn’t fix LND’s excessive memory usage per request, it does limit the global impact on memory usage, which is enough to protect against this DoS attack.</p>
<h2 id="discovery">Discovery</h2>
<p>This vulnerability was discovered while looking at how LND handles various peer messages.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2023-07-13:</strong> Vulnerability reported to the LND security mailing list.</li>
<li><strong>2023-12-11:</strong> Failed <a href="https://github.com/lightningnetwork/lnd/pull/8030/commits/a242ad5acb6b46e82ef839be84b0695b2de089a7#diff-5321d5dad7ab003eff5e595aef273619fd9c22586f28f1b141940932b93c6ec7">attempt</a> at a stealth mitigation, which could be bypassed by using multiple node IDs when carrying out the attack.</li>
<li><strong>2023-12-11:</strong> Emailed the security mailing list again, explaining the problem with the attempted mitigation.</li>
<li><strong>2024-08-27:</strong> Proper mitigation <a href="https://github.com/lightningnetwork/lnd/commit/013452cff0788289aae3aa296242c698c9beff9d#diff-fd66292d846960a30b8ff5e63dbf15b846fdb6b55afe6dde63c8c2ebca66674dL22-L513">merged</a>.</li>
<li><strong>2024-09-12:</strong> LND 0.18.3 released containing the fix.</li>
<li><strong>2025-07-22:</strong> <a href="https://github.com/gijswijs">Gijs</a> gives the OK to disclose publicly.</li>
<li><strong>2025-07-22:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>This vulnerability has existed ever since gossip filtering was added to LND in 2018.
The <a href="https://github.com/lightningnetwork/lnd/pull/1106">pull request</a> that added the feature contained over 5k lines of new code and received only minor review feedback.
It seems that no one was thinking adversarially about the new code at that time, and apparently no one has re-evaluated the code since then.</p>
<p>While it’s understandable that developers were more focused on building features and shipping quickly in the early days of the Lightning Network, a shift to more careful development is long overdue.
Engineering with security in mind is slower and more difficult, but in the long run it pays dividends in the form of greater user trust and disasters avoided.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>Update to at least LND 0.18.3 or set <code class="language-plaintext highlighter-rouge">ignore-historical-gossip-filters=true</code> to protect your node.</li>
<li>More investment in Lightning security is needed.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-gossip-timestamp-filter-dos/">LND: gossip_timestamp_filter DoS</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on July 22, 2025.</p>
<p>Starting with <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.18.0-beta">v0.18.0</a>, LND has a completely rewritten <em>sweeper</em> subsystem for managing transaction batching and fee bumping.
The new sweeper uses HTLC deadlines and fee budgets to compute a <em>fee rate curve</em>, dynamically adjusting fees (fee bumping) to prioritize urgent transactions.
This new fee bumping strategy has some nice security benefits and is something other Lightning implementations should consider adopting.</p>
<h1 id="background">Background</h1>
<p>When an unreliable (or malicious) Lightning node goes offline while HTLCs are in flight, the other node in the channel can no longer claim the HTLCs <em>off chain</em> and will eventually have to force close and claim the HTLCs <em>on chain</em>.
When this happens, it is critical that all HTLCs are claimed before certain deadlines:</p>
<ul>
<li>Incoming HTLCs need to be claimed before their timelocks expire; otherwise, the channel counterparty can submit a competing timeout claim.</li>
<li>Outgoing HTLCs need to be claimed before their corresponding upstream HTLCs expire; otherwise, the upstream node can reclaim them on chain.</li>
</ul>
<p>If HTLCs are not claimed before their deadlines, they can be entirely lost (or stolen).</p>
<p>Thus Lightning nodes need to pay enough transaction fees to ensure timely confirmation of their commitment and HTLC transactions.
At the same time, nodes don’t want to <em>overpay</em> the fees, as these fees can become a major cost for node operators.</p>
<p>The solution implemented by all Lightning nodes is to start with a relatively low fee rate for these transactions and then use RBF to increase the fee rate as deadlines get closer.</p>
<h1 id="rbf-strategies">RBF Strategies</h1>
<p>Each node implementation uses a slightly different algorithm for choosing RBF fee rates, but in general there are two main strategies:</p>
<ul>
<li>external fee rate estimators</li>
<li>exponential bumping</li>
</ul>
<h2 id="external-fee-rate-estimators">External Fee Rate Estimators</h2>
<p>This strategy chooses fee rates based on Bitcoin Core’s (or some other) fee rate estimator.
The estimator is queried with the HTLC deadline as the confirmation target, and the returned fee rate is used for commitment and HTLC transactions.
Typically the estimator is requeried every block to update fee rates and RBF any unconfirmed transactions.</p>
<p><a href="https://github.com/ElementsProject/lightning/blob/b5eef8af4db9f2a58f435bb5beb54299b2800e67/lightningd/chaintopology.c#L419-L440">CLN</a> and <a href="https://github.com/lightningnetwork/lnd/blob/f8211a2c3b3d2112159cd119bd7674743336c661/sweep/sweeper.go#L470-L493">LND</a> prior to v0.18.0 use this strategy exclusively.
<a href="https://github.com/ACINQ/eclair/blob/95bbf063c9283b525c2bf9f37184cfe12c860df1/eclair-core/src/main/scala/fr/acinq/eclair/channel/publish/ReplaceableTxPublisher.scala#L221-L248">eclair</a> uses this strategy until deadlines are within 6 blocks, after which it switches to exponential bumping.
<a href="https://github.com/lightningdevkit/rust-lightning/blob/3a5f4282468e6148e592e324c2a72405bdb4b193/lightning/src/chain/package.rs#L1361-L1369">LDK</a> uses a combined strategy, sometimes taking the fee rate from the estimator and other times applying exponential bumping.</p>
<h2 id="exponential-bumping">Exponential Bumping</h2>
<p>In this strategy, the fee rate estimator is used to determine the initial fee rate, after which a fixed multiplier is used to increase fee rates for each RBF transaction.</p>
<p><a href="https://github.com/ACINQ/eclair/blob/95bbf063c9283b525c2bf9f37184cfe12c860df1/eclair-core/src/main/scala/fr/acinq/eclair/channel/publish/ReplaceableTxPublisher.scala#L221-L248">eclair</a> uses this strategy when deadlines are within 6 blocks, increasing fee rates by 20% each block while capping the total fees paid at the value of the HTLC being claimed.
When <a href="https://github.com/lightningdevkit/rust-lightning/blob/3a5f4282468e6148e592e324c2a72405bdb4b193/lightning/src/chain/package.rs#L1361-L1369">LDK</a> uses this strategy, it increases fee rates by 25% on each RBF.</p>
<h2 id="problems">Problems</h2>
<p>While external fee rate estimators can be helpful, they’re not perfect.
And relying on them too much can lead to missed deadlines when unusual things are happening in the mempool or with miners (e.g., increasing mempool congestion, pinning, replacement cycling, miner censorship).
In such situations, higher-than-estimated fee rates may be needed to actually get transactions confirmed.
Exponential bumping strategies help here but can still be ineffective if the original fee rate was too low.</p>
<h1 id="the-deadline-and-budget-aware-rbf-strategy">The Deadline and Budget Aware RBF Strategy</h1>
<p>LND’s new sweeper subsystem, released in v0.18.0, takes a novel approach to RBFing commitment and HTLC transactions.
The system was designed around a key observation: for each HTLC on a commitment transaction, there are specific <em>deadline</em> and <em>budget</em> constraints for claiming that HTLC.
The <strong>deadline</strong> is the block height by which the node needs to confirm the claim transaction for the HTLC.
The <strong>budget</strong> is the maximum absolute fee the node operator is willing to pay to sweep the HTLC by the deadline.
In practice, the budget is likely to be a fixed proportion of the HTLC value (i.e. operators are willing to pay more fees for larger HTLCs), so LND’s budget <a href="https://docs.lightning.engineering/lightning-network-tools/lnd/sweeper">configuration parameters</a> are based on proportions.</p>
<p>The sweeper operates by aggregating HTLC claims with matching deadlines into a single batched transaction.
The budget for the batched transaction is calculated as the sum of the budgets for the individual HTLCs in the transaction.
Based on the transaction budget and deadline, a <strong>fee function</strong> is computed that determines how much of the budget is spent as the deadline approaches.
By default, a linear fee function is used which starts at a low fee (determined by the minimum relay fee rate or an external estimator) and ends with the total budget being allocated to fees when the deadline is one block away.
The initial batched transaction is published and a “fee bumper” is assigned to monitor confirmation status in the background.
For each block the transaction remains unconfirmed, the fee bumper broadcasts a new transaction with a higher fee rate determined by the fee function.</p>
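<p>The default linear curve described above can be sketched as follows. This is a simplified illustration, with hypothetical names and units rather than LND’s actual fee function code:</p>

```go
package main

import "fmt"

// feeAtBlock returns the absolute fee to offer when t blocks have passed
// since sweeping began, for a deadline d blocks out. The curve rises
// linearly from startFee to the full budget one block before the deadline.
func feeAtBlock(startFee, budget, t, d int64) int64 {
	if d <= 1 || t >= d-1 {
		return budget // deadline imminent: offer the entire budget
	}
	return startFee + (budget-startFee)*t/(d-1)
}

func main() {
	const startFee, budget, deadline = 500, 100_000, 80
	for _, t := range []int64{0, 40, 79} {
		fmt.Printf("block %2d: %d sats\n",
			t, feeAtBlock(startFee, budget, t, deadline))
	}
}
```

<p>Each unconfirmed block moves the fee a fixed step closer to the budget, so the sweep is guaranteed to have offered the full budget before the deadline arrives.</p>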
<p>The sweeper architecture looks like this:</p>
<p><img src="/images/lnd_deadline_aware_budget_sweeper_header.png" alt="sweeper architecture diagram" /></p>
<p>For more details about LND’s new sweeper, see the <a href="https://github.com/lightningnetwork/lnd/blob/master/sweep/README.md">technical documentation</a>.
In this blog post, we’ll focus mostly on the sweeper’s deadline and budget aware RBF strategy.</p>
<h2 id="benefits">Benefits</h2>
<p>LND’s new sweeper system provides greater security against replacement cycling, pinning, and other adversarial or unexpected scenarios.
It also fixed some bad bugs and vulnerabilities present with LND’s previous sweeper system.</p>
<h3 id="replacement-cycling-defense">Replacement Cycling Defense</h3>
<p>Transaction rebroadcasting is a simple mitigation against <a href="https://bitcoinops.org/en/topics/replacement-cycling/">replacement cycling attacks</a> that has been adopted by all implementations.
However, rebroadcasting alone does not guarantee that such attacks become uneconomical, especially when HTLC values are much larger than the fees Lightning nodes are willing to pay when claiming them on chain.
By setting fee budgets in proportion to HTLC values, LND’s new sweeper is able to provide much stronger guarantees that any replacement cycling attacks will be uneconomical.</p>
<h4 id="cost-of-replacement-cycling-attacks">Cost of Replacement Cycling Attacks</h4>
<p>With LND’s default parameters, an attacker must generally spend at least 20x the value of the HTLC to successfully carry out a replacement cycling attack.</p>
<p>Default parameters:</p>
<ul>
<li>fee budget: 50% of HTLC value</li>
<li>CLTV delta: 80 blocks</li>
</ul>
<p>Assuming the attacker must do a minimum of one replacement per block:</p>
\[attack\_cost \ge \sum_{t = 0}^{80} fee\_function(t)\]
\[attack\_cost \ge \sum_{t = 0}^{80} 0.5 \cdot htlc\_value \cdot \frac{t}{80}\]
\[attack\_cost \ge 20 \cdot htlc\_value\]
<p>LND also rebroadcasts transactions every minute by default, so in practice the attacker must do ~10 replacements per block, making the cost closer to 200x the HTLC value.</p>
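<p>The bound derived above is easy to check numerically. This sketch reproduces the one-replacement-per-block sum using the default 50% budget and 80-block CLTV delta:</p>

```go
package main

import "fmt"

// attackCostLowerBound sums the fee the attacker must outbid at each block
// of the CLTV window, reproducing the bound derived above (one replacement
// per block, budget of 50% of the HTLC value, spread over cltvDelta blocks).
func attackCostLowerBound(htlcValue float64, cltvDelta int) float64 {
	cost := 0.0
	for t := 0; t <= cltvDelta; t++ {
		cost += 0.5 * htlcValue * float64(t) / float64(cltvDelta)
	}
	return cost
}

func main() {
	v := 1_000_000.0 // a 0.01 BTC HTLC, denominated in sats
	fmt.Printf("%.2fx the HTLC value\n", attackCostLowerBound(v, 80)/v)
}
```

<p>The exact sum comes out to 20.25x the HTLC value, consistent with the 20x lower bound above.</p>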
<h3 id="partial-pinning-defense">Partial Pinning Defense</h3>
<p>Because LND’s new default RBF strategy pays up to 50% of the HTLC value, LND now has a much greater ability to outbid <a href="https://bitcoinops.org/en/topics/transaction-pinning/">pinning attacks</a>, especially for larger HTLCs.
It is unfortunate that significant fees need to be burned in this case, but the end result is still better than losing the full value of the HTLC.</p>
<h3 id="reduced-reliance-on-fee-rate-estimators">Reduced Reliance on Fee Rate Estimators</h3>
<p>As explained earlier, fee rate estimators are not always accurate, especially when mempool conditions are changing rapidly.
In these situations, it can be very beneficial to use a simpler RBF strategy, especially when deadlines are approaching.
LDK and eclair use exponential bumping in these scenarios, which helps in many cases.
But ultimately the fee rate curve for an exponential bumping strategy still depends heavily on the starting fee rate, and if that fee rate is too low then deadlines can be missed.
The exponential bumping strategy also ignores the value of the HTLC being claimed, which means that larger HTLCs get the same fee rates as smaller HTLCs, even when deadlines are getting close.</p>
<p>LND’s budget-based approach takes HTLC values into consideration when establishing the fee rate curve, ensuring that budgets are never exceeded and that HTLCs are never lost before an attempt to spend the full budget has been made.
As such, the budget-based approach provides more consistent results and greater security in unexpected or adversarial situations.</p>
<h3 id="lnd-specific-bug-and-vulnerability-fixes">LND-Specific Bug and Vulnerability Fixes</h3>
<p>LND’s new sweeper fixed some bad bugs and vulnerabilities that existed with the previous sweeper.</p>
<h4 id="fee-bump-failures">Fee Bump Failures</h4>
<p>Previously, LND had an inconsistent approach to broadcasting and fee bumping urgent transactions.
In some places transactions would get broadcast with a specific confirmation target and would never be fee bumped again.
In other places transactions would be RBF’d if the fee rate estimator determined that mempool fee rates had gone up, but the <em>confirmation target</em> given to the estimator would not be adjusted as deadlines approached.</p>
<p>Perhaps the worst of these fee bumping failures was a <a href="https://github.com/lightningnetwork/lnd/issues/8522">bug</a> reported by <a href="https://github.com/C-Otto">Carsten Otto</a>, where LND would fail to use the anchor output to CPFP a commitment transaction if the initial HTLC deadlines were far enough in the future.
While this behavior is desirable to save on fees initially, it becomes a major problem when deadlines get closer and the commitment hasn’t confirmed on its own.
Because LND did not adjust confirmation targets as deadlines approached, the commitment transaction would remain un-CPFP’d and could fail to confirm before HTLCs expired, allowing funds to be lost.
To make matters worse, the bug was trivial for an attacker to exploit.</p>
<p>LND’s sweeper rewrite took the opportunity to correct and unify all the transaction broadcasting and fee bumping logic in one place and fix all of these fee bumping failures at once.</p>
<h4 id="invalid-batching">Invalid Batching</h4>
<p>LND’s previous sweeper also sometimes generated invalid or unsafe transactions when batching inputs together.
This could happen in a couple ways:</p>
<ul>
<li>Inputs that were invalid or had been double-spent could be batched with urgent HTLC claims, making the whole transaction invalid.</li>
<li>Anchor spends could be <a href="https://github.com/lightningnetwork/lnd/issues/8433">batched together</a>, thereby violating the CPFP carve out and enabling channel counterparties to pin commitment transactions.</li>
</ul>
<p>Rather than addressing these issues directly, the previous sweeper would use <em>exponential backoff</em> to regroup inputs after random delays and hope for a valid transaction.
If another invalid transaction occurred, longer delays would be used before the next regrouping.
Eventually, deadlines could be missed and funds lost.</p>
<p>LND’s new sweeper fixed these issues by being more careful about which inputs could be grouped together and by removing double-spent inputs from transactions that failed to broadcast.</p>
<h2 id="risks">Risks</h2>
<p>The security of a Lightning node depends heavily on its ability to resolve HTLCs on chain when necessary.
And unfortunately proper on-chain resolution can be tricky to get right (see <a href="https://morehouse.github.io/lightning/ldk-invalid-claims-liquidity-griefing/">1</a>, <a href="https://morehouse.github.io/lightning/ldk-duplicate-htlc-force-close-griefing/">2</a>, <a href="https://morehouse.github.io/lightning/lnd-excessive-failback-exploit/">3</a>).
Making changes to the existing on-chain logic runs the risk of introducing new bugs and vulnerabilities.</p>
<p>For example, during code reviews of LND’s new sweeper there were many serious bugs discovered and fixed, ranging from catastrophic <a href="https://github.com/lightningnetwork/lnd/issues/8738">fee function failures</a> to new <a href="https://github.com/lightningnetwork/lnd/pull/8514#discussion_r1554270229">fund-stealing exploits</a> and more (<a href="https://github.com/lightningnetwork/lnd/pull/8148#discussion_r1542012530">1</a>, <a href="https://github.com/lightningnetwork/lnd/pull/8424#pullrequestreview-1961358576">2</a>, <a href="https://github.com/lightningnetwork/lnd/pull/8422#discussion_r1528832418">3</a>, <a href="https://github.com/lightningnetwork/lnd/issues/8715">4</a>, <a href="https://github.com/lightningnetwork/lnd/issues/8737">5</a>, <a href="https://github.com/lightningnetwork/lnd/issues/8741">6</a>).
Node implementers should tread carefully when touching these parts of the codebase and remember that simplicity is often the best security.</p>
<h1 id="conclusion">Conclusion</h1>
<p>LND’s new deadline-aware budget sweeper provides more secure fee bumping in adversarial situations and more consistent behavior when mempools are rapidly changing.
Other implementations should consider incorporating budget awareness into their fee bumping strategies to improve defenses against replacement cycling and pinning attacks, and to reduce reliance on external fee estimators.
At the same time, implementers would do well to avoid complete rewrites of the on-chain logic and instead keep the changes small and review them well.</p>
<p><a href="https://morehouse.github.io/lightning/lnd-deadline-aware-budget-sweeper/">LND's Deadline-Aware Budget Sweeper</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on March 11, 2025.</p>
<p>LND 0.17.5 and below contain a bug in the on-chain resolution logic that can be exploited to steal funds.
For the attack to be practical the attacker must be able to force a restart of the victim node, perhaps via an unpatched DoS vector.
Update to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.18.0-beta">LND 0.18.0</a> to protect your node.</p>
<h1 id="background">Background</h1>
<p>Whenever a new payment is routed through a Lightning channel, or whenever an existing payment is settled on the channel, the parties in that channel need to update their commitment transactions to match the new set of active HTLCs.
During the course of these regular commitment updates, there is always a brief moment where one of the parties holds two valid commitment transactions.
Normally that party immediately revokes the older commitment transaction after it receives a signature for the new one, bringing their number of valid commitment transactions back down to one.
But for that brief moment, the other party in the channel must be able to handle the case where <em>either</em> of the valid commitments confirms on chain.</p>
<p>As part of this handling, nodes need to detect when any currently outstanding HTLCs are missing from the confirmed commitment transaction so that those HTLCs can be failed backward on the upstream channel.</p>
<h1 id="the-excessive-failback-bug">The Excessive Failback Bug</h1>
<p>Prior to v0.18.0, LND’s <a href="https://github.com/lightningnetwork/lnd/blob/f4035ade05d0c44b441f2fe26af89584a76a55d6/contractcourt/channel_arbitrator.go#L2079-L2151">logic</a> to detect and fail back missing HTLCs works like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">failBackMissingHtlcs</span><span class="p">(</span><span class="n">confirmedCommit</span> <span class="n">Commitment</span><span class="p">)</span> <span class="p">{</span>
<span class="n">currentCommit</span><span class="p">,</span> <span class="n">pendingCommit</span> <span class="o">:=</span> <span class="n">getValidCounterpartyCommitments</span><span class="p">()</span>
<span class="k">var</span> <span class="n">danglingHtlcs</span> <span class="n">HtlcSet</span>
<span class="k">if</span> <span class="n">confirmedCommit</span> <span class="o">==</span> <span class="n">pendingCommit</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">currentCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">pendingCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">confirmedHtlcs</span> <span class="o">:=</span> <span class="n">confirmedCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="n">missingHtlcs</span> <span class="o">:=</span> <span class="n">danglingHtlcs</span><span class="o">.</span><span class="n">SetDifference</span><span class="p">(</span><span class="n">confirmedHtlcs</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">htlc</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">missingHtlcs</span> <span class="p">{</span>
<span class="n">failBackHtlc</span><span class="p">(</span><span class="n">htlc</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>LND compares the HTLCs present on the confirmed commitment transaction against the HTLCs present on the counterparty’s <em>other</em> valid commitment (if there is one) and fails back any HTLCs that are missing from the confirmed commitment.
This logic is mostly correct, but it does the wrong thing in one particular scenario:</p>
<ol>
<li>LND forwards an HTLC <code class="language-plaintext highlighter-rouge">H</code> to the counterparty, signing commitment <code class="language-plaintext highlighter-rouge">C0</code> with <code class="language-plaintext highlighter-rouge">H</code> added as an output. The previous commitment is revoked.</li>
<li>The counterparty claims <code class="language-plaintext highlighter-rouge">H</code> by revealing the preimage to LND.</li>
<li>LND forwards the preimage upstream to start the process of claiming the incoming HTLC.</li>
<li>LND signs a new counterparty commitment <code class="language-plaintext highlighter-rouge">C1</code> with <code class="language-plaintext highlighter-rouge">H</code> removed and its value added to the counterparty’s balance.</li>
<li>The counterparty refuses to revoke <code class="language-plaintext highlighter-rouge">C0</code>.</li>
<li>The counterparty broadcasts and confirms <code class="language-plaintext highlighter-rouge">C1</code>.</li>
</ol>
<p>In this case, LND compares the confirmed commitment <code class="language-plaintext highlighter-rouge">C1</code> against the other valid commitment <code class="language-plaintext highlighter-rouge">C0</code> and determines that <code class="language-plaintext highlighter-rouge">H</code> is missing from the confirmed commitment.
As a result, LND incorrectly determines that <code class="language-plaintext highlighter-rouge">H</code> needs to be failed back upstream, and executes the following <a href="https://github.com/lightningnetwork/lnd/blob/f4035ade05d0c44b441f2fe26af89584a76a55d6/htlcswitch/switch.go#L1822-L1872">logic</a>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">failBackHtlc</span><span class="p">(</span><span class="n">htlc</span> <span class="n">Htlc</span><span class="p">)</span> <span class="p">{</span>
<span class="n">markFailedInDatabase</span><span class="p">(</span><span class="n">htlc</span><span class="p">)</span>
<span class="n">incomingHtlc</span><span class="p">,</span> <span class="n">ok</span> <span class="o">:=</span> <span class="n">incomingHtlcMap</span><span class="p">[</span><span class="n">htlc</span><span class="p">]</span>
<span class="k">if</span> <span class="o">!</span><span class="n">ok</span> <span class="p">{</span>
<span class="n">log</span><span class="p">(</span><span class="s">"Incoming HTLC has already been resolved"</span><span class="p">)</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="n">failHtlc</span><span class="p">(</span><span class="n">incomingHtlc</span><span class="p">)</span>
<span class="nb">delete</span><span class="p">(</span><span class="n">incomingHtlcMap</span><span class="p">,</span> <span class="n">htlc</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In this case, the preimage for the incoming HTLC was already sent upstream (step 3), so the corresponding entry in <code class="language-plaintext highlighter-rouge">incomingHtlcMap</code> has already been removed.
Thus LND catches the “double resolution” and returns from <code class="language-plaintext highlighter-rouge">failBackHtlc</code> without sending the incorrect failure message upstream.
Unfortunately, LND only catches the double resolution <em>after</em> <code class="language-plaintext highlighter-rouge">H</code> is marked as failed in the database.
As a result, when LND next restarts it will reconstruct its state from the database and determine that <code class="language-plaintext highlighter-rouge">H</code> still needs to be failed back.
If the incoming HTLC hasn’t been fully resolved with the upstream node, the reconstructed <code class="language-plaintext highlighter-rouge">incomingHtlcMap</code> <em>will</em> have an entry for <code class="language-plaintext highlighter-rouge">H</code> this time, and LND will incorrectly send a failure message upstream.</p>
<p>At that point, the downstream node will have claimed <code class="language-plaintext highlighter-rouge">H</code> via preimage while the upstream node will have had the HTLC refunded to them, causing LND to lose the full value of <code class="language-plaintext highlighter-rouge">H</code>.</p>
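<p>The buggy ordering above can be modeled with a minimal Python sketch (all names here are hypothetical; LND's actual implementation is in Go). The key point is that the failure is persisted to the database <em>before</em> the double-resolution check, so a restart replays it:</p>

```python
# Toy model of the buggy ordering (hypothetical names, not LND's actual
# Go code): the HTLC is marked failed on disk *before* the
# double-resolution check, so a restart replays the failure.

class Node:
    def __init__(self):
        self.db_failed = set()        # HTLCs marked failed on disk
        self.incoming_map = {"H": "incoming_H"}
        self.sent_upstream = []       # messages sent to the upstream peer

    def fail_back_htlc(self, htlc):
        self.db_failed.add(htlc)      # BUG: persisted before the check below
        incoming = self.incoming_map.pop(htlc, None)
        if incoming is None:
            return                    # double resolution caught... this time
        self.sent_upstream.append(("fail", incoming))

    def restart(self, upstream_unresolved):
        # State is rebuilt from disk; the incoming HTLC is still
        # unresolved upstream, so it reappears in the map.
        self.incoming_map = {h: f"incoming_{h}" for h in upstream_unresolved}
        for htlc in sorted(self.db_failed):
            self.fail_back_htlc(htlc)

node = Node()
node.incoming_map.pop("H")        # step 3: preimage already sent upstream
node.fail_back_htlc("H")          # confirmed commitment is missing H
assert node.sent_upstream == []   # failure correctly suppressed...

node.restart(upstream_unresolved={"H"})
assert ("fail", "incoming_H") in node.sent_upstream  # ...but replayed on restart
```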
<h1 id="stealing-htlcs">Stealing HTLCs</h1>
<p>Consider the following topology, where <code class="language-plaintext highlighter-rouge">B</code> is the victim and <code class="language-plaintext highlighter-rouge">M0</code> and <code class="language-plaintext highlighter-rouge">M1</code> are controlled by the attacker.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>M0 -- B -- M1
</code></pre></div></div>
<p>The attacker can steal funds as follows:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">M0</code> routes a large HTLC along the path <code class="language-plaintext highlighter-rouge">M0 -> B -> M1</code>.</li>
<li><code class="language-plaintext highlighter-rouge">M0</code> goes offline.</li>
<li><code class="language-plaintext highlighter-rouge">M1</code> claims the HTLC from <code class="language-plaintext highlighter-rouge">B</code> by revealing the preimage, receives a new commitment signature from <code class="language-plaintext highlighter-rouge">B</code>, and then refuses to revoke the previous commitment.</li>
<li><code class="language-plaintext highlighter-rouge">B</code> attempts to claim the upstream HTLC from <code class="language-plaintext highlighter-rouge">M0</code> but can’t because <code class="language-plaintext highlighter-rouge">M0</code> is offline.</li>
<li><code class="language-plaintext highlighter-rouge">M1</code> force closes the <code class="language-plaintext highlighter-rouge">B-M1</code> channel using their new commitment, thus triggering the excessive failback bug.</li>
<li>The attacker crashes <code class="language-plaintext highlighter-rouge">B</code> using an unpatched DoS vector.</li>
<li><code class="language-plaintext highlighter-rouge">M0</code> comes back online.</li>
<li><code class="language-plaintext highlighter-rouge">B</code> restarts, loads HTLC resolution data from the database, and incorrectly fails the HTLC with <code class="language-plaintext highlighter-rouge">M0</code>.</li>
</ol>
<p>At this point, the attacker has succeeded in stealing the HTLC from <code class="language-plaintext highlighter-rouge">B</code>.
<code class="language-plaintext highlighter-rouge">M0</code> got the HTLC refunded, while <code class="language-plaintext highlighter-rouge">M1</code> got the value of the HTLC added to their balance on the confirmed commitment.</p>
<h1 id="the-fix">The Fix</h1>
<p>The excessive failback bug was fixed by a <a href="https://github.com/lightningnetwork/lnd/commit/6f0c2b5bab68c156262c1e8e2286f9a6b36bbbd7#diff-a0b8064876b1b1d6085fa7ffdbfd38c81cb06c1ca3f34a08dbaacba203cda3ebR2142-R2155">small change</a> to prevent failback of HTLCs for which the preimage is already known.
The updated logic now explicitly checks for preimage availability before failing back each HTLC:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">failBackMissingHtlcs</span><span class="p">(</span><span class="n">confirmedCommit</span> <span class="n">Commitment</span><span class="p">)</span> <span class="p">{</span>
<span class="n">currentCommit</span><span class="p">,</span> <span class="n">pendingCommit</span> <span class="o">:=</span> <span class="n">getValidCounterpartyCommitments</span><span class="p">()</span>
<span class="k">var</span> <span class="n">danglingHtlcs</span> <span class="n">HtlcSet</span>
<span class="k">if</span> <span class="n">confirmedCommit</span> <span class="o">==</span> <span class="n">pendingCommit</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">currentCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">pendingCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">confirmedHtlcs</span> <span class="o">:=</span> <span class="n">confirmedCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="n">missingHtlcs</span> <span class="o">:=</span> <span class="n">danglingHtlcs</span><span class="o">.</span><span class="n">SetDifference</span><span class="p">(</span><span class="n">confirmedHtlcs</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">htlc</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">missingHtlcs</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">preimageIsKnown</span><span class="p">(</span><span class="n">htlc</span><span class="o">.</span><span class="n">PaymentHash</span><span class="p">())</span> <span class="p">{</span>
<span class="k">continue</span> <span class="c">// Don't fail back HTLCs we can claim.</span>
<span class="p">}</span>
<span class="n">failBackHtlc</span><span class="p">(</span><span class="n">htlc</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">preimageIsKnown</code> check prevents <code class="language-plaintext highlighter-rouge">failBackHtlc</code> from being called when the preimage is known, so such HTLCs are never failed backward or marked as failed in the database.
On restart, the incorrect failback behavior no longer occurs.</p>
<p>The patch was hidden in a <a href="https://github.com/lightningnetwork/lnd/pull/8667">massive rewrite</a> of LND’s sweeper system and was released in LND 0.18.0.</p>
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during an audit of LND’s <code class="language-plaintext highlighter-rouge">contractcourt</code> package, which handles on-chain resolution of force closures.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2024-03-20:</strong> Vulnerability reported to the LND security mailing list.</li>
<li><strong>2024-04-19:</strong> Fix <a href="https://github.com/lightningnetwork/lnd/commit/6f0c2b5bab68c156262c1e8e2286f9a6b36bbbd7#diff-a0b8064876b1b1d6085fa7ffdbfd38c81cb06c1ca3f34a08dbaacba203cda3ebR2142-R2155">merged</a>.</li>
<li><strong>2024-05-30:</strong> LND 0.18.0 released containing the fix.</li>
<li><strong>2025-02-17:</strong> <a href="https://github.com/gijswijs">Gijs</a> gave the OK to disclose publicly in March.</li>
<li><strong>2025-03-04:</strong> Public disclosure.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>It appears all other lightning implementations have independently discovered and handled the corner case that LND mishandled:</p>
<ul>
<li>CLN <a href="https://github.com/ElementsProject/lightning/commit/6c96bcacd763cf5cd81226e3b161be161c3818ed#diff-d161f42609a169a38f366a0628bceefa6bed62eb9af20082c5ad08add899a2fbR863-R864">added</a> a preimage check to the failback logic in 2018.</li>
<li>eclair <a href="https://github.com/ACINQ/eclair/commit/c7e47ba751dc1ed4a96bcb4b7e5fcd49d78cfb78#diff-97779917bce211cd035ebf8f9f265a7ecece4efcd1861c7bab05e0113dd86b06R1310-R1318">introduced</a> failback logic in 2023 that filtered upstream HTLCs by preimage availability.</li>
<li>LDK <a href="https://github.com/lightningdevkit/rust-lightning/commit/0ad1f4c943bdc9037d0c43d1b74c745befa065f0#diff-fec072136ddc5ad6b84dd8e4d2368e9e793f994c8bcccf011508038a81eb408aR1988-R1990">added</a> a preimage check to the failback logic in 2023.</li>
</ul>
<p>Yet the BOLT specification has not been updated to describe this corner case.
In fact, by a strict interpretation the <a href="https://github.com/lightning/bolts/blob/ccfa38ed4f592c3711156bb4ded77f44ec01101d/05-onchain.md?plain=1#L407-L410">specification</a> actually requires the <em>incorrect</em> behavior that LND implemented:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## HTLC Output Handling: Remote Commitment, Local Offers
### Requirements
A local node:
- for any committed HTLC that does NOT have an output in this commitment transaction:
- once the commitment transaction has reached reasonable depth:
- MUST fail the corresponding incoming HTLC (if any).
</code></pre></div></div>
<p>It is quite unfortunate that all implementations had to independently discover and correct this bug.
If any single implementation had contributed a small patch to the specification after discovering the issue, it would have at least sparked some discussion about whether the other implementations had considered this corner case.
And if CLN had recognized that the specification needed updating back in 2018, there’s a good chance all other implementations would have handled this case correctly from the start.</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Keeping specifications up-to-date can improve security for all implementations.</li>
<li>Update to at least LND 0.18.0 to protect your funds.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-excessive-failback-exploit/">LND: Excessive Failback Exploit</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on March 04, 2025.</p>
https://morehouse.github.io/lightning/ldk-duplicate-htlc-force-close-griefing2025-01-29T00:00:00-00:002025-01-29T00:00:00-06:00Matt Morehousehttps://morehouse.github.io[email protected]
<p>LDK 0.1 and below are vulnerable to a griefing attack that causes all of the victim’s channels to be force closed.
Update to <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1.1">LDK 0.1.1</a> to protect your channels.</p>
<h1 id="background">Background</h1>
<p>Whenever a new payment is routed through a lightning channel, or whenever an existing payment is settled on the channel, the parties in that channel need to update their commitment transactions to match the new set of active HTLCs.
During the course of these regular commitment updates, there is always a brief moment where one of the parties holds two valid commitment transactions.
Normally that party immediately revokes the older commitment transaction after it receives a signature for the new one, bringing their number of valid commitment transactions back down to one.
But for that brief moment, the other party in the channel must be able to handle the case where <em>either</em> of the valid commitments confirms on chain.</p>
<p>For this reason, LDK contains logic to detect when there’s a difference between the counterparty’s confirmed commitment transaction and the set of currently outstanding HTLCs.
Any HTLCs missing from the confirmed commitment transaction are considered unrecoverable and are immediately failed backward on the upstream channel, while all other HTLCs are left active until the resolution of the downstream HTLC on chain.</p>
<p>Because the same payment hash and amount can be used for multiple HTLCs (e.g., <a href="https://github.com/lightning/bolts/blob/master/04-onion-routing.md#basic-multi-part-payments">multi-part payments</a>), some extra data is stored to match HTLCs on commitment transactions against the set of outstanding HTLCs.
LDK calls this extra data the “HTLC source” data, and LDK maintains this data for both of the counterparty’s valid commitment transactions.</p>
<h1 id="the-duplicate-htlc-failback-bug">The Duplicate HTLC Failback Bug</h1>
<p>Once a counterparty commitment transaction has been revoked, however, LDK forgets the HTLC source data for that commitment transaction to save memory.
As a result, if a revoked commitment transaction later confirms, LDK must attempt to match commitment transaction HTLCs up to outstanding HTLCs using only payment hashes and amounts.
LDK’s <a href="https://github.com/lightningdevkit/rust-lightning/blob/020be440b6d2dfea41820a137c7b26f43b289290/lightning/src/chain/channelmonitor.rs#L2624-L2684">logic</a> to do this matching works as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for htlc, htlc_source in outstanding_htlcs:
    if not confirmed_commitment_tx.is_revoked() and \
            confirmed_commitment_tx.contains_source(htlc_source):
        continue
    if confirmed_commitment_tx.is_revoked() and \
            confirmed_commitment_tx.contains_htlc(htlc.payment_hash, htlc.amount):
        continue
    failback_upstream_htlc(htlc_source)
</code></pre></div></div>
<p>Note that this logic short-circuits whenever an outstanding HTLC matches the payment hash and amount of an HTLC on the revoked commitment transaction.
Thus if there are multiple outstanding HTLCs with the same payment hash and amount, a single HTLC on the revoked commitment transaction can prevent all of the duplicate outstanding HTLCs from being failed back immediately.</p>
<p>Those duplicate HTLCs remain outstanding until the corresponding downstream HTLCs are resolved on chain.
But in this case there is only one downstream HTLC to resolve on chain, and its resolution only <a href="https://github.com/lightningdevkit/rust-lightning/blob/020be440b6d2dfea41820a137c7b26f43b289290/lightning/src/chain/channelmonitor.rs#L4460-L4468">triggers</a> <em>one</em> of the duplicate HTLCs to be failed upstream.
<strong>All the other duplicate HTLCs are left outstanding indefinitely</strong>.</p>
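<p>A toy Python model of the matching loop (hypothetical data; LDK's actual code is in Rust) shows how a single on-chain HTLC shadows every outstanding duplicate with the same payment hash and amount:</p>

```python
# Toy model of the short-circuit (hypothetical data, not LDK's actual
# Rust code): one HTLC on the revoked commitment matches every
# outstanding duplicate, so none of them are failed back immediately.

outstanding_htlcs = [
    ("hash1", 100_000, "source_A1"),
    ("hash1", 100_000, "source_A2"),
    ("hash1", 100_000, "source_A3"),
]
revoked_commitment_htlcs = {("hash1", 100_000)}  # only one HTLC on chain

failed_back = []
for payment_hash, amount, source in outstanding_htlcs:
    # Revoked commitment: source data was forgotten, so match only by
    # payment hash and amount.
    if (payment_hash, amount) in revoked_commitment_htlcs:
        continue  # every duplicate matches the single on-chain HTLC
    failed_back.append(source)

assert failed_back == []  # nothing failed back; N-1 HTLCs end up stuck
```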
<h1 id="force-close-griefing">Force Close Griefing</h1>
<p>Consider the following topology, where <code class="language-plaintext highlighter-rouge">B</code> is the victim and the <code class="language-plaintext highlighter-rouge">A_[1..N]</code> nodes are all the nodes that <code class="language-plaintext highlighter-rouge">B</code> has channels with.
<code class="language-plaintext highlighter-rouge">M_1</code> and <code class="language-plaintext highlighter-rouge">M_2</code> are controlled by the attacker.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> -- A_1 --
/ \
M_1 -- ... -- B -- M_2
\ /
-- A_N --
</code></pre></div></div>
<p>The attacker routes <code class="language-plaintext highlighter-rouge">N</code> HTLCs from <code class="language-plaintext highlighter-rouge">M_1</code> to <code class="language-plaintext highlighter-rouge">M_2</code> using the same payment hash and amount for each, with each payment going through a different <code class="language-plaintext highlighter-rouge">A</code> node.
<code class="language-plaintext highlighter-rouge">M_2</code> then confirms a revoked commitment that contains only one of the <code class="language-plaintext highlighter-rouge">N</code> HTLCs.
Due to the duplicate HTLC failback bug, only one of the routed HTLCs gets failed backwards, while the remaining <code class="language-plaintext highlighter-rouge">N-1</code> HTLCs get stuck.</p>
<p>Finally, after upstream HTLCs expire, all the <code class="language-plaintext highlighter-rouge">A</code> nodes with stuck HTLCs force close their channels with <code class="language-plaintext highlighter-rouge">B</code> to reclaim the stuck HTLCs.</p>
<h2 id="attack-cost">Attack Cost</h2>
<p>The attacker must broadcast a revoked commitment transaction, thereby forfeiting their channel balance.
But the size of the channel can be minimal, and the attacker can spend their balance down to the 1% reserve before executing the attack.
As a result, the cost of the attack can be negligible compared to the damage caused.</p>
<h1 id="the-fix">The Fix</h1>
<p>Starting in v0.1.1, LDK <a href="https://github.com/lightningdevkit/rust-lightning/pull/3556">preemptively fails back</a> HTLCs when their deadlines approach if the downstream channel has been force closed or is in the process of force closing.
While the main purpose of this behavior is to prevent cascading force closures when mempool fee rates spike, it also has a nice side effect of ensuring that duplicate HTLCs always get failed back eventually after a revoked commitment transaction confirms.
As a result, the duplicate HTLCs are never stuck long enough that the upstream nodes need to force close to reclaim them.</p>
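<p>A rough Python sketch of the idea behind the fix (the names and the grace buffer are assumptions, not LDK's actual code; see PR #3556 for the real implementation): once the downstream channel is force closing, any HTLC nearing its upstream deadline is failed back regardless of on-chain matching:</p>

```python
# Sketch of a deadline-based safety net (assumed names and buffer, not
# LDK's actual code): HTLCs on a closing downstream channel are failed
# back preemptively once their expiry approaches.

GRACE_BLOCKS = 3  # assumed buffer before the upstream deadline

def preemptive_failbacks(outstanding_htlcs, current_height):
    to_fail = []
    for htlc in outstanding_htlcs:
        if not htlc["downstream_closing"]:
            continue
        if current_height >= htlc["expiry"] - GRACE_BLOCKS:
            to_fail.append(htlc["source"])
    return to_fail

htlcs = [
    {"source": "s1", "expiry": 800_100, "downstream_closing": True},
    {"source": "s2", "expiry": 800_500, "downstream_closing": True},
    {"source": "s3", "expiry": 800_100, "downstream_closing": False},
]
assert preemptive_failbacks(htlcs, 800_098) == ["s1"]
```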
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during an audit of LDK’s chain module.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2024-12-07:</strong> Vulnerability reported to the LDK security mailing list.</li>
<li><strong>2025-01-27:</strong> Fix <a href="https://github.com/lightningdevkit/rust-lightning/pull/3556">merged</a>.</li>
<li><strong>2025-01-28:</strong> LDK 0.1.1 released containing the fix, with public disclosure in <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1.1">release notes</a>.</li>
<li><strong>2025-01-29:</strong> Detailed description of vulnerability published.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>Prior to the <a href="https://github.com/lightningdevkit/rust-lightning/commit/70ae45fea030ed1d2064918c7b023aa142387bc8">introduction</a> of the duplicate HTLC failback bug in 2022, LDK would immediately fail back <em>all</em> outstanding HTLCs once a revoked commitment reached 6 confirmations.
This was the safe and conservative thing to do – HTLC source information was missing, so proper matching of HTLCs could not be done.
And since all outputs on the revoked commitment and HTLC transactions could be claimed via revocation key, there was no concern about losing funds if the downstream counterparty confirmed an HTLC claim before LDK could.</p>
<h2 id="better-documentation">Better Documentation</h2>
<p>Considering that LDK previously had a <a href="https://github.com/lightningdevkit/rust-lightning/commit/70ae45fea030ed1d2064918c7b023aa142387bc8#diff-b30410f22a759d5e664e05938af7ef2edd244c8a7872e7ada376055ff130088bL7296-L7314">test</a> explicitly checking for the original (conservative) failback behavior, it does appear that the original behavior was understood and intentional.
Unfortunately the original author did not document the <em>reason</em> for the original behavior anywhere in the code or test.</p>
<p>A single comment in the code would likely have been enough to prevent later contributors from introducing the buggy behavior:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// We fail back *all* outstanding HTLCs when a revoked commitment</span>
<span class="c1">// confirms because we don't have HTLC source information for revoked</span>
<span class="c1">// commitments, and attempting to match up HTLCs based on payment hashes</span>
<span class="c1">// and amounts is inherently unreliable.</span>
<span class="c1">//</span>
<span class="c1">// Failing back all HTLCs after a 6 block delay is safe in this case</span>
<span class="c1">// since we can use the revocation key to reliably claim all funds in the</span>
<span class="c1">// downstream channel and therefore won't lose funds overall.</span>
</code></pre></div></div>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Code documentation matters for preventing bugs.</li>
<li>Update to LDK 0.1.1 for the vulnerability fix.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/ldk-duplicate-htlc-force-close-griefing/">LDK: Duplicate HTLC Force Close Griefing</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on January 29, 2025.</p>
https://morehouse.github.io/lightning/ldk-invalid-claims-liquidity-griefing2025-01-23T00:00:00-00:002025-01-23T00:00:00-06:00Matt Morehousehttps://morehouse.github.io[email protected]
<p>LDK 0.0.125 and below are vulnerable to a liquidity griefing attack against anchor channels.
The attack locks up funds such that they can only be recovered by manually constructing and broadcasting a valid claim transaction.
Affected users can unlock their funds by upgrading to <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1">LDK 0.1</a> and replaying the sequence of commitment and HTLC transactions that led to the lock up.</p>
<h1 id="background">Background</h1>
<p>When a channel is force closed, LDK creates and broadcasts transactions to claim any HTLCs it can from the commitment transaction that confirmed on chain.
To save on fees, some HTLC claims are aggregated and broadcast together in the same transaction.</p>
<p>If the channel counterparty is able to get a competing HTLC claim confirmed first, it can cause one of LDK’s aggregated transactions to become invalid, since the corresponding HTLC input has already been spent by the counterparty’s claim.
LDK contains logic to detect this scenario and remove the already-claimed input from its aggregated claim transaction.
When everything works correctly, the aggregated transaction becomes valid again and LDK is able to claim the remaining HTLCs.</p>
<h1 id="the-invalid-claims-bug">The Invalid Claims Bug</h1>
<p>Prior to LDK 0.1, the logic to detect conflicting claims works like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">confirmed_transaction</span> <span class="ow">in</span> <span class="n">confirmed_block</span><span class="p">:</span>
<span class="k">for</span> <span class="nb">input</span> <span class="ow">in</span> <span class="n">confirmed_transaction</span><span class="p">:</span>
<span class="k">if</span> <span class="n">claimable_outpoints</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="nb">input</span><span class="p">.</span><span class="n">prevout</span><span class="p">):</span>
<span class="n">agg_tx</span> <span class="o">=</span> <span class="n">get_aggregated_transaction_from_outpoint</span><span class="p">(</span><span class="nb">input</span><span class="p">.</span><span class="n">prevout</span><span class="p">)</span>
<span class="n">agg_tx</span><span class="p">.</span><span class="n">remove_matching_inputs</span><span class="p">(</span><span class="n">confirmed_transaction</span><span class="p">)</span>
<span class="k">break</span> <span class="c1"># This is the bug.
</span></code></pre></div></div>
<p>Note that this logic stops processing a confirmed transaction after finding the first aggregated transaction that conflicts with it.
If the confirmed transaction conflicts with <em>multiple</em> aggregated transactions, conflicting inputs are only removed from the <em>first</em> matching aggregated transaction, and any other conflicting aggregated transactions are left invalid.</p>
<p>Any HTLCs claimed by invalid aggregated transactions get locked up and can only be recovered by manually constructing and broadcasting valid claim transactions.</p>
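<p>A minimal Python sketch of the corrected scan (hypothetical helpers, not LDK's actual Rust code): by checking every input of the confirmed transaction instead of stopping at the first match, <em>all</em> conflicting aggregated transactions have the spent input removed:</p>

```python
# Sketch of the corrected conflict scan (hypothetical helpers, not
# LDK's actual Rust code): no early break, so every aggregated
# transaction that conflicts with the confirmed one is repaired.

def handle_confirmed_block(confirmed_block, claimable_outpoints,
                           get_aggregated_transaction_from_outpoint):
    for confirmed_transaction in confirmed_block:
        for tx_input in confirmed_transaction["inputs"]:
            prevout = tx_input["prevout"]
            if prevout not in claimable_outpoints:
                continue
            agg_tx = get_aggregated_transaction_from_outpoint(prevout)
            agg_tx["inputs"].discard(prevout)
            # No break: keep scanning the remaining inputs so other
            # aggregated transactions also lose their conflicting inputs.

# Two aggregated claim transactions, each conflicting with one input of
# the confirmed transaction; both must be repaired.
agg1 = {"inputs": {"op1", "op3"}}
agg2 = {"inputs": {"op2", "op4"}}
lookup = {"op1": agg1, "op3": agg1, "op2": agg2, "op4": agg2}

confirmed = [{"inputs": [{"prevout": "op1"}, {"prevout": "op2"}]}]
handle_confirmed_block(confirmed, set(lookup), lookup.get)
assert agg1["inputs"] == {"op3"} and agg2["inputs"] == {"op4"}
```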
<h1 id="liquidity-griefing">Liquidity Griefing</h1>
<p>Prior to LDK 0.1, there are only two types of HTLC claims that are aggregated:</p>
<ul>
<li>HTLC preimage claims</li>
<li>revoked commitment HTLC claims</li>
</ul>
<p>For HTLC preimage claims, LDK takes care to confirm them before their HTLCs time out, so there’s no reliable way for an attacker to confirm a conflicting timeout claim and trigger the invalid claims bug.</p>
<p>For revoked commitment transactions, however, an attacker can immediately spend any incoming HTLC outputs via HTLC-Success transactions.
Although LDK is then able to claim the HTLC-Success outputs via the revocation key, the attacker can exploit the invalid claims bug to lock up any remaining HTLCs on the revoked commitment transaction.</p>
<h2 id="setup">Setup</h2>
<p>The attacker opens an anchor channel with the victim, creating a network topology as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A -- B -- M
</code></pre></div></div>
<p>In this case <code class="language-plaintext highlighter-rouge">B</code> is the victim LDK node and <code class="language-plaintext highlighter-rouge">M</code> is the node controlled by the attacker.
The attacker must use an anchor channel so that they can spend multiple HTLC claims in the same transaction and trigger the invalid claims bug.</p>
<p>The attacker then routes HTLCs along the path <code class="language-plaintext highlighter-rouge">A->B->M</code> as follows:</p>
<ol>
<li>1 small HTLC with CLTV of <code class="language-plaintext highlighter-rouge">X</code></li>
<li>1 small HTLC with CLTV of <code class="language-plaintext highlighter-rouge">X+1</code></li>
<li>1 large HTLC with CLTV of <code class="language-plaintext highlighter-rouge">X+1</code> (this is the one the attacker will lock up)</li>
</ol>
<p>The attacker knows preimages for all HTLCs but withholds them for now.</p>
<p>To complete the setup, the attacker routes some other HTLC through the channel, causing the commitment transaction with the above HTLCs to be revoked.</p>
<h2 id="forcing-multiple-aggregations">Forcing Multiple Aggregations</h2>
<p>Next the attacker waits until block <code class="language-plaintext highlighter-rouge">X-13</code> and force closes the <code class="language-plaintext highlighter-rouge">B-M</code> channel using their revoked commitment transaction, being sure to get it confirmed in block <code class="language-plaintext highlighter-rouge">X-12</code>.
By confirming in this specific block, the attacker can exploit LDK’s buggy aggregation logic prior to v0.1 (see below), causing LDK to aggregate HTLC justice claims as follows:</p>
<ul>
<li><strong>Transaction 1:</strong> HTLC 1</li>
<li><strong>Transaction 2:</strong> HTLCs 2 and 3</li>
</ul>
<h3 id="buggy-aggregation-logic">Buggy Aggregation Logic</h3>
<p>Prior to v0.1, LDK only aggregates HTLC claims if their timeouts are more than 12 blocks in the future.
Presumably 12 blocks was deemed “too soon” to guarantee that LDK can confirm preimage claims before the HTLCs time out, and once one HTLC times out the counterparty can pin a competing timeout claim in mempools, thereby preventing confirmation of <em>all</em> the aggregated preimage claims.
In other words, by claiming HTLCs separately in this scenario, LDK limits the damage the counterparty could do if one of those HTLCs expires before LDK successfully claims it.</p>
<p>Unfortunately, this aggregation strategy makes no sense when LDK is trying to group justice claims that the counterparty can spend immediately via HTLC-Success, since the timeout on those HTLCs does not apply to the counterparty.
Nevertheless, prior to LDK 0.1, the same 12 block aggregation check applies equally to all justice claims, regardless of whether the counterparty can spend them immediately or must wait to spend via HTLC-Timeout.</p>
<p>An attacker can exploit this buggy aggregation logic to make LDK create multiple claim transactions, as described above.</p>
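<p>The 12-block rule can be sketched in Python (an assumed simplification of the pre-0.1 logic): confirming the commitment in block <code>X-12</code> puts HTLC 1 exactly 12 blocks from its expiry, splitting the claims precisely as described above:</p>

```python
# Assumed simplification of the pre-0.1 aggregation rule: a justice
# claim is aggregated only if its HTLC expiry is more than 12 blocks
# past the current height.

AGGREGATION_BUFFER = 12

def group_claims(htlc_expiries, current_height):
    aggregated, standalone = [], []
    for expiry in htlc_expiries:
        if expiry - current_height > AGGREGATION_BUFFER:
            aggregated.append(expiry)
        else:
            standalone.append(expiry)
    return standalone, aggregated

# Commitment confirmed in block X-12: HTLC 1 expires at X, HTLCs 2 and
# 3 at X+1, reproducing the Transaction 1 / Transaction 2 split.
X = 800_000
standalone, aggregated = group_claims([X, X + 1, X + 1], X - 12)
assert standalone == [X] and aggregated == [X + 1, X + 1]
```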
<h2 id="locking-up-funds">Locking Up Funds</h2>
<p>Finally, the attacker broadcasts and confirms a transaction spending HTLCs 1 and 2 via HTLC-Success.
The attacker’s transaction conflicts with both Transaction 1 and Transaction 2, but due to the invalid claims bug, LDK only notices the conflict with Transaction 1.
LDK continues to fee bump and rebroadcast Transaction 2 indefinitely, even though it can never be mined.</p>
<p>As a result, the funds in HTLC 3 remain inaccessible until a valid claim transaction is manually constructed and broadcast.</p>
<p>Note that if the attacker ever tries to claim HTLC 3 via HTLC-Success, LDK is able to immediately recover it via the revocation key.
So while the attacker can lock up HTLC 3, they cannot actually steal it once the upstream HTLC times out.</p>
<h2 id="attack-cost">Attack Cost</h2>
<p>When the attacker’s revoked commitment transaction confirms, LDK is able to immediately claim the attacker’s channel balance.
LDK is also able to claim HTLCs 1 and 2 via the revocation key on the <code class="language-plaintext highlighter-rouge">B-M</code> channel, while also claiming them via the preimage on the upstream <code class="language-plaintext highlighter-rouge">A-B</code> channel.</p>
<p>Thus a smart attacker would minimize costs by spending their channel balance down to the 1% reserve before carrying out the attack and would then set the amounts of HTLCs 1 and 2 to just above the dust threshold.
The attacker would also maximize the pain inflicted on the victim by setting HTLC 3 to the maximum allowed amount.</p>
<h1 id="stealing-htlcs-in-01-beta">Stealing HTLCs in 0.1-beta</h1>
<p>Beginning in <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1.0-beta1">v0.1-beta</a>, LDK <a href="https://github.com/lightningdevkit/rust-lightning/pull/3340">started</a> aggregating HTLC timeout claims that have compatible locktimes.
As a result, the beta release is vulnerable to a variant of the liquidity griefing attack that enables the attacker to steal funds.
Thankfully the invalid claims bug was fixed between the 0.1-beta and 0.1 releases, so the final LDK 0.1 release is not vulnerable to this attack.</p>
<p>The fund-stealing variant for LDK 0.1-beta works as follows.</p>
<h2 id="setup-1">Setup</h2>
<p>The attack setup is identical to the liquidity griefing attack, except that the attacker does not cause its commitment transaction to be revoked.</p>
<h2 id="forcing-multiple-aggregations-1">Forcing Multiple Aggregations</h2>
<p>The attacker then force closes the <code class="language-plaintext highlighter-rouge">B-M</code> channel.
Due to differing locktimes, LDK creates HTLC timeout claims as follows:</p>
<ul>
<li><strong>Transaction 1:</strong> HTLC 1 (locktime <code class="language-plaintext highlighter-rouge">X</code>)</li>
<li><strong>Transaction 2:</strong> HTLCs 2 and 3 (locktime <code class="language-plaintext highlighter-rouge">X+1</code>)</li>
</ul>
<p>Once height <code class="language-plaintext highlighter-rouge">X</code> is reached, LDK broadcasts Transaction 1.
At height <code class="language-plaintext highlighter-rouge">X+1</code>, LDK broadcasts Transaction 2.</p>
<p>At this point, if Transaction 1 confirms immediately in block <code class="language-plaintext highlighter-rouge">X+1</code>, the attack fails, since the attacker can no longer spend HTLCs 1 and 2 together in the same transaction.
But if Transaction 1 does not confirm immediately (the more likely case), the attack can continue.</p>
<h2 id="stealing-funds">Stealing Funds</h2>
<p>The attacker broadcasts and confirms a transaction spending HTLCs 1 and 2 via HTLC-Success.
This transaction conflicts with both Transaction 1 and Transaction 2, but due to the invalid claims bug, LDK only notices the conflict with Transaction 1.
LDK continues to fee bump and rebroadcast Transaction 2 indefinitely, even though it can never be mined.</p>
<p>Once HTLC 3’s upstream timeout expires, node <code class="language-plaintext highlighter-rouge">A</code> force closes and claims a refund, leaving the coast clear for the attacker to claim the downstream HTLC via preimage.</p>
<h1 id="the-fix">The Fix</h1>
<p>The invalid claims bug was fixed by a <a href="https://github.com/lightningdevkit/rust-lightning/pull/3538">one-line patch</a> just prior to the LDK 0.1 release.</p>
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during an audit of LDK’s chain module.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2024-12-23:</strong> Vulnerability reported to the LDK security mailing list.</li>
<li><strong>2025-01-15:</strong> Fix <a href="https://github.com/lightningdevkit/rust-lightning/pull/3538">merged</a>.</li>
<li><strong>2025-01-16:</strong> LDK 0.1 released containing the fix, with public disclosure in release notes.</li>
<li><strong>2025-01-23:</strong> Detailed description of vulnerability published.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>The invalid claims bug is fundamentally a problem of incorrect control flow – a <code class="language-plaintext highlighter-rouge">break</code> statement was inserted into a loop where it shouldn’t have been.
Why wasn’t it caught during initial code review, and why wasn’t it noticed for years after that?</p>
<p>The <code class="language-plaintext highlighter-rouge">break</code> statement was <a href="https://github.com/lightningdevkit/rust-lightning/commit/feb472dc9ef971b926b19d27e1ad05a79423778f">introduced</a> back in 2019, long before LDK supported anchor channels.
The code was actually correct back then, because before anchor channels there was no way for the counterparty to construct a transaction that conflicted with two of LDK’s aggregated transactions.
But even after <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.0.116">LDK 0.0.116</a> added support for anchor channels, the bug went unnoticed for over two years, despite multiple changes being made to the surrounding code in that time frame.</p>
<p>It’s impossible to say exactly what kept the bug hidden, but I think the complexity and unreadability of the surrounding code were likely contributors.
Here’s the for-loop containing the <a href="https://github.com/lightningdevkit/rust-lightning/blob/ad462bd9c8237e505f463c227c9ac98ebd3fbb16/lightning/src/chain/onchaintx.rs#L896-L984">buggy code</a>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">bump_candidates</span> <span class="o">=</span> <span class="nf">new_hash_map</span><span class="p">();</span>
<span class="k">if</span> <span class="o">!</span><span class="n">txn_matched</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="nf">maybe_log_intro</span><span class="p">();</span> <span class="p">}</span>
<span class="k">for</span> <span class="n">tx</span> <span class="n">in</span> <span class="n">txn_matched</span> <span class="p">{</span>
<span class="c">// Scan all input to verify is one of the outpoint spent is of interest for us</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">claimed_outputs_material</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">for</span> <span class="n">inp</span> <span class="n">in</span> <span class="o">&</span><span class="n">tx</span><span class="py">.input</span> <span class="p">{</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">((</span><span class="n">claim_id</span><span class="p">,</span> <span class="mi">_</span><span class="p">))</span> <span class="o">=</span> <span class="k">self</span><span class="py">.claimable_outpoints</span><span class="nf">.get</span><span class="p">(</span><span class="o">&</span><span class="n">inp</span><span class="py">.previous_output</span><span class="p">)</span> <span class="p">{</span>
<span class="c">// If outpoint has claim request pending on it...</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.pending_claim_requests</span><span class="nf">.get_mut</span><span class="p">(</span><span class="n">claim_id</span><span class="p">)</span> <span class="p">{</span>
<span class="c">//... we need to check if the pending claim was for a subset of the outputs</span>
<span class="c">// spent by the confirmed transaction. If so, we can drop the pending claim</span>
<span class="c">// after ANTI_REORG_DELAY blocks, otherwise we need to split it and retry</span>
<span class="c">// claiming the remaining outputs.</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">is_claim_subset_of_tx</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">tx_inputs</span> <span class="o">=</span> <span class="n">tx</span><span class="py">.input</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.map</span><span class="p">(|</span><span class="n">input</span><span class="p">|</span> <span class="o">&</span><span class="n">input</span><span class="py">.previous_output</span><span class="p">)</span><span class="py">.collect</span><span class="p">::</span><span class="o"><</span><span class="nb">Vec</span><span class="o"><</span><span class="mi">_</span><span class="o">>></span><span class="p">();</span>
<span class="n">tx_inputs</span><span class="nf">.sort_unstable</span><span class="p">();</span>
<span class="k">for</span> <span class="n">request_input</span> <span class="n">in</span> <span class="n">request</span><span class="nf">.outpoints</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">tx_inputs</span><span class="nf">.binary_search</span><span class="p">(</span><span class="o">&</span><span class="n">request_input</span><span class="p">)</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
<span class="n">is_claim_subset_of_tx</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nd">macro_rules!</span> <span class="n">clean_claim_request_after_safety_delay</span> <span class="p">{</span>
<span class="p">()</span> <span class="k">=></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">OnchainEventEntry</span> <span class="p">{</span>
<span class="n">txid</span><span class="p">:</span> <span class="n">tx</span><span class="nf">.compute_txid</span><span class="p">(),</span>
<span class="n">height</span><span class="p">:</span> <span class="n">conf_height</span><span class="p">,</span>
<span class="n">block_hash</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">conf_hash</span><span class="p">),</span>
<span class="n">event</span><span class="p">:</span> <span class="nn">OnchainEvent</span><span class="p">::</span><span class="n">Claim</span> <span class="p">{</span> <span class="n">claim_id</span><span class="p">:</span> <span class="o">*</span><span class="n">claim_id</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.contains</span><span class="p">(</span><span class="o">&</span><span class="n">entry</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.push</span><span class="p">(</span><span class="n">entry</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// If this is our transaction (or our counterparty spent all the outputs</span>
<span class="c">// before we could anyway with same inputs order than us), wait for</span>
<span class="c">// ANTI_REORG_DELAY and clean the RBF tracking map.</span>
<span class="k">if</span> <span class="n">is_claim_subset_of_tx</span> <span class="p">{</span>
<span class="nd">clean_claim_request_after_safety_delay!</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c">// If false, generate new claim request with update outpoint set</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">at_least_one_drop</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">for</span> <span class="n">input</span> <span class="n">in</span> <span class="n">tx</span><span class="py">.input</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">package</span><span class="p">)</span> <span class="o">=</span> <span class="n">request</span><span class="nf">.split_package</span><span class="p">(</span><span class="o">&</span><span class="n">input</span><span class="py">.previous_output</span><span class="p">)</span> <span class="p">{</span>
<span class="n">claimed_outputs_material</span><span class="nf">.push</span><span class="p">(</span><span class="n">package</span><span class="p">);</span>
<span class="n">at_least_one_drop</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="c">// If there are no outpoints left to claim in this request, drop it entirely after ANTI_REORG_DELAY.</span>
<span class="k">if</span> <span class="n">request</span><span class="nf">.outpoints</span><span class="p">()</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span>
<span class="nd">clean_claim_request_after_safety_delay!</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">//TODO: recompute soonest_timelock to avoid wasting a bit on fees</span>
<span class="k">if</span> <span class="n">at_least_one_drop</span> <span class="p">{</span>
<span class="n">bump_candidates</span><span class="nf">.insert</span><span class="p">(</span><span class="o">*</span><span class="n">claim_id</span><span class="p">,</span> <span class="n">request</span><span class="nf">.clone</span><span class="p">());</span>
<span class="c">// If we have any pending claim events for the request being updated</span>
<span class="c">// that have yet to be consumed, we'll remove them since they will</span>
<span class="c">// end up producing an invalid transaction by double spending</span>
<span class="c">// input(s) that already have a confirmed spend. If such spend is</span>
<span class="c">// reorged out of the chain, then we'll attempt to re-spend the</span>
<span class="c">// inputs once we see it.</span>
<span class="nd">#[cfg(debug_assertions)]</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">existing</span> <span class="o">=</span> <span class="k">self</span><span class="py">.pending_claim_events</span><span class="nf">.iter</span><span class="p">()</span>
<span class="nf">.filter</span><span class="p">(|</span><span class="n">entry</span><span class="p">|</span> <span class="n">entry</span><span class="na">.0</span> <span class="o">==</span> <span class="o">*</span><span class="n">claim_id</span><span class="p">)</span><span class="nf">.count</span><span class="p">();</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">existing</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">||</span> <span class="n">existing</span> <span class="o">==</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">self</span><span class="py">.pending_claim_events</span><span class="nf">.retain</span><span class="p">(|</span><span class="n">entry</span><span class="p">|</span> <span class="n">entry</span><span class="na">.0</span> <span class="o">!=</span> <span class="o">*</span><span class="n">claim_id</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span> <span class="c">//No need to iterate further, either tx is our or their</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nd">panic!</span><span class="p">(</span><span class="s">"Inconsistencies between pending_claim_requests map and claimable_outpoints map"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">package</span> <span class="n">in</span> <span class="n">claimed_outputs_material</span><span class="nf">.drain</span><span class="p">(</span><span class="o">..</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">OnchainEventEntry</span> <span class="p">{</span>
<span class="n">txid</span><span class="p">:</span> <span class="n">tx</span><span class="nf">.compute_txid</span><span class="p">(),</span>
<span class="n">height</span><span class="p">:</span> <span class="n">conf_height</span><span class="p">,</span>
<span class="n">block_hash</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">conf_hash</span><span class="p">),</span>
<span class="n">event</span><span class="p">:</span> <span class="nn">OnchainEvent</span><span class="p">::</span><span class="n">ContentiousOutpoint</span> <span class="p">{</span> <span class="n">package</span> <span class="p">},</span>
<span class="p">};</span>
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.contains</span><span class="p">(</span><span class="o">&</span><span class="n">entry</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.push</span><span class="p">(</span><span class="n">entry</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Perhaps others have a better mental parser than me, but I find this code quite difficult to read and understand.
The loop is so long, with so much nesting and so many low-level implementation details that by the time I get to the buggy <code class="language-plaintext highlighter-rouge">break</code> statement, I’ve completely forgotten what loop it applies to.
And since the comment attached to the break statement gives a believable explanation, it’s easy to gloss right over it.</p>
<p>Perhaps the buggy control flow would be easier to spot if the loop were simpler and more compact.
By hand-waving some helper functions into existence and refactoring, the same code could be written as follows:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">maybe_log_intro</span><span class="p">();</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">bump_candidates</span> <span class="o">=</span> <span class="nf">new_hash_map</span><span class="p">();</span>
<span class="k">for</span> <span class="n">tx</span> <span class="n">in</span> <span class="n">txn_matched</span> <span class="p">{</span>
<span class="k">for</span> <span class="n">inp</span> <span class="n">in</span> <span class="o">&</span><span class="n">tx</span><span class="py">.input</span> <span class="p">{</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">claim_request</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.get_mut_claim_request_from_outpoint</span><span class="p">(</span><span class="n">inp</span><span class="py">.previous_output</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">split_requests</span> <span class="o">=</span> <span class="n">claim_request</span><span class="nf">.split_off_matching_inputs</span><span class="p">(</span><span class="o">&</span><span class="n">tx</span><span class="py">.input</span><span class="p">);</span>
<span class="nd">debug_assert!</span><span class="p">(</span><span class="o">!</span><span class="n">split_requests</span><span class="nf">.is_empty</span><span class="p">());</span>
<span class="k">if</span> <span class="n">claim_request</span><span class="nf">.outpoints</span><span class="p">()</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// Request has been fully claimed.</span>
<span class="k">self</span><span class="nf">.mark_request_claimed</span><span class="p">(</span><span class="n">claim_request</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">conf_height</span><span class="p">,</span> <span class="n">conf_hash</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="c">// After removing conflicting inputs, there's still more to claim. Add the modified</span>
<span class="c">// request to bump_candidates so it gets fee bumped and rebroadcast.</span>
<span class="k">self</span><span class="nf">.remove_pending_claim_events</span><span class="p">(</span><span class="n">claim_request</span><span class="p">);</span>
<span class="n">bump_candidates</span><span class="nf">.insert</span><span class="p">(</span><span class="n">claim_request</span><span class="nf">.clone</span><span class="p">());</span>
<span class="k">self</span><span class="nf">.mark_requests_contentious</span><span class="p">(</span><span class="n">split_requests</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">conf_height</span><span class="p">,</span> <span class="n">conf_hash</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The control flow in this version is much more apparent to the reader.
And although there’s no guarantee that the buggy <code class="language-plaintext highlighter-rouge">break</code> statements would have been discovered sooner if the code had been written this way, I do think the odds would have been much better.</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Code readability matters for preventing bugs.</li>
<li>Update to LDK 0.1 for the vulnerability fix.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/ldk-invalid-claims-liquidity-griefing/">LDK: Invalid Claims Liquidity Griefing</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on January 23, 2025.</p>
<p><a href="https://morehouse.github.io/lightning/lnd-onion-bomb">LND Onion Bomb</a> was published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on June 18, 2024.</p>
<p>LND versions prior to 0.17.0 are vulnerable to a DoS attack where malicious onion packets cause the node to instantly run out of memory (OOM) and crash.
If you are running an LND release older than this, your funds are at risk!
Update to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.17.0-beta">0.17.0</a> to protect your node.</p>
<h1 id="severity">Severity</h1>
<p>It is critical that users update to at least LND 0.17.0 for several reasons.</p>
<ul>
<li>The attack is cheap and easy to carry out and will keep the victim offline for as long as it lasts.</li>
<li>The source of the attack is concealed via onion routing. The attacker does not need to connect directly to the victim.</li>
<li>Prior to LND 0.17.0, all nodes are vulnerable. The fix was not backported to the LND 0.16.x series or earlier.</li>
</ul>
<h1 id="the-vulnerability">The Vulnerability</h1>
<p>The Lightning Network uses <a href="https://en.wikipedia.org/wiki/Onion_routing">onion routing</a> to provide senders and receivers of payments some degree of privacy.
Each node along a payment route receives an <em>onion packet</em> from the previous node, containing forwarding instructions for the next node on the route.
The onion packet is encrypted by the initiator of the payment, so that each node can only read its own forwarding instructions.</p>
<p>Once a node has “peeled off” its layer of encryption from the onion packet, it can extract its forwarding instructions according to the format <a href="https://github.com/lightning/bolts/blob/master/04-onion-routing.md#packet-structure">specified</a> in the LN protocol:</p>
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">Field Name</th>
<th style="text-align: left">Size</th>
<th style="text-align: left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">length</code></td>
<td style="text-align: left">1-9 bytes</td>
<td style="text-align: left">The length of the <code class="language-plaintext highlighter-rouge">payload</code> field, encoded as <a href="https://github.com/lightning/bolts/blob/master/01-messaging.md#appendix-a-bigsize-test-vectors">BigSize</a>.</td>
</tr>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">payload</code></td>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">length</code> bytes</td>
<td style="text-align: left">The forwarding instructions.</td>
</tr>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">hmac</code></td>
<td style="text-align: left">32 bytes</td>
<td style="text-align: left">The HMAC to use for the forwarded onion packet.</td>
</tr>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">next_onion</code></td>
<td style="text-align: left">remaining bytes</td>
<td style="text-align: left">The onion packet to forward.</td>
</tr>
</tbody>
</table>
<p>Prior to LND 0.17.0, the <a href="https://github.com/lightningnetwork/lightning-onion/blob/ca23184850a16cd14d27619f3afdae543b3857a9/path.go#L231-L281">code</a> that extracts these instructions is essentially:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Decode unpacks an encoded HopPayload from the passed reader into the</span>
<span class="c">// target HopPayload.</span>
<span class="k">func</span> <span class="p">(</span><span class="n">hp</span> <span class="o">*</span><span class="n">HopPayload</span><span class="p">)</span> <span class="n">Decode</span><span class="p">(</span><span class="n">r</span> <span class="n">io</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">bufReader</span> <span class="o">:=</span> <span class="n">bufio</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="k">var</span> <span class="n">b</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">byte</span>
<span class="n">varInt</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">ReadVarInt</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="n">payloadSize</span> <span class="o">:=</span> <span class="kt">uint32</span><span class="p">(</span><span class="n">varInt</span><span class="p">)</span>
<span class="c">// Now that we know the payload size, we'll create a new buffer to</span>
<span class="c">// read it out in full.</span>
<span class="n">hp</span><span class="o">.</span><span class="n">Payload</span> <span class="o">=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="n">payloadSize</span><span class="p">)</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">Payload</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">HMAC</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note the absence of a bounds check on <code class="language-plaintext highlighter-rouge">payloadSize</code>!</p>
<p>Regardless of the actual payload size, <strong>LND allocates memory for whatever <code class="language-plaintext highlighter-rouge">length</code> is encoded in the onion packet up to <code class="language-plaintext highlighter-rouge">UINT32_MAX</code> (4 GB).</strong></p>
<h1 id="the-dos-attack">The DoS Attack</h1>
<p>It is trivial for an attacker to craft an onion packet that contains an encoded <code class="language-plaintext highlighter-rouge">length</code> of <code class="language-plaintext highlighter-rouge">UINT32_MAX</code> for the victim’s forwarding instructions.
If the victim’s node has less than 4 GB of memory available, it will OOM crash instantly upon receiving the attacker’s packet.</p>
<p>However, if the victim’s node has more than 4 GB of memory available, it is able to recover from the malicious packet.
The victim’s node will temporarily allocate 4 GB, but the Go garbage collector will quickly reclaim that memory after decoding fails.</p>
<p><em>So nodes with more than 4 GB of RAM are safe, right?</em></p>
<p>Not quite.
The attacker can send many malicious packets simultaneously.
If the victim processes enough malicious packets before the garbage collector kicks in, an OOM will still occur.
And since LND decodes onion packets <em>in parallel</em>, it is not difficult for an attacker to beat the garbage collector.
In my experiments I was able to consistently crash nodes with up to 128 GB of RAM in just a few seconds.</p>
<h1 id="the-fix">The Fix</h1>
<p>A bounds check on the encoded <code class="language-plaintext highlighter-rouge">length</code> field was concealed in a large refactoring <a href="https://github.com/lightningnetwork/lightning-onion/commit/6afc43f3fc983ae37685812de027d0747e136b8f">commit</a> and included in LND <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.17.0-beta">0.17.0</a>.
The fixed code is essentially:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Decode unpacks an encoded HopPayload from the passed reader into the</span>
<span class="c">// target HopPayload.</span>
<span class="k">func</span> <span class="p">(</span><span class="n">hp</span> <span class="o">*</span><span class="n">HopPayload</span><span class="p">)</span> <span class="n">Decode</span><span class="p">(</span><span class="n">r</span> <span class="n">io</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">bufReader</span> <span class="o">:=</span> <span class="n">bufio</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="n">payloadSize</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">tlvPayloadSize</span><span class="p">(</span><span class="n">bufReader</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="c">// Now that we know the payload size, we'll create a new buffer to</span>
<span class="c">// read it out in full.</span>
<span class="n">hp</span><span class="o">.</span><span class="n">Payload</span> <span class="o">=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="n">payloadSize</span><span class="p">)</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">Payload</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">HMAC</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
<span class="c">// tlvPayloadSize uses the passed reader to extract the payload length</span>
<span class="c">// encoded as a var-int.</span>
<span class="k">func</span> <span class="n">tlvPayloadSize</span><span class="p">(</span><span class="n">r</span> <span class="n">io</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span> <span class="p">(</span><span class="kt">uint16</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">b</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">byte</span>
<span class="n">varInt</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">ReadVarInt</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="m">0</span><span class="p">,</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">varInt</span> <span class="o">></span> <span class="n">math</span><span class="o">.</span><span class="n">MaxUint16</span> <span class="p">{</span>
<span class="k">return</span> <span class="m">0</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s">"payload size of %d is larger than the "</span><span class="o">+</span>
<span class="s">"maximum allowed size of %d"</span><span class="p">,</span> <span class="n">varInt</span><span class="p">,</span> <span class="n">math</span><span class="o">.</span><span class="n">MaxUint16</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="kt">uint16</span><span class="p">(</span><span class="n">varInt</span><span class="p">),</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This new code reduces the maximum amount of memory LND will allocate when decoding an onion packet from 4 GB to 64 KB, which is enough to fully mitigate the DoS attack.</p>
<h1 id="discovery">Discovery</h1>
<p>A simple <a href="https://github.com/lightningnetwork/lnd/commit/9c51bea7906060bbf7b8dc23cab7d542a610eb10">fuzz test</a> for onion packet encoding and decoding revealed this vulnerability.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2023-06-20:</strong> Vulnerability discovered and disclosed to Lightning Labs.</li>
<li><strong>2023-08-23:</strong> Fix <a href="https://github.com/lightningnetwork/lightning-onion/pull/57">merged</a>.</li>
<li><strong>2023-10-03:</strong> LND 0.17.0 released containing the fix.</li>
<li><strong>2024-05-16:</strong> Laolu gives the OK to disclose publicly once LND 0.18.0 is released and has some uptake.</li>
<li><strong>2024-05-30:</strong> LND 0.18.0 released.</li>
<li><strong>2024-06-18:</strong> Public disclosure.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>This vulnerability was found in less than a minute of fuzz testing.
If basic fuzz tests had been written at the time the original onion decoding functions were introduced, the bug would have been caught before it was merged.</p>
<p>In general, any function that processes untrusted inputs is a strong candidate for fuzz testing, and often these fuzz tests are <em>easier</em> to write than traditional unit tests.
A minimal fuzz test that detects this particular vulnerability is exceedingly simple:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">FuzzHopPayload</span><span class="p">(</span><span class="n">f</span> <span class="o">*</span><span class="n">testing</span><span class="o">.</span><span class="n">F</span><span class="p">)</span> <span class="p">{</span>
<span class="n">f</span><span class="o">.</span><span class="n">Fuzz</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">t</span> <span class="o">*</span><span class="n">testing</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">data</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="p">{</span>
<span class="c">// Hop payloads larger than 1300 bytes violate the spec and never</span>
<span class="c">// reach the decoding step in practice.</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">></span> <span class="m">1300</span> <span class="p">{</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="k">var</span> <span class="n">hopPayload</span> <span class="n">sphinx</span><span class="o">.</span><span class="n">HopPayload</span>
<span class="n">hopPayload</span><span class="o">.</span><span class="n">Decode</span><span class="p">(</span><span class="n">bytes</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Write fuzz tests for all APIs that consume untrusted inputs.</li>
<li>Update your LND nodes to at least 0.17.0.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-onion-bomb/">DoS: LND Onion Bomb</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on June 18, 2024.</p>
<p>CLN versions between <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.02">23.02</a> and <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.05.2">23.05.2</a> are susceptible to a DoS attack involving the exploitation of a race condition during channel opens.
If you are running any version in this range, your funds may be at risk!
Update to at least <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.08">23.08</a> to help protect your node.</p>
<h1 id="the-vulnerability">The Vulnerability</h1>
<p>The vulnerability arises from a race condition between two different flows in CLN: the channel open flow and the peer connection flow.</p>
<h2 id="the-channel-open-flow">The Channel Open Flow</h2>
<p>When a peer opens a channel with a CLN node, the following interactions occur on the CLN node.</p>
<p><img src="/images/cln_channel_open_no_race1.png" alt="channel open diagram" /></p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the channel open request.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a new <code class="language-plaintext highlighter-rouge">openingd</code> daemon to handle the channel open negotiation.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> completes the channel open negotiation up to the point where the funding outpoint is known.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> sends the funding outpoint to <code class="language-plaintext highlighter-rouge">lightningd</code> and exits.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage the new channel.</li>
</ol>
<h2 id="the-peer-connection-flow">The Peer Connection Flow</h2>
<p>Once a peer has a channel with a CLN node, if the peer disconnects and reconnects the following occurs on the CLN node.</p>
<p><img src="/images/cln_channel_open_no_race2.png" alt="channel exists diagram" /></p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the new peer connection.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> calls a plugin hook notifying the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin about the new peer connection.</li>
<li><code class="language-plaintext highlighter-rouge">chanbackup</code> notifies <code class="language-plaintext highlighter-rouge">lightningd</code> that it is done running the hook.</li>
<li>With the hook finished, <code class="language-plaintext highlighter-rouge">lightningd</code> recognizes that a previous channel exists with the peer and launches a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage it.</li>
</ol>
<h2 id="the-race-condition">The Race Condition</h2>
<p>Problems arise when the peer connection flow overlaps with the channel open flow, causing <code class="language-plaintext highlighter-rouge">lightningd</code> to attempt launching the same <code class="language-plaintext highlighter-rouge">channeld</code> daemon twice.
This can happen if the peer quickly opens a channel after connecting, and the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin is delayed in handling the peer connection hook, leading to the following interactions on the CLN node.</p>
<p><img src="/images/cln_channel_open_race.png" alt="channel open race diagram" /></p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the new peer connection.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> calls a plugin hook notifying the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin about the new peer connection.</li>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the channel open request.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a new <code class="language-plaintext highlighter-rouge">openingd</code> daemon to handle the channel open negotiation.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> completes the channel open negotiation up to the point where the funding outpoint is known.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> sends the funding outpoint to <code class="language-plaintext highlighter-rouge">lightningd</code> and exits.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage the new channel.</li>
<li><code class="language-plaintext highlighter-rouge">chanbackup</code> notifies <code class="language-plaintext highlighter-rouge">lightningd</code> that it is done running the hook.</li>
<li>With the hook finished, <code class="language-plaintext highlighter-rouge">lightningd</code> recognizes that a previous channel exists with the peer and attempts to launch a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage it. <strong>Since the daemon is already running, an assertion failure occurs and CLN crashes.</strong></li>
</ol>
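<p>The double launch in step 9 can be modeled with a toy sketch (illustrative Go, not CLN's actual C code; <code class="language-plaintext highlighter-rouge">launchChanneld</code> is a hypothetical stand-in for <code class="language-plaintext highlighter-rouge">lightningd</code>'s subdaemon bookkeeping):</p>

```go
package main

import "fmt"

// running tracks which channels already have a channeld subdaemon,
// mirroring lightningd's per-channel bookkeeping.
var running = map[string]bool{}

// launchChanneld models pre-23.08 lightningd: it assumes no channeld
// exists for the channel before launching one.
func launchChanneld(channelID string) error {
	if running[channelID] {
		// Pre-23.08 CLN hit an assertion failure here and crashed.
		return fmt.Errorf("assertion failed: channeld already running for %s", channelID)
	}
	running[channelID] = true
	return nil
}

func main() {
	// Step 7: the channel open flow launches channeld for the new channel.
	fmt.Println(launchChanneld("chan_1")) // <nil>

	// Step 9: the delayed peer-connected hook finally returns, and
	// lightningd tries to launch channeld for the "existing" channel.
	fmt.Println(launchChanneld("chan_1"))
}
```

Ordinarily the hook returns before any channel exists, so the second launch never happens; the attack works by delaying the hook until after step 7.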
<h1 id="the-dos-attack">The DoS Attack</h1>
<p>To reliably trigger the assertion failure, an attacker needs to somehow slow down the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin so that a channel can be opened before the plugin finishes running the peer connected hook.
One way to do this is to overload <code class="language-plaintext highlighter-rouge">chanbackup</code> with many peer connections and channel state changes.
As it turns out, the <a href="/lightning/fake-channel-dos/">fake channel DoS attack</a> is a trivial and free method of generating these events and overloading <code class="language-plaintext highlighter-rouge">chanbackup</code>.</p>
<p>On a local network with low latency, I was able to generate enough load on <code class="language-plaintext highlighter-rouge">chanbackup</code> to consistently crash CLN nodes in under 5 seconds.
In the real world the attack would be carried out across the Internet with higher latencies, so more load on <code class="language-plaintext highlighter-rouge">chanbackup</code> would be required to trigger the race condition.
In my experiments, crashing CLN nodes across the Internet took around 30 seconds.</p>
<h1 id="the-defense">The Defense</h1>
<p>To prevent the assertion failure from triggering, a <a href="https://github.com/ElementsProject/lightning/commit/af394244914e69a5b2f16e1f10ef412217f9714a">small patch</a> was added to <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.08">CLN 23.08</a> that checks if a <code class="language-plaintext highlighter-rouge">channeld</code> is already running when the peer connected hook returns.
If so, <code class="language-plaintext highlighter-rouge">lightningd</code> does not attempt to start the <code class="language-plaintext highlighter-rouge">channeld</code> again.</p>
<p>Note that this patch does not actually remove the race condition, though it does prevent crashing when the race occurs.</p>
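<p>The guard can be sketched as follows (illustrative Go, not the actual C patch): instead of asserting, <code class="language-plaintext highlighter-rouge">lightningd</code> checks for an existing <code class="language-plaintext highlighter-rouge">channeld</code> and returns early:</p>

```go
package main

import "fmt"

// running tracks which channels already have a channeld subdaemon.
var running = map[string]bool{}

// launchChanneld models the patched behavior in CLN 23.08: if a
// channeld already exists for the channel, skip the relaunch instead
// of hitting an assertion.
func launchChanneld(channelID string) {
	if running[channelID] {
		fmt.Printf("channeld already running for %s; skipping relaunch\n", channelID)
		return
	}
	running[channelID] = true
	fmt.Printf("launched channeld for %s\n", channelID)
}

func main() {
	launchChanneld("chan_1") // channel open flow (step 7)
	launchChanneld("chan_1") // peer-connected hook returns (step 9): no crash
}
```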
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during follow-up testing prior to the <a href="https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004064.html">disclosure</a> of the fake channel DoS vector.
At the time, Rusty and I agreed to move forward with the planned disclosure of the fake channel DoS vector, but to delay disclosure of this channel open race until a later date.</p>
<p>Since the channel open race can be triggered by the fake channel DoS attack, it is a valid question how the race went undiscovered during the implementation of defenses against that attack.
The answer is that the race was actually untriggerable until a few weeks <em>after</em> the fake channel DoS defenses were merged.</p>
<p>While the race condition was <a href="https://github.com/ElementsProject/lightning/pull/5078">introduced</a> in March 2022, the race couldn’t actually trigger because no plugins used the peer connected hook.
It wasn’t until February 2023 that the race was exposed, when the <a href="https://github.com/ElementsProject/lightning/pull/5361">peer storage backup</a> feature made <code class="language-plaintext highlighter-rouge">chanbackup</code> the first official plugin to use the hook.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2022-03-23:</strong> Race condition <a href="https://github.com/ElementsProject/lightning/pull/5078">introduced</a> to CLN 0.11.</li>
<li><strong>2022-12-15:</strong> Fake channel DoS vector disclosed to Blockstream.</li>
<li><strong>2023-01-21:</strong> Fake channel DoS defenses fully merged [<a href="https://github.com/ElementsProject/lightning/pull/5837">1</a>, <a href="https://github.com/ElementsProject/lightning/pull/5849">2</a>].</li>
<li><strong>2023-02-08:</strong> Peer storage backup feature <a href="https://github.com/ElementsProject/lightning/pull/5361">introduced</a>, exposing the channel open race vulnerability.</li>
<li><strong>2023-03-03:</strong> CLN 23.02 released.</li>
<li><strong>2023-07-28:</strong> Rusty gives the OK to disclose the fake channel DoS vector.</li>
<li><strong>2023-08-14:</strong> Follow-up testing reveals the channel open race vulnerability. Disclosed to Blockstream.</li>
<li><strong>2023-08-21:</strong> Defense against the channel open race DoS <a href="https://github.com/ElementsProject/lightning/commit/af394244914e69a5b2f16e1f10ef412217f9714a">merged</a>.</li>
<li><strong>2023-08-22:</strong> Rusty gives the OK to continue with the fake channel DoS disclosure, but requests that the channel open race vulnerability be omitted from the disclosure.</li>
<li><strong>2023-08-23:</strong> <a href="https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004064.html">Public disclosure</a> of the fake channel DoS.</li>
<li><strong>2023-08-23:</strong> CLN 23.08 released.</li>
<li><strong>2023-12-04:</strong> Rusty gives the OK to disclose the channel open race vulnerability.</li>
<li><strong>2024-01-08:</strong> Public disclosure.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>This vulnerability could have been prevented by a couple software engineering best practices.</p>
<h2 id="avoid-race-conditions">Avoid Race Conditions</h2>
<p>The <a href="https://github.com/ElementsProject/lightning/pull/2341">original purpose</a> of the peer connected hook was to enable plugins to filter and reject incoming connections from certain peers.
Therefore the hook was designed to be <em>synchronous</em>, and all other events initiated by the peer were blocked until the hook returned.
Unfortunately, <a href="https://github.com/ElementsProject/lightning/pull/5078">PR 5078</a> destroyed that property of the hook by introducing a <em>known</em> race condition to the code (search for “this is racy” in commit <a href="https://github.com/ElementsProject/lightning/commit/2424b7dea899e11b33ee4da9f95836e058db4a0c">2424b7d</a>).
If PR 5078 hadn’t done this, there would be no race condition to exploit and this vulnerability would never have existed.</p>
<p>Race conditions can be nasty and should be avoided whenever possible.
Knowingly adding race conditions where they didn’t previously exist is generally a bad idea.</p>
<h2 id="do-stress-testing">Do Stress Testing</h2>
<p>When I disclosed the fake channel DoS vector to Blockstream, I also provided a DoS program that demonstrated the attack.
That same DoS program revealed the channel open race vulnerability after it became triggerable in February 2023.
If a stress test based on the DoS program had been added to CLN’s CI pipeline or release process, this vulnerability could have been caught much earlier, before it was included in any releases.</p>
<p>In general, there is some difficulty in releasing such a test publicly while the vulnerability it tests for is still secret.
In such situations the test can remain unreleased until the vulnerability has been publicly disclosed, and in the meantime the test can be run privately during the release process to ensure no regressions have been introduced.
In CLN’s case, this may have been unnecessary – a stress test could have plausibly been added to <a href="https://github.com/ElementsProject/lightning/pull/5849">PR 5849</a> without raising suspicion.</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Avoid race conditions.</li>
<li>Use regression and stress testing.</li>
<li>Update your CLN nodes to at least v23.08.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/cln-channel-open-race/">DoS: Channel Open Race in CLN</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on January 08, 2024.</p>
<p>Several invoice parsing bugs were fixed in <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.11">CLN 23.11</a>, including bugs that caused crashes, undefined behavior, and use of uninitialized memory.
These bugs could be reliably triggered by specially crafted invoices, enabling a
malicious counterparty to crash the victim’s node upon invoice payment.</p>
<p>The parsing bugs were discovered by a new fuzz test written by <a href="https://github.com/dergoegge">Niklas Gögge</a> and enhanced by me.</p>
<h1 id="bugs-fixed-in-v2311">Bugs fixed in v23.11</h1>
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">#</th>
<th> </th>
<th style="text-align: left">Type</th>
<th style="text-align: left">Root Cause</th>
<th style="text-align: left">Fix</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><strong>1</strong></td>
<td> </td>
<td style="text-align: left">undefined behavior</td>
<td style="text-align: left">unchecked return value</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/eeec5290316fa78974c4ef5e8cfb7bdf7a08c09c">eeec529</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>2</strong></td>
<td> </td>
<td style="text-align: left">use of uninitialized memory</td>
<td style="text-align: left">missing check for 0-length TLV</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/ee501b035b2e8340476984d0063fda3f954d7f51">ee501b0</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>3</strong></td>
<td> </td>
<td style="text-align: left">crash</td>
<td style="text-align: left">unnecessary assertion</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/ee8cf69f281c78d47b838200c690378b0b3918a4">ee8cf69</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>4</strong></td>
<td> </td>
<td style="text-align: left">crash</td>
<td style="text-align: left">missing recovery ID validation</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/c1f20687a6babbd2ded354553936889ebda8f142">c1f2068</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>5</strong></td>
<td> </td>
<td style="text-align: left">crash</td>
<td style="text-align: left">missing pubkey validation</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/87f4907bb40a38e06254ef9b9a3600f58f3a3f5b">87f4907</a></td>
</tr>
</tbody>
</table>
<h1 id="the-fuzz-target">The fuzz target</h1>
<p>The fuzz target that uncovered these bugs was <a href="https://github.com/ElementsProject/lightning/pull/6750">initially written</a> by Niklas Gögge in December 2022, though it wasn’t made public until October 2023.
The target simply provides fuzzer-generated inputs to CLN’s invoice decoding function, similar to fuzz targets written for other implementations [<a href="https://github.com/lightningdevkit/rust-lightning/blob/c2bbfffb1eb249c2c422cf2e9ccac97a34275f7a/fuzz/src/invoice_deser.rs">1</a>, <a href="https://github.com/lightningnetwork/lnd/blob/27319315bb21130f9618877da5d9acda6d6ab453/zpay32/fuzz_test.go#L44-L49">2</a>].</p>
<p>To improve the fuzzer’s efficiency, Niklas also wrote a custom mutator for the target.
Invoices are encoded in <a href="https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki">bech32</a>, which requires a valid checksum at the end of the encoding, making it quite difficult for fuzzers to generate valid bech32 consistently.
As a result, bech32-naive fuzzers will generally get stuck at the bech32 decoding stage and have a hard time exploring deeper into the invoice parsing logic.
Niklas’ custom mutator teaches the fuzzer how to generate valid bech32 so that it can focus its fuzzing on invoice parsing.</p>
<h2 id="initial-fuzzing-in-2022">Initial fuzzing in 2022</h2>
<p>After writing the fuzz target in December 2022, Niklas privately reported several bugs to CLN including a stack buffer overflow, an assertion failure, and undefined behavior due to a 0-length array.
Many of the bugs were fixed in <a href="https://github.com/ElementsProject/lightning/pull/5891">PR 5891</a> and released in <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.02">CLN 23.02</a>.</p>
<h2 id="merging-the-fuzz-target-in-2023">Merging the fuzz target in 2023</h2>
<p>In October 2023, Niklas submitted his fuzz target for review in <a href="https://github.com/ElementsProject/lightning/pull/6750">PR 6750</a>.
The initial corpus in that PR actually triggered bugs 1 and 2, but Niklas didn’t notice because he had been fuzzing with some UBSan options misconfigured.
CLN’s CI didn’t detect the bugs either, since <a href="https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html">UBSan</a> had previously been <a href="https://github.com/ElementsProject/lightning/commit/364de0094713ed388daa0fc0de7f18c41d1759f0">accidentally disabled</a> in CI.</p>
<p>Niklas also discovered bug 3 during initial fuzzing, but he initially thought it was a false report and hard-coded an exception for it in the fuzz target.</p>
<h2 id="enhancements">Enhancements</h2>
<p>The initial fuzz target only fuzzed the invoice decoding logic, skipping signature checks.
I <a href="https://github.com/ElementsProject/lightning/commit/4b29502098535f0aa00a46fe5692a2d49bb5ebce">modified</a> the target to also run the signature-checking logic, which enabled the fuzzer to quickly find bug 4.</p>
<p>While bug 5 should have also been discoverable by the fuzzer after this change, it remained undetected even after many weeks of CPU time.
It wasn’t until I added a <a href="https://github.com/ElementsProject/lightning/pull/6805">custom cross-over mutator</a> for the fuzz target that bug 5 was discovered.
The cross-over mutator is based on Niklas’ custom mutator and simply combines pieces from multiple bech32-decoded invoices before re-encoding the result in bech32.
Within a few CPU hours of fuzzing with this extra mutator, the fuzzer found bug 5.</p>
<h1 id="impact">Impact</h1>
<p>The severity of these bugs seems relatively low since they can only be triggered when paying an invoice.
If a malicious invoice causes your node to crash, as long as you can restart your node in a timely manner and avoid paying any more invoices from the malicious counterparty, no further harm can be done.</p>
<p>Since bug 2 involves uninitialized memory it could potentially be more serious, as a sophisticated attacker <em>may</em> be able to extract sensitive data from the invoice-decoding process.
Such an attack would be quite complex, and it is unclear whether it would even be possible in practice.
It’s also unclear exactly what sensitive data could be extracted, since CLN handles private keys in a separate dedicated process (the <code class="language-plaintext highlighter-rouge">hsmd</code> daemon).</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Fuzz testing is an essential component of writing robust and secure software. Any API that consumes untrusted inputs should be fuzz tested.</li>
<li>Custom mutators can be very powerful for fuzzing deeper logic in the codebase.</li>
<li>Fuzz testing of C or C++ code should use both ASan <em>and</em> UBSan. MSan and valgrind can also be useful.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/cln-invoice-parsing/">Invoice Parsing Bugs in CLN</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 08, 2023.</p>
<p>Lightning nodes released prior to the following versions are susceptible to a
DoS attack involving the creation of large numbers of fake channels:</p>
<ul>
<li><a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.16.0-beta">LND 0.16.0</a></li>
<li><a href="https://github.com/ElementsProject/lightning/releases/tag/v23.02">CLN 23.02</a></li>
<li><a href="https://github.com/ACINQ/eclair/releases/tag/v0.9.0">eclair 0.9.0</a></li>
<li><a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.0.114">LDK 0.0.114</a></li>
</ul>
<p>If you are running node software older than this, your funds may be at risk!
Update to at least the above versions to help protect your node.</p>
<h1 id="the-vulnerability">The vulnerability</h1>
<p>When one lightning node (the funder) wishes to open a channel to another node
(the fundee), the following sequence of events takes place:</p>
<p><img src="/images/channel_funding.png" alt="channel funding diagram" /></p>
<ol>
<li>The funder sends an <code class="language-plaintext highlighter-rouge">open_channel</code> message with the desired parameters for
the channel.</li>
<li>The fundee checks that the channel parameters are reasonable and then sends
an <code class="language-plaintext highlighter-rouge">accept_channel</code> message.</li>
<li>The funder creates the funding transaction and sends a <code class="language-plaintext highlighter-rouge">funding_created</code>
message containing the funding outpoint and their signature for the
commitment transaction.</li>
<li>The fundee verifies the funder’s commitment signature and sends
<code class="language-plaintext highlighter-rouge">funding_signed</code> with their own signature for the commitment. The fundee
begins watching the chain for the funding transaction.</li>
<li>The funder verifies the fundee’s commitment signature, broadcasts the funding
transaction, and then watches for it to show up onchain.</li>
<li>Both nodes send <code class="language-plaintext highlighter-rouge">channel_ready</code> once the funding transaction has enough
confirmations. Payments can now be sent across the channel.</li>
</ol>
<p><em>But what happens if the funder doesn’t broadcast the funding transaction in
step 5?</em></p>
<p><img src="/images/channel_funding_dos.png" alt="channel funding DoS diagram" /></p>
<p>The fundee, eager for inbound liquidity, is willing to wait for the
funding transaction to confirm for a period of time. But eventually the fundee
needs to give up on the pending channel and reclaim the resources allocated to
it. BOLT 2 recommends waiting for <a href="https://github.com/lightning/bolts/blob/7d3ef5a6b20eb84982ea2bfc029497082adf20d8/02-peer-protocol.md#the-channel_ready-message">2016 blocks</a>
(2 weeks) before abandoning the pending channel.</p>
<p><strong>Thus for 2 weeks the fundee devotes some amount of database storage, RAM,
and CPU time to watching for the pending channel to confirm.</strong></p>
<h1 id="the-fake-channel-dos-attack">The fake channel DoS attack</h1>
<p>An attacker can force a victim node to consume a small amount of resources
by opening a fake channel with the victim and never publishing a funding
transaction onchain. If the attacker can create lots of fake channels, they can
lock up lots of the victim’s resources.</p>
<p>Fake channels are trivial to create. Since there is no way for the victim
to verify the funding outpoint sent to them in the <code class="language-plaintext highlighter-rouge">funding_created</code> message,
the attacker doesn’t even need to construct a real funding transaction. They
can use a randomly-generated funding transaction ID and sign a commitment
transaction based on that fake ID. The victim will successfully verify the
commitment signature against the provided (fake) funding outpoint and gladly
allocate resources for the fake pending channel.</p>
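<p>A toy model makes the problem concrete: the commitment signature binds to whatever outpoint the attacker claims, but proves nothing about whether that outpoint exists onchain. HMAC stands in for real ECDSA signing here, and all names are illustrative:</p>

```python
# Toy illustration (not real crypto): the fundee verifies the commitment
# signature *against the outpoint the attacker supplied*, so a signature
# over a randomly generated txid verifies just fine. HMAC stands in for
# ECDSA; names are illustrative, not from any implementation.

import hashlib
import hmac
import os

attacker_key = os.urandom(32)


def sign_commitment(key: bytes, funding_outpoint: str) -> bytes:
    return hmac.new(key, funding_outpoint.encode(), hashlib.sha256).digest()


def verify_commitment(key: bytes, funding_outpoint: str, sig: bytes) -> bool:
    return hmac.compare_digest(sign_commitment(key, funding_outpoint), sig)


# The attacker invents a funding txid that will never exist onchain...
fake_outpoint = os.urandom(32).hex() + ":0"
sig = sign_commitment(attacker_key, fake_outpoint)

# ...and the victim's check still passes, because nothing in it proves
# the outpoint exists or is spendable.
assert verify_commitment(attacker_key, fake_outpoint, sig)
```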
<p>Opening lots of these fake channels is also trivial against node software
older than the above releases. Some older node implementations do impose a
limit on the number of pending channels allowed per peer, but such limits are
easily bypassed by using a new attacker node ID for each fake channel.</p>
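<p>A minimal sketch shows why a per-peer limit alone falls short. The limit value and function names here are hypothetical, not taken from any implementation:</p>

```python
# Sketch of bypassing a per-peer pending-channel limit: the attacker
# simply uses a fresh node ID (key pair) for each fake channel.
# The limit and names are hypothetical.

import os
from collections import defaultdict

MAX_PENDING_PER_PEER = 1  # hypothetical per-peer limit

pending = defaultdict(int)


def try_open(node_id: bytes) -> bool:
    """Accept a pending channel unless this peer is at the limit."""
    if pending[node_id] >= MAX_PENDING_PER_PEER:
        return False
    pending[node_id] += 1
    return True


# A single node ID hits the limit immediately...
attacker = os.urandom(33)
assert try_open(attacker) and not try_open(attacker)

# ...but a new node ID per channel sails right past it.
accepted = sum(try_open(os.urandom(33)) for _ in range(1000))
assert accepted == 1000
```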
<h1 id="dos-effects">DoS effects</h1>
<p>In my experiments, I was able to create hundreds of thousands of fake channels
against victim nodes (owned by me), with all kinds of adverse effects. In some
cases, funds were clearly at risk of being stolen due to the victim node’s
inability to respond to cheating attempts.</p>
<p>Here’s how the DoS attack affected each node implementation.</p>
<h2 id="lnd">LND</h2>
<p>Over the course of a couple days, LND’s performance degraded so drastically that
it stopped responding to requests from its peers or from the CLI. The
performance degradation continued on restart, even if the attacker was no longer
actively DoSing.</p>
<p>I didn’t continue the DoS experiment for more than a couple days, but it’s very
possible that with enough time the victim node would have become unresponsive
enough that funds could be stolen without consequence.</p>
<h2 id="cln">CLN</h2>
<p>After one day of the DoS attack, CLN’s <code class="language-plaintext highlighter-rouge">connectd</code> daemon was completely blocked
and unable to respond to connection requests from other nodes. Most other
functionality of CLN continued to work, and funds were not at risk since the
separate <code class="language-plaintext highlighter-rouge">lightningd</code> daemon was not blocked by the DoS attack.</p>
<h2 id="eclair">eclair</h2>
<p>One day into the DoS, eclair OOM crashed. After that, every time eclair
restarted, it OOM crashed again within 30 minutes, even if the attacker was no
longer actively DoSing. Funds were clearly at risk, since an offline node
cannot catch cheating attempts.</p>
<h2 id="ldk">LDK</h2>
<p>Since LDK is a library and not a full node implementation, it was trickier to
experiment with. LDK Node didn’t exist at the time, but I found the
<a href="https://github.com/lightningdevkit/ldk-sample">ldk-sample</a> node and modified
it to run on mainnet for the experiment.</p>
<p>Within hours of the DoS attack, ldk-sample’s performance degraded drastically,
causing it to fall out of sync with the blockchain. A few days later, ldk-sample’s view
of the blockchain was pinned more than 144 blocks in the past, preventing it
from responding to cheating attempts before the attacker’s CSV timelock
expired.</p>
<h1 id="dos-defenses">DoS defenses</h1>
<p>I reported the DoS vector to the four major lightning implementations around
the start of 2023. eclair and LDK were already aware of the potential DoS
vector but hadn’t realized the severity of the vulnerability. Within days of
receiving my report, every lightning implementation began working on defenses,
some openly and others in secret.</p>
<p>All implementations have now shipped releases with defenses against the DoS. If
you’re interested in the technical details of the defenses, see the linked pull
requests and commits.</p>
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">Date Reported</th>
<th style="text-align: left">Implementation</th>
<th style="text-align: left">Defenses</th>
<th style="text-align: left">Release</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">2022-12-12</td>
<td style="text-align: left">LND</td>
<td style="text-align: left">pending channel limit [<a href="https://github.com/lightningnetwork/lnd/commit/3f6315242a7ceb160c12f6997f5c020362424877">1</a>]</td>
<td style="text-align: left">0.16.0</td>
</tr>
<tr>
<td style="text-align: left">2022-12-15</td>
<td style="text-align: left">CLN</td>
<td style="text-align: left">significant performance improvements [<a href="https://github.com/ElementsProject/lightning/pull/5837">1</a>, <a href="https://github.com/ElementsProject/lightning/pull/5849">2</a>]</td>
<td style="text-align: left">23.02</td>
</tr>
<tr>
<td style="text-align: left">2022-12-28</td>
<td style="text-align: left">eclair</td>
<td style="text-align: left">pending channel and peer limits [<a href="https://github.com/ACINQ/eclair/pull/2552">1</a>, <a href="https://github.com/ACINQ/eclair/pull/2601">2</a>]</td>
<td style="text-align: left">0.9.0</td>
</tr>
<tr>
<td style="text-align: left">2023-01-17</td>
<td style="text-align: left">LDK</td>
<td style="text-align: left">pending channel and peer limits [<a href="https://github.com/lightningdevkit/rust-lightning/pull/1988">1</a>]</td>
<td style="text-align: left">0.0.114</td>
</tr>
</tbody>
</table>
<h1 id="lessons">Lessons</h1>
<h2 id="use-watchtowers">Use watchtowers</h2>
<p>When all else fails, watchtowers help to protect funds if your lightning node is
incapacitated by a DoS attack. If you have significant funds at risk, it’s
cheap insurance to run a private watchtower on a separate machine.</p>
<h2 id="multiple-processes">Multiple processes</h2>
<p>Prior to the above releases, CLN was the only lightning implementation that
clearly kept user funds safe while
under DoS, because CLN actually runs as multiple separate daemon processes. In
the case of this DoS attack, the <code class="language-plaintext highlighter-rouge">connectd</code> daemon responsible for handling
peer connections became locked up while the <code class="language-plaintext highlighter-rouge">lightningd</code> daemon watching the
blockchain was relatively unaffected.</p>
<p>Multiprocess architectures in general provide some defense against DoS, as
one process slowing down or crashing doesn’t automatically bring down the other
processes. For this reason, other implementations may want to consider
splitting their nodes into separate processes. CLN could also improve
robustness further by attempting to restart DoS-able subdaemons like <code class="language-plaintext highlighter-rouge">connectd</code>
and <code class="language-plaintext highlighter-rouge">gossipd</code> if they crash, rather than shutting the whole node down.</p>
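<p>The restart-on-crash idea can be sketched in a few lines. This is a generic supervisor sketch, not CLN’s actual behavior; the daemon commands, restart limit, and backoff policy are all assumptions for illustration:</p>

```python
# Generic sketch of restart-on-crash supervision for a subdaemon,
# as suggested above. The restart limit and backoff are illustrative.

import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 5) -> int:
    """Run cmd, restarting it on nonzero exit up to max_restarts times."""
    restarts = 0
    while True:
        rc = subprocess.call(cmd)
        if rc == 0 or restarts >= max_restarts:
            return rc
        restarts += 1
        time.sleep(min(2 ** restarts, 60))  # exponential backoff


# A process that exits cleanly is not restarted.
assert supervise([sys.executable, "-c", "pass"]) == 0

# With no restarts allowed, a crashing process's exit code is returned.
assert supervise([sys.executable, "-c", "raise SystemExit(3)"],
                 max_restarts=0) == 3
```

A real deployment would also want crash-rate limiting and clean shutdown handling, but the core loop is this simple.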
<h2 id="more-security-auditing-needed">More security auditing needed</h2>
<p>I discovered this DoS vector last year. I had been reviewing the dual funding
protocol and found a <a href="https://github.com/lightning/bolts/pull/851#discussion_r997537630">griefing
attack</a>
involving fake dual-funded channels. After discussing the attack with Bastien
Teinturier, I came to realize that a similar attack may also affect the
single-funded protocol.</p>
<p>But for a couple of months I convinced myself that such a trivial attack
surely would have been defended against already. It wasn’t until I spent some
time studying the implementations’ funding code that I realized there were no
defenses.</p>
<p>The fact that this DoS vector went unnoticed since the beginning of the
Lightning Network should make everyone a little scared. If a newcomer like me
could discover this vulnerability in a couple months, there are probably many
other vulnerabilities in the Lightning Network waiting to be found and
exploited.</p>
<p>For quite some time, it seems that security and robustness have not been the
top priority for node implementations, with some implementations not even
having security policies until 6-10 months ago
[<a href="https://github.com/lightningnetwork/lnd/commit/609cc8b883c7e6186e447e8d7e6349688d78d4fd">1</a>,
<a href="https://github.com/ElementsProject/lightning/commit/e29fd2a8e26d655a7fb0f8b1c18092c2cdd787da">2</a>].
Everyone wants new lightning
features: dual funded channels, Taproot channels, splicing, BOLT 12, etc.
And those things are important. But every one of them introduces more
complexity and more potential attack surface. If we’re going to make lightning
even more complex, we also need to ramp up the engineering effort we put towards
making the network secure and robust.</p>
<p>Because in the end it doesn’t matter how feature-rich and easy-to-use the
Lightning Network is if it can’t keep user funds safe.</p>
<p><a href="https://morehouse.github.io/lightning/fake-channel-dos/">DoS: Fake Lightning Channels</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on August 23, 2023.</p>