<h1 id="lnd-replacement-stalling-attack">LND: Replacement Stalling Attack</h1>
<p>A vulnerability in LND versions 0.18.5 and below allows attackers to steal node funds.
Users should immediately upgrade to <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> or later to protect their funds.</p>
<h2 id="background">Background</h2>
<p>LND has a <a href="/lightning/lnd-deadline-aware-budget-sweeper/"><em>sweeper</em> subsystem</a> for managing transaction batching and fee bumping.
When a Lightning channel is force closed, the sweeper kicks into action, collecting HTLC inputs into batched claim transactions to save on mining fees.
The sweeper then periodically bumps the fees paid by those transactions until the transactions confirm.</p>
<p>It is critical that the sweeper gets certain HTLC transactions confirmed before the corresponding upstream HTLCs expire, or else the value of those HTLCs can be completely lost.
For this reason, a fairly aggressive default fee-bumping strategy is used, and as upstream HTLC deadlines approach, the sweeper is willing to spend up to half the value of those HTLCs in mining fees.</p>
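<p>As a rough sketch (the names and the linear ramp here are illustrative assumptions, not LND's actual API), deadline-aware fee bumping can be modeled as a feerate that climbs from a normal estimate toward the feerate that spends the full budget, half the HTLC value, as the deadline arrives:</p>

```python
def sweep_feerate(htlc_value_sat: int, tx_size_vb: float,
                  blocks_until_deadline: int, deadline_window: int,
                  start_feerate: float) -> float:
    """Illustrative linear fee function (not LND's implementation):
    ramp from start_feerate to the feerate that spends the full
    budget (half the HTLC value) when the deadline arrives."""
    budget = htlc_value_sat / 2            # max total fees: half the HTLC value
    end_feerate = budget / tx_size_vb      # feerate that spends the whole budget
    progress = 1 - blocks_until_deadline / deadline_window
    return start_feerate + (end_feerate - start_feerate) * max(0.0, progress)
```

<p>For a 1,000,000 sat HTLC swept by a 166.5 vB input, the feerate in this model climbs from the starting estimate to roughly 3,000 sat/vB at the deadline.</p>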
<h2 id="sweeper-weaknesses">Sweeper Weaknesses</h2>
<p>LND’s aggressive fee bumping could be thwarted, however, due to a couple of weaknesses in the sweeper.</p>
<h3 id="fee-resets-on-reaggregation">Fee Resets on Reaggregation</h3>
<p>If an input to a batched transaction was double-spent by someone else, the sweeper would regroup the remaining inputs into a new transaction and <a href="https://github.com/lightningnetwork/lnd/issues/9422"><em>reset the fees paid</em></a> by that transaction to the minimum value of the fee function.
If this happened many times, the sweeper would end up broadcasting transactions with much lower fees than intended, and upstream deadlines could be missed.</p>
<h3 id="broadcast-delays">Broadcast Delays</h3>
<p>Additionally, the regrouping of inputs after a double spend would be delayed until the <a href="https://github.com/lightningnetwork/lnd/pull/8717#issuecomment-2099089198"><em>next</em> block confirmed</a>.
So if lots of double spends happened, the sweeper would miss out on 50% of the available opportunities to get its time-sensitive transactions confirmed.
Once again, this could cause upstream deadlines to be missed and funds to be lost.</p>
<h2 id="a-basic-replacement-stalling-attack">A Basic Replacement Stalling Attack</h2>
<p>An attacker could take advantage of these sweeper weaknesses to steal funds.
The basic idea is to cause the sweeper to batch many HTLC inputs together, then repeatedly double spend those inputs, causing the sweeper to keep regrouping the remaining inputs into new transactions.
Each double spend prevents the sweeper’s transaction from confirming for at least 2 blocks, while also resetting the fees paid by the next sweeper transaction to the minimum, so future double spends remain cheap.
After upstream HTLC timelocks expire, all remaining HTLCs could be stolen.</p>
<p>An attack would look like this:</p>
<ol>
<li>The attacker opens a direct channel to the victim and routes ~40 HTLCs to themselves through the victim, using the minimum CLTV delta the victim allows (80 blocks by default). The attacker intends to steal the last HTLC, so they make that one as large as possible.</li>
<li>The attacker holds the HTLCs until they expire and the victim force closes the channel to reclaim them. At this point, the 80-block countdown to the upstream deadline starts, and the attacker needs to stall the victim for that long to steal funds.</li>
<li>Because all 40 of the attacker’s HTLCs have the same upstream deadline, the victim’s sweeper batches all 40 HTLC-Timeouts into a single transaction and broadcasts it.</li>
<li>The attacker sees the batched transaction in their mempool and immediately replaces the transaction with a preimage spend for one of the 40 HTLCs.</li>
<li>The double-spend confirms, and the victim is able to extract the HTLC preimage and settle the corresponding upstream HTLC, but the remaining 39 HTLC-Timeouts are not reaggregated until <em>another</em> block confirms (see the section “Broadcast Delays” above).</li>
<li>Another block confirms, and the victim broadcasts a new transaction containing the remaining HTLC-Timeouts. The fees for this transaction are reset to the minimum value of the fee function. The attacker repeats the process from Step 4, double-spending a new HTLC each time until the upstream deadline has passed.</li>
<li>The attacker steals the remaining HTLC(s) by claiming the preimage path downstream and the timeout path upstream.</li>
</ol>
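<p>The numbers line up: each double spend stalls the victim for at least 2 blocks (one for the double spend to confirm, one lost to the reaggregation delay), so ~40 HTLCs suffice to cover the default 80-block CLTV delta. A quick sanity check:</p>

```python
BLOCKS_PER_DOUBLE_SPEND = 2   # 1 block to confirm + 1 block of reaggregation delay

def double_spends_needed(cltv_delta: int) -> int:
    """Double spends required to stall past the upstream deadline."""
    return -(-cltv_delta // BLOCKS_PER_DOUBLE_SPEND)   # ceiling division

print(double_spends_needed(80))   # 40
```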
<h3 id="attack-cost">Attack Cost</h3>
<p>In the worst case, the attacker must perform ~40 replacements, each of which must pay more in total fees than the batched transaction it replaces.
We can calculate the fees of each batched HTLC-Timeout transaction as <code class="language-plaintext highlighter-rouge">size * feerate</code>, where <code class="language-plaintext highlighter-rouge">size</code> and <code class="language-plaintext highlighter-rouge">feerate</code> are estimated as follows:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">size</code>: <code class="language-plaintext highlighter-rouge">num_htlcs * 166.5 vB</code></li>
<li><code class="language-plaintext highlighter-rouge">feerate</code>: minimum value of LND’s fee function. By default, this is the value returned by bitcoind’s <code class="language-plaintext highlighter-rouge">estimatesmartfee</code> RPC.</li>
</ul>
<p>Today, <code class="language-plaintext highlighter-rouge">estimatesmartfee</code> returns feerates between 0.7 sat/vB and 2.1 sat/vB depending on the confirmation target.
To simplify calculations, we assume an average feerate of 1.4 sat/vB over the course of the attack.
We also assume on average there are 20 HTLCs present on the batched transaction, since it starts with 40 HTLCs and decreases by 1 every 2 blocks until a single HTLC remains.
With these simplifying assumptions, we get a rough cost as follows:</p>
<ul>
<li>average cost per replacement: <code class="language-plaintext highlighter-rouge">20 HTLCs * 166.5 vB/HTLC * 1.4 sat/vB = 4,662 sat</code></li>
<li>total attack cost: <code class="language-plaintext highlighter-rouge">40 replacements * 4,662 sat/replacement = 186,480 sat</code></li>
</ul>
<p><strong>So for less than 200k sats, the attacker can steal essentially the entire channel capacity.</strong></p>
<h3 id="optimizations">Optimizations</h3>
<p>In practice, the attack costs even less: the attacker’s double spends may take multiple blocks to confirm, and each unconfirmed double spend continues to stall the victim, so fewer than 40 double spends actually need to confirm.
The attacker can also intentionally reduce the probability of confirmation by inflating the size of their double-spend transactions to the maximum possible while still replacing the victim’s transactions.</p>
<p>Additionally, a smart attacker, knowing they need fewer double spends to confirm, can reduce the number of HTLCs they route at the start of the attack.
As a result, the victim’s batched transactions become smaller and the attacker can save on replacement fees.</p>
<p>For example, suppose the attacker can stall for 80 blocks with only 30 double spends.
Then the cost of the attack is reduced by over 40%:</p>
<ul>
<li>average cost per replacement: <code class="language-plaintext highlighter-rouge">15 HTLCs * 166.5 vB/HTLC * 1.4 sat/vB = 3,497 sat</code></li>
<li>total attack cost: <code class="language-plaintext highlighter-rouge">30 replacements * 3,497 sat/replacement = 104,910 sat</code></li>
</ul>
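<p>The cost arithmetic above can be reproduced with a short script (the 166.5 vB per HTLC-Timeout input and the 1.4 sat/vB average feerate are the same assumptions as above):</p>

```python
HTLC_TIMEOUT_VBYTES = 166.5   # estimated vsize contributed per HTLC-Timeout input
AVG_FEERATE = 1.4             # assumed average feerate over the attack, sat/vB

def attack_cost_sat(replacements: int, avg_htlcs: float) -> float:
    """Total fees paid by the attacker: each replacement must pay at
    least the fees of the victim's batched transaction it displaces."""
    per_replacement = avg_htlcs * HTLC_TIMEOUT_VBYTES * AVG_FEERATE
    return replacements * per_replacement

print(round(attack_cost_sat(40, 20)))   # 186480 -- basic attack
print(round(attack_cost_sat(30, 15)))   # 104895 -- optimized attack
```

<p>The optimized figure differs slightly from the 104,910 sat above because the text rounds the per-replacement cost to 3,497 sat before multiplying.</p>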
<h2 id="mitigation">Mitigation</h2>
<p><a href="https://github.com/lightningnetwork/lnd/pull/9447">Changes</a> were made in <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> that eliminated the reaggregation delay and the fee function reset.</p>
<p>These changes, combined with the sweeper’s aggressive default fee function, ensure that any replacement stalling attack costs many times more than the amount that can be stolen.</p>
<h2 id="discovery">Discovery</h2>
<p>This attack vector was discovered during code review of LND’s sweeper rewrite in May 2024.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2024-05-09:</strong> Attack vector reported to the LND security mailing list.</li>
<li><strong>2025-01-16:</strong> No progress on a mitigation. Reported the fee reset weakness <a href="https://github.com/lightningnetwork/lnd/issues/9422">publicly</a> and followed up on the security mailing list.</li>
<li><strong>2025-02-21:</strong> Mitigation <a href="https://github.com/lightningnetwork/lnd/pull/9447">merged</a>.</li>
<li><strong>2025-05-22:</strong> LND 0.19.0 released containing the fix.</li>
<li><strong>2025-10-31:</strong> Agreement to disclose publicly after LND 0.20.0 was released.</li>
<li><strong>2025-12-04:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>This vulnerability was introduced during LND’s sweeper rewrite in May 2024, and I reported it before LND 0.18.0, the first release to contain it, shipped.
In my report, I suggested that the new sweeper be released in 0.18.0 and this vulnerability be fixed in 0.18.1, since a mitigation would require some work and the new sweeper already fixed several <a href="/lightning/lnd-deadline-aware-budget-sweeper/">other vulnerabilities</a>.
Unfortunately that didn’t happen, and this vulnerability went unaddressed until I followed up again in 2025.</p>
<p>In hindsight, I should have done a better job of holding the LND team accountable.
I could have reported the vulnerability publicly, thereby forcing the issue to be addressed before the 0.18.0 release.
The downside is that this would have delayed other important security fixes to the sweeper subsystem.</p>
<p>Alternatively, I could have reported the vulnerability privately (as I did) but given the LND team a deadline (say, 6 months) after which I would disclose the vulnerability publicly regardless of whether they mitigated it.
This may have applied enough pressure to get the issue fixed in 0.18.1 as I originally intended.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>Set disclosure deadlines to improve security outcomes.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-replacement-stalling-attack/">LND: Replacement Stalling Attack</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 04, 2025.</p>
<h1 id="lnd-infinite-inbox-dos">LND: Infinite Inbox DoS</h1>
<p>LND 0.18.5 and below are vulnerable to a denial-of-service (DoS) attack that causes LND to run out of memory (OOM) and crash or hang.
Users should upgrade to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> to protect their nodes.</p>
<h2 id="the-infinite-inbox-vulnerability">The Infinite Inbox Vulnerability</h2>
<p>When LND receives a message from one of its peers, a dedicated dispatcher thread queues the message for processing by the appropriate subsystem.
For two such subsystems (the gossiper and the channel link), up to 1,000 messages could be queued per peer.
Since Lightning protocol messages can be up to 64 KB in size, and since LND allowed as many peers as there were available file descriptors, memory could be exhausted quickly.</p>
<h2 id="the-dos-attack">The DoS Attack</h2>
<p>A simple, free way to exploit the vulnerability was to open multiple connections to the victim and spam <a href="https://github.com/lightning/bolts/blob/master/07-routing-gossip.md#the-query_short_channel_idsreply_short_channel_ids_end-messages"><code class="language-plaintext highlighter-rouge">query_short_channel_ids</code></a> messages of size 64 KB, keeping the connections open until LND ran out of memory.</p>
<p>In my experiments against an LND node with 8 GB of RAM, I was able to cause an OOM in under 5 minutes.</p>
<h2 id="the-mitigation">The Mitigation</h2>
<p>The vulnerability was mitigated by reducing queue sizes and <a href="https://github.com/lightningnetwork/lnd/pull/9458">introducing</a> a new “peer access manager” to limit peer connections.
Starting in <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a>, queue sizes are reduced to 50 messages and no more than 100 connections are allowed from peers without open channels.</p>
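<p>A back-of-the-envelope bound shows the scale of the problem (the two queues per peer come from the two subsystems mentioned above; the 1,000-connection figure is an arbitrary illustration, since the pre-fix peer limit depended on available file descriptors):</p>

```python
MAX_MSG_BYTES = 65_535   # maximum Lightning message size (~64 KB)

def worst_case_queue_bytes(peers: int, queue_len: int,
                           queues_per_peer: int = 2) -> int:
    """Upper bound on memory held in per-peer message queues."""
    return peers * queues_per_peer * queue_len * MAX_MSG_BYTES

# Before the fix: 1,000-message queues, peers limited only by file descriptors.
print(round(worst_case_queue_bytes(1_000, 1_000) / 2**30))   # 122 GiB
# After the fix: 50-message queues, at most 100 connections from unknown peers.
print(round(worst_case_queue_bytes(100, 50) / 2**20))        # 625 MiB
```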
<h2 id="discovery">Discovery</h2>
<p>This vulnerability was discovered while examining how LND handles various peer messages.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2023-09-15:</strong> Vulnerability reported to the LND security mailing list.</li>
<li><strong>2025-03-12:</strong> Mitigation <a href="https://github.com/lightningnetwork/lnd/pull/9458">merged</a>.</li>
<li><strong>2025-05-22:</strong> LND 0.19.0 released containing the fix.</li>
<li><strong>2025-10-31:</strong> Agreement on public disclosure after LND 0.20.0 is released.</li>
<li><strong>2025-12-04:</strong> Public disclosure.</li>
</ul>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>More investment in Lightning security is needed.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-infinite-inbox-dos/">LND: Infinite Inbox DoS</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 04, 2025.</p>
<h1 id="lnd-excessive-failback-exploit-2">LND: Excessive Failback Exploit #2</h1>
<p>A variant of the <a href="/lightning/lnd-excessive-failback-exploit/">excessive failback exploit</a> disclosed earlier this year affects LND versions 0.18.5 and below, allowing attackers to steal node funds.
Users should immediately upgrade to <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.19.0-beta">LND 0.19.0</a> or later to protect their funds.</p>
<h2 id="the-excessive-failback-bug-revisited">The Excessive Failback Bug Revisited</h2>
<p>As described in the previous <a href="/lightning/lnd-excessive-failback-exploit/">disclosure</a>, the original excessive failback bug existed in LND versions 0.17.5 and earlier.
Essentially, when one of LND’s channel peers force closed the channel, LND would mark any HTLCs missing from the confirmed commitment as “failed” in the database, even if the HTLC had actually succeeded with the <em>downstream</em> peer.
If LND then restarted before the corresponding <em>upstream</em> HTLC was resolved, LND would incorrectly fail that HTLC with the upstream peer.
Both the upstream and downstream peers would be able to claim the HTLC, and LND would be left with a loss.</p>
<h2 id="the-variant-bug">The Variant Bug</h2>
<p>While a fix for the original excessive failback bug was included in LND 0.18.0, a minor variant of the bug remained when the channel was force closed using LND’s commitment instead of the attacker’s.
In other words, the exact same attack was still possible if the attacker got the <em>victim</em> to force close the channel themselves.
Unfortunately this is very easy to do; the attacker could simply send the victim an <code class="language-plaintext highlighter-rouge">error</code> message.</p>
<h2 id="the-fix">The Fix</h2>
<p>The excessive failback bug variant was <a href="https://github.com/lightningnetwork/lnd/commit/5a72d5258ff679071c4d7b687194e56ca163e02e">quietly fixed</a> in the same way as the original bug, and the fix was included in the LND 0.19.0 release.</p>
<h2 id="discovery">Discovery</h2>
<p>This variant was discovered shortly after the original disclosure, while I was <a href="https://github.com/lightning/bolts/pull/1233">updating</a> BOLT 5 to prevent future excessive failback vulnerabilities.
I realized there were actually <em>two</em> cases that needed to be updated in BOLT 5, but only one of the cases had been patched in LND.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2025-03-04:</strong> Public disclosure of the <a href="/lightning/lnd-excessive-failback-exploit/">original</a> excessive failback vulnerability.</li>
<li><strong>2025-03-04:</strong> BOLT 5 update <a href="https://github.com/lightning/bolts/pull/1233">drafted</a>; variant discovered.</li>
<li><strong>2025-03-05:</strong> Variant reported to the LND security mailing list.</li>
<li><strong>2025-03-20:</strong> Fix <a href="https://github.com/lightningnetwork/lnd/pull/9602">merged</a>.</li>
<li><strong>2025-05-22:</strong> LND 0.19.0 released containing the fix.</li>
<li><strong>2025-10-31:</strong> Agreement to disclose publicly after LND 0.20.0 was released.</li>
<li><strong>2025-12-04:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>In the previous disclosure post, I suggested that the excessive failback bug could have been prevented if the BOLT 5 specification had been clearer about how to handle HTLCs missing from confirmed commitment transactions.
At the time, some Lightning maintainers were skeptical that a clearer specification would have helped.</p>
<p>But this variant of the bug was only discovered <em>when I actually went and clarified BOLT 5 myself!</em>
I think this is strong evidence that a clearer specification could have prevented both variants of the bug.</p>
<h2 id="a-note-on-collaboration">A Note on Collaboration</h2>
<p>As I noted in the previous excessive failback disclosure, it seems that at some point every Lightning implementation independently discovered and fixed bugs similar to the excessive failback bug in LND.
Yet no one (including LND) thought to update the specification to help others avoid such bugs in the future.</p>
<p>When I finally did update the specification, good things happened.
This variant of the excessive failback bug was discovered and fixed in LND.
But I also noticed that Eclair might have been vulnerable to this variant and reached out to Bastien Teinturier.
While it turned out that Eclair was not vulnerable, the discussion with Bastien led to the accidental discovery of a different <a href="/lightning/eclair-preimage-extraction-exploit/">serious vulnerability</a> in Eclair.</p>
<p>This all happened from just a tiny bit of collaboration: a specification update for the common good and a short conversation with Bastien.
In many ways, it is quite unfortunate that Lightning engineering talent is spread out over so many implementations.
Everyone focuses on their own code first, and collaboration is secondary.
Efforts are duplicated and lessons are learned multiple times.
Imagine what we could accomplish with a little more cooperation.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>Clear specifications benefit all Lightning implementations.</li>
<li>We should do more cross-implementation collaboration.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-excessive-failback-exploit-2/">LND: Excessive Failback Exploit #2</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 04, 2025.</p>
<h1 id="eclair-preimage-extraction-exploit">Eclair: Preimage Extraction Exploit</h1>
<p>A critical vulnerability in Eclair versions 0.11.0 and below allows attackers to steal node funds.
Users should immediately upgrade to <a href="https://github.com/ACINQ/eclair/releases/tag/v0.12.0">Eclair 0.12.0</a> or later to protect their funds.</p>
<h2 id="background">Background</h2>
<p>In the Lightning Network, nodes forward payments using contracts called HTLCs (Hash Time-Locked Contracts).
To settle a payment, the final recipient reveals a secret piece of data called a preimage.
This preimage is passed backward along the payment route, allowing each node to claim their funds from the previous node.</p>
<p>If a channel is forced to close, these settlements can happen on the Bitcoin blockchain.
Nodes must watch the blockchain to spot these preimages so they can claim their own funds.</p>
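<p>The preimage relationship is plain SHA-256, which is what makes on-chain settlement verifiable: any node that sees a revealed preimage can check it against the payment hash in its HTLC:</p>

```python
import hashlib
import os

preimage = os.urandom(32)                          # chosen by the final recipient
payment_hash = hashlib.sha256(preimage).digest()   # locks every HTLC on the route

def preimage_matches(candidate: bytes, payment_hash: bytes) -> bool:
    """True if the revealed candidate unlocks HTLCs locked to payment_hash."""
    return hashlib.sha256(candidate).digest() == payment_hash

assert preimage_matches(preimage, payment_hash)
assert not preimage_matches(os.urandom(32), payment_hash)
```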
<h2 id="the-preimage-extraction-vulnerability">The Preimage Extraction Vulnerability</h2>
<p>The vulnerability in Eclair existed in how it monitored the blockchain for preimages during a force close.
Eclair would only check for HTLCs that existed in its <strong>local commitment transaction</strong> — its own current version of the channel’s state.
The code incorrectly assumed this local state would always contain a complete list of all possible HTLCs.</p>
<p>However, a malicious channel partner could broadcast an older, but still valid, commitment transaction.
This older state could contain an HTLC that the victim’s node had already removed from its own local state.
When the attacker claimed this HTLC on-chain with a preimage, the victim’s Eclair node would ignore it because the HTLC wasn’t in its local records, causing the victim to lose the funds.</p>
<p>The original <a href="https://github.com/ACINQ/eclair/blob/c7a288b91fc19e89683c531cb3e9f61e59deace9/eclair-core/src/main/scala/fr/acinq/eclair/channel/Helpers.scala#L1299-L1314">code snippet</a> illustrates the issue:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">extractPreimages</span><span class="o">(</span><span class="n">localCommit</span><span class="k">:</span> <span class="kt">LocalCommit</span><span class="o">,</span> <span class="n">tx</span><span class="k">:</span> <span class="kt">Transaction</span><span class="o">)(</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">LoggingAdapter</span><span class="o">)</span><span class="k">:</span> <span class="kt">Set</span><span class="o">[(</span><span class="kt">UpdateAddHtlc</span>, <span class="kt">ByteVector32</span><span class="o">)]</span> <span class="k">=</span> <span class="o">{</span>
<span class="c1">// ... (code omitted that extracts htlcSuccess and claimHtlcSuccess preimages from tx)</span>
<span class="k">val</span> <span class="nv">paymentPreimages</span> <span class="k">=</span> <span class="o">(</span><span class="n">htlcSuccess</span> <span class="o">++</span> <span class="n">claimHtlcSuccess</span><span class="o">).</span><span class="py">toSet</span>
<span class="nv">paymentPreimages</span><span class="o">.</span><span class="py">flatMap</span> <span class="o">{</span> <span class="n">paymentPreimage</span> <span class="k">=></span>
<span class="c1">// we only consider htlcs in our local commitment, because we only care about outgoing htlcs, which disappear first in the remote commitment</span>
<span class="c1">// if an outgoing htlc is in the remote commitment, then:</span>
<span class="c1">// - either it is in the local commitment (it was never fulfilled)</span>
<span class="c1">// - or we have already received the fulfill and forwarded it upstream</span>
<span class="nv">localCommit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">.</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">OutgoingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="nf">sha256</span><span class="o">(</span><span class="n">paymentPreimage</span><span class="o">)</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The misleading comment in the code suggests this approach is safe, hiding the bug from a casual review.</p>
<h2 id="stealing-htlcs">Stealing HTLCs</h2>
<p>An attacker could exploit this bug to steal funds as follows:</p>
<ol>
<li>The attacker <code class="language-plaintext highlighter-rouge">M</code> opens a channel with the victim <code class="language-plaintext highlighter-rouge">B</code>, creating the following topology: <code class="language-plaintext highlighter-rouge">A -- B -- M</code>.</li>
<li>The attacker routes a payment to themselves along the path <code class="language-plaintext highlighter-rouge">A->B->M</code>.</li>
<li><code class="language-plaintext highlighter-rouge">M</code> fails the payment by sending <code class="language-plaintext highlighter-rouge">update_fail_htlc</code> followed by <code class="language-plaintext highlighter-rouge">commitment_signed</code>. <code class="language-plaintext highlighter-rouge">B</code> updates their local commitment and revokes their previous one by sending <code class="language-plaintext highlighter-rouge">revoke_and_ack</code> followed by <code class="language-plaintext highlighter-rouge">commitment_signed</code>.
<ul>
<li>At this point, <code class="language-plaintext highlighter-rouge">M</code> has two valid commitments: one with the HTLC present and one with it removed.</li>
<li>Also at this point, <code class="language-plaintext highlighter-rouge">B</code> only has one valid commitment with the HTLC already removed.</li>
</ul>
</li>
<li><code class="language-plaintext highlighter-rouge">M</code> force-closes the channel by broadcasting their <em>older</em> commitment transaction where the HTLC still exists.</li>
<li><code class="language-plaintext highlighter-rouge">M</code> claims the HTLC on the blockchain using the payment preimage.</li>
<li><code class="language-plaintext highlighter-rouge">B</code> sees the on-chain transaction but fails to extract the preimage because the corresponding HTLC is missing from its <em>local</em> commitment.</li>
<li>Because <code class="language-plaintext highlighter-rouge">B</code> never learned the preimage, it cannot claim the payment from <code class="language-plaintext highlighter-rouge">A</code>.</li>
</ol>
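<p>The failure reduces to a set-membership bug. A minimal Python model (hypothetical names, not Eclair's code) of the buggy and fixed lookups:</p>

```python
# Outgoing payment hashes present in each commitment's HTLC list.
local_commit_htlcs = set()                     # B already removed the failed HTLC
remote_commit_htlcs = {"stolen_payment_hash"}  # M's older commitment still has it

def extract_buggy(seen_hash: str) -> set:
    """Buggy: consults only the local commitment, so the preimage is ignored."""
    return {h for h in local_commit_htlcs if h == seen_hash}

def extract_fixed(seen_hash: str) -> set:
    """Fixed: also consults the remote (and next-remote) commitments."""
    return {h for h in local_commit_htlcs | remote_commit_htlcs if h == seen_hash}

assert extract_buggy("stolen_payment_hash") == set()   # preimage lost, funds lost
assert extract_fixed("stolen_payment_hash") == {"stolen_payment_hash"}
```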
<p>When the time limit expires, <code class="language-plaintext highlighter-rouge">A</code> gets a refund, and the victim is left with the loss.
The attacker keeps both the original funds and the payment they claimed on-chain.</p>
<h2 id="the-fix">The Fix</h2>
<p>The solution was to update <code class="language-plaintext highlighter-rouge">extractPreimages</code> to check for HTLCs across <strong>all relevant commitment transactions</strong>, including the remote and next-remote commitments, not just the local one.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">extractPreimages</span><span class="o">(</span><span class="n">commitment</span><span class="k">:</span> <span class="kt">FullCommitment</span><span class="o">,</span> <span class="n">tx</span><span class="k">:</span> <span class="kt">Transaction</span><span class="o">)(</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">LoggingAdapter</span><span class="o">)</span><span class="k">:</span> <span class="kt">Set</span><span class="o">[(</span><span class="kt">UpdateAddHtlc</span>, <span class="kt">ByteVector32</span><span class="o">)]</span> <span class="k">=</span> <span class="o">{</span>
<span class="c1">// ... (code omitted that extracts htlcSuccess and claimHtlcSuccess preimages from tx)</span>
<span class="k">val</span> <span class="nv">paymentPreimages</span> <span class="k">=</span> <span class="o">(</span><span class="n">htlcSuccess</span> <span class="o">++</span> <span class="n">claimHtlcSuccess</span><span class="o">).</span><span class="py">toSet</span>
<span class="nv">paymentPreimages</span><span class="o">.</span><span class="py">flatMap</span> <span class="o">{</span> <span class="n">paymentPreimage</span> <span class="k">=></span>
<span class="k">val</span> <span class="nv">paymentHash</span> <span class="k">=</span> <span class="nf">sha256</span><span class="o">(</span><span class="n">paymentPreimage</span><span class="o">)</span>
<span class="c1">// We only care about outgoing HTLCs when we're trying to learn a preimage to relay upstream.</span>
<span class="c1">// Note that we may have already relayed the fulfill upstream if we already saw the preimage.</span>
<span class="k">val</span> <span class="nv">fromLocal</span> <span class="k">=</span> <span class="nv">commitment</span><span class="o">.</span><span class="py">localCommit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">.</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">OutgoingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="n">paymentHash</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="c1">// From the remote point of view, those are incoming HTLCs.</span>
<span class="k">val</span> <span class="nv">fromRemote</span> <span class="k">=</span> <span class="nv">commitment</span><span class="o">.</span><span class="py">remoteCommit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">.</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">IncomingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="n">paymentHash</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">val</span> <span class="nv">fromNextRemote</span> <span class="k">=</span> <span class="nv">commitment</span><span class="o">.</span><span class="py">nextRemoteCommit_opt</span><span class="o">.</span><span class="py">map</span><span class="o">(</span><span class="nv">_</span><span class="o">.</span><span class="py">commit</span><span class="o">.</span><span class="py">spec</span><span class="o">.</span><span class="py">htlcs</span><span class="o">).</span><span class="py">getOrElse</span><span class="o">(</span><span class="nv">Set</span><span class="o">.</span><span class="py">empty</span><span class="o">).</span><span class="py">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">IncomingHtlc</span><span class="o">(</span><span class="n">add</span><span class="o">)</span> <span class="k">if</span> <span class="nv">add</span><span class="o">.</span><span class="py">paymentHash</span> <span class="o">==</span> <span class="n">paymentHash</span> <span class="k">=></span> <span class="o">(</span><span class="n">add</span><span class="o">,</span> <span class="n">paymentPreimage</span><span class="o">)</span>
<span class="o">}</span>
<span class="n">fromLocal</span> <span class="o">++</span> <span class="n">fromRemote</span> <span class="o">++</span> <span class="n">fromNextRemote</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This change ensures that Eclair will correctly identify the HTLC and extract the necessary preimage, even if a malicious partner broadcasts an old channel state.
The <a href="https://github.com/ACINQ/eclair/commit/6a8df49a9bf006a0826b828020f551ecb6c7a33e#diff-97779917bce211cd035ebf8f9f265a7ecece4efcd1861c7bab05e0113dd86b06R1306-R1319">fix</a> was discreetly included in a <a href="https://github.com/ACINQ/eclair/pull/2966">larger pull request</a> for splicing and released in <a href="https://github.com/ACINQ/eclair/releases/tag/v0.12.0">Eclair 0.12.0</a>.</p>
<h2 id="discovery">Discovery</h2>
<p>The vulnerability was discovered accidentally during a discussion with Bastien Teinturier, who asked for a second look at the logic in the <code class="language-plaintext highlighter-rouge">extractPreimage</code> function.
Upon review, the attack scenario was identified and reported.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2025-03-05:</strong> Vulnerability reported to Bastien.</li>
<li><strong>2025-03-11:</strong> Fix <a href="https://github.com/ACINQ/eclair/commit/6a8df49a9bf006a0826b828020f551ecb6c7a33e#diff-97779917bce211cd035ebf8f9f265a7ecece4efcd1861c7bab05e0113dd86b06R1306-R1319">merged</a> and Eclair 0.12.0 released.</li>
<li><strong>2025-03-21:</strong> Agreement on public disclosure in six months.</li>
<li><strong>2025-09-23:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>In response to the vulnerability report, Bastien sent the following:</p>
<blockquote>
<p>This code seems to have been there from the very beginning of eclair, and has not been updated or challenged since then.
This is bad, I’m noticing that we lack a lot of unit tests for this kind of scenario, this should have been audited…
I’ll spend time next week to check that we have tests for every known type of malicious force-close…
Thanks for reporting this, it’s high time we audited that.</p>
</blockquote>
<p>As promised, Bastien added a force-close <a href="https://github.com/ACINQ/eclair/pull/3040">test suite</a> a couple weeks later.
Had these tests existed from the start, this vulnerability would have been prevented.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>More robust testing and auditing of Lightning implementations is badly needed.</li>
<li>Users should keep their node software updated.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/eclair-preimage-extraction-exploit/">Eclair: Preimage Extraction Exploit</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on September 23, 2025.</p>
<p>LND 0.18.2 and below are vulnerable to a denial-of-service (DoS) attack involving repeated gossip requests for the full Lightning Network graph.
The attack is trivial to execute and can cause LND to run out of memory (OOM) and crash or hang.
You can protect your node by updating to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.18.3-beta">LND 0.18.3</a> or by setting <code class="language-plaintext highlighter-rouge">ignore-historical-gossip-filters=true</code> in your node configuration.</p>
<h2 id="background">Background</h2>
<p>To send payments successfully across the Lightning Network, a node generally needs to have an accurate view of the Lightning Network graph.
Lightning nodes maintain a local copy of the network graph that they continuously update as they receive channel and node updates from their peers via a <a href="https://en.wikipedia.org/wiki/Gossip_protocol">gossip protocol</a>.</p>
<p>New nodes and nodes that have been offline for a while need a way to bootstrap their local copy of the network graph.
A common way this is done is to send a <a href="https://github.com/lightning/bolts/blob/master/07-routing-gossip.md#the-gossip_timestamp_filter-message"><code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code></a> message to some of the node’s peers, requesting that they share all gossip messages they have that are newer than a certain timestamp.
Nodes that honor the request will load the matching gossip messages from their databases and send them to the requesting peer.</p>
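<p>Per BOLT 7, the message itself is tiny: a 32-byte chain hash followed by two big-endian <code class="language-plaintext highlighter-rouge">uint32</code> fields. Here is a rough Go sketch of the wire layout (the type and encoder are illustrative, not LND’s actual wire code):</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// GossipTimestampFilter mirrors the payload layout defined in BOLT 7:
// a 32-byte chain hash followed by two big-endian uint32 fields.
type GossipTimestampFilter struct {
	ChainHash      [32]byte
	FirstTimestamp uint32 // only gossip newer than this is requested
	TimestampRange uint32 // width of the requested time window
}

// Encode serializes the filter payload (excluding the message type).
func (f *GossipTimestampFilter) Encode() []byte {
	var buf bytes.Buffer
	buf.Write(f.ChainHash[:])
	binary.Write(&buf, binary.BigEndian, f.FirstTimestamp)
	binary.Write(&buf, binary.BigEndian, f.TimestampRange)
	return buf.Bytes()
}

func main() {
	// FirstTimestamp = 0 with a maximal range asks the peer for its
	// entire gossip history.
	f := GossipTimestampFilter{FirstTimestamp: 0, TimestampRange: 0xFFFFFFFF}
	fmt.Println(len(f.Encode())) // 32 + 4 + 4 = 40 bytes
}
```

<p>A 40-byte message is all it takes to request the full graph, which is what makes the attack below so cheap for the sender.</p>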
<h2 id="the-vulnerability">The Vulnerability</h2>
<p>By default, LND cooperates with all <code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code> requests.
Prior to v0.18.3, LND’s <a href="https://github.com/lightningnetwork/lnd/blob/9380292a5a41697640c2284186c82dad6f7b004f/discovery/syncer.go#L1317-L1384">logic</a> to respond to these requests looks like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">RespondGossipFilter</span><span class="p">(</span><span class="n">filter</span> <span class="o">*</span><span class="n">GossipTimestampFilter</span><span class="p">)</span> <span class="p">{</span>
<span class="n">gossipMsgs</span> <span class="o">:=</span> <span class="n">loadGossipFromDatabase</span><span class="p">(</span><span class="n">filter</span><span class="p">)</span>
<span class="k">go</span> <span class="k">func</span><span class="p">()</span> <span class="p">{</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">msg</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">gossipMsgs</span> <span class="p">{</span>
<span class="n">sendToPeerSynchronously</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}()</span>
<span class="p">}</span>
</code></pre></div></div>
<p>LND loads <em>all</em> requested messages into memory at the same time, and then sends them one by one to the peer, pausing after each send until the peer acknowledges receiving the message.
The peer can specify any filter, including one that requests <em>all</em> historical gossip messages to be sent to them, and LND will happily comply with the request.
As a result, <strong>LND can load potentially hundreds of thousands of messages into memory for <em>each</em> request</strong>.
And since LND has no limit on the number of concurrent requests it will handle, memory usage can get out of hand quickly.</p>
<h2 id="the-dos-attack">The DoS Attack</h2>
<p>Exploiting this vulnerability to DoS attack a victim is easy.
An attacker simply needs to:</p>
<ol>
<li>Send lots of <code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code> messages to the victim, setting the timestamp to 0 to request the full graph.</li>
<li>Keep the connection with the victim open by periodically sending pings and slowly ACKing incoming messages.</li>
</ol>
<p>This causes LND’s memory consumption to grow over time, until an OOM occurs.</p>
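<p>A back-of-envelope estimate shows how quickly this escalates. Every number below is an illustrative assumption (graph size, per-message memory, request count), not a measurement:</p>

```go
package main

import "fmt"

// Back-of-envelope estimate of attacker-induced memory growth.
// All numbers are illustrative assumptions, not measured values.
func main() {
	const (
		channels       = 80_000 // assumed public channel count
		msgsPerChannel = 3      // announcement plus two channel_updates
		bytesPerMsg    = 400    // assumed average in-memory message size
		requests       = 50     // concurrent full-graph requests
	)
	perRequest := channels * msgsPerChannel * bytesPerMsg
	fmt.Printf("per request: ~%d MB\n", perRequest/1_000_000)
	fmt.Printf("%d concurrent requests: ~%d MB\n",
		requests, perRequest*requests/1_000_000)
}
```

<p>Under these assumptions, a few dozen concurrent requests are enough to exhaust the RAM of a typical node, matching the experiment described below.</p>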
<h3 id="experiment">Experiment</h3>
<p>I carried out this DoS attack against an LND node with 8 GB of RAM and 2 GB of swap.
After a few minutes, the node exhausted its RAM and started using swap, and LND’s performance slowed to a crawl.
After about 2 hours, LND exhausted the swap as well and the operating system killed the LND process.</p>
<h2 id="the-mitigation">The Mitigation</h2>
<p>LND 0.18.3 added a <a href="https://github.com/lightningnetwork/lnd/commit/013452cff0788289aae3aa296242c698c9beff9d#diff-fd66292d846960a30b8ff5e63dbf15b846fdb6b55afe6dde63c8c2ebca66674dL22-L513">global semaphore</a> to limit the number of concurrent <code class="language-plaintext highlighter-rouge">gossip_timestamp_filter</code> requests that LND will cooperate with.
While this doesn’t fix LND’s excessive memory usage per request, it does limit the global impact on memory usage, which is enough to protect against this DoS attack.</p>
<h2 id="discovery">Discovery</h2>
<p>This vulnerability was discovered while looking at how LND handles various peer messages.</p>
<h3 id="timeline">Timeline</h3>
<ul>
<li><strong>2023-07-13:</strong> Vulnerability reported to the LND security mailing list.</li>
<li><strong>2023-12-11:</strong> Failed <a href="https://github.com/lightningnetwork/lnd/pull/8030/commits/a242ad5acb6b46e82ef839be84b0695b2de089a7#diff-5321d5dad7ab003eff5e595aef273619fd9c22586f28f1b141940932b93c6ec7">attempt</a> at a stealth mitigation, which could be bypassed by using multiple node IDs when carrying out the attack.</li>
<li><strong>2023-12-11:</strong> Emailed the security mailing list again, explaining the problem with the attempted mitigation.</li>
<li><strong>2024-08-27:</strong> Proper mitigation <a href="https://github.com/lightningnetwork/lnd/commit/013452cff0788289aae3aa296242c698c9beff9d#diff-fd66292d846960a30b8ff5e63dbf15b846fdb6b55afe6dde63c8c2ebca66674dL22-L513">merged</a>.</li>
<li><strong>2024-09-12:</strong> LND 0.18.3 released containing the fix.</li>
<li><strong>2025-07-22:</strong> <a href="https://github.com/gijswijs">Gijs</a> gives the OK to disclose publicly.</li>
<li><strong>2025-07-22:</strong> Public disclosure.</li>
</ul>
<h2 id="prevention">Prevention</h2>
<p>This vulnerability has existed ever since gossip filtering was added to LND in 2018.
The <a href="https://github.com/lightningnetwork/lnd/pull/1106">pull request</a> that added the feature contained over 5k lines of new code and received only minor review feedback.
It seems that no one was thinking adversarially about the new code at that time, and apparently no one has re-evaluated the code since then.</p>
<p>While it’s understandable that developers were more focused on building features and shipping quickly in the early days of the Lightning Network, a shift to more careful development is long overdue.
Engineering with security in mind is slower and more difficult, but in the long run it pays dividends in the form of greater user trust and disasters avoided.</p>
<h2 id="takeaways">Takeaways</h2>
<ul>
<li>Update to at least LND 0.18.3 or set <code class="language-plaintext highlighter-rouge">ignore-historical-gossip-filters=true</code> to protect your node.</li>
<li>More investment in Lightning security is needed.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-gossip-timestamp-filter-dos/">LND: gossip_timestamp_filter DoS</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on July 22, 2025.</p>
<p>Starting with <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.18.0-beta">v0.18.0</a>, LND has a completely rewritten <em>sweeper</em> subsystem for managing transaction batching and fee bumping.
The new sweeper uses HTLC deadlines and fee budgets to compute a <em>fee rate curve</em>, dynamically adjusting fees (fee bumping) to prioritize urgent transactions.
This new fee bumping strategy has some nice security benefits and is something other Lightning implementations should consider adopting.</p>
<h1 id="background">Background</h1>
<p>When an unreliable (or malicious) Lightning node goes offline while HTLCs are in flight, the other node in the channel can no longer claim the HTLCs <em>off chain</em> and will eventually have to force close and claim the HTLCs <em>on chain</em>.
When this happens, it is critical that all HTLCs are claimed before certain deadlines:</p>
<ul>
<li>Incoming HTLCs need to be claimed before their timelocks expire; otherwise, the channel counterparty can submit a competing timeout claim.</li>
<li>Outgoing HTLCs need to be claimed before their corresponding upstream HTLCs expire; otherwise, the upstream node can reclaim them on chain.</li>
</ul>
<p>If HTLCs are not claimed before their deadlines, they can be entirely lost (or stolen).</p>
<p>Thus Lightning nodes need to pay enough transaction fees to ensure timely confirmation of their commitment and HTLC transactions.
At the same time, nodes don’t want to <em>overpay</em> the fees, as these fees can become a major cost for node operators.</p>
<p>The solution implemented by all Lightning nodes is to start with a relatively low fee rate for these transactions and then use RBF to increase the fee rate as deadlines get closer.</p>
<h1 id="rbf-strategies">RBF Strategies</h1>
<p>Each node implementation uses a slightly different algorithm for choosing RBF fee rates, but in general there are two main strategies:</p>
<ul>
<li>external fee rate estimators</li>
<li>exponential bumping</li>
</ul>
<h2 id="external-fee-rate-estimators">External Fee Rate Estimators</h2>
<p>This strategy chooses fee rates based on Bitcoin Core’s (or some other) fee rate estimator.
The estimator is queried with the HTLC deadline as the confirmation target, and the returned fee rate is used for commitment and HTLC transactions.
Typically the estimator is requeried every block to update fee rates and RBF any unconfirmed transactions.</p>
<p><a href="https://github.com/ElementsProject/lightning/blob/b5eef8af4db9f2a58f435bb5beb54299b2800e67/lightningd/chaintopology.c#L419-L440">CLN</a> and <a href="https://github.com/lightningnetwork/lnd/blob/f8211a2c3b3d2112159cd119bd7674743336c661/sweep/sweeper.go#L470-L493">LND</a> prior to v0.18.0 use this strategy exclusively.
<a href="https://github.com/ACINQ/eclair/blob/95bbf063c9283b525c2bf9f37184cfe12c860df1/eclair-core/src/main/scala/fr/acinq/eclair/channel/publish/ReplaceableTxPublisher.scala#L221-L248">eclair</a> uses this strategy until deadlines are within 6 blocks, after which it switches to exponential bumping.
<a href="https://github.com/lightningdevkit/rust-lightning/blob/3a5f4282468e6148e592e324c2a72405bdb4b193/lightning/src/chain/package.rs#L1361-L1369">LDK</a> uses a combined strategy, sometimes taking the fee rate from the estimator and other times applying exponential bumping.</p>
<h2 id="exponential-bumping">Exponential Bumping</h2>
<p>In this strategy, the fee rate estimator is used to determine the initial fee rate, after which a fixed multiplier is used to increase fee rates for each RBF transaction.</p>
<p><a href="https://github.com/ACINQ/eclair/blob/95bbf063c9283b525c2bf9f37184cfe12c860df1/eclair-core/src/main/scala/fr/acinq/eclair/channel/publish/ReplaceableTxPublisher.scala#L221-L248">eclair</a> uses this strategy when deadlines are within 6 blocks, increasing fee rates by 20% each block while capping the total fees paid at the value of the HTLC being claimed.
When <a href="https://github.com/lightningdevkit/rust-lightning/blob/3a5f4282468e6148e592e324c2a72405bdb4b193/lightning/src/chain/package.rs#L1361-L1369">LDK</a> uses this strategy, it increases fee rates by 25% on each RBF.</p>
<h2 id="problems">Problems</h2>
<p>While external fee rate estimators can be helpful, they’re not perfect.
And relying on them too much can lead to missed deadlines when unusual things are happening in the mempool or with miners (e.g., increasing mempool congestion, pinning, replacement cycling, miner censorship).
In such situations, higher-than-estimated fee rates may be needed to actually get transactions confirmed.
Exponential bumping strategies help here but can still be ineffective if the original fee rate was too low.</p>
<h1 id="the-deadline-and-budget-aware-rbf-strategy">The Deadline and Budget Aware RBF Strategy</h1>
<p>LND’s new sweeper subsystem, released in v0.18.0, takes a novel approach to RBFing commitment and HTLC transactions.
The system was designed around a key observation: for each HTLC on a commitment transaction, there are specific <em>deadline</em> and <em>budget</em> constraints for claiming that HTLC.
The <strong>deadline</strong> is the block height by which the node needs to confirm the claim transaction for the HTLC.
The <strong>budget</strong> is the maximum absolute fee the node operator is willing to pay to sweep the HTLC by the deadline.
In practice, the budget is likely to be a fixed proportion of the HTLC value (i.e. operators are willing to pay more fees for larger HTLCs), so LND’s budget <a href="https://docs.lightning.engineering/lightning-network-tools/lnd/sweeper">configuration parameters</a> are based on proportions.</p>
<p>The sweeper operates by aggregating HTLC claims with matching deadlines into a single batched transaction.
The budget for the batched transaction is calculated as the sum of the budgets for the individual HTLCs in the transaction.
Based on the transaction budget and deadline, a <strong>fee function</strong> is computed that determines how much of the budget is spent as the deadline approaches.
By default, a linear fee function is used which starts at a low fee (determined by the minimum relay fee rate or an external estimator) and ends with the total budget being allocated to fees when the deadline is one block away.
The initial batched transaction is published and a “fee bumper” is assigned to monitor confirmation status in the background.
For each block the transaction remains unconfirmed, the fee bumper broadcasts a new transaction with a higher fee rate determined by the fee function.</p>
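<p>The default linear curve described above can be sketched as follows. This is a simplified illustration, with hypothetical names and units rather than LND’s actual fee function code:</p>

```go
package main

import "fmt"

// feeAtBlock returns the absolute fee to offer when t blocks have passed
// since sweeping began, for a deadline d blocks out. The curve rises
// linearly from startFee to the full budget one block before the deadline.
func feeAtBlock(startFee, budget, t, d int64) int64 {
	if d <= 1 || t >= d-1 {
		return budget // deadline imminent: offer the entire budget
	}
	return startFee + (budget-startFee)*t/(d-1)
}

func main() {
	const startFee, budget, deadline = 500, 100_000, 80
	for _, t := range []int64{0, 40, 79} {
		fmt.Printf("block %2d: %d sats\n",
			t, feeAtBlock(startFee, budget, t, deadline))
	}
}
```

<p>Each unconfirmed block moves the fee a fixed step closer to the budget, so the sweep is guaranteed to have offered the full budget before the deadline arrives.</p>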
<p>The sweeper architecture looks like this:</p>
<p><img src="/images/lnd_deadline_aware_budget_sweeper_header.png" alt="sweeper architecture diagram" /></p>
<p>For more details about LND’s new sweeper, see the <a href="https://github.com/lightningnetwork/lnd/blob/master/sweep/README.md">technical documentation</a>.
In this blog post, we’ll focus mostly on the sweeper’s deadline and budget aware RBF strategy.</p>
<h2 id="benefits">Benefits</h2>
<p>LND’s new sweeper system provides greater security against replacement cycling, pinning, and other adversarial or unexpected scenarios.
It also fixed some bad bugs and vulnerabilities present with LND’s previous sweeper system.</p>
<h3 id="replacement-cycling-defense">Replacement Cycling Defense</h3>
<p>Transaction rebroadcasting is a simple mitigation against <a href="https://bitcoinops.org/en/topics/replacement-cycling/">replacement cycling attacks</a> that has been adopted by all implementations.
However, rebroadcasting alone does not guarantee that such attacks become uneconomical, especially when HTLC values are much larger than the fees Lightning nodes are willing to pay when claiming them on chain.
By setting fee budgets in proportion to HTLC values, LND’s new sweeper is able to provide much stronger guarantees that any replacement cycling attacks will be uneconomical.</p>
<h4 id="cost-of-replacement-cycling-attacks">Cost of Replacement Cycling Attacks</h4>
<p>With LND’s default parameters, an attacker must generally spend at least 20x the value of the HTLC to successfully carry out a replacement cycling attack.</p>
<p>Default parameters:</p>
<ul>
<li>fee budget: 50% of HTLC value</li>
<li>CLTV delta: 80 blocks</li>
</ul>
<p>Assuming the attacker must do a minimum of one replacement per block:</p>
\[attack\_cost \ge \sum_{t = 0}^{80} fee\_function(t)\]
\[attack\_cost \ge \sum_{t = 0}^{80} 0.5 \cdot htlc\_value \cdot \frac{t}{80}\]
\[attack\_cost \ge 20 \cdot htlc\_value\]
<p>LND also rebroadcasts transactions every minute by default, so in practice the attacker must do ~10 replacements per block, making the cost closer to 200x the HTLC value.</p>
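<p>The bound derived above is easy to check numerically. This sketch reproduces the one-replacement-per-block sum using the default 50% budget and 80-block CLTV delta:</p>

```go
package main

import "fmt"

// attackCostLowerBound sums the fee the attacker must outbid at each block
// of the CLTV window, reproducing the bound derived above (one replacement
// per block, budget of 50% of the HTLC value, spread over cltvDelta blocks).
func attackCostLowerBound(htlcValue float64, cltvDelta int) float64 {
	cost := 0.0
	for t := 0; t <= cltvDelta; t++ {
		cost += 0.5 * htlcValue * float64(t) / float64(cltvDelta)
	}
	return cost
}

func main() {
	v := 1_000_000.0 // a 0.01 BTC HTLC, denominated in sats
	fmt.Printf("%.2fx the HTLC value\n", attackCostLowerBound(v, 80)/v)
}
```

<p>The exact sum comes out to 20.25x the HTLC value, consistent with the 20x lower bound above.</p>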
<h3 id="partial-pinning-defense">Partial Pinning Defense</h3>
<p>Because LND’s new default RBF strategy pays up to 50% of the HTLC value, LND now has a much greater ability to outbid <a href="https://bitcoinops.org/en/topics/transaction-pinning/">pinning attacks</a>, especially for larger HTLCs.
It is unfortunate that significant fees need to be burned in this case, but the end result is still better than losing the full value of the HTLC.</p>
<h3 id="reduced-reliance-on-fee-rate-estimators">Reduced Reliance on Fee Rate Estimators</h3>
<p>As explained earlier, fee rate estimators are not always accurate, especially when mempool conditions are changing rapidly.
In these situations, it can be very beneficial to use a simpler RBF strategy, especially when deadlines are approaching.
LDK and eclair use exponential bumping in these scenarios, which helps in many cases.
But ultimately the fee rate curve for an exponential bumping strategy still depends heavily on the starting fee rate, and if that fee rate is too low then deadlines can be missed.
The exponential bumping strategy also ignores the value of the HTLC being claimed, which means that larger HTLCs get the same fee rates as smaller HTLCs, even when deadlines are getting close.</p>
<p>LND’s budget-based approach takes HTLC values into consideration when establishing the fee rate curve, ensuring that budgets are never exceeded and that HTLCs are never lost before an attempt to spend the full budget has been made.
As such, the budget-based approach provides more consistent results and greater security in unexpected or adversarial situations.</p>
<h3 id="lnd-specific-bug-and-vulnerability-fixes">LND-Specific Bug and Vulnerability Fixes</h3>
<p>LND’s new sweeper fixed some bad bugs and vulnerabilities that existed with the previous sweeper.</p>
<h4 id="fee-bump-failures">Fee Bump Failures</h4>
<p>Previously, LND had an inconsistent approach to broadcasting and fee bumping urgent transactions.
In some places transactions would get broadcast with a specific confirmation target and would never be fee bumped again.
In other places transactions would be RBF’d if the fee rate estimator determined that mempool fee rates had gone up, but the <em>confirmation target</em> given to the estimator would not be adjusted as deadlines approached.</p>
<p>Perhaps the worst of these fee bumping failures was a <a href="https://github.com/lightningnetwork/lnd/issues/8522">bug</a> reported by <a href="https://github.com/C-Otto">Carsten Otto</a>, where LND would fail to use the anchor output to CPFP a commitment transaction if the initial HTLC deadlines were far enough in the future.
While this behavior is desirable to save on fees initially, it becomes a major problem when deadlines get closer and the commitment hasn’t confirmed on its own.
Because LND did not adjust confirmation targets as deadlines approached, the commitment transaction would remain un-CPFP’d and could fail to confirm before HTLCs expired, allowing funds to be lost.
To make matters worse, the bug was trivial for an attacker to exploit.</p>
<p>LND’s sweeper rewrite took the opportunity to correct and unify all the transaction broadcasting and fee bumping logic in one place and fix all of these fee bumping failures at once.</p>
<h4 id="invalid-batching">Invalid Batching</h4>
<p>LND’s previous sweeper also sometimes generated invalid or unsafe transactions when batching inputs together.
This could happen in a couple ways:</p>
<ul>
<li>Inputs that were invalid or had been double-spent could be batched with urgent HTLC claims, making the whole transaction invalid.</li>
<li>Anchor spends could be <a href="https://github.com/lightningnetwork/lnd/issues/8433">batched together</a>, thereby violating the CPFP carve out and enabling channel counterparties to pin commitment transactions.</li>
</ul>
<p>Rather than addressing these issues directly, the previous sweeper would use <em>exponential backoff</em> to regroup inputs after random delays and hope for a valid transaction.
If another invalid transaction occurred, longer delays would be used before the next regrouping.
Eventually, deadlines could be missed and funds lost.</p>
<p>LND’s new sweeper fixed these issues by being more careful about which inputs could be grouped together and by removing double-spent inputs from transactions that failed to broadcast.</p>
<h2 id="risks">Risks</h2>
<p>The security of a Lightning node depends heavily on its ability to resolve HTLCs on chain when necessary.
And unfortunately proper on-chain resolution can be tricky to get right (see <a href="https://morehouse.github.io/lightning/ldk-invalid-claims-liquidity-griefing/">1</a>, <a href="https://morehouse.github.io/lightning/ldk-duplicate-htlc-force-close-griefing/">2</a>, <a href="https://morehouse.github.io/lightning/lnd-excessive-failback-exploit/">3</a>).
Making changes to the existing on-chain logic runs the risk of introducing new bugs and vulnerabilities.</p>
<p>For example, during code reviews of LND’s new sweeper there were many serious bugs discovered and fixed, ranging from catastrophic <a href="https://github.com/lightningnetwork/lnd/issues/8738">fee function failures</a> to new <a href="https://github.com/lightningnetwork/lnd/pull/8514#discussion_r1554270229">fund-stealing exploits</a> and more (<a href="https://github.com/lightningnetwork/lnd/pull/8148#discussion_r1542012530">1</a>, <a href="https://github.com/lightningnetwork/lnd/pull/8424#pullrequestreview-1961358576">2</a>, <a href="https://github.com/lightningnetwork/lnd/pull/8422#discussion_r1528832418">3</a>, <a href="https://github.com/lightningnetwork/lnd/issues/8715">4</a>, <a href="https://github.com/lightningnetwork/lnd/issues/8737">5</a>, <a href="https://github.com/lightningnetwork/lnd/issues/8741">6</a>).
Node implementers should tread carefully when touching these parts of the codebase and remember that simplicity is often the best security.</p>
<h1 id="conclusion">Conclusion</h1>
<p>LND’s new deadline-aware budget sweeper provides more secure fee bumping in adversarial situations and more consistent behavior when mempools are rapidly changing.
Other implementations should consider incorporating budget awareness into their fee bumping strategies to improve defenses against replacement cycling and pinning attacks, and to reduce reliance on external fee estimators.
At the same time, implementers would do well to avoid complete rewrites of the on-chain logic and instead keep the changes small and review them well.</p>
<p><a href="https://morehouse.github.io/lightning/lnd-deadline-aware-budget-sweeper/">LND's Deadline-Aware Budget Sweeper</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on March 11, 2025.</p>
<p>LND 0.17.5 and below contain a bug in the on-chain resolution logic that can be exploited to steal funds.
For the attack to be practical the attacker must be able to force a restart of the victim node, perhaps via an unpatched DoS vector.
Update to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.18.0-beta">LND 0.18.0</a> to protect your node.</p>
<h1 id="background">Background</h1>
<p>Whenever a new payment is routed through a Lightning channel, or whenever an existing payment is settled on the channel, the parties in that channel need to update their commitment transactions to match the new set of active HTLCs.
During the course of these regular commitment updates, there is always a brief moment where one of the parties holds two valid commitment transactions.
Normally that party immediately revokes the older commitment transaction after it receives a signature for the new one, bringing their number of valid commitment transactions back down to one.
But for that brief moment, the other party in the channel must be able to handle the case where <em>either</em> of the valid commitments confirms on chain.</p>
<p>As part of this handling, nodes need to detect when any currently outstanding HTLCs are missing from the confirmed commitment transaction so that those HTLCs can be failed backward on the upstream channel.</p>
<h1 id="the-excessive-failback-bug">The Excessive Failback Bug</h1>
<p>Prior to v0.18.0, LND’s <a href="https://github.com/lightningnetwork/lnd/blob/f4035ade05d0c44b441f2fe26af89584a76a55d6/contractcourt/channel_arbitrator.go#L2079-L2151">logic</a> to detect and fail back missing HTLCs works like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">failBackMissingHtlcs</span><span class="p">(</span><span class="n">confirmedCommit</span> <span class="n">Commitment</span><span class="p">)</span> <span class="p">{</span>
<span class="n">currentCommit</span><span class="p">,</span> <span class="n">pendingCommit</span> <span class="o">:=</span> <span class="n">getValidCounterpartyCommitments</span><span class="p">()</span>
<span class="k">var</span> <span class="n">danglingHtlcs</span> <span class="n">HtlcSet</span>
<span class="k">if</span> <span class="n">confirmedCommit</span> <span class="o">==</span> <span class="n">pendingCommit</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">currentCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">pendingCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">confirmedHtlcs</span> <span class="o">:=</span> <span class="n">confirmedCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="n">missingHtlcs</span> <span class="o">:=</span> <span class="n">danglingHtlcs</span><span class="o">.</span><span class="n">SetDifference</span><span class="p">(</span><span class="n">confirmedHtlcs</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">htlc</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">missingHtlcs</span> <span class="p">{</span>
<span class="n">failBackHtlc</span><span class="p">(</span><span class="n">htlc</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>LND compares the HTLCs present on the confirmed commitment transaction against the HTLCs present on the counterparty’s <em>other</em> valid commitment (if there is one) and fails back any HTLCs that are missing from the confirmed commitment.
This logic is mostly correct, but it does the wrong thing in one particular scenario:</p>
<ol>
<li>LND forwards an HTLC <code class="language-plaintext highlighter-rouge">H</code> to the counterparty, signing commitment <code class="language-plaintext highlighter-rouge">C0</code> with <code class="language-plaintext highlighter-rouge">H</code> added as an output. The previous commitment is revoked.</li>
<li>The counterparty claims <code class="language-plaintext highlighter-rouge">H</code> by revealing the preimage to LND.</li>
<li>LND forwards the preimage upstream to start the process of claiming the incoming HTLC.</li>
<li>LND signs a new counterparty commitment <code class="language-plaintext highlighter-rouge">C1</code> with <code class="language-plaintext highlighter-rouge">H</code> removed and its value added to the counterparty’s balance.</li>
<li>The counterparty refuses to revoke <code class="language-plaintext highlighter-rouge">C0</code>.</li>
<li>The counterparty broadcasts and confirms <code class="language-plaintext highlighter-rouge">C1</code>.</li>
</ol>
<p>In this case, LND compares the confirmed commitment <code class="language-plaintext highlighter-rouge">C1</code> against the other valid commitment <code class="language-plaintext highlighter-rouge">C0</code> and determines that <code class="language-plaintext highlighter-rouge">H</code> is missing from the confirmed commitment.
As a result, LND incorrectly determines that <code class="language-plaintext highlighter-rouge">H</code> needs to be failed back upstream, and executes the following <a href="https://github.com/lightningnetwork/lnd/blob/f4035ade05d0c44b441f2fe26af89584a76a55d6/htlcswitch/switch.go#L1822-L1872">logic</a>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">failBackHtlc</span><span class="p">(</span><span class="n">htlc</span> <span class="n">Htlc</span><span class="p">)</span> <span class="p">{</span>
<span class="n">markFailedInDatabase</span><span class="p">(</span><span class="n">htlc</span><span class="p">)</span>
<span class="n">incomingHtlc</span><span class="p">,</span> <span class="n">ok</span> <span class="o">:=</span> <span class="n">incomingHtlcMap</span><span class="p">[</span><span class="n">htlc</span><span class="p">]</span>
<span class="k">if</span> <span class="o">!</span><span class="n">ok</span> <span class="p">{</span>
<span class="n">log</span><span class="p">(</span><span class="s">"Incoming HTLC has already been resolved"</span><span class="p">)</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="n">failHtlc</span><span class="p">(</span><span class="n">incomingHtlc</span><span class="p">)</span>
<span class="nb">delete</span><span class="p">(</span><span class="n">incomingHtlcMap</span><span class="p">,</span> <span class="n">htlc</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In this case, the preimage for the incoming HTLC was already sent upstream (step 3), so the corresponding entry in <code class="language-plaintext highlighter-rouge">incomingHtlcMap</code> has already been removed.
Thus LND catches the “double resolution” and returns from <code class="language-plaintext highlighter-rouge">failBackHtlc</code> without sending the incorrect failure message upstream.
Unfortunately, LND only catches the double resolution <em>after</em> <code class="language-plaintext highlighter-rouge">H</code> is marked as failed in the database.
As a result, when LND next restarts it will reconstruct its state from the database and determine that <code class="language-plaintext highlighter-rouge">H</code> still needs to be failed back.
If the incoming HTLC hasn’t been fully resolved with the upstream node, the reconstructed <code class="language-plaintext highlighter-rouge">incomingHtlcMap</code> <em>will</em> have an entry for <code class="language-plaintext highlighter-rouge">H</code> this time, and LND will incorrectly send a failure message upstream.</p>
<p>At that point, the downstream node will have claimed <code class="language-plaintext highlighter-rouge">H</code> via preimage while the upstream node will have had the HTLC refunded to them, causing LND to lose the full value of <code class="language-plaintext highlighter-rouge">H</code>.</p>
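<p>The buggy ordering above can be modeled with a minimal Python sketch (all names here are hypothetical; LND's actual implementation is in Go). The key point is that the failure is persisted to the database <em>before</em> the double-resolution check, so a restart replays it:</p>

```python
# Toy model of the buggy ordering (hypothetical names, not LND's actual
# Go code): the HTLC is marked failed on disk *before* the
# double-resolution check, so a restart replays the failure.

class Node:
    def __init__(self):
        self.db_failed = set()        # HTLCs marked failed on disk
        self.incoming_map = {"H": "incoming_H"}
        self.sent_upstream = []       # messages sent to the upstream peer

    def fail_back_htlc(self, htlc):
        self.db_failed.add(htlc)      # BUG: persisted before the check below
        incoming = self.incoming_map.pop(htlc, None)
        if incoming is None:
            return                    # double resolution caught... this time
        self.sent_upstream.append(("fail", incoming))

    def restart(self, upstream_unresolved):
        # State is rebuilt from disk; the incoming HTLC is still
        # unresolved upstream, so it reappears in the map.
        self.incoming_map = {h: f"incoming_{h}" for h in upstream_unresolved}
        for htlc in sorted(self.db_failed):
            self.fail_back_htlc(htlc)

node = Node()
node.incoming_map.pop("H")        # step 3: preimage already sent upstream
node.fail_back_htlc("H")          # confirmed commitment is missing H
assert node.sent_upstream == []   # failure correctly suppressed...

node.restart(upstream_unresolved={"H"})
assert ("fail", "incoming_H") in node.sent_upstream  # ...but replayed on restart
```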
<h1 id="stealing-htlcs">Stealing HTLCs</h1>
<p>Consider the following topology, where <code class="language-plaintext highlighter-rouge">B</code> is the victim and <code class="language-plaintext highlighter-rouge">M0</code> and <code class="language-plaintext highlighter-rouge">M1</code> are controlled by the attacker.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>M0 -- B -- M1
</code></pre></div></div>
<p>The attacker can steal funds as follows:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">M0</code> routes a large HTLC along the path <code class="language-plaintext highlighter-rouge">M0 -> B -> M1</code>.</li>
<li><code class="language-plaintext highlighter-rouge">M0</code> goes offline.</li>
<li><code class="language-plaintext highlighter-rouge">M1</code> claims the HTLC from <code class="language-plaintext highlighter-rouge">B</code> by revealing the preimage, receives a new commitment signature from <code class="language-plaintext highlighter-rouge">B</code>, and then refuses to revoke the previous commitment.</li>
<li><code class="language-plaintext highlighter-rouge">B</code> attempts to claim the upstream HTLC from <code class="language-plaintext highlighter-rouge">M0</code> but can’t because <code class="language-plaintext highlighter-rouge">M0</code> is offline.</li>
<li><code class="language-plaintext highlighter-rouge">M1</code> force closes the <code class="language-plaintext highlighter-rouge">B-M1</code> channel using their new commitment, thus triggering the excessive failback bug.</li>
<li>The attacker crashes <code class="language-plaintext highlighter-rouge">B</code> using an unpatched DoS vector.</li>
<li><code class="language-plaintext highlighter-rouge">M0</code> comes back online.</li>
<li><code class="language-plaintext highlighter-rouge">B</code> restarts, loads HTLC resolution data from the database, and incorrectly fails the HTLC with <code class="language-plaintext highlighter-rouge">M0</code>.</li>
</ol>
<p>At this point, the attacker has succeeded in stealing the HTLC from <code class="language-plaintext highlighter-rouge">B</code>.
<code class="language-plaintext highlighter-rouge">M0</code> got the HTLC refunded, while <code class="language-plaintext highlighter-rouge">M1</code> got the value of the HTLC added to their balance on the confirmed commitment.</p>
<h1 id="the-fix">The Fix</h1>
<p>The excessive failback bug was fixed by a <a href="https://github.com/lightningnetwork/lnd/commit/6f0c2b5bab68c156262c1e8e2286f9a6b36bbbd7#diff-a0b8064876b1b1d6085fa7ffdbfd38c81cb06c1ca3f34a08dbaacba203cda3ebR2142-R2155">small change</a> to prevent failback of HTLCs for which the preimage is already known.
The updated logic now explicitly checks for preimage availability before failing back each HTLC:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">failBackMissingHtlcs</span><span class="p">(</span><span class="n">confirmedCommit</span> <span class="n">Commitment</span><span class="p">)</span> <span class="p">{</span>
<span class="n">currentCommit</span><span class="p">,</span> <span class="n">pendingCommit</span> <span class="o">:=</span> <span class="n">getValidCounterpartyCommitments</span><span class="p">()</span>
<span class="k">var</span> <span class="n">danglingHtlcs</span> <span class="n">HtlcSet</span>
<span class="k">if</span> <span class="n">confirmedCommit</span> <span class="o">==</span> <span class="n">pendingCommit</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">currentCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">danglingHtlcs</span> <span class="o">=</span> <span class="n">pendingCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="p">}</span>
<span class="n">confirmedHtlcs</span> <span class="o">:=</span> <span class="n">confirmedCommit</span><span class="o">.</span><span class="n">Htlcs</span><span class="p">()</span>
<span class="n">missingHtlcs</span> <span class="o">:=</span> <span class="n">danglingHtlcs</span><span class="o">.</span><span class="n">SetDifference</span><span class="p">(</span><span class="n">confirmedHtlcs</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">htlc</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">missingHtlcs</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">preimageIsKnown</span><span class="p">(</span><span class="n">htlc</span><span class="o">.</span><span class="n">PaymentHash</span><span class="p">())</span> <span class="p">{</span>
<span class="k">continue</span> <span class="c">// Don't fail back HTLCs we can claim.</span>
<span class="p">}</span>
<span class="n">failBackHtlc</span><span class="p">(</span><span class="n">htlc</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">preimageIsKnown</code> check prevents <code class="language-plaintext highlighter-rouge">failBackHtlc</code> from being called when the preimage is known, so such HTLCs are never failed backward or marked as failed in the database.
On restart, the incorrect failback behavior no longer occurs.</p>
<p>The patch was hidden in a <a href="https://github.com/lightningnetwork/lnd/pull/8667">massive rewrite</a> of LND’s sweeper system and was released in LND 0.18.0.</p>
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during an audit of LND’s <code class="language-plaintext highlighter-rouge">contractcourt</code> package, which handles on-chain resolution of force closures.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2024-03-20:</strong> Vulnerability reported to the LND security mailing list.</li>
<li><strong>2024-04-19:</strong> Fix <a href="https://github.com/lightningnetwork/lnd/commit/6f0c2b5bab68c156262c1e8e2286f9a6b36bbbd7#diff-a0b8064876b1b1d6085fa7ffdbfd38c81cb06c1ca3f34a08dbaacba203cda3ebR2142-R2155">merged</a>.</li>
<li><strong>2024-05-30:</strong> LND 0.18.0 released containing the fix.</li>
<li><strong>2025-02-17:</strong> <a href="https://github.com/gijswijs">Gijs</a> gave the OK to disclose publicly in March.</li>
<li><strong>2025-03-04:</strong> Public disclosure.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>It appears all other lightning implementations have independently discovered and handled the corner case that LND mishandled:</p>
<ul>
<li>CLN <a href="https://github.com/ElementsProject/lightning/commit/6c96bcacd763cf5cd81226e3b161be161c3818ed#diff-d161f42609a169a38f366a0628bceefa6bed62eb9af20082c5ad08add899a2fbR863-R864">added</a> a preimage check to the failback logic in 2018.</li>
<li>eclair <a href="https://github.com/ACINQ/eclair/commit/c7e47ba751dc1ed4a96bcb4b7e5fcd49d78cfb78#diff-97779917bce211cd035ebf8f9f265a7ecece4efcd1861c7bab05e0113dd86b06R1310-R1318">introduced</a> failback logic in 2023 that filtered upstream HTLCs by preimage availability.</li>
<li>LDK <a href="https://github.com/lightningdevkit/rust-lightning/commit/0ad1f4c943bdc9037d0c43d1b74c745befa065f0#diff-fec072136ddc5ad6b84dd8e4d2368e9e793f994c8bcccf011508038a81eb408aR1988-R1990">added</a> a preimage check to the failback logic in 2023.</li>
</ul>
<p>Yet the BOLT specification has not been updated to describe this corner case.
In fact, by a strict interpretation the <a href="https://github.com/lightning/bolts/blob/ccfa38ed4f592c3711156bb4ded77f44ec01101d/05-onchain.md?plain=1#L407-L410">specification</a> actually requires the <em>incorrect</em> behavior that LND implemented:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## HTLC Output Handling: Remote Commitment, Local Offers
### Requirements
A local node:
- for any committed HTLC that does NOT have an output in this commitment transaction:
- once the commitment transaction has reached reasonable depth:
- MUST fail the corresponding incoming HTLC (if any).
</code></pre></div></div>
<p>It is quite unfortunate that all implementations had to independently discover and correct this bug.
If any single implementation had contributed a small patch to the specification after discovering the issue, it would have at least sparked some discussion about whether the other implementations had considered this corner case.
And if CLN had recognized that the specification needed updating back in 2018, there’s a good chance all other implementations would have handled this case correctly from the start.</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Keeping specifications up-to-date can improve security for all implementations.</li>
<li>Update to at least LND 0.18.0 to protect your funds.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-excessive-failback-exploit/">LND: Excessive Failback Exploit</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on March 04, 2025.</p>
https://morehouse.github.io/lightning/ldk-duplicate-htlc-force-close-griefing2025-01-29T00:00:00-00:002025-01-29T00:00:00-06:00Matt Morehousehttps://morehouse.github.io[email protected]
<p>LDK 0.1 and below are vulnerable to a griefing attack that causes all of the victim’s channels to be force closed.
Update to <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1.1">LDK 0.1.1</a> to protect your channels.</p>
<h1 id="background">Background</h1>
<p>Whenever a new payment is routed through a lightning channel, or whenever an existing payment is settled on the channel, the parties in that channel need to update their commitment transactions to match the new set of active HTLCs.
During the course of these regular commitment updates, there is always a brief moment where one of the parties holds two valid commitment transactions.
Normally that party immediately revokes the older commitment transaction after it receives a signature for the new one, bringing their number of valid commitment transactions back down to one.
But for that brief moment, the other party in the channel must be able to handle the case where <em>either</em> of the valid commitments confirms on chain.</p>
<p>For this reason, LDK contains logic to detect when there’s a difference between the counterparty’s confirmed commitment transaction and the set of currently outstanding HTLCs.
Any HTLCs missing from the confirmed commitment transaction are considered unrecoverable and are immediately failed backward on the upstream channel, while all other HTLCs are left active until the resolution of the downstream HTLC on chain.</p>
<p>Because the same payment hash and amount can be used for multiple HTLCs (e.g., <a href="https://github.com/lightning/bolts/blob/master/04-onion-routing.md#basic-multi-part-payments">multi-part payments</a>), some extra data is stored to match HTLCs on commitment transactions against the set of outstanding HTLCs.
LDK calls this extra data the “HTLC source” data, and LDK maintains this data for both of the counterparty’s valid commitment transactions.</p>
<h1 id="the-duplicate-htlc-failback-bug">The Duplicate HTLC Failback Bug</h1>
<p>Once a counterparty commitment transaction has been revoked, however, LDK forgets the HTLC source data for that commitment transaction to save memory.
As a result, if a revoked commitment transaction later confirms, LDK must attempt to match commitment transaction HTLCs up to outstanding HTLCs using only payment hashes and amounts.
LDK’s <a href="https://github.com/lightningdevkit/rust-lightning/blob/020be440b6d2dfea41820a137c7b26f43b289290/lightning/src/chain/channelmonitor.rs#L2624-L2684">logic</a> to do this matching works as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for htlc, htlc_source in outstanding_htlcs:
    if not confirmed_commitment_tx.is_revoked() and \
            confirmed_commitment_tx.contains_source(htlc_source):
        continue
    if confirmed_commitment_tx.is_revoked() and \
            confirmed_commitment_tx.contains_htlc(htlc.payment_hash, htlc.amount):
        continue
    failback_upstream_htlc(htlc_source)
</code></pre></div></div>
<p>Note that this logic short-circuits whenever an outstanding HTLC matches the payment hash and amount of an HTLC on the revoked commitment transaction.
Thus if there are multiple outstanding HTLCs with the same payment hash and amount, a single HTLC on the revoked commitment transaction can prevent all of the duplicate outstanding HTLCs from being failed back immediately.</p>
<p>Those duplicate HTLCs remain outstanding until the corresponding downstream HTLCs are resolved on chain.
But in this case there is only one downstream HTLC to resolve on chain, and its resolution only <a href="https://github.com/lightningdevkit/rust-lightning/blob/020be440b6d2dfea41820a137c7b26f43b289290/lightning/src/chain/channelmonitor.rs#L4460-L4468">triggers</a> <em>one</em> of the duplicate HTLCs to be failed upstream.
<strong>All the other duplicate HTLCs are left outstanding indefinitely</strong>.</p>
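<p>A toy Python model of the matching loop (hypothetical data; LDK's actual code is in Rust) shows how a single on-chain HTLC shadows every outstanding duplicate with the same payment hash and amount:</p>

```python
# Toy model of the short-circuit (hypothetical data, not LDK's actual
# Rust code): one HTLC on the revoked commitment matches every
# outstanding duplicate, so none of them are failed back immediately.

outstanding_htlcs = [
    ("hash1", 100_000, "source_A1"),
    ("hash1", 100_000, "source_A2"),
    ("hash1", 100_000, "source_A3"),
]
revoked_commitment_htlcs = {("hash1", 100_000)}  # only one HTLC on chain

failed_back = []
for payment_hash, amount, source in outstanding_htlcs:
    # Revoked commitment: source data was forgotten, so match only by
    # payment hash and amount.
    if (payment_hash, amount) in revoked_commitment_htlcs:
        continue  # every duplicate matches the single on-chain HTLC
    failed_back.append(source)

assert failed_back == []  # nothing failed back; N-1 HTLCs end up stuck
```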
<h1 id="force-close-griefing">Force Close Griefing</h1>
<p>Consider the following topology, where <code class="language-plaintext highlighter-rouge">B</code> is the victim and the <code class="language-plaintext highlighter-rouge">A_[1..N]</code> nodes are all the nodes that <code class="language-plaintext highlighter-rouge">B</code> has channels with.
<code class="language-plaintext highlighter-rouge">M_1</code> and <code class="language-plaintext highlighter-rouge">M_2</code> are controlled by the attacker.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> -- A_1 --
/ \
M_1 -- ... -- B -- M_2
\ /
-- A_N --
</code></pre></div></div>
<p>The attacker routes <code class="language-plaintext highlighter-rouge">N</code> HTLCs from <code class="language-plaintext highlighter-rouge">M_1</code> to <code class="language-plaintext highlighter-rouge">M_2</code> using the same payment hash and amount for each, with each payment going through a different <code class="language-plaintext highlighter-rouge">A</code> node.
<code class="language-plaintext highlighter-rouge">M_2</code> then confirms a revoked commitment that contains only one of the <code class="language-plaintext highlighter-rouge">N</code> HTLCs.
Due to the duplicate HTLC failback bug, only one of the routed HTLCs gets failed backwards, while the remaining <code class="language-plaintext highlighter-rouge">N-1</code> HTLCs get stuck.</p>
<p>Finally, after upstream HTLCs expire, all the <code class="language-plaintext highlighter-rouge">A</code> nodes with stuck HTLCs force close their channels with <code class="language-plaintext highlighter-rouge">B</code> to reclaim the stuck HTLCs.</p>
<h2 id="attack-cost">Attack Cost</h2>
<p>The attacker must broadcast a revoked commitment transaction, thereby forfeiting their channel balance.
But the size of the channel can be minimal, and the attacker can spend their balance down to the 1% reserve before executing the attack.
As a result, the cost of the attack can be negligible compared to the damage caused.</p>
<h1 id="the-fix">The Fix</h1>
<p>Starting in v0.1.1, LDK <a href="https://github.com/lightningdevkit/rust-lightning/pull/3556">preemptively fails back</a> HTLCs when their deadlines approach if the downstream channel has been force closed or is in the process of force closing.
While the main purpose of this behavior is to prevent cascading force closures when mempool fee rates spike, it also has a nice side effect of ensuring that duplicate HTLCs always get failed back eventually after a revoked commitment transaction confirms.
As a result, the duplicate HTLCs are never stuck long enough that the upstream nodes need to force close to reclaim them.</p>
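<p>A rough Python sketch of the idea behind the fix (the names and the grace buffer are assumptions, not LDK's actual code; see PR #3556 for the real implementation): once the downstream channel is force closing, any HTLC nearing its upstream deadline is failed back regardless of on-chain matching:</p>

```python
# Sketch of a deadline-based safety net (assumed names and buffer, not
# LDK's actual code): HTLCs on a closing downstream channel are failed
# back preemptively once their expiry approaches.

GRACE_BLOCKS = 3  # assumed buffer before the upstream deadline

def preemptive_failbacks(outstanding_htlcs, current_height):
    to_fail = []
    for htlc in outstanding_htlcs:
        if not htlc["downstream_closing"]:
            continue
        if current_height >= htlc["expiry"] - GRACE_BLOCKS:
            to_fail.append(htlc["source"])
    return to_fail

htlcs = [
    {"source": "s1", "expiry": 800_100, "downstream_closing": True},
    {"source": "s2", "expiry": 800_500, "downstream_closing": True},
    {"source": "s3", "expiry": 800_100, "downstream_closing": False},
]
assert preemptive_failbacks(htlcs, 800_098) == ["s1"]
```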
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during an audit of LDK’s chain module.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2024-12-07:</strong> Vulnerability reported to the LDK security mailing list.</li>
<li><strong>2025-01-27:</strong> Fix <a href="https://github.com/lightningdevkit/rust-lightning/pull/3556">merged</a>.</li>
<li><strong>2025-01-28:</strong> LDK 0.1.1 released containing the fix, with public disclosure in <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1.1">release notes</a>.</li>
<li><strong>2025-01-29:</strong> Detailed description of vulnerability published.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>Prior to the <a href="https://github.com/lightningdevkit/rust-lightning/commit/70ae45fea030ed1d2064918c7b023aa142387bc8">introduction</a> of the duplicate HTLC failback bug in 2022, LDK would immediately fail back <em>all</em> outstanding HTLCs once a revoked commitment reached 6 confirmations.
This was the safe and conservative thing to do – HTLC source information was missing, so proper matching of HTLCs could not be done.
And since all outputs on the revoked commitment and HTLC transactions could be claimed via revocation key, there was no concern about losing funds if the downstream counterparty confirmed an HTLC claim before LDK could.</p>
<h2 id="better-documentation">Better Documentation</h2>
<p>Considering that LDK previously had a <a href="https://github.com/lightningdevkit/rust-lightning/commit/70ae45fea030ed1d2064918c7b023aa142387bc8#diff-b30410f22a759d5e664e05938af7ef2edd244c8a7872e7ada376055ff130088bL7296-L7314">test</a> explicitly checking for the original (conservative) failback behavior, it does appear that the original behavior was understood and intentional.
Unfortunately the original author did not document the <em>reason</em> for the original behavior anywhere in the code or test.</p>
<p>A single comment in the code would likely have been enough to prevent later contributors from introducing the buggy behavior:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// We fail back *all* outstanding HTLCs when a revoked commitment</span>
<span class="c1">// confirms because we don't have HTLC source information for revoked</span>
<span class="c1">// commitments, and attempting to match up HTLCs based on payment hashes</span>
<span class="c1">// and amounts is inherently unreliable.</span>
<span class="c1">//</span>
<span class="c1">// Failing back all HTLCs after a 6 block delay is safe in this case</span>
<span class="c1">// since we can use the revocation key to reliably claim all funds in the</span>
<span class="c1">// downstream channel and therefore won't lose funds overall.</span>
</code></pre></div></div>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Code documentation matters for preventing bugs.</li>
<li>Update to LDK 0.1.1 for the vulnerability fix.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/ldk-duplicate-htlc-force-close-griefing/">LDK: Duplicate HTLC Force Close Griefing</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on January 29, 2025.</p>
https://morehouse.github.io/lightning/ldk-invalid-claims-liquidity-griefing2025-01-23T00:00:00-00:002025-01-23T00:00:00-06:00Matt Morehousehttps://morehouse.github.io[email protected]
<p>LDK 0.0.125 and below are vulnerable to a liquidity griefing attack against anchor channels.
The attack locks up funds such that they can only be recovered by manually constructing and broadcasting a valid claim transaction.
Affected users can unlock their funds by upgrading to <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1">LDK 0.1</a> and replaying the sequence of commitment and HTLC transactions that led to the lock up.</p>
<h1 id="background">Background</h1>
<p>When a channel is force closed, LDK creates and broadcasts transactions to claim any HTLCs it can from the commitment transaction that confirmed on chain.
To save on fees, some HTLC claims are aggregated and broadcast together in the same transaction.</p>
<p>If the channel counterparty is able to get a competing HTLC claim confirmed first, it can cause one of LDK’s aggregated transactions to become invalid, since the corresponding HTLC input has already been spent by the counterparty’s claim.
LDK contains logic to detect this scenario and remove the already-claimed input from its aggregated claim transaction.
When everything works correctly, the aggregated transaction becomes valid again and LDK is able to claim the remaining HTLCs.</p>
<h1 id="the-invalid-claims-bug">The Invalid Claims Bug</h1>
<p>Prior to LDK 0.1, the logic to detect conflicting claims works like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">confirmed_transaction</span> <span class="ow">in</span> <span class="n">confirmed_block</span><span class="p">:</span>
<span class="k">for</span> <span class="nb">input</span> <span class="ow">in</span> <span class="n">confirmed_transaction</span><span class="p">:</span>
<span class="k">if</span> <span class="n">claimable_outpoints</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="nb">input</span><span class="p">.</span><span class="n">prevout</span><span class="p">):</span>
<span class="n">agg_tx</span> <span class="o">=</span> <span class="n">get_aggregated_transaction_from_outpoint</span><span class="p">(</span><span class="nb">input</span><span class="p">.</span><span class="n">prevout</span><span class="p">)</span>
<span class="n">agg_tx</span><span class="p">.</span><span class="n">remove_matching_inputs</span><span class="p">(</span><span class="n">confirmed_transaction</span><span class="p">)</span>
<span class="k">break</span> <span class="c1"># This is the bug.
</span></code></pre></div></div>
<p>Note that this logic stops processing a confirmed transaction after finding the first aggregated transaction that conflicts with it.
If the confirmed transaction conflicts with <em>multiple</em> aggregated transactions, conflicting inputs are only removed from the <em>first</em> matching aggregated transaction, and any other conflicting aggregated transactions are left invalid.</p>
<p>Any HTLCs claimed by invalid aggregated transactions get locked up and can only be recovered by manually constructing and broadcasting valid claim transactions.</p>
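<p>A minimal Python sketch of the corrected scan (hypothetical helpers, not LDK's actual Rust code): by checking every input of the confirmed transaction instead of stopping at the first match, <em>all</em> conflicting aggregated transactions have the spent input removed:</p>

```python
# Sketch of the corrected conflict scan (hypothetical helpers, not
# LDK's actual Rust code): no early break, so every aggregated
# transaction that conflicts with the confirmed one is repaired.

def handle_confirmed_block(confirmed_block, claimable_outpoints,
                           get_aggregated_transaction_from_outpoint):
    for confirmed_transaction in confirmed_block:
        for tx_input in confirmed_transaction["inputs"]:
            prevout = tx_input["prevout"]
            if prevout not in claimable_outpoints:
                continue
            agg_tx = get_aggregated_transaction_from_outpoint(prevout)
            agg_tx["inputs"].discard(prevout)
            # No break: keep scanning the remaining inputs so other
            # aggregated transactions also lose their conflicting inputs.

# Two aggregated claim transactions, each conflicting with one input of
# the confirmed transaction; both must be repaired.
agg1 = {"inputs": {"op1", "op3"}}
agg2 = {"inputs": {"op2", "op4"}}
lookup = {"op1": agg1, "op3": agg1, "op2": agg2, "op4": agg2}

confirmed = [{"inputs": [{"prevout": "op1"}, {"prevout": "op2"}]}]
handle_confirmed_block(confirmed, set(lookup), lookup.get)
assert agg1["inputs"] == {"op3"} and agg2["inputs"] == {"op4"}
```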
<h1 id="liquidity-griefing">Liquidity Griefing</h1>
<p>Prior to LDK 0.1, there are only two types of HTLC claims that are aggregated:</p>
<ul>
<li>HTLC preimage claims</li>
<li>revoked commitment HTLC claims</li>
</ul>
<p>For HTLC preimage claims, LDK takes care to confirm them before their HTLCs time out, so there’s no reliable way for an attacker to confirm a conflicting timeout claim and trigger the invalid claims bug.</p>
<p>For revoked commitment transactions, however, an attacker can immediately spend any incoming HTLC outputs via HTLC-Success transactions.
Although LDK is then able to claim the HTLC-Success outputs via the revocation key, the attacker can exploit the invalid claims bug to lock up any remaining HTLCs on the revoked commitment transaction.</p>
<h2 id="setup">Setup</h2>
<p>The attacker opens an anchor channel with the victim, creating a network topology as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A -- B -- M
</code></pre></div></div>
<p>In this case <code class="language-plaintext highlighter-rouge">B</code> is the victim LDK node and <code class="language-plaintext highlighter-rouge">M</code> is the node controlled by the attacker.
The attacker must use an anchor channel so that they can spend multiple HTLC claims in the same transaction and trigger the invalid claims bug.</p>
<p>The attacker then routes HTLCs along the path <code class="language-plaintext highlighter-rouge">A->B->M</code> as follows:</p>
<ol>
<li>1 small HTLC with CLTV of <code class="language-plaintext highlighter-rouge">X</code></li>
<li>1 small HTLC with CLTV of <code class="language-plaintext highlighter-rouge">X+1</code></li>
<li>1 large HTLC with CLTV of <code class="language-plaintext highlighter-rouge">X+1</code> (this is the one the attacker will lock up)</li>
</ol>
<p>The attacker knows preimages for all HTLCs but withholds them for now.</p>
<p>To complete the setup, the attacker routes some other HTLC through the channel, causing the commitment transaction with the above HTLCs to be revoked.</p>
<h2 id="forcing-multiple-aggregations">Forcing Multiple Aggregations</h2>
<p>Next the attacker waits until block <code class="language-plaintext highlighter-rouge">X-13</code> and force closes the <code class="language-plaintext highlighter-rouge">B-M</code> channel using their revoked commitment transaction, being sure to get it confirmed in block <code class="language-plaintext highlighter-rouge">X-12</code>.
By confirming in this specific block, the attacker can exploit LDK’s buggy aggregation logic prior to v0.1 (see below), causing LDK to aggregate HTLC justice claims as follows:</p>
<ul>
<li><strong>Transaction 1:</strong> HTLC 1</li>
<li><strong>Transaction 2:</strong> HTLCs 2 and 3</li>
</ul>
<h3 id="buggy-aggregation-logic">Buggy Aggregation Logic</h3>
<p>Prior to v0.1, LDK only aggregates HTLC claims if their timeouts are more than 12 blocks in the future.
Presumably 12 blocks was deemed “too soon” to guarantee that LDK can confirm preimage claims before the HTLCs time out, and once one HTLC times out the counterparty can pin a competing timeout claim in mempools, thereby preventing confirmation of <em>all</em> the aggregated preimage claims.
In other words, by claiming HTLCs separately in this scenario, LDK limits the damage the counterparty could do if one of those HTLCs expires before LDK successfully claims it.</p>
<p>Unfortunately, this aggregation strategy makes no sense when LDK is trying to group justice claims that the counterparty can spend immediately via HTLC-Success, since the timeout on those HTLCs does not apply to the counterparty.
Nevertheless, prior to LDK 0.1, the same 12 block aggregation check applies equally to all justice claims, regardless of whether the counterparty can spend them immediately or must wait to spend via HTLC-Timeout.</p>
<p>An attacker can exploit this buggy aggregation logic to make LDK create multiple claim transactions, as described above.</p>
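<p>The 12-block rule can be sketched in Python (an assumed simplification of the pre-0.1 logic): confirming the commitment in block <code>X-12</code> puts HTLC 1 exactly 12 blocks from its expiry, splitting the claims precisely as described above:</p>

```python
# Assumed simplification of the pre-0.1 aggregation rule: a justice
# claim is aggregated only if its HTLC expiry is more than 12 blocks
# past the current height.

AGGREGATION_BUFFER = 12

def group_claims(htlc_expiries, current_height):
    aggregated, standalone = [], []
    for expiry in htlc_expiries:
        if expiry - current_height > AGGREGATION_BUFFER:
            aggregated.append(expiry)
        else:
            standalone.append(expiry)
    return standalone, aggregated

# Commitment confirmed in block X-12: HTLC 1 expires at X, HTLCs 2 and
# 3 at X+1, reproducing the Transaction 1 / Transaction 2 split.
X = 800_000
standalone, aggregated = group_claims([X, X + 1, X + 1], X - 12)
assert standalone == [X] and aggregated == [X + 1, X + 1]
```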
<h2 id="locking-up-funds">Locking Up Funds</h2>
<p>Finally, the attacker broadcasts and confirms a transaction spending HTLCs 1 and 2 via HTLC-Success.
The attacker’s transaction conflicts with both Transaction 1 and Transaction 2, but due to the invalid claims bug, LDK only notices the conflict with Transaction 1.
LDK continues to fee bump and rebroadcast Transaction 2 indefinitely, even though it can never be mined.</p>
<p>As a result, the funds in HTLC 3 remain inaccessible until a valid claim transaction is manually constructed and broadcast.</p>
<p>Note that if the attacker ever tries to claim HTLC 3 via HTLC-Success, LDK is able to immediately recover it via the revocation key.
So while the attacker can lock up HTLC 3, they cannot actually steal it once the upstream HTLC times out.</p>
<h2 id="attack-cost">Attack Cost</h2>
<p>When the attacker’s revoked commitment transaction confirms, LDK is able to immediately claim the attacker’s channel balance.
LDK is also able to claim HTLCs 1 and 2 via the revocation key on the <code class="language-plaintext highlighter-rouge">B-M</code> channel, while also claiming them via the preimage on the upstream <code class="language-plaintext highlighter-rouge">A-B</code> channel.</p>
<p>Thus a smart attacker would minimize costs by spending their channel balance down to the 1% reserve before carrying out the attack and would then set the amounts of HTLCs 1 and 2 to just above the dust threshold.
The attacker would also maximize the pain inflicted on the victim by setting HTLC 3 to the maximum allowed amount.</p>
<h1 id="stealing-htlcs-in-01-beta">Stealing HTLCs in 0.1-beta</h1>
<p>Beginning in <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.1.0-beta1">v0.1-beta</a>, LDK <a href="https://github.com/lightningdevkit/rust-lightning/pull/3340">started</a> aggregating HTLC timeout claims that have compatible locktimes.
As a result, the beta release is vulnerable to a variant of the liquidity griefing attack that enables the attacker to steal funds.
Thankfully the invalid claims bug was fixed between the 0.1-beta and 0.1 releases, so the final LDK 0.1 release is not vulnerable to this attack.</p>
<p>The fund-stealing variant for LDK 0.1-beta works as follows.</p>
<h2 id="setup-1">Setup</h2>
<p>The attack setup is identical to the liquidity griefing attack, except that the attacker does not cause its commitment transaction to be revoked.</p>
<h2 id="forcing-multiple-aggregations-1">Forcing Multiple Aggregations</h2>
<p>The attacker then force closes the <code class="language-plaintext highlighter-rouge">B-M</code> channel.
Due to differing locktimes, LDK creates HTLC timeout claims as follows:</p>
<ul>
<li><strong>Transaction 1:</strong> HTLC 1 (locktime <code class="language-plaintext highlighter-rouge">X</code>)</li>
<li><strong>Transaction 2:</strong> HTLCs 2 and 3 (locktime <code class="language-plaintext highlighter-rouge">X+1</code>)</li>
</ul>
<p>Once height <code class="language-plaintext highlighter-rouge">X</code> is reached, LDK broadcasts Transaction 1.
At height <code class="language-plaintext highlighter-rouge">X+1</code>, LDK broadcasts Transaction 2.</p>
<p>At this point, if Transaction 1 confirms immediately in block <code class="language-plaintext highlighter-rouge">X+1</code>, the attack fails, since the attacker can no longer spend HTLCs 1 and 2 together in the same transaction.
But if Transaction 1 does not confirm immediately (the more likely case), the attack can continue.</p>
<h2 id="stealing-funds">Stealing Funds</h2>
<p>The attacker broadcasts and confirms a transaction spending HTLCs 1 and 2 via HTLC-Success.
This transaction conflicts with both Transaction 1 and Transaction 2, but due to the invalid claims bug, LDK only notices the conflict with Transaction 1.
LDK continues to fee bump and rebroadcast Transaction 2 indefinitely, even though it can never be mined.</p>
<p>Once HTLC 3’s upstream timeout expires, node <code class="language-plaintext highlighter-rouge">A</code> force closes and claims a refund, leaving the coast clear for the attacker to claim the downstream HTLC via preimage.</p>
<h1 id="the-fix">The Fix</h1>
<p>The invalid claims bug was fixed by a <a href="https://github.com/lightningdevkit/rust-lightning/pull/3538">one-line patch</a> just prior to the LDK 0.1 release.</p>
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during an audit of LDK’s chain module.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2024-12-23:</strong> Vulnerability reported to the LDK security mailing list.</li>
<li><strong>2025-01-15:</strong> Fix <a href="https://github.com/lightningdevkit/rust-lightning/pull/3538">merged</a>.</li>
<li><strong>2025-01-16:</strong> LDK 0.1 released containing the fix, with public disclosure in release notes.</li>
<li><strong>2025-01-23:</strong> Detailed description of vulnerability published.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>The invalid claims bug is fundamentally a problem of incorrect control flow – a <code class="language-plaintext highlighter-rouge">break</code> statement was inserted into a loop where it shouldn’t have been.
Why wasn’t it caught during initial code review, and why wasn’t it noticed for years after that?</p>
<p>The <code class="language-plaintext highlighter-rouge">break</code> statement was <a href="https://github.com/lightningdevkit/rust-lightning/commit/feb472dc9ef971b926b19d27e1ad05a79423778f">introduced</a> back in 2019, long before LDK supported anchor channels.
The code was actually correct back then, because before anchor channels there was no way for the counterparty to construct a transaction that conflicted with two of LDK’s aggregated transactions.
But even after <a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.0.116">LDK 0.0.116</a> added support for anchor channels, the bug went unnoticed for over two years, despite multiple changes being made to the surrounding code in that time frame.</p>
<p>It’s impossible to say exactly what kept the bug hidden, but I think the complexity and unreadability of the surrounding code were likely contributors.
Here’s the for-loop containing the <a href="https://github.com/lightningdevkit/rust-lightning/blob/ad462bd9c8237e505f463c227c9ac98ebd3fbb16/lightning/src/chain/onchaintx.rs#L896-L984">buggy code</a>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">bump_candidates</span> <span class="o">=</span> <span class="nf">new_hash_map</span><span class="p">();</span>
<span class="k">if</span> <span class="o">!</span><span class="n">txn_matched</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="nf">maybe_log_intro</span><span class="p">();</span> <span class="p">}</span>
<span class="k">for</span> <span class="n">tx</span> <span class="n">in</span> <span class="n">txn_matched</span> <span class="p">{</span>
<span class="c">// Scan all input to verify is one of the outpoint spent is of interest for us</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">claimed_outputs_material</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">for</span> <span class="n">inp</span> <span class="n">in</span> <span class="o">&</span><span class="n">tx</span><span class="py">.input</span> <span class="p">{</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">((</span><span class="n">claim_id</span><span class="p">,</span> <span class="mi">_</span><span class="p">))</span> <span class="o">=</span> <span class="k">self</span><span class="py">.claimable_outpoints</span><span class="nf">.get</span><span class="p">(</span><span class="o">&</span><span class="n">inp</span><span class="py">.previous_output</span><span class="p">)</span> <span class="p">{</span>
<span class="c">// If outpoint has claim request pending on it...</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.pending_claim_requests</span><span class="nf">.get_mut</span><span class="p">(</span><span class="n">claim_id</span><span class="p">)</span> <span class="p">{</span>
<span class="c">//... we need to check if the pending claim was for a subset of the outputs</span>
<span class="c">// spent by the confirmed transaction. If so, we can drop the pending claim</span>
<span class="c">// after ANTI_REORG_DELAY blocks, otherwise we need to split it and retry</span>
<span class="c">// claiming the remaining outputs.</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">is_claim_subset_of_tx</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">tx_inputs</span> <span class="o">=</span> <span class="n">tx</span><span class="py">.input</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.map</span><span class="p">(|</span><span class="n">input</span><span class="p">|</span> <span class="o">&</span><span class="n">input</span><span class="py">.previous_output</span><span class="p">)</span><span class="py">.collect</span><span class="p">::</span><span class="o"><</span><span class="nb">Vec</span><span class="o"><</span><span class="mi">_</span><span class="o">>></span><span class="p">();</span>
<span class="n">tx_inputs</span><span class="nf">.sort_unstable</span><span class="p">();</span>
<span class="k">for</span> <span class="n">request_input</span> <span class="n">in</span> <span class="n">request</span><span class="nf">.outpoints</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">tx_inputs</span><span class="nf">.binary_search</span><span class="p">(</span><span class="o">&</span><span class="n">request_input</span><span class="p">)</span><span class="nf">.is_err</span><span class="p">()</span> <span class="p">{</span>
<span class="n">is_claim_subset_of_tx</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nd">macro_rules!</span> <span class="n">clean_claim_request_after_safety_delay</span> <span class="p">{</span>
<span class="p">()</span> <span class="k">=></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">OnchainEventEntry</span> <span class="p">{</span>
<span class="n">txid</span><span class="p">:</span> <span class="n">tx</span><span class="nf">.compute_txid</span><span class="p">(),</span>
<span class="n">height</span><span class="p">:</span> <span class="n">conf_height</span><span class="p">,</span>
<span class="n">block_hash</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">conf_hash</span><span class="p">),</span>
<span class="n">event</span><span class="p">:</span> <span class="nn">OnchainEvent</span><span class="p">::</span><span class="n">Claim</span> <span class="p">{</span> <span class="n">claim_id</span><span class="p">:</span> <span class="o">*</span><span class="n">claim_id</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.contains</span><span class="p">(</span><span class="o">&</span><span class="n">entry</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.push</span><span class="p">(</span><span class="n">entry</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// If this is our transaction (or our counterparty spent all the outputs</span>
<span class="c">// before we could anyway with same inputs order than us), wait for</span>
<span class="c">// ANTI_REORG_DELAY and clean the RBF tracking map.</span>
<span class="k">if</span> <span class="n">is_claim_subset_of_tx</span> <span class="p">{</span>
<span class="nd">clean_claim_request_after_safety_delay!</span><span class="p">();</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c">// If false, generate new claim request with update outpoint set</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">at_least_one_drop</span> <span class="o">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">for</span> <span class="n">input</span> <span class="n">in</span> <span class="n">tx</span><span class="py">.input</span><span class="nf">.iter</span><span class="p">()</span> <span class="p">{</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">package</span><span class="p">)</span> <span class="o">=</span> <span class="n">request</span><span class="nf">.split_package</span><span class="p">(</span><span class="o">&</span><span class="n">input</span><span class="py">.previous_output</span><span class="p">)</span> <span class="p">{</span>
<span class="n">claimed_outputs_material</span><span class="nf">.push</span><span class="p">(</span><span class="n">package</span><span class="p">);</span>
<span class="n">at_least_one_drop</span> <span class="o">=</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="c">// If there are no outpoints left to claim in this request, drop it entirely after ANTI_REORG_DELAY.</span>
<span class="k">if</span> <span class="n">request</span><span class="nf">.outpoints</span><span class="p">()</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span>
<span class="nd">clean_claim_request_after_safety_delay!</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">//TODO: recompute soonest_timelock to avoid wasting a bit on fees</span>
<span class="k">if</span> <span class="n">at_least_one_drop</span> <span class="p">{</span>
<span class="n">bump_candidates</span><span class="nf">.insert</span><span class="p">(</span><span class="o">*</span><span class="n">claim_id</span><span class="p">,</span> <span class="n">request</span><span class="nf">.clone</span><span class="p">());</span>
<span class="c">// If we have any pending claim events for the request being updated</span>
<span class="c">// that have yet to be consumed, we'll remove them since they will</span>
<span class="c">// end up producing an invalid transaction by double spending</span>
<span class="c">// input(s) that already have a confirmed spend. If such spend is</span>
<span class="c">// reorged out of the chain, then we'll attempt to re-spend the</span>
<span class="c">// inputs once we see it.</span>
<span class="nd">#[cfg(debug_assertions)]</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">existing</span> <span class="o">=</span> <span class="k">self</span><span class="py">.pending_claim_events</span><span class="nf">.iter</span><span class="p">()</span>
<span class="nf">.filter</span><span class="p">(|</span><span class="n">entry</span><span class="p">|</span> <span class="n">entry</span><span class="na">.0</span> <span class="o">==</span> <span class="o">*</span><span class="n">claim_id</span><span class="p">)</span><span class="nf">.count</span><span class="p">();</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">existing</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">||</span> <span class="n">existing</span> <span class="o">==</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">self</span><span class="py">.pending_claim_events</span><span class="nf">.retain</span><span class="p">(|</span><span class="n">entry</span><span class="p">|</span> <span class="n">entry</span><span class="na">.0</span> <span class="o">!=</span> <span class="o">*</span><span class="n">claim_id</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span> <span class="c">//No need to iterate further, either tx is our or their</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nd">panic!</span><span class="p">(</span><span class="s">"Inconsistencies between pending_claim_requests map and claimable_outpoints map"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">package</span> <span class="n">in</span> <span class="n">claimed_outputs_material</span><span class="nf">.drain</span><span class="p">(</span><span class="o">..</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">OnchainEventEntry</span> <span class="p">{</span>
<span class="n">txid</span><span class="p">:</span> <span class="n">tx</span><span class="nf">.compute_txid</span><span class="p">(),</span>
<span class="n">height</span><span class="p">:</span> <span class="n">conf_height</span><span class="p">,</span>
<span class="n">block_hash</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">conf_hash</span><span class="p">),</span>
<span class="n">event</span><span class="p">:</span> <span class="nn">OnchainEvent</span><span class="p">::</span><span class="n">ContentiousOutpoint</span> <span class="p">{</span> <span class="n">package</span> <span class="p">},</span>
<span class="p">};</span>
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.contains</span><span class="p">(</span><span class="o">&</span><span class="n">entry</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="py">.onchain_events_awaiting_threshold_conf</span><span class="nf">.push</span><span class="p">(</span><span class="n">entry</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Perhaps others have a better mental parser than me, but I find this code quite difficult to read and understand.
The loop is so long, with so much nesting and so many low-level implementation details that by the time I get to the buggy <code class="language-plaintext highlighter-rouge">break</code> statement, I’ve completely forgotten what loop it applies to.
And since the comment attached to the break statement gives a believable explanation, it’s easy to gloss right over it.</p>
<p>Perhaps the buggy control flow would be easier to spot if the loop were simpler and more compact.
By hand-waving some helper functions into existence and refactoring, the same code could be written as follows:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">maybe_log_intro</span><span class="p">();</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">bump_candidates</span> <span class="o">=</span> <span class="nf">new_hash_map</span><span class="p">();</span>
<span class="k">for</span> <span class="n">tx</span> <span class="n">in</span> <span class="n">txn_matched</span> <span class="p">{</span>
<span class="k">for</span> <span class="n">inp</span> <span class="n">in</span> <span class="o">&</span><span class="n">tx</span><span class="py">.input</span> <span class="p">{</span>
<span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">claim_request</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.get_mut_claim_request_from_outpoint</span><span class="p">(</span><span class="n">inp</span><span class="py">.previous_output</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">split_requests</span> <span class="o">=</span> <span class="n">claim_request</span><span class="nf">.split_off_matching_inputs</span><span class="p">(</span><span class="o">&</span><span class="n">tx</span><span class="py">.input</span><span class="p">);</span>
<span class="nd">debug_assert!</span><span class="p">(</span><span class="o">!</span><span class="n">split_requests</span><span class="nf">.is_empty</span><span class="p">());</span>
<span class="k">if</span> <span class="n">claim_request</span><span class="nf">.outpoints</span><span class="p">()</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// Request has been fully claimed.</span>
<span class="k">self</span><span class="nf">.mark_request_claimed</span><span class="p">(</span><span class="n">claim_request</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">conf_height</span><span class="p">,</span> <span class="n">conf_hash</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="c">// After removing conflicting inputs, there's still more to claim. Add the modified</span>
<span class="c">// request to bump_candidates so it gets fee bumped and rebroadcast.</span>
<span class="k">self</span><span class="nf">.remove_pending_claim_events</span><span class="p">(</span><span class="n">claim_request</span><span class="p">);</span>
<span class="n">bump_candidates</span><span class="nf">.insert</span><span class="p">(</span><span class="n">claim_request</span><span class="nf">.clone</span><span class="p">());</span>
<span class="k">self</span><span class="nf">.mark_requests_contentious</span><span class="p">(</span><span class="n">split_requests</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">conf_height</span><span class="p">,</span> <span class="n">conf_hash</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The control flow in this version is much more apparent to the reader.
And although there’s no guarantee that the buggy <code class="language-plaintext highlighter-rouge">break</code> statements would have been discovered sooner if the code had been written this way, I do think the odds would have been much better.</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Code readability matters for preventing bugs.</li>
<li>Update to LDK 0.1 for the vulnerability fix.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/ldk-invalid-claims-liquidity-griefing/">LDK: Invalid Claims Liquidity Griefing</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on January 23, 2025.</p>
<p><a href="https://morehouse.github.io/lightning/lnd-onion-bomb">LND Onion Bomb</a> was published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on June 18, 2024.</p>
<p>LND versions prior to 0.17.0 are vulnerable to a DoS attack where malicious onion packets cause the node to instantly run out of memory (OOM) and crash.
If you are running an LND release older than this, your funds are at risk!
Update to at least <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.17.0-beta">0.17.0</a> to protect your node.</p>
<h1 id="severity">Severity</h1>
<p>It is critical that users update to at least LND 0.17.0 for several reasons.</p>
<ul>
<li>The attack is cheap and easy to carry out and will keep the victim offline for as long as it lasts.</li>
<li>The source of the attack is concealed via onion routing. The attacker does not need to connect directly to the victim.</li>
<li>Prior to LND 0.17.0, all nodes are vulnerable. The fix was not backported to the LND 0.16.x series or earlier.</li>
</ul>
<h1 id="the-vulnerability">The Vulnerability</h1>
<p>The Lightning Network uses <a href="https://en.wikipedia.org/wiki/Onion_routing">onion routing</a> to provide senders and receivers of payments some degree of privacy.
Each node along a payment route receives an <em>onion packet</em> from the previous node, containing forwarding instructions for the next node on the route.
The onion packet is encrypted by the initiator of the payment, so that each node can only read its own forwarding instructions.</p>
<p>Once a node has “peeled off” its layer of encryption from the onion packet, it can extract its forwarding instructions according to the format <a href="https://github.com/lightning/bolts/blob/master/04-onion-routing.md#packet-structure">specified</a> in the LN protocol:</p>
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">Field Name</th>
<th style="text-align: left">Size</th>
<th style="text-align: left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">length</code></td>
<td style="text-align: left">1-9 bytes</td>
<td style="text-align: left">The length of the <code class="language-plaintext highlighter-rouge">payload</code> field, encoded as <a href="https://github.com/lightning/bolts/blob/master/01-messaging.md#appendix-a-bigsize-test-vectors">BigSize</a>.</td>
</tr>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">payload</code></td>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">length</code> bytes</td>
<td style="text-align: left">The forwarding instructions.</td>
</tr>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">hmac</code></td>
<td style="text-align: left">32 bytes</td>
<td style="text-align: left">The HMAC to use for the forwarded onion packet.</td>
</tr>
<tr>
<td style="text-align: left"><code class="language-plaintext highlighter-rouge">next_onion</code></td>
<td style="text-align: left">remaining bytes</td>
<td style="text-align: left">The onion packet to forward.</td>
</tr>
</tbody>
</table>
<p>Prior to LND 0.17.0, the <a href="https://github.com/lightningnetwork/lightning-onion/blob/ca23184850a16cd14d27619f3afdae543b3857a9/path.go#L231-L281">code</a> that extracts these instructions is essentially:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Decode unpacks an encoded HopPayload from the passed reader into the</span>
<span class="c">// target HopPayload.</span>
<span class="k">func</span> <span class="p">(</span><span class="n">hp</span> <span class="o">*</span><span class="n">HopPayload</span><span class="p">)</span> <span class="n">Decode</span><span class="p">(</span><span class="n">r</span> <span class="n">io</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">bufReader</span> <span class="o">:=</span> <span class="n">bufio</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="k">var</span> <span class="n">b</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">byte</span>
<span class="n">varInt</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">ReadVarInt</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="n">payloadSize</span> <span class="o">:=</span> <span class="kt">uint32</span><span class="p">(</span><span class="n">varInt</span><span class="p">)</span>
<span class="c">// Now that we know the payload size, we'll create a new buffer to</span>
<span class="c">// read it out in full.</span>
<span class="n">hp</span><span class="o">.</span><span class="n">Payload</span> <span class="o">=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="n">payloadSize</span><span class="p">)</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">Payload</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">HMAC</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note the absence of a bounds check on <code class="language-plaintext highlighter-rouge">payloadSize</code>!</p>
<p>Regardless of the actual payload size, <strong>LND allocates memory for whatever <code class="language-plaintext highlighter-rouge">length</code> is encoded in the onion packet up to <code class="language-plaintext highlighter-rouge">UINT32_MAX</code> (4 GB).</strong></p>
<h1 id="the-dos-attack">The DoS Attack</h1>
<p>It is trivial for an attacker to craft an onion packet that contains an encoded <code class="language-plaintext highlighter-rouge">length</code> of <code class="language-plaintext highlighter-rouge">UINT32_MAX</code> for the victim’s forwarding instructions.
If the victim’s node has less than 4 GB of memory available, it will OOM crash instantly upon receiving the attacker’s packet.</p>
<p>However, if the victim’s node has more than 4 GB of memory available, it is able to recover from the malicious packet.
The victim’s node will temporarily allocate 4 GB, but the Go garbage collector will quickly reclaim that memory after decoding fails.</p>
<p><em>So nodes with more than 4 GB of RAM are safe, right?</em></p>
<p>Not quite.
The attacker can send many malicious packets simultaneously.
If the victim processes enough malicious packets before the garbage collector kicks in, an OOM will still occur.
And since LND decodes onion packets <em>in parallel</em>, it is not difficult for an attacker to beat the garbage collector.
In my experiments I was able to consistently crash nodes with up to 128 GB of RAM in just a few seconds.</p>
<h1 id="the-fix">The Fix</h1>
<p>A bounds check on the encoded <code class="language-plaintext highlighter-rouge">length</code> field was concealed in a large refactoring <a href="https://github.com/lightningnetwork/lightning-onion/commit/6afc43f3fc983ae37685812de027d0747e136b8f">commit</a> and included in LND <a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.17.0-beta">0.17.0</a>.
The fixed code is essentially:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Decode unpacks an encoded HopPayload from the passed reader into the</span>
<span class="c">// target HopPayload.</span>
<span class="k">func</span> <span class="p">(</span><span class="n">hp</span> <span class="o">*</span><span class="n">HopPayload</span><span class="p">)</span> <span class="n">Decode</span><span class="p">(</span><span class="n">r</span> <span class="n">io</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">bufReader</span> <span class="o">:=</span> <span class="n">bufio</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="n">payloadSize</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">tlvPayloadSize</span><span class="p">(</span><span class="n">bufReader</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="c">// Now that we know the payload size, we'll create a new buffer to</span>
<span class="c">// read it out in full.</span>
<span class="n">hp</span><span class="o">.</span><span class="n">Payload</span> <span class="o">=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span> <span class="n">payloadSize</span><span class="p">)</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">Payload</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">io</span><span class="o">.</span><span class="n">ReadFull</span><span class="p">(</span><span class="n">bufReader</span><span class="p">,</span> <span class="n">hp</span><span class="o">.</span><span class="n">HMAC</span><span class="p">[</span><span class="o">:</span><span class="p">]);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
<span class="c">// tlvPayloadSize uses the passed reader to extract the payload length</span>
<span class="c">// encoded as a var-int.</span>
<span class="k">func</span> <span class="n">tlvPayloadSize</span><span class="p">(</span><span class="n">r</span> <span class="n">io</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span> <span class="p">(</span><span class="kt">uint16</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">b</span> <span class="p">[</span><span class="m">8</span><span class="p">]</span><span class="kt">byte</span>
<span class="n">varInt</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">ReadVarInt</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="m">0</span><span class="p">,</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">varInt</span> <span class="o">></span> <span class="n">math</span><span class="o">.</span><span class="n">MaxUint16</span> <span class="p">{</span>
<span class="k">return</span> <span class="m">0</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s">"payload size of %d is larger than the "</span><span class="o">+</span>
<span class="s">"maximum allowed size of %d"</span><span class="p">,</span> <span class="n">varInt</span><span class="p">,</span> <span class="n">math</span><span class="o">.</span><span class="n">MaxUint16</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="kt">uint16</span><span class="p">(</span><span class="n">varInt</span><span class="p">),</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This new code reduces the maximum amount of memory LND will allocate when decoding an onion packet from 4 GB to 64 KB, which is enough to fully mitigate the DoS attack.</p>
<h1 id="discovery">Discovery</h1>
<p>A simple <a href="https://github.com/lightningnetwork/lnd/commit/9c51bea7906060bbf7b8dc23cab7d542a610eb10">fuzz test</a> for onion packet encoding and decoding revealed this vulnerability.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2023-06-20:</strong> Vulnerability discovered and disclosed to Lightning Labs.</li>
<li><strong>2023-08-23:</strong> Fix <a href="https://github.com/lightningnetwork/lightning-onion/pull/57">merged</a>.</li>
<li><strong>2023-10-03:</strong> LND 0.17.0 released containing the fix.</li>
<li><strong>2024-05-16:</strong> Laolu gives the OK to disclose publicly once LND 0.18.0 is released and has some uptake.</li>
<li><strong>2024-05-30:</strong> LND 0.18.0 released.</li>
<li><strong>2024-06-18:</strong> Public disclosure.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>This vulnerability was found in less than a minute of fuzz testing.
If basic fuzz tests had been written at the time the original onion decoding functions were introduced, the bug would have been caught before it was merged.</p>
<p>In general, any function that processes untrusted inputs is a strong candidate for fuzz testing, and often these fuzz tests are <em>easier</em> to write than traditional unit tests.
A minimal fuzz test that detects this particular vulnerability is exceedingly simple:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">FuzzHopPayload</span><span class="p">(</span><span class="n">f</span> <span class="o">*</span><span class="n">testing</span><span class="o">.</span><span class="n">F</span><span class="p">)</span> <span class="p">{</span>
<span class="n">f</span><span class="o">.</span><span class="n">Fuzz</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">t</span> <span class="o">*</span><span class="n">testing</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">data</span> <span class="p">[]</span><span class="kt">byte</span><span class="p">)</span> <span class="p">{</span>
<span class="c">// Hop payloads larger than 1300 bytes violate the spec and never</span>
<span class="c">// reach the decoding step in practice.</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">></span> <span class="m">1300</span> <span class="p">{</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="k">var</span> <span class="n">hopPayload</span> <span class="n">sphinx</span><span class="o">.</span><span class="n">HopPayload</span>
<span class="n">hopPayload</span><span class="o">.</span><span class="n">Decode</span><span class="p">(</span><span class="n">bytes</span><span class="o">.</span><span class="n">NewReader</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Write fuzz tests for all APIs that consume untrusted inputs.</li>
<li>Update your LND nodes to at least 0.17.0.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/lnd-onion-bomb/">DoS: LND Onion Bomb</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on June 18, 2024.</p>
<p>CLN versions between <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.02">23.02</a> and <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.05.2">23.05.2</a> are susceptible to a DoS attack involving the exploitation of a race condition during channel opens.
If you are running any version in this range, your funds may be at risk!
Update to at least <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.08">23.08</a> to help protect your node.</p>
<h1 id="the-vulnerability">The Vulnerability</h1>
<p>The vulnerability arises from a race condition between two different flows in CLN: the channel open flow and the peer connection flow.</p>
<h2 id="the-channel-open-flow">The Channel Open Flow</h2>
<p>When a peer opens a channel with a CLN node, the following interactions occur on the CLN node.</p>
<p><img src="/images/cln_channel_open_no_race1.png" alt="channel open diagram" /></p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the channel open request.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a new <code class="language-plaintext highlighter-rouge">openingd</code> daemon to handle the channel open negotiation.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> completes the channel open negotiation up to the point where the funding outpoint is known.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> sends the funding outpoint to <code class="language-plaintext highlighter-rouge">lightningd</code> and exits.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage the new channel.</li>
</ol>
<h2 id="the-peer-connection-flow">The Peer Connection Flow</h2>
<p>Once a peer has a channel with a CLN node, if the peer disconnects and reconnects the following occurs on the CLN node.</p>
<p><img src="/images/cln_channel_open_no_race2.png" alt="channel exists diagram" /></p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the new peer connection.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> calls a plugin hook notifying the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin about the new peer connection.</li>
<li><code class="language-plaintext highlighter-rouge">chanbackup</code> notifies <code class="language-plaintext highlighter-rouge">lightningd</code> that it is done running the hook.</li>
<li>With the hook finished, <code class="language-plaintext highlighter-rouge">lightningd</code> recognizes that a previous channel exists with the peer and launches a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage it.</li>
</ol>
<h2 id="the-race-condition">The Race Condition</h2>
<p>Problems arise when the peer connection flow overlaps with the channel open flow, causing <code class="language-plaintext highlighter-rouge">lightningd</code> to attempt launching the same <code class="language-plaintext highlighter-rouge">channeld</code> daemon twice.
This can happen if the peer quickly opens a channel after connecting, and the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin is delayed in handling the peer connection hook, leading to the following interactions on the CLN node.</p>
<p><img src="/images/cln_channel_open_race.png" alt="channel open race diagram" /></p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the new peer connection.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> calls a plugin hook notifying the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin about the new peer connection.</li>
<li>The <code class="language-plaintext highlighter-rouge">connectd</code> daemon notifies <code class="language-plaintext highlighter-rouge">lightningd</code> about the channel open request.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a new <code class="language-plaintext highlighter-rouge">openingd</code> daemon to handle the channel open negotiation.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> completes the channel open negotiation up to the point where the funding outpoint is known.</li>
<li><code class="language-plaintext highlighter-rouge">openingd</code> sends the funding outpoint to <code class="language-plaintext highlighter-rouge">lightningd</code> and exits.</li>
<li><code class="language-plaintext highlighter-rouge">lightningd</code> launches a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage the new channel.</li>
<li><code class="language-plaintext highlighter-rouge">chanbackup</code> notifies <code class="language-plaintext highlighter-rouge">lightningd</code> that it is done running the hook.</li>
<li>With the hook finished, <code class="language-plaintext highlighter-rouge">lightningd</code> recognizes that a previous channel exists with the peer and attempts to launch a <code class="language-plaintext highlighter-rouge">channeld</code> daemon to manage it. <strong>Since the daemon is already running, an assertion failure occurs and CLN crashes.</strong></li>
</ol>
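<p>The double launch in step 9 can be modeled with a toy sketch (illustrative Go, not CLN's actual C code; <code class="language-plaintext highlighter-rouge">launchChanneld</code> is a hypothetical stand-in for <code class="language-plaintext highlighter-rouge">lightningd</code>'s subdaemon bookkeeping):</p>

```go
package main

import "fmt"

// running tracks which channels already have a channeld subdaemon,
// mirroring lightningd's per-channel bookkeeping.
var running = map[string]bool{}

// launchChanneld models pre-23.08 lightningd: it assumes no channeld
// exists for the channel before launching one.
func launchChanneld(channelID string) error {
	if running[channelID] {
		// Pre-23.08 CLN hit an assertion failure here and crashed.
		return fmt.Errorf("assertion failed: channeld already running for %s", channelID)
	}
	running[channelID] = true
	return nil
}

func main() {
	// Step 7: the channel open flow launches channeld for the new channel.
	fmt.Println(launchChanneld("chan_1")) // <nil>

	// Step 9: the delayed peer-connected hook finally returns, and
	// lightningd tries to launch channeld for the "existing" channel.
	fmt.Println(launchChanneld("chan_1"))
}
```

Ordinarily the hook returns before any channel exists, so the second launch never happens; the attack works by delaying the hook until after step 7.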
<h1 id="the-dos-attack">The DoS Attack</h1>
<p>To reliably trigger the assertion failure, an attacker needs to somehow slow down the <code class="language-plaintext highlighter-rouge">chanbackup</code> plugin so that a channel can be opened before the plugin finishes running the peer connected hook.
One way to do this is to overload <code class="language-plaintext highlighter-rouge">chanbackup</code> with many peer connections and channel state changes.
As it turns out, the <a href="/lightning/fake-channel-dos/">fake channel DoS attack</a> is a trivial and free method of generating these events and overloading <code class="language-plaintext highlighter-rouge">chanbackup</code>.</p>
<p>On a local network with low latency, I was able to generate enough load on <code class="language-plaintext highlighter-rouge">chanbackup</code> to consistently crash CLN nodes in under 5 seconds.
In the real world the attack would be carried out across the Internet with higher latencies, so more load on <code class="language-plaintext highlighter-rouge">chanbackup</code> would be required to trigger the race condition.
In my experiments, crashing CLN nodes across the Internet took around 30 seconds.</p>
<h1 id="the-defense">The Defense</h1>
<p>To prevent the assertion failure from triggering, a <a href="https://github.com/ElementsProject/lightning/commit/af394244914e69a5b2f16e1f10ef412217f9714a">small patch</a> was added to <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.08">CLN 23.08</a> that checks if a <code class="language-plaintext highlighter-rouge">channeld</code> is already running when the peer connected hook returns.
If so, <code class="language-plaintext highlighter-rouge">lightningd</code> does not attempt to start the <code class="language-plaintext highlighter-rouge">channeld</code> again.</p>
<p>Note that this patch does not actually remove the race condition, though it does prevent crashing when the race occurs.</p>
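<p>The guard can be sketched as follows (illustrative Go, not the actual C patch): instead of asserting, <code class="language-plaintext highlighter-rouge">lightningd</code> checks for an existing <code class="language-plaintext highlighter-rouge">channeld</code> and returns early:</p>

```go
package main

import "fmt"

// running tracks which channels already have a channeld subdaemon.
var running = map[string]bool{}

// launchChanneld models the patched behavior in CLN 23.08: if a
// channeld already exists for the channel, skip the relaunch instead
// of hitting an assertion.
func launchChanneld(channelID string) {
	if running[channelID] {
		fmt.Printf("channeld already running for %s; skipping relaunch\n", channelID)
		return
	}
	running[channelID] = true
	fmt.Printf("launched channeld for %s\n", channelID)
}

func main() {
	launchChanneld("chan_1") // channel open flow (step 7)
	launchChanneld("chan_1") // peer-connected hook returns (step 9): no crash
}
```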
<h1 id="discovery">Discovery</h1>
<p>This vulnerability was discovered during follow-up testing prior to the <a href="https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004064.html">disclosure</a> of the fake channel DoS vector.
At the time, Rusty and I agreed to move forward with the planned disclosure of the fake channel DoS vector, but to delay disclosure of this channel open race until a later date.</p>
<p>Since the channel open race can be triggered by the fake channel DoS attack, it is a valid question how the race went undiscovered during the implementation of defenses against that attack.
The answer is that the race was actually untriggerable until a few weeks <em>after</em> the fake channel DoS defenses were merged.</p>
<p>While the race condition was <a href="https://github.com/ElementsProject/lightning/pull/5078">introduced</a> in March 2022, the race couldn’t actually trigger because no plugins used the peer connected hook.
It wasn’t until February 2023 that the race was exposed, when the <a href="https://github.com/ElementsProject/lightning/pull/5361">peer storage backup</a> feature made <code class="language-plaintext highlighter-rouge">chanbackup</code> the first official plugin to use the hook.</p>
<h2 id="timeline">Timeline</h2>
<ul>
<li><strong>2022-03-23:</strong> Race condition <a href="https://github.com/ElementsProject/lightning/pull/5078">introduced</a> to CLN 0.11.</li>
<li><strong>2022-12-15:</strong> Fake channel DoS vector disclosed to Blockstream.</li>
<li><strong>2023-01-21:</strong> Fake channel DoS defenses fully merged [<a href="https://github.com/ElementsProject/lightning/pull/5837">1</a>, <a href="https://github.com/ElementsProject/lightning/pull/5849">2</a>].</li>
<li><strong>2023-02-08:</strong> Peer storage backup feature <a href="https://github.com/ElementsProject/lightning/pull/5361">introduced</a>, exposing the channel open race vulnerability.</li>
<li><strong>2023-03-03:</strong> CLN 23.02 released.</li>
<li><strong>2023-07-28:</strong> Rusty gives the OK to disclose the fake channel DoS vector.</li>
<li><strong>2023-08-14:</strong> Follow-up testing reveals the channel open race vulnerability. Disclosed to Blockstream.</li>
<li><strong>2023-08-21:</strong> Defense against the channel open race DoS <a href="https://github.com/ElementsProject/lightning/commit/af394244914e69a5b2f16e1f10ef412217f9714a">merged</a>.</li>
<li><strong>2023-08-22:</strong> Rusty gives the OK to continue with the fake channel DoS disclosure, but requests that the channel open race vulnerability be omitted from the disclosure.</li>
<li><strong>2023-08-23:</strong> <a href="https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004064.html">Public disclosure</a> of the fake channel DoS.</li>
<li><strong>2023-08-23:</strong> CLN 23.08 released.</li>
<li><strong>2023-12-04:</strong> Rusty gives the OK to disclose the channel open race vulnerability.</li>
<li><strong>2024-01-08:</strong> Public disclosure.</li>
</ul>
<h1 id="prevention">Prevention</h1>
<p>This vulnerability could have been prevented by a couple software engineering best practices.</p>
<h2 id="avoid-race-conditions">Avoid Race Conditions</h2>
<p>The <a href="https://github.com/ElementsProject/lightning/pull/2341">original purpose</a> of the peer connected hook was to enable plugins to filter and reject incoming connections from certain peers.
Therefore the hook was designed to be <em>synchronous</em>, and all other events initiated by the peer were blocked until the hook returned.
Unfortunately, <a href="https://github.com/ElementsProject/lightning/pull/5078">PR 5078</a> destroyed that property of the hook by introducing a <em>known</em> race condition to the code (search for “this is racy” in commit <a href="https://github.com/ElementsProject/lightning/commit/2424b7dea899e11b33ee4da9f95836e058db4a0c">2424b7d</a>).
If PR 5078 hadn’t done this, there would be no race condition to exploit and this vulnerability would never have existed.</p>
<p>Race conditions can be nasty and should be avoided whenever possible.
Knowingly adding race conditions where they didn’t previously exist is generally a bad idea.</p>
<h2 id="do-stress-testing">Do Stress Testing</h2>
<p>When I disclosed the fake channel DoS vector to Blockstream, I also provided a DoS program that demonstrated the attack.
That same DoS program revealed the channel open race vulnerability after it became triggerable in February 2023.
If a stress test based on the DoS program had been added to CLN’s CI pipeline or release process, this vulnerability could have been caught much earlier, before it was included in any releases.</p>
<p>In general, there is some difficulty in releasing such a test publicly while the vulnerability it tests for is still secret.
In such situations the test can remain unreleased until the vulnerability has been publicly disclosed, and in the meantime the test can be run privately during the release process to ensure no regressions have been introduced.
In CLN’s case, this may have been unnecessary – a stress test could have plausibly been added to <a href="https://github.com/ElementsProject/lightning/pull/5849">PR 5849</a> without raising suspicion.</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Avoid race conditions.</li>
<li>Use regression and stress testing.</li>
<li>Update your CLN nodes to at least v23.08.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/cln-channel-open-race/">DoS: Channel Open Race in CLN</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on January 08, 2024.</p>
<p>Several invoice parsing bugs were fixed in <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.11">CLN 23.11</a>, including bugs that caused crashes, undefined behavior, and use of uninitialized memory.
These bugs could be reliably triggered by specially crafted invoices, enabling a
malicious counterparty to crash the victim’s node upon invoice payment.</p>
<p>The parsing bugs were discovered by a new fuzz test written by <a href="https://github.com/dergoegge">Niklas Gögge</a> and enhanced by me.</p>
<h1 id="bugs-fixed-in-v2311">Bugs fixed in v23.11</h1>
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">#</th>
<th> </th>
<th style="text-align: left">Type</th>
<th style="text-align: left">Root Cause</th>
<th style="text-align: left">Fix</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><strong>1</strong></td>
<td> </td>
<td style="text-align: left">undefined behavior</td>
<td style="text-align: left">unchecked return value</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/eeec5290316fa78974c4ef5e8cfb7bdf7a08c09c">eeec529</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>2</strong></td>
<td> </td>
<td style="text-align: left">use of uninitialized memory</td>
<td style="text-align: left">missing check for 0-length TLV</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/ee501b035b2e8340476984d0063fda3f954d7f51">ee501b0</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>3</strong></td>
<td> </td>
<td style="text-align: left">crash</td>
<td style="text-align: left">unnecessary assertion</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/ee8cf69f281c78d47b838200c690378b0b3918a4">ee8cf69</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>4</strong></td>
<td> </td>
<td style="text-align: left">crash</td>
<td style="text-align: left">missing recovery ID validation</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/c1f20687a6babbd2ded354553936889ebda8f142">c1f2068</a></td>
</tr>
<tr>
<td style="text-align: left"><strong>5</strong></td>
<td> </td>
<td style="text-align: left">crash</td>
<td style="text-align: left">missing pubkey validation</td>
<td style="text-align: left"><a href="https://github.com/ElementsProject/lightning/commit/87f4907bb40a38e06254ef9b9a3600f58f3a3f5b">87f4907</a></td>
</tr>
</tbody>
</table>
<h1 id="the-fuzz-target">The fuzz target</h1>
<p>The fuzz target that uncovered these bugs was <a href="https://github.com/ElementsProject/lightning/pull/6750">initially written</a> by Niklas Gögge in December 2022, though it wasn’t made public until October 2023.
The target simply provides fuzzer-generated inputs to CLN’s invoice decoding function, similar to fuzz targets written for other implementations [<a href="https://github.com/lightningdevkit/rust-lightning/blob/c2bbfffb1eb249c2c422cf2e9ccac97a34275f7a/fuzz/src/invoice_deser.rs">1</a>, <a href="https://github.com/lightningnetwork/lnd/blob/27319315bb21130f9618877da5d9acda6d6ab453/zpay32/fuzz_test.go#L44-L49">2</a>].</p>
<p>To improve the fuzzer’s efficiency, Niklas also wrote a custom mutator for the target.
Invoices are encoded in <a href="https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki">bech32</a>, which requires a valid checksum at the end of the encoding, making it quite difficult for fuzzers to generate valid bech32 consistently.
As a result, bech32-naive fuzzers will generally get stuck at the bech32 decoding stage and have a hard time exploring deeper into the invoice parsing logic.
Niklas’ custom mutator teaches the fuzzer how to generate valid bech32 so that it can focus its fuzzing on invoice parsing.</p>
<h2 id="initial-fuzzing-in-2022">Initial fuzzing in 2022</h2>
<p>After writing the fuzz target in December 2022, Niklas privately reported several bugs to CLN including a stack buffer overflow, an assertion failure, and undefined behavior due to a 0-length array.
Many of the bugs were fixed in <a href="https://github.com/ElementsProject/lightning/pull/5891">PR 5891</a> and released in <a href="https://github.com/ElementsProject/lightning/releases/tag/v23.02">CLN 23.02</a>.</p>
<h2 id="merging-the-fuzz-target-in-2023">Merging the fuzz target in 2023</h2>
<p>In October 2023, Niklas submitted his fuzz target for review in <a href="https://github.com/ElementsProject/lightning/pull/6750">PR 6750</a>.
The initial corpus in that PR actually triggered bugs 1 and 2, but Niklas didn’t notice because he had been fuzzing with some UBSan options misconfigured.
CLN’s CI didn’t detect the bugs either, since <a href="https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html">UBSan</a> had previously been <a href="https://github.com/ElementsProject/lightning/commit/364de0094713ed388daa0fc0de7f18c41d1759f0">accidentally disabled</a> in CI.</p>
<p>Niklas also discovered bug 3 during initial fuzzing, but he initially thought it was a false report and hard-coded an exception for it in the fuzz target.</p>
<h2 id="enhancements">Enhancements</h2>
<p>The initial fuzz target only fuzzed the invoice decoding logic, skipping signature checks.
I <a href="https://github.com/ElementsProject/lightning/commit/4b29502098535f0aa00a46fe5692a2d49bb5ebce">modified</a> the target to also run the signature-checking logic, which enabled the fuzzer to quickly find bug 4.</p>
<p>While bug 5 should have also been discoverable by the fuzzer after this change, it remained undetected even after many weeks of CPU time.
It wasn’t until I added a <a href="https://github.com/ElementsProject/lightning/pull/6805">custom cross-over mutator</a> for the fuzz target that bug 5 was discovered.
The cross-over mutator is based on Niklas’ custom mutator and simply combines pieces from multiple bech32-decoded invoices before re-encoding the result in bech32.
Within a few CPU hours of fuzzing with this extra mutator, the fuzzer found bug 5.</p>
<h1 id="impact">Impact</h1>
<p>The severity of these bugs seems relatively low since they can only be triggered when paying an invoice.
If a malicious invoice causes your node to crash, as long as you can restart your node in a timely manner and avoid paying any more invoices from the malicious counterparty, no further harm can be done.</p>
<p>Since bug 2 involves uninitialized memory it could potentially be more serious, as a sophisticated attacker <em>may</em> be able to extract sensitive data from the invoice-decoding process.
Such an attack would be quite complex, and it is unclear whether it would even be possible in practice.
It’s also unclear exactly what sensitive data could be extracted, since CLN handles private keys in a separate dedicated process (the <code class="language-plaintext highlighter-rouge">hsmd</code> daemon).</p>
<h1 id="takeaways">Takeaways</h1>
<ul>
<li>Fuzz testing is an essential component of writing robust and secure software. Any API that consumes untrusted inputs should be fuzz tested.</li>
<li>Custom mutators can be very powerful for fuzzing deeper logic in the codebase.</li>
<li>Fuzz testing of C or C++ code should use both ASan <em>and</em> UBSan. MSan and valgrind can also be useful.</li>
</ul>
<p><a href="https://morehouse.github.io/lightning/cln-invoice-parsing/">Invoice Parsing Bugs in CLN</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on December 08, 2023.</p>
<p>Lightning nodes released prior to the following versions are susceptible to a
DoS attack involving the creation of large numbers of fake channels:</p>
<ul>
<li><a href="https://github.com/lightningnetwork/lnd/releases/tag/v0.16.0-beta">LND 0.16.0</a></li>
<li><a href="https://github.com/ElementsProject/lightning/releases/tag/v23.02">CLN 23.02</a></li>
<li><a href="https://github.com/ACINQ/eclair/releases/tag/v0.9.0">eclair 0.9.0</a></li>
<li><a href="https://github.com/lightningdevkit/rust-lightning/releases/tag/v0.0.114">LDK 0.0.114</a></li>
</ul>
<p>If you are running node software older than this, your funds may be at risk!
Update to at least the above versions to help protect your node.</p>
<h1 id="the-vulnerability">The vulnerability</h1>
<p>When one lightning node (the funder) wishes to open a channel to another node
(the fundee), the following sequence of events takes place:</p>
<p><img src="/images/channel_funding.png" alt="channel funding diagram" /></p>
<ol>
<li>The funder sends an <code class="language-plaintext highlighter-rouge">open_channel</code> message with the desired parameters for
the channel.</li>
<li>The fundee checks that the channel parameters are reasonable and then sends
an <code class="language-plaintext highlighter-rouge">accept_channel</code> message.</li>
<li>The funder creates the funding transaction and sends a <code class="language-plaintext highlighter-rouge">funding_created</code>
message containing the funding outpoint and their signature for the
commitment transaction.</li>
<li>The fundee verifies the funder’s commitment signature and sends
<code class="language-plaintext highlighter-rouge">funding_signed</code> with their own signature for the commitment. The fundee
begins watching the chain for the funding transaction.</li>
<li>The funder verifies the fundee’s commitment signature, broadcasts the funding
transaction, and then watches for it to show up onchain.</li>
<li>Both nodes send <code class="language-plaintext highlighter-rouge">channel_ready</code> once the funding transaction has enough
confirmations. Payments can now be sent across the channel.</li>
</ol>
<p><em>But what happens if the funder doesn’t broadcast the funding transaction in
step 5?</em></p>
<p><img src="/images/channel_funding_dos.png" alt="channel funding DoS diagram" /></p>
<p>The fundee, eager for inbound liquidity, is willing to wait for the
funding transaction to confirm for a period of time. But eventually the fundee
needs to give up on the pending channel and reclaim the resources allocated to
it. BOLT 2 recommends waiting for <a href="https://github.com/lightning/bolts/blob/7d3ef5a6b20eb84982ea2bfc029497082adf20d8/02-peer-protocol.md#the-channel_ready-message">2016 blocks</a>
(2 weeks) before abandoning the pending channel.</p>
<p><strong>Thus for 2 weeks the fundee devotes some amount of database storage, RAM,
and CPU time to watching for the pending channel to confirm.</strong></p>
<h1 id="the-fake-channel-dos-attack">The fake channel DoS attack</h1>
<p>An attacker can force a victim node to consume a small amount of resources
by opening a fake channel with the victim and never publishing a funding
transaction onchain. If the attacker can create lots of fake channels, they can
lock up lots of the victim’s resources.</p>
<p>Fake channels are trivial to create. Since there is no way for the victim
to verify the funding outpoint sent to them in the <code class="language-plaintext highlighter-rouge">funding_created</code> message,
the attacker doesn’t even need to construct a real funding transaction. They
can use a randomly-generated funding transaction ID and sign a commitment
transaction based on that fake ID. The victim will successfully verify the
commitment signature against the provided (fake) funding outpoint and gladly
allocate resources for the fake pending channel.</p>
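<p>A toy model makes the problem concrete: the commitment signature binds to whatever outpoint the attacker claims, but proves nothing about whether that outpoint exists onchain. HMAC stands in for real ECDSA signing here, and all names are illustrative:</p>

```python
# Toy illustration (not real crypto): the fundee verifies the commitment
# signature *against the outpoint the attacker supplied*, so a signature
# over a randomly generated txid verifies just fine. HMAC stands in for
# ECDSA; names are illustrative, not from any implementation.

import hashlib
import hmac
import os

attacker_key = os.urandom(32)


def sign_commitment(key: bytes, funding_outpoint: str) -> bytes:
    return hmac.new(key, funding_outpoint.encode(), hashlib.sha256).digest()


def verify_commitment(key: bytes, funding_outpoint: str, sig: bytes) -> bool:
    return hmac.compare_digest(sign_commitment(key, funding_outpoint), sig)


# The attacker invents a funding txid that will never exist onchain...
fake_outpoint = os.urandom(32).hex() + ":0"
sig = sign_commitment(attacker_key, fake_outpoint)

# ...and the victim's check still passes, because nothing in it proves
# the outpoint exists or is spendable.
assert verify_commitment(attacker_key, fake_outpoint, sig)
```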
<p>Opening lots of these fake channels is also trivial against node software
older than the above releases. Some older node implementations do impose a
limit on the number of pending channels allowed per peer, but such limits are
easily bypassed by using a new attacker node ID for each fake channel.</p>
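<p>A minimal sketch shows why a per-peer limit alone falls short. The limit value and function names here are hypothetical, not taken from any implementation:</p>

```python
# Sketch of bypassing a per-peer pending-channel limit: the attacker
# simply uses a fresh node ID (key pair) for each fake channel.
# The limit and names are hypothetical.

import os
from collections import defaultdict

MAX_PENDING_PER_PEER = 1  # hypothetical per-peer limit

pending = defaultdict(int)


def try_open(node_id: bytes) -> bool:
    """Accept a pending channel unless this peer is at the limit."""
    if pending[node_id] >= MAX_PENDING_PER_PEER:
        return False
    pending[node_id] += 1
    return True


# A single node ID hits the limit immediately...
attacker = os.urandom(33)
assert try_open(attacker) and not try_open(attacker)

# ...but a new node ID per channel sails right past it.
accepted = sum(try_open(os.urandom(33)) for _ in range(1000))
assert accepted == 1000
```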
<h1 id="dos-effects">DoS effects</h1>
<p>In my experiments, I was able to create hundreds of thousands of fake channels
against victim nodes (owned by me), with all kinds of adverse effects. In some
cases, funds were clearly at risk of being stolen due to the victim node’s
inability to respond to cheating attempts.</p>
<p>Here’s how the DoS attack affected each node implementation.</p>
<h2 id="lnd">LND</h2>
<p>Over the course of a couple days, LND’s performance degraded so drastically that
it stopped responding to requests from its peers or from the CLI. The
performance degradation continued on restart, even if the attacker was no longer
actively DoSing.</p>
<p>I didn’t continue the DoS experiment for more than a couple days, but it’s very
possible that with enough time the victim node would have become unresponsive
enough that funds could be stolen without consequence.</p>
<h2 id="cln">CLN</h2>
<p>After one day of the DoS attack, CLN’s <code class="language-plaintext highlighter-rouge">connectd</code> daemon was completely blocked
and unable to respond to connection requests from other nodes. Most other
functionality of CLN continued to work, and funds were not at risk since the
separate <code class="language-plaintext highlighter-rouge">lightningd</code> daemon was not blocked by the DoS attack.</p>
<h2 id="eclair">eclair</h2>
<p>One day into the DoS, eclair OOM crashed. After that, every time eclair
restarted, it OOM crashed again within 30 minutes, even if the attacker was no
longer actively DoSing. Funds were clearly at risk, since an offline node
cannot catch cheating attempts.</p>
<h2 id="ldk">LDK</h2>
<p>Since LDK is a library and not a full node implementation, it was trickier to
experiment with. LDK Node didn’t exist at the time, but I found the
<a href="https://github.com/lightningdevkit/ldk-sample">ldk-sample</a> node and modified
it to run on mainnet for the experiment.</p>
<p>Within hours of the DoS attack, ldk-sample’s performance degraded drastically,
causing it to fall out of sync with the blockchain. A few days later, ldk-sample’s view
of the blockchain was pinned more than 144 blocks in the past, preventing it
from responding to cheating attempts before the attacker’s CSV timelock
expired.</p>
<h1 id="dos-defenses">DoS defenses</h1>
<p>I reported the DoS vector to the four major lightning implementations around
the start of 2023. eclair and LDK were already aware of the potential DoS
vector but hadn’t realized the severity of the vulnerability. Within days of
receiving my report, every lightning implementation began working on defenses,
some openly and others in secret.</p>
<p>All implementations have now shipped releases with defenses against the DoS. If
you’re interested in the technical details of the defenses, see the linked pull
requests and commits.</p>
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">Date Reported</th>
<th style="text-align: left">Implementation</th>
<th style="text-align: left">Defenses</th>
<th style="text-align: left">Release</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">2022-12-12</td>
<td style="text-align: left">LND</td>
<td style="text-align: left">pending channel limit [<a href="https://github.com/lightningnetwork/lnd/commit/3f6315242a7ceb160c12f6997f5c020362424877">1</a>]</td>
<td style="text-align: left">0.16.0</td>
</tr>
<tr>
<td style="text-align: left">2022-12-15</td>
<td style="text-align: left">CLN</td>
<td style="text-align: left">significant performance improvements [<a href="https://github.com/ElementsProject/lightning/pull/5837">1</a>, <a href="https://github.com/ElementsProject/lightning/pull/5849">2</a>]</td>
<td style="text-align: left">23.02</td>
</tr>
<tr>
<td style="text-align: left">2022-12-28</td>
<td style="text-align: left">eclair</td>
<td style="text-align: left">pending channel and peer limits [<a href="https://github.com/ACINQ/eclair/pull/2552">1</a>, <a href="https://github.com/ACINQ/eclair/pull/2601">2</a>]</td>
<td style="text-align: left">0.9.0</td>
</tr>
<tr>
<td style="text-align: left">2023-01-17</td>
<td style="text-align: left">LDK</td>
<td style="text-align: left">pending channel and peer limits [<a href="https://github.com/lightningdevkit/rust-lightning/pull/1988">1</a>]</td>
<td style="text-align: left">0.0.114</td>
</tr>
</tbody>
</table>
<h1 id="lessons">Lessons</h1>
<h2 id="use-watchtowers">Use watchtowers</h2>
<p>When all else fails, watchtowers help to protect funds if your lightning node is
incapacitated by a DoS attack. If you have significant funds at risk, it’s
cheap insurance to run a private watchtower on a separate machine.</p>
<h2 id="multiple-processes">Multiple processes</h2>
<p>Prior to the above releases, CLN was the only lightning implementation that
clearly kept user funds safe while
under DoS, because CLN actually runs as multiple separate daemon processes. In
the case of this DoS attack, the <code class="language-plaintext highlighter-rouge">connectd</code> daemon responsible for handling
peer connections became locked up while the <code class="language-plaintext highlighter-rouge">lightningd</code> daemon watching the
blockchain was relatively unaffected.</p>
<p>Multiprocess architectures in general provide some defense against DoS, as
one process slowing down or crashing doesn’t automatically bring down the other
processes. For this reason, other implementations may want to consider
splitting their nodes into separate processes. CLN could also improve
robustness further by attempting to restart DoS-able subdaemons like <code class="language-plaintext highlighter-rouge">connectd</code>
and <code class="language-plaintext highlighter-rouge">gossipd</code> if they crash, rather than shutting the whole node down.</p>
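<p>The restart-on-crash idea can be sketched in a few lines. This is a generic supervisor sketch, not CLN’s actual behavior; the daemon commands, restart limit, and backoff policy are all assumptions for illustration:</p>

```python
# Generic sketch of restart-on-crash supervision for a subdaemon,
# as suggested above. The restart limit and backoff are illustrative.

import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 5) -> int:
    """Run cmd, restarting it on nonzero exit up to max_restarts times."""
    restarts = 0
    while True:
        rc = subprocess.call(cmd)
        if rc == 0 or restarts >= max_restarts:
            return rc
        restarts += 1
        time.sleep(min(2 ** restarts, 60))  # exponential backoff


# A process that exits cleanly is not restarted.
assert supervise([sys.executable, "-c", "pass"]) == 0

# With no restarts allowed, a crashing process's exit code is returned.
assert supervise([sys.executable, "-c", "raise SystemExit(3)"],
                 max_restarts=0) == 3
```

A real deployment would also want crash-rate limiting and clean shutdown handling, but the core loop is this simple.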
<h2 id="more-security-auditing-needed">More security auditing needed</h2>
<p>I discovered this DoS vector last year. I had been reviewing the dual funding
protocol and found a <a href="https://github.com/lightning/bolts/pull/851#discussion_r997537630">griefing
attack</a>
involving fake dual-funded channels. After discussing the attack with Bastien
Teinturier, I came to realize that a similar attack may also affect the
single-funded protocol.</p>
<p>But for a couple of months I convinced myself that such a trivial attack
surely would have been defended against already. It wasn’t until I spent some
time studying the implementations’ funding code that I realized there were no
defenses.</p>
<p>The fact that this DoS vector went unnoticed since the beginning of the
Lightning Network should make everyone a little scared. If a newcomer like me
could discover this vulnerability in a couple months, there are probably many
other vulnerabilities in the Lightning Network waiting to be found and
exploited.</p>
<p>For quite some time, it seems that security and robustness have not been the
top priority for node implementations, with some implementations not even
having security policies until 6-10 months ago
[<a href="https://github.com/lightningnetwork/lnd/commit/609cc8b883c7e6186e447e8d7e6349688d78d4fd">1</a>,
<a href="https://github.com/ElementsProject/lightning/commit/e29fd2a8e26d655a7fb0f8b1c18092c2cdd787da">2</a>].
Everyone wants new lightning
features: dual funded channels, Taproot channels, splicing, BOLT 12, etc.
And those things are important. But every one of them introduces more
complexity and more potential attack surface. If we’re going to make lightning
even more complex, we also need to ramp up the engineering effort we put towards
making the network secure and robust.</p>
<p>Because in the end it doesn’t matter how feature-rich and easy-to-use the
Lightning Network is if it can’t keep user funds safe.</p>
<p><a href="https://morehouse.github.io/lightning/fake-channel-dos/">DoS: Fake Lightning Channels</a> was originally published by Matt Morehouse at <a href="https://morehouse.github.io">Matt Morehouse</a> on August 23, 2023.</p>