Mempool: rework rebroadcast logic to improve privacy#16698
Mempool: rework rebroadcast logic to improve privacy#16698amitiuttarwar wants to merge 7 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
13fd122 to
c383bde
Compare
c383bde to
ba33693
Compare
|
This code is ready for initial review. there are still some to dos before it would be ready for merge:
there are also some follow-ups that can be addressed in separate PRs:
|
|
Concept ACK |
ba33693 to
b1d0a54
Compare
b1d0a54 to
4bda142
Compare
maflcko
left a comment
There was a problem hiding this comment.
Concept ACK. Some questions about why microseconds are truncated to seconds among other stuff.
There was a problem hiding this comment.
Txns are defined as "should have been mined" by using the block assembler logic
I don’t understand 1) the concept “should have been mined”, and 2) how the block assembler logic achieves this.
As to 1) do you mean the txns should have been mined in a specific block/range of blocks, but weren’t? Should no rebroadcasts happen in an ideal world where miners have an identical mempool to ours and mine rationally?
As to 2) from what I understand, BlockAssembler creates a block template with 3/4*MAX_BLOCK_WEIGHT including the top feerate packages of our current mempool. Wouldn’t it always fill this block template with txns if our mempool is large enough, and therefore rather include 75% of the txns that we expect to be mined in the next block, instead of txns that should have been mined in the past?
|
thanks for the review @mzumsande !
based on the local mempool, we are attempting to answer the question of what txns we think should have been mined. And saying if it wasn't, maybe there was an issue with relay.
yes. you will start with txns you expect to be mined in the next block. the recency filter will (likely) remove some of those transactions. however, in the case of a mempool thats emptying out, the recency filter might not have much effect. for that I have this todo before the PR would be ready for merge:
|
What confuses me is how we can answer that without actually looking into recent blocks. How could we distinguish between these cases by just looking at our current mempool? |
There is one, -walletbroadcast=0. How should this be handling dependencies? If you are merely filtering on feerate will you end up with a situation where you rebroadcast a child transaction that has a high feerate even when the low feerate of its parent means it has no chance of being included? Has anyone considered significantly increasing the time between rebroadcasts as an additional change (and one that could go in separately much faster)? Particularly once you've performed one successful broadcast per execution additional rebroadcasts usually do nothing except leak data to attackers that frequently reconnect to you-- you won't resend to a host you've sent to before, and its not usually useful to do so. This wouldn't obviate the need to be more intelligent, but it would reduce the bleeding and also make it more acceptable for the rest of the change to be more approximate.
This sort of concern would be substantially mitigated by not rebroadcasting as soon as the transaction is 'missed' but only after it is missed by N blocks (say 3) blocks where, according to your mempool they should have included the transaction. Then you would only rebroadcast inappropriately if all 3 miners should have included it but skipped it for some reason.
Bandwidth impact should be somewhat mitigated by the fact that the known filters should block many of the rebroadcasts to long running peers. FWIW, I missed on the first pass that this change makes rebroadcasting apply to all mempool transactions. I think that is conceptually a very good move but it has considerable implications. Looking at your average bandwidth usage isn't enough, you have to worry about cases like what if all nodes are running into their mempool limits under high load-- will this change cause the network to start DOS attacking itself?
Compact blocks would be entirely the wrong mechanism. The transactions are unordered, unlike a block. And they are almost entirely known. The mechanism you want is the erlay reconcillation mechanism because it uses bandwidth equal to the size of the actual set difference, not the size of the mempool. In fact with that mechanism, in theory there is no need to do any additional mempool feerate tracking, comparisons with blocks or whatever. Just use erlay to send IDs for the entire mempool (filtered only by the peer's current relay minimum feerate) after every block. This takes bandwidth equal to 4 bytes times the number of differences, no bytes per entry in common. However, in practice it isn't that simple. While designing erlay we specifically consider and discarded the approach of making it work by synchronizing the mempools (which was how I approached the relay efficiency problem in my original writeup). The reason for this is that difference in mempool policy (standardness, minfees, maximum mempool sizes, ancestor depth limits, softforks) or even just simple transaction ordering causes persistent network-wide bandwidth leaks if you relay by syncing the mempool. All these problems also apply here, because this is essentially a mempool syncing proposal, but even worse because compared to the erlay reconciliation this is an extremely inefficient way of syncing: it spends bandwidth for transactions the peer already knows. Consider an attacker that monitors the network and finds nodes close to miners. Then he sends the near-miner nodes a bunch of very low feerate transactions which won't get mined soon . He concurrently sends conflicting high feerate transactions to every other node. The high feerate transactions don't get mined, the other nodes have no idea why, and they bleed bandwidth continually uselessly re-advertising transactions. (if the higher feerate txn gets mined by accident he just tries again) If erlay style relay is used, the bandwidth is really only wasted at the low/high feerate boundary... but unfortunately the attacker can make that boundary arbitrarily large (e.g. give half the nodes the low feerate txn in additional to all the miner-proximal nodes). |
|
Question: Is it a goal to rebroadcast transactions to peers that have already had them sent to them? If yes: the current PR (and wallet rebroadcasting) fails to do that (reliably) because of the per peer bloom filters on broadcasts. If no: great that reduces bandwidth, but the current bloomfiltering approach doesn't successfully do that reliably-- because it forgets. The existing bloom filters are slow and use a lot of memory (they're gigantic to avoid false positives that would break txn relay). For other reasons its been previously proposed to replace them with data in each mempool transaction that indicates which peers its been relayed to. E.g. a possible approach is each mempool txn gets a Or something analogous but with less allocation overhead. :) I suspect that for reasonable numbers of peers and transactions this could take less memory than the bloom filter (which I seem to vaguely recall is something like 1MB per peer?). |
|
thank you for your review @gmaxwell ! RE: kill switch &
RE: handling dependencies
I have not. That’s a great idea. I'm going to create a smaller PR with the mempool logic that guarantees delivery of wallet transactions & reduce frequency of rebroadcasts in the existing implementation. Thank you!
Yeah. For these sorts of reason there are impositions on the rebroadcast set (mainly txn must be >30 mins old. block must have arrived between last cache run and rebroadcast ). Still an open question whether the params chosen are the most reasonable. RE: bandwidth Yup and yup. The attack you’ve portrayed here is in line with the open question I’ve been pondering about mempools with different policies - the non-adversarial case you’ve mentioned. The solution I’m thinking about is adding another data structure,
Hm, somewhere in between? The goal of rebroadcasting transactions is to propagate a transaction out to the network in your mempool that you suspect the miners might not have. If you’ve recently sent a txn to a peer, you don’t have reason to believe resending it to them would help solve this issue, and would just waste bandwidth. On the other hand, you would want to rebroadcast to the same peers in eg. a situation where you have a large mempool, your peers have small mempools & a competitive fee rate market caused them to drop a transaction. After the fee pressure reduces, re-announcing the txn to them could help it get mined. In its current form, the bloom filter is conducive to this behavior by preventing repeat sends to a peer within a ~2-6 hr time period (including txns dropped from the mempool). RE: alternative peer-transaction tracking system I’m definitely interested in thinking through this approach you’re proposing / other alternatives to the current bloom filter, but think its tangential to this PR? Would be happy to continue the convo on a new issue if you’re interested in opening. |
| Needs rebase |
8bca30e [rpc] expose ability to mock scheduler via the rpc (Amiti Uttarwar) 7c8b6e5 [lib] add scheduler to node context (Amiti Uttarwar) 930d837 [test] add chainparams property to indicate chain allows time mocking (Amiti Uttarwar) 1cd43e8 [test] unit test for new MockForward scheduler method (Amiti Uttarwar) a6f6359 [util] allow scheduler to be mocked (Amiti Uttarwar) Pull request description: This PR is to support functional tests by allowing the scheduler to be mocked via the RPC. It adds a `MockForward` method to the scheduler class that iterates through the task queue and reschedules them to be `delta_seconds` sooner. This is currently used to support functional testing of the "unbroadcast" set tracking in #18038. If this patch is accepted, it would also be useful to simplify the code in #16698. ACKs for top commit: MarcoFalke: ACK 8bca30e, only change is some style fixups 🕓 Tree-SHA512: 2a97fe8ade2b7fd1fb5cdfa1dcafb3227a377d7a847e3845a228bc119eb77824b4aefa43d922a06d583939b22725e223f308cf092961048079d36f6b1d9a639b
8bca30e [rpc] expose ability to mock scheduler via the rpc (Amiti Uttarwar) 7c8b6e5 [lib] add scheduler to node context (Amiti Uttarwar) 930d837 [test] add chainparams property to indicate chain allows time mocking (Amiti Uttarwar) 1cd43e8 [test] unit test for new MockForward scheduler method (Amiti Uttarwar) a6f6359 [util] allow scheduler to be mocked (Amiti Uttarwar) Pull request description: This PR is to support functional tests by allowing the scheduler to be mocked via the RPC. It adds a `MockForward` method to the scheduler class that iterates through the task queue and reschedules them to be `delta_seconds` sooner. This is currently used to support functional testing of the "unbroadcast" set tracking in bitcoin#18038. If this patch is accepted, it would also be useful to simplify the code in bitcoin#16698. ACKs for top commit: MarcoFalke: ACK 8bca30e, only change is some style fixups 🕓 Tree-SHA512: 2a97fe8ade2b7fd1fb5cdfa1dcafb3227a377d7a847e3845a228bc119eb77824b4aefa43d922a06d583939b22725e223f308cf092961048079d36f6b1d9a639b
…kable time 1a8f0d5 [tools] update nNextInvSend to use mockable time (Amiti Uttarwar) 4de6303 [tools] add PoissonNextSend method that returns mockable time (Amiti Uttarwar) Pull request description: Introduce a Poisson helper method that wraps the existing method to return `std::chrono::duration` type, which is mockable. Needed for bitcoin#16698. ACKs for top commit: ajtowns: ACK 1a8f0d5 MarcoFalke: re-ACK 1a8f0d5 naumenkogs: ACK 1a8f0d5, and let's merge it and come back to it later. Tree-SHA512: 7e2325d7c55fc0b4357cb86b83e0c218ba269f678c1786342d8bc380bfd9696373bc24ff124b9ff17a6e761c62b2b44ff5247c3911e2afdc7cc5c20417e8290b
…mprove wallet privacy 50fc4df [mempool] Persist unbroadcast set to mempool.dat (Amiti Uttarwar) 297a178 [test] Integration tests for unbroadcast functionality (Amiti Uttarwar) 6851502 [refactor/test] Extract P2PTxInvStore into test framework (Amiti Uttarwar) dc1da48 [wallet] Update the rebroadcast frequency to be ~1/day. (Amiti Uttarwar) e25e42f [p2p] Reattempt initial send of unbroadcast transactions (Amiti Uttarwar) 7e93eec [util] Add method that returns random time in milliseconds (Amiti Uttarwar) 89eeb4a [mempool] Track "unbroadcast" transactions (Amiti Uttarwar) Pull request description: This PR introduces mempool tracking of unbroadcast transactions and periodic reattempts at initial broadcast. This is a part of the rebroadcast project, and a standalone privacy win. The current rebroadcast logic is terrible for privacy because 1. only the source wallet rebroadcasts transactions and 2. it does so quite frequently. In the current system, if a user submits a transaction that does not immediately get broadcast to the network (eg. they are offline), this "rebroadcast" behavior is the safety net that can actually serve as the initial broadcast. So, keeping the attempts frequent is important for initial delivery within a reasonable timespan. This PR aims to improve # 2 by reducing the wallet rebroadcast frequency to ~1/day from ~1/15 min. It achieves this by separating the notion of initial broadcast from rebroadcasts. With these changes, the mempool tracks locally submitted transactions & periodically reattempts initial broadcast. Transactions submitted via the wallet or RPC are added to an "unbroadcast" set & are removed when a peer sends a `getdata` request, or the transaction is removed from the mempool. Every 10-15 minutes, the node reattempts an initial broadcast. This enables reducing the wallet rebroadcast frequency while ensuring the transactions will be propagated to the network. For privacy improvements around # 1, please see #16698. Thank you to gmaxwell for the idea of how to break out this subset of functionality (#16698 (comment)) ACKs for top commit: fjahr: Code review ACK 50fc4df MarcoFalke: ACK 50fc4df, I think this is ready for merge now 👻 amitiuttarwar: The current tip `50fc4df` currently has 6 ACKs on it, so I've opened #18807 to address the last bits. jnewbery: utACK 50fc4df. ariard: Code Review ACK 50fc4df (minor points no need to invalid other ACKs) robot-visions: ACK 50fc4df sipa: utACK 50fc4df naumenkogs: utACK 50fc4df Tree-SHA512: 2dd935d645d5e209f8abf87bfaa3ef0e4492705ce7e89ea64279cb27ffd37f4727fa94ad62d41be331177332f8edbebf3c7f4972f8cda10dd951b80a28ab3c0f
…ns to improve wallet privacy 50fc4df [mempool] Persist unbroadcast set to mempool.dat (Amiti Uttarwar) 297a178 [test] Integration tests for unbroadcast functionality (Amiti Uttarwar) 6851502 [refactor/test] Extract P2PTxInvStore into test framework (Amiti Uttarwar) dc1da48 [wallet] Update the rebroadcast frequency to be ~1/day. (Amiti Uttarwar) e25e42f [p2p] Reattempt initial send of unbroadcast transactions (Amiti Uttarwar) 7e93eec [util] Add method that returns random time in milliseconds (Amiti Uttarwar) 89eeb4a [mempool] Track "unbroadcast" transactions (Amiti Uttarwar) Pull request description: This PR introduces mempool tracking of unbroadcast transactions and periodic reattempts at initial broadcast. This is a part of the rebroadcast project, and a standalone privacy win. The current rebroadcast logic is terrible for privacy because 1. only the source wallet rebroadcasts transactions and 2. it does so quite frequently. In the current system, if a user submits a transaction that does not immediately get broadcast to the network (eg. they are offline), this "rebroadcast" behavior is the safety net that can actually serve as the initial broadcast. So, keeping the attempts frequent is important for initial delivery within a reasonable timespan. This PR aims to improve # 2 by reducing the wallet rebroadcast frequency to ~1/day from ~1/15 min. It achieves this by separating the notion of initial broadcast from rebroadcasts. With these changes, the mempool tracks locally submitted transactions & periodically reattempts initial broadcast. Transactions submitted via the wallet or RPC are added to an "unbroadcast" set & are removed when a peer sends a `getdata` request, or the transaction is removed from the mempool. Every 10-15 minutes, the node reattempts an initial broadcast. This enables reducing the wallet rebroadcast frequency while ensuring the transactions will be propagated to the network. For privacy improvements around # 1, please see bitcoin#16698. Thank you to gmaxwell for the idea of how to break out this subset of functionality (bitcoin#16698 (comment)) ACKs for top commit: fjahr: Code review ACK 50fc4df MarcoFalke: ACK 50fc4df, I think this is ready for merge now 👻 amitiuttarwar: The current tip `50fc4df` currently has 6 ACKs on it, so I've opened bitcoin#18807 to address the last bits. jnewbery: utACK 50fc4df. ariard: Code Review ACK 50fc4df (minor points no need to invalid other ACKs) robot-visions: ACK 50fc4df sipa: utACK 50fc4df naumenkogs: utACK 50fc4df Tree-SHA512: 2dd935d645d5e209f8abf87bfaa3ef0e4492705ce7e89ea64279cb27ffd37f4727fa94ad62d41be331177332f8edbebf3c7f4972f8cda10dd951b80a28ab3c0f
8bca30e [rpc] expose ability to mock scheduler via the rpc (Amiti Uttarwar) 7c8b6e5 [lib] add scheduler to node context (Amiti Uttarwar) 930d837 [test] add chainparams property to indicate chain allows time mocking (Amiti Uttarwar) 1cd43e8 [test] unit test for new MockForward scheduler method (Amiti Uttarwar) a6f6359 [util] allow scheduler to be mocked (Amiti Uttarwar) Pull request description: This PR is to support functional tests by allowing the scheduler to be mocked via the RPC. It adds a `MockForward` method to the scheduler class that iterates through the task queue and reschedules them to be `delta_seconds` sooner. This is currently used to support functional testing of the "unbroadcast" set tracking in bitcoin#18038. If this patch is accepted, it would also be useful to simplify the code in bitcoin#16698. ACKs for top commit: MarcoFalke: ACK 8bca30e, only change is some style fixups 🕓 Tree-SHA512: 2a97fe8ade2b7fd1fb5cdfa1dcafb3227a377d7a847e3845a228bc119eb77824b4aefa43d922a06d583939b22725e223f308cf092961048079d36f6b1d9a639b
|
Is this work going to continue now that #18038 is done? |
|
@gmaxwell yes! I'm currently trying to work out a couple more kinks before continuing the review process on github & seeking feedback on approach. |
Summary: - Mempool tracks locally submitted transactions (wallet or rpc) - Transactions are removed from set when the node receives a GETDATA request from a peer, or if the transaction is removed from the mempool. PR description: > This PR introduces mempool tracking of unbroadcast transactions and periodic reattempts at initial broadcast. This is a part of the rebroadcast project, and a standalone privacy win. > > The current rebroadcast logic is terrible for privacy because 1. only the source wallet rebroadcasts transactions and 2. it does so quite frequently. In the current system, if a user submits a transaction that does not immediately get broadcast to the network (eg. they are offline), this "rebroadcast" behavior is the safety net that can actually serve as the initial broadcast. So, keeping the attempts frequent is important for initial delivery within a reasonable timespan. > > This PR aims to improve # 2 by reducing the wallet rebroadcast frequency to ~1/day from ~1/15 min. It achieves this by separating the notion of initial broadcast from rebroadcasts. With these changes, the mempool tracks locally submitted transactions & periodically reattempts initial broadcast. Transactions submitted via the wallet or RPC are added to an "unbroadcast" set & are removed when a peer sends a getdata request, or the transaction is removed from the mempool. Every 10-15 minutes, the node reattempts an initial broadcast. This enables reducing the wallet rebroadcast frequency while ensuring the transactions will be propagated to the network. > > For privacy improvements around # 1, please see [[bitcoin/bitcoin#16698 | PR16698]]. This is a backport of Core [[bitcoin/bitcoin#18038 | PR18038]] [1/7] bitcoin/bitcoin@89eeb4a Test Plan: ninja all check-all Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D9006
|
this work is being continued in #21061 :) |
The current rebroadcast logic is terrible for privacy because only the source wallet will rebroadcast transactions, and does so quite frequently. This PR aims to improve privacy dramatically while preserving the benefits of rebroadcasting that ensure txns are successfully propagated through the network.
This PR introduces changes so nodes will resend transactions that it believes should have already been mined. It extracts the logic from the wallet into the mempool, so nodes will rebroadcast txns regardless of the originating wallet. Txns are defined as "should have been mined" by using the block assembler logic, and excluding txns that were recently added to the mempool. The wallet will resubmit txns to the mempool on a regular cadence to ensure those txns aren't dropped (due to eviction, expiry, etc..) before being confirmed.
For more information, see: https://gist.github.com/amitiuttarwar/b592ee410e1f02ac0d44fcbed4621dba