p2p: supplying and using asmap to improve IP bucketing in addrman#16702
laanwj merged 4 commits into bitcoin:master
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
FWIW, the asmap generated from https://dev.maxmind.com/geoip/geoip2/geolite2/ is 988387 bytes in size.
Encoding the map as a bool[] array in the source code will add to the executable 4 bytes per bit in the map; that's a bit excessive. You should probably encode it as a uint8_t[] instead (with 8 bits per array element).
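A minimal sketch of that packing on the generator side (Python, with hypothetical helper names; assumes the map is a flat sequence of 0/1 values, stored LSB-first within each byte):

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 bits per byte (LSB first)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bit(packed, i):
    """Read bit i back out of the packed representation."""
    return (packed[i // 8] >> (i % 8)) & 1
```

With this layout an ~8 Mbit map costs ~1 MB in the binary, versus 8 MB as one-byte bools (or far more at multiple bytes per bit).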
@sipa Would you mind providing the python script used to generate the asmap? I think Bitcoiners like @TheBlueMatt and myself who maintain their own BGP full table view of the Internet would like to generate their own asmap instead of trusting maxmind, or at least verify it.
Concept ACK
@wiz The script is here: https://gist.github.com/sipa/b90070570597b950f29a6297772a7636 though we need more tooling to convert from common dump formats, sanity check, ... I'll publish some when they're a bit cleaned up and usable.
Thanks for the script, it helps a lot with my own implementation. I also wanted to point out a few potential attack vectors that could circumvent the asmap protection:
There are several ASN ranges which can be used as originating ASNs without registration or verification, and these could be used to circumvent the asmap protection. For example, I could announce one /24 from 65000, one /24 from 65001, and so on, all behind my actual ASN, in order to launch an Erebus attack. I propose that we map any IP blocks originating from the following invalid ASNs to the next valid ASN upstream of them:
There are actually around 50 routing prefixes on the Internet that legitimately have multiple originating ASNs, so it's not always a strict 1:1 mapping. For example, a valid use case could be an ISP that aggregates multiple customer /29 or /30 prefixes into a single /24 announcement (since /24 is the smallest prefix size that would generally be accepted as globally routable), and my router indicates all of the multiple originating ASNs in curly braces like this: Of course this could be used to circumvent the asmap protection as well, so for this case I also propose we instead identify the upstream aggregating ASN as the originating ASN. For my implementation I'm simply going to use a regex to strip out the curly-brace aggregation and ignore it.
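Both mitigations described here (skipping reserved/private origin ASNs, and stripping a trailing `{...}` AS_SET so the aggregating ASN becomes the origin) can be sketched as follows. The range table and the `"3356 2914 {65001,65002}"`-style AS-path format are assumptions for illustration, not wiz's actual implementation:

```python
import re

# IANA-reserved / private-use ASN ranges that should never appear as a
# legitimate origin (assumed to be the ranges the comment alludes to).
RESERVED_ASN_RANGES = [
    (0, 0),                    # reserved
    (23456, 23456),            # AS_TRANS
    (64496, 64511),            # documentation
    (64512, 65534),            # private use (16-bit)
    (65535, 65535),            # reserved
    (4200000000, 4294967294),  # private use (32-bit)
    (4294967295, 4294967295),  # reserved
]

def is_reserved_asn(asn):
    return any(lo <= asn <= hi for lo, hi in RESERVED_ASN_RANGES)

def strip_aggregation(as_path):
    """Drop a trailing {A,B,...} AS_SET so the aggregating ASN becomes the origin."""
    return re.sub(r'\s*\{[0-9,]+\}\s*$', '', as_path)

def effective_origin(as_path):
    """Origin ASN after ignoring AS_SETs and skipping reserved origin ASNs."""
    hops = [int(a) for a in strip_aggregation(as_path).split()]
    for asn in reversed(hops):
        if not is_reserved_asn(asn):
            return asn  # first valid ASN walking upstream from the origin
    return None
```

Walking upstream from the origin until a non-reserved ASN is found implements the "map to the next valid ASN upstream" proposal directly.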
@wiz Very good points!
Besides using private ASN ranges, another obvious attack vector would be to make use of non-reserved but unused or unallocated AS numbers as a faux downstream for a specific attacker-controlled prefix.

How can we guard against this attack?

To guard against an attacker making use of non-IANA-allocated ASNs as a faux downstream for a specific attacker-controlled prefix: instead of blacklisting known reserved AS numbers, we could be stricter and apply a whitelisting approach: allow only AS number ranges that have been allocated to the five Regional Internet Registries (AFRINIC, APNIC, ARIN, LACNIC and RIPE) by IANA. That data is available in machine-readable form.

To guard against an attacker making use of IANA-allocated but RIR-unallocated ASNs as a faux downstream for a specific attacker-controlled prefix: I believe all five RIRs provide machine-readable lists of the individual ASNs they have assigned to organisations. (Note that this data comes with the country code of the organisation that has been assigned the AS number. That could be handy if at some point in the future we would like to implement logic for avoiding country- or region-based netsplits.)

To guard against an attacker making use of IANA-allocated and RIR-allocated ASNs as faux downstreams for a specific attacker-controlled prefix: this is harder to guard against. Perhaps we could require that a prefix-to-ASN mapping has been stable over some minimum period of time. Another approach would be to analyse a full routing table when creating the asmap.
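The whitelisting approach could start from the RIRs' delegated stats files. A sketch, assuming the standard pipe-separated NRO/RIR `delegated` format (`registry|cc|type|start|value|date|status`, where for `asn` records `start` is the first ASN and `value` the count); the function name is illustrative:

```python
def allocated_asns(delegated_lines):
    """Build the set of RIR-allocated/assigned ASNs from 'delegated' stats lines."""
    asns = set()
    for line in delegated_lines:
        if line.startswith('#'):
            continue  # skip comment lines
        fields = line.strip().split('|')
        # Keep only ASN records with enough fields to carry a status.
        if len(fields) < 7 or fields[2] != 'asn':
            continue
        if fields[6] not in ('allocated', 'assigned'):
            continue
        start, count = int(fields[3]), int(fields[4])
        asns.update(range(start, start + count))
    return asns
```

An origin ASN seen in a routing dump but absent from this set would then be treated as suspect when building the asmap.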
Yeah, the AS map mitigation may be flawed. How about requiring everyone to also have a few Tor peers, presumably bypassing the network partition attack?
Indeed. ASN mappings are not a foolproof solution, but they're better than just using /16s (after all, there are lots of unused /16s you could announce if you wanted to). Ultimately some monitoring and building up filtering lists over time as we observe malicious behavior may improve things, but, indeed, ensuring redundant connectivity is the only ultimate solution. Once #15759 lands, I'd really like to propose a default of 2 additional blocksonly Tor connections if Tor support is enabled (see also https://twitter.com/TheBlueMatt/status/1160620919775211520, in which someone suggested their ISP was censoring Bitcoin P2P traffic, and only after setting bitcoind to Tor-only did it manage to connect).
One thing we can play with after we build an initial table is to look at the paths, instead of looking only at the last ASN in the path. eg if, from many vantage points on the internet, a given IP block always passes from AS 1 to AS 2, we could consider it as a part of AS 1 (given it appears to only have one provider - AS 1). In order to avoid Western bias we'd need to do it across geographic regions and from many vantage points (eg maybe contact a Tier 1 and get their full routing table view, not just the selected routes), but once we get the infrastructure in place, further filtering can be played with. |
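The path-based idea might be sketched like this: collect AS paths for each prefix from many vantage points, and if the origin always sits behind exactly one provider, attribute the prefix to that provider. The input shape and names below are hypothetical:

```python
def single_upstream_map(paths_by_prefix):
    """Map each prefix to its sole provider ASN when one exists.

    paths_by_prefix: dict of prefix -> list of AS paths, where each path is a
    list of ASNs with the vantage point first and the origin last.
    """
    result = {}
    for prefix, paths in paths_by_prefix.items():
        origins = {p[-1] for p in paths if p}
        upstreams = {p[-2] for p in paths if len(p) >= 2}
        if len(origins) == 1 and len(upstreams) == 1:
            # Origin appears single-homed behind one provider from every
            # vantage point: attribute the prefix to the provider ASN.
            result[prefix] = upstreams.pop()
        elif len(origins) == 1:
            # Multi-homed: fall back to the origin ASN itself.
            result[prefix] = origins.pop()
    return result
```

As the comment notes, this only becomes meaningful with geographically diverse vantage points (ideally full tables, not just selected routes), so the single-upstream conclusion isn't an artifact of one region's view.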
92528c2 Support serialization of std::vector<bool> (Pieter Wuille)

Pull request description: This adds support for serialization of `std::vector<bool>`, as a prerequisite of #16702.

ACKs for top commit:
MarcoFalke: ACK 92528c2 (only looked at the diff on GitHub)
practicalswift: ACK 92528c2 -- diff looks correct
jamesob: ACK 92528c2

Tree-SHA512: 068246a55a889137098f817bf72d99fc3218a560d62a97c59cccc0392b110f6778843cee4e89d56f271ac64e03a0f64dc5e2cc22daa833fbbbe9234c5f42d7b9
a944e38 to 0921912
Concept ACK. It makes sense that this is opt-in for now, and I like that it can rebucket if the map changes, so it doesn't have to be perfect.
Can you get rid of the merge (and oops) commit? A rebase is much easier to review commit-by-commit.
I'd like to try this out, but I need something to feed @sipa's script with (and then pass that into -asmap=<file>). Is there an easy incantation to convert the geolite IPv4/IPv6 ASN tables? Or to produce it myself from a VPS?
Do we need to reconsider the number of buckets and their size?
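One possible incantation for the GeoLite question above, assuming the GeoLite2 ASN block CSVs (columns `network` and `autonomous_system_number`) and assuming the generation script accepts simple `prefix ASN` lines (the exact expected input format isn't shown in this thread):

```python
import csv
import io

def geolite_to_asmap_input(csv_text):
    """Convert a GeoLite2 ASN block CSV into 'prefix ASN' lines.

    Assumes the CSV has 'network' and 'autonomous_system_number' columns;
    the 'prefix ASnnn' output shape is an assumption about the script input.
    """
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        out.append(f"{row['network']} AS{row['autonomous_system_number']}")
    return out
```

The same function would work for both the IPv4 and IPv6 block files, since they share the column layout.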
d688a65 to 712d3d7
a57b6e3 to 37e37c0
Summary:
```
If ASN bucketing is used, return a corresponding AS used in bucketing for a given peer.
```
Partial backport 3/4 of core [[bitcoin/bitcoin#16702 | PR16702]]: bitcoin/bitcoin@e4658aa. Depends on D8195.
Test Plan: ninja all check-all
Reviewers: #bitcoin_abc, jasonbcox
Reviewed By: #bitcoin_abc, jasonbcox
Differential Revision: https://reviews.bitcoinabc.org/D8196

Summary: Completes backport 4/4 of core [[bitcoin/bitcoin#16702 | PR16702]]: bitcoin/bitcoin@3c1bc40. Depends on D8196.
Test Plan: ninja all check-all
Reviewers: #bitcoin_abc, jasonbcox
Reviewed By: #bitcoin_abc, jasonbcox
Differential Revision: https://reviews.bitcoinabc.org/D8197
First approach to porting bitcoin/bitcoin#16702 into the old v0.13 codebase.
CAddrInfo &info = mapInfo[n];
int bucket = entryToBucket[n];
int nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
if (nVersion == 2 && nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && vvNew[bucket][nUBucketPos] == -1 &&
Why is this nVersion == 2 check required? This means that anyone upgrading to v0.20 will have a lot of their new table data lost, even if they're not using an asmap. Without this check, we'd still make sure not to place entries in the wrong new table because of the asmap version check.
On code readability/style, it'd make a lot more sense to extract this check to above the loop and comment it:
// New table positions can only be retained if the file is the right format and
// was constructed with the same asmap.
const bool compatible_file = nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && serialized_asmap_version == supplied_asmap_version;
More generally, what's the plan for updating the asmap? Presumably it's something that could change fairly frequently (i.e. at least as often as we release major versions of Bitcoin Core). If that's the case and the asmap is changed every major version, are we happy with the idea that people's new tables will lose data at every major upgrade?
This means that anyone upgrading to v0.20 will have a lot of their new table data lost, even if they're not using an asmap.
You are right, I think this was a mistake. I was probably thinking about downgrading nVersion (say a file from 0.22 with nVersion=3 now downgraded to 0.21), so the check should be nVersion <= 2.
I agree with your comment suggestion, but why do you separate those two other checks? It still makes sure that the "file is in the right format".
More generally, what's the plan for updating the asmap?
TBD
If that's the case and the asmap is changed every major version, are we happy with the idea that people's new tables will lose data at every major upgrade?
Depends on the answer to the previous question at least :)
Note that what's lost is just the position of records in the table, and presumably a few records from buckets with high collisions? So I wouldn't be very worried (in the context of my previous message as well).
It's not like we're losing many ADDR records here.
what's lost is just the position of records in the table, and presumably a few of records from buckets with high collisions
A new address can appear in up to 8 different buckets if it's been gossiped to us from different peers. If it gets rebucketed during file load, it'll only be placed in 1 bucket (or 0, if there's a collision). That seems undesirable.
Summary: Should have gone with the [[bitcoin/bitcoin#16702 | PR16702]] backport (D8194 to D8197). Backport of core [[bitcoin/bitcoin#18310 | PR18310]].
Test Plan: Read it.
Reviewers: #bitcoin_abc, jasonbcox
Reviewed By: #bitcoin_abc, jasonbcox
Differential Revision: https://reviews.bitcoinabc.org/D8217
p2p: supplying and using asmap to improve IP bucketing in addrman

3c1bc40 Add extra logging of asmap use and bucketing (Gleb Naumenko)
e4658aa Return mapped AS in RPC call getpeerinfo (Gleb Naumenko)
ec45646 Integrate ASN bucketing in Addrman and add tests (Gleb Naumenko)
8feb4e4 Add asmap utility which queries a mapping (Gleb Naumenko)

Pull request description:

This PR attempts to solve the problem explained in bitcoin#16599. A particular attack which encouraged us to work on this issue is explained here: [Erebus Attack against Bitcoin Peer-to-Peer Network](https://erebus-attack.comp.nus.edu.sg/) (by @muoitranduc).

Instead of relying on the /16 prefix to diversify the connections every node creates, we would instead rely on the (IP -> ASN) mapping, if this mapping is provided. A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the python scripts written by @sipa to create a .map file, which is no larger than 2 MB (awesome!).

Here I suggest adding a field to peers.dat which would represent a hash of the asmap file used while serializing addrman (or 0 for the /16 prefix legacy approach). In this case, every time the file is updated (or the grouping method changed), all buckets will be re-computed. I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.

TODO:
- ~~more unit tests~~
- ~~find a way to test the code without including >1 MB mapping file in the repo~~
- find a way to check that the mapping file is not corrupted (checksum?)
- comments and separate tests for asmap.cpp
- make python code for .map generation public
- figure out asmap distribution (?)

~Interesting corner case: I'm using std::hash to compute a fingerprint of the asmap, and std::hash returns size_t. I guess if a user updates the OS to 64-bit, then the hash of the asmap will change? Does it even matter?~

ACKs for top commit:
laanwj: re-ACK 3c1bc40
jamesob: ACK 3c1bc40 ([`jamesob/ackr/16702.3.naumenkogs.p2p_supplying_and_using`](https://github.com/jamesob/bitcoin/tree/ackr/16702.3.naumenkogs.p2p_supplying_and_using))
jonatack: ACK 3c1bc40

Tree-SHA512: e2dc6171188d5cdc2ab2c022fa49ed73a14a0acb8ae4c5ffa970172a0365942a249ad3d57e5fb134bc156a3492662c983f74bd21e78d316629dcadf71576800c
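The fingerprint-and-rebucket idea from the description can be sketched as follows (illustrative names only, not the actual Bitcoin Core interfaces; SHA-256 stands in for `std::hash` to sidestep the platform-dependent `size_t` concern raised above):

```python
import hashlib

def asmap_fingerprint(asmap_bytes):
    """Stable fingerprint of the asmap contents, identical across platforms."""
    return hashlib.sha256(asmap_bytes).hexdigest()

def load_addrman(stored_fingerprint, current_asmap, entries, bucket_fn):
    """On load, keep serialized bucket positions only if the asmap is unchanged.

    entries: list of (addr, serialized_bucket) pairs.
    bucket_fn: recomputes an entry's bucket under the current grouping.
    A fingerprint of "0" represents the /16-prefix legacy approach.
    """
    current = asmap_fingerprint(current_asmap) if current_asmap else "0"
    if stored_fingerprint == current:
        # Same map (or both legacy): serialized positions are still valid.
        return {addr: bucket for addr, bucket in entries}
    # Map changed (or grouping method changed): rebucket everything.
    return {addr: bucket_fn(addr) for addr, _ in entries}
```

This captures the trade-off debated in the review comments: positions survive only when the stored and supplied fingerprints match, and otherwise every entry is rebucketed on load.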
Summary:
> Continuing the asmap integration of [[bitcoin/bitcoin#16702 | PR16702]] which added mapped_as to the rpc getpeerinfo output, this adds the mapped AS to the Peers detail window in the GUI wallet.
> Co-authored-by: Hennadii Stepanov <[email protected]>

This is a backport of Core [[bitcoin/bitcoin#18402 | PR18402]]
Test Plan: `ninja && src/qt/bitcoin-qt`
Menu `Window -> Peers`, select a peer, the "Mapped AS" field is now visible at the bottom of the right widget.
Reviewers: #bitcoin_abc, majcosta
Reviewed By: #bitcoin_abc, majcosta
Differential Revision: https://reviews.bitcoinabc.org/D8876