
p2p: supplying and using asmap to improve IP bucketing in addrman#16702

Merged
laanwj merged 4 commits into bitcoin:master from naumenkogs:asn_buckets on Jan 29, 2020

Conversation

@naumenkogs (Contributor) commented Aug 23, 2019

This PR attempts to solve the problem explained in #16599.
A particular attack which encouraged us to work on this issue is explained here: [Erebus Attack against Bitcoin Peer-to-Peer Network](https://erebus-attack.comp.nus.edu.sg/) (by @muoitranduc).

Instead of relying on the /16 prefix to diversify the connections every node creates, we would instead rely on an (IP -> ASN) mapping, if this mapping is provided.

A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the Python scripts written by @sipa to create a .map file, which is no larger than 2 MB (awesome!).

Here I suggest adding a field to peers.dat which would represent a hash of the asmap file used while serializing addrman (or 0 for the legacy /16 prefix approach).
In this case, every time the file is updated (or grouping method changed), all buckets will be re-computed.
I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.

TODO:

  • more unit tests
  • find a way to test the code without including a >1 MB mapping file in the repo.
  • find a way to check that mapping file is not corrupted (checksum?)
  • comments and separate tests for asmap.cpp
  • make python code for .map generation public
  • figure out asmap distribution (?)

Interesting corner case: I’m using std::hash to compute a fingerprint of the asmap, and std::hash returns size_t. I guess if a user updates the OS to 64-bit, then the hash of the asmap will change? Does it even matter?
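The size_t concern is real: std::hash is not specified to be stable across platforms, compilers, or word sizes, so a 32-bit and a 64-bit build would fingerprint the same file differently. A platform-independent digest sidesteps this. A minimal Python sketch of the idea (the function name and the 64-bit truncation are illustrative assumptions, not the PR's actual implementation):

```python
import hashlib

def asmap_fingerprint(data: bytes) -> int:
    # SHA-256 is identical on every platform and word size, unlike
    # std::hash, whose size_t result differs between 32- and 64-bit builds.
    digest = hashlib.sha256(data).digest()
    # Truncate to 64 bits for a compact peers.dat field; 0 is reserved
    # for the legacy /16 grouping, so never emit it for a real map.
    fp = int.from_bytes(digest[:8], "little")
    return fp if fp != 0 else 1

# The same bytes always produce the same fingerprint, on any machine.
fp1 = asmap_fingerprint(b"example asmap contents")
fp2 = asmap_fingerprint(b"example asmap contents")
```

Any cryptographic hash with a fixed-width serialization would do; the point is only that the fingerprint must not depend on the host ABI.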

@fanquake added the P2P label on Aug 23, 2019
@DrahtBot (Contributor) commented Aug 23, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #17812 (config, test: asmap functional tests and feature refinements by jonatack)
  • #17809 (rpc: Auto-format RPCResult by MarcoFalke)
  • #17804 (doc: Misc RPC help fixes by MarcoFalke)
  • #17428 (p2p: Try to preserve outbound block-relay-only connections during restart by hebasto)
  • #17399 (validation: Templatize ValidationState instead of subclassing by jkczyz)
  • #17383 (Refactor: Move consts to their correct translation units by jnewbery)
  • #16748 ([WIP] Add support for addrv2 (BIP155) by dongcarl)
  • #16673 (Relog configuration args on debug.log rotation by LarryRuane)
  • #16115 (On bitcoind startup, write config args to debug.log by LarryRuane)
  • #10102 ([experimental] Multiprocess bitcoin by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa (Member) commented Aug 24, 2019

FWIW, the asmap generated from https://dev.maxmind.com/geoip/geoip2/geolite2/ is 988387 bytes in size.

@sipa (Member) commented Aug 24, 2019

Encoding the map as a bool[] array in the source code will add to the executable 4 bytes per bit in the map, which is a bit excessive. You should probably encode it as a uint8_t[] instead (with 8 bits per array element).
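The packing sipa suggests is straightforward: 8 map bits per byte instead of one array element per bit. A hedged Python sketch of the idea (LSB-first bit order is an assumption here; only the density point matters):

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 bits into bytes, 8 bits per byte, LSB first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bits(data, nbits):
    """Inverse of pack_bits: recover the first nbits bits."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(nbits)]

bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
packed = pack_bits(bits)  # 10 bits fit in 2 bytes
```

The equivalent C++ would emit the packed bytes as a `static const uint8_t[]` initializer and index it with `data[i >> 3] >> (i & 7)`.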

@wiz (Contributor) commented Aug 24, 2019

@sipa Would you mind providing the python script used to generate the asmap? I think Bitcoiners like @TheBlueMatt and myself who maintain their own BGP full table view of the Internet would like to generate their own asmap instead of trusting maxmind, or at least verify it.

@practicalswift (Contributor)

Concept ACK

@sipa (Member) commented Aug 24, 2019

@wiz The script is here: https://gist.github.com/sipa/b90070570597b950f29a6297772a7636 though we need more tooling to convert from common dump formats, sanity check, ... I'll publish some when they're a bit cleaned up and usable.

@wiz (Contributor) commented Aug 24, 2019

Thanks for the script, it helps a lot with my own implementation. I also want to point out a few potential attack vectors that could circumvent the asmap protection:

  1. Private ASN ranges

There are several ASN ranges that can be used as originating ASNs without registration or verification, which could circumvent the asmap protection. For example, I could announce one /24 from 65000, one /24 from 65001, and so on, all behind my actual ASN, in order to launch an Erebus attack. I propose that we map any IP blocks originating from the following invalid ASNs to the next valid ASN upstream:

0                       # Reserved RFC7607                                     
23456                   # AS_TRANS RFC6793                                     
64496-64511             # Reserved for use in docs and code RFC5398            
64512-65534             # Reserved for Private Use RFC6996                     
65535                   # Reserved RFC7300                                     
65536-65551             # Reserved for use in docs and code RFC5398            
65552-131071            # Reserved                                             
4200000000-4294967294   # Reserved for Private Use RFC6996                     
4294967295              # Reserved RFC7300                                     
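The ranges above translate directly into a membership test. A minimal Python sketch (illustrative only, not wiz's actual implementation) of mapping a reserved-origin announcement to the next valid ASN upstream of it:

```python
# Reserved/private ASN ranges from the table above (inclusive bounds).
RESERVED_ASN_RANGES = [
    (0, 0),                     # Reserved, RFC 7607
    (23456, 23456),             # AS_TRANS, RFC 6793
    (64496, 64511),             # docs and code, RFC 5398
    (64512, 65534),             # private use, RFC 6996
    (65535, 65535),             # Reserved, RFC 7300
    (65536, 65551),             # docs and code, RFC 5398
    (65552, 131071),            # Reserved
    (4200000000, 4294967294),   # private use, RFC 6996
    (4294967295, 4294967295),   # Reserved, RFC 7300
]

def is_reserved_asn(asn: int) -> bool:
    return any(lo <= asn <= hi for lo, hi in RESERVED_ASN_RANGES)

def effective_origin(as_path):
    """Walk an AS path from the origin toward the collector and return
    the first non-reserved ASN, i.e. the next valid ASN upstream of a
    reserved/private origin. Paths are collector-side first, origin last.
    Returns None if every hop is reserved."""
    for asn in reversed(as_path):
        if not is_reserved_asn(asn):
            return asn
    return None
```

So an announcement seen as `AS3356 AS174 AS65001` would be attributed to AS174, the attacker's real upstream, rather than the throwaway private origin.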
  2. Multiple originating ASNs

There are actually around 50 routing prefixes on the Internet that legitimately have multiple originating ASNs, so it's not always a strict 1:1 mapping. For example, a valid use case could be an ISP that aggregates multiple customer /29 or /30 prefixes into a single /24 announcement (since /24 is the smallest globally routable prefix size that would be generally accepted) and my router indicates all of the multiple originating ASNs in curly braces like this:

141.145.0.0/19 {AS7160,43898} 
144.21.0.0 {AS7160,AS43894}
158.13.154.0/24 {AS367,AS1479,AS1504,AS1526,AS1541}
160.34.0.0/20 {AS4192,AS7160}
160.34.16.0/20 {AS7160,AS43898} 

Of course this could be used to circumvent the asmap protection as well, so for this case I also propose we instead identify the upstream aggregating ASN as the originating ASN. For my implementation I'm simply going to use a regex to strip out the curly-brace aggregation and ignore it.
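That regex stripping could look like the following hedged Python sketch (the line format is inferred from the router dump excerpt above; treating the stripped entry as attributable to the upstream aggregator, or simply ignoring it, is the commenter's proposal, not settled behavior):

```python
import re

# Matches a multi-origin set such as "{AS7160,43898}" at the end of a
# line; member ASNs may or may not carry the "AS" prefix, as in the
# dump excerpt above.
AGGREGATE_RE = re.compile(r"\{(?:AS)?\d+(?:,(?:AS)?\d+)*\}\s*$")

def strip_aggregated_origins(line: str) -> str:
    """Drop a curly-brace origin set so the prefix can be ignored or
    later attributed to the upstream aggregating ASN."""
    return AGGREGATE_RE.sub("", line).rstrip()

stripped = strip_aggregated_origins("160.34.16.0/20 {AS7160,AS43898}")
```

Lines with a single plain origin pass through unchanged.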

@practicalswift (Contributor) commented Aug 24, 2019

@wiz Very good points!

  1. Private ASN ranges

Besides using private ASN ranges, another obvious attack vector would be to make use of non-reserved but unused or unallocated AS numbers as a faux downstream for a specific attacker-controlled prefix:

Assume that AS54321 is a non-private but unused or unallocated ASN.

An attacker AS666 with upstream transit provider AS123 could fake having AS54321 as a downstream transit customer with the network 200.201.202.0/24.

The global view would hence be:

  • 200.201.202.0/24: AS123 AS666 AS54321

Traffic to 200.201.202.0/24 would end up in the hands of AS666, who gets a free extra slot in our ASN lottery :-) This could obviously be repeated with N additional prefixes to get N extra slots.

Assuming that AS54321 is not in active use by any organisation, no one will complain about this behaviour. (In contrast to "prefix hijacking", where the hijacked network will notice that their traffic is routed in a strange way and complain.)

How can we guard against this attack?

To guard against an attacker making use of non-IANA-allocated ASNs as a faux downstream for a specific attacker-controlled prefix:

Instead of blacklisting known reserved AS numbers, we could be stricter and apply a whitelisting approach: allow only AS number ranges that have been allocated to the five Regional Internet Registries (AFRINIC, APNIC, ARIN, LACNIC and RIPE) by IANA. That data is available in machine-readable form.

To guard against an attacker making use of IANA-allocated but RIR-unallocated ASNs as a faux downstream for a specific attacker-controlled prefix:

I believe all five RIRs provide machine-readable lists of the individual ASNs they have assigned to organisations.

(Note that this data comes with the country code of the organisation that has been assigned the AS number. That could be handy if at some point in the future we would like to implement logic for avoiding country- or region-based netsplits. If incorporated in the asmap data, we could say not only which ASN a peer belongs to but also the region (based on which RIR handed out the ASN) and which country the organisation announcing the peer's prefix belongs to. This would be a good approximation of the actual country/region of the peer.)

To guard against an attacker making use of IANA allocated and RIR allocated ASNs as faux downstreams for a specific attacker controlled prefix:

This is harder to guard against.

Perhaps we could require that a prefix-to-ASN mapping has been stable over say N months before being included in the asmap. That would make the described attack harder to carry out without being noticed.
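The stability requirement could be approximated by intersecting dated snapshots of the prefix-to-ASN mapping and keeping only pairs that never changed. A hedged Python sketch (the snapshot shape, a dict per month, is an assumption for illustration):

```python
def stable_mappings(snapshots):
    """Given a chronological list of {prefix: origin_asn} snapshots
    (e.g. one per month over N months), keep only prefix->ASN pairs
    present and unchanged in every snapshot."""
    if not snapshots:
        return {}
    stable = dict(snapshots[0])
    for snap in snapshots[1:]:
        stable = {p, a} if False else {p: a for p, a in stable.items() if snap.get(p) == a}
    return stable

jan = {"1.2.3.0/24": 13335, "200.201.202.0/24": 54321}
feb = {"1.2.3.0/24": 13335, "200.201.202.0/24": 666}  # origin flipped
result = stable_mappings([jan, feb])  # only the stable pair survives
```

A freshly announced faux-downstream prefix, or one whose origin flips, would thus be excluded from the asmap until it has aged in.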

Another approach would be to analyse a full routing table when creating the asmap and reason about the ASNs that are in the path to the announcing origin AS. It could be tricky to distinguish between tier-1s and attackers :-\

@wiz (Contributor) commented Aug 24, 2019

Yeah, the AS map mitigation may be flawed. How about requiring everyone to also have a few Tor peers, presumably bypassing the network partition attack?

@TheBlueMatt (Contributor)

Indeed. ASN mappings are not a foolproof solution, but they're better than just using /16s (after all, there are lots of unused /16s you could announce if you wanted to). Ultimately some monitoring and building up filtering lists over time as we observe malicious behavior may improve things, but, indeed, ensuring redundant connectivity is the only ultimate solution. Once #15759 lands, I'd really like to propose a default of 2 additional blocksonly Tor connections if Tor support is enabled (see also https://twitter.com/TheBlueMatt/status/1160620919775211520, in which someone suggested their ISP was censoring Bitcoin P2P traffic, and only after setting bitcoind to Tor-only did it manage to connect).

@TheBlueMatt (Contributor)

One thing we can play with after we build an initial table is to look at the paths, instead of looking only at the last ASN in the path. E.g. if, from many vantage points on the internet, a given IP block always passes from AS 1 to AS 2, we could consider it part of AS 1 (given it appears to have only one provider, AS 1). In order to avoid Western bias we'd need to do it across geographic regions and from many vantage points (e.g. maybe contact a Tier 1 and get their full routing table view, not just the selected routes), but once we get the infrastructure in place, further filtering can be played with.
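The single-provider inference described above can be prototyped as: collect AS paths for a prefix from many vantage points, and if the origin's immediate upstream is the same ASN in every path, attribute the prefix to that upstream. A hedged Python sketch (data shapes and the fallback behavior are assumptions):

```python
def infer_provider(paths):
    """paths: AS paths for one prefix from different vantage points,
    each ordered collector-side first, origin last. If every path shows
    the same single upstream directly before the origin, return that
    upstream ASN; otherwise return the origin itself (multi-homed or
    ambiguous). Returns None on inconsistent origins."""
    origins = {p[-1] for p in paths if p}
    if len(origins) != 1:
        return None  # conflicting origins: leave for manual review
    origin = origins.pop()
    upstreams = {p[-2] for p in paths if len(p) >= 2}
    if len(upstreams) == 1:
        return upstreams.pop()  # single provider: fold prefix into it
    return origin

# A stub origin AS 54321 only ever reached through AS 666 gets
# attributed to AS 666, its sole apparent provider.
provider = infer_provider([[3356, 666, 54321], [174, 666, 54321]])
```

With paths from only Western collectors this heuristic would mislabel genuinely multi-homed networks, which is exactly the vantage-point-diversity caveat above.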

fanquake added a commit that referenced this pull request Aug 26, 2019
92528c2 Support serialization of std::vector<bool> (Pieter Wuille)

Pull request description:

  This adds support for serialization of `std::vector<bool>`, as a prerequisite of #16702.

ACKs for top commit:
  MarcoFalke:
    ACK 92528c2 (only looked at the diff on GitHub)
  practicalswift:
    ACK 92528c2 -- diff looks correct
  jamesob:
    ACK 92528c2

Tree-SHA512: 068246a55a889137098f817bf72d99fc3218a560d62a97c59cccc0392b110f6778843cee4e89d56f271ac64e03a0f64dc5e2cc22daa833fbbbe9234c5f42d7b9
@Sjors (Member) left a comment

Concept ACK. It makes sense that this is opt-in for now, and I like that it can rebucket if the map changes, so it doesn't have to be perfect.

Can you get rid of the merge (and oops) commit? A rebase is much easier to review commit-by-commit.

I'd like to try this out, but I need something to feed @sipa's script with (and then pass that into -asmap=<file>). Is there an easy incantation to convert the geolite IPv4/IPv6 ASN tables? Or to produce it myself from a VPS?

Do we need to reconsider the number of buckets and their size?

jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 30, 2020
Summary:
```
If ASN bucketing is used, return a corresponding AS
used in bucketing for a given peer.
```

Partial backport 3/4 of core [[bitcoin/bitcoin#16702 | PR16702]]:
bitcoin/bitcoin@e4658aa

Depends on D8195.

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, jasonbcox

Reviewed By: #bitcoin_abc, jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D8196
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 30, 2020
Summary:
Completes backport 4/4 of core [[bitcoin/bitcoin#16702 | PR16702]]:
bitcoin/bitcoin@3c1bc40

Depends on D8196.

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, jasonbcox

Reviewed By: #bitcoin_abc, jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D8197
TheComputerGenie added a commit to PirateNetwork/pirate that referenced this pull request Oct 31, 2020
CAddrInfo &info = mapInfo[n];
int bucket = entryToBucket[n];
int nUBucketPos = info.GetBucketPosition(nKey, true, bucket);
if (nVersion == 2 && nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && vvNew[bucket][nUBucketPos] == -1 &&
Contributor

Why is this nVersion == 2 check required? This means that anyone upgrading to v0.20 will have a lot of their new table data lost, even if they're not using an asmap. Without this check, we'd still make sure not to place entries in the wrong new table because of the asmap version check.

On code readability/style, it'd make a lot more sense to extract this check to above the loop and comment it:

// New table positions can only be retained if the file is the right format and
// was constructed with the same asmap.
const bool compatible_file = nUBuckets == ADDRMAN_NEW_BUCKET_COUNT && serialized_asmap_version == supplied_asmap_version;

Contributor

More generally, what's the plan for updating the asmap? Presumably it's something that could change fairly frequently (i.e. at least as often as we release major versions of Bitcoin Core). If that's the case and the asmap is changed every major version, are we happy with the idea that people's new tables will lose data at every major upgrade?

@naumenkogs (Contributor, Author) commented Nov 25, 2020

This means that anyone upgrading to v0.20 will have a lot of their new table data lost, even if they're not using an asmap.

You are right, I think this was a mistake. I was probably thinking about downgrading nVersion (say a file from 0.22 with nVersion=3 now downgraded to 0.21), so the check should be nVersion <= 2.
I agree with your comment suggestion, but why do you separate those two other checks? It still makes sure that the "file is in the right format".

@naumenkogs (Contributor, Author) commented Nov 25, 2020

More generally, what's the plan for updating the asmap?

TBD

If that's the case and the asmap is changed every major version, are we happy with the idea that people's new tables will lose data at every major upgrade?

Depends on the answer to the previous question at least :)
Note that what's lost is just the position of records in the table, and presumably a few records from buckets with high collisions? So I wouldn't be very worried (in the context of my previous message as well).
It's not like we're losing many ADDR records here.

Contributor

what's lost is just the position of records in the table, and presumably a few of records from buckets with high collisions

A new address can appear in up to 8 different buckets if it's been gossiped to us from different peers. If it gets rebucketed during file load, it'll only be placed in 1 bucket (or 0, if there's a collision). That seems undesirable.

Contributor (Author)

You are right.

jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 1, 2020
Summary:
Should have gone with the [[bitcoin/bitcoin#16702 | PR16702]] backport (D8194 to D8197).

Backport of core [[bitcoin/bitcoin#18310 | PR18310]].

Test Plan: Read it.

Reviewers: #bitcoin_abc, jasonbcox

Reviewed By: #bitcoin_abc, jasonbcox

Differential Revision: https://reviews.bitcoinabc.org/D8217
sidhujag pushed a commit to syscoin-core/syscoin that referenced this pull request Nov 10, 2020
…keting in addrman

3c1bc40 Add extra logging of asmap use and bucketing (Gleb Naumenko)
e4658aa Return mapped AS in RPC call getpeerinfo (Gleb Naumenko)
ec45646 Integrate ASN bucketing in Addrman and add tests (Gleb Naumenko)
8feb4e4  Add asmap utility which queries a mapping (Gleb Naumenko)

Pull request description:

  This PR attempts to solve the problem explained in bitcoin#16599.
  A particular attack which encouraged us to work on this issue is explained here  [[Erebus Attack against Bitcoin Peer-to-Peer Network](https://erebus-attack.comp.nus.edu.sg/)] (by @muoitranduc)

  Instead of relying on /16 prefix to diversify the connections every node creates, we would instead rely on the (ip -> ASN) mapping, if this mapping is provided.

  A .map file can be created by every user independently based on a router dump, or provided along with the Bitcoin release. Currently we use the python scripts written by @sipa to create a .map file, which is no larger than 2MB (awesome!).

  Here I suggest adding a field to peers.dat which would represent a hash of asmap file used while serializing addrman (or 0 for /16 prefix legacy approach).
  In this case, every time the file is updated (or grouping method changed), all buckets will be re-computed.
  I believe that alternative selective re-bucketing for only updated ranges would require substantial changes.

  TODO:
  - ~~more unit tests~~
  - ~~find a way to test the code without including >1 MB mapping file in the repo.~~
  - find a way to check that mapping file is not corrupted (checksum?)
  - comments and separate tests for asmap.cpp
  - make python code for .map generation public
  - figure out asmap distribution (?)

  ~Interesting corner case: I’m using std::hash to compute a fingerprint of asmap, and std::hash returns size_t. I guess  if a user updates the OS to 64-bit, then the hash of asap will change? Does it even matter?~

ACKs for top commit:
  laanwj:
    re-ACK 3c1bc40
  jamesob:
    ACK 3c1bc40 ([`jamesob/ackr/16702.3.naumenkogs.p2p_supplying_and_using`](https://github.com/jamesob/bitcoin/tree/ackr/16702.3.naumenkogs.p2p_supplying_and_using))
  jonatack:
    ACK 3c1bc40

Tree-SHA512: e2dc6171188d5cdc2ab2c022fa49ed73a14a0acb8ae4c5ffa970172a0365942a249ad3d57e5fb134bc156a3492662c983f74bd21e78d316629dcadf71576800c
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 12, 2021
Summary:
> Continuing the asmap integration of [[bitcoin/bitcoin#16702 | PR16702]] which added mapped_as to the rpc getpeerinfo output, this adds the mapped AS to the Peers detail window in the GUI wallet.
> Co-authored-by: Hennadii Stepanov <[email protected]>

This is a backport of Core [[bitcoin/bitcoin#18402 | PR18402]]

Test Plan:
`ninja && src/qt/bitcoin-qt`

Menu `Window -> Peers`, select a peer, the "Mapped AS" field is now visible at the bottom of the right widget.

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D8876