Skip to content

build: Embedded ASMap [3/3]: Build binary dump header file#28792

Merged
achow101 merged 6 commits intobitcoin:masterfrom
fjahr:2023-10-asmap-in-source
Feb 19, 2026
Merged

build: Embedded ASMap [3/3]: Build binary dump header file#28792
achow101 merged 6 commits intobitcoin:masterfrom
fjahr:2023-10-asmap-in-source

Conversation

@fjahr
Copy link
Contributor

@fjahr fjahr commented Nov 4, 2023

This is the final in a series of PRs that implement the necessary changes for embedding of asmap data into the binary. This last part add the initial asmap data, implements the build changes and adds further documentation.

Currently an asmap file needs to be acquired by there user from some location or the user needs to generate one themselves. Then they need to move the file to the right place in datadir or pass the path to the file as -asmap=PATH in order to use the asmap feature. The change here allows for builds to embed asmap data into the bitcoind binary which makes it possible to use the feature without handling of the asmap file by the user. If the user starts bitcoind with -asmap the embedded data will be used for bucketing of nodes.

The data lives in the repository at src/node/data/ip_asn.dat and can be replaced with a new version at any time. The idea is that the data should be updated with every release. By default the data at that location is embedded into the binary but there is also a build option to prevent this (-DWITH_EMBEDDED_ASMAP=OFF). In this case the original behavior of the -asmap option is maintained.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 4, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28792.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK sipa, hodlinator, achow101
Concept ACK Sjors, jonatack, laanwj

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33974 (cmake: Check dependencies after build option interaction by hebasto)
  • #33966 (refactor: disentangle miner startup defaults from runtime options by Sjors)
  • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fjahr fjahr mentioned this pull request Nov 4, 2023
4 tasks
@fjahr fjahr force-pushed the 2023-10-asmap-in-source branch 3 times, most recently from b8b09a7 to f9288bc Compare November 4, 2023 23:48
@sipa
Copy link
Member

sipa commented Nov 5, 2023

Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).

@maflcko
Copy link
Member

maflcko commented Nov 5, 2023

How much data would this add per year to the repo, assuming updates happen?

@sipa
Copy link
Member

sipa commented Nov 5, 2023

@maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%).

@luke-jr
Copy link
Member

luke-jr commented Nov 8, 2023

Why is the option of loading it from a file even in dev builds, not considered?

@fjahr
Copy link
Contributor Author

fjahr commented Nov 9, 2023

Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).

Yeah, that would work as well. I have drafted that approach here: fjahr@0f9109f Details can change of course (like keeping the raw file in init) but if people prefer that approach, I will pull this into here.

@fjahr
Copy link
Contributor Author

fjahr commented Nov 9, 2023

Why is the option of loading it from a file even in dev builds, not considered?

It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: "The alternative approach is not having the data in the source code but only adding it during the build phase of a release." But this also comes with the aforementioned downsides.

@willcl-ark
Copy link
Member

A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the -asmap is not being used? Perhaps this will be handled by the demand paging system? If not, it seems to be a bit of a shame to increase binary size by 20% in all cases.

It seems to make most sense to me to take the path suggested by @sipa, including the .dat file in the repo, and having it converted to a .h at compile time. It might also be nice to have a configuration flag to disable the embedded header creation for faster builds.

@fjahr fjahr force-pushed the 2023-10-asmap-in-source branch from f9288bc to 3bcc1ca Compare December 14, 2023 23:24
@DrahtBot DrahtBot mentioned this pull request Jun 25, 2024
@fjahr fjahr changed the title Embedding ASMap files as binary dump header file Embed default ASMap as binary dump header file Aug 27, 2024
@fjahr fjahr force-pushed the 2023-10-asmap-in-source branch from 3bcc1ca to be1e04a Compare August 28, 2024 22:25
@fjahr fjahr marked this pull request as ready for review August 28, 2024 22:26
@fjahr fjahr force-pushed the 2023-10-asmap-in-source branch from be1e04a to 5167707 Compare August 28, 2024 23:31
@fjahr
Copy link
Contributor Author

fjahr commented Aug 28, 2024

I am picking this back up since we have come a long way in terms of tooling outside of Bitcoin Core and lately I was just waiting for cmake and v28 feature freeze. I will suggest an early merge in the cycle for v29 so more people are encouraged to use one steady file over a longer period of time.

I am now using the suggested approach of building the header file at compile time and I added the functionality for both autotools and cmake (the autotools one has existed for a while, I can also remove it if people think it's unnecessary). Of course this is rebased to make cmake available. First time I am editing the cmake build system so there are probably things to improve... And there is also a command for asmap-tool.py to generate the header file.

I am adding the latest (1724248800) asmap file from asmap-data, the corresponding creation process can be reviewed in the issue and PR.

I have not added a config option to skip the asmap header file but I will look into that next.

@Sjors
Copy link
Member

Sjors commented Aug 29, 2024

I am now using the suggested approach of building the header file at compile time

I think the suggestion was to not include the header itself in that case.

the autotools one has existed for a while, I can also remove it if people think it's unnecessary

Maybe keep it in until #30664 is merged.

@fjahr fjahr force-pushed the 2023-10-asmap-in-source branch from 5167707 to e5a25fd Compare August 29, 2024 18:33
@fjahr
Copy link
Contributor Author

fjahr commented Jan 27, 2026

I feel apprehensive to be commenting this so late in this discussion, but also compelled to voice my concerns around embedding the data in the binary. My initial reaction a few years ago to this was that adding something that can only go stale over time, doesn't seem well suited for embedding - especially since many users tend to run old clients.

The overaching motivation here is that we think using asmap will help all users individually and the network as a whole. Because of that we want to turn it on by default eventually so even the users that don't change anything in their configs can get the benefit. This seems to be only possible to do reliably by embedding the data in the binary.

From reading a bunch of prior discussions my impression since then is that the autonomous system data remains remarkably robust over time, meaning the data is unlikely to go stale quickly. Given that dumps exist now going back a few years, how does a map from say five years ago hold up now? I believe it would be valuable to have this additional data as a motivation for this PR.

@jurraca posted some analysis already. 5 years seems excessive to me, given that our releases reach EOL after 18 months. Looking back two years seems fine in terms of time horizon and we also don't have older data because the kartograf project only got stable and to a comparable feature set of today about two years ago. So any comparison with older files or even non-kartograf generated files could be misleading. It would seem backwards to me to test and (upon success) give a thumbs up for using unsupported bitcoin core versions, even if it's just from the asmap perspective.

@jurraca and me are very happy to run more analysis and show the data if it convinces people further but it would be helpful if there was some input of what concerns there are exactly. The map is just a snapshot of the global internet routing table, it may be more interesting to post some research from that level but I am not sure people are interested to invest the time and study it. We can also look at the data on the level of how networks containing bitcoin nodes change, on a shorter timeframe we have already done that and we could take 2 year old bitnodes snapshots and look how they are affected. But it is debatable how valuable that information is since hopefully nodes are being run in a much broader range of networks in the future and asmap should work the same way then.

Is there really that much complexity involved with shipping an external dump file?

It should be less complex from what I can tell, we could basically throw away all the changes here. But we couldn't turn asmap on by default.

I feel like this naturally fits into /share, which we already ship and is commonly part of a bitcoin installation.

I only spent a short time looking at it but it doesn't seem like any data there is loaded at runtime so I don't see how it is a natural fit. Can you point me to the files in there that are already used in the same way asmap files would be used? It seems like rather the files there are either embedded into the builds too (but I don't have a good overview of the QT code) or they are just support scripts.

In my view not doing so out of fear of deployment complexity pulls the recent direction into using multiple specialized binaries, some of which are installed in a separate /libexec directory, into question. Is there another motivation for it outside of users potentially missing to install the data file in an expected directory?

Sure, the users that use these specialized binaries directly will certainly be capable to use external asmap files. But it would be just a power user feature then and if that is fine I don't think we need to do anything, we could just keep the feature as is and let people download files from https://github.com/asmap/asmap-data/. If users have to touch these files manually anyway, they might as well get the freshest one.

@hodlinator
Copy link
Contributor

hodlinator commented Jan 27, 2026

Is there really that much complexity involved with shipping an external dump file?

It should be less complex from what I can tell, we could basically throw away all the changes here. But we couldn't turn asmap on by default.

A possible alternative to embedding could be to by default require an -asmap=path/to/file CLI arg / setting on mainnet, while still allowing running with -asmap= (explicitly unsetting) but in that case emit a warning on startup so distributions/package maintainers aren't let off the hook.

I also think that it probably shouldn't be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest?

@sedited
Copy link
Contributor

sedited commented Jan 27, 2026

Re #28792 (comment)

I only spent a short time looking at it but it doesn't seem like any data there is loaded at runtime so I don't see how it is a natural fit.

We don't load anything from /share currently, but I don't see a difference between starting a binary from /libexec and loading from /share at startup in that respect. There are plenty of applications that do load static data from /share.

But we couldn't turn asmap on by default.

Am I right that the concern is that even if we'd always attempt to load from some static resource path, we would have to fail silently if the data file cannot be found, meaning we'd still not be able to ship the feature to all users?

5 years seems excessive to me, given that our releases reach EOL after 18 months.

I do think this is a slight shift in the guarantees of the software. AFAIK we don't currently ship a feature that decreases in usefulness over time on an already running node. As I already said, I largely do not share this concern. The data @jurraca posted above should be enough to address those reservations (which seems to indicate a potential linear regression?).

@fjahr
Copy link
Contributor Author

fjahr commented Jan 28, 2026

@sedited

We don't load anything from /share currently, but I don't see a difference between starting a binary from /libexec and loading from /share at startup in that respect. There are plenty of applications that do load static data from /share.

Maybe there are applications that load data from share that they put there themselves, I don't have a lot of experience with using share or such applications. But my understanding from some research is that data put in share usually exists so the OS can access and use them, too (hence the name). So for example icons for applications to discplay in different UI elements of the OS GUI. The asmap data isn't used by the OS so it doesn't seem like a good fit conceptually.

Am I right that the concern is that even if we'd always attempt to load from some static resource path, we would have to fail silently if the data file cannot be found, meaning we'd still not be able to ship the feature to all users?

I am inexperienced with using share folders in different OS, if you say it's guaranteed that in every OS/distro we are planning to support some form of a share folder exists and it is always accessible to bitcoin core at runtime and that the maintenance burden to ensure that this always holds holds in the future is also not higher than the maintenance burden of embedding the data, then I would say it's a viable alternative. I honestly have no idea how to verify this.

AFAIK we don't currently ship a feature that decreases in usefulness over time on an already running node.

At least we ship the hardcoded seeds as embedded data and I would expect that data to decrease in usefulness at roughly the same rate. And I would say the usefulness of the software as a whole decreases as we publish security advisories to which that software is still vulnerable.

which seems to indicate a potential linear regression?

I would expect it to flatten more over time but it is also possible that this effect is offset by registration of new IPv6 blocks and moves from IPv4 to IPv6.

@fjahr
Copy link
Contributor Author

fjahr commented Jan 28, 2026

@hodlinator

A possible alternative to embedding could be to by default require an -asmap=path/to/file CLI arg / setting on mainnet, while still allowing running with -asmap= (explicitly unsetting) but in that case emit a warning on startup so distributions/package maintainers aren't let off the hook.

It sounds like the burden would still fall on users having to get an asmap file from somewhere quite often. Since we won't always be able to reach these users ourselves, there would be an increased chance of those users using a badly outdated or even malicious asmap file they got from somewhere. And this would be a degraded UX.

Possibly the asmap files from asmap-data could contain a signature by the maintainers which would be checked by bitcoin core to verify it's a verified asmap and otherwise there is a warning. I don't think that's a good idea but that would move this into a bit more secure territory, similar to how we have the assumeutxo hashes. Of course only embedding the hashes doesn't work because we want people to use the latest version if possible...

I also think that it probably shouldn't be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest?

There are certainly ways we could investigate improving the bucketing logic before we enable asmap by default. We had discussions on this in early 2023 based on virtu's simulations for example. But I would like to pick up that thread when we have embedding actually merged and off-by-default.

@hodlinator
Copy link
Contributor

hodlinator commented Jan 28, 2026

@fjahr

It sounds like the burden would still fall on users having to get an asmap file from somewhere quite often.

I was thinking we would include the latest "official" ASMap file in release packages:

  • Windows installer would put it in a subdirectory close to the executables.
  • Linux/UNIX distros would be encouraged (not required) to put it in one of <10 standard locations: /usr/share/, /local/share/, etc
  • Mac app dir would contain it in a subdirectory.

A less heavy-handed approach than my earlier comment #28792 (comment) (which required explicit -asmap=... to run on mainnet). Would be:

  1. If ASMap is enabled (or later default-on):
    • On Windows/Mac: Bitcoin Core looks for an ASMap file next to the binaries.
    • On Linux/UNIX we check the list of expected standard locations /usr/share/, /local/share/, etc. More esoteric distros like Nix/Guix would still use -asmap=....
  2. If an ASMap is not found:
    • If it was requested via -asmap=..., we output a fatal init-error.
    • If it was default-on, we emit a warning. This encourages distros to set it up to work, although they could also opt to explicitly disable it in the worst case, something that is less of an issue with the approach embedding in the executables/shared libraries, but they could still chose to exclude it from the build.

Agree that some kind of scheme to validate asmap-files would be good.


I also think that it probably shouldn't be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest?

[...] I would like to pick up that thread when we have embedding actually merged and off-by-default.

👍

@sipa
Copy link
Member

sipa commented Jan 28, 2026

@sedited

My initial reaction a few years ago to this was that adding something that can only go stale over time, doesn't seem well suited for embedding - especially since many users tend to run old clients.

That's a fair concern, but my (largely unsubstantiated) belief is that even a 5-year old asmap file would be far more accurate than the default /16-based bucketing we use now. And I don't think this is something we can exactly measure: even if ranges's ASN assignment changes over time, that does not imply that the inaccuracy caused by that exposes users more to attacks than /16 bucketing does.

Is there really that much complexity involved with shipping an external dump file? I feel like this naturally fits into /share, which we already ship and is commonly part of a bitcoin installation. In my view not doing so out of fear of deployment complexity pulls the recent direction into using multiple specialized binaries, some of which are installed in a separate /libexec directory, into question. Is there another motivation for it outside of users potentially missing to install the data file in an expected directory?

I'm not sure I've ever even considered that option - possibly because I've always thought of (and frequently used) Bitcoin Core as a standalone bitcoind / bitcoin-cli that can be run from anywhere. So from that perspective, the only options are built-in to the binary, or expecting the user to put it in the datadir or otherwise manually provide a location. I think built-in is the far less error-prone option out of those.

If we accept that some "installation" for usage is inevitably going to be necessary(*) with the libexec/ move, then putting the asmap data in a /share-like path seems pretty reasonable from a filesystem layout perspective.

If we'd go that route, I'd aim for a:

  • If -noasmap specified, fall back to /16 bucketing like today.
  • If -asmap=PATH, specified, use that path if it exists and fail otherwise
  • If $DATADIR/asmap.dat exists, use that.
  • If $INSTALLPATH/share/asmap.dat exists, use that.
  • (Possibly?) If $(dirname argv[0])/asmap.dat exists, use that, so someone can have a bitcoind and asmap.dat in the same directly, and it'll just work?
  • Otherwise, fail.

So a user cannot inadvertently end up in a situation where they are not using proper asmap data.

However, I don't see how this addresses the reasoning for not wanting built-in asmap data: data in /share will go stale just as much as the binary goes out of stale, if we're working in the "installed environment" setting, as /bin/bitcoind and /share/asmap.dat will just be updated simultaneously? In the "manual/uninstalled" environment, we'd expect people to have asmap.dat files in a manual path anyway.

(*) Is that "installed environment" really compatible with the movement towards reproducible and increasingly static builds that work on as many distinct systems as possible? Because how can this neutral, run-anywhere, reproducible, single binary know what the OS's equivalent /share path would be? Does it just try a number of fixed typical locations? What about guix/nix like systems that install binaries in a versioned paths?

@darosior
Copy link
Member

Embedding the ASMap in the binary is a tradeoff that is only worth it if we are going to enable it by default. Correct? If so, i think it is necessary that we have buy-in on the direction prior to embedding the binary. Clearly there is interest, but i'm not sure there is broad agreement yet (see for instance #16599 (comment)).

To be clear i'm not objecting to this PR, i just want to make sure that we don't end up with a half-shipped feature where we eat the costs but don't reap the benefits because we didn't set a clear direction from the get go and it turns out to be more controversial than anticipated.

@fjahr
Copy link
Contributor Author

fjahr commented Jan 28, 2026

Embedding the ASMap in the binary is a tradeoff that is only worth it if we are going to enable it by default. Correct?

I think lowering the friction to use the feature is a win either way but I don't have to motivate myself to review the code so I guess my opinion doesn't count much ;)

If so, i think it is necessary that we have buy-in on the direction prior to embedding the binary. Clearly there is interest, but i'm not sure there is broad agreement yet (see for instance #16599 (comment)).

I would expect that there is agreement that the asmap data will be helpful to create better peer diversity even if the bucketing or connection logic might have to be tweaked to prevent adverse effects when all nodes in the network are using asmap at some point.

On the linked comment in particular, to me it sounds more like something that should be tackled on the connection side because it just shows a general problem we will always encounter when we try to create peer diversity through discrimination of certain peers based on any data, not just asmap. If I get a new IPv6 /32 and launch thousands of nodes in it, we would start to see the same effect under the current default bucketing logic, right? And since IPv4 is getting more and more scarce it would be totally plausible (if not inevitable) that the big cloud hosters just decide to move all of their cheaper servers in one IPv6 /32 each and IPv4 will be only be available via a very costly upgrade (actually AWS already charges a small amount it seems). So this is not as unrealistic as it may seem.

src/init.cpp Outdated
node.netgroupman = std::make_unique<NetGroupManager>(NetGroupManager::WithLoadedAsmap(std::move(asmap)));
} else {
#ifdef ENABLE_EMBEDDED_ASMAP
// If the file doesn't exist, try to use the embedded data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment up to date? It looks like passing a non-existing filename will trigger an error, not usage of the embedded asmap (which I think is the correct behavior).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's a left-over from the times when we had the default file location. I have amended it.

@fjahr
Copy link
Contributor Author

fjahr commented Feb 3, 2026

Rebased since #33878 was merged and addressed @sipa's review comment.

Copy link
Member

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 24699fe

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 24699fe

We can try embedding the asmap data in executables in the v31 release and re-asses for later releases. IMO it's a step forward to distribute it inside releases regardless of whether it's included as a separate file or not.

Comment on lines +537 to +543
argsman.AddArg("-asmap=<file>", strprintf("Specify asn mapping used for bucketing of the peers. Relative paths will be prefixed by the net-specific datadir location.%s",
#ifdef ENABLE_EMBEDDED_ASMAP
" If a bool arg is given (-asmap or -asmap=1), the embedded mapping data in the binary will be used."
#else
""
#endif
), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could simplify by using the string literals on adjacent lines instead of strprintf:

Suggested change
argsman.AddArg("-asmap=<file>", strprintf("Specify asn mapping used for bucketing of the peers. Relative paths will be prefixed by the net-specific datadir location.%s",
#ifdef ENABLE_EMBEDDED_ASMAP
" If a bool arg is given (-asmap or -asmap=1), the embedded mapping data in the binary will be used."
#else
""
#endif
), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
argsman.AddArg("-asmap=<file>", "Specify asn mapping used for bucketing of the peers. Relative paths will be prefixed by the net-specific datadir location."
#ifdef ENABLE_EMBEDDED_ASMAP
" If a bool arg is given (-asmap or -asmap=1), the embedded mapping data in the binary will be used."
#endif
, ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do if I have to retouch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Checked size difference of GUIX binaries with/without this embedding PR.

  • Windows installer grows by 5.5 MB - a bit disappointing that it doesn't de-duplicate better across the binaries. Growth of ~17%.
  • Similarly for aarch64 tar.gz package, size increase of 7.1 MB / 9.2%.
₿ diff -U 100 b58eebf1520f 24699fec8422
--- b58eebf1520f	2026-02-04 11:49:50.675776779 +0100
+++ 24699fec8422	2026-02-04 11:49:51.508684240 +0100
@@ -1,65 +1,65 @@
-₿ ll --size bytes guix-build-b58eebf1520f/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed -E 's/\-\w{12}/\-<hash>/g; s/\bWed Feb.{18}//'
-.rw-r--r-- hodlinator users 77223920  guix-build-<hash>/output/aarch64-linux-gnu/bitcoin-<hash>-aarch64-linux-gnu.tar.gz
+₿ ll --size bytes guix-build-24699fec8422/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed -E 's/\-\w{12}/\-<hash>/g; s/\bWed Feb.{18}//'
+.rw-r--r-- hodlinator users 84334168  guix-build-<hash>/output/aarch64-linux-gnu/bitcoin-<hash>-aarch64-linux-gnu.tar.gz
 
-₿ tar -ztvf guix-build-b58eebf1520f/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
+₿ tar -ztvf guix-build-24699fec8422/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
 drwxr-xr-x 0/0               0 bitcoin-<hash>/
 -rw-r--r-- 0/0            3489 bitcoin-<hash>/README.md
 drwxr-xr-x 0/0               0 bitcoin-<hash>/bin/
 -rwxr-xr-x 0/0         2035880 bitcoin-<hash>/bin/bitcoin
 -rwxr-xr-x 0/0         2561200 bitcoin-<hash>/bin/bitcoin-cli
--rwxr-xr-x 0/0        39497192 bitcoin-<hash>/bin/bitcoin-qt
+-rwxr-xr-x 0/0        41004568 bitcoin-<hash>/bin/bitcoin-qt
 -rwxr-xr-x 0/0         4594920 bitcoin-<hash>/bin/bitcoin-tx
 -rwxr-xr-x 0/0         4265648 bitcoin-<hash>/bin/bitcoin-util
 -rwxr-xr-x 0/0         8679816 bitcoin-<hash>/bin/bitcoin-wallet
--rwxr-xr-x 0/0        14877576 bitcoin-<hash>/bin/bitcoind
+-rwxr-xr-x 0/0        16319416 bitcoin-<hash>/bin/bitcoind
 -rw-r--r-- 0/0             126 bitcoin-<hash>/bitcoin.conf
 drwxr-xr-x 0/0               0 bitcoin-<hash>/libexec/
--rwxr-xr-x 0/0        43103040 bitcoin-<hash>/libexec/bitcoin-gui
--rwxr-xr-x 0/0        18483432 bitcoin-<hash>/libexec/bitcoin-node
--rwxr-xr-x 0/0        29836280 bitcoin-<hash>/libexec/test_bitcoin
+-rwxr-xr-x 0/0        44610416 bitcoin-<hash>/libexec/bitcoin-gui
+-rwxr-xr-x 0/0        19925272 bitcoin-<hash>/libexec/bitcoin-node
+-rwxr-xr-x 0/0        31278120 bitcoin-<hash>/libexec/test_bitcoin
 drwxr-xr-x 0/0               0 bitcoin-<hash>/share/
 drwxr-xr-x 0/0               0 bitcoin-<hash>/share/man/
 drwxr-xr-x 0/0               0 bitcoin-<hash>/share/man/man1/
 -rw-r--r-- 0/0             214 bitcoin-<hash>/share/man/man1/bitcoin-cli.1
 -rw-r--r-- 0/0             211 bitcoin-<hash>/share/man/man1/bitcoin-qt.1
 -rw-r--r-- 0/0             211 bitcoin-<hash>/share/man/man1/bitcoin-tx.1
 -rw-r--r-- 0/0             217 bitcoin-<hash>/share/man/man1/bitcoin-util.1
 -rw-r--r-- 0/0             223 bitcoin-<hash>/share/man/man1/bitcoin-wallet.1
 -rw-r--r-- 0/0             202 bitcoin-<hash>/share/man/man1/bitcoin.1
 -rw-r--r-- 0/0             205 bitcoin-<hash>/share/man/man1/bitcoind.1
 drwxr-xr-x 0/0               0 bitcoin-<hash>/share/rpcauth/
 -rw-r--r-- 0/0             453 bitcoin-<hash>/share/rpcauth/README.md
 -rwxr-xr-x 0/0            1767 bitcoin-<hash>/share/rpcauth/rpcauth.py
 
-₿ tar -ztvf guix-build-b58eebf1520f/output/x86_64-w64-mingw32/bitcoin-*-win64-codesigning.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
+₿ tar -ztvf guix-build-24699fec8422/output/x86_64-w64-mingw32/bitcoin-*-win64-codesigning.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
 drwxr-xr-x 0/0               0 ./
 -rwxr-xr-x 0/0            1370 ./detached-sig-create.sh
 drwxr-xr-x 0/0               0 ./unsigned/
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/
--rw-r--r-- 0/0        32627040 ./unsigned/bitcoin-<hash>-win64-setup-unsigned.exe
+-rw-r--r-- 0/0        38141533 ./unsigned/bitcoin-<hash>-win64-setup-unsigned.exe
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/bin/
 -rwxr-xr-x 0/0         2328083 ./unsigned/bitcoin-<hash>/bin/bitcoin-cli.exe
--rwxr-xr-x 0/0        40828963 ./unsigned/bitcoin-<hash>/bin/bitcoin-qt.exe
+-rwxr-xr-x 0/0        42296867 ./unsigned/bitcoin-<hash>/bin/bitcoin-qt.exe
 -rwxr-xr-x 0/0         4388883 ./unsigned/bitcoin-<hash>/bin/bitcoin-tx.exe
 -rwxr-xr-x 0/0         4018707 ./unsigned/bitcoin-<hash>/bin/bitcoin-util.exe
 -rwxr-xr-x 0/0         8692243 ./unsigned/bitcoin-<hash>/bin/bitcoin-wallet.exe
 -rwxr-xr-x 0/0         1652755 ./unsigned/bitcoin-<hash>/bin/bitcoin.exe
--rwxr-xr-x 0/0        15117843 ./unsigned/bitcoin-<hash>/bin/bitcoind.exe
+-rwxr-xr-x 0/0        16585235 ./unsigned/bitcoin-<hash>/bin/bitcoind.exe
 -rw-r--r-- 0/0             126 ./unsigned/bitcoin-<hash>/bitcoin.conf
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/libexec/
--rwxr-xr-x 0/0        27983891 ./unsigned/bitcoin-<hash>/libexec/test_bitcoin.exe
+-rwxr-xr-x 0/0        29451283 ./unsigned/bitcoin-<hash>/libexec/test_bitcoin.exe
 -rw-r--r-- 0/0             846 ./unsigned/bitcoin-<hash>/readme.txt
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/share/
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/share/man/
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/share/man/man1/
 -rw-r--r-- 0/0             214 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-cli.1
 -rw-r--r-- 0/0             211 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-qt.1
 -rw-r--r-- 0/0             211 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-tx.1
 -rw-r--r-- 0/0             217 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-util.1
 -rw-r--r-- 0/0             223 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-wallet.1
 -rw-r--r-- 0/0             202 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin.1
 -rw-r--r-- 0/0             205 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoind.1
 drwxr-xr-x 0/0               0 ./unsigned/bitcoin-<hash>/share/rpcauth/
 -rw-r--r-- 0/0             453 ./unsigned/bitcoin-<hash>/share/rpcauth/README.md
 -rwxr-xr-x 0/0            1767 ./unsigned/bitcoin-<hash>/share/rpcauth/rpcauth.py
 -rw-r--r-- 0/0            7058 ./win-codesign.cert

Copy link
Contributor

@hodlinator hodlinator Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth adding /doc/release-notes-28792.md to inform that we are now embedding the ASMap.

  • Makes users more likely to try enabling it.
  • Helps explain the bump in package/executable size.
P2P and network changes
-----------------------

- Recent ASMap data is now embedded in the binaries, use `-asmap` to enable it for your node
  (see [`asmap-data.md`](./asmap-data.md)).

(Edit not sure about the asmap-data.md link, as that file is currently not distributed with releases).

Copy link
Contributor Author

@fjahr fjahr Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will add it if I have to retouch and if not I will add this as a follow-up or just edit the release notes draft if it already exists.

@achow101
Copy link
Member

ACK 24699fe

Comment on lines +54 to +59
As an upcoming release approaches the embedded ASMap data should be updated
by replacing the `ip_asn.dat` with a newer ASMap file from the asmap-data
repository so that its data is embedded in the release. Ideally, there may be a file
already created recently that can be selected for an upcoming release. Alternatively,
a new creation process can be initiated with the goal of obtaining a fresh map
for use in the upcoming release.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs fleshing out before the next release imo. I would suggest doing the following:

  • Move asmap-data to the bitcoin-core org
  • Add a script to sign a transcript of the gathered data as well as its resulting hash, both in its compact encoded and un-encoded state.
  • Gather signatures from participants in the repository similar to guix.sigs.
  • Decide on a minimum threshold of participants producing matching data in order to be included here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move asmap-data to the bitcoin-core org

This came up a few times already and I am happy to do it. I just looked into it and it seems I can not initiate the move to the bitcoin-core org directly because I don't have permission to create a new repository there. The common workaround seem to be that I would initiate the transfer to a maintainers personal account and then they transfer it to the org. Let me know if I should do that and to whom it should go.

Decide on a minimum threshold of participants producing matching data in order to be included here.

The rule I set initially without much basis was to have at least 5 people participate and the majority should get a match. This means to 3 ACKs minimum and if a larger part of participants got different results there may be something wrong and so also rather abort. This was partly also chosen since we didn't always have a large number of participants in the collaborative runs. In the last few months we had enough that I would say a minimum of 8-9 participants is realistic with the majority rule or just asking for 5+ matches. Most of the last runs were higher with up to 10 matches but I think this should be good enough for a start?

Add a script to sign a transcript of the gathered data as well as its resulting hash, both in its compact encoded and un-encoded state.
Gather signatures from participants in the repository similar to guix.sigs.

I can look into those, just to clarify: With transcript do you mean the personal log output from kartograf? Or maybe the Issue discussion of the run?

And you mean signatures from all participants in the run or only those that review the PR that merges the data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a script to sign a transcript of the gathered data as well as its resulting hash, both in its compact encoded and un-encoded state.

Yea. I think its important that the raw data is available, for any asmap we are shipping, so that someone can recreate it from the raw inputs, rather than trusting the blob.

Copy link
Contributor Author

@fjahr fjahr Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its important that the raw data is available

Ok, the full raw data from one matching participant though, right? The raw data can have minor differences but comes out with the same result after processing because of removal of duplicated data between multiple sources etc.

Do you have a preferred storage mechanism for it? I am not sure how sustainable this is but I found a hack that you can store it as a release artifact and we are using this for our kartograf CI full reproduction run: https://github.com/asmap/sample-data/releases/tag/1762444800

Copy link
Contributor

@sedited sedited Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With transcript do you mean the personal log output from kartograf

I think the final hash from the kartograph output is fine in order for the transcripts to match, but I would also include producing the compact encoded map and its hash.

EDIT: To be clear, I don't mean the github comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the full raw data from one matching participant though, right?

Sure, as long as that data can be transformed into the asmap that we are shipping.

@sr-gi
Copy link
Member

sr-gi commented Feb 19, 2026

It looks like I'm too late for the party, but I wanted to comment on this even if it's post-merge.

I saw there were some concerns regarding embedding data into the binary that will decay over time. My understanding, based on reading the conversation, is that there are three reasons why we are not too worried about this being an issue:

  • Data does not decay too fast, even looking at 2-year-old snapshots, we get at most ~10% changes (excluding outliers), as per @jurraca's chart
  • Even a ~5-year old snapshot may be more accurate than the current /16 bucketing approach (even though this is hard to measure and mostly unsubstantiated)
  • We are not encouraging users to run software past EOL (18 months), so the snapshots should be "fresh enough."

If this is the case, I wonder if it would be worth notifying the user if they are running with an ASMap that may be considered "too old", and potentially encouraging them to download a newer snapshot from the repo. We can set this to match the EOL of the point release, or later if we think that the decay over 18 months is not worth the warning. I think this is not going to be too relevant as long as ASMap is not enabled by default, but it may be good to let the user know once we decide to do so.

@fjahr
Copy link
Contributor Author

fjahr commented Feb 19, 2026

I wonder if it would be worth notifying the user if they are running with an ASMap that may be considered "too old"

I would be fine with that but I think at the point where we would be warning users that their asmap may be dangerously out of date, their bitcoin core version is badly out of date in general and there may be already security advisories released about it. So shouldn't we rather warn about that instead? It seems unlikely that our release schedule will change dramatically so we could warn the users about the software as a whole rather than just the asmap data once it's release is >24 months in the past. Or am I missing something?

@sr-gi
Copy link
Member

sr-gi commented Feb 19, 2026

I would be fine with that but I think at the point where we would be warning users that their asmap may be dangerously out of date, their bitcoin core version is badly out of date in general and there may be already security advisories released about it.

Yes, that is true.

So shouldn't we rather warn about that instead?

I think the distinction comes from the stance of the project not to force users to update the software if they do not see fit. It may not be worth giving this type of warning when the threads a user may be facing by running such an old version of the code are likely worse, but this is the first time we have functionality that decays over time if untouched, so I think it bears considering it

@fjahr
Copy link
Contributor Author

fjahr commented Feb 19, 2026

this is the first time we have functionality that decays over time if untouched

Aside from the hardcoded seeds. But also I think it's only a recent development too that we are publishing the security advisories in a more timely manner while there was basically no publishing at all before that, so I think it's also worth considering taking it a step further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.