build: Embedded ASMap [3/3]: Build binary dump header file#28792
build: Embedded ASMap [3/3]: Build binary dump header file#28792achow101 merged 6 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28792. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
b8b09a7 to
f9288bc
Compare
|
Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the |
|
How much data would this add per year to the repo, assuming updates happen? |
|
@maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%). |
|
Why is the option of loading it from a file even in dev builds, not considered? |
Yeah, that would work as well. I have drafted that approach here: fjahr@0f9109f Details can change of course (like keeping the raw file in init) but if people prefer that approach, I will pull this into here. |
It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: "The alternative approach is not having the data in the source code but only adding it during the build phase of a release." But this also comes with the aforementioned downsides. |
|
A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the It seems to make most sense to me to take the path suggested by @sipa, including the |
f9288bc to
3bcc1ca
Compare
3bcc1ca to
be1e04a
Compare
be1e04a to
5167707
Compare
|
I am picking this back up since we have come a long way in terms of tooling outside of Bitcoin Core and lately I was just waiting for cmake and v28 feature freeze. I will suggest an early merge in the cycle for v29 so more people are encouraged to use one steady file over a longer period of time. I am now using the suggested approach of building the header file at compile time and I added the functionality for both autotools and cmake (the autotools one has existed for a while, I can also remove it if people think it's unnecessary). Of course this is rebased to make cmake available. First time I am editing the cmake build system so there are probably things to improve... And there is also a command for I am adding the latest (1724248800) asmap file from asmap-data, the corresponding creation process can be reviewed in the issue and PR. I have not added a config option to skip the asmap header file but I will look into that next. |
I think the suggestion was to not include the header itself in that case.
Maybe keep it in until #30664 is merged. |
5167707 to
e5a25fd
Compare
The overaching motivation here is that we think using asmap will help all users individually and the network as a whole. Because of that we want to turn it on by default eventually so even the users that don't change anything in their configs can get the benefit. This seems to be only possible to do reliably by embedding the data in the binary.
@jurraca posted some analysis already. 5 years seems excessive to me, given that our releases reach EOL after 18 months. Looking back two years seems fine in terms of time horizon and we also don't have older data because the kartograf project only got stable and to a comparable feature set of today about two years ago. So any comparison with older files or even non-kartograf generated files could be misleading. It would seem backwards to me to test and (upon success) give a thumbs up for using unsupported bitcoin core versions, even if it's just from the asmap perspective. @jurraca and me are very happy to run more analysis and show the data if it convinces people further but it would be helpful if there was some input of what concerns there are exactly. The map is just a snapshot of the global internet routing table, it may be more interesting to post some research from that level but I am not sure people are interested to invest the time and study it. We can also look at the data on the level of how networks containing bitcoin nodes change, on a shorter timeframe we have already done that and we could take 2 year old bitnodes snapshots and look how they are affected. But it is debatable how valuable that information is since hopefully nodes are being run in a much broader range of networks in the future and asmap should work the same way then.
It should be less complex from what I can tell, we could basically throw away all the changes here. But we couldn't turn asmap on by default.
I only spent a short time looking at it but it doesn't seem like any data there is loaded at runtime so I don't see how it is a natural fit. Can you point me to the files in there that are already used in the same way asmap files would be used? It seems like rather the files there are either embedded into the builds too (but I don't have a good overview of the QT code) or they are just support scripts.
Sure, the users that use these specialized binaries directly will certainly be capable to use external asmap files. But it would be just a power user feature then and if that is fine I don't think we need to do anything, we could just keep the feature as is and let people download files from https://github.com/asmap/asmap-data/. If users have to touch these files manually anyway, they might as well get the freshest one. |
A possible alternative to embedding could be to by default require an I also think that it probably shouldn't be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest? |
We don't load anything from /share currently, but I don't see a difference between starting a binary from /libexec and loading from /share at startup in that respect. There are plenty of applications that do load static data from /share.
Am I right that the concern is that even if we'd always attempt to load from some static resource path, we would have to fail silently if the data file cannot be found, meaning we'd still not be able to ship the feature to all users?
I do think this is a slight shift in the guarantees of the software. AFAIK we don't currently ship a feature that decreases in usefulness over time on an already running node. As I already said, I largely do not share this concern. The data @jurraca posted above should be enough to address those reservations (which seems to indicate a potential linear regression?). |
Maybe there are applications that load data from share that they put there themselves, I don't have a lot of experience with using share or such applications. But my understanding from some research is that data put in share usually exists so the OS can access and use them, too (hence the name). So for example icons for applications to discplay in different UI elements of the OS GUI. The asmap data isn't used by the OS so it doesn't seem like a good fit conceptually.
I am inexperienced with using share folders in different OS, if you say it's guaranteed that in every OS/distro we are planning to support some form of a share folder exists and it is always accessible to bitcoin core at runtime and that the maintenance burden to ensure that this always holds holds in the future is also not higher than the maintenance burden of embedding the data, then I would say it's a viable alternative. I honestly have no idea how to verify this.
At least we ship the hardcoded seeds as embedded data and I would expect that data to decrease in usefulness at roughly the same rate. And I would say the usefulness of the software as a whole decreases as we publish security advisories to which that software is still vulnerable.
I would expect it to flatten more over time but it is also possible that this effect is offset by registration of new IPv6 blocks and moves from IPv4 to IPv6. |
It sounds like the burden would still fall on users having to get an asmap file from somewhere quite often. Since we won't always be able to reach these users ourselves, there would be an increased chance of those users using a badly outdated or even malicious asmap file they got from somewhere. And this would be a degraded UX. Possibly the asmap files from asmap-data could contain a signature by the maintainers which would be checked by bitcoin core to verify it's a verified asmap and otherwise there is a warning. I don't think that's a good idea but that would move this into a bit more secure territory, similar to how we have the assumeutxo hashes. Of course only embedding the hashes doesn't work because we want people to use the latest version if possible...
There are certainly ways we could investigate improving the bucketing logic before we enable asmap by default. We had discussions on this in early 2023 based on virtu's simulations for example. But I would like to pick up that thread when we have embedding actually merged and off-by-default. |
I was thinking we would include the latest "official" ASMap file in release packages:
A less heavy-handed approach than my earlier comment #28792 (comment) (which required explicit
Agree that some kind of scheme to validate asmap-files would be good.
👍 |
That's a fair concern, but my (largely unsubstantiated) belief is that even a 5-year old asmap file would be far more accurate than the default /16-based bucketing we use now. And I don't think this is something we can exactly measure: even if ranges's ASN assignment changes over time, that does not imply that the inaccuracy caused by that exposes users more to attacks than /16 bucketing does.
I'm not sure I've ever even considered that option - possibly because I've always thought of (and frequently used) Bitcoin Core as a standalone If we accept that some "installation" for usage is inevitably going to be necessary(*) with the If we'd go that route, I'd aim for a:
So a user cannot inadvertently end up in a situation where they are not using proper asmap data. However, I don't see how this addresses the reasoning for not wanting built-in asmap data: data in (*) Is that "installed environment" really compatible with the movement towards reproducible and increasingly static builds that work on as many distinct systems as possible? Because how can this neutral, run-anywhere, reproducible, single binary know what the OS's equivalent |
|
Embedding the ASMap in the binary is a tradeoff that is only worth it if we are going to enable it by default. Correct? If so, i think it is necessary that we have buy-in on the direction prior to embedding the binary. Clearly there is interest, but i'm not sure there is broad agreement yet (see for instance #16599 (comment)). To be clear i'm not objecting to this PR, i just want to make sure that we don't end up with a half-shipped feature where we eat the costs but don't reap the benefits because we didn't set a clear direction from the get go and it turns out to be more controversial than anticipated. |
I think lowering the friction to use the feature is a win either way but I don't have to motivate myself to review the code so I guess my opinion doesn't count much ;)
I would expect that there is agreement that the asmap data will be helpful to create better peer diversity even if the bucketing or connection logic might have to be tweaked to prevent adverse effects when all nodes in the network are using asmap at some point. On the linked comment in particular, to me it sounds more like something that should be tackled on the connection side because it just shows a general problem we will always encounter when we try to create peer diversity through discrimination of certain peers based on any data, not just asmap. If I get a new IPv6 /32 and launch thousands of nodes in it, we would start to see the same effect under the current default bucketing logic, right? And since IPv4 is getting more and more scarce it would be totally plausible (if not inevitable) that the big cloud hosters just decide to move all of their cheaper servers in one IPv6 /32 each and IPv4 will be only be available via a very costly upgrade (actually AWS already charges a small amount it seems). So this is not as unrealistic as it may seem. |
src/init.cpp
Outdated
| node.netgroupman = std::make_unique<NetGroupManager>(NetGroupManager::WithLoadedAsmap(std::move(asmap))); | ||
| } else { | ||
| #ifdef ENABLE_EMBEDDED_ASMAP | ||
| // If the file doesn't exist, try to use the embedded data |
There was a problem hiding this comment.
Is this comment up to date? It looks like passing a non-existing filename will trigger an error, not usage of the embedded asmap (which I think is the correct behavior).
There was a problem hiding this comment.
Right, it's a left-over from the times when we had the default file location. I have amended it.
The data embedded is from the latest ASMap file from the asmap-data repository: https://github.com/asmap/asmap-data/blob/main/1755187200_asmap.dat
This can be disabled with -DWITH_EMBEDDED_ASMAP=OFF.
hodlinator
left a comment
There was a problem hiding this comment.
ACK 24699fe
We can try embedding the asmap data in executables in the v31 release and re-asses for later releases. IMO it's a step forward to distribute it inside releases regardless of whether it's included as a separate file or not.
| argsman.AddArg("-asmap=<file>", strprintf("Specify asn mapping used for bucketing of the peers. Relative paths will be prefixed by the net-specific datadir location.%s", | ||
| #ifdef ENABLE_EMBEDDED_ASMAP | ||
| " If a bool arg is given (-asmap or -asmap=1), the embedded mapping data in the binary will be used." | ||
| #else | ||
| "" | ||
| #endif | ||
| ), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION); |
There was a problem hiding this comment.
nit: Could simplify by using the string literals on adjacent lines instead of strprintf:
| argsman.AddArg("-asmap=<file>", strprintf("Specify asn mapping used for bucketing of the peers. Relative paths will be prefixed by the net-specific datadir location.%s", | |
| #ifdef ENABLE_EMBEDDED_ASMAP | |
| " If a bool arg is given (-asmap or -asmap=1), the embedded mapping data in the binary will be used." | |
| #else | |
| "" | |
| #endif | |
| ), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION); | |
| argsman.AddArg("-asmap=<file>", "Specify asn mapping used for bucketing of the peers. Relative paths will be prefixed by the net-specific datadir location." | |
| #ifdef ENABLE_EMBEDDED_ASMAP | |
| " If a bool arg is given (-asmap or -asmap=1), the embedded mapping data in the binary will be used." | |
| #endif | |
| , ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION); |
There was a problem hiding this comment.
Will do if I have to retouch!
There was a problem hiding this comment.
info: Checked size difference of GUIX binaries with/without this embedding PR.
- Windows installer grows by 5.5 MB - a bit disappointing that it doesn't de-duplicate better across the binaries. Growth of ~17%.
- Similarly for aarch64 tar.gz package, size increase of 7.1 MB / 9.2%.
₿ diff -U 100 b58eebf1520f 24699fec8422
--- b58eebf1520f 2026-02-04 11:49:50.675776779 +0100
+++ 24699fec8422 2026-02-04 11:49:51.508684240 +0100
@@ -1,65 +1,65 @@
-₿ ll --size bytes guix-build-b58eebf1520f/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed -E 's/\-\w{12}/\-<hash>/g; s/\bWed Feb.{18}//'
-.rw-r--r-- hodlinator users 77223920 guix-build-<hash>/output/aarch64-linux-gnu/bitcoin-<hash>-aarch64-linux-gnu.tar.gz
+₿ ll --size bytes guix-build-24699fec8422/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed -E 's/\-\w{12}/\-<hash>/g; s/\bWed Feb.{18}//'
+.rw-r--r-- hodlinator users 84334168 guix-build-<hash>/output/aarch64-linux-gnu/bitcoin-<hash>-aarch64-linux-gnu.tar.gz
-₿ tar -ztvf guix-build-b58eebf1520f/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
+₿ tar -ztvf guix-build-24699fec8422/output/aarch64-linux-gnu/bitcoin-*-aarch64-linux-gnu.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
drwxr-xr-x 0/0 0 bitcoin-<hash>/
-rw-r--r-- 0/0 3489 bitcoin-<hash>/README.md
drwxr-xr-x 0/0 0 bitcoin-<hash>/bin/
-rwxr-xr-x 0/0 2035880 bitcoin-<hash>/bin/bitcoin
-rwxr-xr-x 0/0 2561200 bitcoin-<hash>/bin/bitcoin-cli
--rwxr-xr-x 0/0 39497192 bitcoin-<hash>/bin/bitcoin-qt
+-rwxr-xr-x 0/0 41004568 bitcoin-<hash>/bin/bitcoin-qt
-rwxr-xr-x 0/0 4594920 bitcoin-<hash>/bin/bitcoin-tx
-rwxr-xr-x 0/0 4265648 bitcoin-<hash>/bin/bitcoin-util
-rwxr-xr-x 0/0 8679816 bitcoin-<hash>/bin/bitcoin-wallet
--rwxr-xr-x 0/0 14877576 bitcoin-<hash>/bin/bitcoind
+-rwxr-xr-x 0/0 16319416 bitcoin-<hash>/bin/bitcoind
-rw-r--r-- 0/0 126 bitcoin-<hash>/bitcoin.conf
drwxr-xr-x 0/0 0 bitcoin-<hash>/libexec/
--rwxr-xr-x 0/0 43103040 bitcoin-<hash>/libexec/bitcoin-gui
--rwxr-xr-x 0/0 18483432 bitcoin-<hash>/libexec/bitcoin-node
--rwxr-xr-x 0/0 29836280 bitcoin-<hash>/libexec/test_bitcoin
+-rwxr-xr-x 0/0 44610416 bitcoin-<hash>/libexec/bitcoin-gui
+-rwxr-xr-x 0/0 19925272 bitcoin-<hash>/libexec/bitcoin-node
+-rwxr-xr-x 0/0 31278120 bitcoin-<hash>/libexec/test_bitcoin
drwxr-xr-x 0/0 0 bitcoin-<hash>/share/
drwxr-xr-x 0/0 0 bitcoin-<hash>/share/man/
drwxr-xr-x 0/0 0 bitcoin-<hash>/share/man/man1/
-rw-r--r-- 0/0 214 bitcoin-<hash>/share/man/man1/bitcoin-cli.1
-rw-r--r-- 0/0 211 bitcoin-<hash>/share/man/man1/bitcoin-qt.1
-rw-r--r-- 0/0 211 bitcoin-<hash>/share/man/man1/bitcoin-tx.1
-rw-r--r-- 0/0 217 bitcoin-<hash>/share/man/man1/bitcoin-util.1
-rw-r--r-- 0/0 223 bitcoin-<hash>/share/man/man1/bitcoin-wallet.1
-rw-r--r-- 0/0 202 bitcoin-<hash>/share/man/man1/bitcoin.1
-rw-r--r-- 0/0 205 bitcoin-<hash>/share/man/man1/bitcoind.1
drwxr-xr-x 0/0 0 bitcoin-<hash>/share/rpcauth/
-rw-r--r-- 0/0 453 bitcoin-<hash>/share/rpcauth/README.md
-rwxr-xr-x 0/0 1767 bitcoin-<hash>/share/rpcauth/rpcauth.py
-₿ tar -ztvf guix-build-b58eebf1520f/output/x86_64-w64-mingw32/bitcoin-*-win64-codesigning.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
+₿ tar -ztvf guix-build-24699fec8422/output/x86_64-w64-mingw32/bitcoin-*-win64-codesigning.tar.gz| sed 's/bitcoin-\w*/bitcoin-<hash>/; s/\b202.\-.\{12\}//'
drwxr-xr-x 0/0 0 ./
-rwxr-xr-x 0/0 1370 ./detached-sig-create.sh
drwxr-xr-x 0/0 0 ./unsigned/
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/
--rw-r--r-- 0/0 32627040 ./unsigned/bitcoin-<hash>-win64-setup-unsigned.exe
+-rw-r--r-- 0/0 38141533 ./unsigned/bitcoin-<hash>-win64-setup-unsigned.exe
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/bin/
-rwxr-xr-x 0/0 2328083 ./unsigned/bitcoin-<hash>/bin/bitcoin-cli.exe
--rwxr-xr-x 0/0 40828963 ./unsigned/bitcoin-<hash>/bin/bitcoin-qt.exe
+-rwxr-xr-x 0/0 42296867 ./unsigned/bitcoin-<hash>/bin/bitcoin-qt.exe
-rwxr-xr-x 0/0 4388883 ./unsigned/bitcoin-<hash>/bin/bitcoin-tx.exe
-rwxr-xr-x 0/0 4018707 ./unsigned/bitcoin-<hash>/bin/bitcoin-util.exe
-rwxr-xr-x 0/0 8692243 ./unsigned/bitcoin-<hash>/bin/bitcoin-wallet.exe
-rwxr-xr-x 0/0 1652755 ./unsigned/bitcoin-<hash>/bin/bitcoin.exe
--rwxr-xr-x 0/0 15117843 ./unsigned/bitcoin-<hash>/bin/bitcoind.exe
+-rwxr-xr-x 0/0 16585235 ./unsigned/bitcoin-<hash>/bin/bitcoind.exe
-rw-r--r-- 0/0 126 ./unsigned/bitcoin-<hash>/bitcoin.conf
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/libexec/
--rwxr-xr-x 0/0 27983891 ./unsigned/bitcoin-<hash>/libexec/test_bitcoin.exe
+-rwxr-xr-x 0/0 29451283 ./unsigned/bitcoin-<hash>/libexec/test_bitcoin.exe
-rw-r--r-- 0/0 846 ./unsigned/bitcoin-<hash>/readme.txt
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/share/
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/share/man/
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/share/man/man1/
-rw-r--r-- 0/0 214 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-cli.1
-rw-r--r-- 0/0 211 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-qt.1
-rw-r--r-- 0/0 211 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-tx.1
-rw-r--r-- 0/0 217 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-util.1
-rw-r--r-- 0/0 223 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin-wallet.1
-rw-r--r-- 0/0 202 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoin.1
-rw-r--r-- 0/0 205 ./unsigned/bitcoin-<hash>/share/man/man1/bitcoind.1
drwxr-xr-x 0/0 0 ./unsigned/bitcoin-<hash>/share/rpcauth/
-rw-r--r-- 0/0 453 ./unsigned/bitcoin-<hash>/share/rpcauth/README.md
-rwxr-xr-x 0/0 1767 ./unsigned/bitcoin-<hash>/share/rpcauth/rpcauth.py
-rw-r--r-- 0/0 7058 ./win-codesign.cert
There was a problem hiding this comment.
It would be worth adding /doc/release-notes-28792.md to inform that we are now embedding the ASMap.
- Makes users more likely to try enabling it.
- Helps explain the bump in package/executable size.
P2P and network changes
-----------------------
- Recent ASMap data is now embedded in the binaries, use `-asmap` to enable it for your node
(see [`asmap-data.md`](./asmap-data.md)).(Edit not sure about the asmap-data.md link, as that file is currently not distributed with releases).
There was a problem hiding this comment.
Thanks, I will add it if I have to retouch and if not I will add this as a follow-up or just edit the release notes draft if it already exists.
|
ACK 24699fe |
| As an upcoming release approaches the embedded ASMap data should be updated | ||
| by replacing the `ip_asn.dat` with a newer ASMap file from the asmap-data | ||
| repository so that its data is embedded in the release. Ideally, there may be a file | ||
| already created recently that can be selected for an upcoming release. Alternatively, | ||
| a new creation process can be initiated with the goal of obtaining a fresh map | ||
| for use in the upcoming release. |
There was a problem hiding this comment.
This needs fleshing out before the next release imo. I would suggest doing the following:
- Move
asmap-datato the bitcoin-core org - Add a script to sign a transcript of the gathered data as well as its resulting hash, both in its compact encoded and un-encoded state.
- Gather signatures from participants in the repository similar to
guix.sigs. - Decide on a minimum threshold of participants producing matching data in order to be included here.
There was a problem hiding this comment.
Move asmap-data to the bitcoin-core org
This came up a few times already and I am happy to do it. I just looked into it and it seems I can not initiate the move to the bitcoin-core org directly because I don't have permission to create a new repository there. The common workaround seem to be that I would initiate the transfer to a maintainers personal account and then they transfer it to the org. Let me know if I should do that and to whom it should go.
Decide on a minimum threshold of participants producing matching data in order to be included here.
The rule I set initially without much basis was to have at least 5 people participate and the majority should get a match. This means to 3 ACKs minimum and if a larger part of participants got different results there may be something wrong and so also rather abort. This was partly also chosen since we didn't always have a large number of participants in the collaborative runs. In the last few months we had enough that I would say a minimum of 8-9 participants is realistic with the majority rule or just asking for 5+ matches. Most of the last runs were higher with up to 10 matches but I think this should be good enough for a start?
Add a script to sign a transcript of the gathered data as well as its resulting hash, both in its compact encoded and un-encoded state.
Gather signatures from participants in the repository similar to guix.sigs.
I can look into those, just to clarify: With transcript do you mean the personal log output from kartograf? Or maybe the Issue discussion of the run?
And you mean signatures from all participants in the run or only those that review the PR that merges the data?
There was a problem hiding this comment.
Add a script to sign a transcript of the gathered data as well as its resulting hash, both in its compact encoded and un-encoded state.
Yea. I think its important that the raw data is available, for any asmap we are shipping, so that someone can recreate it from the raw inputs, rather than trusting the blob.
There was a problem hiding this comment.
I think its important that the raw data is available
Ok, the full raw data from one matching participant though, right? The raw data can have minor differences but comes out with the same result after processing because of removal of duplicated data between multiple sources etc.
Do you have a preferred storage mechanism for it? I am not sure how sustainable this is but I found a hack that you can store it as a release artifact and we are using this for our kartograf CI full reproduction run: https://github.com/asmap/sample-data/releases/tag/1762444800
There was a problem hiding this comment.
With transcript do you mean the personal log output from kartograf
I think the final hash from the kartograph output is fine in order for the transcripts to match, but I would also include producing the compact encoded map and its hash.
EDIT: To be clear, I don't mean the github comments.
There was a problem hiding this comment.
Ok, the full raw data from one matching participant though, right?
Sure, as long as that data can be transformed into the asmap that we are shipping.
|
It looks like I'm too late for the party, but I wanted to comment on this even if it's post-merge. I saw there were some concerns regarding embedding data into the binary that will decay over time. My understanding, based on reading the conversation, is that there are three reasons why we are not too worried about this being an issue:
If this is the case, I wonder if it would be worth notifying the user if they are running with an ASMap that may be considered "too old", and potentially encouraging them to download a newer snapshot from the repo. We can set this to match the EOL of the point release, or later if we think that the decay over 18 months is not worth the warning. I think this is not going to be too relevant as long as ASMap is not enabled by default, but it may be good to let the user know once we decide to do so. |
I would be fine with that but I think at the point where we would be warning users that their asmap may be dangerously out of date, their bitcoin core version is badly out of date in general and there may be already security advisories released about it. So shouldn't we rather warn about that instead? It seems unlikely that our release schedule will change dramatically so we could warn the users about the software as a whole rather than just the asmap data once it's release is >24 months in the past. Or am I missing something? |
Yes, that is true.
I think the distinction comes from the stance of the project not to force users to update the software if they do not see fit. It may not be worth giving this type of warning when the threads a user may be facing by running such an old version of the code are likely worse, but this is the first time we have functionality that decays over time if untouched, so I think it bears considering it |
Aside from the hardcoded seeds. But also I think it's only a recent development too that we are publishing the security advisories in a more timely manner while there was basically no publishing at all before that, so I think it's also worth considering taking it a step further. |
This is the final in a series of PRs that implement the necessary changes for embedding of asmap data into the binary. This last part add the initial asmap data, implements the build changes and adds further documentation.
Currently an asmap file needs to be acquired by there user from some location or the user needs to generate one themselves. Then they need to move the file to the right place in datadir or pass the path to the file as
-asmap=PATHin order to use the asmap feature. The change here allows for builds to embed asmap data into the bitcoind binary which makes it possible to use the feature without handling of the asmap file by the user. If the user starts bitcoind with-asmapthe embedded data will be used for bucketing of nodes.The data lives in the repository at
src/node/data/ip_asn.datand can be replaced with a new version at any time. The idea is that the data should be updated with every release. By default the data at that location is embedded into the binary but there is also a build option to prevent this (-DWITH_EMBEDDED_ASMAP=OFF). In this case the original behavior of the-asmapoption is maintained.