Skip to content

tracing: Only prepare tracepoint arguments when actually tracing#26593

Merged
fanquake merged 3 commits intobitcoin:masterfrom
0xB10C:2022-11-only-prepare-tracepoint-arguments-when-tracing
Nov 11, 2024
Merged

tracing: Only prepare tracepoint arguments when actually tracing#26593
fanquake merged 3 commits intobitcoin:masterfrom
0xB10C:2022-11-only-prepare-tracepoint-arguments-when-tracing

Conversation

@0xB10C
Copy link
Contributor

@0xB10C 0xB10C commented Nov 28, 2022

Currently, if the tracepoints are compiled (e.g. in depends and release builds), we always prepare the tracepoint arguments regardless of the tracepoints being used or not. We made sure that the argument preparation is as cheap as possible, but we can almost completely eliminate any overhead for users not interested in the tracepoints (the vast majority), by gating the tracepoint argument preparation with an if(something is attached to this tracepoint). To achieve this, we use the optional semaphore feature provided by SystemTap.

The first commit simplifies and deduplicates our tracepoint macros from 13 TRACEx macros to a single TRACEPOINT macro. This makes them easier to use and also avoids more duplicate macro definitions in the second commit.

The Linux tracing tools I'm aware of (bcc, bpftrace, libbpf, and systemtap) all support the semaphore gating feature. Thus, all existing tracepoints and their argument preparation is gated in the second commit. For details, please refer to the commit messages and the updated documentation in doc/tracing.md.

Also adding unit tests that include all tracepoint macros to make sure there are no compiler problems with them (e.g. some varadiac extension not supported).

Reviewers might want to check:

  • Do the tracepoints still work for you? Do the examples in contrib/tracing/ run on your system (as bpftrace frequently breaks on every new version, please test master too if it should't work for you)? Do the CI interface tests still pass?
  • Is the new documentation clear?
  • The TRACEPOINT_SEMAPHORE(event, context) macros places global variables in our global namespace. Is this something we strictly want to avoid or maybe move to all TRACEPOINT_SEMAPHOREs to a separate .cpp file or even namespace? I like having the TRACEPOINT_SEMAPHORE() in same file as the TRACEPOINT(), but open for suggestion on alternative approaches.
  • Are newly added tracepoints in the unit tests visible when using readelf -n build/src/test/test_bitcoin? You can run the new unit tests with ./build/src/test/test_bitcoin --run_test=util_trace_tests* --log_level=all.
Two of the added unit tests demonstrate that we are only processing the tracepoint arguments when attached by having a test-failure condition in the tracepoint argument preparation. The following bpftrace script can be used to demonstrate that the tests do indeed fail when attached to the tracepoints.

fail_tests.bt:

#!/usr/bin/env bpftrace

usdt:./build/src/test/test_bitcoin:test:check_if_attached {
  printf("the 'check_if_attached' test should have failed\n");
}

usdt:./build/src/test/test_bitcoin:test:expensive_section {
  printf("the 'expensive_section' test should have failed\n");
}

Run the unit tests with ./build/src/test/test_bitcoin and start bpftrace fail_tests.bt -p $(pidof test_bitcoin) in a separate terminal. The unit tests should fail with:

Running 594 test cases...
test/util_trace_tests.cpp(31): error: in "util_trace_tests/test_tracepoint_check_if_attached": check false has failed
test/util_trace_tests.cpp(51): error: in "util_trace_tests/test_tracepoint_manual_tracepoint_active_check": check false has failed

*** 2 failures are detected in the test module "Bitcoin Core Test Suite"

These links might provide more contextual information for reviewers:

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 28, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/26593.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK willcl-ark, vasild, laanwj, jb55
Concept ACK kouloumos
Approach ACK virtu

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31161 (cmake: Set top-level target output locations by hebasto)
  • #31122 (cluster mempool: Implement changeset interface for mempool by sdaftuar)
  • #30611 (validation: write chainstate to disk every hour by andrewtoth)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch from 27bd196 to 94331ba Compare November 28, 2022 20:54
@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch from 94331ba to be0a187 Compare November 29, 2022 15:44
@0xB10C
Copy link
Contributor Author

0xB10C commented Nov 29, 2022

  • Addressed tracing: Only prepare tracepoint arguments when actually tracing #26593 (comment) by using __VA_ARGS__. Also using a separate TRACEPOINT0 macro for tracepoints without arguments, as using the TRACEPOINT macro without arguments is only be possible with GCC (clang warns that we are using gcc extensions). Tested with both clang and gcc.
  • Adressed tracing: Only prepare tracepoint arguments when actually tracing #26593 (comment) by dropping the TRACEPOINT_UNCHECKED macro.
  • Added unit tests for the TRACEPOINT macros. The tracepoints in the unit tests don't do anything in particular, but the tests show that the macros work. Through the CI we cover ENABLE_TRACING being defined and undefined. The test case test_tracepoint_check_if_attached also shows that the TRACEPOINT_ACTIVE macro is not broken (it's tested that it's working in the functional tests).

@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch from be0a187 to 61ca0f9 Compare November 29, 2022 16:29
@jb55
Copy link
Contributor

jb55 commented Nov 29, 2022

genius. Concept ACK

@vasild
Copy link
Contributor

vasild commented Dec 2, 2022

Concept ACK

@virtu
Copy link
Contributor

virtu commented Dec 12, 2022

Approach ACK

Super cool stuff. This effectively removes any runtime overhead of tracepoints on modern processors with branch prediction and speculative execution when no probes are attached .

Some feedback on testing with 61ca0f9:

  • functional tests executed successfully on x86_64 and arm64

  • demo scripts in contrib/tracing: right now, bpftrace scripts work, bcc scripts don't

  • generated assembly looks as expected:

    • tracing code is semaphore-gated using compare and branch instructions:

      # x86_64
      cmp    WORD PTR [rip+0x8f816b],0x0        # 0x55d32c25d6c0 <net_outbound_message_semaphore>
      jne    0x55d32b965800 <_ZN8CConnman11PushMessageEP5CNodeO17CSerializedNetMsg+1488>
      
      # arm64
      ldrh    w1, [x0, #1776]
      cbnz    w1, 0xaaaab2413fa4 <_ZN8CConnman11PushMessageEP5CNodeO17CSerializedNetMsg+1228>
      
    • tracing code uses nop when no probe is attached vs. int3 (x86_64) / brk (arm64) to trap when probe is attached

  • used gdb to inspect semaphores: they are correctly initialized to zero and set to one if one or more(!) probes are attached. The documentation I read stated semaphores act as probe reference counters, so I was expecting semaphore values to reflect the number of probes. However, it looks like semaphores and counters were separated. This does not impact correct behavior though:

    (gdb) # no probe attached
    (gdb) x/hx &net_outbound_message_semaphore
    0xaaaacc8506f0 <net_outbound_message_semaphore>:        0x0000
    (gdb) # started one instance of contrib/tracing/log_p2p_traffic.bt
    (gdb) x/hx &net_outbound_message_semaphore
    0xaaaacc8506f0 <net_outbound_message_semaphore>:        0x0001
    (gdb) # started another instance of contrib/tracing/log_p2p_traffic.bt
    (gdb) x/hx &net_outbound_message_semaphore
    0xaaaacc8506f0 <net_outbound_message_semaphore>:        0x0001
    (gdb) # stopped first instance
    (gdb) x/hx &net_outbound_message_semaphore
    0xaaaacc8506f0 <net_outbound_message_semaphore>:        0x0001
    (gdb) # stopped second instance
    (gdb) x/hx &net_outbound_message_semaphore
    0xaaaacc8506f0 <net_outbound_message_semaphore>:        0x0000
    

@0xB10C
Copy link
Contributor Author

0xB10C commented Dec 13, 2022

Thanks for the detailed review and testing. I've too noticed that the current example bcc scripts have a problem with the semaphores. The functional tests, also using bcc, work as intended as they use the PID to attach to the bitcoind process. Will add a commit to allow running the example scripts by specifying a PID.

@kouloumos
Copy link
Contributor

Concept ACK!
I've though about something similar while working on #25541 and even more when you mentioned performance impact on #25832.

@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch 2 times, most recently from 636d0b8 to 6ac946d Compare December 20, 2022 12:39
@0xB10C
Copy link
Contributor Author

0xB10C commented Dec 20, 2022

demo scripts in contrib/tracing: right now, bpftrace scripts work, bcc scripts don't -- (#26593 (comment))

Rebased and added a commit that fixes the bcc example scripts by using the PID of bitcoind instead of the file path. We already do this in the tests (220a5a2).

@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch from 6ac946d to c4db6fb Compare December 20, 2022 12:44
@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch from c4db6fb to e35f0b5 Compare January 4, 2023 10:32
@0xB10C 0xB10C force-pushed the 2022-11-only-prepare-tracepoint-arguments-when-tracing branch from 16d5988 to de1366a Compare September 3, 2024 09:02
@0xB10C 0xB10C marked this pull request as draft September 3, 2024 10:22
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 3, 2024

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/29600037105

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@0xB10C
Copy link
Contributor Author

0xB10C commented Sep 3, 2024

Rebased this on autotools removal, but the new way of detecting if the systemtap headers are present doesn't seem to work (I didn't do cmake -B build -DWITH_USDT=ON locally and pushed something that doesn't compile). Marking this as draft until I figure out how it works and rebase this on #30741

@0xB10C
Copy link
Contributor Author

0xB10C commented Sep 3, 2024

rebased after #30741 and fixed the compilation with -DWITH_USDT=ON (by also setting SDT_USE_VARIADIC in cmake/module/FindUSDT.cmake)

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 5be6cc771faea00915a31240d9b5afdcdd4aa52f

@0xB10C
Copy link
Contributor Author

0xB10C commented Sep 25, 2024

rebased after conflict with #30409

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8a5b224fb11fdaa84265aa6d082d6f5e0ed97336

0xB10C and others added 3 commits October 28, 2024 14:23
This deduplicates the TRACEx macros by using systemtaps STAP_PROBEV[0]
variadic macro instead of the DTrace compability DTRACE_PROBE[1] macros.
Bitcoin Core never had DTrace tracepoints, so we don't need to use the
drop-in replacement for it. As noted in pr25541[2], these macros aren't
compatibile with DTrace on macOS anyway.

This also renames the TRACEx macro to TRACEPOINT to clarify what the
macro does: inserting a tracepoint vs tracing (logging) something.

[0]: https://sourceware.org/git/?p=systemtap.git;a=blob;f=includes/sys/sdt.h;h=24d5e01c37805e55c36f7202e5d4e821b85167a1;hb=ecab2afea46099b4e7dfd551462689224afdbe3a#l407
[1]: https://sourceware.org/git/?p=systemtap.git;a=blob;f=includes/sys/sdt.h;h=24d5e01c37805e55c36f7202e5d4e821b85167a1;hb=ecab2afea46099b4e7dfd551462689224afdbe3a#l490
[2]: https://github.com/bitcoin/bitcoin/pull/25541/files#diff-553886c5f808e01e3452c7b21e879cc355da388ef7680bf310f6acb926d43266R30-R31

Co-authored-by: Martin Leitner-Ankerl <[email protected]>
Before this commit, we would always prepare tracepoint arguments
regardless of the tracepoint being used or not. While we already made
sure not to include expensive arguments in our tracepoints, this
commit introduces gating to make sure the arguments are only prepared
if the tracepoints are actually used. This is a win-win improvement
to our tracing framework. For users not interested in tracing, the
overhead is reduced to a cheap 'greater than 0' compare. As the
semaphore-gating technique used here is available in bpftrace, bcc,
and libbpf, users interested in tracing don't have to change their
tracing scripts while profiting from potential future tracepoints
passing slightly more expensive arguments. An example are mempool
tracepoints that pass serialized transactions. We've avoided the
serialization in the past as it was too expensive.

Under the hood, the semaphore-gating works by placing a 2-byte
semaphore in the '.probes' ELF section. The address of the semaphore
is contained in the ELF note providing the tracepoint information
(`readelf -n ./src/bitcoind | grep NT_STAPSDT`). Tracing toolkits
like bpftrace, bcc, and libbpf increase the semaphore at the address
upon attaching to the tracepoint. We only prepare the arguments and
reach the tracepoint if the semaphore is greater than zero. The
semaphore is decreased when detaching from the tracepoint.

This also extends the "Adding a new tracepoint" documentation to
include information about the semaphores and updated step-by-step
instructions on how to add a new tracepoint.
BCC needs the PID of a bitcoind process to attach to the tracepoints
(instead of the binary path used before) when the tracepoints have a
semaphore.

For reference, we already use the PID in our tracepoint interface
tests. See 220a5a2.
@willcl-ark
Copy link
Member

utACK 0de3e96

Based on git range-diff 38e24df...0de3e96e333090548a43e5e870c4cb8941d6baf1 .

The cmake tidy-up looks good to me, and I examined newly built binaries for probe and semaphore info. I didn't re-run the testing from my previous ACK.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0de3e96

@laanwj
Copy link
Member

laanwj commented Nov 8, 2024

re-ACK 0de3e96

@jb55
Copy link
Contributor

jb55 commented Nov 8, 2024

utACK 0de3e96

@0xB10C
Copy link
Contributor Author

0xB10C commented Nov 11, 2024

Less overhead for everyone not hooking into the tracepoints 🥳. Now that this is merged, here are a few ideas I had for future tracepoint interface work. Noting them for prosperity.

  1. We could internalize the relevant macro parts of systemtap's sys/sdt.h for the Linux tracepoints. This would allow us to drop the external dependency on systemtap, as we don't use 99% of it. Some prior commentary on this can be found here: cmake: Switch from tri-state options to boolean. Stage TWO hebasto/bitcoin#162 (comment)
  2. In the past I've managed to use a simple macro build a bitcoind with tracepoints on macOS. While our ebpf based demo scripts aren't compatible, @kouloumos DTrace scripts from tracing: macOS USDT support #25541 are. This could look similar to https://github.com/0xB10C/bitcoin/blob/13b0ce221600fc7040502c834c51433ca96f91c3/src/util/trace.h#L35-L63. However, I currently don't have access to a macOS system to further work on this - I'm looking to rent one.
  3. The same could possible on FreeBSD with e.g. these macros https://github.com/0xB10C/bitcoin/blob/13b0ce221600fc7040502c834c51433ca96f91c3/src/util/trace.h#L104-L119. I haven't tested this on FreeBSD yet. In build: Detect USDT the same way how it is used in the code #27458 (review), vasild mentiones he'd interested in FreeBSD tracepoints. My understanding is that the same macOS DTrace scripts from 25541 would work there too.
  4. At least for macOS, we'd need an per-tracepoint interface definition similar to https://github.com/0xB10C/bitcoin/blob/13b0ce221600fc7040502c834c51433ca96f91c3/src/util/trace.h#L121-L236. With some more commentary, these could replace the list of tracepoints in https://github.com/bitcoin/bitcoin/blob/master/doc/tracing.md#tracepoint-documentation. This would solve something similar to tracing: explicitly cast block_connected duration to nanoseconds #29877 (comment).
  5. Even if we don't do 4. (because we e.g. don't want to do 2.), casting the tracepoint arguments to the type we expect to pass would be worthwhile to avoid problems like tracing: explicitly cast block_connected duration to nanoseconds #29877. For some of our traceponts, we already do this: e.g.

    bitcoin/src/validation.cpp

    Lines 2902 to 2907 in 900b172

    TRACEPOINT(utxocache, flush,
    int64_t{Ticks<std::chrono::microseconds>(SteadyClock::now() - nNow)},
    (uint32_t)mode,
    (uint64_t)coins_count,
    (uint64_t)coins_mem_usage,
    (bool)fFlushForPrune);

Two other ideas that were mentioned in the past:

  1. We could drop the example scripts from /contrib/tracing/* and maintain them, along with proper tests in a CI, Bitcoin Core version compatibility information, possibly libbpf-based C or Rust tools (Porting bcc tools to libbpf #30298), ... in an external repository like, for example, 0xb10c/tracing-scripts, bitcoin-core/tracing-scripts, or bitcoin-monitoring/tracing-scripts (what ever would works best).
  2. If we at some point decide that maintaining the tracepoints in Bitcoin Core adds too much maintenance burden compared to the actual usage they're getting, we could drop the tracepoints but keep the tracepoint interface. We now have a unit test that includes a few nop tracepoints to check that the interface will still compile (https://github.com/0xB10C/bitcoin/blob/0de3e96e333090548a43e5e870c4cb8941d6baf1/src/test/util_trace_tests.cpp). This would allow us to drop the bcc python dependency in the CI and to remove the interface_usdt_* functional tests (which need to run in VM and can't run in a container). Tracepoint users could maintain a patch on Bitcoin Core adding the tracepoints (and tests) they want back in. We'd however loose the tracepoints in release (or actually all) builds which currently allow e.g. everyone experiencing problems with their node to hook into them and extract data without needing to restart it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.