Kernel C API by sedited · Pull Request #20 · sedited/bitcoin

sedited · 2024-05-27T15:16:50Z

This is a first attempt at introducing a C header for the libbitcoinkernel library that may be used by external applications for interfacing with Bitcoin Core's validation logic. It currently is limited to operations on blocks. This is a conscious choice, since it already offers a lot of powerful functionality, but sits just on the cusp of still being reviewable scope-wise while giving some pointers on how the rest of the API could look like.

The current design was informed by the development of some small tools using the C header:

A re-implementation (part of this pull request) of bitcoin-chainstate.
A re-implementation of the python block linearize scripts: https://github.com/TheCharlatan/bitcoin/tree/kernelLinearize
A silent payment scanner: https://github.com/josibake/silent-payments-scanner
An electrs index builder: https://github.com/josibake/electrs/commits/electrs-kernel-integration

Next to the C++ header also made available in this pull request, rust bindings are available here: https://github.com/TheCharlatan/rust-bitcoinkernel. The rust bindings include unit and fuzz tests for the API.

The header currently exposes logic for enabling the following functionality:

Feature-parity with the now deprecated libbitcoin-consensus
Optimized sha256 implementations that were not available to previous users of libbitcoin-consensus thanks to a static kernel context
Full support for logging as well as control over categories and severity
Feature parity with the existing experimental bitcoin-chainstate
Traversing the block index as well and using block index entries for reading block and undo data.
Running the chainstate in memory
Reindexing (both full and chainstate-only)
Interrupting long-running functions

The pull request introduces a new kernel-only test binary that purely relies on the kernel C header and the C++ standard library. This is intentionally done to show its capabilities without relying on other code inside the project. This may be relaxed to include some of the existing utilities, or even be merged into the existing test suite.

How can I review this PR?

Scrutinize the commit messages, run the tests, write your own little applications using the library, let your favorite code sanitizer loose on it, hook it up to your fuzzing infrastructure, profile the difference between the existing bitcoin-chainstate and the bitcoin-chainstate introduced here, be nitty on the documentation, police the C interface, opine on your own API design philosophy.

To get a feeling for the API, read through the tests, or one of the examples.

Please try to avoid nits for the tests, these can wait for later and easily be improved over time. Docs exhaustively explaining all the intricacies of the internal goings-on of the library can also be added later.

To configure this PR for making the shared library and the bitcoin-chainstate and test_kernel utilities available:

./autogen.sh
./configure --enable-experimental-util-chainstate --with-experimental-kernel-lib --enable-shared

Python headers might also be useful for testing. ctypeslib2's clang2py can be used to auto-generate bindings:

clang2py src/kernel/bitcoinkernel.h -l /path/to/bitcoin/src/.libs/libbitcoinkernel.so > test_wrapper.py

What about versioning?

The header and library are still experimental and I would expect this to remain so for some time, so best not to worry about versioning yet.

Potential future additions

In future, the C header could be expanded to support (some of these have been roughly implemented):

Handling transactions, block headers, coins cache, utxo set, meta data, and the mempool
Adapters for an abstract coins store
Adapters for an abstract block store
Allocators and buffers for more efficient memory usage
An "io-less" interface

Current drawbacks

For external applications to read the block index of an existing Bitcoin Core node, Bitcoin Core needs to shut down first, since leveldb does not support reading across multiple processes. Other than migrating away from leveldb, there does not seem to be a solution for this problem.
The fatal error handling through the notifications is awkward.
Handling shared pointers in the interfaces is unfortunate. They make ownership and freeing of the resources fuzzy and poison the interfaces with additional types and complexity. However, they seem to be an artifact of the current code that interfaces with the validation engine. The validation engine itself does not seem to make extensive use of these shared pointers.
Error handling is currently implemented with a fixed-sized pre-allocated buffer for holding richer text descriptions. My current preference would eventually be to replace all errors with an enumeration from which a static description can be retrieved if required.
If multiple instances of the same type of objects are used, there is no mechanism for distinguishing the log messages produced by each of them.

stickies-v · 2024-06-07T21:58:12Z

For the *_set functions that currently take a *OptionType n_option and const void* value, have you considered passing a pointer to a struct of pointers instead? I'm probably overlooking the downsides, but the benefits are that:

types are enforced, no need for reinterpret_cast
callers can pass multiple options in one go
I find it a more straightforward interface
it seems to simplify the implementation quite a bit, as per my n=1 test on kernel_chainstate_load_options_set

git diff on 8a2e439e42

diff --git a/src/kernel/bitcoinkernel.cpp b/src/kernel/bitcoinkernel.cpp
index 2d4df6bad8..2d987fbdde 100644
--- a/src/kernel/bitcoinkernel.cpp
+++ b/src/kernel/bitcoinkernel.cpp
@@ -888,29 +888,16 @@ kernel_ChainstateLoadOptions* kernel_chainstate_load_options_create()
 
 void kernel_chainstate_load_options_set(
     kernel_ChainstateLoadOptions* chainstate_load_opts_,
-    kernel_ChainstateLoadOptionType n_option,
-    void* value,
+    const kernel_ChainstateLoadOptionValues* values,
     kernel_Error* error)
 {
     auto chainstate_load_opts{cast_chainstate_load_options(chainstate_load_opts_, error)};
-    if (!chainstate_load_opts) {
-        return;
-    }
-    switch (n_option) {
-    case kernel_ChainstateLoadOptionType::kernel_REINDEX_CHAINSTATE_LOAD_OPTION: {
-        auto reindex{reinterpret_cast<bool*>(value)};
-        chainstate_load_opts->reindex = *reindex;
-        return;
-    }
-    case kernel_ChainstateLoadOptionType::kernel_REINDEX_CHAINSTATE_CHAINSTATE_LOAD_OPTION: {
-        auto reindex_chainstate{reinterpret_cast<bool*>(value)};
-        chainstate_load_opts->reindex_chainstate = *reindex_chainstate;
+    if (!chainstate_load_opts || !values) {
         return;
     }
-    default: {
-        set_error(error, kernel_ErrorCode::kernel_ERROR_UNKNOWN_OPTION, "Unknown chainstate load option");
-    }
-    }
+
+    if (values->reindex) chainstate_load_opts->reindex = *(values->reindex);
+    if (values->reindex_chainstate) chainstate_load_opts->reindex_chainstate = *(values->reindex_chainstate);
 }
 
 void kernel_chainstate_load_options_destroy(kernel_ChainstateLoadOptions* chainstate_load_opts_)
diff --git a/src/kernel/bitcoinkernel.h b/src/kernel/bitcoinkernel.h
index cd5a57d410..b46fed2ebb 100644
--- a/src/kernel/bitcoinkernel.h
+++ b/src/kernel/bitcoinkernel.h
@@ -238,10 +238,10 @@ typedef enum {
  * Available types of chainstate load options. Passed with a corresponding value
  * to kernel_chainstate_load_options_set(..).
  */
-typedef enum {
-    kernel_REINDEX_CHAINSTATE_LOAD_OPTION = 0,        //! Set the reindex option, value should be a bool.
-    kernel_REINDEX_CHAINSTATE_CHAINSTATE_LOAD_OPTION, //! Set the reindex chainstate option, value should be a bool.
-} kernel_ChainstateLoadOptionType;
+typedef struct {
+    bool* reindex;
+    bool* reindex_chainstate;
+} kernel_ChainstateLoadOptionValues;
 
 /**
  * Whether a validated data structure is valid, invalid, or an error was
@@ -740,8 +740,7 @@ kernel_ChainstateLoadOptions* kernel_chainstate_load_options_create();
  */
 void kernel_chainstate_load_options_set(
     kernel_ChainstateLoadOptions* chainstate_load_options,
-    kernel_ChainstateLoadOptionType n_option,
-    void* value,
+    const kernel_ChainstateLoadOptionValues* values,
     kernel_Error* error);
 
 /**
diff --git a/src/test_kernel.cpp b/src/test_kernel.cpp
index 52dfd53f1e..a6c2e7165a 100644
--- a/src/test_kernel.cpp
+++ b/src/test_kernel.cpp
@@ -468,15 +468,14 @@ public:
 
     void SetReindex(bool reindex, kernel_Error& error)
     {
-        kernel_chainstate_load_options_set(m_options, kernel_ChainstateLoadOptionType::kernel_REINDEX_CHAINSTATE_LOAD_OPTION, &reindex, &error);
+        kernel_ChainstateLoadOptionValues opt_values { .reindex = &reindex };
+        kernel_chainstate_load_options_set(m_options, &opt_values, &error);
     }
 
     void SetReindexChainstate(bool reindex_chainstate, kernel_Error& error)
     {
-        kernel_chainstate_load_options_set(m_options,
-                                           kernel_ChainstateLoadOptionType::kernel_REINDEX_CHAINSTATE_CHAINSTATE_LOAD_OPTION,
-                                           &reindex_chainstate,
-                                           &error);
+        kernel_ChainstateLoadOptionValues opt_values { .reindex_chainstate = &reindex_chainstate };
+        kernel_chainstate_load_options_set(m_options, &opt_values, &error);
     }
 
     friend class ChainMan;

sedited · 2024-06-08T08:21:32Z

The reason why we keep opaque option structs is for ABI compatibility. If you change a field in the struct, it will no longer be compatible with prior versions. You'd then end up with many different versioned structs trying to do the same thing for back-compatibility purposes. The usual approach for handling this is just adding a function for each of the options. Then you'd have something like option_set_my_value(OpaqueOptionStruct*, MyValue*). The problem I see us facing with this is that the API will blow up over time with a lot of different functions for setting a single option. So I wanted to present an alternative.

I got the approach here from zmq, which does the same thing here: https://github.com/zeromq/libzmq/blob/master/include/zmq.h#L682 . This approach can also be used for defining an array of multiple options at once, but I did not implement that (yet). There is a caveat here though. Libzmq does check if the size of the value matches the expected size, and I don't do that here. I wasn't sure about this approach, since it will complicate the interfaces (you now have to know the size of some of the opaque values). Maybe we can come up with something better.

I do have some structs defined in the header, but these are either helpers for very specific data types and thus unlikely to change, or structs that I don't expect to change much anymore once I open the PR to the main repo. Deprecating a field in a struct is less bad, since it could just be ignored when it is still set. The callback holder structs will need some discussion though. They currently only hold a subset of the possible to define callbacks and would in their current form need an eventual extension.

src/kernel/bitcoinkernel.h

stickies-v · 2024-06-13T11:27:43Z

src/kernel/bitcoinkernel.cpp

+    auto hash = uint256{Span<const unsigned char>{(*block_hash).hash, 32}};
+    auto block_index = WITH_LOCK(::cs_main, return chainman->m_blockman.LookupBlockIndex(hash));
+    if (!block_index) {
+        set_error(error, kernel_ErrorCode::kernel_ERROR_INTERNAL, "A block with the given hash is not indexed.");


I don't think this should be an error. Just returning a nullptr here (without error message) makes for a much easier interface imo.

As a first step, implement the equivalent of what was implemented in the now deprecated libbitcoinconsensus header. Also add a test binary to exercise the header and library. Unlike the deprecated libbitcoinconsensus the kernel library can now use the hardware-accelerated sha256 implementations thanks for its statically-initialzed context. The functions kept around for backwards-compatibility in the libbitcoinconsensus header are not ported over. As a new header, it should not be burdened by previous implementations. Also add a new error code for handling invalid flag combinations, which would otherwise cause a crash. The macros used in the new C header were adapted from the libsecp256k1 header. To make use of the C header from C++ code, a C++ header is also introduced for wrapping the C header. This makes it safer and easier to use from C++ code.

Exposing logging in the kernel library allows users to follow what is going on when using it. Users of the C header can use `kernel_logging_connection_create(...)` to pass a callback function to Bitcoin Core's internal logger. Additionally the level and severity can be globally configured. By default, the logger buffers messages until `kernel_loggin_connection_create(...)` is called. If the user does not want any logging messages, it is recommended that `kernel_disable_logging()` is called, which permanently disables the logging and any buffering of messages.

The context introduced here holds the objects that will be required for running validation tasks, such as the chosen chain parameters, callbacks for validation events, and an interrupt utility. These will be used in a few commits, once the chainstate manager is introduced. This commit also introduces conventions for defining option objects. A common pattern throughout the C header will be: ``` options = object_option_create(); object = object_create(options); ``` This allows for more consistent usage of a "builder pattern" for objects where options can be configured independently from instantiation.

As a first option, add the chainparams. For now these can only be instantiated with default values. In future they may be expanded to take their own options for regtest and signet configurations. This commit also introduces a unique pattern for setting the option values when calling the `*_set(...)` function.

The notifications are used for notifying on connected blocks and on warning and fatal error conditions. The user of the C header may define callbacks that gets passed to the internal notification object in the `kernel_NotificationInterfaceCallbacks` struct. Each of the callbacks take a `user_data` argument that gets populated from the `user_data` value in the struct. It can be used to recreate the structure containing the callbacks on the user's side, or to give the callbacks additional contextual information.

This is the main driver class for anything validation related, so expose it here. Creating the chainstate manager and block manager options will currently also trigger the creation of their respectively configured directories. The chainstate manager and block manager options were not consolidated into a single object, since the kernel might eventually introduce a block manager object for the purposes of being a light-weight block store reader. The chainstate manager will associate with the context with which it was created for the duration of its lifetime. It is only valid if that context remains in memory too. The tests now also create dedicated temporary directories. This is similar to the behaviour in the existing unit test framework.

The `kernel_chainstate_manager_load_chainstate(...)` function is the final step required to prepare the chainstate manager for future tasks. Its main responsibility is loading the coins and block tree indexes. Though its `context` argument is not strictly required this was added to ensure that the context remains in memory for this operation. This pattern of a "dummy" context will be re-used for functions introduced in later commits. The chainstate load options will be populated over the next few commits.

The added function allows the user process and validate a given block with the chainstate manager. The *_process_block(...) function does some preliminary checks on the block before passing it to `ProcessNewBlock(...)`. These are similar to the checks in the `submitblock()` rpc. Richer processing of the block validation result will be made available in the following commits through the validation interface. The commits also adds a utility for serializing a `CBlock` (`kernel_block_create()`) that may then be passed to the library for processing. The tests exercise the function for both mainnet and regtest. The commit also adds the data of 206 regtest blocks (some blocks also contain transactions).

Adds options for wiping the chainstate and block tree indexes to the chainstate load options. In combination and once the `*_import_blocks(...)` function is added in a later commit, this triggers a reindex. For now, it just wipes the existing data.

This allows a user to run the kernel without creating on-disk files for the block tree and chainstate indexes. This is potentially useful in scenarios where the user needs to do some ephemeral validation operations. One specific use case is when linearizing the blocks on disk. The block files store blocks out of order, so a program may utilize the library and its header to read the blocks with one chainstate manager, and then write them back in order, and without orphans, with another chainstate maanger. To save disk resources and if the indexes are not required once done, it may be beneficial to keep the indexes in memory for the chainstate manager that writes the blocks back again.

The `kernel_import_blocks` function is used to both trigger a reindex, if the indexes were previously wiped through the chainstate load options, or import the block data of a single block file. The behaviour of the import can be verified through the test logs.

Calling interrupt can halt long-running functions associated with objects that were created through the passed-in context.

This adds the infrastructure required to process validation events. For now the external validation interface only has support for the `BlockChecked` callback, but support for the other internal validation interface methods can be added in the future. The validation interface follows an architecture for defining its callbacks and ownership that is similar to the notifications. The task runner is created internally with a context, which itself internally creates a unique ValidationSignals object. When the user creates a new chainstate manager the validation signals are internally passed to the chainstate manager through the context. The callbacks block any further validation execution when they are called. It is up to the user to either multiplex them, or use them otherwise in a multithreaded mechanism to make processing the validation events non-blocking. A validation interface can register for validation events with a context. Internally the passed in validation interface is registerd with the validation signals of a context. The BlockChecked callback introduces a seperate type for a non-owned block. Since a library-internal object owns this data, the user needs to be explicitly prevented from deleting it. In a later commit a utility will be added to copy its data.

These allow for the interpretation of the data in a `BlockChecked` validation interface callback. This is useful to get richer information in case a block failed to validate.

This adds functions for copying serialized block data into a user-owned variable-sized byte array. Use it in the tests for verifying the implementation of the validation interface's `BlockChecked` method.

This adds functions for reading a block from disk with a retrieved block index entry. External services that wish to build their own index, or analyze blocks can use this to retrieve block data. The block index can now be traversed from the tip backwards. This is guaranteed to work, since the chainstate maintains an internal block tree index in memory and every block (besides the genesis) has an ancestor. The user can use this function to iterate through all blocks in the chain (starting from the tip). Once the block index entry for the genesis block is reached a nullptr is returned if the user attempts to get the previous entry.

This adds functions for reading the undo data from disk with a retrieved block index entry. The undo data of a block contains all the spent script pubkeys of all the transactions in a block. In normal operations undo data is used during re-orgs. This data might also be useful for building external indexes, or to scan for silent payment transactions. Internally the block undo data contains a vector of transaction undo data which contains a vector of the spent outputs. For this reason, the `kernel_get_block_undo_size(...)` function is added to the header for retrieving the size of the transaction undo data vector, as well as the `kernel_get_transaction_undo_size(...) function for retrieving the size of each spent outputs vector contained within each transaction undo data entry. With these two sizes the user can iterate through the undo data by accessing the transaction outputs by their indeces with `kernel_get_undo_output_by_index`. If an invalid index is passed in, the `kernel_ERROR_OUT_OF_BOUNDS` error is returned again. The returned `kernel_TransactionOutput` is entirely owned by the user and may be destroyed with the `kernel_transaction_output_destroy(...)` convenience function.

Adds further functions useful for traversing the block index and retrieving block information. The added `kernel_BlockIndexInfo` struct can be expanded in future to hold richer information about a certain block index.

This showcases a re-implementation of bitcoin-chainstate only using the kernel C++ API header.

sedited force-pushed the kernelApi branch 30 times, most recently from 5af39c2 to 84495d8 Compare May 31, 2024 11:34

sedited force-pushed the kernelApi branch 7 times, most recently from 2808f12 to 1e6fd8a Compare June 6, 2024 15:17

stickies-v reviewed Jun 13, 2024

View reviewed changes

src/kernel/bitcoinkernel.h Show resolved Hide resolved

src/kernel/bitcoinkernel.h Outdated Show resolved Hide resolved

stickies-v reviewed Jun 13, 2024

View reviewed changes

sedited added 19 commits October 8, 2024 20:58

kernel: Add interrupt function to C header

78bba95

Calling interrupt can halt long-running functions associated with objects that were created through the passed-in context.

kernel: Add functions for the block validation state to C header

0ee2800

These allow for the interpretation of the data in a `BlockChecked` validation interface callback. This is useful to get richer information in case a block failed to validate.

kernel: Add function for copying block data to C header

4321237

This adds functions for copying serialized block data into a user-owned variable-sized byte array. Use it in the tests for verifying the implementation of the validation interface's `BlockChecked` method.

kernel: Add block index utility functions to C header

c8aa86b

Adds further functions useful for traversing the block index and retrieving block information. The added `kernel_BlockIndexInfo` struct can be expanded in future to hold richer information about a certain block index.

kernel: Add pure kernel bitcoin-chainstate

b129d35

This showcases a re-implementation of bitcoin-chainstate only using the kernel C++ API header.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kernel C API#20

Kernel C API#20
sedited wants to merge 19 commits intomasterfrom
kernelApi

sedited commented May 27, 2024 •

edited

Loading

Uh oh!

stickies-v commented Jun 7, 2024

Uh oh!

sedited commented Jun 8, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

stickies-v Jun 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sedited commented May 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How can I review this PR?

What about versioning?

Potential future additions

Current drawbacks

Uh oh!

stickies-v commented Jun 7, 2024

Uh oh!

sedited commented Jun 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stickies-v Jun 13, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sedited commented May 27, 2024 •

edited

Loading

sedited commented Jun 8, 2024 •

edited

Loading