Skip to content

test: test_bitcoin: allow -testdatadir=<datadir>#26564

Merged
achow101 merged 1 commit intobitcoin:masterfrom
LarryRuane:2022-11-testdir
Mar 11, 2024
Merged

test: test_bitcoin: allow -testdatadir=<datadir>#26564
achow101 merged 1 commit intobitcoin:masterfrom
LarryRuane:2022-11-testdir

Conversation

@LarryRuane
Copy link
Contributor

@LarryRuane LarryRuane commented Nov 24, 2022

This backward-compatible change would help with code review, testing, and debugging. When test_bitcoin runs, it creates a working or data directory within /tmp/test_common_Bitcoin\ Core/, named as a long random (hex) string.

This small patch does three things:

  • If the (new) argument -testdatadir=<datadir> is given, use <datadir>/test_temp/<test-name>/datadir as the working directory
  • When the test starts, remove <datadir>/test_temp/<test-name>/datadir if it exists from an earlier run (currently, it's presumed not to exist due to the long random string)
  • Don't delete the working directory at the end of the test if a custom data directory is being used

Example usage, which will remove, create, use /somewhere/test_temp/getarg_tests/boolarg, and leave it afterward:

$ test_bitcoin --run_test=getarg_tests/boolarg -- -testdatadir=/somewhere
Running 1 test case...
Test directory (will not be deleted): "/somewhere/test_temp/getarg_tests/boolarg/datadir"

*** No errors detected
$ ls -l /somewhere/test_temp/getarg_tests/boolarg/datadir
total 8
drwxrwxr-x 2 larry larry 4096 Feb 22 10:28 blocks
-rw-rw-r-- 1 larry larry 1273 Feb 22 10:28 debug.log

(A relative pathname also works.)

This change affects only test_bitcoin; it could also be applied to test_bitcoin-qt but that's slightly more involved so I'm skipping that for now.

The rationale for this change is that, when running the test using the debugger, it's often useful to watch debug.log as the test runs and inspect some of the other files (I've looked at the generated blknnnn.dat files for example). Currently, that requires figuring out where the test's working directory is since it changes on every test run. Tests can be run with -printtoconsole=1 to show debug logging to the terminal, but it's nice to keep debug.log continuously open in an editor, for example.

Even if not using a debugger, it's sometimes helpful to see debug.log and other artifacts after the test completes.

Similar functionality is already possible with the functional tests using the --tmpdir= and --nocleanup arguments.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 24, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK cbergqvist, marcofleon, davidgumberg, furszy, tdb3, achow101
Concept ACK stickies-v, kouloumos, RandyMcMillan

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@DrahtBot DrahtBot added the Tests label Nov 24, 2022
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure this will break when running in parallel.

What about a --nocleanup option, like the one in the functional test framework, if the goal is to not delete the dir?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would work, but what about checking the datadir lock before removing it, e.g. with util/system.cpp::LockDirectory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain, tests running in parallel can not share a data directory. With rand256, each test gets a unique one. However, with TEST_BITCOIN_PATH all tests get the same one.

Copy link
Contributor

@stickies-v stickies-v Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, my suggestion was rather: since this is an advanced setting, we could let the user handle making sure that there are no tests running in parallel, but just add a simple check in case they aren't aware/mistaken (by checking the datadir lock)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a --nocleanup option ... if the goal is to not delete the dir?

Yes, but also to have a fixed data directory path, so I can open its debug.log in an editor, and re-read the file as the test runs and after the test completes. (Or better yet, set the editor to automatically re-read if possible, which I do with vim.) Then if I run the test again, I can re-read the file again, without having to open a different file.

But I think specifying this environment variable can also indicate not to delete the directory when the test completes; you wouldn't want to do this normally because the disk usage would continuously increase as you run the test repeatedly. If instead you specify a fixed directory, then disk usage doesn't grow as you repeatedly run tests, so it's okay for the test not to delete it at the end.

tests running in parallel can not share a data directory

Yes, you definitely wouldn't specify this to multiple tests running in parallel. I would only use this when I'm running a single test that I'm trying to understand or debug. That is, I definitely would not set this environment variable in my .bashrc! (I could improve the doc changes to make that clear.) The default is to use separate directories (as today).

... checking the datadir lock ...

The unit tests don't create a lock file; only bitcoind does that. The unit tests don't actually create a full datadir, only some parts of it, and can also create extra files that aren't normally in a datadir. (That's why I called it a working dir in the title and description, instead of a datadir, although the test README does call it a datadir.)

I don't think it's worth adding a lock file; I think users of this will be advanced enough to understand that multiple tests can't simultaneously share the same work directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sgtm

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK - seems helpful to improve productivity in certain workflows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would work, but what about checking the datadir lock before removing it, e.g. with util/system.cpp::LockDirectory?

@LarryRuane LarryRuane changed the title test: allow TEST_BITCOIN_PATH to specify working dir test: allow BITCOIN_TEST_PATH to specify working dir Nov 25, 2022
@LarryRuane
Copy link
Contributor Author

Force-pushed to incorporate two review suggestions (thanks!):

  • add description to the other README files (benchmark, fuzz, qt)
  • rename TEST_BITCOIN_PATH to BITCOIN_TEST_PATH

@luke-jr
Copy link
Member

luke-jr commented Nov 26, 2022

If the environment variable BITCOIN_TEST_PATH is set, use its value as the test's working directory, rather than a randomly-named path in /tmp/test_common_Bitcoin\ Core/
When the test starts, remove the working directory if it exists

This combination seems dangerous... :/

@LarryRuane
Copy link
Contributor Author

@luke-jr

This combination seems dangerous... :/

If the concern is that someone may specify a real data directory by mistake, that's a good point! A real datadir has a .lock file, whether bitcoind is running or not. This file will never exist in one of these unit test work directories (I verified that by running all the tests, and also by source code inspection starting with ".lock").

I just force-pushed code to exit without removing the work directory if contains .lock, for example:

$ BITCOIN_TEST_PATH=~/.bitcoin src/test/test_bitcoin --run_test=getarg_tests/logargs
Running 1 test case...
unknown location(0): fatal error: in "getarg_tests/logargs": std::runtime_error: .lock exists, not using possible real data directory
test/getarg_tests.cpp(423): last checkpoint: "logargs" fixture ctor

*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
$ 

Does this address your concern? Is there a better way to report the error?

@luke-jr
Copy link
Member

luke-jr commented Nov 27, 2022

Not quite, I'm more concerned about a situation where BITCOIN_TEST_PATH might be /home/user or something else weird. While it's a crazy user error, nobody expects setting an env variable would nuke a path.

@LarryRuane
Copy link
Contributor Author

Not quite, I'm more concerned about a situation where BITCOIN_TEST_PATH might be /home/user or something else weird. While it's a crazy user error, nobody expects setting an env variable would nuke a path.

Another great point! Force-pushed a change (c48a1b4c6949af4e610035d037c6195bc22904ca) so that the environment variable only replaces the random path component within /tmp/test_common_Bitcoin\ Core/mydatadir/. The specified path must be relative. All I care about is that the datadir has a fixed name (across multiple test runs). This should now be very safe. (Also removed the check for .lock, not needed.) I'll update the description now too.

$ BITCOIN_TEST_PATH=mydatadir src/test/test_bitcoin --run_test=getarg_tests/doubledash
Running 1 test case...

*** No errors detected
$ ls -l /tmp/test_common_Bitcoin\ Core/mydatadir/
total 8
drwxrwxr-x 2 larry larry 4096 Nov 27 22:45 blocks
-rw-rw-r-- 1 larry larry 1003 Nov 27 22:45 debug.log
$ 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use a hardcoded path item like "fixed" and turn the env into a flag? This should be fine, because anyone can still modify the root temp dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this sounds a little far-fetched, but I have actually done the following: I have two test binaries, one from a PR branch and the other from master, and the test fails on the PR branch but passes on master. I've single-stepped through the tests simultaneously (in sync) comparing them to see what happens leading up to the failure. In this case, it would be convenient to have two different fixed data directories (if I needed to restart the test many times) because I also want to watch the two debug.log files.

Also, as long as we're going to the trouble of implementing and documenting an environment variable, it's no extra work to make it a value instead of a flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is still possible. You can set TMPDIR=/tmp/1, see https://en.cppreference.com/w/cpp/filesystem/temp_directory_path

Not sure if relevant, but BITCOIN_TEST_PATH=../../home/user will still delete your home dir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I just force-pushed an update to require that BITCOIN_TEST_PATH doesn't contain the separator character. That should take care of both problems. I would say that if the user has set TMPDIR, it should be safe to delete the given directory, even if it's within the user's home directory, since paths containing ../ are no longer allowed.

$ BITCOIN_TEST_PATH=../mydatadir src/test/test_bitcoin --run_test=getarg_tests/doubledash
Running 1 test case...
unknown location(0): fatal error: in "getarg_tests/doubledash": std::runtime_error: BITCOIN_TEST_PATH cannot contain path separators
test/getarg_tests.cpp(380): last checkpoint: "doubledash" fixture ctor

*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
$

Copy link
Contributor

@kouloumos kouloumos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK
I think this is useful; I recently wanted to read some logs from a benchmark and I wasn't sure how to do it.

This functionality is already possible with the functional tests using the --tmpdir= and --nocleanup arguments.

But this (fixed data directory path by overriding) is something that is not currently possible in the functional tests framework. Which initially made me a bit skeptical, but the explanation of your flow makes sense, plus the fact that now the possible paths directories are confined inside /tmp/test_common_Bitcoin\ Core/.

And since the BITCOIN_TEST_PATH environmental variable does not define a path anymore but a directory, maybe BITCOIN_TEST_DIR could be a better name?

@achow101
Copy link
Member

Are you still working on this?

@LarryRuane
Copy link
Contributor Author

Yes, still hoping to get this merged. Force-pushed 5df0ed5ab789e005cb9aa0fc3ab61b7ce0b80ec8 to rebase onto the latest master (since it's been a long time)

@LarryRuane
Copy link
Contributor Author

Another force push to improve the doc slightly and to address review comments, thanks @kouloumos!

Ready for review, also ping @theStack.

@LarryRuane
Copy link
Contributor Author

Rebased for merge conflict.

@LarryRuane
Copy link
Contributor Author

@tdb3
I force-pushed a fix, it turns out that Berkeley DB (BDB) doesn't work with a relative path, only absolute. So it's a one-line fix. Thanks for finding this bug!

@tdb3
Copy link
Contributor

tdb3 commented Feb 24, 2024

@tdb3 I force-pushed a fix, it turns out that Berkeley DB (BDB) doesn't work with a relative path, only absolute. So it's a one-line fix. Thanks for finding this bug!

Happy to help. Nice job with the fix. ACK 11aeabde29820ab218757a6c5b959f2ba2cd8f13.

Retested on my machine as well (the same steps outlined above). All passed!

@DrahtBot DrahtBot requested review from cbergqvist and removed request for tdb3 February 24, 2024 10:33
@cbergqvist
Copy link
Contributor

Great find @tdb3! re-ACK 11aeabd.

(Diffed against previous commit and re-ran ./src/test/test_bitcoin -- --testdatadir= with absolute and relative paths. Not explicitly opting in to BDB though).

@DrahtBot DrahtBot removed the request for review from cbergqvist February 24, 2024 13:31
@davidgumberg
Copy link
Contributor

tACK 11aeabd

Tested with the following script that runs every test suite with an absolute and a relative dir:

bitcoin_dir="$HOME/dev/bitcoin"

test_absolute_dir="$HOME/dev/test_data"
test_relative_dir="test_data"

for file in "$bitcoin_dir"/src/test/*_tests.cpp; do
    test_name=$(basename $file | sed 's/\.[^.]*$//')

    $bitcoin_dir/src/test/test_bitcoin -t $test_name -- -testdatadir="$test_absolute_dir"
    if [ $? -ne 0 ]; then
        echo $test_name failed with absolute dir $test_absolute_dir!
        exit 1
    fi

    $bitcoin_dir/src/test/test_bitcoin -t $test_name -- -testdatadir="$test_relative_dir"
    if [ $? -ne 0 ]; then
        echo $test_name failed with relative dir $test_relative_dir!
        exit 1
    fi

@tdb3
Copy link
Contributor

tdb3 commented Mar 1, 2024

@tdb3
I force-pushed a fix, it turns out that Berkeley DB (BDB) doesn't work with a relative path, only absolute. So it's a one-line fix. Thanks for finding this bug!

Not sure how to check if CI was passing before the force push, but if it was, maybe this is a gap in CI? Maybe an Issue should be written?

@LarryRuane
Copy link
Contributor Author

@tdb3

Not sure how to check if CI was passing before the force push, but if it was, maybe this is a gap in CI? Maybe an Issue should be written?

CI was passing before this fix, but in my opinion, CI (regression testing) doesn't need to test test-only code. (Otherwise, we could end up with an infinite recursion of tests! 😅). I think for a change like this, manual testing (exactly as you did) is sufficient.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 11aeabde29. Left a comment.

And also, have made few tiny changes. Take them if you like them:

diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp
--- a/src/test/util/setup_common.cpp	(revision 11aeabde29820ab218757a6c5b959f2ba2cd8f13)
+++ b/src/test/util/setup_common.cpp	(date 1709516409185)
@@ -98,6 +98,20 @@
 };
 static NetworkSetup g_networksetup_instance;
 
+/** Register test-only arguments */
+static void SetupUnitTestArgs(ArgsManager& argsman)
+{
+    argsman.AddArg("-testdatadir", strprintf("Custom data directory (default: %s<random_string>)", fs::PathToString(fs::temp_directory_path() / "test_common_" PACKAGE_NAME / "")),
+                   ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
+}
+
+/** Test setup failure */
+static void ExitFailure(std::string_view str_err)
+{
+    std::cerr << str_err << "\n";
+    exit(EXIT_FAILURE);
+}
+
 BasicTestingSetup::BasicTestingSetup(const ChainType chainType, const std::vector<const char*>& extra_args)
     : m_args{}
 {
@@ -123,8 +137,7 @@
     gArgs.ClearPathCache();
     {
         SetupServerArgs(*m_node.args);
-        m_node.args->AddArg("-testdatadir", strprintf("Custom data directory (default: %s<random_string>)", fs::PathToString(fs::temp_directory_path() / "test_common_" PACKAGE_NAME / "")),
-                            ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
+        SetupUnitTestArgs(*m_node.args);
         std::string error;
         if (!m_node.args->ParseParameters(arguments.size(), arguments.data(), error)) {
             m_node.args->ClearArgs();
@@ -141,10 +154,8 @@
         // Custom data directory
         m_has_custom_datadir = true;
         fs::path root_dir{m_node.args->GetPathArg("-testdatadir")};
-        if (root_dir.empty()) {
-            std::cerr << "-testdatadir argument is empty, please specify a path\n";
-            exit(EXIT_FAILURE);
-        }
+        if (root_dir.empty()) ExitFailure("-testdatadir argument is empty, please specify a path");
+
         root_dir = fs::absolute(root_dir);
         const std::string test_path{G_TEST_GET_FULL_NAME ? G_TEST_GET_FULL_NAME() : ""};
         const fs::path lockdir{root_dir / "test_temp" / fs::PathFromString(test_path)};
@@ -153,14 +164,12 @@
         // Try to obtain the lock; if unsuccessful don't disturb the existing test.
         TryCreateDirectories(lockdir);
         if (util::LockDirectory(lockdir, ".lock", /*probe_only=*/false) != util::LockResult::Success) {
-            std::cerr << "Cannot obtain a lock on test data lock directory " + fs::PathToString(lockdir) + '\n' +
-                             "The test executable is probably already running.\n";
-            exit(EXIT_FAILURE);
+            ExitFailure("Cannot obtain a lock on test data lock directory " + fs::PathToString(lockdir) + '\n' + "The test executable is probably already running.");
         }
 
-        // Always start with a fresh data directory; this doesn't delete the .lock file.
+        // Always start with a fresh data directory; this doesn't delete the .lock file located a level above.
         fs::remove_all(m_path_root);
-        TryCreateDirectories(m_path_root);
+        if (!TryCreateDirectories(m_path_root)) ExitFailure("Cannot create test data directory");
 
         // Print the test directory name if custom.
         std::cout << "Test directory (will not be deleted): " << m_path_root << std::endl;

@LarryRuane
Copy link
Contributor Author

Force pushed to accept review comments, thanks furszy

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 0e477344fe4094ada5262a1e7e465d4cb93cacdd

@DrahtBot DrahtBot requested a review from davidgumberg March 5, 2024 22:40
Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. I've observed that running test_bitcoin without specifying a testdatadir passes and leaves no trace of a test directory. I've reviewed that running individual tests with a specific testdatadir results in the directory format that is described in src/test/README.md. I also tested the two exit failure messages.

However, I'm encountering an error and I can't figure out why. I might be overlooking something simple.

I run ./test_bitcoin (all the tests) without testdatadir and everything works. But when I run

./test_bitcoin -- -testdatadir=./testingdir/newdir

it always fails on rpc_tests/rpc_parse_monetary_values with the console error:

Test directory (will not be deleted): "/Users/mc/contrib/bitcoin/src/test/testingdir/test_common_Bitcoin Core/rpc_tests/rpc_parse_monetary_values/datadir"
Error: A fatal internal error occurred, see debug.log for details
Assertion failed: (status == node::ChainstateLoadStatus::SUCCESS), function LoadVerifyActivateChainstate, file setup_common.cpp, line 281.
unknown location:0: fatal error: in "rpc_tests/rpc_parse_monetary_values": signal: SIGABRT (application abort requested)
test/rpc_tests.cpp:277: last checkpoint: "rpc_parse_monetary_values" fixture ctor
unknown location:0: fatal error: in "rpc_tests/rpc_ban": std::__1::ios_base::failure: Could not create TokenPipe: unspecified iostream_category error

The tests prior to this one all succeed and create the proper directories. The rest of the tests after that fail with the unknown location error.

I've tried it with an absolute path as well and the same thing happens. The thing is, when I run the rpc_tests suite or the single test individually, everything passes.

Here is debug.log of rpc_parse_monetary_values (when running all tests)
2024-03-05T22:45:17.249171Z [test] [test/util/random.cpp:31] [Seed] Seed: Setting random seed for current tests to RANDOM_CTX_SEED=52e12d6667c3a8adebe9dbb0ada39c7d07c09ace1970adc18c604aae29be2049
2024-03-05T22:45:17.249183Z [test] [init/common.cpp:155] [LogPackageVersion] Bitcoin Core version v26.99.0-5b73af0dcf3b (release build)
2024-03-05T22:45:17.249241Z [test] [node/chainstatemanager_args.cpp:54] [ApplyArgsManOptions] Script verification uses 7 additional threads
2024-03-05T22:45:17.249286Z [test] [kernel/context.cpp:20] [Context] Using the 'arm_shani(1way,2way)' SHA256 implementation
2024-03-05T22:45:17.249437Z [test] [script/sigcache.cpp:104] [InitSignatureCache] Using 16 MiB out of 16 MiB requested for signature cache, able to store 524288 elements
2024-03-05T22:45:17.249465Z [test] [validation.cpp:1885] [InitScriptExecutionCache] Using 16 MiB out of 16 MiB requested for script execution cache, able to store 524288 elements
2024-03-05T22:45:17.249493Z [scheduler] [util/thread.cpp:20] [TraceThread] scheduler thread start
2024-03-05T22:45:17.249535Z [test] [policy/fees.cpp:560] [CBlockPolicyEstimator] /Users/mc/contrib/bitcoin/src/test/testingdir/test_common_Bitcoin Core/rpc_tests/rpc_parse_monetary_values/datadir/fee_estimates.dat is not found. Continue anyway.
2024-03-05T22:45:17.249586Z [test] [dbwrapper.cpp:249] [CDBWrapper] Opened LevelDB successfully
2024-03-05T22:45:17.249594Z [test] [dbwrapper.cpp:274] [CDBWrapper] Using obfuscation key for /Users/mc/contrib/bitcoin/src/test/testingdir/test_common_Bitcoin Core/rpc_tests/rpc_parse_monetary_values/datadir/blocks/index: 0000000000000000
2024-03-05T22:45:17.249612Z [test] [node/chainstate.cpp:166] [LoadChainstate] Assuming ancestors of block 000000000000000000026811d149d4d261995ec5b3f64f439a0a10e1a464af9a have valid signatures.
2024-03-05T22:45:17.249617Z [test] [node/chainstate.cpp:170] [LoadChainstate] Setting nMinimumChainWork=000000000000000000000000000000000000000063c4ebd298db40af57541800
2024-03-05T22:45:17.249632Z [test] [dbwrapper.cpp:249] [CDBWrapper] Opened LevelDB successfully
2024-03-05T22:45:17.249637Z [test] [dbwrapper.cpp:274] [CDBWrapper] Using obfuscation key for /Users/mc/contrib/bitcoin/src/test/testingdir/test_common_Bitcoin Core/rpc_tests/rpc_parse_monetary_values/datadir/blocks/index: 0000000000000000
2024-03-05T22:45:17.249642Z [test] [node/blockstorage.cpp:505] [LoadBlockIndexDB] LoadBlockIndexDB: last block file = 0
2024-03-05T22:45:17.249648Z [test] [node/blockstorage.cpp:509] [LoadBlockIndexDB] LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=0, size=0, heights=0...0, time=1970-01-01...1970-01-01)
2024-03-05T22:45:17.249652Z [test] [node/blockstorage.cpp:520] [LoadBlockIndexDB] Checking all blk files are present...
2024-03-05T22:45:17.249656Z [test] [validation.cpp:4710] [LoadBlockIndex] Initializing databases...
2024-03-05T22:45:17.249669Z [test] [flatfile.cpp:44] [Open] Unable to open file /Users/mc/contrib/bitcoin/src/test/testingdir/test_common_Bitcoin Core/rpc_tests/rpc_parse_monetary_values/datadir/blocks/blk00000.dat
2024-03-05T22:45:17.249679Z [test] [flatfile.cpp:44] [Open] Unable to open file /Users/mc/contrib/bitcoin/src/test/testingdir/test_common_Bitcoin Core/rpc_tests/rpc_parse_monetary_values/datadir/blocks/blk00000.dat
2024-03-05T22:45:17.249684Z [test] [logging.h:269] [error] ERROR: WriteBlockToDisk: OpenBlockFile failed
2024-03-05T22:45:17.249688Z [test] [node/abort.cpp:22] [AbortNode] *** Failed to write block
2024-03-05T22:45:17.249694Z [test] [noui.cpp:43] [noui_ThreadSafeMessageBox] Error: A fatal internal error occurred, see debug.log for details
2024-03-05T22:45:17.249705Z [test] [logging.h:269] [error] ERROR: LoadGenesisBlock: writing genesis block to disk failed

The error occurs when trying to create blk00000.dat. My next idea is to try running subsets of the test suites along with rpc_tests and see if there's a certain suite that causes the issue. But for now, wanted to get feedback and see if there's something more obvious that I'm missing.

@LarryRuane
Copy link
Contributor Author

@marcofleon - by any chance is your filesystem running out of space? I can't reproduce this problem. Running all the tests creates a directory of size 2.6GB (according to du -sh).

Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LarryRuane - problem was my open file limit, it was set low. I appreciate your help.

Tested ACK 0e477344fe4094ada5262a1e7e465d4cb93cacdd. I think this is a useful feature to add.

  1. Successfully ran test_bitcoin without using -testdatadir to ensure backwards compatibility. Confirmed that no test directories existed in /tmp after all tests completed.
  2. Ran test_bitcoin using the new feature, specifying a directory using both relative and absolute paths. No issues were observed (once I raised ulimit 😅). Checked that all the directories persisted after completion of the tests and that they were constructed in the format laid out in README.md. Checked inside multiple directories for debug.log files and other relevant files. Successfully ran several individual tests and test suites as well with -testdatadir specified.
  3. Confirmed that the exit failure messages rightfully appeared when leaving testdatadir empty and when attempting to run tests concurrently using the same directory.

@LarryRuane
Copy link
Contributor Author

@marcofleon

problem was my open file limit, it was set low.

Thanks, you revealed a bug! The .lock file was not being closed after each test. This file descriptor leak is why you needed to increase that limit.

Force pushed 2209a7f565ac8569ee777ee44c508683670009d6 to fix this problem. If you have time and interest, perhaps you can revert back to your lower open file limit, it should work now. I verified both the leak and the fix by running this command in another window as the test ran:

$ ls -l /proc/$(pgrep -f src/test/test_bitcoin)/fd

Before this latest force-push, you would see many .lock files, one for each separate test. Now you see just one.

Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 2209a7f565ac8569ee777ee44c508683670009d6

Ran single test using the fixture + all tests, with no, relative and absolute dirs.

$ src/test/test_bitcoin -t argsman_tests
$ src/test/test_bitcoin -t argsman_tests -- -testdatadir=foo
$ src/test/test_bitcoin -t argsman_tests -- -testdatadir=/mnt/tmp
$ src/test/test_bitcoin
$ src/test/test_bitcoin -- -testdatadir=foo
$ src/test/test_bitcoin -- -testdatadir=/mnt/tmp

Verified that dirs remained for -testdatadir versions and where removed without.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be fs::remove instead of fs::remove_all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, force-pushed d27e2d8 to fix.

Specifying this argument overrides the path location for test_bitcoin;
it becomes <datadir>/test_common_Bitcoin Core/<testname>/datadir. Also,
this directory isn't removed after the test completes. This can make it
easier for developers to study the results of a test (see the state of
the data directory after the test runs), and also (for example) have an
editor open on debug.log to monitor it across multiple test runs instead
of having to re-open a different pathname each time.

Example usage (note the "--" is needed):

test_bitcoin --run_test=getarg_tests/boolarg -- \
-testdatadir=/somewhere/mydatadir

This will create (if necessary) and use the data directory:

/somewhere/mydatadir/test_common_Bitcoin Core/getarg_tests/boolarg/datadir

Co-authored-by: furszy <[email protected]>
Copy link
Contributor

@cbergqvist cbergqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d27e2d8! (Already did some testing with fs::remove() to make sure it was compatible with the util::Lock/UnlockDirectory implementation).

Copy link
Contributor

@marcofleon marcofleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d27e2d8. I ran all the tests with my previous open file limit and no errors were detected. Also ran some individual tests with no, relative, and absolute paths and everything looks good.

I used watch -n 3 lsof -p PID while running the tests in both this commit and the previous one to compare the outputs and observe the leak. I noticed the presence of only one .lock file as compared to a buildup of .lock files when running the old code.

Thanks for the guidance here @LarryRuane! Everything lgtm

@davidgumberg
Copy link
Contributor

Diffed and re-tested running with -testdatadir with an absolute path and with a relative path.

ACK d27e2d8

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d27e2d8

@tdb3
Copy link
Contributor

tdb3 commented Mar 9, 2024

re-ACK for d27e2d8

Re-executed the test actions in #26564 (review). All unit tests passed.

@achow101
Copy link
Member

ACK d27e2d8

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.