util: explicitly close all AutoFiles that have been written#29307
util: explicitly close all AutoFiles that have been written#29307achow101 merged 4 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29307. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
8b02e03 to
7c58f6e
Compare
|
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
7c58f6e to
5543990
Compare
|
I don't think any care close-check needs to be done when reading from a file. So what about removing the |
That is my understanding too.
|
The assumption would be that the close was called before the destructor is called. This is a code correctness question and seems like the perfect fit for Assume. That is, if the Assume fails, the code was missing a close, and the code must be changed to fix the internal bug. The Assume does not in any way check for IO errors. |
|
Then I misunderstood this:
How? |
|
AutoFile holds a file, which will be nullptr when it is closed. So if the file is nullptr, then it can be assumed to be flushed. |
|
How/when would the file be closed? |
An |
|
Ok, the users of the class then have to explicitly close. That is 1.4. from the OP, I elaborated it with more words. |
|
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
Explicit callers of `AutoFile::fclose()` should check whether `fclose(3)` failed because this may be due to a failure to flush written data to disk. Such failures after writing to a file should be treated as failures to `fwrite(3)`. Github-Pull: bitcoin#29307 Rebased-From: e9abf95ab52f1f2d6f73ecca63133c29c253b7d9
`fread(3)` does not distinguish between end-of-file and error, and callers must use `feof(3)` and `ferror(3)` to determine which occurred. Github-Pull: bitcoin#29307 Rebased-From: 55439903212dcee6af09da7393a41e3754501b37
|
Seems like there is some discussion in the description that is a better fit for a brainstorming issue. |
hodlinator
left a comment
There was a problem hiding this comment.
Code review c7b68dceb29ef67796f7edf79a161bbda766169c
This is a clear incremental improvement over what is on master.
Thanks for taking my suggestions!
Don't foresee any further issues preventing an A-C-K beyond the inline comments below.
src/node/mempool_persist.cpp
Outdated
There was a problem hiding this comment.
In this specific case of mempool_persist.cpp/DumpMempool my variant simply had:
FileWriter file{mockable_fopen_function(file_fspath, "wb"), [] (int err) {
Assume(std::uncaught_exceptions() > 0); // Only expected when exception is thrown before fclose() below.
}};Which would serve to enforce that we either already explicitly fclose()d the file and checked errno further down in the function (leading to the lambda not being called upon destruction), or that an exception was triggered before reaching that far (in which case the Assume() in the lambda succeeds). (This assumes that the thrown exception is worse than the failure to close the file, which may not always be true).
For other cases I ended up with scopes and extra variables to check for failures, like in your examples in the comment above. Agree that it would be nice to find something more elegant... and until we do, your current approach in this PR seems okay.
nit: I still think the user-provided lambda in destructor approach merits an alternative 4 in the PR description.
src/flatfile.cpp
Outdated
There was a problem hiding this comment.
Not sure how to handle this in the callers, probably something along these lines:
diff --git a/src/index/blockfilterindex.cpp b/src/index/blockfilterindex.cpp
index 6b47a0fa0a..332e700995 100644
--- a/src/index/blockfilterindex.cpp
+++ b/src/index/blockfilterindex.cpp
@@ -227,10 +227,13 @@ size_t BlockFilterIndex::WriteFilterToDisk(FlatFilePos& pos, const BlockFilter&
// Pre-allocate sufficient space for filter data.
bool out_of_space;
- m_filter_fileseq->Allocate(pos, data_size, out_of_space);
+ const size_t allocated_size = m_filter_fileseq->Allocate(pos, data_size, out_of_space);
if (out_of_space) {
LogPrintf("%s: out of disk space\n", __func__);
return 0;
+ } else if (allocated_size < data_size) {
+ LogError("Failed to allocate %u for file %s.", data_size, pos.ToString());
+ return 0;
}
AutoFile fileout{m_filter_fileseq->Open(pos)};
diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
index a344e13c89..3ff3aa9122 100644
--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -873,6 +873,9 @@ FlatFilePos BlockManager::FindNextBlockPos(unsigned int nAddSize, unsigned int n
if (out_of_space) {
m_opts.notifications.fatalError(_("Disk space is too low!"));
return {};
+ } else if (bytes_allocated < nAddSize) {
+ m_opts.notifications.fatalError(_("Failed allocating disk space!"));
+ return {};
}
if (bytes_allocated != 0 && IsPruneMode()) {
m_check_for_pruning = true;
@@ -918,6 +921,8 @@ bool BlockManager::FindUndoPos(BlockValidationState& state, int nFile, FlatFileP
size_t bytes_allocated = m_undo_file_seq.Allocate(pos, nAddSize, out_of_space);
if (out_of_space) {
return FatalError(m_opts.notifications, state, _("Disk space is too low!"));
+ } else if (bytes_allocated < nAddSize) {
+ return FatalError(m_opts.notifications, state, _("Failed allocating disk space!"));
}
if (bytes_allocated != 0 && IsPruneMode()) {
m_check_for_pruning = true;There was a problem hiding this comment.
I agree that callers of Allocate() can do better error handling. That seems to be out of the scope of this PR because this PR does not change the semantic of Allocate(). Also in master it could return 0 and have out_of_space == false. That means an error has occurred, e.g. it would do that if it cannot open the file (in master). Now it will also do it if it cannot close the file.
|
|
|
|
hodlinator
left a comment
There was a problem hiding this comment.
ACK f8a7580a0816586f75ba9b0e79300c893d362f9f
Mainly changes AutoFile to encourage judicious error checking upon file closure, surfacing failures to complete writing files to disk.
Several alternatives have been considered (listed in PR description) and I agree this is probably the best option for now.
There was a problem hiding this comment.
ACK f8a7580a0816586f75ba9b0e79300c893d362f9f
Changes since last review:
- adjust
AutoFileparameter move semantics; - log unsuccessful closes as a last attempt;
- add
m_was_writtentoTruncate; - & explanatory comments.
Details
diff --git a/src/flatfile.cpp b/src/flatfile.cpp
index 388b30efae..df6596e940 100644
--- a/src/flatfile.cpp
+++ b/src/flatfile.cpp
@@ -46,7 +46,9 @@ FILE* FlatFileSeq::Open(const FlatFilePos& pos, bool read_only) const
}
if (pos.nPos && fseek(file, pos.nPos, SEEK_SET)) {
LogPrintf("Unable to seek to position %u of %s\n", pos.nPos, fs::PathToString(path));
- fclose(file);
+ if (fclose(file) != 0) {
+ LogError("Unable to close file %s", fs::PathToString(path));
+ }
return nullptr;
}
return file;
@@ -68,7 +70,10 @@ size_t FlatFileSeq::Allocate(const FlatFilePos& pos, size_t add_size, bool& out_
if (file) {
LogDebug(BCLog::VALIDATION, "Pre-allocating up to position 0x%x in %s%05u.dat\n", new_size, m_prefix, pos.nFile);
AllocateFileRange(file, pos.nPos, inc_size);
- fclose(file);
+ if (fclose(file) != 0) {
+ LogError("Cannot close file %s%05u.dat after extending it with %u bytes", m_prefix, pos.nFile, new_size);
+ return 0;
+ }
return inc_size;
}
} else {
@@ -86,17 +91,24 @@ bool FlatFileSeq::Flush(const FlatFilePos& pos, bool finalize) const
return false;
}
if (finalize && !TruncateFile(file, pos.nPos)) {
- fclose(file);
LogError("%s: failed to truncate file %d\n", __func__, pos.nFile);
+ if (fclose(file) != 0) {
+ LogError("Failed to close file %d", pos.nFile);
+ }
return false;
}
if (!FileCommit(file)) {
- fclose(file);
LogError("%s: failed to commit file %d\n", __func__, pos.nFile);
+ if (fclose(file) != 0) {
+ LogError("Failed to close file %d", pos.nFile);
+ }
return false;
}
DirectoryCommit(m_dir);
- fclose(file);
+ if (fclose(file) != 0) {
+ LogError("Failed to close file %d after flush", pos.nFile);
+ return false;
+ }
return true;
}
diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
index 557ddaceab..81f42827d9 100644
--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -961,6 +961,7 @@ bool BlockManager::WriteBlockUndo(const CBlockUndo& blockundo, BlockValidationSt
// Write undo data & checksum
fileout << blockundo << hasher.GetHash();
}
+ // BufferedWriter will flush pending data to file when fileout goes out of scope.
}
// Make sure that the file is closed before we call `FlushUndoFile`.
diff --git a/src/node/mempool_persist.cpp b/src/node/mempool_persist.cpp
index 2551a43166..d4d9263973 100644
--- a/src/node/mempool_persist.cpp
+++ b/src/node/mempool_persist.cpp
@@ -201,8 +201,10 @@ bool DumpMempool(const CTxMemPool& pool, const fs::path& dump_path, FopenFn mock
LogInfo("Writing %d unbroadcast transactions to file.\n", unbroadcast_txids.size());
file << unbroadcast_txids;
- if (!skip_file_commit && !file.Commit())
+ if (!skip_file_commit && !file.Commit()) {
+ (void)file.fclose();
throw std::runtime_error("Commit failed");
+ }
if (file.fclose() != 0) {
throw std::runtime_error(
strprintf("Error closing %s: %s", fs::PathToString(file_fspath), SysErrorString(errno)));
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 4f5a6b8434..be953cefde 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -83,7 +83,7 @@ UniValue WriteUTXOSnapshot(
CCoinsViewCursor* pcursor,
CCoinsStats* maybe_stats,
const CBlockIndex* tip,
- AutoFile& afile,
+ AutoFile&& afile,
const fs::path& path,
const fs::path& temppath,
const std::function<void()>& interruption_point = {});
@@ -3115,7 +3115,14 @@ static RPCHelpMan dumptxoutset()
}
}
- UniValue result = WriteUTXOSnapshot(*chainstate, cursor.get(), &stats, tip, afile, path, temppath, node.rpc_interruption_point);
+ UniValue result = WriteUTXOSnapshot(*chainstate,
+ cursor.get(),
+ &stats,
+ tip,
+ std::move(afile),
+ path,
+ temppath,
+ node.rpc_interruption_point);
fs::rename(temppath, path);
result.pushKV("path", path.utf8string());
@@ -3167,7 +3174,7 @@ UniValue WriteUTXOSnapshot(
CCoinsViewCursor* pcursor,
CCoinsStats* maybe_stats,
const CBlockIndex* tip,
- AutoFile& afile,
+ AutoFile&& afile,
const fs::path& path,
const fs::path& temppath,
const std::function<void()>& interruption_point)
@@ -3244,12 +3251,19 @@ UniValue WriteUTXOSnapshot(
UniValue CreateUTXOSnapshot(
node::NodeContext& node,
Chainstate& chainstate,
- AutoFile& afile,
+ AutoFile&& afile,
const fs::path& path,
const fs::path& tmppath)
{
auto [cursor, stats, tip]{WITH_LOCK(::cs_main, return PrepareUTXOSnapshot(chainstate, node.rpc_interruption_point))};
- return WriteUTXOSnapshot(chainstate, cursor.get(), &stats, tip, afile, path, tmppath, node.rpc_interruption_point);
+ return WriteUTXOSnapshot(chainstate,
+ cursor.get(),
+ &stats,
+ tip,
+ std::move(afile),
+ path,
+ tmppath,
+ node.rpc_interruption_point);
}
static RPCHelpMan loadtxoutset()
diff --git a/src/rpc/blockchain.h b/src/rpc/blockchain.h
index b42fe96fd3..0e42bed947 100644
--- a/src/rpc/blockchain.h
+++ b/src/rpc/blockchain.h
@@ -51,7 +51,7 @@ void CalculatePercentilesByWeight(CAmount result[NUM_GETBLOCKSTATS_PERCENTILES],
UniValue CreateUTXOSnapshot(
node::NodeContext& node,
Chainstate& chainstate,
- AutoFile& afile,
+ AutoFile&& afile,
const fs::path& path,
const fs::path& tmppath);
diff --git a/src/streams.cpp b/src/streams.cpp
index 3c46530990..2c873e50d3 100644
--- a/src/streams.cpp
+++ b/src/streams.cpp
@@ -119,6 +119,7 @@ bool AutoFile::Commit()
bool AutoFile::Truncate(unsigned size)
{
+ m_was_written = true;
return ::TruncateFile(m_file, size);
}
diff --git a/src/test/util/chainstate.h b/src/test/util/chainstate.h
index eebf504d98..d5100d15b1 100644
--- a/src/test/util/chainstate.h
+++ b/src/test/util/chainstate.h
@@ -47,8 +47,11 @@ CreateAndActivateUTXOSnapshot(
FILE* outfile{fsbridge::fopen(snapshot_path, "wb")};
AutoFile auto_outfile{outfile};
- UniValue result = CreateUTXOSnapshot(
- node, node.chainman->ActiveChainstate(), auto_outfile, snapshot_path, snapshot_path); // Will close auto_outfile.
+ UniValue result = CreateUTXOSnapshot(node,
+ node.chainman->ActiveChainstate(),
+ std::move(auto_outfile), // Will close auto_outfile.
+ snapshot_path,
+ snapshot_path);
LogPrintf(
"Wrote UTXO snapshot to %s: %s\n", fs::PathToString(snapshot_path.make_preferred()), result.write());
I have checked remaining explicit closes, couldn't find any other one that was an AutoFile, but was wondering if we should maybe check these as well:
bitcoin/src/util/fs_helpers.cpp
Line 54 in 24e5fd3
bitcoin/src/util/fs_helpers.cpp
Line 134 in 24e5fd3
bitcoin/src/util/readwritefile.cpp
Line 33 in 00e9b97
I have also rebased the PR locally and ran all tests.
Have `WriteUTXOSnapshot()` take rvalue reference to make it obvious that it takes ownership of the file.
There is no way to report a close error from `AutoFile` destructor. Such an error could be serious if the file has been written to because it may mean the file is now corrupted (same as if write fails). So, change all users of `AutoFile` that use it to write data to explicitly close the file and handle a possible error.
If an `AutoFile` has been written to, then expect callers to have closed it explicitly via the `AutoFile::fclose()` method. This is because if the destructor calls `std::fclose()` and encounters an error, then it is too late to indicate this to the caller in a meaningful way.
|
|
hodlinator
left a comment
There was a problem hiding this comment.
re-ACK c10e382
Only minor changes in response to feedback since first ACK (#29307 (review))
Here is my take of each one: bitcoin/src/util/fs_helpers.cpp Line 54 in 24e5fd3 In this case the file is opened or created and closed immediately, without writing to it. So this cannot cause a data corruption. Still, would be sensible to check bitcoin/src/util/readwritefile.cpp Line 28 in 00e9b97 bitcoin/src/util/readwritefile.cpp Line 33 in 00e9b97 The file is opened for reading only. Maybe it would be better to check bitcoin/src/util/fs_helpers.cpp Line 134 in 24e5fd3 This code is beyond me, I don't get it. It opens a file, does
There is no data associated with that file descriptor, so what is the point!? Now, the same file might be opened at the same time by another piece of code, but that would be a different file descriptor. Indeed
|
| if (fileout.fclose() != 0) { | ||
| const int errno_save{errno}; | ||
| remove(pathTmp); | ||
| LogError("Failed to close file %s after commit: %s", fs::PathToString(pathTmp), SysErrorString(errno_save)); |
There was a problem hiding this comment.
not sure about this. Commit is stronger than fclose, so I fail to see how this could ever fail on a real and normal system.
Forcing devs to write verbose and dead code doesn't seem ideal. It would be better to just treat Commit as m_was_written=false.
There was a problem hiding this comment.
Agree, would make sense to set m_was_written=false when we return true from AutoFile::Commit(). Maybe rename m_was_written to m_pending_output. Then this last if-block could be removed.
There was a problem hiding this comment.
Maybe rename m_was_written to m_pending_output
Or just m_dirty
There was a problem hiding this comment.
Are you suggesting to ignore the return value of fileout.fclose() like in master and assume it must have succeeded? Note that the errors of fclose() are a superset of the fflush() errors. In other words, fclose() may fail for errors other than fflush() errors. From fclose man page:
The fclose() function may also fail and set errno for any of the errors specified for the routines close(2), write(2), or fflush(3).
There was a problem hiding this comment.
My understanding is that once the file is synced, subsequent IO errors shouldn't matter, because the file was already synced.
There was a problem hiding this comment.
I prefer to be on the safe side and handle fclose() errors, it is just a few extra lines.
There was a problem hiding this comment.
If a Commit is not possible without closing the file, then Commit should close the file. Requiring all call sites of Commit to manually close the file seems verbose and less safe to me.
There was a problem hiding this comment.
If Commit() fails and (while failing) were to close the file internally as suggested, it should probably throw an exception after that so calling code doesn't try to continue using a closed AutoFile.
An alternative would be to have a AutoFile::CommitAndClose()-function that always closes regardless of errors and clearly signals that calling code does not expect to continue using the AutoFile?
There was a problem hiding this comment.
An alternative would be to have a
AutoFile::CommitAndClose()-function that always closes regardless of errors and clearly signals that calling code does not expect to continue using theAutoFile?
yes, that is what i was trying to suggest
There was a problem hiding this comment.
I think the current Commit() is good as it is. It does what it promises - flushes the file data (and checks that the flush succeeded). However a subsequent close is not guaranteed to succeed if commit succeeded before that because close does more than flush.
| if (fclose(file) != 0) { | ||
| LogError("Failed to close file %d", pos.nFile); |
hodlinator
left a comment
There was a problem hiding this comment.
I am puzzled 😕 🤕edit: now it makes sense,DirectoryCommit()syncs the directory itself, not the file, I was blind. So thatfclose()better be checked for an error. This was added in #14501
Might need to consider networked FS directory behavior when adding errors (#14501 (comment))?
| if (fileout.fclose() != 0) { | ||
| const int errno_save{errno}; | ||
| remove(pathTmp); | ||
| LogError("Failed to close file %s after commit: %s", fs::PathToString(pathTmp), SysErrorString(errno_save)); |
There was a problem hiding this comment.
Agree, would make sense to set m_was_written=false when we return true from AutoFile::Commit(). Maybe rename m_was_written to m_pending_output. Then this last if-block could be removed.
|
ACK c10e382 |
| if (fclose(file) != 0) { | ||
| LogError("Unable to close file %s", fs::PathToString(path)); |
There was a problem hiding this comment.
no write has happened, and an error is already logged. Seems redundant to log even more errors?
| fclose(file); | ||
| LogError("%s: failed to truncate file %d\n", __func__, pos.nFile); | ||
| if (fclose(file) != 0) { | ||
| LogError("Failed to close file %d", pos.nFile); |
There was a problem hiding this comment.
same here. if truncation failed and is logged as error, i don't think it matters to log another error whether or not closing has failed as well.
|
@vasild Are you planning on responding to the (remaining) outstanding comments here? Or opening a followup? |
|
I prefer to always check whether a syscall failed even if it seems that it should always succeed. In this particular case, the "seems" is not very strong because |
fclose(3)may fail to flush the previously written data to disk, thus a failingfclose(3)is as serious as a failingfwrite(3).Previously the code ignored
fclose(3)failures. This PR improves that by changing all users ofAutoFilethat use it to write data to explicitly close the file and handle a possible error.Other alternatives are:
fflush(3)after each write to the file (and throw if it fails from theAutoFile::write()method) and hope thatfclose(3)will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.AutoFileso that its destructor cannot fail. Adjust all its users 😭. For example, if the file has been written to, then require the callers to explicitly call theAutoFile::fclose()method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper forFILE*which automatically closes the file when it goes out of scope and there are a lot of users ofAutoFile.AutoFileconstructor which will be called from the destructor to handlefclose()errors, as described in util: explicitly close all AutoFiles that have been written #29307 (comment). My thinking is that if that callback is going to only log a message, then we can log the message directly from the destructor without needing a callback. If the callback is going to do more complicated error handling then it is easier to do that at the call site by directly callingAutoFile::fclose()instead of getting theAutoFileobject out of scope (so that its destructor is called) and inspecting for side effects done by the callback (e.g. set a variable to indicate a failedfclose()).