Fix TOCTOU race in SharedMemoryHelpers.CreateOrOpenFile on Linux#125524
Fix TOCTOU race in SharedMemoryHelpers.CreateOrOpenFile on Linux#125524steveisok wants to merge 5 commits intodotnet:mainfrom
Conversation
When two processes concurrently create a named mutex backed by a shared
memory file, the following race can occur:
1. Process A calls Open(O_RDWR) — returns ENOENT (file doesn't exist)
2. Process B creates the file
3. Process A calls Open(O_CREAT|O_EXCL) — returns EEXIST (file now exists)
4. Process A throws IOException: 'The file already exists'
This manifests as intermittent IOException crashes in NuGet's
MigrationRunner when parallel dotnet processes run first-time setup,
because NuGet.Common.Migrations.MigrationRunner uses a named mutex
('NuGet-Migrations') to synchronize.
The fix adds a single retry: when the exclusive create fails with
EEXIST, loop back to re-attempt the plain open, which will now succeed
since the file exists.
Fixes dotnet#91987
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Pull request overview
Fixes a TOCTOU race in SharedMemoryHelpers.CreateOrOpenFile on Unix by retrying when the exclusive create fails with EEXIST, preventing intermittent IOException when multiple processes concurrently create shared-memory-backed mutex files.
Changes:
- Wrap the open/create sequence in a retry loop.
- On
O_CREAT|O_EXCLreturningEEXIST, retry the initialO_RDWRopen once.
You can also share your feedback on Copilot code review. Take the survey.
| // Retry loop to handle the TOCTOU race between the initial open attempt (which may return ENOENT) | ||
| // and the exclusive create attempt (which may return EEXIST if another process created the file | ||
| // in between). On EEXIST, we loop back and re-attempt the open. | ||
| for (int retries = 0; ; retries++) |
There was a problem hiding this comment.
This is going need a bit of reworking, but I think it is the correct approach. It looks like we'd need to elevate the Interop.ErrorInfo out of the loop and after the loop body move the throw Interop.GetExceptionForIoErrno(error, sharedMemoryFilePath); that is currently being preempted by the new continue logic.
There was a problem hiding this comment.
Cool, I figured the same thing. Felt it was good to push this up as a placeholder
| if (error.Error == Interop.Error.EEXIST && retries < 1) | ||
| { | ||
| continue; | ||
| } |
| for (int retries = 0; ; retries++) | ||
| { | ||
| SafeFileHandle fd = Interop.Sys.Open(sharedMemoryFilePath, Interop.Sys.OpenFlags.O_RDWR | Interop.Sys.OpenFlags.O_CLOEXEC, 0); | ||
| Interop.ErrorInfo error = Interop.Sys.GetLastErrorInfo(); |
src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs
Outdated
Show resolved
Hide resolved
| if (!fd.IsInvalid) | ||
| { | ||
| if (id.IsUserScope) | ||
| // Retry loop to handle the TOCTOU race between the initial open attempt (which may return ENOENT) |
There was a problem hiding this comment.
A different idea to consider. Please take it with a grain of salt as I don't have the context:
- when
createIfNotExistis true, attempt in a loop untilfd.IsValid:- create it with O_CREAT | O_EXCL. A success means we just created it and are the owner.
- if creation failed with
EEXIST, try to open it without creation - if opening without creation failed with ENOENT, it means it got removed in the meantime. Continue the loop
- attempt to open an existing file
Pseudocode:
UnixFileMode permissionsMask = id.IsUserScope
? PermissionsMask_OwnerUser_ReadWrite
: PermissionsMask_AllUsers_ReadWrite;
const Interop.Sys.OpenFlags mandatoryFlags = Interop.Sys.OpenFlags.O_RDWR | Interop.Sys.OpenFlags.O_CLOEXEC
SafeFileHandle fd = new();
while (createIfNotExist && fd.IsInvalid)
{
// Use O_EXCL which provides a guarantee that the file is created by this call.
fd = Interop.Sys.Open(
sharedMemoryFilePath,
mandatoryFlags | Interop.Sys.OpenFlags.O_CREAT | Interop.Sys.OpenFlags.O_EXCL,
(int)permissionsMask);
if (fd.IsInvalid)
{
Interop.ErrorInfo error = Interop.Sys.GetLastErrorInfo();
fd.Dispose();
if (error.Error == Interop.Error.EEXIST)
{
fd = Interop.Sys.Open(sharedMemoryFilePath, mandatoryFlags, 0);
if (fd.IsInvalid)
{
error = Interop.Sys.GetLastErrorInfo();
fd.Dispose();
// The file could have been deleted after the first open attempt, in which case we should retry creating the file.
if (error.Error == Interop.Error.ENOENT)
{
continue;
}
}
}
throw Interop.GetExceptionForIoErrno(error, sharedMemoryFilePath);
}
else
{
createdFile = true;
}
}
if (fd.IsInvalid)
{
Debug.Assert(!createIfNotExist);
fd = Interop.Sys.Open(sharedMemoryFilePath, mandatoryFlags, 0);
if (fd.IsInvalid)
{
Interop.ErrorInfo error = Interop.Sys.GetLastErrorInfo();
fd.Dispose();
throw Interop.GetExceptionForIoErrno(error, sharedMemoryFilePath);
}
}
// If we got here, fd is a valid file handle.There was a problem hiding this comment.
I think this approach would work.
The main difference would be that we're optimizing (in terms of number of syscalls) for the create case here instead of the open existing case. Not that that really matters given all the other infrastructure here IMO.
There was a problem hiding this comment.
I think some minor modifications to the current one is easier to grep than this one. @adamsitnik what prompted your suggestions is there some optimization inherent in this alternative approach?
There was a problem hiding this comment.
I think that Adam's implementation, while a bigger diff from the existing code, is much easier to read and understand when looking at at a glance.
Adam's implementation also better follows Jared's advice (ie don't check for existence as existence may change. Just do the action and handle the errors when the file does not exist.
) https://blog.paranoidcoding.org/2009/12/10/the-file-system-is-unpredictable.html
There was a problem hiding this comment.
I think some minor modifications to the current one is easier to grep than this one. @adamsitnik what prompted your suggestions is there some optimization inherent in this alternative approach?
The problem we are trying to solve is very similar to some File.OpenHandle issues (and this is an API I own) and I personally prefer to use O_EXCL when dealing with TOCTOU because it's atomic.
But again, as I wrote I don't have the context (is the most common case to create the file? or open an existing one?)
There was a problem hiding this comment.
and this is an API I own
Then we can absolutely go in that direction.
But again, as I wrote I don't have the context (is the most common case to create the file? or open an existing one?)
I don't have enough information to answer that either. I agree it does look like a pessimization with respect to existing paths, but as @jkoritzinsky points out there is a ton of machinery here so I'm not sure how much that matters.
Do we have any of these APIs in the perf lab? Is a there a microbenchmark we could run locally that would help see the cost?
There was a problem hiding this comment.
Do we have any of these APIs in the perf lab? Is a there a microbenchmark we could run locally that would help see the cost?
If we don't, we should add it.
There was a problem hiding this comment.
Do we have any of these APIs in the perf lab? Is a there a microbenchmark we could run locally that would help see the cost?
|
Really glad to see that moving the shared mutex logic to managed has helped us solve at least one bug! |
Rework CreateOrOpenFile to use an atomic create-first approach instead of open-then-create. This eliminates the TOCTOU race window by leading with O_CREAT|O_EXCL and falling back to a plain open on EEXIST, with retries for the reverse ENOENT case. Add MutexTests to validate named mutex creation and concurrent access.
src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs
Show resolved
Hide resolved
When another process creates the shared memory file, there is a small window before it calls FChMod to set the correct permissions. If we open the file during that window, ValidateExistingFile may fail due to a permissions mismatch. Retry the outer loop in this case to give the creator time to complete FChMod.
There was a problem hiding this comment.
Pull request overview
Fixes a TOCTOU race in Unix shared-memory file creation for named mutexes, preventing intermittent IOException under concurrent process creation on Linux/Unix.
Changes:
- Reworks
SharedMemoryHelpers.CreateOrOpenFileto handleEEXISTduring exclusive create by falling back to open + retry logic. - Extracts user-scope file validation into
ValidateExistingFile. - Adds a cross-process Unix test to validate concurrent named mutex creation does not throw.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs | Implements retry/fallback logic around exclusive create/open and factors validation into a helper. |
| src/libraries/System.Threading/tests/MutexTests.cs | Adds a RemoteExecutor-based regression test for concurrent named mutex creation on Unix. |
You can also share your feedback on Copilot code review. Take the survey.
| const int MaxRetries = 4; | ||
| for (int retries = 0; ; retries++) |
| // Lead with O_CREAT | O_EXCL for an atomic create guarantee. If another process | ||
| // created the file first (EEXIST), fall back to a plain open. If that open gets | ||
| // ENOENT (file was deleted in between), retry. | ||
| const int MaxRetries = 4; | ||
| for (int retries = 0; ; retries++) | ||
| { | ||
| if (Interop.Sys.FStat(fd, out Interop.Sys.FileStatus fileStatus) != 0) | ||
| SafeFileHandle fd = Interop.Sys.Open( | ||
| sharedMemoryFilePath, | ||
| MandatoryFlags | Interop.Sys.OpenFlags.O_CREAT | Interop.Sys.OpenFlags.O_EXCL, | ||
| (int)permissionsMask); |
There was a problem hiding this comment.
Maybe. If we can benchmark, perhaps that would tell us if this is better.
There was a problem hiding this comment.
Maybe. If we can benchmark, perhaps that would tell us if this is better.
You can use the @EgorBot to run the benchmarks via GH comment. An example: #125452 (comment)
It will run provided microbenchmark against a local build of dotnet/runtime and report the results as a comment.
src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs
Show resolved
Hide resolved
…istingFile Narrow the EEXIST fallback retry to only catch transient permission mismatches (UnauthorizedAccessException) instead of all IOExceptions. UID mismatches are permanent and now propagate immediately without a pointless retry.
Adjust the exception filter and ENOENT check to use MaxRetries - 1 so they are reachable on the last loop iteration. Without this, the guards were dead code and loop exhaustion would silently fall through to the open-existing path instead of throwing.
There was a problem hiding this comment.
Pull request overview
Fixes a Linux TOCTOU race in Unix shared-memory backing files used by cross-process named mutexes, preventing intermittent IOException when multiple processes concurrently create the same mutex.
Changes:
- Reworks
SharedMemoryHelpers.CreateOrOpenFile(Unix) to handleEEXISTon exclusive create by retrying/opening, with bounded retries. - Adds a Unix cross-process regression test that concurrently creates the same named mutex across multiple
RemoteExecutorprocesses.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.Threading/tests/MutexTests.cs | Adds regression coverage for concurrent cross-process named mutex creation on Unix. |
| src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs | Implements bounded retry logic to close the create/open TOCTOU window and factors validation into a helper. |
You can also share your feedback on Copilot code review. Take the survey.
| if ((fileStatus.Mode & (int)PermissionsMask_AllUsers_ReadWriteExecute) != (int)PermissionsMask_OwnerUser_ReadWrite) | ||
| { | ||
| fd.Dispose(); | ||
| throw new UnauthorizedAccessException(SR.Format(SR.IO_SharedMemory_FilePermissionsIncorrect, sharedMemoryFilePath, PermissionsMask_OwnerUser_ReadWrite)); |
Summary
Fix a TOCTOU (time-of-check-to-time-of-use) race condition in
SharedMemoryHelpers.CreateOrOpenFilethat causes intermittentIOExceptionwhen two processes concurrently create a named mutex backed by shared memory on Linux.The Bug
CreateOrOpenFileuses a two-step approach:Open(O_RDWR)to open an existing fileOpen(O_CREAT|O_EXCL)to create exclusivelyThe race:
Open(O_RDWR)→ ENOENT (file does not exist)Open(O_CREAT|O_EXCL)→ EEXIST (file now exists)This manifests as:
The Fix
When the exclusive create fails with EEXIST, retry from the top (loop back to the
Open(O_RDWR)which will now succeed since the file exists). Limited to four retries to prevent infinite loops.Impact
This is a known flaky failure in CI (labeled
Known Build Error) that has been open since September 2023. It affects any scenario where paralleldotnetprocesses run first-time setup, including:ValidateInstallerson Linux arm64)/tmpFixes #91987
Related: #80619, #76736