Skip to content

Fix TOCTOU race in SharedMemoryHelpers.CreateOrOpenFile on Linux#125524

Open
steveisok wants to merge 5 commits intodotnet:mainfrom
steveisok:fix/shared-memory-eexist-race
Open

Fix TOCTOU race in SharedMemoryHelpers.CreateOrOpenFile on Linux#125524
steveisok wants to merge 5 commits intodotnet:mainfrom
steveisok:fix/shared-memory-eexist-race

Conversation

@steveisok
Copy link
Member

@steveisok steveisok commented Mar 13, 2026

Summary

Fix a TOCTOU (time-of-check-to-time-of-use) race condition in SharedMemoryHelpers.CreateOrOpenFile that causes intermittent IOException when two processes concurrently create a named mutex backed by shared memory on Linux.

The Bug

CreateOrOpenFile uses a two-step approach:

  1. Try Open(O_RDWR) to open an existing file
  2. If ENOENT, try Open(O_CREAT|O_EXCL) to create exclusively

The race:

  1. Process A: Open(O_RDWR) → ENOENT (file does not exist)
  2. Process B: creates the file successfully
  3. Process A: Open(O_CREAT|O_EXCL) → EEXIST (file now exists)
  4. Process A: throws IOException instead of handling EEXIST

This manifests as:

System.IO.IOException: The file '/tmp/.dotnet/shm/session1/NuGet-Migrations' already exists.
   at System.IO.SharedMemoryHelpers.CreateOrOpenFile(...)
   at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
   at NuGet.Common.Migrations.MigrationRunner.Run(...)

The Fix

When the exclusive create fails with EEXIST, retry from the top (loop back to the Open(O_RDWR) which will now succeed since the file exists). Limited to four retries to prevent infinite loops.

Impact

This is a known flaky failure in CI (labeled Known Build Error) that has been open since September 2023. It affects any scenario where parallel dotnet processes run first-time setup, including:

  • VMR scenario tests (ValidateInstallers on Linux arm64)
  • Docker container first-run
  • CI environments with shared /tmp

Fixes #91987
Related: #80619, #76736

When two processes concurrently create a named mutex backed by a shared
memory file, the following race can occur:

1. Process A calls Open(O_RDWR) — returns ENOENT (file doesn't exist)
2. Process B creates the file
3. Process A calls Open(O_CREAT|O_EXCL) — returns EEXIST (file now exists)
4. Process A throws IOException: 'The file already exists'

This manifests as intermittent IOException crashes in NuGet's
MigrationRunner when parallel dotnet processes run first-time setup,
because NuGet.Common.Migrations.MigrationRunner uses a named mutex
('NuGet-Migrations') to synchronize.

The fix adds a single retry: when the exclusive create fails with
EEXIST, loop back to re-attempt the plain open, which will now succeed
since the file exists.

Fixes dotnet#91987

Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings March 13, 2026 15:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a TOCTOU race in SharedMemoryHelpers.CreateOrOpenFile on Unix by retrying when the exclusive create fails with EEXIST, preventing intermittent IOException when multiple processes concurrently create shared-memory-backed mutex files.

Changes:

  • Wrap the open/create sequence in a retry loop.
  • On O_CREAT|O_EXCL returning EEXIST, retry the initial O_RDWR open once.

You can also share your feedback on Copilot code review. Take the survey.

// Retry loop to handle the TOCTOU race between the initial open attempt (which may return ENOENT)
// and the exclusive create attempt (which may return EEXIST if another process created the file
// in between). On EEXIST, we loop back and re-attempt the open.
for (int retries = 0; ; retries++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going need a bit of reworking, but I think it is the correct approach. It looks like we'd need to elevate the Interop.ErrorInfo out of the loop and after the loop body move the throw Interop.GetExceptionForIoErrno(error, sharedMemoryFilePath); that is currently being preempted by the new continue logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I figured the same thing. Felt it was good to push this up as a placeholder

Comment on lines +483 to +486
if (error.Error == Interop.Error.EEXIST && retries < 1)
{
continue;
}
Comment on lines +424 to +427
for (int retries = 0; ; retries++)
{
SafeFileHandle fd = Interop.Sys.Open(sharedMemoryFilePath, Interop.Sys.OpenFlags.O_RDWR | Interop.Sys.OpenFlags.O_CLOEXEC, 0);
Interop.ErrorInfo error = Interop.Sys.GetLastErrorInfo();
if (!fd.IsInvalid)
{
if (id.IsUserScope)
// Retry loop to handle the TOCTOU race between the initial open attempt (which may return ENOENT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different idea to consider. Please take it with a grain of salt as I don't have the context:

  • when createIfNotExist is true, attempt in a loop until fd.IsValid:
    • create it with O_CREAT | O_EXCL. A success means we just created it and are the owner.
    • if creation failed with EEXIST, try to open it without creation
    • if opening without creation failed with ENOENT, it means it got removed in the meantime. Continue the loop
  • attempt to open an existing file

Pseudocode:

UnixFileMode permissionsMask = id.IsUserScope
    ? PermissionsMask_OwnerUser_ReadWrite
    : PermissionsMask_AllUsers_ReadWrite;

const Interop.Sys.OpenFlags mandatoryFlags = Interop.Sys.OpenFlags.O_RDWR | Interop.Sys.OpenFlags.O_CLOEXEC
SafeFileHandle fd = new();
while (createIfNotExist && fd.IsInvalid)
{
    // Use O_EXCL which provides a guarantee that the file is created by this call.
    fd = Interop.Sys.Open(
        sharedMemoryFilePath,
        mandatoryFlags | Interop.Sys.OpenFlags.O_CREAT | Interop.Sys.OpenFlags.O_EXCL,
        (int)permissionsMask);

    if (fd.IsInvalid)
    {
        Interop.ErrorInfo error = Interop.Sys.GetLastErrorInfo();
        fd.Dispose();

        if (error.Error == Interop.Error.EEXIST)
        {
            fd = Interop.Sys.Open(sharedMemoryFilePath, mandatoryFlags, 0);
            if (fd.IsInvalid)
            {
                error = Interop.Sys.GetLastErrorInfo();
                fd.Dispose();

                // The file could have been deleted after the first open attempt, in which case we should retry creating the file.
                if (error.Error == Interop.Error.ENOENT)
                {
                    continue;
                }
            }
        }

        throw Interop.GetExceptionForIoErrno(error, sharedMemoryFilePath);
    }
    else
    {
        createdFile = true;
    }
}

if (fd.IsInvalid)
{
    Debug.Assert(!createIfNotExist);
    fd = Interop.Sys.Open(sharedMemoryFilePath, mandatoryFlags, 0);

    if (fd.IsInvalid)
    {
        Interop.ErrorInfo error = Interop.Sys.GetLastErrorInfo();
        fd.Dispose();
        throw Interop.GetExceptionForIoErrno(error, sharedMemoryFilePath);
    }
}

// If we got here, fd is a valid file handle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach would work.

The main difference would be that we're optimizing (in terms of number of syscalls) for the create case here instead of the open existing case. Not that that really matters given all the other infrastructure here IMO.

Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT Mar 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some minor modifications to the current one is easier to grep than this one. @adamsitnik what prompted your suggestions is there some optimization inherent in this alternative approach?

Copy link
Member

@jkoritzinsky jkoritzinsky Mar 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Adam's implementation, while a bigger diff from the existing code, is much easier to read and understand when looking at at a glance.

Adam's implementation also better follows Jared's advice (ie don't check for existence as existence may change. Just do the action and handle the errors when the file does not exist.
) https://blog.paranoidcoding.org/2009/12/10/the-file-system-is-unpredictable.html

Copy link
Member

@adamsitnik adamsitnik Mar 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some minor modifications to the current one is easier to grep than this one. @adamsitnik what prompted your suggestions is there some optimization inherent in this alternative approach?

The problem we are trying to solve is very similar to some File.OpenHandle issues (and this is an API I own) and I personally prefer to use O_EXCL when dealing with TOCTOU because it's atomic.

But again, as I wrote I don't have the context (is the most common case to create the file? or open an existing one?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this is an API I own

Then we can absolutely go in that direction.

But again, as I wrote I don't have the context (is the most common case to create the file? or open an existing one?)

I don't have enough information to answer that either. I agree it does look like a pessimization with respect to existing paths, but as @jkoritzinsky points out there is a ton of machinery here so I'm not sure how much that matters.

Do we have any of these APIs in the perf lab? Is a there a microbenchmark we could run locally that would help see the cost?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any of these APIs in the perf lab? Is a there a microbenchmark we could run locally that would help see the cost?

If we don't, we should add it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any of these APIs in the perf lab? Is a there a microbenchmark we could run locally that would help see the cost?

@caaavik-msft @DrewScoggins

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not ☹️

@jkoritzinsky
Copy link
Member

Really glad to see that moving the shared mutex logic to managed has helped us solve at least one bug!

Rework CreateOrOpenFile to use an atomic create-first approach instead of
open-then-create. This eliminates the TOCTOU race window by leading with
O_CREAT|O_EXCL and falling back to a plain open on EEXIST, with retries
for the reverse ENOENT case.

Add MutexTests to validate named mutex creation and concurrent access.
When another process creates the shared memory file, there is a small
window before it calls FChMod to set the correct permissions. If we
open the file during that window, ValidateExistingFile may fail due to
a permissions mismatch. Retry the outer loop in this case to give the
creator time to complete FChMod.
Copilot AI review requested due to automatic review settings March 17, 2026 20:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a TOCTOU race in Unix shared-memory file creation for named mutexes, preventing intermittent IOException under concurrent process creation on Linux/Unix.

Changes:

  • Reworks SharedMemoryHelpers.CreateOrOpenFile to handle EEXIST during exclusive create by falling back to open + retry logic.
  • Extracts user-scope file validation into ValidateExistingFile.
  • Adds a cross-process Unix test to validate concurrent named mutex creation does not throw.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs Implements retry/fallback logic around exclusive create/open and factors validation into a helper.
src/libraries/System.Threading/tests/MutexTests.cs Adds a RemoteExecutor-based regression test for concurrent named mutex creation on Unix.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +432 to +433
const int MaxRetries = 4;
for (int retries = 0; ; retries++)
Comment on lines +429 to +438
// Lead with O_CREAT | O_EXCL for an atomic create guarantee. If another process
// created the file first (EEXIST), fall back to a plain open. If that open gets
// ENOENT (file was deleted in between), retry.
const int MaxRetries = 4;
for (int retries = 0; ; retries++)
{
if (Interop.Sys.FStat(fd, out Interop.Sys.FileStatus fileStatus) != 0)
SafeFileHandle fd = Interop.Sys.Open(
sharedMemoryFilePath,
MandatoryFlags | Interop.Sys.OpenFlags.O_CREAT | Interop.Sys.OpenFlags.O_EXCL,
(int)permissionsMask);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. If we can benchmark, perhaps that would tell us if this is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. If we can benchmark, perhaps that would tell us if this is better.

You can use the @EgorBot to run the benchmarks via GH comment. An example: #125452 (comment)

It will run provided microbenchmark against a local build of dotnet/runtime and report the results as a comment.

…istingFile

Narrow the EEXIST fallback retry to only catch transient permission
mismatches (UnauthorizedAccessException) instead of all IOExceptions.
UID mismatches are permanent and now propagate immediately without
a pointless retry.
Adjust the exception filter and ENOENT check to use MaxRetries - 1
so they are reachable on the last loop iteration. Without this, the
guards were dead code and loop exhaustion would silently fall through
to the open-existing path instead of throwing.
Copilot AI review requested due to automatic review settings March 18, 2026 12:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Linux TOCTOU race in Unix shared-memory backing files used by cross-process named mutexes, preventing intermittent IOException when multiple processes concurrently create the same mutex.

Changes:

  • Reworks SharedMemoryHelpers.CreateOrOpenFile (Unix) to handle EEXIST on exclusive create by retrying/opening, with bounded retries.
  • Adds a Unix cross-process regression test that concurrently creates the same named mutex across multiple RemoteExecutor processes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.Threading/tests/MutexTests.cs Adds regression coverage for concurrent cross-process named mutex creation on Unix.
src/libraries/System.Private.CoreLib/src/System/IO/SharedMemoryManager.Unix.cs Implements bounded retry logic to close the create/open TOCTOU window and factors validation into a helper.

You can also share your feedback on Copilot code review. Take the survey.

if ((fileStatus.Mode & (int)PermissionsMask_AllUsers_ReadWriteExecute) != (int)PermissionsMask_OwnerUser_ReadWrite)
{
fd.Dispose();
throw new UnauthorizedAccessException(SR.Format(SR.IO_SharedMemory_FilePermissionsIncorrect, sharedMemoryFilePath, PermissionsMask_OwnerUser_ReadWrite));
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The system cannot open the device or file specified 'NuGet-Migrations'

7 participants