Fix FileSystemWatcher test flakiness on Windows and re-enable 86 disabled tests#125818
Fix FileSystemWatcher test flakiness on Windows and re-enable 86 disabled tests#125818danmoseley wants to merge 9 commits intodotnet:mainfrom
Conversation
- Increase SubsequentExpectedWait from 10ms to 500ms so subsequent event-type checks in ExecuteAndVerifyEvents tolerate delayed delivery under thread pool starvation. - Add 50ms settling delay after EnableRaisingEvents=true in ExecuteAndVerifyEvents, ExpectEvents, and TryErrorEvent to allow OS-specific async startup to complete before the test action runs. - Implement progressive timeout on retries in ExpectEvent and TryErrorEvent: attempt 1 uses the base timeout (1000ms), attempt 2 uses 2x, attempt 3 uses 3x. This is the key fix: under thread pool starvation the ReadDirectoryChangesW callback can be delayed beyond the base timeout, but a fixed retry with the same timeout just fails again. Progressive timeout gives later attempts enough headroom. Validated with artificial callback delay injection at 900ms: - Baseline (no fixes): 114 event failures across 20 runs (100% failure rate) - With fixes: 0 failures across 20 runs on both NTFS and ReFS Co-authored-by: Copilot <[email protected]>
The MultipleFilters and ModifyFiltersConcurrentWithEvents Create tests pass cleanup: null to ExpectEvent, but the action (File.Create / Directory.CreateDirectory) is not idempotent: on retry the file/dir already exists, so no Created event fires and the retry always fails. The corresponding Delete tests already have proper cleanup (re-creating the deleted item between retries). Apply the same pattern in reverse: delete the created item between retries so the next attempt gets a fresh Created event. Co-authored-by: Copilot <[email protected]>
- WaitForExpectedEventTimeout: 1000ms -> 2000ms. With progressive retry this gives 2000/4000/6000ms across 3 attempts, enough headroom for thread pool starvation even on single-core CI machines. - WaitForExpectedEventTimeout_NoRetry: 3000ms -> 5000ms. Tests using the simple ExpectEvent(WaitHandle, string) overload have no retry loop, so a generous single timeout is needed. - ExpectEvents() collection timeout: 5s -> 10s. This method has no retry mechanism and is used by Directory/File Move multi-event tests. These increases only affect the failure path (WaitOne returns immediately when the event arrives). Happy-path execution time is unchanged. Co-authored-by: Copilot <[email protected]>
Remove [ActiveIssue] annotations for dotnet#103584 from 14 test files (22 annotations total, including class-level disabling of entire test classes like Directory_Create_Tests, File_Create_Tests, etc.). Also remove [ActiveIssue] for dotnet#53366 on FileSystemWatcher_DirectorySymbolicLink_TargetsFile_Fails — the issue has had zero hits since 2021 and the test is Windows-only. The robustness improvements in the preceding commits (progressive retry timeouts, increased base timeouts, retry idempotency cleanup) should prevent the flakiness that originally motivated disabling these tests. Co-authored-by: Copilot <[email protected]>
ExpectNoEvent defaulted to WaitForExpectedEventTimeout (2000ms) but negative tests should use the shorter WaitForUnexpectedEventTimeout (150ms). The codebase already defines this constant for exactly this purpose but it was never wired up. Using the long timeout slows down every test that verifies an event does NOT occur. Co-authored-by: Copilot <[email protected]>
|
Tagging subscribers to this area: @dotnet/area-system-io |
There was a problem hiding this comment.
Pull request overview
This PR targets Windows flakiness in System.IO.FileSystem.Watcher tests by increasing tolerance to callback latency and retry-related stale state, and then re-enables a large set of tests previously disabled on Windows.
Changes:
- Adjusts
FileSystemWatcherTesttimeouts/retry behavior (including progressive timeouts) and fixesExpectNoEventdefault timeout usage. - Adds a small post-
EnableRaisingEventssettling delay to reduce timing sensitivity across platforms. - Re-enables many tests previously disabled via
[ActiveIssue], and adds cleanup in some create-related tests to keep retries idempotent.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs | Updates core test helper timeouts/retry logic and adds settling delays. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.unit.cs | Removes Windows [ActiveIssue] suppressions and adds cleanup for create-related retry safety. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.cs | Re-enables a previously disabled init/resume test on Windows. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.SymbolicLink.cs | Re-maps Windows ActiveIssue annotations for specific symlink tests. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.InternalBufferSize.cs | Removes Windows [ActiveIssue] suppression for internal buffer size tests. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.File.NotifyFilter.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.File.Move.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.File.Delete.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.File.Create.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.File.Changed.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.Directory.NotifyFilter.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.Directory.Move.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.Directory.Delete.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.Directory.Create.cs | Removes Windows [ActiveIssue] suppression at the class level. |
| src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.Directory.Changed.cs | Removes Windows [ActiveIssue] suppression at the class level. |
Comments suppressed due to low confidence (1)
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs:286
cleanup()is executed after the assertion. If the assertion fails (i.e., an unexpected event did occur), cleanup won’t run, which can leave state behind for the outer retry wrapper and lead to follow-on failures. Run cleanup in afinallyso it executes regardless of assertion outcome.
bool result = ExecuteAndVerifyEvents(watcher, unExpectedEvents, action, false, expectedPath == null ? null : new string[] { expectedPath }, timeout);
Assert.False(result, "Expected Event occurred");
if (cleanup != null)
cleanup();
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
- Fix comment to say 'linearly' instead of 'double' for progressive timeout - Wrap cleanup in try/finally so it runs even if ExecuteAndVerifyEvents throws Co-authored-by: Copilot <[email protected]>
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Outdated
Show resolved
Hide resolved
- Increase WaitForExpectedEventTimeout to 5s, SubsequentExpectedWait to 2s, WaitForExpectedEventTimeout_NoRetry to 15s, WaitForUnexpectedEventTimeout to 500ms (all WaitOne-based, so zero happy-path cost) - Remove Thread.Sleep(50) settle delays (no startup race on any platform) - Add WaitUnlessCancelled helper: sleeps via CancellationToken.WaitHandle.WaitOne and throws on cancellation, preparing for xunit v3 migration - Wrap ExpectNoEvent cleanup in try/finally for exception safety Co-authored-by: Copilot <[email protected]>
|
Note This comment was generated with Copilot assistance. Thanks for the feedback. Updated in the latest commit: Timeout sizing: Increased Removed Cancellation prep for xunit v3: Added a Also: Wrapped |
|
Note rebased on main to test attribution!
|
|
I could take out all the cancelation token stuff and make a note to myself to come back in a month when xunit v3 is in. But I figured it was better to put it in now, even though it's a no-op, and then go add the "switch on" in the xunit PR itself so when there's merged there's no follow up. |
src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.unit.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.InternalBufferSize.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem.Watcher/tests/Utility/FileSystemWatcherTest.cs
Show resolved
Hide resolved
- Add Directory.Exists guards to cleanup lambdas in Directory_Create tests - Replace hardcoded TimeSpan(0,0,15) with WaitForExpectedEventTimeout_NoRetry Co-authored-by: Copilot <[email protected]>
|
Are you accounting for #125744 ? |
|
Note This comment was generated with Copilot assistance. Yes, #125744 is already in our merge base. We go further (5s) since |
Categories of failures
The FSW test failures on Windows fall into three categories:
Timeout too short for callback latency ΓÇö Under thread pool pressure (typical in CI),
ReadDirectoryChangesWcallbacks arrive late. The 1000msWaitForExpectedEventTimeoutexpires before the event is delivered. This is the dominant failure mode:"<EventType> event did not occur as expected".Stale state between retries ΓÇö Some tests create files/directories in
action()but never clean up. WhenExpectEventretries (re-creating the watcher and re-running the action), the second attempt fails because the file already exists, or the watcher fires on leftover state rather than the fresh action.ExpectNoEvent using wrong timeout ΓÇö
ExpectNoEventwas hardcoded to useWaitForExpectedEventTimeout(1000ms) instead ofWaitForUnexpectedEventTimeout(500ms), making negative tests take 2x longer than intended and potentially masking timing issues.Fixes in this PR (and why each)
No product changes. All changes are in the test utility (
FileSystemWatcherTest.cs) and individual test files.Progressive retry timeouts ΓÇö
ExpectEventnow multipliesWaitForExpectedEventTimeoutby the attempt number (1x, 2x, 3x). This directly addresses category 1: if the first attempt's 1000ms is too short under load, the second attempt waits 2000ms and the third waits 3000ms, giving the callback time to arrive without penalizing the common fast case.Increased
SubsequentExpectedWait(10ms to 500ms) ΓÇö Tests that check multiple event types (e.g., Changed + Created) waited only 10ms for the second event type after catching the first. Under load, the second event arrives later. 500ms provides adequate margin.Settling delay after enabling watcher (50ms
Thread.Sleep) ΓÇö Added inExecuteAndVerifyEvents,ExpectEvents, andTryErrorEventafter settingEnableRaisingEvents = trueand before running the test action. While Windows registers the watch synchronously, this guards against edge cases on all platforms.Cleanup in Create event tests ΓÇö
File.Create.csandDirectory.Create.cstests now delete the created file/directory in afinallyblock, so retries start from a clean state (category 2).ExpectNoEventtimeout fix ΓÇö Changed to useWaitForUnexpectedEventTimeoutas intended (category 3).Re-enabled 86 tests previously disabled by 25
[ActiveIssue]annotations (11 class-level, 14 method-level) referencing Failing test due to no detected IO events in 'System.IO.Tests.FileSystemWatcherTest.ExecuteAndVerifyEvents' #103584 on Windows. Two tests (File_Move_With_Set_Environment_Variable,File_Move_With_Unset_Environment_Variable) are disabled against a new issue for a product-level bug unrelated to timing.Validation steps
Tested on Intel i9-14900K (24 cores / 32 threads), 64 GB RAM, Windows 11. Tests run on both NTFS and ReFS (dev drive).
Controlled callback delay (proving the fix addresses the root cause):
ReadDirectoryChangesCallbackto deterministically reproduce the late-event-delivery condition that occurs under thread pool starvation in CIStress testing:
Unchanged
FileSystemWatcherimplementation. The callback delivery latency is fundamentally an OS/thread-pool scheduling issue, not a product bug. The fix is making tests tolerant of realistic delivery variance.SymbolicLink/InternalBufferSizetests) ΓÇö Remains disabled on all platforms; this is a separate product-level issue.Fixes #103584
Fixes #103630
Fixes #103940
Fixes #104269
Fixes #104564
Fixes #104930
Fixes #105431