Skip to content

Fix Process.WaitForExit hang when grandchild process keeps redirected pipe open#125732

Draft
Copilot wants to merge 9 commits intomainfrom
copilot/add-theory-to-process-waiting-tests
Draft

Fix Process.WaitForExit hang when grandchild process keeps redirected pipe open#125732
Copilot wants to merge 9 commits intomainfrom
copilot/add-theory-to-process-waiting-tests

Conversation

Copy link
Contributor

Copilot AI commented Mar 18, 2026

Description

Process.WaitForExit() hangs indefinitely when stdout/stderr is redirected and a grandchild process inherits the pipe handle. The root cause: after the child process exits, AsyncStreamReader blocks forever on EOF.GetAwaiter().GetResult() because the grandchild still holds the write end of the pipe open.

Changes

  • AsyncStreamReader.cs — Added parameterless CancelDueToProcessExit() and CancelDueToProcessExitAsync(CancellationToken) methods that use a private StreamDrainDefaultTimeoutMs constant (300ms) internally. These wait up to the default timeout for _readToBufferTask to complete, then cancel via _cts.Cancel() if it hasn't. The async variant accepts a CancellationToken to allow callers to cancel the stream drain phase. Removed the now-unused EOF property.

  • Process.Unix.cs / Process.Windows.cs — Replaced _output?.EOF.GetAwaiter().GetResult() with CancelDueToProcessExit(). Now drains streams for all exited processes (not just Timeout.Infinite).

  • Process.cs — Updated WaitForExitAsync to use CancelDueToProcessExitAsync(cancellationToken) instead of EOF.WaitAsync(cancellationToken), forwarding the caller's cancellation token to the stream drain phase.

  • ProcessWaitingTests.cs — Added [Theory] covering all four WaitForExit overloads: spawns a child with redirected output that uses CreateProcess to create a sleeping grandchild, asserts WaitForExit completes in <2s.

// Before: blocks forever if grandchild holds pipe open
_output?.EOF.GetAwaiter().GetResult();

// After: waits up to default timeout, then cancels the read
_output?.CancelDueToProcessExit();

💬 Send tasks to Copilot coding agent from Slack and Teams to turn conversations into code. Copilot posts an update in your thread when it's finished.

Copilot AI and others added 3 commits March 18, 2026 13:44
Add CancelDueToProcessExit and CancelDueToProcessExitAsync methods to
AsyncStreamReader that wait a timeout for the read task to finish and
cancel it if it doesn't. Replace all EOF.GetAwaiter().GetResult() and
EOF.WaitAsync() calls in WaitForExitCore (Unix/Windows) and
WaitForExitAsync with the new timeout-based methods. Use 300ms default
when no timeout is provided. Add Theory test reproducing the hang
scenario with a grandchild process that sleeps for 8 hours.

Co-authored-by: adamsitnik <[email protected]>
… grandchild process dispatch

Co-authored-by: adamsitnik <[email protected]>
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot please address my feedback

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.

@adamsitnik adamsitnik added the NO-REVIEW Experimental/testing PR, do NOT review it label Mar 18, 2026
Copy link
Contributor Author

Copilot AI commented Mar 18, 2026

@copilot please address my feedback

Addressed in a0f49f5 — the grandchild process now uses CreateProcess(SleepForEightHours) instead of manually constructing ProcessStartInfo. All 376 tests pass.

@adamsitnik
Copy link
Member

@copilot please invoke the code-review skill and post your analysis/results as a comment on this PR

Copy link
Contributor Author

Copilot AI commented Mar 18, 2026

@copilot please invoke the code-review skill and post your analysis/results as a comment on this PR

🤖 Copilot Code Review — PR #125790

Holistic Assessment

Motivation: The problem is real and well-documented — WaitForExit() hangs indefinitely when a grandchild process inherits the pipe handle. This is a known pain point.

Approach: The fix introduces a timeout-based cancellation mechanism in AsyncStreamReader that breaks the blocking read loop after the child process has exited. The approach is sound — after the process exits, the streams are given a grace period to drain, then forcibly cancelled. This is the correct general pattern for this class of problem.

Summary: ⚠️ Needs Human Review. The code is functionally correct for the stated fix, but introduces broader behavioral changes that a maintainer should evaluate. Key concerns: (1) WaitForExit(int) now additionally waits for streams to drain (it didn't before), potentially doubling the effective timeout; (2) the cancellationToken in WaitForExitAsync is no longer forwarded to the stream drain phase (replaced by a fixed 300ms timeout); (3) the EOF property is now dead code. These are deliberate design choices but warrant explicit maintainer approval.


Detailed Findings

⚠️ Behavioral Change — WaitForExit(int) now waits for stream drain

Previously: WaitForExit(int milliseconds) with a finite timeout did NOT wait for streams at all (both Unix and Windows). The old code had if (exited && milliseconds == Timeout.Infinite) on Unix and if (milliseconds == Timeout.Infinite) on Windows — finite timeouts bypassed stream draining entirely.

Now: When the process exits within the timeout, streams are drained for up to the full milliseconds timeout again. So WaitForExit(5000) could take up to ~10 seconds in the worst case (5s waiting for process exit + 5s waiting for stream drain).

This is a breaking behavioral change — callers who relied on WaitForExit(int) returning promptly after process exit may now experience significantly longer waits. A maintainer should decide whether this is acceptable, or whether the stream drain should use the remaining time from the original timeout (i.e., timeout - elapsed).

Files: Process.Unix.cs:208-213, Process.Windows.cs:179-184

⚠️ Behavioral Change — WaitForExitAsync ignores cancellationToken for stream drain

Previously: WaitForExitAsync passed the cancellationToken to _output.EOF.WaitAsync(cancellationToken), allowing the caller to cancel the stream drain phase.

Now: The cancellationToken is only used for the main process-exit wait. Once the process exits, streams are drained with a fixed 300ms timeout regardless of cancellation. This is likely fine in practice (the process is already dead, and 300ms is short), but it's a subtle change in the cancellation contract. Note: if cancellationToken fires during the main wait, tcs.Task throws TaskCanceledException and the drain code at line 1563 is never reached — so this only affects the post-exit drain phase.

File: Process.cs:1570-1581

✅ Correctness — CancelDueToProcessExit exception handling is correct

Both review models raised concerns about task.GetAwaiter().GetResult() potentially throwing after _cts.Cancel(). This is a false positive: ReadBufferAsync() already catches both OperationCanceledException and IOException internally (lines 105-119), breaks out of the loop, and completes normally. After _cts.Cancel(), the task will complete successfully, not in a faulted state. The GetResult() call is safe.

File: AsyncStreamReader.cs:258-266

✅ Correctness — Windows uses _signaled instead of milliseconds == Timeout.Infinite

The Windows path now correctly checks _signaled (whether the process actually exited) instead of milliseconds == Timeout.Infinite. This is a genuine improvement — previously, the stream drain only happened for infinite waits. Now it happens whenever the process exits, regardless of the timeout mode. This is correct.

File: Process.Windows.cs:179

💡 Suggestion — EOF property is now dead code

The internal Task EOF property (line 256) is no longer used by any caller. All usages have been replaced by CancelDueToProcessExit / CancelDueToProcessExitAsync. Consider removing it to keep the API surface clean.

File: AsyncStreamReader.cs:256

💡 Suggestion — Task.Delay allocation in CancelDueToProcessExitAsync

Task.WhenAny(task, Task.Delay(milliseconds)) creates a new Task + timer allocation on every call. This is acceptable since WaitForExitAsync is not a hot path, but if desired, a CancellationTokenSource with timeout could be used instead: using var timeoutCts = new CancellationTokenSource(milliseconds); await task.WaitAsync(timeoutCts.Token).

File: AsyncStreamReader.cs:273

✅ Test Quality — Test is well-structured

The test creates a realistic scenario (child exits, grandchild keeps pipe open) and validates all four WaitForExit overloads. The 2-second assertion threshold provides enough margin above the 300ms default timeout. The catch (Exception) { } for killing the grandchild is pragmatic — the grandchild may have already been killed by the OS when its parent tree collapses.

File: ProcessWaitingTests.cs:675-729


Models consulted: Claude Sonnet 4.6 (primary), Gemini 2.5 Pro, GPT-4.1

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot address my feedback

…thods parameterless, remove EOF

Co-authored-by: adamsitnik <[email protected]>
Copy link
Contributor Author

Copilot AI commented Mar 18, 2026

@copilot address my feedback

Addressed both items in dae1fe0:

  • Removed the unused EOF property from AsyncStreamReader.cs
  • Moved StreamDrainDefaultTimeoutMs to AsyncStreamReader.cs and made CancelDueToProcessExit/CancelDueToProcessExitAsync parameterless

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot please address my feedback

Copy link
Contributor Author

Copilot AI commented Mar 19, 2026

@copilot please address my feedback

Addressed in 2d9fec3CancelDueToProcessExitAsync now accepts a CancellationToken parameter, passed through from WaitForExitAsync.

Copy link
Contributor Author

Copilot AI commented Mar 19, 2026

@copilot please address my feedback

Already addressed in 2d9fec3CancelDueToProcessExitAsync accepts a CancellationToken which is passed to Task.Delay and forwarded from both call sites in WaitForExitAsync.

@adamsitnik
Copy link
Member

@copilot please invoke the code-review skill and post your analysis/results as a comment on this PR

Copy link
Contributor Author

Copilot AI commented Mar 21, 2026

@copilot please invoke the code-review skill and post your analysis/results as a comment on this PR

🤖 Copilot Code Review — PR #125790

Note

This review was AI/Copilot-generated. Models consulted: Claude Opus 4.6 (primary), Gemini 2.5 Pro, GPT-4.1.

Holistic Assessment

Motivation: The problem is real and well-documented — WaitForExit() hangs indefinitely when a grandchild process inherits the pipe handle, keeping it open after the child exits. This is a known pain point.

Approach: The fix replaces the unbounded EOF wait with a timeout-based cancellation: after the child exits, streams get a 300ms grace period to drain, then the read is forcibly cancelled. This is the correct general pattern for this class of problem.

Summary: ✅ LGTM. The code is correct, thread-safe, and addresses the root cause. All prior review feedback has been addressed. No blocking issues remain.


Detailed Findings

✅ Correctness — CancelDueToProcessExit exception handling is safe

All three review models initially considered whether task.GetAwaiter().GetResult() could throw after _cts.Cancel(). This is a false positive: ReadBufferAsync() catches both OperationCanceledException (line 119) and IOException (line 110), breaks out of the loop, and completes normally. After _cts.Cancel(), the task will always complete successfully (not faulted). The GetResult() call is safe.

File: AsyncStreamReader.cs:261-268

✅ Correctness — CancelDueToProcessExitAsync handles cancellation token correctly

When the caller's cancellationToken fires: Task.Delay completes as faulted/cancelled, Task.WhenAny returns the delay task (without throwing), completed != task is true, so we call _cts.Cancel() and await task. Since _cts.Token is passed to _stream.ReadAsync() (line 102), cancelling _cts unblocks the read. The task completes normally via the catch handlers. This is correct — cancellation just causes immediate stream reader cancellation.

File: AsyncStreamReader.cs:271-283

✅ Correctness — Windows uses _signaled instead of milliseconds == Timeout.Infinite

The Windows path now correctly checks _signaled (whether the process actually exited) instead of the timeout value. Previously, streams were only drained for infinite waits. Now they drain whenever the process exits, regardless of timeout mode. This is a genuine improvement — WaitForExit(5000) where the process exits in 1s will now drain streams (up to 300ms extra), whereas before it would skip draining entirely.

File: Process.Windows.cs:179

✅ Behavioral Change — WaitForExit(int) now drains streams (bounded)

Previously WaitForExit(int milliseconds) with a finite timeout skipped stream draining entirely. Now when the process exits within the timeout, streams are drained for up to 300ms. This adds at most 300ms to the total wait time. All three models verified this is acceptable since: (1) it's bounded at 300ms, not the full milliseconds timeout; (2) this matches the documented behavior that WaitForExit should wait for stream EOF; (3) the alternative (skipping drain) caused data loss for callers reading redirected output.

Files: Process.Unix.cs:208, Process.Windows.cs:179

✅ Correctness — WaitForExitAsync properly forwards cancellation token

The cancellationToken is forwarded to CancelDueToProcessExitAsyncTask.Delay(StreamDrainDefaultTimeoutMs, cancellationToken). If cancelled, the delay completes immediately, causing the stream reader to be cancelled without waiting the full 300ms. This correctly preserves the cancellation contract.

File: Process.cs:1530, 1558, 1565-1576

✅ Test Quality — Test is well-structured

The test creates a realistic scenario (child exits, grandchild keeps pipe open), validates all four WaitForExit overloads via [Theory]/[InlineData], uses Stopwatch with a 2s threshold (ample margin over 300ms), and cleans up the grandchild process. The catch (Exception) { } for killing the grandchild is pragmatic — the grandchild may have already exited.

File: ProcessWaitingTests.cs:675-729

💡 Suggestion — child Process not disposed in test

The child Process object is not wrapped in using. While this is consistent with the existing WaitChain test pattern in the same file, adding using would be cleaner. Non-blocking — the test will still pass and the process handle will be released at GC time.

File: ProcessWaitingTests.cs:682


Models consulted: Claude Opus 4.6 (primary), Gemini 2.5 Pro, GPT-4.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Diagnostics.Process NO-REVIEW Experimental/testing PR, do NOT review it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants